Streaming Systems


Tyler Akidau - 2018
    As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.Expanded from Tyler Akidau's popular blog posts Streaming 101 and Streaming 102, this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You'll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.You'll explore:How streaming and batch data processing patterns compareThe core principles and concepts behind robust out-of-order data processingHow watermarks track progress and completeness in infinite datasetsHow exactly-once data processing techniques ensure correctnessHow the concepts of streams and tables form the foundations of both batch and streaming data processingThe practical motivations behind a powerful persistent state mechanism, driven by a real-world exampleHow time-varying relations provide a link between stream processing and the world of SQL and relational algebra

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World


Pedro Domingos - 2015
    In The Master Algorithm, Pedro Domingos lifts the veil to give us a peek inside the learning machines that power Google, Amazon, and your smartphone. He assembles a blueprint for the future universal learner--the Master Algorithm--and discusses what it will mean for business, science, and society. If data-ism is today's philosophy, this book is its bible.

The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary


Eric S. Raymond - 1999
    According to the August Forrester Report, 56 percent of IT managers interviewed at Global 2,500 companies are already using some type of open source software in their infrastructure and another 6 percent will install it in the next two years. This revolutionary model for collaborative software development is being embraced and studied by many of the biggest players in the high-tech industry, from Sun Microsystems to IBM to Intel.The Cathedral & the Bazaar is a must for anyone who cares about the future of the computer industry or the dynamics of the information economy. Already, billions of dollars have been made and lost based on the ideas in this book. Its conclusions will be studied, debated, and implemented for years to come. According to Bob Young, "This is Eric Raymond's great contribution to the success of the open source revolution, to the adoption of Linux-based operating systems, and to the success of open source users and the companies that supply them."The interest in open source software development has grown enormously in the past year. This revised and expanded paperback edition includes new material on open source developments in 1999 and 2000. Raymond's clear and effective writing style accurately describing the benefits of open source software has been key to its success. With major vendors creating acceptance for open source within companies, independent vendors will become the open source story in 2001.

Grokking Algorithms An Illustrated Guide For Programmers and Other Curious People


Aditya Y. Bhargava - 2015
    The algorithms you'll use most often as a programmer have already been discovered, tested, and proven. If you want to take a hard pass on Knuth's brilliant but impenetrable theories and the dense multi-page proofs you'll find in most textbooks, this is the book for you. This fully-illustrated and engaging guide makes it easy for you to learn how to use algorithms effectively in your own programs.Grokking Algorithms is a disarming take on a core computer science topic. In it, you'll learn how to apply common algorithms to the practical problems you face in day-to-day life as a programmer. You'll start with problems like sorting and searching. As you build up your skills in thinking algorithmically, you'll tackle more complex concerns such as data compression or artificial intelligence. Whether you're writing business software, video games, mobile apps, or system utilities, you'll learn algorithmic techniques for solving problems that you thought were out of your grasp. For example, you'll be able to:Write a spell checker using graph algorithmsUnderstand how data compression works using Huffman codingIdentify problems that take too long to solve with naive algorithms, and attack them with algorithms that give you an approximate answer insteadEach carefully-presented example includes helpful diagrams and fully-annotated code samples in Python. By the end of this book, you will know some of the most widely applicable algorithms as well as how and when to use them.

Learning Spark: Lightning-Fast Big Data Analysis


Holden Karau - 2013
    How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates. Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning. Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm Learn how to deploy interactive, batch, and streaming applications Connect to data sources including HDFS, Hive, JSON, and S3 Master advanced topics like data partitioning and shared variables

Scrum: a Breathtakingly Brief and Agile Introduction


Chris Sims - 2012
    A pocket-sized overview of roles, artifacts and the sprint cycle, adapted from the bestseller The Elements of Scrum by Chris Sims & Hillary Louise Johnson

Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions


Gregor Hohpe - 2003
    The authors also include examples covering a variety of different integration technologies, such as JMS, MSMQ, TIBCO ActiveEnterprise, Microsoft BizTalk, SOAP, and XSL. A case study describing a bond trading system illustrates the patterns in practice, and the book offers a look at emerging standards, as well as insights into what the future of enterprise integration might hold. This book provides a consistent vocabulary and visual notation framework to describe large-scale integration solutions across many technologies. It also explores in detail the advantages and limitations of asynchronous messaging architectures. The authors present practical advice on designing code that connects an application to a messaging system, and provide extensive information to help you determine when to send a message, how to route it to the proper destination, and how to monitor the health of a messaging system. If you want to know how to manage, monitor, and maintain a messaging system once it is in use, get this book.

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale


Neha Narkhede - 2017
    And how to move all of this data becomes nearly as important as the data itself. If you� re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you� ll learn Kafka� s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.Understand publish-subscribe messaging and how it fits in the big data ecosystem.Explore Kafka producers and consumers for writing and reading messagesUnderstand Kafka patterns and use-case requirements to ensure reliable data deliveryGet best practices for building data pipelines and applications with KafkaManage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasksLearn the most critical metrics among Kafka� s operational measurementsExplore how Kafka� s stream delivery capabilities make it a perfect source for stream processing systems

Apprenticeship Patterns: Guidance for the Aspiring Software Craftsman


Dave Hoover - 2009
    To grow professionally, you also need soft skills and effective learning techniques. Honing those skills is what this book is all about. Authors Dave Hoover and Adewale Oshineye have cataloged dozens of behavior patterns to help you perfect essential aspects of your craft. Compiled from years of research, many interviews, and feedback from O'Reilly's online forum, these patterns address difficult situations that programmers, administrators, and DBAs face every day. And it's not just about financial success. Apprenticeship Patterns also approaches software development as a means to personal fulfillment. Discover how this book can help you make the best of both your life and your career. Solutions to some common obstacles that this book explores in-depth include:Burned out at work? "Nurture Your Passion" by finding a pet project to rediscover the joy of problem solving.Feeling overwhelmed by new information? Re-explore familiar territory by building something you've built before, then use "Retreat into Competence" to move forward again.Stuck in your learning? Seek a team of experienced and talented developers with whom you can "Be the Worst" for a while. "Brilliant stuff! Reading this book was like being in a time machine that pulled me back to those key learning moments in my career as a professional software developer and, instead of having to learn best practices the hard way, I had a guru sitting on my shoulder guiding me every step towards master craftsmanship. I'll certainly be recommending this book to clients. I wish I had this book 14 years ago!" -Russ Miles, CEO, OpenCredo

Google Hacks: Tips & Tools for Finding and Using the World's Information


Rael Dornfest - 2003
    But few people realize that Google also gives you hundreds of cool ways to organize and play with information.Since we released the last edition of this bestselling book, Google has added many new features and services to its expanding universe: Google Earth, Google Talk, Google Maps, Google Blog Search, Video Search, Music Search, Google Base, Google Reader, and Google Desktop among them. We've found ways to get these new services to do even more.The expanded third edition of Google Hacks is a brand-new and infinitely more useful book for this powerful search engine. You'll not only find dozens of hacks for the new Google services, but plenty of updated tips, tricks and scripts for hacking the old ones. Now you can make a Google Earth movie, visualize your web site traffic with Google Analytics, post pictures to your blog with Picasa, or access Gmail in your favorite email client. Industrial strength and real-world tested, this new collection enables you to mine a ton of information within Google's reach. And have a lot of fun while doing it:Search Google over IM with a Google Talk bot Build a customized Google Map and add it to your own web site Cover your searching tracks and take back your browsing privacy Turn any Google query into an RSS feed that you can monitor in Google Reader or the newsreader of your choice Keep tabs on blogs in new, useful ways Turn Gmail into an external hard drive for Windows, Mac, or Linux Beef up your web pages with search, ads, news feeds, and more Program Google with the Google API and language of your choice For those of you concerned about Google as an emerging Big Brother, this new edition also offers advice and concrete tips for protecting your privacy. Get into the world of Google and bend it to your will!

Head First Data Analysis: A Learner's Guide to Big Numbers, Statistics, and Good Decisions


Michael G. Milton - 2009
    If your job requires you to manage and analyze all kinds of data, turn to Head First Data Analysis, where you'll quickly learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others. Whether you're a product developer researching the market viability of a new product or service, a marketing manager gauging or predicting the effectiveness of a campaign, a salesperson who needs data to support product presentations, or a lone entrepreneur responsible for all of these data-intensive functions and more, the unique approach in Head First Data Analysis is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool. You'll learn how to:Determine which data sources to use for collecting information Assess data quality and distinguish signal from noise Build basic data models to illuminate patterns, and assimilate new information into the models Cope with ambiguous information Design experiments to test hypotheses and draw conclusions Use segmentation to organize your data within discrete market groups Visualize data distributions to reveal new relationships and persuade others Predict the future with sampling and probability models Clean your data to make it useful Communicate the results of your analysis to your audience Using the latest research in cognitive science and learning theory to craft a multi-sensory learning experience, Head First Data Analysis uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.

Big data @ work : dispelling the myths, uncovering the opportunities


Thomas H. Davenport - 2014
    The author was—at first.When the term “big data” first came on the scene, bestselling author Tom Davenport (Competing on Analytics, Analytics at Work) thought it was just another example of technology hype. But his research in the years that followed changed his mind.Now, in clear, conversational language, Davenport explains what big data means—and why everyone in business needs to know about it. Big Data at Work covers all the bases: what big data means from a technical, consumer, and management perspective; what its opportunities and costs are; where it can have real business impact; and which aspects of this hot topic have been oversold.This book will help you understand:• Why big data is important to you and your organization• What technology you need to manage it• How big data could change your job, your company, and your industry• How to hire, rent, or develop the kinds of people who make big data work• The key success factors in implementing any big data project• How big data is leading to a new approach to managing analyticsWith dozens of company examples, including UPS, GE, Amazon, United Healthcare, Citigroup, and many others, this book will help you seize all opportunities—from improving decisions, products, and services to strengthening customer relationships. It will show you how to put big data to work in your own organization so that you too can harness the power of this ever-evolving new resource.

Secrets and Lies: Digital Security in a Networked World


Bruce Schneier - 2000
    Identity Theft. Corporate Espionage. National secrets compromised. Can anyone promise security in our digital world?The man who introduced cryptography to the boardroom says no. But in this fascinating read, he shows us how to come closer by developing security measures in terms of context, tools, and strategy. Security is a process, not a product – one that system administrators and corporate executives alike must understand to survive.This edition updated with new information about post-9/11 security.

Site Reliability Engineering: How Google Runs Production Systems


Betsy Beyer - 2016
    So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?In this collection of essays and articles, key members of Google's Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You'll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient--lessons directly applicable to your organization.This book is divided into four sections: Introduction--Learn what site reliability engineering is and why it differs from conventional IT industry practicesPrinciples--Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)Practices--Understand the theory and practice of an SRE's day-to-day work: building and operating large distributed computing systemsManagement--Explore Google's best practices for training, communication, and meetings that your organization can use

Creating a Data-Driven Organization: Practical Advice from the Trenches


Carl Anderson - 2015
    This practical book shows you how true data-drivenness involves processes that require genuine buy-in across your company, from analysts and management to the C-Suite and the board.Through interviews and examples from data scientists and analytics leaders in a variety of industries, author Carl Anderson explains the analytics value chain you need to adopt when building predictive business models—from data collection and analysis to the insights and leadership that drive concrete actions. You’ll learn what works and what doesn’t, and why creating a data-driven culture throughout your organization is essential. Start from the bottom up: learn how to collect the right data the right way Hire analysts with the right skills, and organize them into teams Examine statistical and visualization tools, and fact-based story-telling methods Collect and analyze data while respecting privacy and ethics Understand how analysts and their managers can help spur a data-driven culture Learn the importance of data leadership and C-level positions such as chief data officer and chief analytics officer