Book picks similar to
Creating a Data-Driven Enterprise with DataOps by Ashish Thusoo
data-science-engineering
short-reads
professional
all-things-data
Ubuntu: The Beginner's Guide
Jonathan Moeller - 2011
In the Guide, you'll learn how to: -Use the Ubuntu command line. -Manage users, groups, and file permissions. -Install software on a Ubuntu system, both from the command line and the GUI. -Configure network settings. -Use the vi editor to edit system configuration files. -Install and configure a Samba server for file sharing. -Install SSH for remote system control using public key/private key encryption. -Install a DHCP server for IP address management. -Install a LAMP server. -Install web applications like WordPress and Drupal. -Configure an FTP server. -Manage ebooks. -Convert digital media. -Manage and configure Unity, the default Ubuntu environment. -Manage and halt processes from the command line. -Set up both a VNC server and a client. -Enjoy games on Ubuntu. -And many other topics.
Streaming Systems
Tyler Akidau - 2018
As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.Expanded from Tyler Akidau's popular blog posts Streaming 101 and Streaming 102, this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You'll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.You'll explore:How streaming and batch data processing patterns compareThe core principles and concepts behind robust out-of-order data processingHow watermarks track progress and completeness in infinite datasetsHow exactly-once data processing techniques ensure correctnessHow the concepts of streams and tables form the foundations of both batch and streaming data processingThe practical motivations behind a powerful persistent state mechanism, driven by a real-world exampleHow time-varying relations provide a link between stream processing and the world of SQL and relational algebra
Thinking with Data
Max Shron - 2014
In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills.Thinking with Data helps you learn techniques for turning data into knowledge you can use. You’ll learn a framework for defining your project, including the data you want to collect, and how you intend to approach, organize, and analyze the results. You’ll also learn patterns of reasoning that will help you unveil the real problem that needs to be solved.Learn a framework for scoping data projectsUnderstand how to pin down the details of an idea, receive feedback, and begin prototypingUse the tools of arguments to ask good questions, build projects in stages, and communicate resultsExplore data-specific patterns of reasoning and learn how to build more useful argumentsDelve into causal reasoning and learn how it permeates data workPut everything together, using extended examples to see the method of full problem thinking in action
Too Big to Ignore: The Business Case for Big Data
Phil Simon - 2013
Progressive Insurance tracks real-time customer driving patterns and uses that information to offer rates truly commensurate with individual safety. Google accurately predicts local flu outbreaks based upon thousands of user search queries. Amazon provides remarkably insightful, relevant, and timely product recommendations to its hundreds of millions of customers. Quantcast lets companies target precise audiences and key demographics throughout the Web. NASA runs contests via gamification site TopCoder, awarding prizes to those with the most innovative and cost-effective solutions to its problems. Explorys offers penetrating and previously unknown insights into healthcare behavior.How do these organizations and municipalities do it? Technology is certainly a big part, but in each case the answer lies deeper than that. Individuals at these organizations have realized that they don't have to be Nate Silver to reap massive benefits from today's new and emerging types of data. And each of these organizations has embraced Big Data, allowing them to make astute and otherwise impossible observations, actions, and predictions.It's time to start thinking big.In Too Big to Ignore, recognized technology expert and award-winning author Phil Simon explores an unassailably important trend: Big Data, the massive amounts, new types, and multifaceted sources of information streaming at us faster than ever. Never before have we seen data with the volume, velocity, and variety of today. Big Data is no temporary blip of fad. In fact, it is only going to intensify in the coming years, and its ramifications for the future of business are impossible to overstate.Too Big to Ignore explains why Big Data is a big deal. Simon provides commonsense, jargon-free advice for people and organizations looking to understand and leverage Big Data. Rife with case studies, examples, analysis, and quotes from real-world Big Data practitioners, the book is required reading for chief executives, company owners, industry leaders, and business professionals.
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
Rob Collie - 2016
Written by the world’s foremost PowerPivot blogger and practitioner, the book’s concepts and approach are introduced in a simple, step-by-step manner tailored to the learning style of Excel users everywhere. The techniques presented allow users to produce, in hours or even minutes, results that formerly would have taken entire teams weeks or months to produce. It includes lessons on the difference between calculated columns and measures; how formulas can be reused across reports of completely different shapes; how to merge disjointed sets of data into unified reports; how to make certain columns in a pivot behave as if the pivot were filtered while other columns do not; and how to create time-intelligent calculations in pivot tables such as “Year over Year” and “Moving Averages” whether they use a standard, fiscal, or a complete custom calendar. The “pattern-like” techniques and best practices contained in this book have been developed and refined over two years of onsite training with Excel users around the world, and the key lessons from those seminars costing thousands of dollars per day are now available to within the pages of this easy-to-follow guide. This updated second edition covers new features introduced with Office 2015.
Doing Data Science
Cathy O'Neil - 2013
But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.Topics include:Statistical inference, exploratory data analysis, and the data science processAlgorithmsSpam filters, Naive Bayes, and data wranglingLogistic regressionFinancial modelingRecommendation engines and causalityData visualizationSocial networks and data journalismData engineering, MapReduce, Pregel, and HadoopDoing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.
SQL Pocket Guide
Jonathan Gennick - 2003
It's used to create and maintain database objects, place data into those objects, query the data, modify the data, and, finally, delete data that is no longer needed. Databases lie at the heart of many, if not most business applications. Chances are very good that if you're involved with software development, you're using SQL to some degree. And if you're using SQL, you should own a good reference or two.Now available in an updated second edition, our very popular "SQL Pocket Guide" is a major help to programmers, database administrators, and everyone who uses SQL in their day-to-day work. The "SQL Pocket Guide" is a concise reference to frequently used SQL statements and commonly used SQL functions. Not just an endless collection of syntax diagrams, this portable guide addresses the language's complexity head on and leads by example. The information in this edition has been updated to reflect the latest versions of the most commonly used SQL variants including: Oracle Database 10g, Release 2 (includingthe free Oracle Database 10g Express Edition (XE))Microsoft SQL Server 2005MySQL 5IBM DB2 8.2PostreSQL 8.1 database
Real World Java EE Patterns--Rethinking Best Practices
Adam Bien - 2009
:-)
Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
Bruce Schneier - 2015
Your online and in-store purchasing patterns are recorded, and reveal if you're unemployed, sick, or pregnant. Your e-mails and texts expose your intimate and casual friends. Google knows what you’re thinking because it saves your private searches. Facebook can determine your sexual orientation without you ever mentioning it.The powers that surveil us do more than simply store this information. Corporations use surveillance to manipulate not only the news articles and advertisements we each see, but also the prices we’re offered. Governments use surveillance to discriminate, censor, chill free speech, and put people in danger worldwide. And both sides share this information with each other or, even worse, lose it to cybercriminals in huge data breaches.Much of this is voluntary: we cooperate with corporate surveillance because it promises us convenience, and we submit to government surveillance because it promises us protection. The result is a mass surveillance society of our own making. But have we given up more than we’ve gained? In Data and Goliath, security expert Bruce Schneier offers another path, one that values both security and privacy. He brings his bestseller up-to-date with a new preface covering the latest developments, and then shows us exactly what we can do to reform government surveillance programs, shake up surveillance-based business models, and protect our individual privacy. You'll never look at your phone, your computer, your credit cards, or even your car in the same way again.
Social Physics: How Good Ideas Spread— The Lessons from a New Science
Alex Pentland - 2014
Over years of groundbreaking experiments, he has distilled remarkable discoveries significant enough to become the bedrock of a whole new scientific field: social physics. Humans have more in common with bees than we like to admit: We’re social creatures first and foremost. Our most important habits of action—and most basic notions of common sense—are wired into us through our coordination in social groups. Social physics is about idea flow, the way human social networks spread ideas and transform those ideas into behaviors. Thanks to the millions of digital bread crumbs people leave behind via smartphones, GPS devices, and the Internet, the amount of new information we have about human activity is truly profound. Until now, sociologists have depended on limited data sets and surveys that tell us how people say they think and behave, rather than what they actually do. As a result, we’ve been stuck with the same stale social structures—classes, markets—and a focus on individual actors, data snapshots, and steady states. Pentland shows that, in fact, humans respond much more powerfully to social incentives that involve rewarding others and strengthening the ties that bind than incentives that involve only their own economic self-interest. Pentland and his teams have found that they can study patterns of information exchange in a social network without any knowledge of the actual content of the information and predict with stunning accuracy how productive and effective that network is, whether it’s a business or an entire city. We can maximize a group’s collective intelligence to improve performance and use social incentives to create new organizations and guide them through disruptive change in a way that maximizes the good. At every level of interaction, from small groups to large cities, social networks can be tuned to increase exploration and engagement, thus vastly improving idea flow. Social Physics will change the way we think about how we learn and how our social groups work—and can be made to work better, at every level of society. Pentland leads readers to the edge of the most important revolution in the study of social behavior in a generation, an entirely new way to look at life itself.
Cassandra: The Definitive Guide
Eben Hewitt - 2010
Cassandra: The Definitive Guide provides the technical details and practical examples you need to assess this database management system and put it to work in a production environment.Author Eben Hewitt demonstrates the advantages of Cassandra's nonrelational design, and pays special attention to data modeling. If you're a developer, DBA, application architect, or manager looking to solve a database scaling issue or future-proof your application, this guide shows you how to harness Cassandra's speed and flexibility.Understand the tenets of Cassandra's column-oriented structureLearn how to write, update, and read Cassandra dataDiscover how to add or remove nodes from the cluster as your application requiresExamine a working application that translates from a relational model to Cassandra's data modelUse examples for writing clients in Java, Python, and C#Use the JMX interface to monitor a cluster's usage, memory patterns, and moreTune memory settings, data storage, and caching for better performance
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
Pramod J. Sadalage - 2012
Advocates of NoSQL databases claim they can be used to build systems that are more performant, scale better, and are easier to program." ""NoSQL Distilled" is a concise but thorough introduction to this rapidly emerging technology. Pramod J. Sadalage and Martin Fowler explain how NoSQL databases work and the ways that they may be a superior alternative to a traditional RDBMS. The authors provide a fast-paced guide to the concepts you need to know in order to evaluate whether NoSQL databases are right for your needs and, if so, which technologies you should explore further. The first part of the book concentrates on core concepts, including schemaless data models, aggregates, new distribution models, the CAP theorem, and map-reduce. In the second part, the authors explore architectural and design issues associated with implementing NoSQL. They also present realistic use cases that demonstrate NoSQL databases at work and feature representative examples using Riak, MongoDB, Cassandra, and Neo4j. In addition, by drawing on Pramod Sadalage's pioneering work, "NoSQL Distilled" shows how to implement evolutionary design with schema migration: an essential technique for applying NoSQL databases. The book concludes by describing how NoSQL is ushering in a new age of Polyglot Persistence, where multiple data-storage worlds coexist, and architects can choose the technology best optimized for each type of data access.
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
Eric Redmond - 2012
As a modern application developer you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core each technology. Redis, Neo4J, CouchDB, MongoDB, HBase, Riak and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases-relational, key/value, columnar, document and graph-and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once.Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.What You Need: To get the most of of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OSX or Linux preferred, Windows users will need Cygwin), and Java 6 (or greater) and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.
Exploring CQRS and Event Sourcing
Dominic Betts - 2012
It presents a learning journey, not definitive guidance. It describes the experiences of a development team with no prior CQRS proficiency in building, deploying (to Windows Azure), and maintaining a sample real-world, complex, enterprise system to showcase various CQRS and ES concepts, challenges, and techniques.The development team did not work in isolation; we actively sought input from industry experts and from a wide group of advisors to ensure that the guidance is both detailed and practical.The CQRS pattern and event sourcing are not mere simplistic solutions to the problems associated with large-scale, distributed systems. By providing you with both a working application and written guidance, we expect you’ll be well prepared to embark on your own CQRS journey.
Architecting the Cloud: Design Decisions for Cloud Computing Service Models (Saas, Paas, and Iaas)
Michael J. Kavis - 2013
However, before you can decide on a cloud model, you need to determine what the ideal cloud service model is for your business. Helping you cut through all the haze, Architecting the Cloud is vendor neutral and guides you in making one of the most critical technology decisions that you will face: selecting the right cloud service model(s) based on a combination of both business and technology requirements.Guides corporations through key cloud design considerations Discusses the pros and cons of each cloud service model Highlights major design considerations in areas such as security, data privacy, logging, data storage, SLA monitoring, and more Clearly defines the services cloud providers offer for each service model and the cloud services IT must provide Arming you with the information you need to choose the right cloud service provider, Architecting the Cloud is a comprehensive guide covering everything you need to be aware of in selecting the right cloud service model for you.