The Art of Data Science: A Guide for Anyone Who Works with Data


Roger D. Peng - 2015
    The authors have extensive experience both managing data analysts and conducting their own data analyses, and have carefully observed what produces coherent results and what fails to produce useful insights into data. This book is a distillation of their experience in a format that is applicable to both practitioners and managers in data science.

The Art of Statistics: How to Learn from Data


David Spiegelhalter - 2019
    Statistics are everywhere, as integral to science as they are to business, and they appear in the popular media hundreds of times a day. In this age of big data, a basic grasp of statistical literacy is more important than ever if we want to separate the fact from the fiction, the ostentatious embellishments from the raw evidence, and even more so if we hope to participate in the future rather than be simple bystanders. In The Art of Statistics, world-renowned statistician David Spiegelhalter shows readers how to derive knowledge from raw data by focusing on the concepts and connections behind the math. Drawing on real-world examples to introduce complex issues, he shows us how statistics can help us determine the luckiest passenger on the Titanic, whether a notorious serial killer could have been caught earlier, and whether screening for ovarian cancer is beneficial. The Art of Statistics not only shows us how mathematicians have used statistical science to solve these problems; it also teaches us how we too can think like statisticians. We learn how to clarify our questions, assumptions, and expectations when approaching a problem, and, perhaps even more importantly, we learn how to responsibly interpret the answers we receive. Combining the incomparable insight of an expert with the playful enthusiasm of an aficionado, The Art of Statistics is the definitive guide to stats that every modern person needs.

The Elements of Data Analytic Style


Jeffrey Leek - 2015
    This book is focused on the details of data analysis that sometimes fall through the cracks in traditional statistics classes and textbooks. It draws on the author's blog posts, lecture materials, and tutorials, as well as his popular open-source guides available through his GitHub account (https://github.com/jtleek). The author is one of the co-developers of the Johns Hopkins Specialization in Data Science, the largest data science program in the world, which has enrolled more than 1.76 million people. The book is useful as a companion to introductory courses in data science or data analysis. It is also a useful reference tool for people tasked with reading and critiquing data analyses. The book is also available through Leanpub (https://leanpub.com/datastyle); if you purchase it on that platform, you are entitled to free lifetime updates.

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die


Eric Siegel - 2013
    Rather than a "how to" for hands-on techies, the book entices lay readers and experts alike by covering new case studies and the latest state-of-the-art techniques. You have been predicted by companies, governments, law enforcement, hospitals, and universities. Their computers say, "I knew you were going to do that!" These institutions are seizing upon the power to predict whether you're going to click, buy, lie, or die. Why? For good reason: predicting human behavior combats financial risk, fortifies healthcare, conquers spam, toughens crime fighting, and boosts sales. How? Prediction is powered by the world's most potent, booming unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn. Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future (lifting a bit of the fog off our hazy view of tomorrow) means pay dirt. In this rich, entertaining primer, former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction:
    - What type of mortgage risk Chase Bank predicted before the recession.
    - Predicting which people will drop out of school, cancel a subscription, or get divorced before they are even aware of it themselves.
    - Why early retirement decreases life expectancy and vegetarians miss fewer flights.
    - Five reasons why organizations predict death, including one health insurance company.
    - How U.S. Bank, European wireless carrier Telenor, and Obama's 2012 campaign calculated the way to most strongly influence each individual.
    - How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy!
    - How companies ascertain untold, private truths: how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job.
    - How judges and parole boards rely on crime-predicting computers to decide who stays in prison and who goes free.
    - What's predicted by the BBC, Citibank, ConEd, Facebook, Ford, Google, IBM, the IRS, Match.com, MTV, Netflix, Pandora, PayPal, Pfizer, and Wikipedia.
    A truly omnipresent science, predictive analytics affects everyone, every day. Although largely unseen, it drives millions of decisions, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate. Predictive analytics transcends human perception. This book's final chapter answers the riddle: What often happens to you that cannot be witnessed, and that you can't even be sure has happened afterward, but that can be predicted in advance? Whether you are a consumer of it or consumed by it, get a handle on the power of Predictive Analytics.

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World


Bruce Schneier - 2015
    Your online and in-store purchasing patterns are recorded, and reveal if you're unemployed, sick, or pregnant. Your e-mails and texts expose your intimate and casual friends. Google knows what you’re thinking because it saves your private searches. Facebook can determine your sexual orientation without you ever mentioning it. The powers that surveil us do more than simply store this information. Corporations use surveillance to manipulate not only the news articles and advertisements we each see, but also the prices we’re offered. Governments use surveillance to discriminate, censor, chill free speech, and put people in danger worldwide. And both sides share this information with each other or, even worse, lose it to cybercriminals in huge data breaches. Much of this is voluntary: we cooperate with corporate surveillance because it promises us convenience, and we submit to government surveillance because it promises us protection. The result is a mass surveillance society of our own making. But have we given up more than we’ve gained? In Data and Goliath, security expert Bruce Schneier offers another path, one that values both security and privacy. He brings his bestseller up to date with a new preface covering the latest developments, and then shows us exactly what we can do to reform government surveillance programs, shake up surveillance-based business models, and protect our individual privacy. You'll never look at your phone, your computer, your credit cards, or even your car in the same way again.

Information Dashboard Design: The Effective Visual Communication of Data


Stephen Few - 2006
    Although dashboards are potentially powerful, this potential is rarely realized. The greatest display technology in the world won't solve this if you fail to use effective visual design. And if a dashboard fails to tell you precisely what you need to know in an instant, you'll never use it, even if it's filled with cute gauges, meters, and traffic lights. Don't let your investment in dashboard technology go to waste. This book will teach you the visual design skills you need to create dashboards that communicate clearly, rapidly, and compellingly. Information Dashboard Design will explain how to:
    - Avoid the thirteen mistakes common to dashboard design
    - Provide viewers with the information they need quickly and clearly
    - Apply what we now know about visual perception to the visual presentation of information
    - Minimize distractions, clichés, and unnecessary embellishments that create confusion
    - Organize business information to support meaning and usability
    - Create an aesthetically pleasing viewing experience
    - Maintain consistency of design to provide accurate interpretation
    - Optimize the power of dashboard technology by pairing it with visual effectiveness
    Stephen Few has over 20 years of experience as an IT innovator, consultant, and educator. As Principal of the consultancy Perceptual Edge, Stephen focuses on data visualization for analyzing and communicating quantitative business information. He provides consulting and training services, speaks frequently at conferences, and teaches in the MBA program at the University of California, Berkeley. He is also the author of Show Me the Numbers: Designing Tables and Graphs to Enlighten. Visit his website at www.perceptualedge.com.

Prediction Machines: The Simple Economics of Artificial Intelligence


Ajay Agrawal - 2018
    Facing the sea change that AI will bring can be paralyzing. How should companies set strategies, governments design policies, and people plan their lives for a world so different from what we know? In the face of such uncertainty, many analysts either cower in fear or predict an impossibly sunny future. But in Prediction Machines, three eminent economists recast the rise of AI as a drop in the cost of prediction. With this single, masterful stroke, they lift the curtain on the AI-is-magic hype and show how basic tools from economics provide clarity about the AI revolution and a basis for action by CEOs, managers, policy makers, investors, and entrepreneurs. When AI is framed as cheap prediction, its extraordinary potential becomes clear: prediction is at the heart of making decisions under uncertainty. Our businesses and personal lives are riddled with such decisions. Prediction tools increase productivity in operating machines, handling documents, and communicating with customers. Uncertainty constrains strategy, and better prediction creates opportunities for new business structures and strategies to compete. Penetrating, fun, and always insightful and practical, Prediction Machines follows its inescapable logic to explain how to navigate the changes on the horizon. The impact of AI will be profound, but the economic framework for understanding it is surprisingly simple.

The Pragmatic Programmer: From Journeyman to Master


Andy Hunt - 1999
    It covers topics ranging from personal responsibility and career development to architectural techniques for keeping your code flexible and easy to adapt and reuse. Read this book, and you'll learn how to:
    - Fight software rot
    - Avoid the trap of duplicating knowledge
    - Write flexible, dynamic, and adaptable code
    - Avoid programming by coincidence
    - Bullet-proof your code with contracts, assertions, and exceptions
    - Capture real requirements
    - Test ruthlessly and effectively
    - Delight your users
    - Build teams of pragmatic programmers
    - Make your developments more precise with automation
    Written as a series of self-contained sections and filled with entertaining anecdotes, thoughtful examples, and interesting analogies, The Pragmatic Programmer illustrates the best practices and major pitfalls of many different aspects of software development. Whether you're a new coder, an experienced programmer, or a manager responsible for software projects, use these lessons daily, and you'll quickly see improvements in personal productivity, accuracy, and job satisfaction. You'll learn skills and develop habits and attitudes that form the foundation for long-term success in your career. You'll become a Pragmatic Programmer.

Thinking with Data


Max Shron - 2014
    In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills. Thinking with Data helps you learn techniques for turning data into knowledge you can use. You’ll learn a framework for defining your project, including the data you want to collect, and how you intend to approach, organize, and analyze the results. You’ll also learn patterns of reasoning that will help you unveil the real problem that needs to be solved.
    - Learn a framework for scoping data projects
    - Understand how to pin down the details of an idea, receive feedback, and begin prototyping
    - Use the tools of arguments to ask good questions, build projects in stages, and communicate results
    - Explore data-specific patterns of reasoning and learn how to build more useful arguments
    - Delve into causal reasoning and learn how it permeates data work
    - Put everything together, using extended examples to see the method of full problem thinking in action

The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity


Alan Cooper - 1999
    Cooper details many of these meta functions to explain his central thesis: programmers need to seriously re-evaluate the many user-hostile concepts deeply embedded within the software development process. Rather than provide users with a straightforward set of options, programmers often pile on the bells and whistles and ignore or de-prioritise lingering bugs. For the average user, increased functionality is a great burden, adding to the recurrent chorus that plays: "computers are hard, mysterious, unwieldy things." (An average user, Cooper asserts, who doesn't think that way or who has memorised all the esoteric commands and now lords it over others, has simply been desensitised by too many years of badly designed software.) Cooper's writing style is often overblown, with a pantheon of cutesy terminology (e.g., "dancing bearware") and insider back-patting. (When presenting software to Bill Gates, he reports that Gates replied: "How did you do that?" to which he writes: "I love stumping Bill!") More seriously, he is also unable to see beyond software development's importance, a sin he accuses programmers of throughout the book. Even with that in mind, the central questions Cooper asks are too important to ignore: Are we making users happier? Are we improving the process by which they get work done? Are we making their work hours more effective? Cooper looks to programmers, business managers and what he calls "interaction designers" to question current assumptions and mindsets. Plainly, he asserts that the goal of computer usage should be "not to make anyone feel stupid." Our distance from that goal reinforces the need to rethink entrenched priorities in software planning. -- Jennifer Buckendorff, Amazon.com

Hands-On Machine Learning with Scikit-Learn and TensorFlow


Aurélien Géron - 2017
    Now that machine learning is thriving, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This practical book shows you how. By using concrete examples, minimal theory, and two production-ready Python frameworks (Scikit-Learn and TensorFlow), author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems. You’ll learn how to use a range of techniques, starting with simple Linear Regression and progressing to Deep Neural Networks. If you have some programming experience and you’re ready to code a machine learning project, this guide is for you. This hands-on book shows you how to use:
    - Scikit-Learn, an accessible framework that implements many algorithms efficiently and serves as a great machine learning entry point
    - TensorFlow, a more complex library for distributed numerical computation, ideal for training and running very large neural networks
    - Practical code examples that you can apply without learning excessive machine learning theory or algorithm details

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement


Eric Redmond - 2012
    As a modern application developer, you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core of each technology: Redis, Neo4J, CouchDB, MongoDB, HBase, Riak, and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases (relational, key/value, columnar, document, and graph) and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once. Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.
    What You Need: To get the most out of this book, you'll have to follow along, and that means you'll need a *nix shell (Mac OS X or Linux preferred; Windows users will need Cygwin), Java 6 (or greater), and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.

Big Data: A Revolution That Will Transform How We Live, Work, and Think


Viktor Mayer-Schönberger - 2013
    “Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it. This emerging science can translate myriad phenomena, from the price of airline tickets to the text of millions of books, into searchable form, and uses our increasing computing power to unearth epiphanies that we never could have seen before. A revolution on par with the Internet or perhaps even the printing press, big data will change the way we think about business, health, politics, education, and innovation in the years to come. It also poses fresh threats, from the inevitable end of privacy as we know it to the prospect of being penalized for things we haven’t even done yet, based on big data’s ability to predict our future behavior. In this brilliantly clear, often surprising work, two leading experts explain what big data is, how it will change our lives, and what we can do to protect ourselves from its hazards. Big Data is the first big book about the next big thing. Visit www.big-data-book.com.

Deep Learning with Python


François Chollet - 2017
    Deep learning is the technology behind photo tagging systems at Facebook and Google, self-driving cars, speech recognition systems on your smartphone, and much more. In particular, it excels at solving machine perception problems: understanding the content of image data, video data, or sound data. Here's a simple example: say you have a large collection of images, and you want tags associated with each image, for example, "dog" or "cat." Deep learning can allow you to create a system that learns how to map such tags to images from examples alone. This system can then be applied to new images, automating the task of photo tagging. A deep learning model only has to be fed examples of a task to start generating useful results on new data.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With the growth of computation and information technology have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting, the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools, including CART, MARS, and projection pursuit.