Doing Data Science


Cathy O'Neil - 2013
    But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science. Topics include: statistical inference, exploratory data analysis, and the data science process; algorithms; spam filters, Naive Bayes, and data wrangling; logistic regression; financial modeling; recommendation engines and causality; data visualization; social networks and data journalism; data engineering, MapReduce, Pregel, and Hadoop. Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing


Ron Kohavi - 2020
    This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven decisions. Learn how to: use the scientific method to evaluate hypotheses using controlled experiments; define key metrics and ideally an Overall Evaluation Criterion; test for trustworthiness of the results and alert experimenters to violated assumptions; build a scalable platform that lowers the marginal cost of experiments close to zero; avoid pitfalls like carryover effects and Twyman's law; understand how statistical issues play out in practice.

The Data Detective: Ten Easy Rules to Make Sense of Statistics


Tim Harford - 2020
    That’s a mistake, Tim Harford says in The Data Detective. We shouldn’t be suspicious of statistics; we need to understand what they mean and how they can improve our lives: they are, at heart, human behavior seen through the prism of numbers and are often “the only way of grasping much of what is going on around us.” If we can toss aside our fears and learn to approach them clearly, understanding how our own preconceptions lead us astray, statistics can point to ways we can live better and work smarter. As “perhaps the best popular economics writer in the world” (New Statesman), Tim Harford is an expert at taking complicated ideas and untangling them for millions of readers. In The Data Detective, he uses new research in science and psychology to set out ten strategies for using statistics to erase our biases and replace them with new ideas that use virtues like patience, curiosity, and good sense to better understand ourselves and the world. As a result, The Data Detective is a big-idea book about statistics and human behavior that is fresh, unexpected, and insightful.

Building Data Science Teams


D.J. Patil - 2011
    In this in-depth report, data scientist DJ Patil explains the skills, perspectives, tools and processes that position data science teams for success. Topics include what it means to be "data driven," the unique roles of data scientists, the four essential qualities of data scientists, and Patil's first-hand experience building the LinkedIn data science team.

Football Hackers: The Science and Art of a Data Revolution


Christoph Biermann - 2019
    Football's data revolution has only just begun. The arrival of advanced metrics and detailed analysis is already reshaping the modern game. We can now fully assess player performance, analyse the role of luck and measure what really leads to victory. There is no turning back. Now the race is on between football's wealthiest clubs and a group of outsiders, nerds and rule-breakers, who are turning the game on its head with their staggering innovations. Winning is no longer just about what happens out on the pitch; it's now a battle taking place in boardrooms and on screens across international borders, with the world's brightest minds driving for an edge over their fiercest rivals. Christoph Biermann has moved in the midst of these disruptive upheavals, talking to scientists, coaches, managers, scouts and psychologists in the world's major clubs, traveling across Europe and the US and revealing the hidden, and often jaw-dropping, truths behind the beautiful game. 'A book full of exciting ideas and inside views on modern football. The most exciting book in an exciting time for football.' Thomas Hitzlsperger

The Wisdom of Crowds


James Surowiecki - 2004
    With boundless erudition and in delightfully clear prose, Surowiecki ranges across fields as diverse as popular culture, psychology, ant biology, behavioral economics, artificial intelligence, military history, and politics to show how this simple idea offers important lessons for how we live our lives, select our leaders, run our companies, and think about our world.

Applied Predictive Modeling


Max Kuhn - 2013
    Non-mathematical readers will appreciate the intuitive explanations of the techniques, while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. Dr. Kuhn is a Director of Non-Clinical Statistics at Pfizer Global R&D in Groton, Connecticut. He has been applying predictive models in the pharmaceutical and diagnostic industries for over 15 years and is the author of a number of R packages. Dr. Johnson has more than a decade of statistical consulting and predictive modeling experience in pharmaceutical research and development. He is a co-founder of Arbor Analytics, a firm specializing in predictive modeling, and is a former Director of Statistics at Pfizer Global R&D. His scholarly work centers on the application and development of statistical methodology and learning algorithms. Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. Addressing practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance, all of which are problems that occur frequently in practice. The text illustrates all parts of the modeling process through many hands-on, real-life examples. And every chapter contains extensive R code f

The Book: Playing the Percentages in Baseball


Tom M. Tango - 2006
    Continuing in the grand tradition of sabermetrics, the authors provide a revolutionary way to think about baseball with principles that can be applied at every level, from high school to the major leagues. Tom Tango, Mitchel Lichtman, and Andrew Dolphin cover topics such as batting and pitching matchups, platooning, the benefits and risks of intentional walks and sacrifices, the legitimacy of alleged clutch hitters, and many of baseball's other theories on hitting, fielding, pitching, and even baserunning. They analyze when a strategy is a good idea and when it's a bad idea, and how to more closely watch the inside game of baseball. Whenever you hear an announcer talk about the unwritten rule or say that so-and-so is going by the book in bringing in a situational substitute, The Book reviews the facts and determines what the real case is. If you want to know what the folks in baseball should be doing, find out in The Book.

Innumeracy: Mathematical Illiteracy and Its Consequences


John Allen Paulos - 1988
    Dozens of examples in Innumeracy show us how it not only affects personal economics and travel plans, but also explains mis-chosen mates, inappropriate drug-testing, and the allure of pseudo-science.

Data Science from Scratch: First Principles with Python


Joel Grus - 2015
    In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out. Get a crash course in Python. Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science. Collect, explore, clean, munge, and manipulate data. Dive into the fundamentals of machine learning. Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering. Explore recommender systems, natural language processing, network analysis, MapReduce, and databases.

Dear Data


Giorgia Lupi - 2016
    The result is described as “a thought-provoking visual feast”.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

The Cartoon Introduction to Statistics


Grady Klein - 2013
    Employing an irresistible cast of dragon-riding Vikings, lizard-throwing giants, and feuding aliens, the renowned illustrator Grady Klein and the award-winning statistician Alan Dabney teach you how to collect reliable data, make confident statements based on limited information, and judge the usefulness of polls and the other numbers that you're bombarded with every day. If you want to go beyond the basics, they've created the ultimate resource: "The Math Cave," where they reveal the more advanced formulas and concepts. Timely, authoritative, and hilarious, The Cartoon Introduction to Statistics is an essential guide for anyone who wants to better navigate our data-driven world.

Advanced Electronic Communications Systems


Wayne Tomasi - 1987
    Numerous examples throughout provide readers with real-life applications of the concepts of analog and digital communications systems, while chapter-end questions and problems give them a chance to test and review their understanding of fundamental and key topics. Coverage includes modern digital and data communications systems, microwave radio communications systems, satellite communications systems, and optical fiber communications systems. The cellular and PCS telephone systems coverage presents the latest and most innovative technological advancements being made in cellular communication systems. The optical fiber communications chapter includes new sections on light sources, optical power, optical sources and link budget. Current topics include trellis encoding, CCITT modem recommendations, PCM line speed, extended superframe format, wavelength division multiplexing, Kepler's laws, Clarke orbits, limits of visibility, Satellite Radio Navigation and Navstar GPS. For the study of electronic communications systems.

The Seven Pillars of Statistical Wisdom


Stephen M. Stigler - 2016
    It allows one to gain information by discarding information, namely, the individuality of the observations. Stigler's second pillar, information measurement, challenges the importance of big data by noting that observations are not all equally important: the amount of information in a data set is often proportional to only the square root of the number of observations, not the absolute number. The third idea is likelihood, the calibration of inferences with the use of probability. Intercomparison is the principle that statistical comparisons do not need to be made with respect to an external standard. The fifth pillar is regression, both a paradox (tall parents on average produce shorter children; tall children on average have shorter parents) and the basis of inference, including Bayesian inference and causal reasoning. The sixth concept captures the importance of experimental design, for example by recognizing the gains to be had from a combinatorial approach with rigorous randomization. The seventh idea is the residual: the notion that a complicated phenomenon can be simplified by subtracting the effect of known causes, leaving a residual phenomenon that can be explained more easily. The Seven Pillars of Statistical Wisdom presents an original, unified account of statistical science that will fascinate the interested layperson and engage the professional statistician.