Book picks similar to
Statistical Evidence: A Likelihood Paradigm by Richard Royall
data-science
statistics
science
math
Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak
Travis Sawchik - 2015
Pittsburghers joked their town was the city of champions…and the Pirates. Big Data Baseball is the story of how the 2013 Pirates, mired in the longest losing streak in North American pro sports history, adopted drastic big-data strategies to end the drought, make the playoffs, and turn around the franchise's fortunes.Award-winning journalist Travis Sawchik takes you behind the scenes to expertly weave together the stories of the key figures who changed the way the small-market Pirates played the game. For manager Clint Hurdle and the front office staff to save their jobs, they could not rely on a free agent spending spree, instead they had to improve the sum of their parts and find hidden value. They had to change. From Hurdle shedding his old-school ways to work closely with Neal Huntington, the forward-thinking data-driven GM and his team of talented analysts; to pitchers like A. J. Burnett and Gerrit Cole changing what and where they threw; to Russell Martin, the undervalued catcher whose expert use of the nearly-invisible skill of pitch framing helped the team's pitchers turn more balls into strikes; to Clint Barmes, a solid shortstop and one of the early adopters of the unconventional on-field shift which forced the entire infield to realign into positions they never stood in before. Under Hurdle's leadership, a culture of collaboration and creativity flourished as he successfully blended whiz kid analysts with graybeard coaches—a kind of symbiotic teamwork which was unique to the sport.Big Data Baseball is Moneyball on steroids. It is an entertaining and enlightening underdog story that uses the 2013 Pirates season as the perfect lens to examine the sport's burgeoning big-data movement. With the help of data-tracking systems like PitchF/X and TrackMan, the Pirates collected millions of data points on every pitch and ball in play to create a tome of color-coded reports that revealed groundbreaking insights for how to win more games without spending a dime. In the process, they discovered that most batters struggled to hit two-seam fastballs, that an aggressive defensive shift on the field could turn more batted balls into outs, and that a catcher's most valuable skill was hidden. All these data points which aren't immediately visible to players and spectators, are the bit of magic that led the Pirates to spin straw in to gold, finish the 2013 season in second place, end a twenty-year losing streak.
The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball
Benjamin Baumer - 2013
Rocketed to popularity by the 2003 bestseller Moneyball and the film of the same name, the use of sabermetrics to analyze player performance has appeared to be a David to the Goliath of systemically advantaged richer teams that could be toppled only by creative statistical analysis. The story has been so compelling that, over the past decade, team after team has integrated statistical analysis into its front office. But how accurately can crunching numbers quantify a player's ability? Do sabermetrics truly level the playing field for financially disadvantaged teams? How much of the baseball analytic trend is fad and how much fact?The Sabermetric Revolution sets the record straight on the role of analytics in baseball. Former Mets sabermetrician Benjamin Baumer and leading sports economist Andrew Zimbalist correct common misinterpretations and develop new methods to assess the effectiveness of sabermetrics on team performance. Tracing the growth of front office dependence on sabermetrics and the breadth of its use today, they explore how Major League Baseball and the field of sports analytics have changed since the 2002 season. Their conclusion is optimistic, but the authors also caution that sabermetric insights will be more difficult to come by in the future. The Sabermetric Revolution offers more than a fascinating case study of the use of statistics by general managers and front office executives: for fans and fantasy leagues, this book will provide an accessible primer on the real math behind moneyball as well as new insight into the changing business of baseball.
Introduction to Probability
Joseph K. Blitzstein - 2014
The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo MCMC. Additional application areas explored include genetics, medicine, computer science, and information theory. The print book version includes a code that provides free access to an eBook version. The authors present the material in an accessible style and motivate concepts using real-world examples. Throughout, they use stories to uncover connections between the fundamental distributions in statistics and conditioning to reduce complicated problems to manageable pieces. The book includes many intuitive explanations, diagrams, and practice problems. Each chapter ends with a section showing how to perform relevant simulations and calculations in R, a free statistical software environment.
The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy
Sharon Bertsch McGrayne - 2011
To its adherents, it is an elegant statement about learning from experience. To its opponents, it is subjectivity run amok.In the first-ever account of Bayes' rule for general readers, Sharon Bertsch McGrayne explores this controversial theorem and the human obsessions surrounding it. She traces its discovery by an amateur mathematician in the 1740s through its development into roughly its modern form by French scientist Pierre Simon Laplace. She reveals why respected statisticians rendered it professionally taboo for 150 years—at the same time that practitioners relied on it to solve crises involving great uncertainty and scanty information (Alan Turing's role in breaking Germany's Enigma code during World War II), and explains how the advent of off-the-shelf computer technology in the 1980s proved to be a game-changer. Today, Bayes' rule is used everywhere from DNA de-coding to Homeland Security.Drawing on primary source material and interviews with statisticians and other scientists, The Theory That Would Not Die is the riveting account of how a seemingly simple theorem ignited one of the greatest controversies of all time.
Head First Data Analysis: A Learner's Guide to Big Numbers, Statistics, and Good Decisions
Michael G. Milton - 2009
If your job requires you to manage and analyze all kinds of data, turn to Head First Data Analysis, where you'll quickly learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others. Whether you're a product developer researching the market viability of a new product or service, a marketing manager gauging or predicting the effectiveness of a campaign, a salesperson who needs data to support product presentations, or a lone entrepreneur responsible for all of these data-intensive functions and more, the unique approach in Head First Data Analysis is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool. You'll learn how to:Determine which data sources to use for collecting information Assess data quality and distinguish signal from noise Build basic data models to illuminate patterns, and assimilate new information into the models Cope with ambiguous information Design experiments to test hypotheses and draw conclusions Use segmentation to organize your data within discrete market groups Visualize data distributions to reveal new relationships and persuade others Predict the future with sampling and probability models Clean your data to make it useful Communicate the results of your analysis to your audience Using the latest research in cognitive science and learning theory to craft a multi-sensory learning experience, Head First Data Analysis uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.
Applied Predictive Modeling
Max Kuhn - 2013
Non- mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. Dr. Kuhn is a Director of Non-Clinical Statistics at Pfizer Global R&D in Groton Connecticut. He has been applying predictive models in the pharmaceutical and diagnostic industries for over 15 years and is the author of a number of R packages. Dr. Johnson has more than a decade of statistical consulting and predictive modeling experience in pharmaceutical research and development. He is a co-founder of Arbor Analytics, a firm specializing in predictive modeling and is a former Director of Statistics at Pfizer Global R&D. His scholarly work centers on the application and development of statistical methodology and learning algorithms. Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. Addressing practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance-all of which are problems that occur frequently in practice. The text illustrates all parts of the modeling process through many hands-on, real-life examples. And every chapter contains extensive R code f
Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
Matthew A. Russell - 2011
You’ll learn how to combine social web data, analysis techniques, and visualization to find what you’ve been looking for in the social haystack—as well as useful information you didn’t know existed.Each standalone chapter introduces techniques for mining data in different areas of the social Web, including blogs and email. All you need to get started is a programming background and a willingness to learn basic Python tools.Get a straightforward synopsis of the social web landscapeUse adaptable scripts on GitHub to harvest data from social network APIs such as Twitter, Facebook, LinkedIn, and Google+Learn how to employ easy-to-use Python tools to slice and dice the data you collectExplore social connections in microformats with the XHTML Friends NetworkApply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detectionBuild interactive visualizations with web technologies based upon HTML5 and JavaScript toolkits"A rich, compact, useful, practical introduction to a galaxy of tools, techniques, and theories for exploring structured and unstructured data." --Alex Martelli, Senior Staff Engineer, Google
The Hundred-Page Machine Learning Book
Andriy Burkov - 2019
During that week, you will learn almost everything modern machine learning has to offer. The author and other practitioners have spent years learning these concepts.Companion wiki — the book has a continuously updated wiki that extends some book chapters with additional information: Q&A, code snippets, further reading, tools, and other relevant resources.Flexible price and formats — choose from a variety of formats and price options: Kindle, hardcover, paperback, EPUB, PDF. If you buy an EPUB or a PDF, you decide the price you pay!Read first, buy later — download book chapters for free, read them and share with your friends and colleagues. Only if you liked the book or found it useful in your work, study or business, then buy it.
Machine Learning: A Probabilistic Perspective
Kevin P. Murphy - 2012
Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package—PMTK (probabilistic modeling toolkit)—that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Too Big to Ignore: The Business Case for Big Data
Phil Simon - 2013
Progressive Insurance tracks real-time customer driving patterns and uses that information to offer rates truly commensurate with individual safety. Google accurately predicts local flu outbreaks based upon thousands of user search queries. Amazon provides remarkably insightful, relevant, and timely product recommendations to its hundreds of millions of customers. Quantcast lets companies target precise audiences and key demographics throughout the Web. NASA runs contests via gamification site TopCoder, awarding prizes to those with the most innovative and cost-effective solutions to its problems. Explorys offers penetrating and previously unknown insights into healthcare behavior.How do these organizations and municipalities do it? Technology is certainly a big part, but in each case the answer lies deeper than that. Individuals at these organizations have realized that they don't have to be Nate Silver to reap massive benefits from today's new and emerging types of data. And each of these organizations has embraced Big Data, allowing them to make astute and otherwise impossible observations, actions, and predictions.It's time to start thinking big.In Too Big to Ignore, recognized technology expert and award-winning author Phil Simon explores an unassailably important trend: Big Data, the massive amounts, new types, and multifaceted sources of information streaming at us faster than ever. Never before have we seen data with the volume, velocity, and variety of today. Big Data is no temporary blip of fad. In fact, it is only going to intensify in the coming years, and its ramifications for the future of business are impossible to overstate.Too Big to Ignore explains why Big Data is a big deal. Simon provides commonsense, jargon-free advice for people and organizations looking to understand and leverage Big Data. Rife with case studies, examples, analysis, and quotes from real-world Big Data practitioners, the book is required reading for chief executives, company owners, industry leaders, and business professionals.
Mining of Massive Datasets
Anand Rajaraman - 2011
This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
Linear Algebra Done Right
Sheldon Axler - 1995
The novel approach taken here banishes determinants to the end of the book and focuses on the central goal of linear algebra: understanding the structure of linear operators on vector spaces. The author has taken unusual care to motivate concepts and to simplify proofs. For example, the book presents - without having defined determinants - a clean proof that every linear operator on a finite-dimensional complex vector space (or an odd-dimensional real vector space) has an eigenvalue. A variety of interesting exercises in each chapter helps students understand and manipulate the objects of linear algebra. This second edition includes a new section on orthogonal projections and minimization problems. The sections on self-adjoint operators, normal operators, and the spectral theorem have been rewritten. New examples and new exercises have been added, several proofs have been simplified, and hundreds of minor improvements have been made throughout the text.
Social and Economic Networks
Matthew O. Jackson - 2008
The many aspects of our lives that are governed by social networks make it critical to understand how they impact behavior, which network structures are likely to emerge in a society, and why we organize ourselves as we do. In Social and Economic Networks, Matthew Jackson offers a comprehensive introduction to social and economic networks, drawing on the latest findings in economics, sociology, computer science, physics, and mathematics. He provides empirical background on networks and the regularities that they exhibit, and discusses random graph-based models and strategic models of network formation. He helps readers to understand behavior in networked societies, with a detailed analysis of learning and diffusion in networks, decision making by individuals who are influenced by their social neighbors, game theory and markets on networks, and a host of related subjects. Jackson also describes the varied statistical and modeling techniques used to analyze social networks. Each chapter includes exercises to aid students in their analysis of how networks function.This book is an indispensable resource for students and researchers in economics, mathematics, physics, sociology, and business.
Learning From Data: A Short Course
Yaser S. Abu-Mostafa - 2012
Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title `learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover. ---- Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems. ---- Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own. ---- The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.