The Signal and the Noise: Why So Many Predictions Fail—But Some Don't


Nate Silver - 2012
    He solidified his standing as the nation's foremost political forecaster with his near-perfect prediction of the 2012 election. Silver is the founder and editor in chief of FiveThirtyEight.com. Drawing on his own groundbreaking work, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. Most predictions fail, often at great cost to society, because most of us have a poor understanding of probability and uncertainty. Both experts and laypeople mistake more confident predictions for more accurate ones. But overconfidence is often the reason for failure. If our appreciation of uncertainty improves, our predictions can get better too. This is the "prediction paradox": The more humility we have about our ability to make predictions, the more successful we can be in planning for the future.
    In keeping with his own aim to seek truth from data, Silver visits the most successful forecasters in a range of areas, from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA. He explains and evaluates how these forecasters think and what bonds they share. What lies behind their success? Are they good, or just lucky? What patterns have they unraveled? And are their forecasts really right? He explores unanticipated commonalities and exposes unexpected juxtapositions. And sometimes, it is not so much how good a prediction is in an absolute sense that matters but how good it is relative to the competition. In other cases, prediction is still a very rudimentary, and dangerous, science.
    Silver observes that the most accurate forecasters tend to have a superior command of probability, and they tend to be both humble and hardworking. They distinguish the predictable from the unpredictable, and they notice a thousand little details that lead them closer to the truth. Because of their appreciation of probability, they can distinguish the signal from the noise.
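
    Silver's central tool for sorting signal from noise is Bayes' theorem. As a minimal sketch, with hypothetical numbers that are not from the book, the following shows a forecaster revising the probability of an election outcome after one noisy signal, a favorable poll:

```python
# Illustrative Bayesian update in the spirit of probabilistic forecasting.
# All numbers are hypothetical, not from the book.

prior_win = 0.50          # prior probability the candidate wins
p_poll_given_win = 0.80   # chance of a favorable poll if they will win
p_poll_given_loss = 0.30  # chance of a favorable poll if they will lose

# Bayes' rule: P(win | poll) = P(poll | win) * P(win) / P(poll)
p_poll = p_poll_given_win * prior_win + p_poll_given_loss * (1 - prior_win)
posterior_win = p_poll_given_win * prior_win / p_poll
print(f"P(win | favorable poll) = {posterior_win:.3f}")   # ~0.727
```

    The update is modest because a favorable poll is fairly likely even for a losing candidate; the gap between those two likelihoods is exactly what separates signal from noise.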

Deep Learning


Ian Goodfellow - 2016
    Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.
    The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.
    Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
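
    The "hierarchy of concepts" the blurb describes is, computationally, a stack of simple layers, each transforming the previous layer's output. A minimal sketch with random weights and invented sizes; the book's subject is how to learn such weights from data:

```python
import numpy as np

# Two-layer feedforward network: each layer builds slightly more abstract
# features from the previous layer's outputs. Weights are random here,
# purely to show the layered structure.

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                      # batch of 4 inputs, 3 features

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)    # layer 1: simple features
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)    # layer 2: composed concepts

h = np.maximum(0, x @ W1 + b1)                   # ReLU hidden layer
y = h @ W2 + b2                                  # output scores
print(y.shape)                                   # (4, 2)
```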

Thinking in Systems: A Primer


Donella H. Meadows - 2008
    Edited by the Sustainability Institute’s Diana Wright, this essential primer brings systems thinking out of the realm of computers and equations and into the tangible world, showing readers how to develop the systems-thinking skills that thought leaders across the globe consider critical for 21st-century life.
    Some of the biggest problems facing the world—war, hunger, poverty, and environmental degradation—are essentially system failures. They cannot be solved by fixing one piece in isolation from the others, because even seemingly minor details have enormous power to undermine the best efforts of too-narrow thinking.
    While readers will learn the conceptual tools and methods of systems thinking, the heart of the book is grander than methodology. Donella Meadows was known as much for nurturing positive outcomes as she was for delving into the science behind global dilemmas. She reminds readers to pay attention to what is important, not just what is quantifiable, to stay humble, and to stay a learner.
    In a world growing ever more complicated, crowded, and interdependent, Thinking in Systems helps readers avoid confusion and helplessness, the first step toward finding proactive and effective solutions.
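
    Meadows' basic unit of analysis is the stock-and-flow structure with feedback. As a toy sketch with hypothetical parameters, a single stock with a constant inflow and a proportional outflow settles at the level where the two flows balance:

```python
# A minimal stock-and-flow simulation: a stock changes only through its
# inflow and outflow, and the proportional outflow acts as a balancing
# feedback loop. Parameters are hypothetical, for illustration only.

stock = 100.0          # e.g., water in a reservoir
inflow = 5.0           # constant inflow per time step
drain_fraction = 0.08  # outflow proportional to the stock

for t in range(10):
    outflow = drain_fraction * stock
    stock += inflow - outflow
    print(f"t={t + 1:2d}  stock={stock:6.1f}")

# The stock settles where inflow equals outflow: 5 / 0.08 = 62.5
```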

Mostly Harmless Econometrics: An Empiricist's Companion


Joshua D. Angrist - 2008
    In the modern experimentalist paradigm, these techniques address clear causal questions such as: Do smaller classes increase learning? Should wife batterers be arrested? How much does education raise wages? Mostly Harmless Econometrics shows how the basic tools of applied econometrics allow the data to speak.
    In addition to econometric essentials, Mostly Harmless Econometrics covers important new extensions--regression-discontinuity designs and quantile regression--as well as how to get standard errors right. Joshua Angrist and Jörn-Steffen Pischke explain why fancier econometric techniques are typically unnecessary and even dangerous. The applied econometric methods emphasized in this book are easy to use and relevant for many areas of contemporary social science.
    - An irreverent review of econometric essentials
    - A focus on tools that applied researchers use most
    - Chapters on regression-discontinuity designs, quantile regression, and standard errors
    - Many empirical examples
    - A clear and concise resource with wide applications
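
    As one concrete instance of the extensions named above, here is a sharp regression-discontinuity estimate on simulated data. The data, bandwidth, and estimator are made up for illustration; the book treats the design and its pitfalls properly:

```python
import numpy as np

# Sharp regression discontinuity: treatment switches on when a running
# variable crosses a cutoff, and the causal effect is the jump in the
# outcome at the cutoff, estimated by local linear fits on each side.

rng = np.random.default_rng(1)
n, cutoff, true_effect = 2000, 0.0, 2.0
running = rng.uniform(-1, 1, n)                  # e.g., a test score
treated = (running >= cutoff).astype(float)
outcome = 1.0 + 0.8 * running + true_effect * treated + rng.normal(0, 0.5, n)

def intercept_at_cutoff(x, y):
    """Fit y = a + b*x by least squares; the intercept estimates y at x=0."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

bw = 0.5                                         # bandwidth around the cutoff
left = (running < cutoff) & (running > cutoff - bw)
right = (running >= cutoff) & (running < cutoff + bw)
effect = (intercept_at_cutoff(running[right], outcome[right])
          - intercept_at_cutoff(running[left], outcome[left]))
print(f"estimated jump at cutoff: {effect:.2f} (true effect: {true_effect})")
```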

Discovering Statistics Using SPSS (Introducing Statistical Methods)


Andy Field - 2000
    What's new in the Second Edition?
    1. Fully compliant with the latest version of SPSS (version 12).
    2. More coverage of advanced statistics, including completely new coverage of non-parametric statistics. The book is 50 per cent longer than the First Edition.
    3. Each section of each chapter now carries a notation (1, 2, or 3) indicating the intended level of study. This helps students navigate their way through the book and makes it user-friendly for students of ALL levels.
    4. A 'how to use this book' section at the start of the text.
    5. Characters in each chapter have defined roles, such as summarizing key points or posing questions.
    6. Each chapter now has several examples for students to work through, with answers provided on the enclosed CD-ROM.

Probability Theory: The Logic of Science


E.T. Jaynes - 1999
    It discusses new results, along with applications of probability theory to a variety of problems. The book contains many exercises and is suitable for use as a textbook on graduate-level courses involving data analysis. Aimed at readers already familiar with applied mathematics at an advanced undergraduate level or higher, it is of interest to scientists concerned with inference from incomplete information.
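
    One device Jaynes uses to make "probability as logic" concrete is measuring evidence in decibels, so that independent pieces of data simply add their weight to the prior. A small worked sketch with hypothetical numbers:

```python
import math

# Bayes' theorem in log-odds form: evidence is measured in decibels, and
# each independent observation adds its decibel weight. Numbers are
# hypothetical, chosen only to show the arithmetic.

def db(x):
    """Decibel measure of an odds or likelihood ratio."""
    return 10 * math.log10(x)

prior_odds = 0.01 / 0.99                 # prior odds for hypothesis H
likelihood_ratio = 0.95 / 0.05           # P(E|H) / P(E|not H) per observation

evidence_db = db(prior_odds) + 3 * db(likelihood_ratio)   # three observations
posterior_odds = 10 ** (evidence_db / 10)
posterior = posterior_odds / (1 + posterior_odds)
print(f"posterior after 3 observations: {posterior:.3f}")  # ~0.986
```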

The Little SAS Book: A Primer


Lora D. Delwiche - 1995
    This friendly, easy-to-read guide gently introduces you to the most commonly used features of SAS software plus a whole lot more! Authors Lora Delwiche and Susan Slaughter have revised the text to include concepts of the Output Delivery System; the STYLE= option in the PRINT, REPORT, and TABULATE procedures; ODS HTML, RTF, PRINTER, and OUTPUT destinations; PROC REPORT; more on PROC TABULATE; exporting data; and the colon modifier for informats. You'll find clear and concise explanations of basic SAS concepts (such as DATA and PROC steps), inputting data, modifying and combining data sets, summarizing and presenting data, basic statistical procedures, and debugging SAS programs. Each topic is presented in a self-contained, two-page layout complete with examples and graphics. This format enables new users to get up and running quickly, while the examples allow you to type in the program and see it work!

Machine Learning: A Probabilistic Perspective


Kevin P. Murphy - 2012
    Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.
    The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package—PMTK (probabilistic modeling toolkit)—that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
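
    The probabilistic approach in miniature: choose a likelihood and a prior, then estimate with the posterior. A standard Beta-Bernoulli example (textbook material, not code from the book or from PMTK):

```python
# Beta-Bernoulli coin flips: compare the maximum-likelihood estimate with
# the posterior mean under a Beta(2, 2) prior. All numbers are invented.

heads, tails = 7, 3          # observed data
a, b = 2.0, 2.0              # Beta(2, 2) prior: mild belief the coin is fair

mle = heads / (heads + tails)                            # 0.70
posterior_mean = (heads + a) / (heads + tails + a + b)   # 9/14, about 0.64
print(f"MLE = {mle:.2f}, posterior mean = {posterior_mean:.2f}")

# The prior shrinks the estimate toward 0.5, guarding against overfitting
# on small samples -- the recurring theme of the probabilistic perspective.
```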

Hadoop: The Definitive Guide


Tom White - 2009
    Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
    - Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
    - Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
    - Discover common pitfalls and advanced features for writing real-world MapReduce programs
    - Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
    - Use Pig, a high-level query language for large-scale data processing
    - Take advantage of HBase, Hadoop's database for structured and semi-structured data
    - Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
    If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!
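
    The MapReduce model at the heart of Hadoop can be sketched locally in a few lines. A toy word count; on a real cluster, Hadoop runs the same map and reduce phases in parallel over HDFS blocks:

```python
# MapReduce in miniature: word count as map, shuffle, and reduce steps,
# simulated in-process. The three tiny documents stand in for input splits.
from collections import defaultdict

docs = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: emit a (word, 1) pair for every word in every input split.
pairs = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group values by key (Hadoop does this between the two phases).
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)

# Reduce: combine each key's values into a final result.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)   # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```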

Game Theory


Drew Fudenberg - 1991
    The analytic material is accompanied by many applications, examples, and exercises. The theory of noncooperative games studies the behavior of agents in any situation where each agent's optimal choice may depend on a forecast of the opponents' choices. "Noncooperative" refers to choices that are based on the participant's perceived self-interest. Although game theory has been applied to many fields, Fudenberg and Tirole focus on the kinds of game theory that have been most useful in the study of economic problems. They also include some applications to political science. The fourteen chapters are grouped in parts that cover static games of complete information, dynamic games of complete information, static games of incomplete information, dynamic games of incomplete information, and advanced topics. --mitpress.mit.edu
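
    The central solution concept for such games, Nash equilibrium, can be checked by brute force in a small example. A sketch using the prisoner's dilemma with a standard payoff matrix (not taken from the book's text):

```python
import itertools

# A strategy pair is a Nash equilibrium when neither player gains by
# deviating unilaterally. payoffs[row][col] = (row_payoff, col_payoff).

C, D = 0, 1   # cooperate, defect
payoffs = [[(3, 3), (0, 5)],
           [(5, 0), (1, 1)]]

def is_nash(r, c):
    row_ok = all(payoffs[r][c][0] >= payoffs[r2][c][0] for r2 in (C, D))
    col_ok = all(payoffs[r][c][1] >= payoffs[r][c2][1] for c2 in (C, D))
    return row_ok and col_ok

for r, c in itertools.product((C, D), repeat=2):
    if is_nash(r, c):
        print(f"Nash equilibrium: row={'CD'[r]}, col={'CD'[c]}")  # D, D
```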

The Perfect Swarm: The Science of Complexity in Everyday Life


Len Fisher - 2009
    This process of “self-organization” reveals itself in the inanimate worlds of crystals and seashells, but as Len Fisher shows, it is also evident in living organisms, from fish to ants to human beings. The coordinated movements of fish in shoals, for example, arise from the simple rule: “Follow the fish in front.” Traffic flow arises from simple rules: “Keep your distance” and “Keep to the right.”
    Now, in his new book, Fisher shows how we can manage our complex social lives in an ever more chaotic world. His investigation encompasses topics ranging from “swarm intelligence” to the science of parties and the best ways to start a fad. Finally, Fisher sheds light on the beauty and utility of complexity theory. An entertaining journey into the science of everyday life, The Perfect Swarm will delight anyone who wants to understand the complex situations in which we so often find ourselves.
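
    Local rules like "keep your distance" really do generate coordinated global flow. A toy one-lane sketch, with all parameters invented for the demo:

```python
# One-dimensional car following: each driver sets speed only from the gap
# to the car ahead ("keep your distance"); no one sees the whole road, yet
# the line of cars falls into a coordinated flow behind the leader.

positions = [0.0, 10.0, 20.0, 30.0]   # four cars; the last one leads
leader_speed = 5.0
k, desired_gap, dt = 0.5, 8.0, 1.0

for step in range(20):
    speeds = []
    for i in range(len(positions) - 1):
        gap = positions[i + 1] - positions[i]
        speeds.append(max(0.0, k * (gap - desired_gap)))   # local rule only
    speeds.append(leader_speed)
    positions = [p + v * dt for p, v in zip(positions, speeds)]

print([round(p, 1) for p in positions])   # steady gaps behind the leader
```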

Cybernetics: Or Control and Communication in the Animal and the Machine


Norbert Wiener - 1948
    "It is a 'must' book for those in every branch of science . . . in addition, economists, politicians, statesmen, and businessmen cannot afford to overlook cybernetics and its tremendous, even terrifying implications. It is a beautifully written book, lucid, direct, and despite its complexity, as readable by the layman as the trained scientist." -- John B. Thurston, The Saturday Review of Literature
    Acclaimed as one of the "seminal books . . . comparable in ultimate importance to . . . Galileo or Malthus or Rousseau or Mill," Cybernetics was judged by twenty-seven historians, economists, educators, and philosophers to be one of those books published during the "past four decades" which may have a substantial impact on public thought and action in the years ahead. -- Saturday Review

Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor


Virginia Eubanks - 2018
    In Pittsburgh, a child welfare agency uses a statistical model to try to predict which children might be future victims of abuse or neglect.
    Since the dawn of the digital age, decision-making in finance, employment, politics, health and human services has undergone revolutionary change. Today, automated systems—rather than humans—control which neighborhoods get policed, which families attain needed resources, and who is investigated for fraud. While we all live under this new regime of data, the most invasive and punitive systems are aimed at the poor.
    In Automating Inequality, Virginia Eubanks systematically investigates the impacts of data mining, policy algorithms, and predictive risk models on poor and working-class people in America. The book is full of heart-wrenching and eye-opening stories, from a woman in Indiana whose benefits are literally cut off as she lies dying to a family in Pennsylvania in daily fear of losing their daughter because they fit a certain statistical profile.
    The U.S. has always used its most cutting-edge science and technology to contain, investigate, discipline and punish the destitute. Like the county poorhouse and scientific charity before them, digital tracking and automated decision-making hide poverty from the middle-class public and give the nation the ethical distance it needs to make inhumane choices: which families get food and which starve, who has housing and who remains homeless, and which families are broken up by the state. In the process, they weaken democracy and betray our most cherished national values.
    This deeply researched and passionate book could not be more timely.
    Naomi Klein: "This book is downright scary."
    Ethan Zuckerman, MIT: "Should be required reading."
    Dorothy Roberts, author of Killing the Black Body: "A must-read for everyone concerned about modern tools of inequality in America."
    Astra Taylor, author of The People's Platform: "This is the single most important book about technology you will read this year."

Against the Gods: The Remarkable Story of Risk


Peter L. Bernstein - 1996
    Peter Bernstein has written a comprehensive history of man's efforts to understand risk and probability, beginning with early gamblers in ancient Greece, continuing through the 17th-century French mathematicians Pascal and Fermat and up to modern chaos theory. Along the way he demonstrates that understanding risk underlies everything from game theory to bridge-building to winemaking.
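
    The pivotal Pascal-Fermat exchange the book recounts, the "problem of points," has a computation short enough to show: divide an interrupted game's stakes according to each player's probability of winning had play continued.

```python
from math import comb

# The problem of points: two players, fair 50/50 rounds, first to win N
# rounds takes the stake, and the game is interrupted early. Each player's
# fair share is their probability of winning the remaining rounds.

def share(wins_needed_a, wins_needed_b):
    """Probability player A wins if play were to continue."""
    n = wins_needed_a + wins_needed_b - 1       # at most n rounds remain
    # A wins by taking at least wins_needed_a of those n rounds.
    favorable = sum(comb(n, k) for k in range(wins_needed_a, n + 1))
    return favorable / 2 ** n

# A needs 1 more win, B needs 3: A should get 7/8 of the stake.
print(share(1, 3))   # 0.875
```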

Learning From Data: A Short Course


Yaser S. Abu-Mostafa - 2012
    Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title 'Learning from Data' because it faithfully describes what the subject is about, and we made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover.
    Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems.
    Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable them to venture out and explore further techniques and theories, or perhaps to contribute their own.
    The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.
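
    The book's central question, whether a model learned from data generalizes, shows up even in a toy experiment: in-sample error E_in keeps falling as the model grows more flexible, while out-of-sample error E_out eventually rises. A sketch on simulated data, with the target function and noise chosen arbitrarily for the demo:

```python
import numpy as np

# Overfitting in one screen: fit polynomials of increasing degree to a few
# noisy samples of sin(pi * x), then compare training error (E_in) with
# error on fresh data (E_out).

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(np.pi * x) + rng.normal(0, 0.2, n)

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    e_in = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    e_out = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: E_in = {e_in:.3f}, E_out = {e_out:.3f}")

# E_in falls monotonically with complexity; E_out eventually rises.
```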