Machine Learning: The Art and Science of Algorithms That Make Sense of Data


Peter Flach - 2012
    Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. Flach provides case studies of increasing complexity and variety with well-chosen examples and illustrations throughout. He covers a wide range of logical, geometric and statistical models and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features. The use of established terminology is balanced with the introduction of new and useful concepts, and summaries of relevant background material are provided with pointers for revision if necessary. These features ensure Machine Learning will set a new standard as an introductory textbook.
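
    A minimal sketch of the spam-filter idea the book opens with, written as a naive Bayes classifier over word counts; the toy messages and the equal 0.5 class priors below are invented for illustration, not taken from the book.

        from collections import Counter
        import math

        # Toy corpus, invented for illustration (not from the book).
        spam = ["win cash now", "cheap pills cheap", "win win prize"]
        ham = ["meeting at noon", "project update attached", "lunch at noon"]

        def train(docs):
            words = Counter(w for d in docs for w in d.split())
            return words, sum(words.values())

        spam_counts, spam_total = train(spam)
        ham_counts, ham_total = train(ham)
        vocab = set(spam_counts) | set(ham_counts)

        def log_posterior(message, counts, total, prior):
            # Laplace smoothing so unseen words do not zero out the product.
            score = math.log(prior)
            for w in message.split():
                score += math.log((counts[w] + 1) / (total + len(vocab)))
            return score

        def classify(message):
            s = log_posterior(message, spam_counts, spam_total, 0.5)
            h = log_posterior(message, ham_counts, ham_total, 0.5)
            return "spam" if s > h else "ham"

        print(classify("win cheap prize"))        # -> spam
        print(classify("meeting about project"))  # -> ham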

Mathematical Statistics and Data Analysis


John A. Rice - 1988
    The book interweaves traditional topics with data analysis, reflecting the role of the computer and maintaining close ties to the practice of statistics. The author stresses analysis of data, examines real problems with real data, and uses them to motivate the theory. The book's descriptive statistics, graphical displays, and realistic applications stand in strong contrast to traditional texts set in abstract settings.
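
    A taste of the descriptive-statistics emphasis, in a minimal sketch; the sample values are made up for illustration.

        import statistics

        # Invented sample measurements, for illustration only.
        data = [4.1, 3.8, 5.2, 4.7, 4.0, 6.3, 4.4, 3.9, 5.0, 4.6]

        print("n      =", len(data))
        print("mean   =", round(statistics.mean(data), 2))
        print("median =", round(statistics.median(data), 2))
        print("stdev  =", round(statistics.stdev(data), 2))  # sample standard deviation
        quartiles = statistics.quantiles(data, n=4)  # requires Python 3.8+
        print("IQR    =", round(quartiles[2] - quartiles[0], 2))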

Macroanalysis: Digital Methods and Literary History


Matthew L. Jockers - 2013
    Jockers introduces readers to large-scale literary computing and the revolutionary potential of macroanalysis--a new approach to the study of the literary record designed for probing the digital-textual world as it exists today, in digital form and in large quantities. Using computational analysis to retrieve key words, phrases, and linguistic patterns across thousands of texts in digital libraries, researchers can draw conclusions based on quantifiable evidence regarding how literary trends develop over time, across periods, within regions, or within demographic groups, as well as how cultural, historical, and societal linkages may bind individual authors, texts, and genres into an aggregate literary culture. Moving beyond the limitations of literary interpretation based on the "close reading" of individual works, Jockers describes how this new method of studying large collections of digital material can help us to better understand and contextualize the individual works within those collections.
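
    The core move of macroanalysis -- counting linguistic features across many texts and reading the trend -- fits in a few lines. A minimal sketch with an invented three-text "corpus" standing in for thousands of digitized works:

        # Tiny stand-in corpus: in macroanalysis this would be thousands of
        # digitized texts; these snippets are invented for illustration.
        corpus = {
            1805: "the sea the storm and the ship",
            1850: "the factory the city and the crowd",
            1900: "the machine the city and the wire",
        }

        def relative_freq(text, word):
            tokens = text.split()
            return tokens.count(word) / len(tokens)

        # Track how often a theme word appears over time -- the kind of
        # quantifiable trend macroanalysis aggregates at scale.
        for year, text in sorted(corpus.items()):
            print(year, round(relative_freq(text, "city"), 3))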

Statistics: A Very Short Introduction


David J. Hand - 2008
    From randomized clinical trials in medical research, to statistical models of risk in the banking and hedge fund industries, to the statistical tools used to probe vast astronomical databases, the field of statistics has become centrally important to how we understand our world. But the discipline underlying all these is not the dull statistics of the popular imagination. Long gone are the days of manual arithmetic manipulation. Nowadays statistics is a dynamic discipline, revolutionized by the computer, which uses advanced software tools to probe numerical data, seeking structures, patterns, and relationships. This Very Short Introduction sets the study of statistics in context, describing its history and giving examples of its impact, summarizes methods of gathering and evaluating data, and explains the role played by the science of chance, of probability, in statistical methods. The book also explores deep philosophical issues of induction--how we use statistics to discern the true nature of reality from the limited observations we necessarily must make.

How Charts Lie: Getting Smarter about Visual Information


Alberto Cairo - 2019
    Data visualizations can better inform us, but they can also deceive by displaying incomplete or inaccurate data, suggesting misleading patterns, or simply misinforming us through poor design, such as the confusing “eye of the storm” maps shown on TV every hurricane season. Many of us are ill equipped to interpret the visuals that politicians, journalists, advertisers, and even employers present each day, enabling bad actors to easily manipulate visuals to promote their own agendas. Public conversations are increasingly driven by numbers, and to make sense of them we must be able to decode and use visual information. By examining contemporary examples ranging from election-result infographics to global GDP maps and box-office record charts, How Charts Lie teaches us how to do just that.
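
    The simplest chart lie is a truncated axis. A minimal numeric sketch, with invented values, of how a non-zero baseline inflates a small difference:

        # Two invented values differing by about 2 percent.
        a, b = 49.0, 50.0

        def bar_height_ratio(low, high, baseline):
            # Ratio of drawn bar heights when the axis starts at `baseline`
            # instead of zero.
            return (high - baseline) / (low - baseline)

        print(bar_height_ratio(a, b, 0))   # ~1.02 -- honest zero baseline
        print(bar_height_ratio(a, b, 48))  # 2.0   -- truncated axis doubles the bar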

Fearless Symmetry: Exposing the Hidden Patterns of Numbers


Avner Ash - 2006
    Mathematicians solve equations, or try to. But sometimes the solutions are not as interesting as the beautiful symmetric patterns that lead to them. Written in a friendly style for a general audience, Fearless Symmetry is the first popular math book to discuss these elegant and mysterious patterns and the ingenious techniques mathematicians use to uncover them. Hidden symmetries were first discovered nearly two hundred years ago by French mathematician Évariste Galois. They have been used extensively in the oldest and largest branch of mathematics--number theory--for such diverse applications as acoustics, radar, and codes and ciphers. They have also been employed in the study of Fibonacci numbers and to attack well-known problems such as Fermat's Last Theorem, Pythagorean triples, and the ever-elusive Riemann Hypothesis. Mathematicians are still devising techniques for teasing out these mysterious patterns, and their uses are limited only by the imagination. The first popular book to address representation theory and reciprocity laws, Fearless Symmetry focuses on how mathematicians solve equations and prove theorems. It discusses rules of math and why they are just as important as those in any game one might play. The book starts with basic properties of integers and permutations and reaches current research in number theory. Along the way, it takes delightful historical and philosophical digressions. Required reading for all math buffs, the book will appeal to anyone curious about popular mathematics and its myriad contributions to everyday life.

The Cartoon Guide to Statistics


Larry Gonick - 1993
    Never again will you order the Poisson Distribution in a French restaurant! This updated version features all new material.
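
    For readers who want the punchline spelled out: the Poisson distribution gives the probability of seeing exactly k events when they arrive independently at a known average rate. A minimal sketch; the rate of 3 per hour is an arbitrary choice for illustration:

        import math

        def poisson_pmf(k, lam):
            # P(X = k) = lam^k * e^(-lam) / k!
            return lam ** k * math.exp(-lam) / math.factorial(k)

        # e.g. if a restaurant seats an average of 3 parties per hour,
        # the chance of exactly 5 parties in an hour:
        print(round(poisson_pmf(5, 3.0), 4))  # ~0.1008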

Probability Theory: The Logic of Science


E.T. Jaynes - 2003
    Going beyond the conventional mathematics of probability theory, this study views the subject in a wider context, discussing new results along with applications of probability theory to a variety of problems. The book contains many exercises and is suitable for use as a textbook on graduate-level courses involving data analysis. Aimed at readers already familiar with applied mathematics at an advanced undergraduate level or higher, it is of interest to scientists concerned with inference from incomplete information.
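
    The book's view of probability as extended logic rests on Bayes' rule. A minimal sketch of a single update from incomplete information; the sensitivity, false-positive rate, and prior below are invented numbers:

        def bayes_update(prior, likelihood, false_alarm):
            # P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|~H) P(~H)]
            numerator = likelihood * prior
            evidence = numerator + false_alarm * (1 - prior)
            return numerator / evidence

        # Invented example: a test that is 95% sensitive with a 5%
        # false-positive rate, applied to a hypothesis with a 1% prior.
        print(round(bayes_update(0.01, 0.95, 0.05), 4))  # ~0.161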

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management


Michael J.A. Berry - 1997
    Packed with more than forty percent new and updated material, this edition shows business managers, marketing analysts, and data mining specialists how to harness fundamental data mining methods and techniques to solve common types of business problems. Each chapter covers a new data mining technique, and then shows readers how to apply the technique for improved marketing, sales, and customer support. The authors build on their reputation for concise, clear, and practical explanations of complex concepts, making this book the perfect introduction to data mining. More advanced chapters cover such topics as how to prepare data for analysis and how to create the necessary infrastructure for data mining. The book covers core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, clustering, and survival analysis.
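
    As a taste of one technique on that list, a minimal association-rule sketch over invented market-basket transactions, computing the support and confidence of the rule {bread} -> {butter}:

        # Invented market-basket transactions, for illustration only.
        transactions = [
            {"bread", "butter", "milk"},
            {"bread", "butter"},
            {"bread", "jam"},
            {"milk", "eggs"},
            {"bread", "butter", "eggs"},
        ]

        def support(itemset):
            # Fraction of transactions containing every item in the set.
            return sum(itemset <= t for t in transactions) / len(transactions)

        def confidence(antecedent, consequent):
            # Of the transactions with the antecedent, how many add the consequent?
            return support(antecedent | consequent) / support(antecedent)

        print(support({"bread", "butter"}))       # 0.6
        print(confidence({"bread"}, {"butter"}))  # 0.75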

The Signal and the Noise: Why So Many Predictions Fail—But Some Don't


Nate Silver - 2012
    Nate Silver solidified his standing as the nation's foremost political forecaster with his near-perfect prediction of the 2012 election. Silver is the founder and editor in chief of FiveThirtyEight.com. Drawing on his own groundbreaking work, he examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. Most predictions fail, often at great cost to society, because most of us have a poor understanding of probability and uncertainty. Both experts and laypeople mistake more confident predictions for more accurate ones. But overconfidence is often the reason for failure. If our appreciation of uncertainty improves, our predictions can get better too. This is the "prediction paradox": the more humility we have about our ability to make predictions, the more successful we can be in planning for the future. In keeping with his own aim to seek truth from data, Silver visits the most successful forecasters in a range of areas, from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA. He explains and evaluates how these forecasters think and what bonds they share. What lies behind their success? Are they good--or just lucky? What patterns have they unraveled? And are their forecasts really right? He explores unanticipated commonalities and exposes unexpected juxtapositions. And sometimes, it is not so much how good a prediction is in an absolute sense that matters but how good it is relative to the competition. In other cases, prediction is still a very rudimentary--and dangerous--science. Silver observes that the most accurate forecasters tend to have a superior command of probability, and they tend to be both humble and hardworking. They distinguish the predictable from the unpredictable, and they notice a thousand little details that lead them closer to the truth. Because of their appreciation of probability, they can distinguish the signal from the noise.
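
    Silver's distinction between confident and accurate forecasts can be made concrete with a proper scoring rule. A minimal sketch of the Brier score using invented forecasts; note how one overconfident miss outweighs several hedged ones:

        def brier(forecasts, outcomes):
            # Mean squared error between predicted probabilities and what happened.
            return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

        outcomes = [1, 0, 1, 0]  # invented events: 1 happened, 0 did not

        humble = [0.7, 0.3, 0.7, 0.3]          # hedged forecasts
        overconfident = [1.0, 0.0, 1.0, 1.0]   # bold, and wrong once

        print(brier(humble, outcomes))         # 0.09
        print(brier(overconfident, outcomes))  # 0.25 -- one confident miss dominates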

Mostly Harmless Econometrics: An Empiricist's Companion


Joshua D. Angrist - 2008
    In the modern experimentalist paradigm, the basic tools of applied econometrics address clear causal questions such as: Do smaller classes increase learning? Should wife batterers be arrested? How much does education raise wages? Mostly Harmless Econometrics shows how these tools allow the data to speak. In addition to econometric essentials, Mostly Harmless Econometrics covers important new extensions--regression-discontinuity designs and quantile regression--as well as how to get standard errors right. Joshua Angrist and Jörn-Steffen Pischke explain why fancier econometric techniques are typically unnecessary and even dangerous. The applied econometric methods emphasized in this book are easy to use and relevant for many areas of contemporary social science. Highlights include an irreverent review of econometric essentials; a focus on the tools applied researchers use most; chapters on regression-discontinuity designs, quantile regression, and standard errors; many empirical examples; and a clear and concise presentation with wide applications.
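
    The regression-discontinuity idea mentioned above fits in a short sketch: fit separate lines on either side of a treatment cutoff and read the causal effect off the gap. The data here are synthetic, with a built-in jump of 2.0; this is an illustration of the design, not code from the book:

        import random

        random.seed(0)

        # Synthetic data, invented for illustration: units with running
        # variable x are treated when x >= 0; the true jump at the cutoff is 2.0.
        xs = [random.uniform(-5, 5) for _ in range(400)]
        ys = [0.5 * x + (2.0 if x >= 0 else 0.0) + random.gauss(0, 0.5) for x in xs]

        def ols(pairs):
            # Closed-form simple linear regression: returns (intercept, slope).
            n = len(pairs)
            mx = sum(x for x, _ in pairs) / n
            my = sum(y for _, y in pairs) / n
            sxy = sum((x - mx) * (y - my) for x, y in pairs)
            sxx = sum((x - mx) ** 2 for x, _ in pairs)
            slope = sxy / sxx
            return my - slope * mx, slope

        left = [(x, y) for x, y in zip(xs, ys) if -2 <= x < 0]
        right = [(x, y) for x, y in zip(xs, ys) if 0 <= x <= 2]

        # The RD estimate is the gap between the two fits at the cutoff (x = 0),
        # which is just the difference of intercepts.
        b0_left, _ = ols(left)
        b0_right, _ = ols(right)
        print(round(b0_right - b0_left, 2))  # should land near the true jump of 2.0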

Programming Collective Intelligence: Building Smart Web 2.0 Applications


Toby Segaran - 2007
    With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, wiki, or specialized application. This book explains:
    - Collaborative filtering techniques that enable online retailers to recommend products or media
    - Methods of clustering to detect groups of similar items in a large dataset
    - Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
    - Optimization algorithms that search millions of possible solutions to a problem and choose the best one
    - Bayesian filtering, used in spam filters for classifying documents based on word types and other features
    - Using decision trees not only to make predictions, but to model the way decisions are made
    - Predicting numerical values rather than classifications to build price models
    - Support vector machines to match people in online dating sites
    - Non-negative matrix factorization to find the independent features in a dataset
    - Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
    Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. "Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google. "Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect
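
    As a flavor of the first item in the list above, a minimal user-based collaborative-filtering sketch; the ratings dictionary is invented, and cosine similarity over co-rated items is just one of several similarity measures one could choose:

        import math

        # Invented user ratings (user -> item -> score), for illustration only.
        ratings = {
            "ann": {"dune": 5, "alien": 4, "heat": 1},
            "bob": {"dune": 4, "alien": 5, "heat": 2, "blade": 5},
            "cat": {"dune": 1, "heat": 5, "speed": 4},
        }

        def cosine(u, v):
            # Cosine similarity computed over the items both users rated.
            shared = set(u) & set(v)
            if not shared:
                return 0.0
            dot = sum(u[i] * v[i] for i in shared)
            nu = math.sqrt(sum(u[i] ** 2 for i in shared))
            nv = math.sqrt(sum(v[i] ** 2 for i in shared))
            return dot / (nu * nv)

        # Recommend to "ann" the unseen items of her most similar user.
        sims = {who: cosine(ratings["ann"], r) for who, r in ratings.items() if who != "ann"}
        nearest = max(sims, key=sims.get)
        unseen = set(ratings[nearest]) - set(ratings["ann"])
        print(nearest, unseen)  # -> bob {'blade'}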

Information Theory, Inference and Learning Algorithms


David J.C. MacKay - 2003
    Information theory and inference, often taught separately, are here united in one entertaining textbook. These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography. This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction. A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, is developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks. The final part of the book describes the state of the art in error-correcting codes, including low-density parity-check codes, turbo codes, and digital fountain codes -- the twenty-first-century standards for satellite communications, disk drives, and data broadcast. Richly illustrated, filled with worked examples and over 400 exercises, some with detailed solutions, David MacKay's groundbreaking book is ideal for self-learning and for undergraduate or graduate courses. Interludes on crosswords, evolution, and sex provide entertainment along the way. In sum, this is a textbook on information, communication, and coding for a new generation of students, and an unparalleled entry point into these subjects for professionals in areas as diverse as computational biology, financial engineering, and machine learning.
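
    A minimal sketch of the quantity that ties the book's threads together: Shannon entropy, the bits-per-symbol floor that compressors such as arithmetic coders approach. The example strings are invented:

        import math
        from collections import Counter

        def entropy_bits(text):
            # Shannon entropy of the character distribution: a lower bound,
            # in bits per character, on losslessly compressing i.i.d. symbols
            # drawn from that distribution.
            counts = Counter(text)
            n = len(text)
            return -sum((c / n) * math.log2(c / n) for c in counts.values())

        print(round(entropy_bits("aaaaaaab"), 3))  # ~0.544 -- highly compressible
        print(round(entropy_bits("abcdefgh"), 3))  # 3.0    -- incompressible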

Deep Learning


Ian Goodfellow - 2016
    Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
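
    A minimal sketch of the book's first technique, a deep feedforward network, trained here by plain gradient descent on XOR; the architecture (4 hidden units), learning rate, and iteration count are arbitrary choices for illustration, not the book's code:

        import numpy as np

        rng = np.random.default_rng(0)

        # XOR: the classic function a single linear layer cannot represent
        # but a small feedforward network can.
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
        y = np.array([[0], [1], [1], [0]], dtype=float)

        W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
        W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        for _ in range(5000):
            # Forward pass through one hidden layer.
            h = sigmoid(X @ W1 + b1)
            out = sigmoid(h @ W2 + b2)
            # Backward pass: gradients of squared error via the chain rule.
            d_out = (out - y) * out * (1 - out)
            d_h = (d_out @ W2.T) * h * (1 - h)
            # Gradient-descent updates (step size 1, chosen arbitrarily).
            W2 -= h.T @ d_out; b2 -= d_out.sum(axis=0)
            W1 -= X.T @ d_h;   b1 -= d_h.sum(axis=0)

        print(out.round(2).ravel())  # should approach [0, 1, 1, 0]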

Elementary Statistics: Picturing the World


Ron Larson - 2002
    Taking an approach with a visual/graphical emphasis, this text offers numerous examples on the premise that students learn best by doing, and it stresses interpretation of results and critical thinking over calculations.