The Data Detective: Ten Easy Rules to Make Sense of Statistics


Tim Harford - 2020
    That’s a mistake, Tim Harford says in The Data Detective. We shouldn’t be suspicious of statistics—we need to understand what they mean and how they can improve our lives: they are, at heart, human behavior seen through the prism of numbers and are often “the only way of grasping much of what is going on around us.” If we can toss aside our fears and learn to approach them clearly—understanding how our own preconceptions lead us astray—statistics can point to ways we can live better and work smarter.As “perhaps the best popular economics writer in the world” (New Statesman), Tim Harford is an expert at taking complicated ideas and untangling them for millions of readers. In The Data Detective, he uses new research in science and psychology to set out ten strategies for using statistics to erase our biases and replace them with new ideas that use virtues like patience, curiosity, and good sense to better understand ourselves and the world. As a result, The Data Detective is a big-idea book about statistics and human behavior that is fresh, unexpected, and insightful.

Python Machine Learning


Sebastian Raschka - 2015
    We are living in an age where data comes in abundance, and thanks to the self-learning algorithms from the field of machine learning, we can turn this data into knowledge. Automated speech recognition on our smart phones, web search engines, e-mail spam filters, the recommendation systems of our favorite movie streaming services – machine learning makes it all possible.Thanks to the many powerful open-source libraries that have been developed in recent years, machine learning is now right at our fingertips. Python provides the perfect environment to build machine learning systems productively.This book will teach you the fundamentals of machine learning and how to utilize these in real-world applications using Python. Step-by-step, you will expand your skill set with the best practices for transforming raw data into useful information, developing learning algorithms efficiently, and evaluating results.You will discover the different problem categories that machine learning can solve and explore how to classify objects, predict continuous outcomes with regression analysis, and find hidden structures in data via clustering. You will build your own machine learning system for sentiment analysis and finally, learn how to embed your model into a web app to share with the world

Computer Age Statistical Inference: Algorithms, Evidence, and Data Science


Bradley Efron - 2016
    'Big data', 'data science', and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.

Head First Data Analysis: A Learner's Guide to Big Numbers, Statistics, and Good Decisions


Michael G. Milton - 2009
    If your job requires you to manage and analyze all kinds of data, turn to Head First Data Analysis, where you'll quickly learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others. Whether you're a product developer researching the market viability of a new product or service, a marketing manager gauging or predicting the effectiveness of a campaign, a salesperson who needs data to support product presentations, or a lone entrepreneur responsible for all of these data-intensive functions and more, the unique approach in Head First Data Analysis is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool. You'll learn how to:Determine which data sources to use for collecting information Assess data quality and distinguish signal from noise Build basic data models to illuminate patterns, and assimilate new information into the models Cope with ambiguous information Design experiments to test hypotheses and draw conclusions Use segmentation to organize your data within discrete market groups Visualize data distributions to reveal new relationships and persuade others Predict the future with sampling and probability models Clean your data to make it useful Communicate the results of your analysis to your audience Using the latest research in cognitive science and learning theory to craft a multi-sensory learning experience, Head First Data Analysis uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.

Linear Algebra and Its Applications


Gilbert Strang - 1976
    While the mathematics is there, the effort is not all concentrated on proofs. Strang's emphasis is on understanding. He explains concepts, rather than deduces. This book is written in an informal and personal style and teaches real mathematics. The gears change in Chapter 2 as students reach the introduction of vector spaces. Throughout the book, the theory is motivated and reinforced by genuine applications, allowing pure mathematicians to teach applied mathematics.

R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics


Paul Teetor - 2011
    The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process.Create vectors, handle variables, and perform other basic functionsInput and output dataTackle data structures such as matrices, lists, factors, and data framesWork with probability, probability distributions, and random variablesCalculate statistics and confidence intervals, and perform statistical testsCreate a variety of graphic displaysBuild statistical models with linear regressions and analysis of variance (ANOVA)Explore advanced statistical techniques, such as finding clusters in your dataWonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language--one practical example at a time.--Jeffrey Ryan, software consultant and R package author

Principles of Statistics


M.G. Bulmer - 1979
    There are equally many advanced textbooks which delve into the far reaches of statistical theory, while bypassing practical applications. But between these two approaches is an unfilled gap, in which theory and practice merge at an intermediate level. Professor M. G. Bulmer's Principles of Statistics, originally published in 1965, was created to fill that need. The new, corrected Dover edition of Principles of Statistics makes this invaluable mid-level text available once again for the classroom or for self-study.Principles of Statistics was created primarily for the student of natural sciences, the social scientist, the undergraduate mathematics student, or anyone familiar with the basics of mathematical language. It assumes no previous knowledge of statistics or probability; nor is extensive mathematical knowledge necessary beyond a familiarity with the fundamentals of differential and integral calculus. (The calculus is used primarily for ease of notation; skill in the techniques of integration is not necessary in order to understand the text.)Professor Bulmer devotes the first chapters to a concise, admirably clear description of basic terminology and fundamental statistical theory: abstract concepts of probability and their applications in dice games, Mendelian heredity, etc.; definitions and examples of discrete and continuous random variables; multivariate distributions and the descriptive tools used to delineate them; expected values; etc. The book then moves quickly to more advanced levels, as Professor Bulmer describes important distributions (binomial, Poisson, exponential, normal, etc.), tests of significance, statistical inference, point estimation, regression, and correlation. Dozens of exercises and problems appear at the end of various chapters, with answers provided at the back of the book. Also included are a number of statistical tables and selected references.

R Programming for Data Science


Roger D. Peng - 2015
    

Programming Pearls


Jon L. Bentley - 1986
    Jon has done a wonderful job of updating the material. I am very impressed at how fresh the new examples seem." - Steve McConnell, author, Code CompleteWhen programmers list their favorite books, Jon Bentley's collection of programming pearls is commonly included among the classics. Just as natural pearls grow from grains of sand that irritate oysters, programming pearls have grown from real problems that have irritated real programmers. With origins beyond solid engineering, in the realm of insight and creativity, Bentley's pearls offer unique and clever solutions to those nagging problems. Illustrated by programs designed as much for fun as for instruction, the book is filled with lucid and witty descriptions of practical programming techniques and fundamental design principles. It is not at all surprising that Programming Pearls has been so highly valued by programmers at every level of experience. In this revision, the first in 14 years, Bentley has substantially updated his essays to reflect current programming methods and environments. In addition, there are three new essays on (1) testing, debugging, and timing; (2) set representations; and (3) string problems. All the original programs have been rewritten, and an equal amount of new code has been generated. Implementations of all the programs, in C or C++, are now available on the Web.What remains the same in this new edition is Bentley's focus on the hard core of programming problems and his delivery of workable solutions to those problems. Whether you are new to Bentley's classic or are revisiting his work for some fresh insight, this book is sure to make your own list of favorites.

R for Everyone: Advanced Analytics and Graphics


Jared P. Lander - 2013
    R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs, you'll be ready to tackle the statistical problems you care about most. COVERAGE INCLUDES - Exploring R, RStudio, and R packages - Using R for math: variable types, vectors, calling functions, and more - Exploiting data structures, including data.frames, matrices, and lists - Creating attractive, intuitive statistical graphics - Writing user-defined functions - Controlling program flow with if, ifelse, and complex checks - Improving program efficiency with group manipulations - Combining and reshaping multiple datasets - Manipulating strings using R's facilities and regular expressions - Creating normal, binomial, and Poisson probability distributions - Programming basic statistics: mean, standard deviation, and t-tests - Building linear, generalized linear, and nonlinear models - Assessing the quality of models and variable selection - Preventing overfitting, using the Elastic Net and Bayesian methods - Analyzing univariate and multivariate time series data - Grouping data via K-means and hierarchical clustering - Preparing reports, slideshows, and web pages with knitr - Building reusable R packages with devtools and Rcpp - Getting involved with the R global community

Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms


Nikhil Buduma - 2015
    

Disruptive Possibilities: How Big Data Changes Everything


Jeffrey Needham - 2013
    As author Jeffrey Needham points out in this eye-opening book, big data can provide unprecedented insight into user habits, giving enterprises a huge market advantage. It will also inspire organizations to change the way they function."Disruptive Possibilities: How Big Data Changes Everything" takes you on a journey of discovery into the emerging world of big data, from its relatively simple technology to the ways it differs from cloud computing. But the big story of big data is the disruption of enterprise status quo, especially vendor-driven technology silos and budget-driven departmental silos. In the highly collaborative environment needed to make big data work, silos simply don't fit.Internet-scale computing offers incredible opportunity and a tremendous challenge--and it will soon become standard operating procedure in the enterprise. This book shows you what to expect.

Probability, Random Variables and Stochastic Processes with Errata Sheet


Athanasios Papoulis - 2001
    Unnikrishna Pillai of Polytechnic University. The book is intended for a senior/graduate level course in probability and is aimed at students in electrical engineering, math, and physics departments. The authors' approach is to develop the subject of probability theory and stochastic processes as a deductive discipline and to illustrate the theory with basic applications of engineering interest. Approximately 1/3 of the text is new material--this material maintains the style and spirit of previous editions. In order to bridge the gap between concepts and applications, a number of additional examples have been added for further clarity, as well as several new topics.

Make Your Own Neural Network: An In-depth Visual Introduction For Beginners


Michael Taylor - 2017
    A step-by-step visual journey through the mathematics of neural networks, and making your own using Python and Tensorflow.

Python Data Science Handbook: Tools and Techniques for Developers


Jake Vanderplas - 2016
    Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.With this handbook, you’ll learn how to use: * IPython and Jupyter: provide computational environments for data scientists using Python * NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python * Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python * Matplotlib: includes capabilities for a flexible range of data visualizations in Python * Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms