The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

Introduction to Algorithms


Thomas H. Cormen - 1989
    Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.

Statistics: An Introduction Using R


Michael J. Crawley - 2005
    R is one of the most powerful and flexible statistical software packages available, and enables the user to apply a wide variety of statistical methods ranging from simple regression to generalized linear modelling. Statistics: An Introduction using R is a clear and concise introductory textbook to statistical analysis using this powerful and free software, and follows on from the success of the author's previous best-selling title Statistical Computing. * Features step-by-step instructions that assume no mathematics, statistics or programming background, helping the non-statistician to fully understand the methodology. * Uses a series of realistic examples, developing step-wise from the simplest cases, with the emphasis on checking the assumptions (e.g. constancy of variance and normality of errors) and the adequacy of the model chosen to fit the data. * The emphasis throughout is on estimation of effect sizes and confidence intervals, rather than on hypothesis testing. * Covers the full range of statistical techniques likely to be need to analyse the data from research projects, including elementary material like t-tests and chi-squared tests, intermediate methods like regression and analysis of variance, and more advanced techniques like generalized linear modelling. * Includes numerous worked examples and exercises within each chapter. * Accompanied by a website featuring worked examples, data sets, exercises and solutions: http: //www.imperial.ac.uk/bio/research/crawl... Statistics: An Introduction using R is the first text to offer such a concise introduction to a broad array of statistical methods, at a level that is elementary enough to appeal to a broad range of disciplines. It is primarily aimed at undergraduate students in medicine, engineering, economics and biology - but will also appeal to postgraduates who have not previously covered this area, or wish to switch to using R.

Structure and Interpretation of Computer Programs


Harold Abelson - 1984
    This long-awaited revision contains changes throughout the text. There are new implementations of most of the major programming systems in the book, including the interpreters and compilers, and the authors have incorporated many small changes that reflect their experience teaching the course at MIT since the first edition was published. A new theme has been introduced that emphasizes the central role played by different approaches to dealing with time in computational models: objects with state, concurrent programming, functional programming and lazy evaluation, and nondeterministic programming. There are new example sections on higher-order procedures in graphics and on applications of stream processing in numerical programming, and many new exercises. In addition, all the programs have been reworked to run in any Scheme implementation that adheres to the IEEE standard.

Uncharted: Big Data and an Emerging Science of Human History


Erez Aiden - 2013
    Gigabytes, exabytes (that’s one quintillion bytes) of data are sitting on servers across the world. So how can we start to access this explosion of information, this “big data,” and what can it tell us?   Erez Aiden and Jean-Baptiste Michel are two young scientists at Harvard who started to ask those questions. They teamed up with Google to create the Ngram Viewer, a Web-based tool that can chart words throughout the massive Google Books archive, sifting through billions of words to find fascinating cultural trends. On the day that the Ngram Viewer debuted in 2010, more than one million queries were run through it.   On the front lines of Big Data, Aiden and Michel realized that this big dataset—the Google Books archive that contains remarkable information on the human experience—had huge implications for looking at our shared human history. The tool they developed to delve into the data has enabled researchers to track how our language has evolved over time, how art has been censored, how fame can grow and fade, how nations trend toward war. How we remember and how we forget. And ultimately, how Big Data is changing the game for the sciences, humanities, politics, business, and our culture.

A Human's Guide to Machine Intelligence: How Algorithms Are Shaping Our Lives and How We Can Stay in Control


Kartik Hosanagar - 2019
    We've even delegated life-and-death decisions to algorithms--decisions once made by doctors, pilots, and judges. In his new book, Kartik Hosanagar surveys the brave new world of algorithmic decision-making and reveals the potentially dangerous biases they can give rise to as they increasingly run our lives. He makes the compelling case that we need to arm ourselves with a better, deeper, more nuanced understanding of the phenomenon of algorithmic thinking. And he gives us a route in, pointing out that algorithms often think a lot like their creators--that is, like you and me.Hosanagar draws on his experiences designing algorithms professionally--as well as on history, computer science, and psychology--to explore how algorithms work and why they occasionally go rogue, what drives our trust in them, and the many ramifications of algorithmic decision-making. He examines episodes like Microsoft's chatbot Tay, which was designed to converse on social media like a teenage girl, but instead turned sexist and racist; the fatal accidents of self-driving cars; and even our own common, and often frustrating, experiences on services like Netflix and Amazon. A Human's Guide to Machine Intelligence is an entertaining and provocative look at one of the most important developments of our time and a practical user's guide to this first wave of practical artificial intelligence.

Hands-On Programming with R: Write Your Own Functions and Simulations


Garrett Grolemund - 2014
    With this book, you'll learn how to load data, assemble and disassemble data objects, navigate R's environment system, write your own functions, and use all of R's programming tools.RStudio Master Instructor Garrett Grolemund not only teaches you how to program, but also shows you how to get more from R than just visualizing and modeling data. You'll gain valuable programming skills and support your work as a data scientist at the same time.Work hands-on with three practical data analysis projects based on casino gamesStore, retrieve, and change data values in your computer's memoryWrite programs and simulations that outperform those written by typical R usersUse R programming tools such as if else statements, for loops, and S3 classesLearn how to write lightning-fast vectorized R codeTake advantage of R's package system and debugging toolsPractice and apply R programming concepts as you learn them

Genius Makers: The Mavericks Who Brought AI to Google, Facebook, and the World


Cade Metz - 2021
    Through the lives of Geoff Hinton and other major players, Metz explains this transformative technology and makes the quest thrilling.--Walter Isaacson, author of The Code Breaker Recipient of starred reviews in both Kirkus and Library JournalTHE UNTOLD TECH STORY OF OUR TIMEWhat does it mean to be smart? To be human? What do we really want from life and the intelligence we have, or might create?With deep and exclusive reporting, across hundreds of interviews, New York Times Silicon Valley journalist Cade Metz brings you into the rooms where these questions are being answered. Where an extraordinarily powerful new artificial intelligence has been built into our biggest companies, our social discourse, and our daily lives, with few of us even noticing.Long dismissed as a technology of the distant future, artificial intelligence was a project consigned to the fringes of the scientific community. Then two researchers changed everything. One was a sixty-four-year-old computer science professor who didn't drive and didn't fly because he could no longer sit down--but still made his way across North America for the moment that would define a new age of technology. The other was a thirty-six-year-old neuroscientist and chess prodigy who laid claim to being the greatest game player of all time before vowing to build a machine that could do anything the human brain could do.They took two very different paths to that lofty goal, and they disagreed on how quickly it would arrive. But both were soon drawn into the heart of the tech industry. Their ideas drove a new kind of arms race, spanning Google, Microsoft, Facebook, and OpenAI, a new lab founded by Silicon Valley kingpin Elon Musk. But some believed that China would beat them all to the finish line.Genius Makers dramatically presents the fierce conflict between national interests, shareholder value, the pursuit of scientific knowledge, and the very human concerns about privacy, security, bias, and prejudice. Like a great Victorian novel, this world of eccentric, brilliant, often unimaginably yet suddenly wealthy characters draws you into the most profound moral questions we can ask. And like a great mystery, it presents the story and facts that lead to a core, vital question:How far will we let it go?

R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics


Paul Teetor - 2011
    The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process.Create vectors, handle variables, and perform other basic functionsInput and output dataTackle data structures such as matrices, lists, factors, and data framesWork with probability, probability distributions, and random variablesCalculate statistics and confidence intervals, and perform statistical testsCreate a variety of graphic displaysBuild statistical models with linear regressions and analysis of variance (ANOVA)Explore advanced statistical techniques, such as finding clusters in your dataWonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language--one practical example at a time.--Jeffrey Ryan, software consultant and R package author

Designing Data-Intensive Applications


Martin Kleppmann - 2015
    Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

Programming Entity Framework: Code First


Julia Lerman - 2011
    With this concise book, you’ll work hands-on with examples to learn how Code First can create an in-memory model and database by default, and how you can exert more control over the model through further configuration.Code First provides an alternative to the database first and model first approaches to the Entity Data Model. Learn the benefits of defining your model with code, whether you’re working with an existing database or building one from scratch. If you work with Visual Studio and understand database management basics, this book is for you.Learn exactly what Code First does—and does not—enable you to doUnderstand how property attributes, relationships, and database mappings are inferred from your classes by Code FirstUse Data Annotations and the Fluent API to configure the Code First data modelPerform advanced techniques, such as controlling the database schema and overriding the default model cachingThis book is a continuation of author Julia Lerman’s Programming Entity Framework, widely recognized as the leading book on the topic.

R Packages


Hadley Wickham - 2015
    This practical book shows you how to bundle reusable R functions, sample data, and documentation together by applying author Hadley Wickham’s package development philosophy. In the process, you’ll work with devtools, roxygen, and testthat, a set of R packages that automate common development tasks. Devtools encapsulates best practices that Hadley has learned from years of working with this programming language. Ideal for developers, data scientists, and programmers with various backgrounds, this book starts you with the basics and shows you how to improve your package writing over time. You’ll learn to focus on what you want your package to do, rather than think about package structure. Learn about the most useful components of an R package, including vignettes and unit tests Automate anything you can, taking advantage of the years of development experience embodied in devtools Get tips on good style, such as organizing functions into files Streamline your development process with devtools Learn the best way to submit your package to the Comprehensive R Archive Network (CRAN) Learn from a well-respected member of the R community who created 30 R packages, including ggplot2, dplyr, and tidyr

Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.Use graphics to describe data with one, two, or dozens of variablesDevelop conceptual models using back-of-the-envelope calculations, as well asscaling and probability argumentsMine data with computationally intensive methods such as simulation and clusteringMake your conclusions understandable through reports, dashboards, and other metrics programsUnderstand financial calculations, including the time-value of moneyUse dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situationsBecome familiar with different open source programming environments for data analysisFinally, a concise reference for understanding how to conquer piles of data.--Austin King, Senior Web Developer, MozillaAn indispensable text for aspiring data scientists.--Michael E. Driscoll, CEO/Founder, Dataspora

Practical SQL: A Beginner's Guide to Storytelling with Data


Anthony DeBarros - 2022
    An approachable guide to programming in SQL (Structured Query Language) that will teach even beginning programmers how to build powerful databases and analyze data to find meaningful information.Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language) written by longtime professional journalist Anthony DeBarros. SQL is the primary tool that programmers, web developers, researchers, journalists, and others use to explore data in a database. DeBarros focuses on using SQL to find the story in data, with the aid of the popular open-source database PostgreSQL and the pgAdmin interface.This thoroughly revised second edition includes a new chapter describing how to set up PostgreSQL and more extensive discussion of pgAdmin's best features. The author has also added a chapter on the JSON data format that shows readers how to store and query JSON data. DeBarros has also updated the data in the book throughout, added coverage of additional topics, and perfected the book's examples.Readers love DeBarros's use of exercises and real-world examples that demonstrate how to:- Create databases and related tables using your own data - Correctly define data typesAggregate, sort, and filter data to find patterns - Clean their data and transfer data as text files - Create advanced queries and automate tasksThis book uses PostgreSQL, but the SQL syntax is applicable to many database applications, including Microsoft SQL Server and MySQL.

Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI


Matthew Sadler - 2019
    The artificial intelligence system, created by DeepMind, had been fed nothing but the rules of the Royal Game when it beat the world’s strongest chess engine in a prolonged match. The selection of ten games published in December 2017 created a worldwide sensation: how was it possible to play in such a brilliant and risky style and not lose a single game against an opponent of superhuman strength?For Game Changer, Matthew Sadler and Natasha Regan investigated more than two thousand previously unpublished games by AlphaZero. They also had unparalleled access to its team of developers and were offered a unique look ‘under the bonnet’ to grasp the depth and breadth of AlphaZero’s search. Sadler and Regan reveal its thinking process and tell the story of the human motivation and the techniques that created AlphaZero.Game Changer also presents a collection of lucidly explained chess games of astonishing quality. Both professionals and club players will improve their game by studying AlphaZero’s stunning discoveries in every field that matters: opening preparation, piece mobility, initiative, attacking techniques, long-term sacrifices and much more.The story of AlphaZero has a wider impact. Game Changer offers intriguing insights into the opportunities and horizons of Artificial Intelligence. Not just in solving games, but in providing solutions for a wide variety of challenges in society.With a foreword by former World Chess Champion Garry Kasparov and an introduction by DeepMind CEO Demis Hassabis.Matthew Sadler (1974) is a Grandmaster who twice won the British Championship and was awarded an individual Gold Medal at the 1996 Olympiad. He has authored several highly acclaimed books on chess and has been writing the famous ‘Sadler on Books’ column for New In Chess magazine for many years. Natasha Regan is a Women’s International Master from England who achieved a degree in mathematics from Cambridge University. Matthew Sadler and Natasha Regan won the English Chess Federation 2016 Book of the Award for their book Chess for Life.