Think Stats


Allen B. Downey - 2011
    This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. You'll work with a case study throughout the book to help you learn the entire data analysis process, from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts. Develop your understanding of probability and statistics by writing and testing code. Run experiments to test statistical behavior, such as generating samples from several distributions. Use simulations to understand concepts that are hard to grasp mathematically. Learn topics not usually covered in an introductory course, such as Bayesian estimation. Import data from almost any source using Python, rather than be limited to data that has been cleaned and formatted for statistics tools. Use statistical inference to answer questions about real-world data.

Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, Lego, and Rubber Ducks


Will Kurt - 2019
    But many people use data in ways they don't even understand, meaning they aren't getting the most from it. Bayesian Statistics the Fun Way will change that. This book will give you a complete understanding of Bayesian statistics through simple explanations and un-boring examples. Find out the probability of UFOs landing in your garden, how likely Han Solo is to survive a flight through an asteroid shower, how to win an argument about conspiracy theories, and whether a burglary really was a burglary, to name a few examples. By using these off-the-beaten-track examples, the author actually makes learning statistics fun. And you'll learn real skills, like how to: measure your own level of uncertainty in a conclusion or belief; calculate Bayes' theorem and understand what it's useful for; find the posterior, likelihood, and prior to check the accuracy of your conclusions; calculate distributions to see the range of your data; and compare hypotheses and draw reliable conclusions from them. Next time you find yourself with a sheaf of survey results and no idea what to do with them, turn to Bayesian Statistics the Fun Way to get the most value from your data.

Moneyball: The Art of Winning an Unfair Game


Michael Lewis - 2003
    Conventional wisdom long held that big name, highly athletic hitters and young pitchers with rocket arms were the ticket to success. But Beane and his staff, buoyed by massive amounts of carefully interpreted statistical data, believed that wins could be had by more affordable methods such as hitters with high on-base percentage and pitchers who get lots of ground outs. Given this information and a tight budget, Beane defied tradition and his own scouting department to build winning teams of young affordable players and inexpensive castoff veterans. Lewis was in the room with the A's top management as they spent the summer of 2002 adding and subtracting players and he provides outstanding play-by-play. In the June player draft, Beane acquired nearly every prospect he coveted (few of whom were coveted by other teams) and at the July trading deadline he engaged in a tense battle of nerves to acquire a lefty reliever. Besides being one of the most insider accounts ever written about baseball, Moneyball is populated with fascinating characters. We meet Jeremy Brown, an overweight college catcher who most teams project to be a 15th round draft pick (Beane takes him in the first). Sidearm pitcher Chad Bradford is plucked from the White Sox triple-A club to be a key set-up man and catcher Scott Hatteberg is rebuilt as a first baseman. But the most interesting character is Beane himself. A speedy athletic can't-miss prospect who somehow missed, Beane reinvents himself as a front-office guru, relying on players completely unlike, say, Billy Beane. Lewis, one of the top nonfiction writers of his era (Liar's Poker, The New New Thing), offers highly accessible explanations of baseball stats and his roadmap of Beane's economic approach makes Moneyball an appealing reading experience for business people and sports fans alike. --John Moe

Big Data Now: Current Perspectives from O'Reilly Radar


O'Reilly Radar Team - 2011
    Mike Loukides kicked things off in June 2010 with “What is data science?” and from there we’ve pursued the various threads and themes that naturally emerged. Now, roughly a year later, we can look back over all we’ve covered and identify a number of core data areas. Data issues: the opportunities and ambiguities of the data space are evident in discussions around privacy, the implications of data-centric industries, and the debate about the phrase “data science” itself. The application of data (products and processes): a “data product” can emerge from virtually any domain, including everything from data startups to established enterprises to media/journalism to education and research. Data science and data tools: the tools and technologies that drive data science are of course essential to this space, but the varied techniques being applied are also key to understanding the big data arena. The business of data: take a closer look at the actions connected to data (the finding, organizing, and analyzing) that provide organizations of all sizes with the information they need to compete.

Programming Collective Intelligence: Building Smart Web 2.0 Applications


Toby Segaran - 2007
    With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general, all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains: collaborative filtering techniques that enable online retailers to recommend products or media; methods of clustering to detect groups of similar items in a large dataset; search engine features (crawlers, indexers, query engines, and the PageRank algorithm); optimization algorithms that search millions of possible solutions to a problem and choose the best one; Bayesian filtering, used in spam filters for classifying documents based on word types and other features; using decision trees not only to make predictions, but to model the way decisions are made; predicting numerical values rather than classifications to build price models; support vector machines to match people in online dating sites; non-negative matrix factorization to find the independent features in a dataset; and evolving intelligence for problem solving (how a computer develops its skill by improving its own code the more it plays a game). Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. "Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google. "Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect

DAX Formulas for PowerPivot: The Excel Pro's Guide to Mastering DAX


Rob Collie - 2012
    Written by the world’s foremost PowerPivot blogger and practitioner, the book’s concepts and approach are introduced in a simple, step-by-step manner tailored to the learning style of Excel users everywhere. The techniques presented allow users to produce, in hours or even minutes, results that formerly would have taken entire teams weeks or months to produce. They include lessons on the difference between calculated columns and measures, how formulas can be reused across reports of completely different shapes, how to merge disjointed sets of data into unified reports, how to make certain columns in a pivot behave as if the pivot were filtered while other columns do not, and how to create time-intelligent calculations in pivot tables such as “Year over Year” and “Moving Averages”, whether they use a standard, fiscal, or completely custom calendar. The “pattern-like” techniques and best practices contained in this book have been developed and refined over two years of onsite training with Excel users around the world, and the key lessons from those seminars costing thousands of dollars per day are now available within the pages of this easy-to-follow guide.

The Signal and the Noise: Why So Many Predictions Fail—But Some Don't


Nate Silver - 2012
    He solidified his standing as the nation's foremost political forecaster with his near perfect prediction of the 2012 election. Silver is the founder and editor in chief of FiveThirtyEight.com. Drawing on his own groundbreaking work, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. Most predictions fail, often at great cost to society, because most of us have a poor understanding of probability and uncertainty. Both experts and laypeople mistake more confident predictions for more accurate ones. But overconfidence is often the reason for failure. If our appreciation of uncertainty improves, our predictions can get better too. This is the "prediction paradox": The more humility we have about our ability to make predictions, the more successful we can be in planning for the future. In keeping with his own aim to seek truth from data, Silver visits the most successful forecasters in a range of areas, from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA. He explains and evaluates how these forecasters think and what bonds they share. What lies behind their success? Are they good, or just lucky? What patterns have they unraveled? And are their forecasts really right? He explores unanticipated commonalities and exposes unexpected juxtapositions. And sometimes, it is not so much how good a prediction is in an absolute sense that matters but how good it is relative to the competition. In other cases, prediction is still a very rudimentary, and dangerous, science. Silver observes that the most accurate forecasters tend to have a superior command of probability, and they tend to be both humble and hardworking. They distinguish the predictable from the unpredictable, and they notice a thousand little details that lead them closer to the truth. Because of their appreciation of probability, they can distinguish the signal from the noise.

Deep Learning


Ian Goodfellow - 2016
    Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

The Cartoon Guide to Statistics


Larry Gonick - 1993
    Never again will you order the Poisson Distribution in a French restaurant! This updated version features all new material.

R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics


Paul Teetor - 2011
    The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression. Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process. Create vectors, handle variables, and perform other basic functions. Input and output data. Tackle data structures such as matrices, lists, factors, and data frames. Work with probability, probability distributions, and random variables. Calculate statistics and confidence intervals, and perform statistical tests. Create a variety of graphic displays. Build statistical models with linear regressions and analysis of variance (ANOVA). Explore advanced statistical techniques, such as finding clusters in your data. "Wonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language, one practical example at a time." -- Jeffrey Ryan, software consultant and R package author

Not Everyone Gets A Trophy: How to Manage the Millennials


Bruce Tulgan - 2015
    

R in Action


Robert Kabacoff - 2011
    The book begins by introducing the R language, including the development environment. Focusing on practical solutions, the book also offers a crash course in practical statistics and covers elegant methods for dealing with messy and incomplete data using features of R. About the Technology: R is a powerful language for statistical computing and graphics that can handle virtually any data-crunching task. It runs on all important platforms and provides thousands of useful specialized modules and utilities. This makes R a great way to get meaningful information from mountains of raw data. About the Book: R in Action is a language tutorial focused on practical problems. It presents useful statistics examples and includes elegant methods for handling messy, incomplete, and non-normal data that are difficult to analyze using traditional methods. And statistical analysis is only part of the story. You'll also master R's extensive graphical capabilities for exploring and presenting data visually. Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book. What's Inside: practical data analysis, step by step; interfacing R with other software; using R to visualize data; over 130 graphs; eight reference appendixes. Table of Contents: Part I, Getting started: Introduction to R; Creating a dataset; Getting started with graphs; Basic data management; Advanced data management. Part II, Basic methods: Basic graphs; Basic statistics. Part III, Intermediate methods: Regression; Analysis of variance; Power analysis; Intermediate graphs; Re-sampling statistics and bootstrapping. Part IV, Advanced methods: Generalized linear models; Principal components and factor analysis; Advanced methods for missing data; Advanced graphics.

Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve, rather than rely on tools to think for you. Use graphics to describe data with one, two, or dozens of variables. Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments. Mine data with computationally intensive methods such as simulation and clustering. Make your conclusions understandable through reports, dashboards, and other metrics programs. Understand financial calculations, including the time-value of money. Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations. Become familiar with different open source programming environments for data analysis. "Finally, a concise reference for understanding how to conquer piles of data." -- Austin King, Senior Web Developer, Mozilla. "An indispensable text for aspiring data scientists." -- Michael E. Driscoll, CEO/Founder, Dataspora

HYPERGROWTH: How the Customer-Driven Model Is Revolutionizing the Way Businesses Build Products, Teams, & Brands


David Cancel - 2017
    The key to achieving HYPERGROWTH is being customer-driven. So if you’re ready to start putting your customers first, keep reading... What You’ll Learn: A New Approach to Product Management and Developing SaaS Products People Love. Today, there’s no excuse for not communicating with customers on a daily basis. Messaging has exploded, new generations are focused on 1:1 communication by default, and artificial intelligence is finally coming so we can deliver 1:1 at scale. So why would you build a product, or a company, without leaning into the advantages of that ecosystem? In his new book, HYPERGROWTH, serial entrepreneur and Drift co-founder/CEO David Cancel shares a modern approach for building products and structuring teams that makes customer communication a central priority. The book tells the story of how Cancel’s customer-driven approach started out as a test with a product team (Performable), transformed an entire organization (HubSpot), and sparked a new movement (Drift). What’s Inside: Practical Advice and Frameworks for Becoming Customer-Driven and Growing Your Business. Responsive Development (RD): a new approach to building products that adds the customer back into the equation. The Burndown Framework: a framework for implementing Responsive Development that’s faster and more flexible than Agile. The Three-Person Team: the customer-driven way to structure engineering teams. Each team consists of a tech lead who manages two other engineers. Getting Rid of Roadmaps: through building a culture of transparency and accountability and working closely with internal customers, you can release product updates more rapidly and iteratively. The Spotlight Framework: a framework for helping you focus on the right parts of customer feedback so you can take the appropriate next steps. The framework breaks feedback down into three main categories: user experience issues, product marketing issues, and positioning issues. Who This Book Is For: Entrepreneurs, Startup Founders, Product Managers, Product Teams, Marketing Teams … Entire Companies! Every part of your business can benefit from being customer-driven. With the rise of SaaS and the on-demand economy, customer expectations have changed. Customers expect their voices to be heard. They find value in being part of a community, and being part of that journey of creating the product. So stop running your business like we’re still living in the 2000s. It’s time to take a customer-driven approach. Here’s what people are saying about the book: “David Cancel is one of the best when it comes to building products that customers love. And now he’s sharing his wisdom and writing the book explaining how he does it. This is a must read for any entrepreneur or business owner.” -MARK ROBERGE, Senior Lecturer, Harvard Business School, Former SVP of Sales and Services at HubSpot. “When it comes to building business software, there’s no one better than David Cancel, and I saw fi

How Not to Be Wrong: The Power of Mathematical Thinking


Jordan Ellenberg - 2014
    In How Not to Be Wrong, Jordan Ellenberg shows us how terribly limiting this view is: Math isn’t confined to abstract incidents that never occur in real life, but rather touches everything we do; the whole world is shot through with it. Math allows us to see the hidden structures underneath the messy and chaotic surface of our world. It’s a science of not being wrong, hammered out by centuries of hard work and argument. Armed with the tools of mathematics, we can see through to the true meaning of information we take for granted: How early should you get to the airport? What does “public opinion” really represent? Why do tall parents have shorter children? Who really won Florida in 2000? And how likely are you, really, to develop cancer? How Not to Be Wrong presents the surprising revelations behind all of these questions and many more, using the mathematician’s method of analyzing life and exposing the hard-won insights of the academic community to the layman, minus the jargon. Ellenberg chases mathematical threads through a vast range of time and space, from the everyday to the cosmic, encountering, among other things, baseball, Reaganomics, daring lottery schemes, Voltaire, the replicability crisis in psychology, Italian Renaissance painting, artificial languages, the development of non-Euclidean geometry, the coming obesity apocalypse, Antonin Scalia’s views on crime and punishment, the psychology of slime molds, what Facebook can and can’t figure out about you, and the existence of God. Ellenberg pulls from history as well as from the latest theoretical developments to provide those not trained in math with the knowledge they need. Math, as Ellenberg says, is “an atomic-powered prosthesis that you attach to your common sense, vastly multiplying its reach and strength.” With the tools of mathematics in hand, you can understand the world in a deeper, more meaningful way. How Not to Be Wrong will show you how.