Book picks similar to
On Being a Data Skeptic by Cathy O'Neil


nonfiction
non-fiction
data-science
science

Data Science from Scratch: First Principles with Python


Joel Grus - 2015
    In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out. Get a crash course in Python Learn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data science Collect, explore, clean, munge, and manipulate data Dive into the fundamentals of machine learning Implement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

Social Physics: How Good Ideas Spread— The Lessons from a New Science


Alex Pentland - 2014
    Over years of groundbreaking experiments, he has distilled remarkable discoveries significant enough to become the bedrock of a whole new scientific field: social physics. Humans have more in common with bees than we like to admit: We’re social creatures first and foremost. Our most important habits of action—and most basic notions of common sense—are wired into us through our coordination in social groups. Social physics is about idea flow, the way human social networks spread ideas and transform those ideas into behaviors. Thanks to the millions of digital bread crumbs people leave behind via smartphones, GPS devices, and the Internet, the amount of new information we have about human activity is truly profound. Until now, sociologists have depended on limited data sets and surveys that tell us how people say they think and behave, rather than what they actually do. As a result, we’ve been stuck with the same stale social structures—classes, markets—and a focus on individual actors, data snapshots, and steady states. Pentland shows that, in fact, humans respond much more powerfully to social incentives that involve rewarding others and strengthening the ties that bind than incentives that involve only their own economic self-interest. Pentland and his teams have found that they can study patterns of information exchange in a social network without any knowledge of the actual content of the information and predict with stunning accuracy how productive and effective that network is, whether it’s a business or an entire city. We can maximize a group’s collective intelligence to improve performance and use social incentives to create new organizations and guide them through disruptive change in a way that maximizes the good. At every level of interaction, from small groups to large cities, social networks can be tuned to increase exploration and engagement, thus vastly improving idea flow.  Social Physics will change the way we think about how we learn and how our social groups work—and can be made to work better, at every level of society. Pentland leads readers to the edge of the most important revolution in the study of social behavior in a generation, an entirely new way to look at life itself.

Data Science for Business: What you need to know about data mining and data-analytic thinking


Foster Provost - 2013
    This guide also helps you understand the many data-mining techniques in use today.Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.Understand how data science fits in your organization—and how you can use it for competitive advantageTreat data as a business asset that requires careful investment if you’re to gain real valueApproach business problems data-analytically, using the data-mining process to gather good data in the most appropriate wayLearn general concepts for actually extracting knowledge from dataApply data science principles when interviewing data science job candidates

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World


Bruce Schneier - 2015
    Your online and in-store purchasing patterns are recorded, and reveal if you're unemployed, sick, or pregnant. Your e-mails and texts expose your intimate and casual friends. Google knows what you’re thinking because it saves your private searches. Facebook can determine your sexual orientation without you ever mentioning it.The powers that surveil us do more than simply store this information. Corporations use surveillance to manipulate not only the news articles and advertisements we each see, but also the prices we’re offered. Governments use surveillance to discriminate, censor, chill free speech, and put people in danger worldwide. And both sides share this information with each other or, even worse, lose it to cybercriminals in huge data breaches.Much of this is voluntary: we cooperate with corporate surveillance because it promises us convenience, and we submit to government surveillance because it promises us protection. The result is a mass surveillance society of our own making. But have we given up more than we’ve gained? In Data and Goliath, security expert Bruce Schneier offers another path, one that values both security and privacy. He brings his bestseller up-to-date with a new preface covering the latest developments, and then shows us exactly what we can do to reform government surveillance programs, shake up surveillance-based business models, and protect our individual privacy. You'll never look at your phone, your computer, your credit cards, or even your car in the same way again.

New Dark Age: Technology and the End of the Future


James Bridle - 2018
    Underlying this trend is a single idea: the belief that our existence is understandable through computation, and more data is enough to help us build a better world.   In actual fact, we are lost in a sea of information, increasingly divided by fundamentalism, simplistic narratives, conspiracy theories, and post-factual politics. Meanwhile, those in power use our lack of understanding to further their own interests. Despite the accessibility of information, we’re living in a new Dark Age.   From rogue financial systems to shopping algorithms, from artificial intelligence to state secrecy, we no longer understand how our world is governed or presented to us. The media is filled with unverifiable speculation, much of it generated by anonymous software, while companies dominate their employees through surveillance and the threat of automation.   In his brilliant new work, leading artist and writer James Bridle excavates the limits of technology and how it aids our understanding of the world. Surveying the history of art, technology, and information systems, he explores the dark clouds that gather over our dreams of the digital sublime.

Automate This: How Algorithms Came to Rule Our World


Christopher Steiner - 2012
    It used to be that to diagnose an illness, interpret legal documents, analyze foreign policy, or write a newspaper article you needed a human being with specific skills—and maybe an advanced degree or two. These days, high-level tasks are increasingly being handled by algorithms that can do precise work not only with speed but also with nuance. These “bots” started with human programming and logic, but now their reach extends beyond what their creators ever expected. In this fascinating, frightening book, Christopher Steiner tells the story of how algorithms took over—and shows why the “bot revolution” is about to spill into every aspect of our lives, often silently, without our knowledge. The May 2010 “Flash Crash” exposed Wall Street’s reliance on trading bots to the tune of a 998-point market drop and $1 trillion in vanished market value. But that was just the beginning. In Automate This, we meet bots that are driving cars, penning haiku, and writing music mistaken for Bach’s. They listen in on our customer service calls and figure out what Iran would do in the event of a nuclear standoff. There are algorithms that can pick out the most cohesive crew of astronauts for a space mission or identify the next Jeremy Lin. Some can even ingest statistics from baseball games and spit out pitch-perfect sports journalism indistinguishable from that produced by humans. The interaction of man and machine can make our lives easier. But what will the world look like when algorithms control our hospitals, our roads, our culture, and our national security? What hap­pens to businesses when we automate judgment and eliminate human instinct? And what role will be left for doctors, lawyers, writers, truck drivers, and many others?  Who knows—maybe there’s a bot learning to do your job this minute.

How Not to Be Wrong: The Power of Mathematical Thinking


Jordan Ellenberg - 2014
    In How Not to Be Wrong, Jordan Ellenberg shows us how terribly limiting this view is: Math isn’t confined to abstract incidents that never occur in real life, but rather touches everything we do—the whole world is shot through with it.Math allows us to see the hidden structures underneath the messy and chaotic surface of our world. It’s a science of not being wrong, hammered out by centuries of hard work and argument. Armed with the tools of mathematics, we can see through to the true meaning of information we take for granted: How early should you get to the airport? What does “public opinion” really represent? Why do tall parents have shorter children? Who really won Florida in 2000? And how likely are you, really, to develop cancer?How Not to Be Wrong presents the surprising revelations behind all of these questions and many more, using the mathematician’s method of analyzing life and exposing the hard-won insights of the academic community to the layman—minus the jargon. Ellenberg chases mathematical threads through a vast range of time and space, from the everyday to the cosmic, encountering, among other things, baseball, Reaganomics, daring lottery schemes, Voltaire, the replicability crisis in psychology, Italian Renaissance painting, artificial languages, the development of non-Euclidean geometry, the coming obesity apocalypse, Antonin Scalia’s views on crime and punishment, the psychology of slime molds, what Facebook can and can’t figure out about you, and the existence of God.Ellenberg pulls from history as well as from the latest theoretical developments to provide those not trained in math with the knowledge they need. Math, as Ellenberg says, is “an atomic-powered prosthesis that you attach to your common sense, vastly multiplying its reach and strength.” With the tools of mathematics in hand, you can understand the world in a deeper, more meaningful way. How Not to Be Wrong will show you how.

How to Lie with Statistics


Darrell Huff - 1954
    Darrell Huff runs the gamut of every popularly used type of statistic, probes such things as the sample study, the tabulation method, the interview technique, or the way the results are derived from the figures, and points up the countless number of dodges which are used to fool rather than to inform.

Calling Bullshit: The Art of Skepticism in a Data-Driven World


Carl T. Bergstrom - 2020
    Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.It's increasingly difficult to know what's true. Misinformation, disinformation, and fake news abound. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don't feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data.You don't need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit.We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.

The Signal and the Noise: Why So Many Predictions Fail—But Some Don't


Nate Silver - 2012
    He solidified his standing as the nation's foremost political forecaster with his near perfect prediction of the 2012 election. Silver is the founder and editor in chief of FiveThirtyEight.com. Drawing on his own groundbreaking work, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. Most predictions fail, often at great cost to society, because most of us have a poor understanding of probability and uncertainty. Both experts and laypeople mistake more confident predictions for more accurate ones. But overconfidence is often the reason for failure. If our appreciation of uncertainty improves, our predictions can get better too. This is the "prediction paradox": The more humility we have about our ability to make predictions, the more successful we can be in planning for the future.In keeping with his own aim to seek truth from data, Silver visits the most successful forecasters in a range of areas, from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA. He explains and evaluates how these forecasters think and what bonds they share. What lies behind their success? Are they good-or just lucky? What patterns have they unraveled? And are their forecasts really right? He explores unanticipated commonalities and exposes unexpected juxtapositions. And sometimes, it is not so much how good a prediction is in an absolute sense that matters but how good it is relative to the competition. In other cases, prediction is still a very rudimentary-and dangerous-science.Silver observes that the most accurate forecasters tend to have a superior command of probability, and they tend to be both humble and hardworking. They distinguish the predictable from the unpredictable, and they notice a thousand little details that lead them closer to the truth. Because of their appreciation of probability, they can distinguish the signal from the noise.

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data


Hadley Wickham - 2016
    This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way. You’ll learn how to: Wrangle—transform your datasets into a form convenient for analysis Program—learn powerful R tools for solving data problems with greater clarity and ease Explore—examine your data, generate hypotheses, and quickly test them Model—provide a low-dimensional summary that captures true "signals" in your dataset Communicate—learn R Markdown for integrating prose, code, and results

Real-Time Big Data Analytics: Emerging Architecture


Mike Barlow - 2013
    The data world was revolutionized a few years ago when Hadoop and other tools made it possible to getthe results from queries in minutes. But the revolution continues. Analysts now demand sub-second, near real-time query results. Fortunately, we have the tools to deliver them. This report examines tools and technologies that are driving real-time big data analytics.

Data Feminism


Catherine D’Ignazio - 2020
    It has been used to expose injustice, improve health outcomes, and topple governments. But it has also been used to discriminate, police, and surveil. This potential for good, on the one hand, and harm, on the other, makes it essential to ask: Data science by whom? Data science for whom? Data science with whose interests in mind? The narratives around big data and data science are overwhelmingly white, male, and techno-heroic. In Data Feminism, Catherine D'Ignazio and Lauren Klein present a new way of thinking about data science and data ethics—one that is informed by intersectional feminist thought.Illustrating data feminism in action, D'Ignazio and Klein show how challenges to the male/female binary can help challenge other hierarchical (and empirically wrong) classification systems. They explain how, for example, an understanding of emotion can expand our ideas about effective data visualization, and how the concept of invisible labor can expose the significant human efforts required by our automated systems. And they show why the data never, ever “speak for themselves.”Data Feminism offers strategies for data scientists seeking to learn how feminism can help them work toward justice, and for feminists who want to focus their efforts on the growing field of data science. But Data Feminism is about much more than gender. It is about power, about who has it and who doesn't, and about how those differentials of power can be challenged and changed.

Who Owns the Future?


Jaron Lanier - 2013
    Who Owns the Future? is his visionary reckoning with the most urgent economic and social trend of our age: the poisonous concentration of money and power in our digital networks.Lanier has predicted how technology will transform our humanity for decades, and his insight has never been more urgently needed. He shows how Siren Servers, which exploit big data and the free sharing of information, led our economy into recession, imperiled personal privacy, and hollowed out the middle class. The networks that define our world—including social media, financial institutions, and intelligence agencies—now threaten to destroy it.But there is an alternative. In this provocative, poetic, and deeply humane book, Lanier charts a path toward a brighter future: an information economy that rewards ordinary people for what they do and share on the web.

In the Beginning...Was the Command Line


Neal Stephenson - 1999
    And considering that the "one man" is Neal Stephenson, "the hacker Hemingway" (Newsweek) -- acclaimed novelist, pragmatist, seer, nerd-friendly philosopher, and nationally bestselling author of groundbreaking literary works (Snow Crash, Cryptonomicon, etc., etc.) -- the word is well worth hearing. Mostly well-reasoned examination and partial rant, Stephenson's In the Beginning... was the Command Line is a thoughtful, irreverent, hilarious treatise on the cyber-culture past and present; on operating system tyrannies and downloaded popular revolutions; on the Internet, Disney World, Big Bangs, not to mention the meaning of life itself.