The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day


David J. Hand - 2014
    Hand argues that extraordinarily rare events are anything but. In fact, they’re commonplace. Not only that, we should all expect to experience a miracle roughly once every month. But Hand is no believer in superstitions, prophecies, or the paranormal. His definition of “miracle” is thoroughly rational. No mystical or supernatural explanation is necessary to understand why someone is lucky enough to win the lottery twice, or is destined to be hit by lightning three times and still survive. All we need, Hand argues, is a firm grounding in a powerful set of laws: the laws of inevitability, of truly large numbers, of selection, of the probability lever, and of near enough. Together, these constitute Hand’s groundbreaking Improbability Principle. And together, they explain why we should not be so surprised to bump into a friend in a foreign country, or to come across the same unfamiliar word four times in one day. Hand wrestles with seemingly less explicable questions as well: what the Bible and Shakespeare have in common, why financial crashes are par for the course, and why lightning does strike the same place (and the same person) twice. Along the way, he teaches us how to use the Improbability Principle in our own lives—including how to cash in at a casino and how to recognize when a medicine is truly effective. An irresistible adventure into the laws behind “chance” moments and a trusty guide for understanding the world and universe we live in, The Improbability Principle will transform how you think about serendipity and luck, whether it’s in the world of business and finance or you’re merely sitting in your backyard, tossing a ball into the air and wondering where it will land.

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark


Holden Karau - 2017
    But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources. Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing. With this book, you'll explore:
    - How Spark SQL's new interfaces improve performance over Spark's RDD data structure
    - The choice between data joins in Core Spark and Spark SQL
    - Techniques for getting the most out of standard RDD transformations
    - How to work around performance issues in Spark's key/value pair paradigm
    - Writing high-performance Spark code without Scala or the JVM
    - How to test for functionality and performance when applying suggested improvements
    - Using Spark MLlib and Spark ML machine learning libraries
    - Spark's Streaming components and external community packages

Competing on Analytics: The New Science of Winning


Thomas H. Davenport - 2007
    But are you using it to “out-think” your rivals? If not, you may be missing out on a potent competitive tool. In Competing on Analytics: The New Science of Winning, Thomas H. Davenport and Jeanne G. Harris argue that the frontier for using data to make decisions has shifted dramatically. Certain high-performing enterprises are now building their competitive strategies around data-driven insights that in turn generate impressive business results. Their secret weapon? Analytics: sophisticated quantitative and statistical analysis and predictive modeling. Exemplars of analytics are using new tools to identify their most profitable customers and offer them the right price, to accelerate product innovation, to optimize supply chains, and to identify the true drivers of financial performance. A wealth of examples—from organizations as diverse as Amazon, Barclays, Capital One, Harrah’s, Procter & Gamble, Wachovia, and the Boston Red Sox—illuminate how to leverage the power of analytics.

Dark Pools: The Rise of Artificially Intelligent Trading Machines and the Looming Threat to Wall Street


Scott Patterson - 2012
    In the beginning was Josh Levine, an idealistic programming genius who dreamed of wresting control of the market from the big exchanges that, again and again, gave the giant institutions an advantage over the little guy. Levine created a computerized trading hub named Island where small traders swapped stocks, and over time his invention morphed into a global electronic stock market that sent trillions in capital through a vast jungle of fiber-optic cables. By then, the market that Levine had sought to fix had turned upside down, birthing secretive exchanges called dark pools and a new species of trading machines that could think, and that seemed, ominously, to be slipping the control of their human masters. Dark Pools is the fascinating story of how global markets have been hijacked by trading robots--many so self-directed that humans can't predict what they'll do next.

Creating a Data-Driven Organization: Practical Advice from the Trenches


Carl Anderson - 2015
    This practical book shows you how true data-drivenness involves processes that require genuine buy-in across your company, from analysts and management to the C-Suite and the board. Through interviews and examples from data scientists and analytics leaders in a variety of industries, author Carl Anderson explains the analytics value chain you need to adopt when building predictive business models—from data collection and analysis to the insights and leadership that drive concrete actions. You’ll learn what works and what doesn’t, and why creating a data-driven culture throughout your organization is essential.
    - Start from the bottom up: learn how to collect the right data the right way
    - Hire analysts with the right skills, and organize them into teams
    - Examine statistical and visualization tools, and fact-based story-telling methods
    - Collect and analyze data while respecting privacy and ethics
    - Understand how analysts and their managers can help spur a data-driven culture
    - Learn the importance of data leadership and C-level positions such as chief data officer and chief analytics officer

Breakpoint


Jon McGee - 2015
    Fortunately, Jon McGee is an ideal guide through this dynamic marketplace. In Breakpoint, he argues that higher education is in the midst of an extraordinary moment of demographic, economic, and cultural transition that has significant implications for how colleges understand their mission, their market, and their management. Drawing from an extensive assessment of demographic and economic trends, McGee presents a broad and integrative picture of these changes while stressing the importance of decisive campus leadership. He describes the key forces that influence higher education and provides a framework from which trustees, presidents, administrators, faculty, and policy makers can address pressing issues in the aftermath of the Great Recession. Although McGee avoids endorsing one-size-fits-all solutions, he suggests a number of concrete strategies for handling prospective students and developing pedagogical practices, curricular content and delivery, and management structures. Practical and compelling, Breakpoint will help higher education leaders make choices that advance their institutional values and serve their students and the common good for generations to come.

Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python


Scott Hartshorn - 2016
    They are typically used to categorize something based on other data that you have. The purpose of this book is to help you understand how Random Forests work, as well as the different options that you have when using them to analyze a problem. Additionally, since Decision Trees are a fundamental part of Random Forests, this book explains how they work. This book is focused on understanding Random Forests at the conceptual level: knowing how they work, why they work the way that they do, and what options are available to improve results. This book covers how Random Forests work in an intuitive way, and also explains the equations behind many of the functions, but it only has a small amount of actual code (in Python). This book is focused on giving examples and providing analogies for the most fundamental aspects of how Random Forests and Decision Trees work. The reason is that those are easy to understand and they stick with you. There are also some really interesting aspects of Random Forests, such as information gain, feature importances, or out-of-bag error, that simply cannot be well covered without diving into the equations of how they work. For those, the focus is on providing the information in a straightforward and easy-to-understand way.

Head First Statistics


Dawn Griffiths - 2008
    Whether you're a student, a professional, or just curious about statistical analysis, Head First's brain-friendly formula helps you get a firm grasp of statistics so you can understand key points and actually use them. Learn to present data visually with charts and plots; discover the difference between taking the average with mean, median, and mode, and why it's important; learn how to calculate probability and expectation; and much more. Head First Statistics is ideal for high school and college students taking statistics and satisfies the requirements for passing the College Board's Advanced Placement (AP) Statistics Exam. With this book, you'll:
    - Study the full range of topics covered in first-year statistics
    - Tackle tough statistical concepts using Head First's dynamic, visually rich format proven to stimulate learning and help you retain knowledge
    - Explore real-world scenarios, ranging from casino gambling to prescription drug testing, to bring statistical principles to life
    - Discover how to measure spread, calculate odds through probability, and understand the normal, binomial, geometric, and Poisson distributions
    - Conduct sampling, use correlation and regression, do hypothesis testing, perform chi square analysis, and more
    Before you know it, you'll not only have mastered statistics, you'll also see how they work in the real world. Head First Statistics will help you pass your statistics course, and give you a firm understanding of the subject so you can apply the knowledge throughout your life.

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die


Eric Siegel - 2013
    Rather than a "how to" for hands-on techies, the book entices lay-readers and experts alike by covering new case studies and the latest state-of-the-art techniques. You have been predicted — by companies, governments, law enforcement, hospitals, and universities. Their computers say, "I knew you were going to do that!" These institutions are seizing upon the power to predict whether you're going to click, buy, lie, or die. Why? For good reason: predicting human behavior combats financial risk, fortifies healthcare, conquers spam, toughens crime fighting, and boosts sales. How? Prediction is powered by the world's most potent, booming unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn. Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future — lifting a bit of the fog off our hazy view of tomorrow — means pay dirt. In this rich, entertaining primer, former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction:
    - What type of mortgage risk Chase Bank predicted before the recession.
    - Predicting which people will drop out of school, cancel a subscription, or get divorced before they are even aware of it themselves.
    - Why early retirement decreases life expectancy and vegetarians miss fewer flights.
    - Five reasons why organizations predict death, including one health insurance company.
    - How U.S. Bank, European wireless carrier Telenor, and Obama's 2012 campaign calculated the way to most strongly influence each individual.
    - How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy!
    - How companies ascertain untold, private truths — how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job.
    - How judges and parole boards rely on crime-predicting computers to decide who stays in prison and who goes free.
    - What's predicted by the BBC, Citibank, ConEd, Facebook, Ford, Google, IBM, the IRS, Match.com, MTV, Netflix, Pandora, PayPal, Pfizer, and Wikipedia.
    A truly omnipresent science, predictive analytics affects everyone, every day. Although largely unseen, it drives millions of decisions, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate. Predictive analytics transcends human perception. This book's final chapter answers the riddle: What often happens to you that cannot be witnessed, and that you can't even be sure has happened afterward — but that can be predicted in advance? Whether you are a consumer of it — or consumed by it — get a handle on the power of Predictive Analytics.

Cassandra: The Definitive Guide


Eben Hewitt - 2010
    Cassandra: The Definitive Guide provides the technical details and practical examples you need to assess this database management system and put it to work in a production environment. Author Eben Hewitt demonstrates the advantages of Cassandra's nonrelational design, and pays special attention to data modeling. If you're a developer, DBA, application architect, or manager looking to solve a database scaling issue or future-proof your application, this guide shows you how to harness Cassandra's speed and flexibility.
    - Understand the tenets of Cassandra's column-oriented structure
    - Learn how to write, update, and read Cassandra data
    - Discover how to add or remove nodes from the cluster as your application requires
    - Examine a working application that translates from a relational model to Cassandra's data model
    - Use examples for writing clients in Java, Python, and C#
    - Use the JMX interface to monitor a cluster's usage, memory patterns, and more
    - Tune memory settings, data storage, and caching for better performance

Prediction Machines: The Simple Economics of Artificial Intelligence


Ajay Agrawal - 2018
    But facing the sea change that AI will bring can be paralyzing. How should companies set strategies, governments design policies, and people plan their lives for a world so different from what we know? In the face of such uncertainty, many analysts either cower in fear or predict an impossibly sunny future. But in Prediction Machines, three eminent economists recast the rise of AI as a drop in the cost of prediction. With this single, masterful stroke, they lift the curtain on the AI-is-magic hype and show how basic tools from economics provide clarity about the AI revolution and a basis for action by CEOs, managers, policy makers, investors, and entrepreneurs. When AI is framed as cheap prediction, its extraordinary potential becomes clear:
    - Prediction is at the heart of making decisions under uncertainty. Our businesses and personal lives are riddled with such decisions.
    - Prediction tools increase productivity--operating machines, handling documents, communicating with customers.
    - Uncertainty constrains strategy. Better prediction creates opportunities for new business structures and strategies to compete.
    Penetrating, fun, and always insightful and practical, Prediction Machines follows its inescapable logic to explain how to navigate the changes on the horizon. The impact of AI will be profound, but the economic framework for understanding it is surprisingly simple.

Metadata for Digital Collections: A How-To-Do-It Manual


Steven J. Miller - 2011
    

The Seven Pillars of Statistical Wisdom


Stephen M. Stigler - 2016
    It allows one to gain information by discarding information, namely, the individuality of the observations. Stigler's second pillar, information measurement, challenges the importance of big data by noting that observations are not all equally important: the amount of information in a data set is often proportional to only the square root of the number of observations, not the absolute number. The third idea is likelihood, the calibration of inferences with the use of probability. Intercomparison is the principle that statistical comparisons do not need to be made with respect to an external standard. The fifth pillar is regression, both a paradox (tall parents on average produce shorter children; tall children on average have shorter parents) and the basis of inference, including Bayesian inference and causal reasoning. The sixth concept captures the importance of experimental design, for example by recognizing the gains to be had from a combinatorial approach with rigorous randomization. The seventh idea is the residual: the notion that a complicated phenomenon can be simplified by subtracting the effect of known causes, leaving a residual phenomenon that can be explained more easily. The Seven Pillars of Statistical Wisdom presents an original, unified account of statistical science that will fascinate the interested layperson and engage the professional statistician.

Uncharted: Big Data and an Emerging Science of Human History


Erez Aiden - 2013
    Gigabytes, exabytes (that’s one quintillion bytes) of data are sitting on servers across the world. So how can we start to access this explosion of information, this “big data,” and what can it tell us? Erez Aiden and Jean-Baptiste Michel are two young scientists at Harvard who started to ask those questions. They teamed up with Google to create the Ngram Viewer, a Web-based tool that can chart words throughout the massive Google Books archive, sifting through billions of words to find fascinating cultural trends. On the day that the Ngram Viewer debuted in 2010, more than one million queries were run through it. On the front lines of Big Data, Aiden and Michel realized that this big dataset—the Google Books archive that contains remarkable information on the human experience—had huge implications for looking at our shared human history. The tool they developed to delve into the data has enabled researchers to track how our language has evolved over time, how art has been censored, how fame can grow and fade, how nations trend toward war. How we remember and how we forget. And ultimately, how Big Data is changing the game for the sciences, humanities, politics, business, and our culture.

HBR Guide to Data Analytics Basics for Managers (HBR Guide Series)


Harvard Business Review - 2018
    Now more than ever, managers must know how to tease insight from data--to understand where the numbers come from, make sense of them, and use them to inform tough decisions. How do you get started? Whether you're working with data experts or running your own tests, you'll find answers in the HBR Guide to Data Analytics Basics for Managers. This book describes three key steps in the data analysis process, so you can get the information you need, study the data, and communicate your findings to others. You'll learn how to:
    - Identify the metrics you need to measure
    - Run experiments and A/B tests
    - Ask the right questions of your data experts
    - Understand statistical terms and concepts
    - Create effective charts and visualizations
    - Avoid common mistakes