Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.
    * Use graphics to describe data with one, two, or dozens of variables
    * Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
    * Mine data with computationally intensive methods such as simulation and clustering
    * Make your conclusions understandable through reports, dashboards, and other metrics programs
    * Understand financial calculations, including the time-value of money
    * Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
    * Become familiar with different open source programming environments for data analysis
    Finally, a concise reference for understanding how to conquer piles of data. --Austin King, Senior Web Developer, Mozilla
    An indispensable text for aspiring data scientists. --Michael E. Driscoll, CEO/Founder, Dataspora

Python Data Science Handbook: Tools and Techniques for Developers


Jake Vanderplas - 2016
    Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you’ll learn how to use:
    * IPython and Jupyter: provide computational environments for data scientists using Python
    * NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
    * Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
    * Matplotlib: includes capabilities for a flexible range of data visualizations in Python
    * Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms

Convex Optimization


Stephen Boyd - 2004
    A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)


Jiawei Han - 2000
    Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge. Like the first edition, voted the most popular data mining book by KDnuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments on mining complex types of data, including stream data, sequence data, graph structured data, social network data, and multi-relational data.
    * A comprehensive, practical look at the concepts and techniques you need to know to get the most out of real business data
    * Updates that incorporate input from readers, changes in the field, and more material on statistics and machine learning
    * Dozens of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projects
    * Complete classroom support for instructors at www.mkp.com/datamining2e companion site

The Seven Pillars of Statistical Wisdom


Stephen M. Stigler - 2016
    It allows one to gain information by discarding information, namely, the individuality of the observations. Stigler's second pillar, information measurement, challenges the importance of big data by noting that observations are not all equally important: the amount of information in a data set is often proportional to only the square root of the number of observations, not the absolute number. The third idea is likelihood, the calibration of inferences with the use of probability. Intercomparison is the principle that statistical comparisons do not need to be made with respect to an external standard. The fifth pillar is regression, both a paradox (tall parents on average produce shorter children; tall children on average have shorter parents) and the basis of inference, including Bayesian inference and causal reasoning. The sixth concept captures the importance of experimental design, for example, by recognizing the gains to be had from a combinatorial approach with rigorous randomization. The seventh idea is the residual: the notion that a complicated phenomenon can be simplified by subtracting the effect of known causes, leaving a residual phenomenon that can be explained more easily. The Seven Pillars of Statistical Wisdom presents an original, unified account of statistical science that will fascinate the interested layperson and engage the professional statistician.

Are You Smart Enough to Work at Google?


William Poundstone - 2012
    The blades start moving in 60 seconds. What do you do? If you want to work at Google, or any of America's best companies, you need to have an answer to this and other puzzling questions. Are You Smart Enough to Work at Google? guides readers through the surprising solutions to dozens of the most challenging interview questions. The book covers the importance of creative thinking, ways to get a leg up on the competition, what your Facebook page says about you, and much more. Are You Smart Enough to Work at Google? is a must-read for anyone who wants to succeed in today's job market.

Survey Methodology


Robert M. Groves - 2004
    Survey Methodology describes the basic principles of survey design discovered in methodological research over recent years and offers guidance for making successful decisions in the design and execution of high quality surveys. Written by six nationally recognized experts in the field, this book covers the major considerations in designing and conducting a sample survey. Topical, accessible, and succinct, this book represents the state of the science in survey methodology. Employing the "total survey error" paradigm as an organizing framework, it merges the science of surveys with state-of-the-art practices. End-of-chapter terms, references, and exercises enhance its value as a reference for practitioners and as a text for advanced students.

The Numbers Game: The Commonsense Guide to Understanding Numbers in the News, in Politics, and in Life


Michael Blastland - 2008
    Drawing on their hugely popular BBC Radio 4 show More or Less, journalist Michael Blastland and internationally known economist Andrew Dilnot delight, amuse, and convert American mathphobes by showing how our everyday experiences make sense of numbers. The radical premise of The Numbers Game is to show how much we already know, and give practical ways to use our knowledge to become cannier consumers of the media. In each concise chapter, the authors take on a different theme (such as size, chance, averages, targets, risk, measurement, and data) and present it as a memorable and entertaining story. If you’ve ever wondered what “average” really means, whether the scare stories about cancer risk should convince you to change your behavior, or whether a story you read in the paper is biased (and how), you need this book. Blastland and Dilnot show how to survive and thrive on the torrent of numbers that pours through everyday life. It’s the essential guide to every cause you love or hate, and every issue you follow, in the language everyone uses.

The Elements of Data Analytic Style


Jeffrey Leek - 2015
    This book is focused on the details of data analysis that sometimes fall through the cracks in traditional statistics classes and textbooks. It is based in part on the author's blog posts, lecture materials, and tutorials. The author is one of the co-developers of the Johns Hopkins Specialization in Data Science, the largest data science program in the world, which has enrolled more than 1.76 million people. The book is useful as a companion to introductory courses in data science or data analysis. It is also a useful reference tool for people tasked with reading and critiquing data analyses. It is based on the author's popular open-source guides available through his GitHub account (https://github.com/jtleek). The book is also available through Leanpub (https://leanpub.com/datastyle); if the book is purchased on that platform, you are entitled to lifetime free updates.

Outnumbered: Exploring the Algorithms That Control Our Lives


David Sumpter - 2018
    Using the data they are constantly collecting about where we travel, where we shop, what we buy, and what interests us, they can begin to predict our daily habits, and increasingly we are relinquishing our decision-making to algorithms. Are we giving this up too easily? Without understanding what mathematics can and can't do, it is impossible to get a handle on how it is changing our lives. Outnumbered is a journey to the dark side of mathematics, from how it dictates our social media activities to our travel routes. David Sumpter investigates whether mathematics is crossing dangerous lines when it comes to what we can make decisions about. This book will show how math impacts all parts of our lives: from the algorithms that decide whom we interact with to the statistical methods that categorize us as potential criminals. It tests financial algorithms that purport to generate money from nothing, and reveals that we are constantly manipulated by the math used by others, from algorithms choosing the news we hear to automated hospital waiting lists deciding whether we receive treatment. Using interviews with those people working at the cutting edge of mathematical and data research, Outnumbered will explain how math and stats work in the real world, and what we should and shouldn't worry about.

How to Measure Anything: Finding the Value of "Intangibles" in Business


Douglas W. Hubbard - 1985
    "Douglas Hubbard helps us create a path to know the answer to almost any question in business, in science, or in life . . . Hubbard helps us by showing us that when we seek metrics to solve problems, we are really trying to know something better than we know it now. How to Measure Anything provides just the tools most of us need to measure anything better, to gain that insight, to make progress, and to succeed." -Peter Tippett, PhD, M.D., Chief Technology Officer at CyberTrust and inventor of the first antivirus software
    "Doug Hubbard has provided an easy-to-read, demystifying explanation of how managers can inform themselves to make less risky, more profitable business decisions. We encourage our clients to try his powerful, practical techniques." -Peter Schay, EVP and COO of The Advisory Council
    "As a reader you soon realize that actually everything can be measured while learning how to measure only what matters. This book cuts through conventional cliches and business rhetoric and offers practical steps to using measurements as a tool for better decision making. Hubbard bridges the gaps to make college statistics relevant and valuable for business decisions." -Ray Gilbert, EVP, Lucent
    "This book is remarkable in its range of measurement applications and its clarity of style. A must-read for every professional who has ever exclaimed, 'Sure, that concept is important, but can we measure it?'" -Dr. Jack Stenner, Cofounder and CEO of MetraMetrics, Inc.

Elements of Information Theory


Thomas M. Cover - 1991
    Readers are provided once again with an instructive mix of mathematics, physics, statistics, and information theory. All the essential topics in information theory are covered in detail, including entropy, data compression, channel capacity, rate distortion, network information theory, and hypothesis testing. The authors provide readers with a solid understanding of the underlying theory and applications. Problem sets and a telegraphic summary at the end of each chapter further assist readers. The historical notes that follow each chapter recap the main points. The Second Edition features:
    * Chapters reorganized to improve teaching
    * 200 new problems
    * New material on source coding, portfolio theory, and feedback capacity
    * Updated references
    Now current and enhanced, the Second Edition of Elements of Information Theory remains the ideal textbook for upper-level undergraduate and graduate courses in electrical engineering, statistics, and telecommunications.

Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists


Joel Best - 1998
    But all too often, these numbers are wrong. This book is a lively guide to spotting bad statistics and learning to think critically about these influential numbers. Damned Lies and Statistics is essential reading for everyone who reads or listens to the news, for students, and for anyone who relies on statistical information to understand social problems. Joel Best bases his discussion on a wide assortment of intriguing contemporary issues that have garnered much recent media attention, including abortion, cyberporn, homelessness, the Million Man March, teen suicide, the U.S. census, and much more. Using examples from the New York Times, the Washington Post, and other major newspapers and television programs, he unravels many fascinating examples of the use, misuse, and abuse of statistical information. In this book Best shows us exactly how and why bad statistics emerge, spread, and come to shape policy debates. He recommends specific ways to detect bad statistics, and shows how to think more critically about "stat wars," or disputes over social statistics among various experts. Understanding this book does not require sophisticated mathematical knowledge; Best discusses the most basic and most easily understood forms of statistics, such as percentages, averages, and rates. This accessible book provides an alternative to either naively accepting the statistics we hear or cynically assuming that all numbers are meaningless. It shows how anyone can become a more intelligent, critical, and empowered consumer of the statistics that inundate both the social sciences and our media-saturated lives.

A Mind for Numbers: How to Excel at Math and Science (Even If You Flunked Algebra)


Barbara Oakley - 2014
    Engineering professor Barbara Oakley knows firsthand how it feels to struggle with math. She flunked her way through high school math and science courses, before enlisting in the army immediately after graduation. When she saw how her lack of mathematical and technical savvy severely limited her options—both to rise in the military and to explore other careers—she returned to school with a newfound determination to re-tool her brain to master the very subjects that had given her so much trouble throughout her entire life. In A Mind for Numbers, Dr. Oakley lets us in on the secrets to effectively learning math and science—secrets that even dedicated and successful students wish they’d known earlier. Contrary to popular belief, math requires creative, as well as analytical, thinking. Most people think that there’s only one way to do a problem, when in actuality, there are often a number of different solutions—you just need the creativity to see them. For example, there are more than three hundred different known proofs of the Pythagorean Theorem. In short, studying a problem in a laser-focused way until you reach a solution is not an effective way to learn math. Rather, it involves taking the time to step away from a problem and allow the more relaxed and creative part of the brain to take over. A Mind for Numbers shows us that we all have what it takes to excel in math, and learning it is not as painful as some might think!

The Efficiency Paradox: What Big Data Can't Do


Edward Tenner - 2018
    One of the great promises of the Internet and big data revolutions is the idea that we can improve the processes and routines of our work and personal lives to get more done in less time than ever before. There is no doubt that we're performing at higher scales and going faster than ever, but what if we're headed in the wrong direction? The Efficiency Paradox questions our ingrained assumptions about efficiency, persuasively showing how relying on the algorithms of platforms can in fact lead to wasted efforts, missed opportunities, and above all an inability to break out of established patterns. Edward Tenner offers a smarter way to think about efficiency, showing how we can combine artificial intelligence and our own intuition, leaving ourselves and our institutions open to learning from the random and unexpected.