The Art of Data Science: A Guide for Anyone Who Works with Data


Roger D. Peng - 2015
    The authors have extensive experience both managing data analysts and conducting their own data analyses, and have carefully observed what produces coherent results and what fails to produce useful insights into data. This book is a distillation of their experience in a format that is applicable to both practitioners and managers in data science.

Number Freak: From 1 to 200- The Hidden Language of Numbers Revealed


Derrick Niederman - 2009
    Includes such gems as:
    * There are 42 eyes in a deck of cards, and 42 dots on a pair of dice
    * In order to fill in a map so that neighboring regions never get the same color, one never needs more than four colors
    * Hells Angels use the number 81 in their insignia because the initials H and A are the eighth and first letters of the alphabet, respectively

Python for Data Analysis


Wes McKinney - 2011
    It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.
    Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.
    * Use the IPython interactive shell as your primary development environment
    * Learn basic and advanced NumPy (Numerical Python) features
    * Get started with data analysis tools in the pandas library
    * Use high-performance tools to load, clean, transform, merge, and reshape data
    * Create scatter plots and static or interactive visualizations with matplotlib
    * Apply the pandas groupby facility to slice, dice, and summarize datasets
    * Measure data by points in time, whether it's specific instances, fixed periods, or intervals
    * Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples

Stat-Spotting: A Field Guide to Identifying Dubious Data


Joel Best - 2008
    But all too often, even the most respected publications present numbers that are miscalculated, misinterpreted, hyped, or simply misleading. Following on the heels of his highly acclaimed Damned Lies and Statistics and More Damned Lies and Statistics, Joel Best now offers this practical field guide to help everyone identify questionable statistics. Entertaining, informative, and concise, Stat-Spotting is essential reading for people who want to be more savvy and critical consumers of news and information.
    Stat-Spotting features:
    * Pertinent examples from today's news, including the number of deaths reported in Iraq, the threat of secondhand smoke, the increase in the number of overweight Americans, and many more
    * A commonsense approach that doesn't require advanced math or statistics

Natural Language Processing with Python


Steven Bird - 2009
    With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication.
    Packed with examples and exercises, Natural Language Processing with Python will help you:
    * Extract information from unstructured text, either to guess the topic or identify "named entities"
    * Analyze linguistic structure in text, including parsing and semantic analysis
    * Access popular linguistic databases, including WordNet and treebanks
    * Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence
    This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.

Artificial Intelligence: A Guide for Thinking Humans


Melanie Mitchell - 2019
    The award-winning author Melanie Mitchell, a leading computer scientist, now reveals AI’s turbulent history and the recent spate of apparent successes, grand hopes, and emerging fears surrounding it.
    In Artificial Intelligence, Mitchell turns to the most urgent questions concerning AI today: How intelligent, really, are the best AI programs? How do they work? What can they actually do, and when do they fail? How humanlike do we expect them to become, and how soon do we need to worry about them surpassing us? Along the way, she introduces the dominant models of modern AI and machine learning, describing cutting-edge AI programs, their human inventors, and the historical lines of thought underpinning recent achievements. She meets with fellow experts such as Douglas Hofstadter, the cognitive scientist and Pulitzer Prize–winning author of the modern classic Gödel, Escher, Bach, who explains why he is “terrified” about the future of AI. She explores the profound disconnect between the hype and the actual achievements in AI, providing a clear sense of what the field has accomplished and how much further it has to go.
    Interweaving stories about the science of AI and the people behind it, Artificial Intelligence brims with clear-sighted, captivating, and accessible accounts of the most interesting and provocative modern work in the field, flavored with Mitchell’s humor and personal observations. This frank, lively book is an indispensable guide to understanding today’s AI, its quest for “human-level” intelligence, and its impact on the future for us all.

Visualize This: The FlowingData Guide to Design, Visualization, and Statistics


Nathan Yau - 2011
    Wouldn't it be wonderful if we could actually visualize data in such a way that we could maximize its potential and tell a story in a clear, concise manner? Thanks to the creative genius of Nathan Yau, we can. With this full-color book, data visualization guru and author Nathan Yau uses step-by-step tutorials to show you how to visualize and tell stories with data. He explains how to gather, parse, and format data and then design high quality graphics that help you explore and present patterns, outliers, and relationships.
    * Presents a unique approach to visualizing and telling stories with data, from a data visualization expert and the creator of flowingdata.com, Nathan Yau
    * Offers step-by-step tutorials and practical design tips for creating statistical graphics, geographical maps, and information design to find meaning in the numbers
    * Details tools that can be used to visualize data-native graphics for the Web, such as ActionScript, Flash libraries, PHP, and JavaScript and tools to design graphics for print, such as R and Illustrator
    * Contains numerous examples and descriptions of patterns and outliers and explains how to show them
    Visualize This demonstrates how to explain data visually so that you can present your information in a way that is easy to understand and appealing.

Social Statistics for a Diverse Society


Chava Frankfort-Nachmias - 1996
    The authors help students learn key sociological concepts through real research examples related to the dynamic interplay of race, class, gender, and other social variables.

Networks, Crowds, and Markets


David Easley - 2010
    This connectedness is found in many incarnations: in the rapid growth of the Internet, in the ease with which global communication takes place, and in the ability of news and information as well as epidemics and financial crises to spread with surprising speed and intensity. These are phenomena that involve networks, incentives, and the aggregate behavior of groups of people; they are based on the links that connect us and the ways in which our decisions can have subtle consequences for others. This introductory undergraduate textbook takes an interdisciplinary look at economics, sociology, computing and information science, and applied mathematics to understand networks and behavior. It describes the emerging field of study that is growing at the interface of these areas, addressing fundamental questions about how the social, economic, and technological worlds are connected.

Struck by Lightning: The Curious World of Probabilities


Jeffrey S. Rosenthal - 2005
    Human beings have long been both fascinated and appalled by randomness. On the one hand, we love the thrill of a surprise party, the unpredictability of a budding romance, or the freedom of not knowing what tomorrow will bring. We are inexplicably delighted by strange coincidences and striking similarities. But we also hate uncertainty's dark side. From cancer to SARS, diseases strike with no apparent pattern. Terrorists attack, airplanes crash, bridges collapse, and we never know if we'll be that one in a million statistic. We are all constantly faced with situations and choices that involve randomness and uncertainty. A basic understanding of the rules of probability theory, applied to real-life circumstances, can help us to make sense of these situations, to avoid unnecessary fear, to seize the opportunities that randomness presents to us, and to actually enjoy the uncertainties we face. The reality is that when it comes to randomness, you can run, but you can't hide. So many aspects of our lives are governed by events that are simply not in our control. In this entertaining yet sophisticated look at the world of probabilities, author Jeffrey Rosenthal--an improbably talented math professor--explains the mechanics of randomness and teaches us how to develop an informed perspective on probability.

Statistics in a Nutshell: A Desktop Quick Reference


Sarah Boslaugh - 2008
    This book gives you a solid understanding of statistics without being too simple, yet without the numbing complexity of most college texts. You get a firm grasp of the fundamentals and a hands-on understanding of how to apply them before moving on to the more advanced material that follows. Each chapter presents you with easy-to-follow descriptions illustrated by graphics, formulas, and plenty of solved examples. Before you know it, you'll learn to apply statistical reasoning and statistical techniques, from basic concepts of probability and hypothesis testing to multivariate analysis. Organized into four distinct sections, Statistics in a Nutshell offers you:
    Introductory material:
    * Different ways to think about statistics
    * Basic concepts of measurement and probability theory
    * Data management for statistical analysis
    * Research design and experimental design
    * How to critique statistics presented by others
    Basic inferential statistics:
    * Basic concepts of inferential statistics
    * The concept of correlation, when it is and is not an appropriate measure of association
    * Dichotomous and categorical data
    * The distinction between parametric and nonparametric statistics
    Advanced inferential techniques:
    * The General Linear Model
    * Analysis of Variance (ANOVA) and MANOVA
    * Multiple linear regression
    Specialized techniques:
    * Business and quality improvement statistics
    * Medical and public health statistics
    * Educational and psychological statistics
    Unlike many introductory books on the subject, Statistics in a Nutshell doesn't omit important material in an effort to dumb it down. And this book is far more practical than most college texts, which tend to over-emphasize calculation without teaching you when and how to apply different statistical tests. With Statistics in a Nutshell, you learn how to perform most common statistical analyses, and understand statistical techniques presented in research articles. If you need to know how to use a wide range of statistical techniques without getting in over your head, this is the book you want.

Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
    Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.
    * Use graphics to describe data with one, two, or dozens of variables
    * Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
    * Mine data with computationally intensive methods such as simulation and clustering
    * Make your conclusions understandable through reports, dashboards, and other metrics programs
    * Understand financial calculations, including the time-value of money
    * Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
    * Become familiar with different open source programming environments for data analysis
    "Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla
    "An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora

Learning From Data: A Short Course


Yaser S. Abu-Mostafa - 2012
    Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title 'learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover.
    Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems.
    Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own.
    The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.

Econometric Analysis of Cross Section and Panel Data


Jeffrey M. Wooldridge - 2001
    The book makes clear that applied microeconometrics is about the estimation of marginal and treatment effects, and that parametric estimation is simply a means to this end. It also clarifies the distinction between causality and statistical association. The book focuses specifically on cross section and panel data methods. Population assumptions are stated separately from sampling assumptions, leading to simple statements as well as to important insights. The unified approach to linear and nonlinear models and to cross section and panel data enables straightforward coverage of more advanced methods. The numerous end-of-chapter problems are an important component of the book. Some problems contain important points not fully described in the text, and others cover new ideas that can be analyzed using tools presented in the current and previous chapters. Several problems require the use of the data sets located at the author's website.

Multiple View Geometry in Computer Vision


Richard Hartley - 2000
    This book covers relevant geometric principles and how to represent objects algebraically so they can be computed and applied. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. Richard Hartley and Andrew Zisserman provide comprehensive background material and explain how to apply the methods and implement the algorithms. First Edition HB (2000): 0-521-62304-9