Book picks similar to
A Programmer’s Guide to Data Mining: The Ancient Art of the Numerati by Ron Zacharski
data-science
artificial-intelligence
data-mining
not-completed
Introducing Regular Expressions
Michael J. Fitzgerald - 2012
You’ll learn the fundamentals step by step with the help of numerous examples, discovering first-hand how to match, extract, and transform text by matching specific words, characters, and patterns. Regular expressions are an essential part of a programmer’s toolkit, available in various Unix utilities as well as programming languages such as Perl, Java, JavaScript, and C#. When you’ve finished this book, you’ll be familiar with the most commonly used syntax in regular expressions, and you’ll understand how using them will save you considerable time.
Discover what regular expressions are and how they work
Learn many of the differences between regular expressions used with command-line tools and in various programming languages
Apply simple methods for finding patterns in text, including digits, letters, Unicode characters, and string literals
Learn how to use zero-width assertions and lookarounds
Work with groups, backreferences, character classes, and quantifiers
Use regular expressions to mark up plain text with HTML5
You Look Like a Thing and I Love You: How Artificial Intelligence Works and Why It's Making the World a Weirder Place
Janelle Shane - 2019
according to an artificial intelligence trained by scientist Janelle Shane, creator of the popular blog "AI Weirdness." She creates silly AIs that learn how to name paint colors, create the best recipes, and even flirt (badly) with humans, all to understand the technology that governs so much of our daily lives.
We rely on AI every day for recommendations, for translations, and to put cat ears on our selfie videos. We also trust AI with matters of life and death, on the road and in our hospitals. But how smart is AI really, and how does it solve problems, understand humans, and even drive self-driving cars?
Shane delivers the answers to every AI question you've ever asked, and some you definitely haven't, such as: How can a computer design the perfect sandwich? What does robot-generated Harry Potter fan fiction look like? And is the world's best Halloween costume really "Vampire Hog Bride"?
In this smart, often hilarious introduction to the most interesting science of our time, Shane shows how these programs learn, fail, and adapt, and how they reflect the best and worst of humanity. You Look Like a Thing and I Love You is the perfect book for anyone curious about what the robots in our lives are thinking.
Introduction to Probability
Joseph K. Blitzstein - 2014
The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC). Additional application areas explored include genetics, medicine, computer science, and information theory. The print book version includes a code that provides free access to an eBook version. The authors present the material in an accessible style and motivate concepts using real-world examples. Throughout, they use stories to uncover connections between the fundamental distributions in statistics and conditioning to reduce complicated problems to manageable pieces. The book includes many intuitive explanations, diagrams, and practice problems. Each chapter ends with a section showing how to perform relevant simulations and calculations in R, a free statistical software environment.
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
Nathan Yau - 2011
Wouldn't it be wonderful if we could actually visualize data in such a way that we could maximize its potential and tell a story in a clear, concise manner? Thanks to the creative genius of Nathan Yau, we can. With this full-color book, data visualization guru and author Nathan Yau uses step-by-step tutorials to show you how to visualize and tell stories with data. He explains how to gather, parse, and format data and then design high-quality graphics that help you explore and present patterns, outliers, and relationships.
Presents a unique approach to visualizing and telling stories with data, from a data visualization expert and the creator of flowingdata.com, Nathan Yau
Offers step-by-step tutorials and practical design tips for creating statistical graphics, geographical maps, and information design to find meaning in the numbers
Details tools that can be used to visualize data: native graphics for the Web, such as ActionScript, Flash libraries, PHP, and JavaScript, and tools to design graphics for print, such as R and Illustrator
Contains numerous examples and descriptions of patterns and outliers and explains how to show them
Visualize This demonstrates how to explain data visually so that you can present your information in a way that is easy to understand and appealing.
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
Rob Collie - 2016
Written by the world’s foremost PowerPivot blogger and practitioner, this book introduces its concepts and approach in a simple, step-by-step manner tailored to the learning style of Excel users everywhere. The techniques presented allow users to produce, in hours or even minutes, results that formerly would have taken entire teams weeks or months to produce. It includes lessons on the difference between calculated columns and measures; how formulas can be reused across reports of completely different shapes; how to merge disjointed sets of data into unified reports; how to make certain columns in a pivot behave as if the pivot were filtered while other columns do not; and how to create time-intelligent calculations in pivot tables such as “Year over Year” and “Moving Averages,” whether they use a standard, fiscal, or completely custom calendar. The “pattern-like” techniques and best practices contained in this book have been developed and refined over two years of onsite training with Excel users around the world, and the key lessons from those seminars, which cost thousands of dollars per day, are now available within the pages of this easy-to-follow guide. This updated second edition covers new features introduced with Office 2016.
Probabilistic Graphical Models: Principles and Techniques
Daphne Koller - 2009
The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality.
Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.
Make Your Own Neural Network: An In-depth Visual Introduction For Beginners
Michael Taylor - 2017
A step-by-step visual journey through the mathematics of neural networks, and making your own using Python and TensorFlow.
How Not to Be Wrong: The Power of Mathematical Thinking
Jordan Ellenberg - 2014
In How Not to Be Wrong, Jordan Ellenberg shows us how terribly limiting this view is: Math isn’t confined to abstract incidents that never occur in real life, but rather touches everything we do; the whole world is shot through with it.
Math allows us to see the hidden structures underneath the messy and chaotic surface of our world. It’s a science of not being wrong, hammered out by centuries of hard work and argument. Armed with the tools of mathematics, we can see through to the true meaning of information we take for granted: How early should you get to the airport? What does “public opinion” really represent? Why do tall parents have shorter children? Who really won Florida in 2000? And how likely are you, really, to develop cancer?
How Not to Be Wrong presents the surprising revelations behind all of these questions and many more, using the mathematician’s method of analyzing life and exposing the hard-won insights of the academic community to the layman, minus the jargon. Ellenberg chases mathematical threads through a vast range of time and space, from the everyday to the cosmic, encountering, among other things, baseball, Reaganomics, daring lottery schemes, Voltaire, the replicability crisis in psychology, Italian Renaissance painting, artificial languages, the development of non-Euclidean geometry, the coming obesity apocalypse, Antonin Scalia’s views on crime and punishment, the psychology of slime molds, what Facebook can and can’t figure out about you, and the existence of God.
Ellenberg pulls from history as well as from the latest theoretical developments to provide those not trained in math with the knowledge they need. Math, as Ellenberg says, is “an atomic-powered prosthesis that you attach to your common sense, vastly multiplying its reach and strength.” With the tools of mathematics in hand, you can understand the world in a deeper, more meaningful way. How Not to Be Wrong will show you how.
Thinking with Data
Max Shron - 2014
In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills.
Thinking with Data helps you learn techniques for turning data into knowledge you can use. You’ll learn a framework for defining your project, including the data you want to collect, and how you intend to approach, organize, and analyze the results. You’ll also learn patterns of reasoning that will help you unveil the real problem that needs to be solved.
Learn a framework for scoping data projects
Understand how to pin down the details of an idea, receive feedback, and begin prototyping
Use the tools of arguments to ask good questions, build projects in stages, and communicate results
Explore data-specific patterns of reasoning and learn how to build more useful arguments
Delve into causal reasoning and learn how it permeates data work
Put everything together, using extended examples to see the method of full problem thinking in action
Pattern Classification
David G. Stork - 1973
Now with the second edition, readers will find information on key new topics such as neural networks and statistical pattern recognition, the theory of machine learning, and the theory of invariances. Also included are worked examples, comparisons between different methods, extensive graphics, expanded exercises and computer project topics.
An Instructor's Manual presenting detailed solutions to all the problems in the book is available from the Wiley editorial department.
All of Statistics: A Concise Course in Statistical Inference
Larry Wasserman - 2003
But in spirit, the title is apt, as the book does cover a much broader range of topics than a typical introductory book on mathematical statistics. This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines. The book includes modern topics like nonparametric curve estimation, bootstrapping, and classification, topics that are usually relegated to follow-up courses. The reader is presumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. Statistics, data mining, and machine learning are all concerned with collecting and analyzing data. For some time, statistics research was conducted in statistics departments while data mining and machine learning research was conducted in computer science departments. Statisticians thought that computer scientists were reinventing the wheel. Computer scientists thought that statistical theory didn't apply to their problems. Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized.
Spark: The Definitive Guide: Big Data Processing Made Simple
Bill Chambers - 2018
With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.
You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.
Get a gentle overview of big data and Spark
Learn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples
Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
Understand how Spark runs on a cluster
Debug, monitor, and tune Spark clusters and applications
Learn the power of Structured Streaming, Spark’s stream-processing engine
Learn how you can apply MLlib to a variety of problems, including classification or recommendation
Writing Idiomatic Python 2.7.3
Jeff Knupp - 2013
Each idiom comes with a detailed description, example code showing the "wrong" way to do it, and code for the idiomatic, "Pythonic" alternative. *This version of the book is for Python 2.7.3+. There is also a Python 3.3+ version available.*
"Writing Idiomatic Python" contains the most common and important Python idioms in a format that maximizes identification and understanding. Each idiom is presented as a recommendation to write some commonly used piece of code. It is followed by an explanation of why the idiom is important. It also contains two code samples: the "Harmful" way to write it and the "Idiomatic" way.
* The "Harmful" way helps you identify the idiom in your own code.
* The "Idiomatic" way shows you how to easily translate that code into idiomatic Python.
This book is perfect for you:
* If you're coming to Python from another programming language
* If you're learning Python as a first programming language
* If you're looking to increase the readability, maintainability, and correctness of your Python code
What is "Idiomatic" Python?
Every programming language has its own idioms. Programming language idioms are nothing more than the generally accepted way of writing a certain piece of code. Consistently writing idiomatic code has a number of important benefits:
* Others can read and understand your code easily
* Others can maintain and enhance your code with minimal effort
* Your code will contain fewer bugs
* Your code will teach others to write correct code without any effort on your part
The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy
Sharon Bertsch McGrayne - 2011
To its adherents, it is an elegant statement about learning from experience. To its opponents, it is subjectivity run amok.
In the first-ever account of Bayes' rule for general readers, Sharon Bertsch McGrayne explores this controversial theorem and the human obsessions surrounding it. She traces its discovery by an amateur mathematician in the 1740s through its development into roughly its modern form by French scientist Pierre Simon Laplace. She reveals why respected statisticians rendered it professionally taboo for 150 years, at the same time that practitioners relied on it to solve crises involving great uncertainty and scanty information (Alan Turing's role in breaking Germany's Enigma code during World War II), and explains how the advent of off-the-shelf computer technology in the 1980s proved to be a game-changer. Today, Bayes' rule is used everywhere from DNA decoding to Homeland Security.
Drawing on primary source material and interviews with statisticians and other scientists, The Theory That Would Not Die is the riveting account of how a seemingly simple theorem ignited one of the greatest controversies of all time.
Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor
Virginia Eubanks - 2018
In Pittsburgh, a child welfare agency uses a statistical model to try to predict which children might be future victims of abuse or neglect.
Since the dawn of the digital age, decision-making in finance, employment, politics, health and human services has undergone revolutionary change. Today, automated systems, rather than humans, control which neighborhoods get policed, which families attain needed resources, and who is investigated for fraud. While we all live under this new regime of data, the most invasive and punitive systems are aimed at the poor.
In Automating Inequality, Virginia Eubanks systematically investigates the impacts of data mining, policy algorithms, and predictive risk models on poor and working-class people in America. The book is full of heart-wrenching and eye-opening stories, from a woman in Indiana whose benefits are literally cut off as she lies dying to a family in Pennsylvania in daily fear of losing their daughter because they fit a certain statistical profile.
The U.S. has always used its most cutting-edge science and technology to contain, investigate, discipline and punish the destitute. Like the county poorhouse and scientific charity before them, digital tracking and automated decision-making hide poverty from the middle-class public and give the nation the ethical distance it needs to make inhumane choices: which families get food and which starve, who has housing and who remains homeless, and which families are broken up by the state. In the process, they weaken democracy and betray our most cherished national values.
This deeply researched and passionate book could not be more timely.
Naomi Klein: "This book is downright scary."
Ethan Zuckerman, MIT: "Should be required reading."
Dorothy Roberts, author of Killing the Black Body: "A must-read for everyone concerned about modern tools of inequality in America."
Astra Taylor, author of The People's Platform: "This is the single most important book about technology you will read this year."