Envisioning Information


Edward R. Tufte - 1990
    The Whole Earth Review called Envisioning Information a "passionate, elegant revelation."

Probability Theory: The Logic of Science


E.T. Jaynes - 1999
    It discusses new results, along with applications of probability theory to a variety of problems. The book contains many exercises and is suitable for use as a textbook on graduate-level courses involving data analysis. Aimed at readers already familiar with applied mathematics at an advanced undergraduate level or higher, it is of interest to scientists concerned with inference from incomplete information.

Programming in Haskell


Graham Hutton - 2006
    This introduction is ideal for beginners: it requires no previous programming experience and all concepts are explained from first principles via carefully chosen examples. Each chapter includes exercises that range from the straightforward to extended projects, plus suggestions for further reading on more advanced topics. The author is a leading Haskell researcher and instructor, well-known for his teaching skills. The presentation is clear and simple, and benefits from having been refined and class-tested over several years. The result is a text that can be used with courses, or for self-learning. Features include freely accessible Powerpoint slides for each chapter, solutions to exercises and examination questions (with solutions) available to instructors, and a downloadable code that's fully compliant with the latest Haskell release.

Structure and Interpretation of Computer Programs


Harold Abelson - 1984
    This long-awaited revision contains changes throughout the text. There are new implementations of most of the major programming systems in the book, including the interpreters and compilers, and the authors have incorporated many small changes that reflect their experience teaching the course at MIT since the first edition was published. A new theme has been introduced that emphasizes the central role played by different approaches to dealing with time in computational models: objects with state, concurrent programming, functional programming and lazy evaluation, and nondeterministic programming. There are new example sections on higher-order procedures in graphics and on applications of stream processing in numerical programming, and many new exercises. In addition, all the programs have been reworked to run in any Scheme implementation that adheres to the IEEE standard.

Mining of Massive Datasets


Anand Rajaraman - 2011
    This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

Now You See It: Simple Visualization Techniques for Quantitative Analysis


Stephen Few - 2009
    Employing a methodology that is primarily learning by example and “thinking with our eyes,” this manual features graphs and practical analytical techniques that can be applied to a broad range of data analysis tools—including the most commonly used Microsoft Excel. This approach is particularly valuable to those who need to make sense of quantitative business data by discerning meaningful patterns, trends, relationships, and exceptions that reveal business performance, potential problems and opportunities, and hints about the future. It provides practical skills that are useful to managers at all levels and to those interested in keeping a keen eye on their business.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design


Michael Kearns - 2019
    Algorithms have made our lives more efficient, more entertaining, and, sometimes, better informed. At the same time, complex algorithms are increasingly violating the basic rights of individual citizens. Allegedly anonymized datasets routinely leak our most sensitive personal information; statistical models for everything from mortgages to college admissions reflect racial and gender bias. Meanwhile, users manipulate algorithms to "game" search engines, spam filters, online reviewing services, and navigation apps.Understanding and improving the science behind the algorithms that run our lives is rapidly becoming one of the most pressing issues of this century. Traditional fixes, such as laws, regulations and watchdog groups, have proven woefully inadequate. Reporting from the cutting edge of scientific research, The Ethical Algorithm offers a new approach: a set of principled solutions based on the emerging and exciting science of socially aware algorithm design. Michael Kearns and Aaron Roth explain how we can better embed human principles into machine code - without halting the advance of data-driven scientific exploration. Weaving together innovative research with stories of citizens, scientists, and activists on the front lines, The Ethical Algorithm offers a compelling vision for a future, one in which we can better protect humans from the unintended impacts of algorithms while continuing to inspire wondrous advances in technology.

Learning From Data: A Short Course


Yaser S. Abu-Mostafa - 2012
    Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title `learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover. ---- Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems. ---- Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own. ---- The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.

Dataclysm: Who We Are (When We Think No One's Looking)


Christian Rudder - 2014
    In Dataclysm, Christian Rudder uses it to show us who we truly are.   For centuries, we’ve relied on polling or small-scale lab experiments to study human behavior. Today, a new approach is possible. As we live more of our lives online, researchers can finally observe us directly, in vast numbers, and without filters. Data scientists have become the new demographers.   In this daring and original book, Rudder explains how Facebook "likes" can predict, with surprising accuracy, a person’s sexual orientation and even intelligence; how attractive women receive exponentially more interview requests; and why you must have haters to be hot. He charts the rise and fall of America’s most reviled word through Google Search and examines the new dynamics of collaborative rage on Twitter. He shows how people express themselves, both privately and publicly. What is the least Asian thing you can say? Do people bathe more in Vermont or New Jersey? What do black women think about Simon & Garfunkel? (Hint: they don’t think about Simon & Garfunkel.) Rudder also traces human migration over time, showing how groups of people move from certain small towns to the same big cities across the globe. And he grapples with the challenge of maintaining privacy in a world where these explorations are possible.   Visually arresting and full of wit and insight, Dataclysm is a new way of seeing ourselves—a brilliant alchemy, in which math is made human and numbers become the narrative of our time.

Introduction to Algorithms


Thomas H. Cormen - 1989
    Each chapter is relatively self-contained and can be used as a unit of study. The algorithms are described in English and in a pseudocode designed to be readable by anyone who has done a little programming. The explanations have been kept elementary without sacrificing depth of coverage or mathematical rigor.

The Essential Turing: Seminal Writings in Computing, Logic, Philosophy, Artificial Intelligence, and Artificial Life Plus the Secrets of Enigma


Alan Turing - 2004
    In 1935, aged 22, he developed the mathematical theory upon which all subsequent stored-program digital computers are modeled.At the outbreak of hostilities with Germany in September 1939, he joined the Government Codebreaking team at Bletchley Park, Buckinghamshire and played a crucial role in deciphering Engima, the code used by the German armed forces to protect their radio communications. Turing's work on the versionof Enigma used by the German navy was vital to the battle for supremacy in the North Atlantic. He also contributed to the attack on the cyphers known as Fish, which were used by the German High Command for the encryption of signals during the latter part of the war. His contribution helped toshorten the war in Europe by an estimated two years.After the war, his theoretical work led to the development of Britain's first computers at the National Physical Laboratory and the Royal Society Computing Machine Laboratory at Manchester University.Turing was also a founding father of modern cognitive science, theorizing that the cortex at birth is an unorganized machine which through training becomes organized into a universal machine or something like it. He went on to develop the use of computers to model biological growth, launchingthe discipline now referred to as Artificial Life.The papers in this book are the key works for understanding Turing's phenomenal contribution across all these fields. The collection includes Turing's declassified wartime Treatise on the Enigma; letters from Turing to Churchill and to codebreakers; lectures, papers, and broadcasts which opened upthe concept of AI and its implications; and the paper which formed the genesis of the investigation of Artifical Life.

Introduction to Probability


Joseph K. Blitzstein - 2014
    The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo MCMC. Additional application areas explored include genetics, medicine, computer science, and information theory. The print book version includes a code that provides free access to an eBook version. The authors present the material in an accessible style and motivate concepts using real-world examples. Throughout, they use stories to uncover connections between the fundamental distributions in statistics and conditioning to reduce complicated problems to manageable pieces. The book includes many intuitive explanations, diagrams, and practice problems. Each chapter ends with a section showing how to perform relevant simulations and calculations in R, a free statistical software environment.

Data Jujitsu: The Art of Turning Data into Product


D.J. Patil - 2012
    Acclaimed data scientist DJ Patil details a new approach to solving problems in Data Jujitsu.Learn how to use a problem's "weight" against itself to:Break down seemingly complex data problems into simplified partsUse alternative data analysis techniques to examine themUse human input, such as Mechanical Turk, and design tricks that enlist the help of your users to take short cuts around tough problemsLearn more about the problems before starting on the solutions—and use the findings to solve them, or determine whether the problems are worth solving at all.

Neural Networks: A Comprehensive Foundation


Simon Haykin - 1994
    Introducing students to the many facets of neural networks, this text provides many case studies to illustrate their real-life, practical applications.

Introduction to Statistical Quality Control


Douglas C. Montgomery - 1985
    It provides comprehensive coverage of the subject from basic principles to state-of-art concepts and applications. The objective is to give the reader a sound understanding of the principles and the basis for applying them in a variety of both product and nonproduct situations. While statistical techniques are emphasized throughout, the book has a strong engineering and management orientation. Guidelines are given throughout the book for selecting the proper type of statistical technique to use in a wide variety of product and nonproduct situations. By presenting theory, and supporting the theory with clear and relevant examples, Montgomery helps the reader to understand the big picture of important concepts. Updated to reflect contemporary practice and provide more information on management aspects of quality improvement.