Book picks similar to
Exploratory Data Analysis by John W. Tukey
statistics
data-science
mathematics
data
Data Science at the Command Line: Facing the Future with Time-Tested Tools
Jeroen Janssens - 2014
You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.To get you started--whether you're on Windows, OS X, or Linux--author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.Obtain data from websites, APIs, databases, and spreadsheetsPerform scrub operations on plain text, CSV, HTML/XML, and JSONExplore data, compute descriptive statistics, and create visualizationsManage your data science workflow using DrakeCreate reusable tools from one-liners and existing Python or R codeParallelize and distribute data-intensive pipelines using GNU ParallelModel data with dimensionality reduction, clustering, regression, and classification algorithms
Data Visualisation: A Handbook for Data Driven Design
Andy Kirk - 2016
Scholars and students need to be able to analyze, design and curate information into useful tools of communication, insight and understanding. This book is the starting point in learning the process and skills of data visualization, teaching the concepts and skills of how to present data and inspiring effective visual design. Benefits of this book: A flexible step-by-step journey that equips you to achieve great data visualization.A curated collection of classic and contemporary examples, giving illustrations of good and bad practice Examples on every page to give creative inspiration Illustrations of good and bad practice show you how to critically evaluate and improve your own work Advice and experience from the best designers in the field Loads of online practical help, checklists, case studies and exercises make this the most comprehensive text available
Designing Data-Intensive Applications
Martin Kleppmann - 2015
Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures
The Art of Doing Science and Engineering: Learning to Learn
Richard Hamming - 1996
By presenting actual experiences and analyzing them as they are described, the author conveys the developmental thought processes employed and shows a style of thinking that leads to successful results is something that can be learned. Along with spectacular successes, the author also conveys how failures contributed to shaping the thought processes. Provides the reader with a style of thinking that will enhance a person's ability to function as a problem-solver of complex technical issues. Consists of a collection of stories about the author's participation in significant discoveries, relating how those discoveries came about and, most importantly, provides analysis about the thought processes and reasoning that took place as the author and his associates progressed through engineering problems.
R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics
Paul Teetor - 2011
The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process.Create vectors, handle variables, and perform other basic functionsInput and output dataTackle data structures such as matrices, lists, factors, and data framesWork with probability, probability distributions, and random variablesCalculate statistics and confidence intervals, and perform statistical testsCreate a variety of graphic displaysBuild statistical models with linear regressions and analysis of variance (ANOVA)Explore advanced statistical techniques, such as finding clusters in your dataWonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language--one practical example at a time.--Jeffrey Ryan, software consultant and R package author
Structure and Interpretation of Computer Programs
Harold Abelson - 1984
This long-awaited revision contains changes throughout the text. There are new implementations of most of the major programming systems in the book, including the interpreters and compilers, and the authors have incorporated many small changes that reflect their experience teaching the course at MIT since the first edition was published. A new theme has been introduced that emphasizes the central role played by different approaches to dealing with time in computational models: objects with state, concurrent programming, functional programming and lazy evaluation, and nondeterministic programming. There are new example sections on higher-order procedures in graphics and on applications of stream processing in numerical programming, and many new exercises. In addition, all the programs have been reworked to run in any Scheme implementation that adheres to the IEEE standard.
Foundations of Statistical Natural Language Processing
Christopher D. Manning - 1999
This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.
The Annotated Turing: A Guided Tour Through Alan Turing's Historic Paper on Computability and the Turing Machine
Charles Petzold - 2008
Turing
Mathematician Alan Turing invented an imaginary computer known as the Turing Machine; in an age before computers, he explored the concept of what it meant to be "computable," creating the field of computability theory in the process, a foundation of present-day computer programming.The book expands Turing's original 36-page paper with additional background chapters and extensive annotations; the author elaborates on and clarifies many of Turing's statements, making the original difficult-to-read document accessible to present day programmers, computer science majors, math geeks, and others.Interwoven into the narrative are the highlights of Turing's own life: his years at Cambridge and Princeton, his secret work in cryptanalysis during World War II, his involvement in seminal computer projects, his speculations about artificial intelligence, his arrest and prosecution for the crime of "gross indecency," and his early death by apparent suicide at the age of 41.
Learning From Data: A Short Course
Yaser S. Abu-Mostafa - 2012
Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title `learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover. ---- Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems. ---- Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own. ---- The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
Pedro Domingos - 2015
In The Master Algorithm, Pedro Domingos lifts the veil to give us a peek inside the learning machines that power Google, Amazon, and your smartphone. He assembles a blueprint for the future universal learner--the Master Algorithm--and discusses what it will mean for business, science, and society. If data-ism is today's philosophy, this book is its bible.
Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures
Claus O. Wilke - 2019
But with the increasing power of visualization software today, scientists, engineers, and business analysts often have to navigate a bewildering array of visualization choices and options.This practical book takes you through many commonly encountered visualization problems, and it provides guidelines on how to turn large datasets into clear and compelling figures. What visualization type is best for the story you want to tell? How do you make informative figures that are visually pleasing? Author Claus O. Wilke teaches you the elements most critical to successful data visualization.Explore the basic concepts of color as a tool to highlight, distinguish, or represent a valueUnderstand the importance of redundant coding to ensure you provide key information in multiple waysUse the book's visualizations directory, a graphical guide to commonly used types of data visualizationsGet extensive examples of good and bad figuresLearn how to use figures in a document or report and how employ them effectively to tell a compelling story
A Field Guide to Lies: Critical Thinking in the Information Age
Daniel J. Levitin - 2016
We are bombarded with more information each day than our brains can process—especially in election season. It's raining bad data, half-truths, and even outright lies. New York Times bestselling author Daniel J. Levitin shows how to recognize misleading announcements, statistics, graphs, and written reports revealing the ways lying weasels can use them.
It's becoming harder to separate the wheat from the digital chaff. How do we distinguish misinformation, pseudo-facts, distortions, and outright lies from reliable information? Levitin groups his field guide into two categories—statistical infomation and faulty arguments—ultimately showing how science is the bedrock of critical thinking. Infoliteracy means understanding that there are hierarchies of source quality and bias that variously distort our information feeds via every media channel, including social media. We may expect newspapers, bloggers, the government, and Wikipedia to be factually and logically correct, but they so often aren't. We need to think critically about the words and numbers we encounter if we want to be successful at work, at play, and in making the most of our lives. This means checking the plausibility and reasoning—not passively accepting information, repeating it, and making decisions based on it. Readers learn to avoid the extremes of passive gullibility and cynical rejection. Levitin's charming, entertaining, accessible guide can help anyone wake up to a whole lot of things that aren't so. And catch some lying weasels in their tracks!
Python Data Science Handbook: Tools and Techniques for Developers
Jake Vanderplas - 2016
Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.With this handbook, you’ll learn how to use: * IPython and Jupyter: provide computational environments for data scientists using Python * NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python * Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python * Matplotlib: includes capabilities for a flexible range of data visualizations in Python * Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms
Build a Career in Data Science
Emily Robinson - 2020
Industry experts Jacqueline Nolis and Emily Robinson lay out the soft skills you’ll need alongside your technical know-how in order to succeed in the field. Following their clear and simple instructions you’ll craft a resume that hiring managers will love, learn how to ace your interview, and ensure you hit the ground running in your first months at your new job. Once you’ve gotten your foot in the door, learn to thrive as a data scientist by handling high expectations, dealing with stakeholders, and managing failures. Finally, you’ll look towards the future and learn about how to join the broader data science community, leaving a job gracefully, and plotting your career path. With this book by your side you’ll have everything you need to ensure a rewarding and productive role in data science.
Statistical Inference
George Casella - 2001
Starting from the basics of probability, the authors develop the theory of statistical inference using techniques, definitions, and concepts that are statistical and are natural extensions and consequences of previous concepts. This book can be used for readers who have a solid mathematics background. It can also be used in a way that stresses the more practical uses of statistical theory, being more concerned with understanding basic statistical concepts and deriving reasonable statistical procedures for a variety of situations, and less concerned with formal optimality investigations.