Book picks similar to
Causal Inference in Statistics: A Primer by Judea Pearl
statistics
data-science
science
math
Introduction to Automata Theory, Languages, and Computation
John E. Hopcroft - 1979
With this long-awaited revision, the authors continue to present the theory in a concise and straightforward manner, now with an eye out for the practical applications. They have revised this book to make it more accessible to today's students, including the addition of more material on writing proofs, more figures and pictures to convey ideas, side-boxes to highlight other interesting material, and a less formal writing style. Exercises at the end of each chapter, including some new, easier exercises, help readers confirm and enhance their understanding of the material. *NEW! Completely rewritten to be less formal, providing more accessibility to todays students. *NEW! Increased usage of figures and pictures to help convey ideas. *NEW! More detail and intuition provided for definitions and proofs. *NEW! Provides special side-boxes to present supplemental material that may be of interest to readers. *NEW! Includes more exercises, including many at a lower level. *NEW! Presents program-like notation for PDAs and Turing machines. *NEW! Increas
R for Everyone: Advanced Analytics and Graphics
Jared P. Lander - 2013
R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs, you'll be ready to tackle the statistical problems you care about most. COVERAGE INCLUDES - Exploring R, RStudio, and R packages - Using R for math: variable types, vectors, calling functions, and more - Exploiting data structures, including data.frames, matrices, and lists - Creating attractive, intuitive statistical graphics - Writing user-defined functions - Controlling program flow with if, ifelse, and complex checks - Improving program efficiency with group manipulations - Combining and reshaping multiple datasets - Manipulating strings using R's facilities and regular expressions - Creating normal, binomial, and Poisson probability distributions - Programming basic statistics: mean, standard deviation, and t-tests - Building linear, generalized linear, and nonlinear models - Assessing the quality of models and variable selection - Preventing overfitting, using the Elastic Net and Bayesian methods - Analyzing univariate and multivariate time series data - Grouping data via K-means and hierarchical clustering - Preparing reports, slideshows, and web pages with knitr - Building reusable R packages with devtools and Rcpp - Getting involved with the R global community
The Information: A History, a Theory, a Flood
James Gleick - 2011
The story of information begins in a time profoundly unlike our own, when every thought and utterance vanishes as soon as it is born. From the invention of scripts and alphabets to the long-misunderstood talking drums of Africa, Gleick tells the story of information technologies that changed the very nature of human consciousness. He provides portraits of the key figures contributing to the inexorable development of our modern understanding of information: Charles Babbage, the idiosyncratic inventor of the first great mechanical computer; Ada Byron, the brilliant and doomed daughter of the poet, who became the first true programmer; pivotal figures like Samuel Morse and Alan Turing; and Claude Shannon, the creator of information theory itself. And then the information age arrives. Citizens of this world become experts willy-nilly: aficionados of bits and bytes. And we sometimes feel we are drowning, swept by a deluge of signs and signals, news and images, blogs and tweets. The Information is the story of how we got here and where we are heading.
Introduction to Real Analysis
Robert G. Bartle - 1982
Therefore, this book provides the fundamental concepts and techniques of real analysis for readers in all of these areas. It helps one develop the ability to think deductively, analyze mathematical situations and extend ideas to a new context. Like the first two editions, this edition maintains the same spirit and user-friendly approach with some streamlined arguments, a few new examples, rearranged topics, and a new chapter on the Generalized Riemann Integral.
Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures
Claus O. Wilke - 2019
But with the increasing power of visualization software today, scientists, engineers, and business analysts often have to navigate a bewildering array of visualization choices and options.This practical book takes you through many commonly encountered visualization problems, and it provides guidelines on how to turn large datasets into clear and compelling figures. What visualization type is best for the story you want to tell? How do you make informative figures that are visually pleasing? Author Claus O. Wilke teaches you the elements most critical to successful data visualization.Explore the basic concepts of color as a tool to highlight, distinguish, or represent a valueUnderstand the importance of redundant coding to ensure you provide key information in multiple waysUse the book's visualizations directory, a graphical guide to commonly used types of data visualizationsGet extensive examples of good and bad figuresLearn how to use figures in a document or report and how employ them effectively to tell a compelling story
How to Create a Mind: The Secret of Human Thought Revealed
Ray Kurzweil - 2012
In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse engineering the brain to understand precisely how it works and using that knowledge to create even more intelligent machines.Kurzweil discusses how the brain functions, how the mind emerges from the brain, and the implications of vastly increasing the powers of our intelligence in addressing the world’s problems. He thoughtfully examines emotional and moral intelligence and the origins of consciousness and envisions the radical possibilities of our merging with the intelligent technology we are creating.Certain to be one of the most widely discussed and debated science books of the year, How to Create a Mind is sure to take its place alongside Kurzweil’s previous classics which include Fantastic Voyage: Live Long Enough to Live Forever and The Age of Spiritual Machines.
The Feynman Lectures on Physics
Richard P. Feynman - 1964
A new foreword by Kip Thorne, the current Richard Feynman Professor of Theoretical Physics at Caltech, discusses the relevance of the new edition to today's readers. This boxed set also includes Feynman's new Tips on Physics—the four previously unpublished lectures that Feynman gave to students preparing for exams at the end of his course. Thus, this 4-volume set is the complete and definitive edition of The Feynman Lectures on Physics. Packaged in a specially designed slipcase, this 4-volume set provides the ultimate legacy of Feynman's extraordinary contribution to students, teachers, researches, and lay readers around the world.
Code Complete
Steve McConnell - 1993
Now this classic book has been fully updated and revised with leading-edge practices--and hundreds of new code samples--illustrating the art and science of software construction. Capturing the body of knowledge available from research, academia, and everyday commercial practice, McConnell synthesizes the most effective techniques and must-know principles into clear, pragmatic guidance. No matter what your experience level, development environment, or project size, this book will inform and stimulate your thinking--and help you build the highest quality code. Discover the timeless techniques and strategies that help you: Design for minimum complexity and maximum creativity Reap the benefits of collaborative development Apply defensive programming techniques to reduce and flush out errors Exploit opportunities to refactor--or evolve--code, and do it safely Use construction practices that are right-weight for your project Debug problems quickly and effectively Resolve critical construction issues early and correctly Build quality into the beginning, middle, and end of your project
Even You Can Learn Statistics: A Guide for Everyone Who Has Ever Been Afraid of Statistics
David M. Levine - 2004
Each technique is introduced with a simple, jargon-free explanation, practical examples, and hands-on guidance for solving real problems with Excel or a TI-83/84 series calculator, including Plus models. Hate math? No sweat. You'll be amazed how little you need! For those who do have an interest in mathematics, optional "Equation Blackboard" sections review the equations that provide the foundations for important concepts. David M. Levine is a much-honored innovator in statistics education. He is Professor Emeritus of Statistics and Computer Information Systems at Bernard M. Baruch College (CUNY), and co-author of several best-selling books, including Statistics for Managers using Microsoft Excel, Basic Business Statistics, Quality Management, and Six Sigma for Green Belts and Champions. Instructional designer David F. Stephan pioneered the classroom use of personal computers, and is a leader in making Excel more accessible to statistics students. He has co-authored several textbooks with David M. Levine. Here's just some of what you'll learn how to do... Use statistics in your everyday work or study Perform common statistical tasks using a Texas Instruments statistical calculator or Microsoft Excel Build and interpret statistical charts and tables "Test Yourself" at the end of each chapter to review the concepts and methods that you learned in the chapter Work with mean, median, mode, standard deviation, Z scores, skewness, and other descriptive statistics Use probability and probability distributions Work with sampling distributions and confidence intervals Test hypotheses and decision-making risks with Z, t, Chi-Square, ANOVA, and other techniques Perform regression analysis and modeling The easy, practical introduction to statistics--for everyone! Thought you couldn't learn statistics? Think again. You can--and you will!
Data Jujitsu: The Art of Turning Data into Product
D.J. Patil - 2012
Acclaimed data scientist DJ Patil details a new approach to solving problems in Data Jujitsu.Learn how to use a problem's "weight" against itself to:Break down seemingly complex data problems into simplified partsUse alternative data analysis techniques to examine themUse human input, such as Mechanical Turk, and design tricks that enlist the help of your users to take short cuts around tough problemsLearn more about the problems before starting on the solutions—and use the findings to solve them, or determine whether the problems are worth solving at all.
Learning Python
Mark Lutz - 2003
Python is considered easy to learn, but there's no quicker way to mastery of the language than learning from an expert teacher. This edition of "Learning Python" puts you in the hands of two expert teachers, Mark Lutz and David Ascher, whose friendly, well-structured prose has guided many a programmer to proficiency with the language. "Learning Python," Second Edition, offers programmers a comprehensive learning tool for Python and object-oriented programming. Thoroughly updated for the numerous language and class presentation changes that have taken place since the release of the first edition in 1999, this guide introduces the basic elements of the latest release of Python 2.3 and covers new features, such as list comprehensions, nested scopes, and iterators/generators. Beyond language features, this edition of "Learning Python" also includes new context for less-experienced programmers, including fresh overviews of object-oriented programming and dynamic typing, new discussions of program launch and configuration options, new coverage of documentation sources, and more. There are also new use cases throughout to make the application of language features more concrete. The first part of "Learning Python" gives programmers all the information they'll need to understand and construct programs in the Python language, including types, operators, statements, classes, functions, modules and exceptions. The authors then present more advanced material, showing how Python performs common tasks by offering real applications and the libraries available for those applications. Each chapter ends with a series of exercises that will test your Python skills and measure your understanding."Learning Python," Second Edition is a self-paced book that allows readers to focus on the core Python language in depth. As you work through the book, you'll gain a deep and complete understanding of the Python language that will help you to understand the larger application-level examples that you'll encounter on your own. If you're interested in learning Python--and want to do so quickly and efficiently--then "Learning Python," Second Edition is your best choice.
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
Ralph Kimball - 1996
Here is a complete library of dimensional modeling techniques-- the most comprehensive collection ever written. Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition to Ralph Kimball's classic guide is more than sixty percent updated.The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:* Retail sales and e-commerce* Inventory management* Procurement* Order management* Customer relationship management (CRM)* Human resources management* Accounting* Financial services* Telecommunications and utilities* Education* Transportation* Health care and insuranceBy the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.This book is also available as part of the Kimball's Data Warehouse Toolkit Classics Box Set (ISBN: 9780470479575) with the following 3 books:The Data Warehouse Toolkit, 2nd Edition (9780471200246)The Data Warehouse Lifecycle Toolkit, 2nd Edition (9780470149775)The Data Warehouse ETL Toolkit (9780764567575)
Probability, Random Variables and Stochastic Processes with Errata Sheet
Athanasios Papoulis - 2001
Unnikrishna Pillai of Polytechnic University. The book is intended for a senior/graduate level course in probability and is aimed at students in electrical engineering, math, and physics departments. The authors' approach is to develop the subject of probability theory and stochastic processes as a deductive discipline and to illustrate the theory with basic applications of engineering interest. Approximately 1/3 of the text is new material--this material maintains the style and spirit of previous editions. In order to bridge the gap between concepts and applications, a number of additional examples have been added for further clarity, as well as several new topics.
Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python
Scott Hartshorn - 2016
They are typically used to categorize something based on other data that you have. The purpose of this book is to help you understand how Random Forests work, as well as the different options that you have when using them to analyze a problem. Additionally, since Decision Trees are a fundamental part of Random Forests, this book explains how they work. This book is focused on understanding Random Forests at the conceptual level. Knowing how they work, why they work the way that they do, and what options are available to improve results. This book covers how Random Forests work in an intuitive way, and also explains the equations behind many of the functions, but it only has a small amount of actual code (in python). This book is focused on giving examples and providing analogies for the most fundamental aspects of how random forests and decision trees work. The reason is that those are easy to understand and they stick with you. There are also some really interesting aspects of random forests, such as information gain, feature importances, or out of bag error, that simply cannot be well covered without diving into the equations of how they work. For those the focus is providing the information in a straight forward and easy to understand way.
Text Mining with R: A Tidy Approach
Julia Silge - 2017
With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective.The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media.Learn how to apply the tidy text format to NLPUse sentiment analysis to mine the emotional content of textIdentify a document's most important terms with frequency measurementsExplore relationships and connections between words with the ggraph and widyr packagesConvert back and forth between R's tidy and non-tidy text formatsUse topic modeling to classify document collections into natural groupsExamine case studies that compare Twitter archives, dig into NASA metadata, and analyze thousands of Usenet messages