Book picks similar to
R Markdown Cookbook by Yihui Xie


coding
programming-analytics
statistics
waiting-list

Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python


Scott Hartshorn - 2016
    They are typically used to categorize something based on other data that you have. The purpose of this book is to help you understand how Random Forests work, as well as the different options that you have when using them to analyze a problem. Additionally, since Decision Trees are a fundamental part of Random Forests, this book explains how they work. This book is focused on understanding Random Forests at the conceptual level. Knowing how they work, why they work the way that they do, and what options are available to improve results. This book covers how Random Forests work in an intuitive way, and also explains the equations behind many of the functions, but it only has a small amount of actual code (in python). This book is focused on giving examples and providing analogies for the most fundamental aspects of how random forests and decision trees work. The reason is that those are easy to understand and they stick with you. There are also some really interesting aspects of random forests, such as information gain, feature importances, or out of bag error, that simply cannot be well covered without diving into the equations of how they work. For those the focus is providing the information in a straight forward and easy to understand way.

Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference


Cameron Davidson-Pilon - 2014
    However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice-freeing you to get results using computing power. Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You'll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you've mastered these techniques, you'll constantly turn to this guide for the working PyMC code you need to jumpstart future projects. Coverage includes - Learning the Bayesian "state of mind" and its practical implications - Understanding how computers perform Bayesian inference - Using the PyMC Python library to program Bayesian analyses - Building and debugging models with PyMC - Testing your model's "goodness of fit" - Opening the "black box" of the Markov Chain Monte Carlo algorithm to see how and why it works - Leveraging the power of the "Law of Large Numbers" - Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning - Using loss functions to measure an estimate's weaknesses based on your goals and desired outcomes - Selecting appropriate priors and understanding how their influence changes with dataset size - Overcoming the "exploration versus exploitation" dilemma: deciding when "pretty good" is good enough - Using Bayesian inference to improve A/B testing - Solving data science problems when only small amounts of data are available Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

The Chase Fulton Novels Boxed Set #1: Books 1 - 3


Cap Daniels - 2020
    

Statistical Analysis with Excel for Dummies


Joseph Schmuller - 2005
    mean, margin of error, standard deviation, permutations, and correlations-all using Excel

Practical Statistics for Data Scientists: 50 Essential Concepts


Peter Bruce - 2017
    Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.With this book, you'll learn:Why exploratory data analysis is a key preliminary step in data scienceHow random sampling can reduce bias and yield a higher quality dataset, even with big dataHow the principles of experimental design yield definitive answers to questionsHow to use regression to estimate outcomes and detect anomaliesKey classification techniques for predicting which categories a record belongs toStatistical machine learning methods that "learn" from dataUnsupervised learning methods for extracting meaning from unlabeled data

Finger of an Angel


Panayotis Cacoyannis - 2019
    The day is unbearably hot, and the snazzy car’s air-con is broken, blowing out hot air instead of cold. Lily follows the meandering road in a state of dehydration, and experiences a series of encounters with angels and demons and ghosts from the past. As time begins to travel backwards, Lily knows that very soon she will find her way back to the city. But even when eventually she does, events nearer home seem to mirror her encounters on the long winding road that disappeared.

Discovering Statistics Using SPSS (Introducing Statistical Methods)


Andy Field - 2000
    What's new in the Second Edition? 1. Fully compliant with the latest version of SPSS version 12 2. More coverage of advanced statistics including completely new coverage of non-parametric statistics. The book is 50 per cent longer than the First Edition. 3. Each section of each chapter now has a notation - 1,2 or 3 - referring to the intended level of study. This helps students navigate their way through the book and makes it user-friendly for students of ALL levels. 4. Has a 'how to use this book' section at the start of the text. 5. Characters in each chapter have defined roles - summarizing key points, to pose questions etc 6. Each chapter now has several examples for students to work through. Answers provided on the enclosed CD-ROM

The Data Journalism Handbook


Jonathan Gray - 2012
    With The Data Journalism Handbook, you’ll explore the potential, limits, and applied uses of this new and fascinating field.This valuable handbook has attracted scores of contributors since the European Journalism Centre and the Open Knowledge Foundation launched the project at MozFest 2011. Through a collection of tips and techniques from leading journalists, professors, software developers, and data analysts, you’ll learn how data can be either the source of data journalism or a tool with which the story is told—or both.Examine the use of data journalism at the BBC, the Chicago Tribune, the Guardian, and other news organizationsExplore in-depth case studies on elections, riots, school performance, and corruptionLearn how to find data from the Web, through freedom of information laws, and by "crowd sourcing"Extract information from raw data with tips for working with numbers and statistics and using data visualizationDeliver data through infographics, news apps, open data platforms, and download links

A Path and a Practice: Using Lao Tzu's Tao Te Ching as a Guide to an Awakened Spiritual Life


William Martin - 2004
    But no modern translation has yet captured the essential thrust of Lao Tzu's work as a practical guide to living an awakened life. Now William Martin, whose acclaimed previous reinterpretations of the Tao (for parents, couples, and elders) have introduced or reacquainted this classic text to thousands of readers, strikingly translates the Tao's eighty-one chapters to uniquely address someone on a Tao—or path—with a practice. Martin frames his new translation with two illuminating, groundbreaking sections: "A Path," which introduces the Tao's nonlinear construction and explains how it works its themes; and "A Practice," which provides practical guidance for readers exploring each of the Tao's themes in depth. Martin's genius in this new translation uncovers how directly the Tao speaks to readers on or about to embark on a spiritual journey.

Hands-On Machine Learning with Scikit-Learn and TensorFlow


Aurélien Géron - 2017
    Now that machine learning is thriving, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This practical book shows you how.By using concrete examples, minimal theory, and two production-ready Python frameworks—Scikit-Learn and TensorFlow—author Aurélien Géron helps you gain an intuitive understanding of the concepts and tools for building intelligent systems. You’ll learn how to use a range of techniques, starting with simple Linear Regression and progressing to Deep Neural Networks. If you have some programming experience and you’re ready to code a machine learning project, this guide is for you.This hands-on book shows you how to use:Scikit-Learn, an accessible framework that implements many algorithms efficiently and serves as a great machine learning entry pointTensorFlow, a more complex library for distributed numerical computation, ideal for training and running very large neural networksPractical code examples that you can apply without learning excessive machine learning theory or algorithm details

The Best American Infographics 2015


Gareth Cook - 2015
    Finding the visual form of data can simplify this deluge into pearls of understanding.” —Kim Rees, Periscopic    The most creative and effective data visualizations from the past year, edited by Brain Pickings creator Maria Popova   The rise of infographics across nearly all print and electronic media—from a graphic illuminating the tweets of the women of Isis to a memorable depiction of the national geography of beer—reveals patterns in our lives and the world in often startling ways. The Best American Infographics 2015 showcases visualizations from the worlds of politics, social issues, health, sports, arts and culture, and more. From an elegant graphic comparison of first sentences in classic novels to a startling illustration of the world’s deadliest animals, “You’ll come away with more than your share of . . . mind-bending moments—and a wide-ranging view of what infographics can do” (Harvard Business Review). “This is what information design does at its best – it gives pause, makes visible the unsuspected yet significant invisibilia of life, and by astonishing us into mobilization, it catapults us toward one of the greatest feats of human courage: the act of changing one’s mind.”—from the Introduction by Maria Popova Guest introducer MARIA POPOVA is the one-woman curation machine behind Brain Pickings, a cross-disciplinary blog showcasing content that makes people smarter. She has more than half a million monthly readers and over 480,000 Twitter followers. Popova is an MIT Futures of Entertainment Fellow and has written for the New York Times, Atlantic, Wired UK, GOOD Magazine, The Huffington Post, and the Nieman Journalism Lab.   Series editor GARETH COOK is a Pulitzer Prize–winning journalist, a contributor to the New York Times Magazine, and the editor of Mind Matters, Scientific American’s neuroscience blog.  He helped invent the Boston Globe’s Sunday Ideas section and served as its editor from 2007 to 2011. His work has also appeared in NewYorker.com, WIRED, Scientific American, and The Best American Science and Nature Writing.

R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics


Paul Teetor - 2011
    The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process.Create vectors, handle variables, and perform other basic functionsInput and output dataTackle data structures such as matrices, lists, factors, and data framesWork with probability, probability distributions, and random variablesCalculate statistics and confidence intervals, and perform statistical testsCreate a variety of graphic displaysBuild statistical models with linear regressions and analysis of variance (ANOVA)Explore advanced statistical techniques, such as finding clusters in your dataWonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language--one practical example at a time.--Jeffrey Ryan, software consultant and R package author

Applied Predictive Modeling


Max Kuhn - 2013
    Non- mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. Dr. Kuhn is a Director of Non-Clinical Statistics at Pfizer Global R&D in Groton Connecticut. He has been applying predictive models in the pharmaceutical and diagnostic industries for over 15 years and is the author of a number of R packages. Dr. Johnson has more than a decade of statistical consulting and predictive modeling experience in pharmaceutical research and development. He is a co-founder of Arbor Analytics, a firm specializing in predictive modeling and is a former Director of Statistics at Pfizer Global R&D. His scholarly work centers on the application and development of statistical methodology and learning algorithms. Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. Addressing practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance-all of which are problems that occur frequently in practice. The text illustrates all parts of the modeling process through many hands-on, real-life examples. And every chapter contains extensive R code f

Data Science from Scratch: First Principles with Python


Joel Grus - 2015
    In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out. Get a crash course in Python Learn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data science Collect, explore, clean, munge, and manipulate data Dive into the fundamentals of machine learning Implement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering Explore recommender systems, natural language processing, network analysis, MapReduce, and databases