R for Everyone: Advanced Analytics and Graphics


Jared P. Lander - 2013
    R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs, you'll be ready to tackle the statistical problems you care about most. 
COVERAGE INCLUDES
- Exploring R, RStudio, and R packages
- Using R for math: variable types, vectors, calling functions, and more
- Exploiting data structures, including data.frames, matrices, and lists
- Creating attractive, intuitive statistical graphics
- Writing user-defined functions
- Controlling program flow with if, ifelse, and complex checks
- Improving program efficiency with group manipulations
- Combining and reshaping multiple datasets
- Manipulating strings using R's facilities and regular expressions
- Creating normal, binomial, and Poisson probability distributions
- Programming basic statistics: mean, standard deviation, and t-tests
- Building linear, generalized linear, and nonlinear models
- Assessing the quality of models and variable selection
- Preventing overfitting, using the Elastic Net and Bayesian methods
- Analyzing univariate and multivariate time series data
- Grouping data via K-means and hierarchical clustering
- Preparing reports, slideshows, and web pages with knitr
- Building reusable R packages with devtools and Rcpp
- Getting involved with the R global community

Introduction to Machine Learning with Python: A Guide for Data Scientists


Andreas C. Müller - 2015
    If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination. You'll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Müller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get even more from this book. With this book, you'll learn:
    - Fundamental concepts and applications of machine learning
    - Advantages and shortcomings of widely used machine learning algorithms
    - How to represent data processed by machine learning, including which data aspects to focus on
    - Advanced methods for model evaluation and parameter tuning
    - The concept of pipelines for chaining models and encapsulating your workflow
    - Methods for working with text data, including text-specific processing techniques
    - Suggestions for improving your machine learning and data science skills

The Elements of Data Analytic Style


Jeffrey Leek - 2015
    This book is focused on the details of data analysis that sometimes fall through the cracks in traditional statistics classes and textbooks. It is based in part on the author's blog posts, lecture materials, and tutorials. The author is one of the co-developers of the Johns Hopkins Specialization in Data Science, the largest data science program in the world, which has enrolled more than 1.76 million people. The book is useful as a companion to introductory courses in data science or data analysis. It is also a useful reference tool for people tasked with reading and critiquing data analyses. It is based on the author's popular open-source guides available through his Github account (https://github.com/jtleek). The book is also available through Leanpub (https://leanpub.com/datastyle); if it is purchased on that platform, you are entitled to lifetime free updates.

Statistics: An Introduction Using R


Michael J. Crawley - 2005
    R is one of the most powerful and flexible statistical software packages available, and enables the user to apply a wide variety of statistical methods ranging from simple regression to generalized linear modelling. Statistics: An Introduction using R is a clear and concise introductory textbook to statistical analysis using this powerful and free software, and follows on from the success of the author's previous best-selling title Statistical Computing.
    * Features step-by-step instructions that assume no mathematics, statistics or programming background, helping the non-statistician to fully understand the methodology.
    * Uses a series of realistic examples, developing step-wise from the simplest cases, with the emphasis on checking the assumptions (e.g. constancy of variance and normality of errors) and the adequacy of the model chosen to fit the data.
    * The emphasis throughout is on estimation of effect sizes and confidence intervals, rather than on hypothesis testing.
    * Covers the full range of statistical techniques likely to be needed to analyse the data from research projects, including elementary material like t-tests and chi-squared tests, intermediate methods like regression and analysis of variance, and more advanced techniques like generalized linear modelling.
    * Includes numerous worked examples and exercises within each chapter.
    * Accompanied by a website featuring worked examples, data sets, exercises and solutions: http://www.imperial.ac.uk/bio/research/crawl...
    Statistics: An Introduction using R is the first text to offer such a concise introduction to a broad array of statistical methods, at a level that is elementary enough to appeal to a broad range of disciplines. It is primarily aimed at undergraduate students in medicine, engineering, economics and biology, but will also appeal to postgraduates who have not previously covered this area, or wish to switch to using R.

Introductory Statistics


Neil A. Weiss - 1987
    This book develops statistical thinking over rote drill and practice. The Nature of Statistics; Organizing Data; Descriptive Measures; Probability Concepts; Discrete Random Variables; The Normal Distribution; The Sampling Distribution of the Sample Mean; Confidence Intervals for One Population Mean; Hypothesis Tests for One Population Mean; Inferences for Two Population Means; Inferences for Population Standard Deviations; Inferences for Population Proportions; Chi-Square Procedures; Descriptive Methods in Regression and Correlation; Inferential Methods in Regression and Correlation; Analysis of Variance (ANOVA). For all readers interested in Introductory Statistics.

DAX Formulas for PowerPivot: The Excel Pro's Guide to Mastering DAX


Rob Collie - 2012
    Written by the world’s foremost PowerPivot blogger and practitioner, the book’s concepts and approach are introduced in a simple, step-by-step manner tailored to the learning style of Excel users everywhere. The techniques presented allow users to produce, in hours or even minutes, results that formerly would have taken entire teams weeks or months to produce. They include lessons on the difference between calculated columns and measures, how formulas can be reused across reports of completely different shapes, how to merge disjointed sets of data into unified reports, how to make certain columns in a pivot behave as if the pivot were filtered while other columns do not, and how to create time-intelligent calculations in pivot tables such as “Year over Year” and “Moving Averages,” whether they use a standard, fiscal, or completely custom calendar. The “pattern-like” techniques and best practices contained in this book have been developed and refined over two years of onsite training with Excel users around the world, and the key lessons from those seminars costing thousands of dollars per day are now available within the pages of this easy-to-follow guide.

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management


Michael J.A. Berry - 1997
    Packed with more than forty percent new and updated material, this edition shows business managers, marketing analysts, and data mining specialists how to harness fundamental data mining methods and techniques to solve common types of business problems. Each chapter covers a new data mining technique, and then shows readers how to apply the technique for improved marketing, sales, and customer support. The authors build on their reputation for concise, clear, and practical explanations of complex concepts, making this book the perfect introduction to data mining. More advanced chapters cover such topics as how to prepare data for analysis and how to create the necessary infrastructure for data mining. Covers core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, clustering, and survival analysis.

T-SQL Fundamentals


Itzik Ben-Gan - 2016
    Itzik Ben-Gan explains key T-SQL concepts and helps you apply your knowledge with hands-on exercises. The book first introduces T-SQL's roots and underlying logic. Next, it walks you through core topics such as single-table queries, joins, subqueries, table expressions, and set operators. Then the book covers more-advanced data-query topics such as window functions, pivoting, and grouping sets. The book also explains how to modify data, work with temporal tables, and handle transactions, and provides an overview of programmable objects. Microsoft Data Platform MVP Itzik Ben-Gan shows you how to:
    - Review core SQL concepts and its mathematical roots
    - Create tables and enforce data integrity
    - Perform effective single-table queries by using the SELECT statement
    - Query multiple tables by using joins, subqueries, table expressions, and set operators
    - Use advanced query techniques such as window functions, pivoting, and grouping sets
    - Insert, update, delete, and merge data
    - Use transactions in a concurrent environment
    - Get started with programmable objects, from variables and batches to user-defined functions, stored procedures, triggers, and dynamic SQL

Building Machine Learning Systems with Python


Willi Richert - 2013
    

Data Science For Dummies


Lillian Pierson - 2014
    Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of their organization’s massive data sets and applying their findings to real-world business scenarios. From uncovering rich data sources to managing large amounts of data within hardware and software limitations, ensuring consistency in reporting, merging various data sources, and beyond, you’ll develop the know-how you need to effectively interpret data and tell a story that can be understood by anyone in your organization.
    - Provides a background in data science fundamentals before moving on to working with relational databases and unstructured data and preparing your data for analysis
    - Details different data visualization techniques that can be used to showcase and summarize your data
    - Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques
    - Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark
    It’s a big, big data world out there – let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.

Excel 2007 VBA Programming for Dummies


John Walkenbach - 1996
    Packed with plenty of sample programs, it explains how to work with range objects, control program flow, develop custom dialog boxes, create custom toolbars and menus, and much more. Discover how to:
    - Grasp essential programming concepts
    - Use the Visual Basic Editor
    - Navigate the new Excel user interface
    - Communicate with your users
    - Deal with errors and bugs

Data Visualisation: A Handbook for Data Driven Design


Andy Kirk - 2016
    Scholars and students need to be able to analyze, design and curate information into useful tools of communication, insight and understanding. This book is the starting point in learning the process and skills of data visualization, teaching the concepts and skills of how to present data and inspiring effective visual design. Benefits of this book:
    - A flexible step-by-step journey that equips you to achieve great data visualization
    - A curated collection of classic and contemporary examples, giving illustrations of good and bad practice
    - Examples on every page to give creative inspiration
    - Illustrations of good and bad practice show you how to critically evaluate and improve your own work
    - Advice and experience from the best designers in the field
    - Loads of online practical help, checklists, case studies and exercises make this the most comprehensive text available

Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations


Scott Berinato - 2016
    No longer. A new generation of tools and massive amounts of available data make it easy for anyone to create visualizations that communicate ideas far more effectively than generic spreadsheet charts ever could. What’s more, building good charts is quickly becoming a need-to-have skill for managers. If you’re not doing it, other managers are, and they’re getting noticed for it and getting credit for contributing to your company’s success. In Good Charts, dataviz maven Scott Berinato provides an essential guide to how visualization works and how to use this new language to impress and persuade. Dataviz today is where spreadsheets and word processors were in the early 1980s, on the cusp of changing how we work. Berinato lays out a system for thinking visually and building better charts through a process of talking, sketching, and prototyping. This book is much more than a set of static rules for making visualizations. It taps into both well-established and cutting-edge research in visual perception and neuroscience, as well as the emerging field of visualization science, to explore why good charts (and bad ones) create “feelings behind our eyes.” Along the way, Berinato also includes many engaging vignettes of dataviz pros, illustrating the ideas in practice. Good Charts will help you turn plain, uninspiring charts that merely present information into smart, effective visualizations that powerfully convey ideas.

Practical SQL: A Beginner's Guide to Storytelling with Data


Anthony DeBarros - 2022
    An approachable guide to programming in SQL (Structured Query Language) that will teach even beginning programmers how to build powerful databases and analyze data to find meaningful information. Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language) written by longtime professional journalist Anthony DeBarros. SQL is the primary tool that programmers, web developers, researchers, journalists, and others use to explore data in a database. DeBarros focuses on using SQL to find the story in data, with the aid of the popular open-source database PostgreSQL and the pgAdmin interface. This thoroughly revised second edition includes a new chapter describing how to set up PostgreSQL and more extensive discussion of pgAdmin's best features. The author has also added a chapter on the JSON data format that shows readers how to store and query JSON data. DeBarros has also updated the data in the book throughout, added coverage of additional topics, and perfected the book's examples. Readers love DeBarros's use of exercises and real-world examples that demonstrate how to:
    - Create databases and related tables using your own data
    - Correctly define data types
    - Aggregate, sort, and filter data to find patterns
    - Clean your data and transfer data as text files
    - Create advanced queries and automate tasks
    This book uses PostgreSQL, but the SQL syntax is applicable to many database applications, including Microsoft SQL Server and MySQL.

R in a Nutshell: A Desktop Quick Reference


Joseph Adler - 2009
    R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics. The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.
    - Understand the basics of the language, including the nature of R objects
    - Learn how to write R functions and build your own packages
    - Work with data through visualization, statistical analysis, and other methods
    - Explore the wealth of packages contributed by the R community
    - Become familiar with the lattice graphics package for high-level data visualization
    - Learn about bioinformatics packages provided by Bioconductor
    "I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."