Book picks similar to
Introduction to Data Science by Rafael A. Irizarry
data-science
data-analytics
data-science-ethics-analytics-viz
partially-read
Data Science at the Command Line: Facing the Future with Time-Tested Tools
Jeroen Janssens - 2014
You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.To get you started--whether you're on Windows, OS X, or Linux--author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.Obtain data from websites, APIs, databases, and spreadsheetsPerform scrub operations on plain text, CSV, HTML/XML, and JSONExplore data, compute descriptive statistics, and create visualizationsManage your data science workflow using DrakeCreate reusable tools from one-liners and existing Python or R codeParallelize and distribute data-intensive pipelines using GNU ParallelModel data with dimensionality reduction, clustering, regression, and classification algorithms
How Charts Lie: Getting Smarter about Visual Information
Alberto Cairo - 2019
While such visualizations can better inform us, they can also deceive by displaying incomplete or inaccurate data, suggesting misleading patterns—or simply misinform us by being poorly designed, such as the confusing “eye of the storm” maps shown on TV every hurricane season.Many of us are ill equipped to interpret the visuals that politicians, journalists, advertisers, and even employers present each day, enabling bad actors to easily manipulate visuals to promote their own agendas. Public conversations are increasingly driven by numbers, and to make sense of them we must be able to decode and use visual information. By examining contemporary examples ranging from election-result infographics to global GDP maps and box-office record charts, How Charts Lie teaches us how to do just that.
Computer Age Statistical Inference: Algorithms, Evidence, and Data Science
Bradley Efron - 2016
'Big data', 'data science', and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.
The Linux Command Line
William E. Shotts Jr. - 2012
Available here:readmeaway.com/download?i=1593279523The Linux Command Line, 2nd Edition: A Complete Introduction PDF by William ShottsRead The Linux Command Line, 2nd Edition: A Complete Introduction PDF from No Starch Press,William ShottsDownload William Shotts’s PDF E-book The Linux Command Line, 2nd Edition: A Complete Introduction
Tableau Your Data!: Fast and Easy Visual Analysis with Tableau Software
Dan Murray - 2013
It illustrates little-known features and techniques for getting the most from the Tableau toolset, supporting the needs of the business analysts who use the product as well as the data and IT managers who support it.This comprehensive guide covers the core feature set for data analytics, illustrating best practices for creating and sharing specific types of dynamic data visualizations. Featuring a helpful full-color layout, the book covers analyzing data with Tableau Desktop, sharing information with Tableau Server, understanding Tableau functions and calculations, and Use Cases for Tableau Software.Includes little-known, as well as more advanced features and techniques, using detailed, real-world case studies that the author has developed as part of his consulting and training practice Explains why and how Tableau differs from traditional business information analysis tools Shows you how to deploy dashboards and visualizations throughout the enterprise Provides a detailed reference resource that is aimed at users of all skill levels Depicts ways to leverage Tableau across the value chain in the enterprise through case studies that target common business requirements Endorsed by Tableau Software Tableau Your Data shows you how to build dynamic, best-of-breed visualizations using the Tableau Software toolset.
Grokking Simplicity: Taming complex software with functional thinking
Eric Normand - 2019
Grokking Simplicity is a friendly, practical guide that will change the way you approach software design and development. It introduces a unique approach to functional programming that explains why certain features of software are prone to complexity, and teaches you the functional techniques you can use to simplify these systems so that they’re easier to test and debug.
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
Eric Redmond - 2012
As a modern application developer you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core each technology. Redis, Neo4J, CouchDB, MongoDB, HBase, Riak and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases-relational, key/value, columnar, document and graph-and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once.Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.What You Need: To get the most of of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OSX or Linux preferred, Windows users will need Cygwin), and Java 6 (or greater) and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.
Big Data: A Revolution That Will Transform How We Live, Work, and Think
Viktor Mayer-Schönberger - 2013
“Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it. This emerging science can translate myriad phenomena—from the price of airline tickets to the text of millions of books—into searchable form, and uses our increasing computing power to unearth epiphanies that we never could have seen before. A revolution on par with the Internet or perhaps even the printing press, big data will change the way we think about business, health, politics, education, and innovation in the years to come. It also poses fresh threats, from the inevitable end of privacy as we know it to the prospect of being penalized for things we haven’t even done yet, based on big data’s ability to predict our future behavior.In this brilliantly clear, often surprising work, two leading experts explain what big data is, how it will change our lives, and what we can do to protect ourselves from its hazards. Big Data is the first big book about the next big thing.www.big-data-book.com
Applied Predictive Modeling
Max Kuhn - 2013
Non- mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. Dr. Kuhn is a Director of Non-Clinical Statistics at Pfizer Global R&D in Groton Connecticut. He has been applying predictive models in the pharmaceutical and diagnostic industries for over 15 years and is the author of a number of R packages. Dr. Johnson has more than a decade of statistical consulting and predictive modeling experience in pharmaceutical research and development. He is a co-founder of Arbor Analytics, a firm specializing in predictive modeling and is a former Director of Statistics at Pfizer Global R&D. His scholarly work centers on the application and development of statistical methodology and learning algorithms. Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. Addressing practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance-all of which are problems that occur frequently in practice. The text illustrates all parts of the modeling process through many hands-on, real-life examples. And every chapter contains extensive R code f
R for Dummies
Joris Meys - 2012
R is packed with powerful programming capabilities, but learning to use R in the real world can be overwhelming for even the most seasoned statisticians. This easy-to-follow guide explains how to use R for data processing and statistical analysis, and then, shows you how to present your data using compelling and informative graphics. You'll gain practical experience using R in a variety of settings and delve deeper into R's feature-rich toolset.Includes tips for the initial installation of RDemonstrates how to easily perform calculations on vectors, arrays, and lists of dataShows how to effectively visualize data using R's powerful graphics packagesGives pointers on how to find, install, and use add-on packages created by the R communityProvides tips on getting additional help from R mailing lists and websitesWhether you're just starting out with statistical analysis or are a procedural programming pro, "R For Dummies" is the book you need to get the most out of R.
Concrete Mathematics: A Foundation for Computer Science
Ronald L. Graham - 1988
"More concretely," the authors explain, "it is the controlled manipulation of mathematical formulas, using a collection of techniques for solving problems."
Cracking the Coding Interview: 150 Programming Questions and Solutions
Gayle Laakmann McDowell - 2008
This is a deeply technical book and focuses on the software engineering skills to ace your interview. The book is over 500 pages and includes 150 programming interview questions and answers, as well as other advice.The full list of topics are as follows:The Interview ProcessThis section offers an overview on questions are selected and how you will be evaluated. What happens when you get a question wrong? When should you start preparing, and how? What language should you use? All these questions and more are answered.Behind the ScenesLearn what happens behind the scenes during your interview, how decisions really get made, who you interview with, and what they ask you. Companies covered include Google, Amazon, Yahoo, Microsoft, Apple and Facebook.Special SituationsThis section explains the process for experience candidates, Program Managers, Dev Managers, Testers / SDETs, and more. Learn what your interviewers are looking for and how much code you need to know.Before the InterviewIn order to ace the interview, you first need to get an interview. This section describes what a software engineer's resume should look like and what you should be doing well before your interview.Behavioral PreparationAlthough most of a software engineering interview will be technical, behavioral questions matter too. This section covers how to prepare for behavioral questions and how to give strong, structured responses.Technical Questions (+ 5 Algorithm Approaches)This section covers how to prepare for technical questions (without wasting your time) and teaches actionable ways to solve the trickiest algorithm problems. It also teaches you what exactly "good coding" is when it comes to an interview.150 Programming Questions and AnswersThis section forms the bulk of the book. Each section opens with a discussion of the core knowledge and strategies to tackle this type of question, diving into exactly how you break down and solve it. Topics covered include• Arrays and Strings• Linked Lists• Stacks and Queues• Trees and Graphs• Bit Manipulation• Brain Teasers• Mathematics and Probability• Object-Oriented Design• Recursion and Dynamic Programming• Sorting and Searching• Scalability and Memory Limits• Testing• C and C++• Java• Databases• Threads and LocksFor the widest degree of readability, the solutions are almost entirely written with Java (with the exception of C / C++ questions). A link is provided with the book so that you can download, compile, and play with the solutions yourself.Changes from the Fourth Edition: The fifth edition includes over 200 pages of new content, bringing the book from 300 pages to over 500 pages. Major revisions were done to almost every solution, including a number of alternate solutions added. The introductory chapters were massively expanded, as were the opening of each of the chapters under Technical Questions. In addition, 24 new questions were added.Cracking the Coding Interview, Fifth Edition is the most expansive, detailed guide on how to ace your software development / programming interviews.
Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists
Philipp K. Janert - 2010
With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.Use graphics to describe data with one, two, or dozens of variablesDevelop conceptual models using back-of-the-envelope calculations, as well asscaling and probability argumentsMine data with computationally intensive methods such as simulation and clusteringMake your conclusions understandable through reports, dashboards, and other metrics programsUnderstand financial calculations, including the time-value of moneyUse dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situationsBecome familiar with different open source programming environments for data analysisFinally, a concise reference for understanding how to conquer piles of data.--Austin King, Senior Web Developer, MozillaAn indispensable text for aspiring data scientists.--Michael E. Driscoll, CEO/Founder, Dataspora
Designing Data-Intensive Applications
Martin Kleppmann - 2015
Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures
iOS Programming: The Big Nerd Ranch Guide (Big Nerd Ranch Guides)
Christian Keur - 2015
After completing this book, you will have the know-how and the confidence you need to tackle iOS projects of your own. Based on Big Nerd Ranch's popular iOS Bootcamp course and its well-tested materials and methodology, this bestselling guide teaches iOS concepts and coding in tandem. The result is instruction that is relevant and useful.Throughout the book, the authors explain what's important and share their insights into the larger context of the iOS platform. You get a real understanding of how iOS development works, the many features that are available, and when and where to apply what you've learned.