Book picks similar to
Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks by Jonathan Schwabish
data
non-fiction
data-science
dataviz
Data Feminism
Catherine D’Ignazio - 2020
It has been used to expose injustice, improve health outcomes, and topple governments. But it has also been used to discriminate, police, and surveil. This potential for good, on the one hand, and harm, on the other, makes it essential to ask: Data science by whom? Data science for whom? Data science with whose interests in mind? The narratives around big data and data science are overwhelmingly white, male, and techno-heroic. In Data Feminism, Catherine D'Ignazio and Lauren Klein present a new way of thinking about data science and data ethics—one that is informed by intersectional feminist thought.Illustrating data feminism in action, D'Ignazio and Klein show how challenges to the male/female binary can help challenge other hierarchical (and empirically wrong) classification systems. They explain how, for example, an understanding of emotion can expand our ideas about effective data visualization, and how the concept of invisible labor can expose the significant human efforts required by our automated systems. And they show why the data never, ever “speak for themselves.”Data Feminism offers strategies for data scientists seeking to learn how feminism can help them work toward justice, and for feminists who want to focus their efforts on the growing field of data science. But Data Feminism is about much more than gender. It is about power, about who has it and who doesn't, and about how those differentials of power can be challenged and changed.
The Cartoon Guide to Statistics
Larry Gonick - 1993
Never again will you order the Poisson Distribution in a French restaurant!This updated version features all new material.
Automate the Boring Stuff with Python: Practical Programming for Total Beginners
Al Sweigart - 2014
But what if you could have your computer do them for you?In "Automate the Boring Stuff with Python," you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand no prior programming experience required. Once you've mastered the basics of programming, you'll create Python programs that effortlessly perform useful and impressive feats of automation to: Search for text in a file or across multiple filesCreate, update, move, and rename files and foldersSearch the Web and download online contentUpdate and format data in Excel spreadsheets of any sizeSplit, merge, watermark, and encrypt PDFsSend reminder emails and text notificationsFill out online formsStep-by-step instructions walk you through each program, and practice projects at the end of each chapter challenge you to improve those programs and use your newfound skills to automate similar tasks.Don't spend your time doing work a well-trained monkey could do. Even if you've never written a line of code, you can make your computer do the grunt work. Learn how in "Automate the Boring Stuff with Python.""
Designing Data-Intensive Applications
Martin Kleppmann - 2015
Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures
Probability Theory: The Logic of Science
E.T. Jaynes - 1999
It discusses new results, along with applications of probability theory to a variety of problems. The book contains many exercises and is suitable for use as a textbook on graduate-level courses involving data analysis. Aimed at readers already familiar with applied mathematics at an advanced undergraduate level or higher, it is of interest to scientists concerned with inference from incomplete information.
Think Stats
Allen B. Downey - 2011
This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python.You'll work with a case study throughout the book to help you learn the entire data analysis process—from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.Develop your understanding of probability and statistics by writing and testing codeRun experiments to test statistical behavior, such as generating samples from several distributionsUse simulations to understand concepts that are hard to grasp mathematicallyLearn topics not usually covered in an introductory course, such as Bayesian estimationImport data from almost any source using Python, rather than be limited to data that has been cleaned and formatted for statistics toolsUse statistical inference to answer questions about real-world data
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Trevor Hastie - 2001
With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.
Deep Learning
Ian Goodfellow - 2016
Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
Infographics
Jason Lankow - 2012
Visual content--such as infographics and data visualization--can accomplish this. With DIY functionality, Infographics: The Power of Visual Storytelling will teach you how to find stories in your data, and how to visually communicate and share them with your audience for maximum impact.Infographics will show you the vast potential to using the communication medium as a marketing tool by creating informative and shareable infographic content.Learn how to explain an object, idea, or process using strong illustration that captures interest and provides instant clarity Discover how to unlock interesting stories (in previously buried or boring data) and turn them into visual communications that will help build brands and increase sales Use the power of visual content to communicate with and engage your audience, capture attention, and expand your market.
Python for Data Analysis
Wes McKinney - 2011
It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical cases studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.Use the IPython interactive shell as your primary development environmentLearn basic and advanced NumPy (Numerical Python) featuresGet started with data analysis tools in the pandas libraryUse high-performance tools to load, clean, transform, merge, and reshape dataCreate scatter plots and static or interactive visualizations with matplotlibApply the pandas groupby facility to slice, dice, and summarize datasetsMeasure data by points in time, whether it's specific instances, fixed periods, or intervalsLearn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples
Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work
Q. Ethan McCallum - 2012
In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they’ve recovered from nasty data problems.From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it.Among the many topics covered, you’ll discover how to:Test drive your data to see if it’s ready for analysisWork spreadsheet data into a usable formHandle encoding problems that lurk in text dataDevelop a successful web-scraping effortUse NLP tools to reveal the real sentiment of online reviewsAddress cloud computing issues that can impact your analysis effortAvoid policies that create data analysis roadblocksTake a systematic approach to data quality analysis
Mining of Massive Datasets
Anand Rajaraman - 2011
This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
Ralph Kimball - 1996
Here is a complete library of dimensional modeling techniques-- the most comprehensive collection ever written. Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition to Ralph Kimball's classic guide is more than sixty percent updated.The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:* Retail sales and e-commerce* Inventory management* Procurement* Order management* Customer relationship management (CRM)* Human resources management* Accounting* Financial services* Telecommunications and utilities* Education* Transportation* Health care and insuranceBy the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.This book is also available as part of the Kimball's Data Warehouse Toolkit Classics Box Set (ISBN: 9780470479575) with the following 3 books:The Data Warehouse Toolkit, 2nd Edition (9780471200246)The Data Warehouse Lifecycle Toolkit, 2nd Edition (9780470149775)The Data Warehouse ETL Toolkit (9780764567575)
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
Cathy O'Neil - 2016
Increasingly, the decisions that affect our lives--where we go to school, whether we can get a job or a loan, how much we pay for health insurance--are being made not by humans, but by machines. In theory, this should lead to greater fairness: Everyone is judged according to the same rules.But as mathematician and data scientist Cathy O'Neil reveals, the mathematical models being used today are unregulated and uncontestable, even when they're wrong. Most troubling, they reinforce discrimination--propping up the lucky, punishing the downtrodden, and undermining our democracy in the process.
An Introduction to Database Systems
C.J. Date - 2003
This new edition has been rewritten and expanded to stay current with database system trends.