Data Science at the Command Line: Facing the Future with Time-Tested Tools


Jeroen Janssens - 2014
    You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.To get you started--whether you're on Windows, OS X, or Linux--author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.Obtain data from websites, APIs, databases, and spreadsheetsPerform scrub operations on plain text, CSV, HTML/XML, and JSONExplore data, compute descriptive statistics, and create visualizationsManage your data science workflow using DrakeCreate reusable tools from one-liners and existing Python or R codeParallelize and distribute data-intensive pipelines using GNU ParallelModel data with dimensionality reduction, clustering, regression, and classification algorithms

Mostly Harmless Econometrics: An Empiricist's Companion


Joshua D. Angrist - 2008
    In the modern experimentalist paradigm, these techniques address clear causal questions such as: Do smaller classes increase learning? Should wife batterers be arrested? How much does education raise wages? Mostly Harmless Econometrics shows how the basic tools of applied econometrics allow the data to speak.In addition to econometric essentials, Mostly Harmless Econometrics covers important new extensions--regression-discontinuity designs and quantile regression--as well as how to get standard errors right. Joshua Angrist and Jorn-Steffen Pischke explain why fancier econometric techniques are typically unnecessary and even dangerous. The applied econometric methods emphasized in this book are easy to use and relevant for many areas of contemporary social science.An irreverent review of econometric essentials A focus on tools that applied researchers use most Chapters on regression-discontinuity designs, quantile regression, and standard errors Many empirical examples A clear and concise resource with wide applications

The Society of Mind


Marvin Minsky - 1985
    Mirroring his theory, Minsky boldly casts The Society of Mind as an intellectual puzzle whose pieces are assembled along the way. Each chapter -- on a self-contained page -- corresponds to a piece in the puzzle. As the pages turn, a unified theory of the mind emerges, like a mosaic. Ingenious, amusing, and easy to read, The Society of Mind is an adventure in imagination.

ZooKeeper: Distributed process coordination


Flavio Junqueira - 2013
    This practical guide shows how Apache ZooKeeper helps you manage distributed systems, so you can focus mainly on application logic. Even with ZooKeeper, implementing coordination tasks is not trivial, but this book provides good practices to give you a head start, and points out caveats that developers and administrators alike need to watch for along the way.In three separate sections, ZooKeeper contributors Flavio Junqueira and Benjamin Reed introduce the principles of distributed systems, provide ZooKeeper programming techniques, and include the information you need to administer this service.Learn how ZooKeeper solves common coordination tasksExplore the ZooKeeper API’s Java and C implementations and how they differUse methods to track and react to ZooKeeper state changesHandle failures of the network, application processes, and ZooKeeper itselfLearn about ZooKeeper’s trickier aspects dealing with concurrency, ordering, and configurationUse the Curator high-level interface for connection managementBecome familiar with ZooKeeper internals and administration tools

Why Information Grows: The Evolution of Order, from Atoms to Economies


Cesar A. Hidalgo - 2015
    He believes that we should investigate what makes some countries more capable than others. Complex products—from films to robots, apps to automobiles—are a physical distillation of an economy’s knowledge, a measurable embodiment of its education, infrastructure, and capability. Economic wealth accrues when applications of this knowledge turn ideas into tangible products; the more complex its products, the more economic growth a country will experience.A radical new interpretation of global economics, Why Information Grows overturns traditional assumptions about the development of economies and the origins of wealth and takes a crucial step toward making economics less the dismal science and more the insightful one.

AI Ethics


Mark Coeckelbergh - 2020
    AI is also behind self-driving cars, predictive policing, and autonomous weapons that can kill without human intervention. These and other AI applications raise complex ethical issues that are the subject of ongoing debate. This volume in the MIT Press Essential Knowledge series offers an accessible synthesis of these issues. Written by a philosopher of technology, AI Ethics goes beyond the usual hype and nightmare scenarios to address concrete questions.Mark Coeckelbergh describes influential AI narratives, ranging from Frankenstein's monster to transhumanism and the technological singularity. He surveys relevant philosophical discussions: questions about the fundamental differences between humans and machines and debates over the moral status of AI. He explains the technology of AI, describing different approaches and focusing on machine learning and data science. He offers an overview of important ethical issues, including privacy concerns, responsibility and the delegation of decision making, transparency, and bias as it arises at all stages of data science processes. He also considers the future of work in an AI economy. Finally, he analyzes a range of policy proposals and discusses challenges for policymakers. He argues for ethical practices that embed values in design, translate democratic values into practices and include a vision of the good life and the good society.

Hadoop: The Definitive Guide


Tom White - 2009
    Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements


John R. Taylor - 1982
    It is designed as a reference for students in the physical sciences and engineering.

Advanced Analytics with Spark


Sandy Ryza - 2015
    

Thinking Statistically


Uri Bram - 2011
    Along the way we’ll learn how selection bias can explain why your boss doesn’t know he sucks (even when everyone else does); how to use Bayes’ Theorem to decide if your partner is cheating on you; and why Mark Zuckerberg should never be used as an example for anything. See the world in a whole new light, and make better decisions and judgements without ever going near a t-test. Think. Think Statistically.

Bit by Bit: Social Research in the Digital Age


Matthew J. Salganik - 2017
    In addition to changing how we live, these tools enable us to collect and process data about human behavior on a scale never before imaginable, offering entirely new approaches to core questions about social behavior. Bit by Bit is the key to unlocking these powerful methods--a landmark book that will fundamentally change how the next generation of social scientists and data scientists explores the world around us.Bit by Bit is the essential guide to mastering the key principles of doing social research in this fast-evolving digital age. In this comprehensive yet accessible book, Matthew Salganik explains how the digital revolution is transforming how social scientists observe behavior, ask questions, run experiments, and engage in mass collaborations. He provides a wealth of real-world examples throughout and also lays out a principles-based approach to handling ethical challenges.Bit by Bit is an invaluable resource for social scientists who want to harness the research potential of big data and a must-read for data scientists interested in applying the lessons of social science to tomorrow's technologies.Illustrates important ideas with examples of outstanding researchCombines ideas from social science and data science in an accessible style and without jargonGoes beyond the analysis of "found" data to discuss the collection of "designed" data such as surveys, experiments, and mass collaborationFeatures an entire chapter on ethicsIncludes extensive suggestions for further reading and activities for the classroom or self-study

The Demon in the Machine: How Hidden Webs of Information Are Solving the Mystery of Life


Paul C.W. Davies - 2019
    if you want to understand how the concept of life is changing, read this' Professor Andrew Briggs, University of OxfordWhen Darwin set out to explain the origin of species, he made no attempt to answer the deeper question: what is life? For generations, scientists have struggled to make sense of this fundamental question. Life really does look like magic: even a humble bacterium accomplishes things so dazzling that no human engineer can match it. And yet, huge advances in molecular biology over the past few decades have served only to deepen the mystery. So can life be explained by known physics and chemistry, or do we need something fundamentally new?In this penetrating and wide-ranging new analysis, world-renowned physicist and science communicator Paul Davies searches for answers in a field so new and fast-moving that it lacks a name, a domain where computing, chemistry, quantum physics and nanotechnology intersect. At the heart of these diverse fields, Davies explains, is the concept of information: a quantity with the power to unify biology with physics, transform technology and medicine, and even to illuminate the age-old question of whether we are alone in the universe.From life's murky origins to the microscopic engines that run the cells of our bodies, The Demon in the Machine is a breath-taking journey across the landscape of physics, biology, logic and computing. Weaving together cancer and consciousness, two-headed worms and bird navigation, Davies reveals how biological organisms garner and process information to conjure order out of chaos, opening a window on the secret of life itself.

Adventures of a Computational Explorer


Stephen Wolfram - 2019
    In this lively book of essays, Stephen Wolfram takes the reader along on some of his most surprising and engaging intellectual adventures in science, technology, artificial intelligence and language design.

Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design


Michael J. Hernandez - 1996
    You d be up to your neck in normal forms before you even had a chance to wade. When Michael J. Hernandez needed a database design book to teach mere mortals like himself, there were none. So he began a personal quest to learn enough to write one. And he did.Now in its Second Edition, Database Design for Mere Mortals is a miracle for today s generation of database users who don t have the background -- or the time -- to learn database design the hard way. It s also a secret pleasure for working pros who are occasionally still trying to figure out what they were taught.Drawing on 13 years of database teaching experience, Hernandez has organized database design into several key principles that are surprisingly easy to understand and remember. He illuminates those principles using examples that are generic enough to help you with virtually any application.Hernandez s goals are simple. You ll learn how to create a sound database structure as easily as possible. You ll learn how to optimize your structure for efficiency and data integrity. You ll learn how to avoid problems like missing, incorrect, mismatched, or inaccurate data. You ll learn how to relate tables together to make it possible to get whatever answers you need in the future -- even if you haven t thought of the questions yet.If -- as is often the case -- you already have a database, Hernandez explains how to analyze it -- and leverage it. You ll learn how to identify new information requirements, determine new business rules that need to be applied, and apply them.Hernandez starts with an introduction to databases, relational databases, and the idea and objectives of database design. Next, you ll walk through the key elements of the database design process: establishing table structures and relationships, assigning primary keys, setting field specifications, and setting up views. Hernandez s extensive coverage of data integrity includes a full chapter on establishing business rules and using validation tables.Hernandez surveys bad design techniques in a chapter on what not to do -- and finally, helps you identify those rare instances when it makes sense to bend or even break the conventional rules of database design.There s plenty that s new in this edition. Hernandez has gone over his text and illustrations with a fine-tooth comb to improve their already impressive clarity. You ll find updates to reflect new advances in technology, including web database applications. There are expanded and improved discussions of nulls and many-to-many relationships; multivalued fields; primary keys; and SQL data type fields. There s a new Quick Reference database design flowchart. A new glossary. New review questions at the end of every chapter.Finally, it s worth mentioning what this book isn t. It isn t a guide to any specific database platform -- so you can use it whether you re running Access, SQL Server, or Oracle, MySQL or PostgreSQL. And it isn t an SQL guide. (If that s what you need, Michael J. Hernandez has also coauthored the superb SQL Queries for Mere Mortals). But if database design is what you need to learn, this book s worth its weight in gold. Bill CamardaBill Camarda is a consultant, writer, and web/multimedia content developer. His 15 books include Special Edition Using Word 2000 and Upgrading & Fixing Networks for Dummies, Second Edition.

Successful Business Intelligence: Secrets to Making BI a Killer App


Cindi Howson - 2007
    Learn about the components of a BI architecture, how to choose the appropriate tools and technologies, and how to roll out a BI strategy throughout the organisation.