Hadoop: The Definitive Guide


Tom White - 2009
    Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

How to Lie with Statistics


Darrell Huff - 1954
    Darrell Huff runs the gamut of every popularly used type of statistic, probes such things as the sample study, the tabulation method, the interview technique, or the way the results are derived from the figures, and points up the countless number of dodges which are used to fool rather than to inform.

R Programming for Data Science


Roger D. Peng - 2015
    

The C++ Programming Language


Bjarne Stroustrup - 1986
    For this special hardcover edition, two new appendixes on locales and standard library exception safety (also available at www.research.att.com/ bs/) have been added. The result is complete, authoritative coverage of the C++ language, its standard library, and key design techniques. Based on the ANSI/ISO C++ standard, The C++ Programming Language provides current and comprehensive coverage of all C++ language features and standard library components. For example:abstract classes as interfaces class hierarchies for object-oriented programming templates as the basis for type-safe generic software exceptions for regular error handling namespaces for modularity in large-scale software run-time type identification for loosely coupled systems the C subset of C++ for C compatibility and system-level work standard containers and algorithms standard strings, I/O streams, and numerics C compatibility, internationalization, and exception safety Bjarne Stroustrup makes C++ even more accessible to those new to the language, while adding advanced information and techniques that even expert C++ programmers will find invaluable.

Calling Bullshit: The Art of Skepticism in a Data-Driven World


Carl T. Bergstrom - 2020
    Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.It's increasingly difficult to know what's true. Misinformation, disinformation, and fake news abound. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don't feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data.You don't need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit.We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.

Decision Trees and Random Forests: A Visual Introduction For Beginners: A Simple Guide to Machine Learning with Decision Trees


Chris Smith - 2017
     They are also used in countless industries such as medicine, manufacturing and finance to help companies make better decisions and reduce risk. Whether coded or scratched out by hand, both algorithms are powerful tools that can make a significant impact. This book is a visual introduction for beginners that unpacks the fundamentals of decision trees and random forests. If you want to dig into the basics with a visual twist plus create your own machine learning algorithms in Python, this book is for you.

Prediction Machines: The Simple Economics of Artificial Intelligence


Ajay Agrawal - 2018
    But facing the sea change that AI will bring can be paralyzing. How should companies set strategies, governments design policies, and people plan their lives for a world so different from what we know? In the face of such uncertainty, many analysts either cower in fear or predict an impossibly sunny future.But in Prediction Machines, three eminent economists recast the rise of AI as a drop in the cost of prediction. With this single, masterful stroke, they lift the curtain on the AI-is-magic hype and show how basic tools from economics provide clarity about the AI revolution and a basis for action by CEOs, managers, policy makers, investors, and entrepreneurs.When AI is framed as cheap prediction, its extraordinary potential becomes clear: Prediction is at the heart of making decisions under uncertainty. Our businesses and personal lives are riddled with such decisions. Prediction tools increase productivity--operating machines, handling documents, communicating with customers. Uncertainty constrains strategy. Better prediction creates opportunities for new business structures and strategies to compete. Penetrating, fun, and always insightful and practical, Prediction Machines follows its inescapable logic to explain how to navigate the changes on the horizon. The impact of AI will be profound, but the economic framework for understanding it is surprisingly simple.

Mastering Regular Expressions


Jeffrey E.F. Friedl - 1997
    They are now standard features in a wide range of languages and popular tools, including Perl, Python, Ruby, Java, VB.NET and C# (and any language using the .NET Framework), PHP, and MySQL.If you don't use regular expressions yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you'll appreciate this book's unprecedented detail and breadth of coverage. If you think you know all you need to know about regularexpressions, this book is a stunning eye-opener.As this book shows, a command of regular expressions is an invaluable skill. Regular expressions allow you to code complex and subtle text processing that you never imagined could be automated. Regular expressions can save you time and aggravation. They can be used to craft elegant solutions to a wide range of problems. Once you've mastered regular expressions, they'll become an invaluable part of your toolkit. You will wonder how you ever got by without them.Yet despite their wide availability, flexibility, and unparalleled power, regular expressions are frequently underutilized. Yet what is power in the hands of an expert can be fraught with peril for the unwary. Mastering Regular Expressions will help you navigate the minefield to becoming an expert and help you optimize your use of regular expressions.Mastering Regular Expressions, Third Edition, now includes a full chapter devoted to PHP and its powerful and expressive suite of regular expression functions, in addition to enhanced PHP coverage in the central "core" chapters. Furthermore, this edition has been updated throughout to reflect advances in other languages, including expanded in-depth coverage of Sun's java.util.regex package, which has emerged as the standard Java regex implementation.Topics include:A comparison of features among different versions of many languages and toolsHow the regular expression engine worksOptimization (major savings available here!)Matching just what you want, but not what you don't wantSections and chapters on individual languagesWritten in the lucid, entertaining tone that makes a complex, dry topic become crystal-clear to programmers, and sprinkled with solutions to complex real-world problems, Mastering Regular Expressions, Third Edition offers a wealth information that you can put to immediateuse.Reviews of this new edition and the second edition: "There isn't a better (or more useful) book available on regular expressions."--Zak Greant, Managing Director, eZ Systems"A real tour-de-force of a book which not only covers the mechanics of regexes in extraordinary detail but also talks about efficiency and the use of regexes in Perl, Java, and .NET...If you use regular expressions as part of your professional work (even if you already have a good book on whatever language you're programming in) I would strongly recommend this book to you."--Dr. Chris Brown, Linux Format"The author does an outstanding job leading the reader from regexnovice to master. The book is extremely easy to read and chock full ofuseful and relevant examples...Regular expressions are valuable toolsthat every developer should have in their toolbox. Mastering RegularExpressions is the definitive guide to the subject, and an outstandingresource that belongs on every programmer's bookshelf. Ten out of TenHorseshoes."--Jason Menard, Java Ranch

Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures


Claus O. Wilke - 2019
    But with the increasing power of visualization software today, scientists, engineers, and business analysts often have to navigate a bewildering array of visualization choices and options.This practical book takes you through many commonly encountered visualization problems, and it provides guidelines on how to turn large datasets into clear and compelling figures. What visualization type is best for the story you want to tell? How do you make informative figures that are visually pleasing? Author Claus O. Wilke teaches you the elements most critical to successful data visualization.Explore the basic concepts of color as a tool to highlight, distinguish, or represent a valueUnderstand the importance of redundant coding to ensure you provide key information in multiple waysUse the book's visualizations directory, a graphical guide to commonly used types of data visualizationsGet extensive examples of good and bad figuresLearn how to use figures in a document or report and how employ them effectively to tell a compelling story

The Art of Readable Code


Dustin Boswell - 2010
    Over the past five years, authors Dustin Boswell and Trevor Foucher have analyzed hundreds of examples of "bad code" (much of it their own) to determine why they’re bad and how they could be improved. Their conclusion? You need to write code that minimizes the time it would take someone else to understand it—even if that someone else is you.This book focuses on basic principles and practical techniques you can apply every time you write code. Using easy-to-digest code examples from different languages, each chapter dives into a different aspect of coding, and demonstrates how you can make your code easy to understand.Simplify naming, commenting, and formatting with tips that apply to every line of codeRefine your program’s loops, logic, and variables to reduce complexity and confusionAttack problems at the function level, such as reorganizing blocks of code to do one task at a timeWrite effective test code that is thorough and concise—as well as readable"Being aware of how the code you create affects those who look at it later is an important part of developing software. The authors did a great job in taking you through the different aspects of this challenge, explaining the details with instructive examples." —Michael Hunger, passionate Software Developer

Data Visualization: A Practical Introduction


Kieran Healy - 2018
    It explains what makes some graphs succeed while others fail, how to make high-quality figures from data using powerful and reproducible methods, and how to think about data visualization in an honest and effective way.Data Visualization builds the reader's expertise in ggplot2, a versatile visualization library for the R programming language. Through a series of worked examples, this accessible primer then demonstrates how to create plots piece by piece, beginning with summaries of single variables and moving on to more complex graphics. Topics include plotting continuous and categorical variables; layering information on graphics; producing effective "small multiple" plots; grouping, summarizing, and transforming data for plotting; creating maps; working with the output of statistical models; and refining plots to make them more comprehensible.Effective graphics are essential to communicating ideas and a great way to better understand data. This book provides the practical skills students and practitioners need to visualize quantitative data and get the most out of their research findings.Provides hands-on instruction using R and ggplot2Shows how the "tidyverse" of data analysis tools makes working with R easier and more consistentIncludes a library of data sets, code, and functions

Make Your Own Neural Network


Tariq Rashid - 2016
     Neural networks are a key element of deep learning and artificial intelligence, which today is capable of some truly impressive feats. Yet too few really understand how neural networks actually work. This guide will take you on a fun and unhurried journey, starting from very simple ideas, and gradually building up an understanding of how neural networks work. You won't need any mathematics beyond secondary school, and an accessible introduction to calculus is also included. The ambition of this guide is to make neural networks as accessible as possible to as many readers as possible - there are enough texts for advanced readers already! You'll learn to code in Python and make your own neural network, teaching it to recognise human handwritten numbers, and performing as well as professionally developed networks. Part 1 is about ideas. We introduce the mathematical ideas underlying the neural networks, gently with lots of illustrations and examples. Part 2 is practical. We introduce the popular and easy to learn Python programming language, and gradually builds up a neural network which can learn to recognise human handwritten numbers, easily getting it to perform as well as networks made by professionals. Part 3 extends these ideas further. We push the performance of our neural network to an industry leading 98% using only simple ideas and code, test the network on your own handwriting, take a privileged peek inside the mysterious mind of a neural network, and even get it all working on a Raspberry Pi. All the code in this has been tested to work on a Raspberry Pi Zero.

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are


Seth Stephens-Davidowitz - 2017
    This staggering amount of information—unprecedented in history—can tell us a great deal about who we are—the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that less than twenty years ago, seemed unfathomable.Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school effect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential—revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health—both emotional and physical. All of us are touched by big data everyday, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world.

Thinking in Java


Bruce Eckel - 1998
    The author's take on the essence of Java as a new programming language and the thorough introduction to Java's features make this a worthwhile tutorial. Thinking in Java begins a little esoterically, with the author's reflections on why Java is new and better. (This book's choice of font for chapter headings is remarkably hard on the eyes.) The author outlines his thoughts on why Java will make you a better programmer, without all the complexity. The book is better when he presents actual language features. There's a tutorial to basic Java types, keywords, and operators. The guide includes extensive source code that is sometimes daunting (as with the author's sample code for all the Java operators in one listing.) As such, this text will be most useful for the experienced developer. The text then moves on to class design issues, when to use inheritance and composition, and related topics of information hiding and polymorphism. (The treatment of inner classes and scoping will likely seem a bit overdone for most readers.) The chapter on Java collection classes for both Java Developer's Kit (JDK) 1.1 and the new classes, such as sets, lists, and maps, are much better. There's material in this chapter that you are unlikely to find anywhere else. Chapters on exception handling and programming with type information are also worthwhile, as are the chapters on the new Swing interface classes and network programming. Although it adopts somewhat of a mixed-bag approach, Thinking in Java contains some excellent material for the object-oriented developer who wants to see what all the fuss is about with Java.

A New Kind of Science


Stephen Wolfram - 1997
    Wolfram lets the world see his work in A New Kind of Science, a gorgeous, 1,280-page tome more than a decade in the making. With patience, insight, and self-confidence to spare, Wolfram outlines a fundamental new way of modeling complex systems. On the frontier of complexity science since he was a boy, Wolfram is a champion of cellular automata--256 "programs" governed by simple nonmathematical rules. He points out that even the most complex equations fail to accurately model biological systems, but the simplest cellular automata can produce results straight out of nature--tree branches, stream eddies, and leopard spots, for instance. The graphics in A New Kind of Science show striking resemblance to the patterns we see in nature every day. Wolfram wrote the book in a distinct style meant to make it easy to read, even for nontechies; a basic familiarity with logic is helpful but not essential. Readers will find themselves swept away by the elegant simplicity of Wolfram's ideas and the accidental artistry of the cellular automaton models. Whether or not Wolfram's revolution ultimately gives us the keys to the universe, his new science is absolutely awe-inspiring. --Therese Littleton