The Data Journalism Handbook


Jonathan Gray - 2012
    With The Data Journalism Handbook, you’ll explore the potential, limits, and applied uses of this new and fascinating field. This valuable handbook has attracted scores of contributors since the European Journalism Centre and the Open Knowledge Foundation launched the project at MozFest 2011. Through a collection of tips and techniques from leading journalists, professors, software developers, and data analysts, you’ll learn how data can be either the source of data journalism or a tool with which the story is told, or both.
    Examine the use of data journalism at the BBC, the Chicago Tribune, the Guardian, and other news organizations
    Explore in-depth case studies on elections, riots, school performance, and corruption
    Learn how to find data from the Web, through freedom of information laws, and by "crowd sourcing"
    Extract information from raw data with tips for working with numbers and statistics and using data visualization
    Deliver data through infographics, news apps, open data platforms, and download links

Theory of Games and Economic Behavior


John von Neumann - 1944
    What began more than sixty years ago as a modest proposal that a mathematician and an economist write a short paper together blossomed, in 1944, when Princeton University Press published Theory of Games and Economic Behavior. In it, John von Neumann and Oskar Morgenstern conceived a groundbreaking mathematical theory of economic and social organization, based on a theory of games of strategy. Not only would this revolutionize economics, but the entirely new field of scientific inquiry it yielded--game theory--has since been widely used to analyze a host of real-world phenomena from arms races to optimal policy choices of presidential candidates, from vaccination policy to major league baseball salary negotiations. And it is today established throughout both the social sciences and a wide range of other sciences. This sixtieth anniversary edition includes not only the original text but also an introduction by Harold Kuhn, an afterword by Ariel Rubinstein, and reviews and articles on the book that appeared at the time of its original publication in the New York Times, the American Economic Review, and a variety of other publications. Together, these writings provide readers a matchless opportunity to more fully appreciate a work whose influence will yet resound for generations to come.

Data Analytics Made Accessible


Anil Maheshwari - 2014
    It is a conversational book that feels easy and informative. This short and lucid book covers everything important, with concrete examples, and invites the reader to join this field. The chapters in the book are organized for a typical one-semester course. The book contains case-lets from real-world stories at the beginning of every chapter. There is a running case study across the chapters as exercises. This book is designed to provide a student with the intuition behind this evolving area, along with a solid toolset of the major data mining techniques and platforms. Students across a variety of academic disciplines, including business, computer science, statistics, engineering, and others are attracted to the idea of discovering new insights and ideas from data. This book can also be gainfully used by executives, managers, analysts, professors, doctors, accountants, and other professionals to learn how to make sense of the data coming their way. This is a lucid, flowing book that one can finish in one sitting or return to again and again for insights and techniques.
    Table of Contents
    Chapter 1: Wholeness of Business Intelligence and Data Mining
    Chapter 2: Business Intelligence Concepts & Applications
    Chapter 3: Data Warehousing
    Chapter 4: Data Mining
    Chapter 5: Decision Trees
    Chapter 6: Regression Models
    Chapter 7: Artificial Neural Networks
    Chapter 8: Cluster Analysis
    Chapter 9: Association Rule Mining
    Chapter 10: Text Mining
    Chapter 11: Web Mining
    Chapter 12: Big Data
    Chapter 13: Data Modeling Primer
    Appendix: Data Mining Tutorial using Weka

Grokking Algorithms An Illustrated Guide For Programmers and Other Curious People


Aditya Y. Bhargava - 2015
    The algorithms you'll use most often as a programmer have already been discovered, tested, and proven. If you want to take a hard pass on Knuth's brilliant but impenetrable theories and the dense multi-page proofs you'll find in most textbooks, this is the book for you. This fully-illustrated and engaging guide makes it easy for you to learn how to use algorithms effectively in your own programs. Grokking Algorithms is a disarming take on a core computer science topic. In it, you'll learn how to apply common algorithms to the practical problems you face in day-to-day life as a programmer. You'll start with problems like sorting and searching. As you build up your skills in thinking algorithmically, you'll tackle more complex concerns such as data compression or artificial intelligence. Whether you're writing business software, video games, mobile apps, or system utilities, you'll learn algorithmic techniques for solving problems that you thought were out of your grasp. For example, you'll be able to:
    Write a spell checker using graph algorithms
    Understand how data compression works using Huffman coding
    Identify problems that take too long to solve with naive algorithms, and attack them with algorithms that give you an approximate answer instead
    Each carefully-presented example includes helpful diagrams and fully-annotated code samples in Python. By the end of this book, you will know some of the most widely applicable algorithms as well as how and when to use them.

Designing Data-Intensive Applications


Martin Kleppmann - 2015
    Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
    Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
    Make informed decisions by identifying the strengths and weaknesses of different tools
    Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
    Understand the distributed systems research upon which modern databases are built
    Peek behind the scenes of major online services, and learn from their architectures

The Human Face of Big Data


Rick Smolan - 2012
    It enables us to sense, measure, and understand aspects of our existence in ways never before possible. The Human Face of Big Data captures, in glorious photographs and moving essays, an extraordinary revolution sweeping, almost invisibly, through business, academia, government, healthcare, and everyday life. It's already enabling us to provide a healthier life for our children. To provide our seniors with independence while keeping them safe. To help us conserve precious resources like water and energy. To alert us to tiny changes in our health, weeks or years before we develop a life-threatening illness. To peer into our own individual genetic makeup. To create new forms of life. And soon, as many predict, to re-engineer our own species. And we've barely scratched the surface . . . Over the past decade, Rick Smolan and Jennifer Erwitt, co-founders of Against All Odds Productions, have produced a series of ambitious global projects in collaboration with hundreds of the world's leading photographers, writers, and graphic designers. Their Day in the Life projects were credited with creating a mass market for large-format illustrated books (rare was the coffee table book without one). Today their projects aim at sparking global conversations about emerging topics ranging from the Internet (24 Hours in Cyberspace), to microprocessors (One Digital Day), to how the human race is learning to heal itself (The Power to Heal), to the global water crisis (Blue Planet Run). This year Smolan and Erwitt dispatched photographers and writers to every corner of the globe to explore the world of “Big Data” and to determine if it truly does, as many in the field claim, represent a brand new toolset for humanity, helping address the biggest challenges facing our species.
    The book features 10 essays by noted writers:
    Introduction: OCEANS OF DATA by Dan Gardner
    Chapter 1: REFLECTIONS IN A DIGITAL MIRROR by Juan Enriquez, CEO, Biotechonomy
    Chapter 2: OUR DATA OURSELVES by Kate Green, the Economist
    Chapter 3: QUANTIFYING MYSELF by AJ Jacobs, Esquire
    Chapter 4: DARK DATA by Marc Goodman, Future Crime Institute
    Chapter 5: THE SENTIENT SENSOR MESH by Susan Karlin, Fast Company
    Chapter 6: TAKING THE PULSE OF THE PLANET by Esther Dyson, EDventure
    Chapter 7: CITIZEN SCIENCE by Gareth Cook, the Boston Globe
    Chapter 8: A DEMOGRAPH OF ONE by Michael Malone, Forbes magazine
    Chapter 9: THE ART OF DATA by Aaron Koblin, Google Artist in Residence
    Chapter 10: DATA DRIVEN by Jonathan Harris, Cowbird
    The book will also feature stunning infographics from NIGEL HOLMES.
    1) GOOGLING GOOGLE: all the ways Google uses Data to help humanity
    2) DATA IS THE NEW OIL
    3) THE WORLD ACCORDING TO TWITTER
    4) AUCTIONING EYEBALLS: The world of Internet advertising
    5) FACEBOOK: A Billion Friends

Data Science at the Command Line: Facing the Future with Time-Tested Tools


Jeroen Janssens - 2014
    You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started--whether you're on Windows, OS X, or Linux--author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.
    Obtain data from websites, APIs, databases, and spreadsheets
    Perform scrub operations on plain text, CSV, HTML/XML, and JSON
    Explore data, compute descriptive statistics, and create visualizations
    Manage your data science workflow using Drake
    Create reusable tools from one-liners and existing Python or R code
    Parallelize and distribute data-intensive pipelines using GNU Parallel
    Model data with dimensionality reduction, clustering, regression, and classification algorithms

The Fractal Geometry of Nature


Benoît B. Mandelbrot - 1977
    The complexity of nature's shapes differs in kind, not merely degree, from that of the shapes of ordinary geometry, the geometry of fractal shapes. Now that the field has expanded greatly with many active researchers, Mandelbrot presents the definitive overview of the origins of his ideas and their new applications. The Fractal Geometry of Nature is based on his highly acclaimed earlier work, but has much broader and deeper coverage and more extensive illustrations.

Moneyball: The Art of Winning an Unfair Game


Michael Lewis - 2003
    Conventional wisdom long held that big name, highly athletic hitters and young pitchers with rocket arms were the ticket to success. But Beane and his staff, buoyed by massive amounts of carefully interpreted statistical data, believed that wins could be had by more affordable methods such as hitters with high on-base percentage and pitchers who get lots of ground outs. Given this information and a tight budget, Beane defied tradition and his own scouting department to build winning teams of young affordable players and inexpensive castoff veterans. Lewis was in the room with the A's top management as they spent the summer of 2002 adding and subtracting players and he provides outstanding play-by-play. In the June player draft, Beane acquired nearly every prospect he coveted (few of whom were coveted by other teams) and at the July trading deadline he engaged in a tense battle of nerves to acquire a lefty reliever. Besides being one of the most insider accounts ever written about baseball, Moneyball is populated with fascinating characters. We meet Jeremy Brown, an overweight college catcher who most teams project to be a 15th round draft pick (Beane takes him in the first). Sidearm pitcher Chad Bradford is plucked from the White Sox triple-A club to be a key set-up man and catcher Scott Hatteberg is rebuilt as a first baseman. But the most interesting character is Beane himself. A speedy athletic can't-miss prospect who somehow missed, Beane reinvents himself as a front-office guru, relying on players completely unlike, say, Billy Beane. Lewis, one of the top nonfiction writers of his era (Liar's Poker, The New New Thing), offers highly accessible explanations of baseball stats and his roadmap of Beane's economic approach makes Moneyball an appealing reading experience for business people and sports fans alike. --John Moe

Quantifying the User Experience: Practical Statistics for User Research


Jeff Sauro - 2012
    Many designers and researchers view usability and design as qualitative activities, which do not require attention to formulas and numbers. However, usability practitioners and user researchers are increasingly expected to quantify the benefits of their efforts. The impact of good and bad designs can be quantified in terms of conversions, completion rates, completion times, perceived satisfaction, recommendations, and sales. The book discusses ways to quantify user research; summarize data and compute margins of error; determine appropriate sample sizes; standardize usability questionnaires; and settle controversies in measurement and statistics. Each chapter concludes with a list of key points and references. Most chapters also include a set of problems and answers that enable readers to test their understanding of the material. This book is a valuable resource for those engaged in measuring the behavior and attitudes of people during their interaction with interfaces.

The Manga Guide to Statistics


Shin Takahashi - 2008
    With its unique combination of Japanese-style comics called manga and serious educational content, the EduManga format is already a hit in Japan. In The Manga Guide to Statistics, our heroine Rui is determined to learn about statistics to impress the dreamy Mr. Igarashi and begs her father for a tutor. Soon she's spending her Saturdays with geeky, bespectacled Mr. Yamamoto, who patiently teaches her all about the fundamentals of statistics: topics like data categorization, averages, graphing, and standard deviation. After all her studying, Rui is confident in her knowledge of statistics, including complex concepts like probability, coefficients of correlation, hypothesis tests, and tests of independence. But is it enough to impress her dream guy? Or maybe there's someone better, right in front of her? Reluctant statistics students of all ages will enjoy learning along with Rui in this charming, easy-to-read guide, which uses real-world examples like teen magazine quizzes, bowling games, test scores, and ramen noodle prices. Examples, exercises, and answer keys help you follow along and check your work. An appendix showing how to perform statistics calculations in Microsoft Excel makes it easy to put Rui's lessons into practice. This EduManga book is a translation from a bestselling series in Japan, co-published with Ohmsha, Ltd. of Tokyo, Japan.

Gödel, Escher, Bach: An Eternal Golden Braid


Douglas R. Hofstadter - 1979
    However, according to Hofstadter, the formal system that underlies all mental activity transcends the system that supports it. If life can grow out of the formal chemical substrate of the cell, if consciousness can emerge out of a formal system of firing neurons, then so too will computers attain human intelligence. Gödel, Escher, Bach is a wonderful exploration of fascinating ideas at the heart of cognitive science: meaning, reduction, recursion, and much more.

Understanding Cryptography: A Textbook For Students And Practitioners


Christof Paar - 2009
    Today's designers need a comprehensive understanding of applied cryptography. After an introduction to cryptography and data security, the authors explain the main techniques in modern cryptography, with chapters addressing stream ciphers, the Data Encryption Standard (DES) and 3DES, the Advanced Encryption Standard (AES), block ciphers, the RSA cryptosystem, public-key cryptosystems based on the discrete logarithm problem, elliptic-curve cryptography (ECC), digital signatures, hash functions, Message Authentication Codes (MACs), and methods for key establishment, including certificates and public-key infrastructure (PKI). Throughout the book, the authors focus on communicating the essentials and keeping the mathematics to a minimum, and they move quickly from explaining the foundations to describing practical implementations, including recent topics such as lightweight ciphers for RFIDs and mobile devices, and current key-length recommendations. The authors have considerable experience teaching applied cryptography to engineering and computer science students and to professionals, and they make extensive use of examples, problems, and chapter reviews, while the book's website offers slides, projects and links to further resources. This is a suitable textbook for graduate and advanced undergraduate courses and also for self-study by engineers.

Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work


Q. Ethan McCallum - 2012
    In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they’ve recovered from nasty data problems. From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it. Among the many topics covered, you’ll discover how to:
    Test drive your data to see if it’s ready for analysis
    Work spreadsheet data into a usable form
    Handle encoding problems that lurk in text data
    Develop a successful web-scraping effort
    Use NLP tools to reveal the real sentiment of online reviews
    Address cloud computing issues that can impact your analysis effort
    Avoid policies that create data analysis roadblocks
    Take a systematic approach to data quality analysis

Compilers: Principles, Techniques, and Tools


Alfred V. Aho - 1986
    The authors present updated coverage of compilers based on research and techniques that have been developed in the field over the past few years. The book provides a thorough introduction to compiler design and covers topics such as context-free grammars, finite state machines, and syntax-directed translation.