Book picks similar to
Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work by Q. Ethan McCallum
data-science
big-data
programming
data
Python for Informatics: Exploring Information: Exploring Information
Charles Severance - 2002
You can think of Python as your tool to solve problems that are far beyond the capability of a spreadsheet. It is an easy-to-use and easy-to learn programming language that is freely available on Windows, Macintosh, and Linux computers. There are free downloadable copies of this book in various electronic formats and a self-paced free online course where you can explore the course materials. All the supporting materials for the book are available under open and remixable licenses. This book is designed to teach people to program even if they have no prior experience.
Build Awesome Command-Line Applications in Ruby: Control Your Computer, Simplify Your Life
David B. Copeland - 2012
With its simple commands, flags, and parameters, a well-formed command-line application is the quickest way to automate a backup, a build, or a deployment and simplify your life. As Ruby pro David Copeland explains, writing a command-line application that is self-documenting, robust, adaptable and forever useful is easier than you might think. Ruby is particularly suited to this task, since it combines high-level abstractions with "close to the metal" system interaction wrapped up in a concise, readable syntax. Moreover, Ruby has the support of a rich ecosystem of open-source tools and libraries. Ten insightful chapters each explain and demonstrate a command-line best practice. You'll see how to use these tools to elevate the lowliest automation script to a maintainable, polished application. You'll learn how to use free, open source parsers to create user-friendly command-line interfaces as well as command suites. You'll see how to use defaults to keep options simple for everyday users, while giving advanced users options for more complex tasks. There's no reason a command-line application should lack documentation, whether it's part of a help command or a man page; you'll find out when and how to use both. Your journey from command-line novice to pro ends with a look at valuable approaches to testing your apps, and includes some fun techniques for outside-the-box, colorful interfaces that will delight your users. With Ruby, the command line is not dead. Long live the command line.What You Need: All you'll need is Ruby, and the ability to install a few gems along the way. Examples written for Ruby 1.9.2, but 1.8.7 should work just as well.
Redis in Action
Josiah L. Carlson - 2013
You'll begin by getting Redis set up properly and then exploring the key-value model. Then, you'll dive into real use cases including simple caching, distributed ad targeting, and more. You'll learn how to scale Redis from small jobs to massive datasets. Experienced developers will appreciate chapters on clustering and internal scripting to make Redis easier to use.About the TechnologyWhen you need near-real-time access to a fast-moving data stream, key-value stores like Redis are the way to go. Redis expands on the key-value pattern by accepting a wide variety of data types, including hashes, strings, lists, and other structures. It provides lightning-fast operations on in-memory datasets, and also makes it easy to persist to disk on the fly. Plus, it's free and open source.About this bookRedis in Action introduces Redis and the key-value model. You'll quickly dive into real use cases including simple caching, distributed ad targeting, and more. You'll learn how to scale Redis from small jobs to massive datasets and discover how to integrate with traditional RDBMS or other NoSQL stores. Experienced developers will appreciate the in-depth chapters on clustering and internal scripting.Written for developers familiar with database concepts. No prior exposure to NoSQL database concepts nor to Redis itself is required. Appropriate for systems administrators comfortable with programming.Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.What's InsideRedis from the ground upPreprocessing real-time dataManaging in-memory datasetsPub/sub and configurationPersisting to diskAbout the AuthorDr. Josiah L. Carlson is a seasoned database professional and an active contributor to the Redis community.Table of ContentsPART 1 GETTING STARTEDGetting to know RedisAnatomy of a Redis web applicationPART 2 CORE CONCEPTSCommands in RedisKeeping data safe and ensuring performanceUsing Redis for application supportApplication components in RedisSearch-based applicationsBuilding a simple social networkPART 3 NEXT STEPSReducing memory useScaling RedisScripting Redis with Lua
Thinking with Data
Max Shron - 2014
In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills.Thinking with Data helps you learn techniques for turning data into knowledge you can use. You’ll learn a framework for defining your project, including the data you want to collect, and how you intend to approach, organize, and analyze the results. You’ll also learn patterns of reasoning that will help you unveil the real problem that needs to be solved.Learn a framework for scoping data projectsUnderstand how to pin down the details of an idea, receive feedback, and begin prototypingUse the tools of arguments to ask good questions, build projects in stages, and communicate resultsExplore data-specific patterns of reasoning and learn how to build more useful argumentsDelve into causal reasoning and learn how it permeates data workPut everything together, using extended examples to see the method of full problem thinking in action
Artificial Intelligence: A Guide for Thinking Humans
Melanie Mitchell - 2019
The award-winning author Melanie Mitchell, a leading computer scientist, now reveals AI’s turbulent history and the recent spate of apparent successes, grand hopes, and emerging fears surrounding it.In Artificial Intelligence, Mitchell turns to the most urgent questions concerning AI today: How intelligent—really—are the best AI programs? How do they work? What can they actually do, and when do they fail? How humanlike do we expect them to become, and how soon do we need to worry about them surpassing us? Along the way, she introduces the dominant models of modern AI and machine learning, describing cutting-edge AI programs, their human inventors, and the historical lines of thought underpinning recent achievements. She meets with fellow experts such as Douglas Hofstadter, the cognitive scientist and Pulitzer Prize–winning author of the modern classic Gödel, Escher, Bach, who explains why he is “terrified” about the future of AI. She explores the profound disconnect between the hype and the actual achievements in AI, providing a clear sense of what the field has accomplished and how much further it has to go.Interweaving stories about the science of AI and the people behind it, Artificial Intelligence brims with clear-sighted, captivating, and accessible accounts of the most interesting and provocative modern work in the field, flavored with Mitchell’s humor and personal observations. This frank, lively book is an indispensable guide to understanding today’s AI, its quest for “human-level” intelligence, and its impact on the future for us all.
High Performance MySQL: Optimization, Backups, and Replication
Baron Schwartz - 2008
This guide also teaches you safe and practical ways to scale applications through replication, load balancing, high availability, and failover.
Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.
Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and views
Implement improvements in replication, high availability, and clustering
Achieve high performance when running MySQL in the cloud
Optimize advanced querying features, such as full-text searches
Take advantage of modern multi-core CPUs and solid-state disks
Explore backup and recovery strategies—including new tools for hot online backups
Quantum Computing Since Democritus
Scott Aaronson - 2013
Full of insights, arguments and philosophical perspectives, the book covers an amazing array of topics. Beginning in antiquity with Democritus, it progresses through logic and set theory, computability and complexity theory, quantum computing, cryptography, the information content of quantum states and the interpretation of quantum mechanics. There are also extended discussions about time travel, Newcomb's Paradox, the anthropic principle and the views of Roger Penrose. Aaronson's informal style makes this fascinating book accessible to readers with scientific backgrounds, as well as students and researchers working in physics, computer science, mathematics and philosophy.
Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine
Clinton Gormley - 2014
This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships.If you're a newcomer to both search and distributed systems, you'll quickly learn how to integrate Elasticsearch into your application. More experienced users will pick up lots of advanced techniques. Throughout the book, you'll follow a problem-based approach to learn why, when, and how to use Elasticsearch features.Understand how Elasticsearch interprets data in your documentsIndex and query your data to take advantage of search concepts such as relevance and word proximityHandle human language through the effective use of analyzers and queriesSummarize and group data to show overall trends, with aggregations and analyticsUse geo-points and geo-shapes--Elasticsearch's approaches to geolocationModel your data to take advantage of Elasticsearch's horizontal scalabilityLearn how to configure and monitor your cluster in production
Programming Pearls
Jon L. Bentley - 1986
Jon has done a wonderful job of updating the material. I am very impressed at how fresh the new examples seem." - Steve McConnell, author, Code CompleteWhen programmers list their favorite books, Jon Bentley's collection of programming pearls is commonly included among the classics. Just as natural pearls grow from grains of sand that irritate oysters, programming pearls have grown from real problems that have irritated real programmers. With origins beyond solid engineering, in the realm of insight and creativity, Bentley's pearls offer unique and clever solutions to those nagging problems. Illustrated by programs designed as much for fun as for instruction, the book is filled with lucid and witty descriptions of practical programming techniques and fundamental design principles. It is not at all surprising that
Programming Pearls
has been so highly valued by programmers at every level of experience. In this revision, the first in 14 years, Bentley has substantially updated his essays to reflect current programming methods and environments. In addition, there are three new essays on (1) testing, debugging, and timing; (2) set representations; and (3) string problems. All the original programs have been rewritten, and an equal amount of new code has been generated. Implementations of all the programs, in C or C++, are now available on the Web.What remains the same in this new edition is Bentley's focus on the hard core of programming problems and his delivery of workable solutions to those problems. Whether you are new to Bentley's classic or are revisiting his work for some fresh insight, this book is sure to make your own list of favorites.
Spark: The Definitive Guide: Big Data Processing Made Simple
Bill Chambers - 2018
With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.
You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.
Get a gentle overview of big data and Spark
Learn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples
Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
Understand how Spark runs on a cluster
Debug, monitor, and tune Spark clusters and applications
Learn the power of Structured Streaming, Spark’s stream-processing engine
Learn how you can apply MLlib to a variety of problems, including classification or recommendation
Mindstorms: Children, Computers, And Powerful Ideas
Seymour Papert - 1980
We have Mindstorms to thank for that. In this book, pioneering computer scientist Seymour Papert uses the invention of LOGO, the first child-friendly programming language, to make the case for the value of teaching children with computers. Papert argues that children are more than capable of mastering computers, and that teaching computational processes like de-bugging in the classroom can change the way we learn everything else. He also shows that schools saturated with technology can actually improve socialization and interaction among students and between students and teachers.
Creating a Data-Driven Organization: Practical Advice from the Trenches
Carl Anderson - 2015
This practical book shows you how true data-drivenness involves processes that require genuine buy-in across your company, from analysts and management to the C-Suite and the board.Through interviews and examples from data scientists and analytics leaders in a variety of industries, author Carl Anderson explains the analytics value chain you need to adopt when building predictive business models—from data collection and analysis to the insights and leadership that drive concrete actions. You’ll learn what works and what doesn’t, and why creating a data-driven culture throughout your organization is essential.
Start from the bottom up: learn how to collect the right data the right way
Hire analysts with the right skills, and organize them into teams
Examine statistical and visualization tools, and fact-based story-telling methods
Collect and analyze data while respecting privacy and ethics
Understand how analysts and their managers can help spur a data-driven culture
Learn the importance of data leadership and C-level positions such as chief data officer and chief analytics officer
Specification by Example: How Successful Teams Deliver the Right Software
Gojko Adzic - 2011
In this book, author Gojko Adzic distills interviews with successful teams worldwide, sharing how they specify, develop, and deliver software, without defects, in short iterative delivery cycles.About the Technology Specification by Example is a collaborative method for specifying requirements and tests. Seven patterns, fully explored in this book, are key to making the method effective. The method has four main benefits: it produces living, reliable documentation; it defines expectations clearly and makes validation efficient; it reduces rework; and, above all, it assures delivery teams and business stakeholders that the software that's built is right for its purpose.About the Book This book distills from the experience of leading teams worldwide effective ways to specify, test, and deliver software in short, iterative delivery cycles. Case studies in this book range from small web startups to large financial institutions, working in many processes including XP, Scrum, and Kanban.This book is written for developers, testers, analysts, and business people working together to build great software.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.What's InsideCommon process patterns How to avoid bad practices Fitting SBE in your process 50+ case studies For additional resources go to specificationbyexample.com.
The C Programming Language
Brian W. Kernighan - 1978
It is the definitive reference guide, now in a second edition. Although the first edition was written in 1978, it continues to be a worldwide best-seller. This second edition brings the classic original up to date to include the ANSI standard. From the Preface: We have tried to retain the brevity of the first edition. C is not a big language, and it is not well served by a big book. We have improved the exposition of critical features, such as pointers, that are central to C programming. We have refined the original examples, and have added new examples in several chapters. For instance, the treatment of complicated declarations is augmented by programs that convert declarations into words and vice versa. As before, all examples have been tested directly from the text, which is in machine-readable form. As we said in the first preface to the first edition, C "wears well as one's experience with it grows." With a decade more experience, we still feel that way. We hope that this book will help you to learn C and use it well.
HBase: The Definitive Guide
Lars George - 2011
As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away.
Discover how tight integration with Hadoop makes scalability with HBase easier
Distribute large datasets across an inexpensive cluster of commodity servers
Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs
Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more
Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs
Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks