Find a book to read

Book picks similar to
HBase: The Definitive Guide by Lars George

programming

tech

computers

non-fiction

Hadoop: The Definitive Guide

Tom White - 2009

Hadoop: The Definitive Guide helps you harness the power of your data.

Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Cassandra: The Definitive Guide

Eben Hewitt - 2010

What could you do with data if scalability wasn't a problem? With this hands-on guide, you'll learn how Apache Cassandra handles hundreds of terabytes of data while remaining highly available across multiple data centers -- capabilities that have attracted Facebook, Twitter, and other data-intensive companies.

Cassandra: The Definitive Guide provides the technical details and practical examples you need to assess this database management system and put it to work in a production environment.Author Eben Hewitt demonstrates the advantages of Cassandra's nonrelational design, and pays special attention to data modeling. If you're a developer, DBA, application architect, or manager looking to solve a database scaling issue or future-proof your application, this guide shows you how to harness Cassandra's speed and flexibility.Understand the tenets of Cassandra's column-oriented structureLearn how to write, update, and read Cassandra dataDiscover how to add or remove nodes from the cluster as your application requiresExamine a working application that translates from a relational model to Cassandra's data modelUse examples for writing clients in Java, Python, and C#Use the JMX interface to monitor a cluster's usage, memory patterns, and moreTune memory settings, data storage, and caching for better performance

High Performance MySQL: Optimization, Backups, and Replication

programming

tech

technical

Baron Schwartz - 2008

How can you bring out MySQL’s full power? With High Performance MySQL, you’ll learn advanced techniques for everything from designing schemas, indexes, and queries to tuning your MySQL server, operating system, and hardware to their fullest potential.

This guide also teaches you safe and practical ways to scale applications through replication, load balancing, high availability, and failover. Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL. Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and views Implement improvements in replication, high availability, and clustering Achieve high performance when running MySQL in the cloud Optimize advanced querying features, such as full-text searches Take advantage of modern multi-core CPUs and solid-state disks Explore backup and recovery strategies—including new tools for hot online backups

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement

Eric Redmond - 2012

Data is getting bigger and more complex by the day, and so are the choices in handling that data.

As a modern application developer you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core each technology. Redis, Neo4J, CouchDB, MongoDB, HBase, Riak and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases-relational, key/value, columnar, document and graph-and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once.Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.What You Need: To get the most of of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OSX or Linux preferred, Windows users will need Cygwin), and Java 6 (or greater) and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.

ZooKeeper: Distributed process coordination

Flavio Junqueira - 2013

Building distributed applications is difficult enough without having to coordinate the actions that make them work.

This practical guide shows how Apache ZooKeeper helps you manage distributed systems, so you can focus mainly on application logic. Even with ZooKeeper, implementing coordination tasks is not trivial, but this book provides good practices to give you a head start, and points out caveats that developers and administrators alike need to watch for along the way.In three separate sections, ZooKeeper contributors Flavio Junqueira and Benjamin Reed introduce the principles of distributed systems, provide ZooKeeper programming techniques, and include the information you need to administer this service.Learn how ZooKeeper solves common coordination tasksExplore the ZooKeeper API’s Java and C implementations and how they differUse methods to track and react to ZooKeeper state changesHandle failures of the network, application processes, and ZooKeeper itselfLearn about ZooKeeper’s trickier aspects dealing with concurrency, ordering, and configurationUse the Curator high-level interface for connection managementBecome familiar with ZooKeeper internals and administration tools

Hadoop in Action

Chuck Lam - 2010

Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs.

The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples. Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

Think Stats

Allen B. Downey - 2011

If you know how to program, you have the skills to turn data into knowledge using the tools of probability and statistics.

This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python.You'll work with a case study throughout the book to help you learn the entire data analysis process—from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.Develop your understanding of probability and statistics by writing and testing codeRun experiments to test statistical behavior, such as generating samples from several distributionsUse simulations to understand concepts that are hard to grasp mathematicallyLearn topics not usually covered in an introductory course, such as Bayesian estimationImport data from almost any source using Python, rather than be limited to data that has been cleaned and formatted for statistics toolsUse statistical inference to answer questions about real-world data

MongoDB in Action

Kyle Banker - 2011

SummaryMongoDB in Action is a comprehensive guide to MongoDB for application developers.

The book begins by explaining what makes MongoDB unique and describing its ideal use cases. A series of tutorials designed for MongoDB mastery then leads into detailed examples for leveraging MongoDB in e-commerce, social networking, analytics, and other common applications.About the TechnologyBig data can mean big headaches. MongoDB is a document-oriented database designed to be flexible, scalable, and very fast, even with big data loads. It's built for high availability, supports rich, dynamic schemas, and lets you easily distribute data across multiple servers.About this BookMongoDB in Action introduces you to MongoDB and the document-oriented database model. This perfectly paced book provides both the big picture you'll need as a developer and enough low-level detail to satisfy a system engineer. Numerous examples will help you develop confidence in the crucial area of data modeling. You'll also love the deep explanations of each feature, including replication, auto-sharding, and deploymentThis book is written for developers. No MongoDB or NoSQL experience required.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.What's InsideIndexes, queries, and standard DB operations Map-reduce for custom aggregations and reporting Schema design patterns Deploying for scale and high availabilityTable of ContentsPART 1 GETTING STARTED A database for the modern web MongoDB through the JavaScript shell Writing programs using MongoDB PART 2 APPLICATION DEVELOPMENT IN MONGODB Document-oriented data Queries and aggregation Updates, atomic operations, and deletes PART 3 MONGODB MASTERY Indexing and query optimization Replication Sharding Deployment and administration

The Little Book on CoffeeScript

Alex MacCaw - 2012

This little book shows JavaScript developers how to build superb web applications with CoffeeScript, the remarkable little language that’s gaining considerable interest.

Through example code, this guide demonstrates how CoffeeScript abstracts JavaScript, providing syntactical sugar and preventing many common errors. You’ll learn CoffeeScript’s syntax and idioms step by step, from basic variables and functions to complex comprehensions and classes.Written by Alex MacCaw, author of JavaScript Web Applications (O’Reilly), with contributions from CoffeeScript creator Jeremy Ashkenas, this book quickly teaches you best practices for using this language—not just on the client side, but for server-side applications as well. It’s time to take a ride with the little language that could.Discover how CoffeeScript’s syntax differs from JavaScriptLearn about features such as array comprehensions, destructuring assignments, and classesExplore CoffeeScript idioms and compare them to their JavaScript counterpartsCompile CoffeeScript files in static sites with the Cake build systemUse CommonJS modules to structure and deploy CoffeeScript client-side applicationsExamine JavaScript’s bad parts—including features CoffeeScript was able to fix

Planning for Big Data

Edd Wilder-James - 2004

In an age where everything is measurable, understanding big data is an essential.

From creating new data-driven products through to increasing operational efficiency, big data has the potential to makeyour organization both more competitive and more innovative.As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but the organizational and cultural demands of being data-driven.Written by O'Reilly Radar's experts on big data, this anthology describes:- The broad industry changes heralded by the big data era- What big data is, what it means to your business, and how to start solving data problems- The software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions- The landscape of NoSQL databases and their relative merits- How visualization plays an important part in data work

Hadoop Explained

Aravind Shenoy - 2014

With the almost unfathomable increase in web traffic over recent years, driven by millions of connected users, businesses are gaining access to massive amounts of complex, unstructured data from which to gain insight.When Hadoop was introduced by Yahoo in 2007, it brought with it a paradigm shift in how this data was stored and analysed.

Hadoop allowed small and medium sized companies to store huge amounts of data on cheap commodity servers in racks. The introduction of Big Data has allowed businesses to make decisions based on quantifiable analysis. Hadoop is now implemented in major organizations such as Amazon, IBM, Cloudera, and Dell to name a few. This book introduces you to Hadoop and to concepts such as ‘MapReduce’, ‘Rack Awareness’, ‘Yarn’ and ‘HDFS Federation’, which will help you get acquainted with the technology.

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

Holden Karau - 2017

Apache Spark is amazing when everything clicks.

But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing.With this book, you'll explore:How Spark SQL's new interfaces improve performance over SQL's RDD data structureThe choice between data joins in Core Spark and Spark SQLTechniques for getting the most out of standard RDD transformationsHow to work around performance issues in Spark's key/value pair paradigmWriting high-performance Spark code without Scala or the JVMHow to test for functionality and performance when applying suggested improvementsUsing Spark MLlib and Spark ML machine learning librariesSpark's Streaming components and external community packages

Practical SQL: A Beginner's Guide to Storytelling with Data

Anthony DeBarros - 2018

Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language), the standard programming language for defining, organizing, and exploring data in relational databases.

The book focuses on using SQL to find the story your data tells, with the popular open-source database PostgreSQL and the pgAdmin interface as its primary tools.You'll first cover the fundamentals of databases and the SQL language, then build skills by analyzing data from the U.S. Census and other federal and state government agencies. With exercises and real-world examples in each chapter, this book will teach even those who have never programmed before all the tools necessary to build powerful databases and access information quickly and efficiently.You'll learn how to: •Create databases and related tables using your own data •Define the right data types for your information •Aggregate, sort, and filter data to find patterns •Use basic math and advanced statistical functions •Identify errors in data and clean them up •Import and export data using delimited text files •Write queries for geographic information systems (GIS) •Create advanced queries and automate tasks Learning SQL doesn't have to be dry and complicated. Practical SQL delivers clear examples with an easy-to-follow approach to teach you the tools you need to build and manage your own databases. This book uses PostgreSQL, but the SQL syntax is applicable to many database applications, including Microsoft SQL Server and MySQL.

What Is Data Science?

Mike Loukides - 2011

We've all heard it: according to Hal Varian, statistics is the next sexy job.

Five years ago, in What is Web 2.0, Tim O'Reilly said that "data is the next Intel Inside." But what does that statement mean? Why do we suddenly care about statistics and about data? This report examines the many sides of data science -- the technologies, the companies and the unique skill sets.The web is full of "data-driven apps." Almost any e-commerce application is a data-driven application. There's a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn't really what we mean by "data science." A data application acquires its value from the data itself, and creates more data as a result. It's not just an application with data; it's a data product. Data science enables the creation of data products.

Python for Data Analysis

Wes McKinney - 2011

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python.

It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical cases studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.Use the IPython interactive shell as your primary development environmentLearn basic and advanced NumPy (Numerical Python) featuresGet started with data analysis tools in the pandas libraryUse high-performance tools to load, clean, transform, merge, and reshape dataCreate scatter plots and static or interactive visualizations with matplotlibApply the pandas groupby facility to slice, dice, and summarize datasetsMeasure data by points in time, whether it's specific instances, fixed periods, or intervalsLearn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples

Book picks similar toHBase: The Definitive Guide by Lars George

Hadoop: The Definitive Guide

Cassandra: The Definitive Guide

High Performance MySQL: Optimization, Backups, and Replication

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement

ZooKeeper: Distributed process coordination

Hadoop in Action

Think Stats

MongoDB in Action

The Little Book on CoffeeScript

Planning for Big Data

Hadoop Explained

High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

Practical SQL: A Beginner's Guide to Storytelling with Data

What Is Data Science?

Python for Data Analysis

Book picks similar to
HBase: The Definitive Guide by Lars George