Book picks similar to
Data Mining: The Textbook by Charu C. Aggarwal
data-science
non-fiction
computing
tech
Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists
Philipp K. Janert - 2010
With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.Use graphics to describe data with one, two, or dozens of variablesDevelop conceptual models using back-of-the-envelope calculations, as well asscaling and probability argumentsMine data with computationally intensive methods such as simulation and clusteringMake your conclusions understandable through reports, dashboards, and other metrics programsUnderstand financial calculations, including the time-value of moneyUse dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situationsBecome familiar with different open source programming environments for data analysisFinally, a concise reference for understanding how to conquer piles of data.--Austin King, Senior Web Developer, MozillaAn indispensable text for aspiring data scientists.--Michael E. Driscoll, CEO/Founder, Dataspora
Hadoop: The Definitive Guide
Tom White - 2009
Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!
Python for Informatics: Exploring Information: Exploring Information
Charles Severance - 2002
You can think of Python as your tool to solve problems that are far beyond the capability of a spreadsheet. It is an easy-to-use and easy-to learn programming language that is freely available on Windows, Macintosh, and Linux computers. There are free downloadable copies of this book in various electronic formats and a self-paced free online course where you can explore the course materials. All the supporting materials for the book are available under open and remixable licenses. This book is designed to teach people to program even if they have no prior experience.
All of Statistics: A Concise Course in Statistical Inference
Larry Wasserman - 2003
But in spirit, the title is apt, as the book does cover a much broader range of topics than a typical introductory book on mathematical statistics. This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines. The book includes modern topics like nonparametric curve estimation, bootstrapping, and clas- sification, topics that are usually relegated to follow-up courses. The reader is presumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. Statistics, data mining, and machine learning are all concerned with collecting and analyzing data. For some time, statistics research was con- ducted in statistics departments while data mining and machine learning re- search was conducted in computer science departments. Statisticians thought that computer scientists were reinventing the wheel. Computer scientists thought that statistical theory didn't apply to their problems. Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algo- rithms are more scalable than statisticians ever thought possible. Formal sta- tistical theory is more pervasive than computer scientists had realized.
Effective Python: 90 Specific Ways to Write Better Python (Effective Software Development Series)
Brett Slatkin - 2019
However, Python’s unique strengths, charms, and expressiveness can be hard to grasp, and there are hidden pitfalls that can easily trip you up. This second edition of Effective Python will help you master a truly “Pythonic” approach to programming, harnessing Python’s full power to write exceptionally robust and well-performing code. Using the concise, scenario-driven style pioneered in Scott Meyers’ best-selling Effective C++, Brett Slatkin brings together 90 Python best practices, tips, and shortcuts, and explains them with realistic code examples so that you can embrace Python with confidence. Drawing on years of experience building Python infrastructure at Google, Slatkin uncovers little-known quirks and idioms that powerfully impact code behavior and performance. You’ll understand the best way to accomplish key tasks so you can write code that’s easier to understand, maintain, and improve. In addition to even more advice, this new edition substantially revises all items from the first edition to reflect how best practices have evolved. Key features include 30 new actionable guidelines for all major areas of Python Detailed explanations and examples of statements, expressions, and built-in types Best practices for writing functions that clarify intention, promote reuse, and avoid bugs Better techniques and idioms for using comprehensions and generator functions Coverage of how to accurately express behaviors with classes and interfaces Guidance on how to avoid pitfalls with metaclasses and dynamic attributes More efficient and clear approaches to concurrency and parallelism Solutions for optimizing and hardening to maximize performance and quality Techniques and built-in modules that aid in debugging and testing Tools and best practices for collaborative development Effective Python will prepare growing programmers to make a big impact using Python.
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Eric Siegel - 2013
Rather than a "how to" for hands-on techies, the book entices lay-readers and experts alike by covering new case studies and the latest state-of-the-art techniques.You have been predicted — by companies, governments, law enforcement, hospitals, and universities. Their computers say, "I knew you were going to do that!" These institutions are seizing upon the power to predict whether you're going to click, buy, lie, or die.Why? For good reason: predicting human behavior combats financial risk, fortifies healthcare, conquers spam, toughens crime fighting, and boosts sales.How? Prediction is powered by the world's most potent, booming unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn.Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future — lifting a bit of the fog off our hazy view of tomorrow — means pay dirt.In this rich, entertaining primer, former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction: -What type of mortgage risk Chase Bank predicted before the recession. -Predicting which people will drop out of school, cancel a subscription, or get divorced before they are even aware of it themselves. -Why early retirement decreases life expectancy and vegetarians miss fewer flights. -Five reasons why organizations predict death, including one health insurance company. -How U.S. Bank, European wireless carrier Telenor, and Obama's 2012 campaign calculated the way to most strongly influence each individual. -How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy! -How companies ascertain untold, private truths — how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job. -How judges and parole boards rely on crime-predicting computers to decide who stays in prison and who goes free. -What's predicted by the BBC, Citibank, ConEd, Facebook, Ford, Google, IBM, the IRS, Match.com, MTV, Netflix, Pandora, PayPal, Pfizer, and Wikipedia. A truly omnipresent science, predictive analytics affects everyone, every day. Although largely unseen, it drives millions of decisions, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate.Predictive analytics transcends human perception. This book's final chapter answers the riddle: What often happens to you that cannot be witnessed, and that you can't even be sure has happened afterward — but that can be predicted in advance?Whether you are a consumer of it — or consumed by it — get a handle on the power of Predictive Analytics.
Programming Pearls
Jon L. Bentley - 1986
Jon has done a wonderful job of updating the material. I am very impressed at how fresh the new examples seem." - Steve McConnell, author, Code CompleteWhen programmers list their favorite books, Jon Bentley's collection of programming pearls is commonly included among the classics. Just as natural pearls grow from grains of sand that irritate oysters, programming pearls have grown from real problems that have irritated real programmers. With origins beyond solid engineering, in the realm of insight and creativity, Bentley's pearls offer unique and clever solutions to those nagging problems. Illustrated by programs designed as much for fun as for instruction, the book is filled with lucid and witty descriptions of practical programming techniques and fundamental design principles. It is not at all surprising that
Programming Pearls
has been so highly valued by programmers at every level of experience. In this revision, the first in 14 years, Bentley has substantially updated his essays to reflect current programming methods and environments. In addition, there are three new essays on (1) testing, debugging, and timing; (2) set representations; and (3) string problems. All the original programs have been rewritten, and an equal amount of new code has been generated. Implementations of all the programs, in C or C++, are now available on the Web.What remains the same in this new edition is Bentley's focus on the hard core of programming problems and his delivery of workable solutions to those problems. Whether you are new to Bentley's classic or are revisiting his work for some fresh insight, this book is sure to make your own list of favorites.
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
Ralph Kimball - 1996
Here is a complete library of dimensional modeling techniques-- the most comprehensive collection ever written. Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition to Ralph Kimball's classic guide is more than sixty percent updated.The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:* Retail sales and e-commerce* Inventory management* Procurement* Order management* Customer relationship management (CRM)* Human resources management* Accounting* Financial services* Telecommunications and utilities* Education* Transportation* Health care and insuranceBy the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.This book is also available as part of the Kimball's Data Warehouse Toolkit Classics Box Set (ISBN: 9780470479575) with the following 3 books:The Data Warehouse Toolkit, 2nd Edition (9780471200246)The Data Warehouse Lifecycle Toolkit, 2nd Edition (9780470149775)The Data Warehouse ETL Toolkit (9780764567575)
Cassandra: The Definitive Guide
Eben Hewitt - 2010
Cassandra: The Definitive Guide provides the technical details and practical examples you need to assess this database management system and put it to work in a production environment.Author Eben Hewitt demonstrates the advantages of Cassandra's nonrelational design, and pays special attention to data modeling. If you're a developer, DBA, application architect, or manager looking to solve a database scaling issue or future-proof your application, this guide shows you how to harness Cassandra's speed and flexibility.Understand the tenets of Cassandra's column-oriented structureLearn how to write, update, and read Cassandra dataDiscover how to add or remove nodes from the cluster as your application requiresExamine a working application that translates from a relational model to Cassandra's data modelUse examples for writing clients in Java, Python, and C#Use the JMX interface to monitor a cluster's usage, memory patterns, and moreTune memory settings, data storage, and caching for better performance
Big Data for Dummies
Judith Hurwitz - 2013
Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it matters, and how to choose and implement solutions that work.Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals Authors are experts in information management, big data, and a variety of solutions Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more Provides essential information in a no-nonsense, easy-to-understand style that is empowering Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.
HBase: The Definitive Guide
Lars George - 2011
As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away.
Discover how tight integration with Hadoop makes scalability with HBase easier
Distribute large datasets across an inexpensive cluster of commodity servers
Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs
Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more
Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs
Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks
Thinking with Data
Max Shron - 2014
In this practical guide, data strategy consultant Max Shron shows you how to put the why before the how, through an often-overlooked set of analytical skills.Thinking with Data helps you learn techniques for turning data into knowledge you can use. You’ll learn a framework for defining your project, including the data you want to collect, and how you intend to approach, organize, and analyze the results. You’ll also learn patterns of reasoning that will help you unveil the real problem that needs to be solved.Learn a framework for scoping data projectsUnderstand how to pin down the details of an idea, receive feedback, and begin prototypingUse the tools of arguments to ask good questions, build projects in stages, and communicate resultsExplore data-specific patterns of reasoning and learn how to build more useful argumentsDelve into causal reasoning and learn how it permeates data workPut everything together, using extended examples to see the method of full problem thinking in action
Data Analysis Using SQL and Excel
Gordon S. Linoff - 2007
This book helps you use SQL and Excel to extract business information from relational databases and use that data to define business dimensions, store transactions about customers, produce results, and more. Each chapter explains when and why to perform a particular type of business analysis in order to obtain useful results, how to design and perform the analysis using SQL and Excel, and what the results should look like.
Fluent Python: Clear, Concise, and Effective Programming
Luciano Ramalho - 2015
With this hands-on guide, you'll learn how to write effective, idiomatic Python code by leveraging its best and possibly most neglected features. Author Luciano Ramalho takes you through Python's core language features and libraries, and shows you how to make your code shorter, faster, and more readable at the same time.Many experienced programmers try to bend Python to fit patterns they learned from other languages, and never discover Python features outside of their experience. With this book, those Python programmers will thoroughly learn how to become proficient in Python 3.This book covers:Python data model: understand how special methods are the key to the consistent behavior of objectsData structures: take full advantage of built-in types, and understand the text vs bytes duality in the Unicode ageFunctions as objects: view Python functions as first-class objects, and understand how this affects popular design patternsObject-oriented idioms: build classes by learning about references, mutability, interfaces, operator overloading, and multiple inheritanceControl flow: leverage context managers, generators, coroutines, and concurrency with the concurrent.futures and asyncio packagesMetaprogramming: understand how properties, attribute descriptors, class decorators, and metaclasses work"
Make Your Own Neural Network
Tariq Rashid - 2016
Neural networks are a key element of deep learning and artificial intelligence, which today is capable of some truly impressive feats. Yet too few really understand how neural networks actually work. This guide will take you on a fun and unhurried journey, starting from very simple ideas, and gradually building up an understanding of how neural networks work. You won't need any mathematics beyond secondary school, and an accessible introduction to calculus is also included. The ambition of this guide is to make neural networks as accessible as possible to as many readers as possible - there are enough texts for advanced readers already! You'll learn to code in Python and make your own neural network, teaching it to recognise human handwritten numbers, and performing as well as professionally developed networks. Part 1 is about ideas. We introduce the mathematical ideas underlying the neural networks, gently with lots of illustrations and examples. Part 2 is practical. We introduce the popular and easy to learn Python programming language, and gradually builds up a neural network which can learn to recognise human handwritten numbers, easily getting it to perform as well as networks made by professionals. Part 3 extends these ideas further. We push the performance of our neural network to an industry leading 98% using only simple ideas and code, test the network on your own handwriting, take a privileged peek inside the mysterious mind of a neural network, and even get it all working on a Raspberry Pi. All the code in this has been tested to work on a Raspberry Pi Zero.