Doing Data Science


Cathy O'Neil - 2013
    But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science. Topics include:
    * Statistical inference, exploratory data analysis, and the data science process
    * Algorithms
    * Spam filters, Naive Bayes, and data wrangling
    * Logistic regression
    * Financial modeling
    * Recommendation engines and causality
    * Data visualization
    * Social networks and data journalism
    * Data engineering, MapReduce, Pregel, and Hadoop
    Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.
    * Use graphics to describe data with one, two, or dozens of variables
    * Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
    * Mine data with computationally intensive methods such as simulation and clustering
    * Make your conclusions understandable through reports, dashboards, and other metrics programs
    * Understand financial calculations, including the time-value of money
    * Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
    * Become familiar with different open source programming environments for data analysis
    "Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla
    "An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora

Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine


Clinton Gormley - 2014
    This practical guide not only shows you how to search, analyze, and explore data with Elasticsearch, but also helps you deal with the complexities of human language, geolocation, and relationships. If you're a newcomer to both search and distributed systems, you'll quickly learn how to integrate Elasticsearch into your application. More experienced users will pick up lots of advanced techniques. Throughout the book, you'll follow a problem-based approach to learn why, when, and how to use Elasticsearch features.
    * Understand how Elasticsearch interprets data in your documents
    * Index and query your data to take advantage of search concepts such as relevance and word proximity
    * Handle human language through the effective use of analyzers and queries
    * Summarize and group data to show overall trends, with aggregations and analytics
    * Use geo-points and geo-shapes, Elasticsearch's approaches to geolocation
    * Model your data to take advantage of Elasticsearch's horizontal scalability
    * Learn how to configure and monitor your cluster in production

The Art of Monitoring


James Turnbull - 2016
    We start small and then build on what you learn to scale out to multi-site, multi-tier applications. The book is written for both developers and sysadmins. We focus on building monitored and measurable applications. We also use tools that are designed to handle the challenges of managing Cloud, containerised and distributed applications and infrastructure. In the book we'll deliver:
    * An introduction to monitoring, metrics and measurement
    * A scalable framework for monitoring hosts (including Docker and containers), services and applications built on top of the Riemann event stream processor
    * Graphing and metric storage using Graphite and Grafana
    * Logging with Logstash
    * A framework for high quality and useful notifications
    * Techniques for developing and building monitorable applications
    * A capstone that puts all the pieces together to monitor a multi-tier application

High Performance MySQL: Optimization, Backups, and Replication


Baron Schwartz - 2008
    This guide also teaches you safe and practical ways to scale applications through replication, load balancing, high availability, and failover. Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.
    * Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and views
    * Implement improvements in replication, high availability, and clustering
    * Achieve high performance when running MySQL in the cloud
    * Optimize advanced querying features, such as full-text searches
    * Take advantage of modern multi-core CPUs and solid-state disks
    * Explore backup and recovery strategies, including new tools for hot online backups

Machine Learning: A Probabilistic Perspective


Kevin P. Murphy - 2012
    Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package, PMTK (probabilistic modeling toolkit), that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

Python Cookbook


David Beazley - 2002
    Packed with practical recipes written and tested with Python 3.3, this unique cookbook is for experienced Python programmers who want to focus on modern tools and idioms. Inside, you’ll find complete recipes for more than a dozen topics, covering the core Python language as well as tasks common to a wide variety of application domains. Each recipe contains code samples you can use in your projects right away, along with a discussion about how and why the solution works. Topics include:
    * Data Structures and Algorithms
    * Strings and Text
    * Numbers, Dates, and Times
    * Iterators and Generators
    * Files and I/O
    * Data Encoding and Processing
    * Functions
    * Classes and Objects
    * Metaprogramming
    * Modules and Packages
    * Network and Web Programming
    * Concurrency
    * Utility Scripting and System Administration
    * Testing, Debugging, and Exceptions
    * C Extensions

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling


Ralph Kimball - 1996
    Here is a complete library of dimensional modeling techniques, the most comprehensive collection ever written. Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition of Ralph Kimball's classic guide is more than sixty percent updated. The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:
    * Retail sales and e-commerce
    * Inventory management
    * Procurement
    * Order management
    * Customer relationship management (CRM)
    * Human resources management
    * Accounting
    * Financial services
    * Telecommunications and utilities
    * Education
    * Transportation
    * Health care and insurance
    By the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts. This book is also available as part of Kimball's Data Warehouse Toolkit Classics Box Set (ISBN: 9780470479575) with the following 3 books:
    * The Data Warehouse Toolkit, 2nd Edition (9780471200246)
    * The Data Warehouse Lifecycle Toolkit, 2nd Edition (9780470149775)
    * The Data Warehouse ETL Toolkit (9780764567575)

Adventures In Raspberry Pi (Adventures In ...)


Carrie Anne Philbin - 2013
    Written for 11- to 15-year-olds and assuming no prior computing knowledge, this book uses the wildly successful, low-cost, credit-card-sized Raspberry Pi computer to explain fundamental computing concepts. Young people will enjoy going through the book's nine fun projects while they learn basic programming and system administration skills, starting with the very basics of how to plug in the board and turn it on. Each project includes a lively and informative video to reinforce the lessons. It's perfect for young, eager self-learners: your kids can jump in, set up their Raspberry Pi, and go through the lessons on their own.
    * Written by Carrie Anne Philbin, a high school teacher of computing who advises the U.K. government on the revised ICT Curriculum
    * Teaches 11- to 15-year-olds programming and system administration skills using Raspberry Pi
    * Features 9 fun projects accompanied by lively and helpful videos
    * Raspberry Pi is a $35/£25 credit-card-sized computer created by the non-profit Raspberry Pi Foundation; over a million have been sold
    Help your children have fun and learn computing skills at the same time with Adventures in Raspberry Pi.

Joe Celko's SQL for Smarties: Advanced SQL Programming


Joe Celko - 1995
    Now, 10 years later and in the third edition, this classic still reigns supreme as the book written by an SQL master that teaches future SQL masters. These are not just tips and techniques; Joe also offers the best solutions to old and new challenges and conveys the way you need to think in order to get the most out of SQL programming efforts for both correctness and performance. In the third edition, Joe features new examples and updates to SQL-99, expanded sections on query techniques, and a new section on schema design, with the same war-story teaching style that made the first and second editions of this book classics.

The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography


Simon Singh - 1999
    From Mary, Queen of Scots, trapped by her own code, to the Navajo Code Talkers who helped the Allies win World War II, to the incredible (and incredibly simple) logistical breakthrough that made Internet commerce secure, The Code Book tells the story of the most powerful intellectual weapon ever known: secrecy. Throughout the text are clear technical and mathematical explanations, and portraits of the remarkable personalities who wrote and broke the world’s most difficult codes. Accessible, compelling, and remarkably far-reaching, this book will forever alter your view of history and what drives it. It will also make you wonder how private that e-mail you just sent really is.

C# 4.0 in a Nutshell


Joseph Albahari - 2010
    It is a book I recommend." --Scott Guthrie, Corporate Vice President, .NET Developer Platform, Microsoft Corporation "A must-read for a concise but thorough examination of the parallel programming features in the .NET Framework 4." --Stephen Toub, Parallel Computing Platform Program Manager, Microsoft "This wonderful book is a great reference for developers of all levels." --Chris Burrows, C# Compiler Team, Microsoft
    When you have questions about how to use C# 4.0 or the .NET CLR, this highly acclaimed bestseller has precisely the answers you need. Uniquely organized around concepts and use cases, this fourth edition includes in-depth coverage of new C# topics such as parallel programming, code contracts, dynamic programming, security, and COM interoperability. You'll also find updated information on LINQ, including examples that work with both LINQ to SQL and Entity Framework. This book has all the essential details to keep you on track with C# 4.0.
    * Get up to speed on C# language basics, including syntax, types, and variables
    * Explore advanced topics such as unsafe code and preprocessor directives
    * Learn C# 4.0 features such as dynamic binding, type parameter variance, and optional and named parameters
    * Work with .NET 4's rich set of features for parallel programming, code contracts, and the code security model
    * Learn .NET topics, including XML, collections, I/O and networking, memory management, reflection, attributes, security, and native interoperability

Head First Python


Paul Barry - 2010
    You'll quickly learn the language's fundamentals, then move on to persistence, exception handling, web development, SQLite, data wrangling, and Google App Engine. You'll also learn how to write mobile apps for Android, all thanks to the power that Python gives you. We think your time is too valuable to waste struggling with new concepts. Using the latest research in cognitive science and learning theory to craft a multi-sensory learning experience, Head First Python uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.

Docker in Action


Jeff Nickoloff - 2015
    Create a tiny virtual environment, called a container, for your application that includes only its particular set of dependencies. The Docker engine accounts for, manages, and builds these containers through functionality provided by the host operating system. Software running inside containers shares the Linux OS and other resources, such as libraries, making their footprints radically smaller, and the containerized applications are easy to install, manage, and remove. Developers can package their applications without worrying about environment-specific deployment concerns, and the operations team gets cleaner, more efficient systems across the board. Better still, Docker is free and open source. Docker in Action teaches readers how to create, deploy, and manage applications hosted in Docker containers. The book starts with a clear explanation of the Docker model of virtualization, comparing this approach to the traditional hypervisor model. Developers will learn how to package applications in containers, including specific techniques for testing and distributing applications via Docker Hub and other registries. Readers will learn how to take advantage of the Linux OS features that Docker uses to run programs securely, and how to manage shared resources. Using carefully designed examples, the book teaches you how to orchestrate containers and applications from installation to removal. Along the way, you'll learn techniques for using Docker on systems ranging from your personal dev-and-test machine to full-scale cloud deployments.

Learn Python The Hard Way


Zed A. Shaw - 2010
    The title says it is the hard way to learn to write code but it’s actually not. It’s the “hard” way only in that it’s the way people used to teach things. In this book you will do something incredibly simple that all programmers actually do to learn a language: 1. Go through each exercise. 2. Type in each sample exactly. 3. Make it run. That’s it. This will be very difficult at first, but stick with it. If you go through this book, and do each exercise for 1-2 hours a night, then you’ll have a good foundation for moving on to another book. You might not really learn “programming” from this book, but you will learn the foundation skills you need to start learning the language. This book’s job is to teach you the three most basic essential skills that a beginning programmer needs to know: Reading And Writing, Attention To Detail, Spotting Differences.