Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.Use graphics to describe data with one, two, or dozens of variablesDevelop conceptual models using back-of-the-envelope calculations, as well asscaling and probability argumentsMine data with computationally intensive methods such as simulation and clusteringMake your conclusions understandable through reports, dashboards, and other metrics programsUnderstand financial calculations, including the time-value of moneyUse dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situationsBecome familiar with different open source programming environments for data analysisFinally, a concise reference for understanding how to conquer piles of data.--Austin King, Senior Web Developer, MozillaAn indispensable text for aspiring data scientists.--Michael E. Driscoll, CEO/Founder, Dataspora

Probabilistic Graphical Models: Principles and Techniques


Daphne Koller - 2009
    The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality.Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.

Data Structures and Algorithms in Python


Michael T. Goodrich - 2012
     Data Structures and Algorithms in Python is the first mainstream object-oriented book available for the Python data structures course. Designed to provide a comprehensive introduction to data structures and algorithms, including their design, analysis, and implementation, the text will maintain the same general structure as Data Structures and Algorithms in Java and Data Structures and Algorithms in C++.

Spark: The Definitive Guide: Big Data Processing Made Simple


Bill Chambers - 2018
    With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library. Get a gentle overview of big data and Spark Learn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames Understand how Spark runs on a cluster Debug, monitor, and tune Spark clusters and applications Learn the power of Structured Streaming, Spark’s stream-processing engine Learn how you can apply MLlib to a variety of problems, including classification or recommendation

Machine Learning in Action


Peter Harrington - 2011
    "Machine learning," the process of automating tasks once considered the domain of highly-trained analysts and mathematicians, is the key to efficiently extracting useful information from this sea of raw data. Machine Learning in Action is a unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. In it, the author uses the flexible Python programming language to show how to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.

Beautiful Visualization: Looking at Data through the Eyes of Experts


Julie Steele - 2010
    Think of the familiar map of the New York City subway system, or a diagram of the human brain. Successful visualizations are beautiful not only for their aesthetic design, but also for elegant layers of detail that efficiently generate insight and new understanding.This book examines the methods of two dozen visualization experts who approach their projects from a variety of perspectives -- as artists, designers, commentators, scientists, analysts, statisticians, and more. Together they demonstrate how visualization can help us make sense of the world.Explore the importance of storytelling with a simple visualization exerciseLearn how color conveys information that our brains recognize before we're fully aware of itDiscover how the books we buy and the people we associate with reveal clues to our deeper selvesRecognize a method to the madness of air travel with a visualization of civilian air trafficFind out how researchers investigate unknown phenomena, from initial sketches to published papers Contributors include:Nick Bilton, Michael E. Driscoll, Jonathan Feinberg, Danyel Fisher, Jessica Hagy, Gregor Hochmuth, Todd Holloway, Noah Iliinsky, Eddie Jabbour, Valdean Klump, Aaron Koblin, Robert Kosara, Valdis Krebs, JoAnn Kuchera-Morin et al., Andrew Odewahn, Adam Perer, Anders Persson, Maximilian Schich, Matthias Shapiro, Julie Steele, Moritz Stefaner, Jer Thorp, Fernanda Viegas, Martin Wattenberg, and Michael Young.

Probabilistic Robotics


Sebastian Thrun - 2005
    Building on the field of mathematical statistics, probabilistic robotics endows robots with a new level of robustness in real-world situations. This book introduces the reader to a wealth of techniques and algorithms in the field. All algorithms are based on a single overarching mathematical foundation. Each chapter provides example implementations in pseudo code, detailed mathematical derivations, discussions from a practitioner's perspective, and extensive lists of exercises and class projects. The book's Web site, www.probabilistic-robotics.org, has additional material. The book is relevant for anyone involved in robotic software development and scientific research. It will also be of interest to applied statisticians and engineers dealing with real-world sensor data.

Superintelligence: Paths, Dangers, Strategies


Nick Bostrom - 2014
    The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. If machine brains surpassed human brains in general intelligence, then this new superintelligence could become extremely powerful--possibly beyond our control. As the fate of the gorillas now depends more on humans than on the species itself, so would the fate of humankind depend on the actions of the machine superintelligence.But we have one advantage: we get to make the first move. Will it be possible to construct a seed Artificial Intelligence, to engineer initial conditions so as to make an intelligence explosion survivable? How could one achieve a controlled detonation?

WPF 4 Unleashed


Adam Nathan - 2010
    Windows Presentation Foundation (WPF) is the recommended technology for creating Windows user interfaces, giving you the power to create richer and more compelling applications than you dreamed possible. Whether you want to develop traditional user interfaces or integrate 3D graphics, audio/video, animation, dynamic skinning, multi-touch, rich document support, speech recognition, or more, WPF enables you to do so in a seamless, resolution-independent manner. WPF 4 Unleashed is the authoritative book that covers it all, in a practical and approachable fashion, authored by WPF guru and Microsoft developer Adam Nathan. Covers everything you need to know about Extensible Application Markup Language (XAML) Examines the WPF feature areas in incredible depth: controls, layout, resources, data binding, styling, graphics, animation, and more Highlights the latest features, such as multi-touch, text rendering improvements, XAML language enhancements, new controls, the Visual State Manager, easing functions, and much more Delves into topics that aren't covered by most books: 3D, speech, audio/video, documents, effects Shows how to create popular UI elements, such as Galleries, ScreenTips, and more Demonstrates how to create sophisticated UI mechanisms, such as Visual Studio-like collapsible/dockable panes Explains how to create first-class custom controls for WPF Demonstrates how to create hybrid WPF software that leverages Windows Forms, DirectX, ActiveX, or other non-WPF technologies Explains how to exploit new Windows 7 features, such as Jump Lists and taskbar customizations

Python for Informatics: Exploring Information: Exploring Information


Charles Severance - 2002
    You can think of Python as your tool to solve problems that are far beyond the capability of a spreadsheet. It is an easy-to-use and easy-to learn programming language that is freely available on Windows, Macintosh, and Linux computers. There are free downloadable copies of this book in various electronic formats and a self-paced free online course where you can explore the course materials. All the supporting materials for the book are available under open and remixable licenses. This book is designed to teach people to program even if they have no prior experience.

R for Data Science: Import, Tidy, Transform, Visualize, and Model Data


Hadley Wickham - 2016
    This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way. You’ll learn how to: Wrangle—transform your datasets into a form convenient for analysis Program—learn powerful R tools for solving data problems with greater clarity and ease Explore—examine your data, generate hypotheses, and quickly test them Model—provide a low-dimensional summary that captures true "signals" in your dataset Communicate—learn R Markdown for integrating prose, code, and results

Machine Learning for Hackers


Drew Conway - 2012
    Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyze sample datasets and write simple machine learning algorithms. "Machine Learning for Hackers" is ideal for programmers from any background, including business, government, and academic research.Develop a naive Bayesian classifier to determine if an email is spam, based only on its textUse linear regression to predict the number of page views for the top 1,000 websitesLearn optimization techniques by attempting to break a simple letter cipherCompare and contrast U.S. Senators statistically, based on their voting recordsBuild a "whom to follow" recommendation system from Twitter data

Data Science


John D. Kelleher - 2018
    Today data science determines the ads we see online, the books and movies that are recommended to us online, which emails are filtered into our spam folders, and even how much we pay for health insurance. This volume in the MIT Press Essential Knowledge series offers a concise introduction to the emerging field of data science, explaining its evolution, current uses, data infrastructure issues, and ethical challenges.It has never been easier for organizations to gather, store, and process data. Use of data science is driven by the rise of big data and social media, the development of high-performance computing, and the emergence of such powerful methods for data analysis and modeling as deep learning. Data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting non-obvious and useful patterns from large datasets. It is closely related to the fields of data mining and machine learning, but broader in scope. This book offers a brief history of the field, introduces fundamental data concepts, and describes the stages in a data science project. It considers data infrastructure and the challenges posed by integrating data from multiple sources, introduces the basics of machine learning, and discusses how to link machine learning expertise with real-world problems. The book also reviews ethical and legal issues, developments in data regulation, and computational approaches to preserving privacy. Finally, it considers the future impact of data science and offers principles for success in data science projects.

Sun Certified Programmer & Developer for Java 2 Study Guide (Exam 310-035 & 310-027)


Kathy Sierra - 2002
    More than 250 challenging practice questions have been completely revised to closely model the format, tone, topics, and difficulty of the real exam. An integrated study system based on proven pedagogy, exam coverage includes step-by-step exercises, special Exam Watch notes, On-the-Job elements, and Self Tests with in-depth answer explanations to help reinforce and teach practical skills.Praise for the author:"Finally A Java certification book that explains everything clearly. All you need to pass the exam is in this book."--Solveig Haugland, Technical Trainer and Former Sun Course Developer"Who better to write a Java study guide than Kathy Sierra, the reigning queen of Java instruction? Kathy Sierra has done it again--here is a study guide that almost guarantees you a certification "--James Cubeta, Systems Engineer, SGI"The thing I appreciate most about Kathy is her quest to make us all remember that we are teaching people and not just lecturing about Java. Her passion and desire for the highest quality education that meets the needs of the individual student is positively unparalleled at SunEd. Undoubtedly there are hundreds of students who have benefited from taking Kathy's classes."--Victor Peters, founder Next Step Education & Software Sun Certified Java Instructor"I want to thank Kathy for the EXCELLENT Study Guide. The book is well written, every concept is clearly explained using a real life example, and the book states what you specifically need to know for the exam. The way it's written, you feel that you're in a classroom and someone is actually teaching you the difficult concepts, but not in a dry, formal manner. The questions at the end of the chapters are also REALLY good, and I am sure they will help candidates pass the test. Watch out for this Wickedly Smart book."-Alfred Raouf, Web Solution Developer, Kemety.Net"The Sun Certification exam was certainly no walk in the park but Kathy's material allowed me to not only pass the exam, but Ace it "--Mary Whetsel, Sr. Technology Specialist, Application Strategy and Integration, The St. Paul Companies

Deep Learning for Coders with Fastai and Pytorch: AI Applications Without a PhD


Jeremy Howard - 2020
    But as this hands-on guide demonstrates, programmers comfortable with Python can achieve impressive results in deep learning with little math background, small amounts of data, and minimal code. How? With fastai, the first library to provide a consistent interface to the most frequently used deep learning applications.Authors Jeremy Howard and Sylvain Gugger show you how to train a model on a wide range of tasks using fastai and PyTorch. You'll also dive progressively further into deep learning theory to gain a complete understanding of the algorithms behind the scenes.Train models in computer vision, natural language processing, tabular data, and collaborative filteringLearn the latest deep learning techniques that matter most in practiceImprove accuracy, speed, and reliability by understanding how deep learning models workDiscover how to turn your models into web applicationsImplement deep learning algorithms from scratchConsider the ethical implications of your work