Doing Data Science


Cathy O'Neil - 2013
    But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.Topics include:Statistical inference, exploratory data analysis, and the data science processAlgorithmsSpam filters, Naive Bayes, and data wranglingLogistic regressionFinancial modelingRecommendation engines and causalityData visualizationSocial networks and data journalismData engineering, MapReduce, Pregel, and HadoopDoing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Causality: Models, Reasoning, and Inference


Judea Pearl - 2000
    It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, philosophy, cognitive science, and the health and social sciences. Pearl presents a unified account of the probabilistic, manipulative, counterfactual and structural approaches to causation, and devises simple mathematical tools for analyzing the relationships between causal connections, statistical associations, actions and observations. The book will open the way for including causal analysis in the standard curriculum of statistics, artifical intelligence, business, epidemiology, social science and economics. Students in these areas will find natural models, simple identification procedures, and precise mathematical definitions of causal concepts that traditional texts have tended to evade or make unduly complicated. This book will be of interest to professionals and students in a wide variety of fields. Anyone who wishes to elucidate meaningful relationships from data, predict effects of actions and policies, assess explanations of reported events, or form theories of causal understanding and causal speech will find this book stimulating and invaluable. Professor of Computer Science at the UCLA, Judea Pearl is the winner of the 2008 Benjamin Franklin Award in Computers and Cognitive Science.

Security Engineering: A Guide to Building Dependable Distributed Systems


Ross J. Anderson - 2008
    Spammers, virus writers, phishermen, money launderers, and spies now trade busily with each other in a lively online criminal economy and as they specialize, they get better. In this indispensable, fully updated guide, Ross Anderson reveals how to build systems that stay dependable whether faced with error or malice. Here's straight talk on critical topics such as technical engineering basics, types of attack, specialized protection mechanisms, security psychology, policy, and more.

Probabilistic Graphical Models: Principles and Techniques


Daphne Koller - 2009
    The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality.Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.

Hadoop: The Definitive Guide


Tom White - 2009
    Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Black Hat Python: Python Programming for Hackers and Pentesters


Justin Seitz - 2014
    But just how does the magic happen?In Black Hat Python, the latest from Justin Seitz (author of the best-selling Gray Hat Python), you'll explore the darker side of Python's capabilities writing network sniffers, manipulating packets, infecting virtual machines, creating stealthy trojans, and more. You'll learn how to:Create a trojan command-and-control using GitHubDetect sandboxing and automate common malware tasks, like keylogging and screenshottingEscalate Windows privileges with creative process controlUse offensive memory forensics tricks to retrieve password hashes and inject shellcode into a virtual machineExtend the popular Burp Suite web-hacking toolAbuse Windows COM automation to perform a man-in-the-browser attackExfiltrate data from a network most sneakilyInsider techniques and creative challenges throughout show you how to extend the hacks and how to write your own exploits.When it comes to offensive security, your ability to create powerful tools on the fly is indispensable. Learn how in Black Hat Python."

Pattern Classification


David G. Stork - 1973
    Now with the second edition, readers will find information on key new topics such as neural networks and statistical pattern recognition, the theory of machine learning, and the theory of invariances. Also included are worked examples, comparisons between different methods, extensive graphics, expanded exercises and computer project topics.An Instructor's Manual presenting detailed solutions to all the problems in the book is available from the Wiley editorial department.

All of Statistics: A Concise Course in Statistical Inference


Larry Wasserman - 2003
    But in spirit, the title is apt, as the book does cover a much broader range of topics than a typical introductory book on mathematical statistics. This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines. The book includes modern topics like nonparametric curve estimation, bootstrapping, and clas- sification, topics that are usually relegated to follow-up courses. The reader is presumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. Statistics, data mining, and machine learning are all concerned with collecting and analyzing data. For some time, statistics research was con- ducted in statistics departments while data mining and machine learning re- search was conducted in computer science departments. Statisticians thought that computer scientists were reinventing the wheel. Computer scientists thought that statistical theory didn't apply to their problems. Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algo- rithms are more scalable than statisticians ever thought possible. Formal sta- tistical theory is more pervasive than computer scientists had realized.

C# in Depth


Jon Skeet - 2008
    With the many upgraded features, C# is more expressive than ever. However, an in depth understanding is required to get the most out of the language.C# in Depth, Second Edition is a thoroughly revised, up-to-date book that covers the new features of C# 4 as well as Code Contracts. In it, you'll see the subtleties of C# programming in action, learning how to work with high-value features that you'll be glad to have in your toolkit. The book helps readers avoid hidden pitfalls of C# programming by understanding "behind the scenes" issues.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

C++ Templates: The Complete Guide


David Vandevoorde - 2002
    C++ Templates: The Complete Guide provides software architects and engineers with a clear understanding of why, when, and how to use templates to build and maintain cleaner, faster, and smarter software more efficiently. C++ Templates begins with an insightful tutorial on basic concepts and language features. The remainder of the book serves as a comprehensive reference, focusing first on language details, then on a wide range of coding techniques, and finally on advanced applications for templates. Examples used throughout the book illustrate abstract concepts and demonstrate best practices. Readers learn: The exact behaviors of templates How to avoid the pitfalls associated with templates Idioms and techniques, from the basic to the previously undocumented How to reuse source code without threatening performance or safety How to increase the efficiency of C++ programs How to produce more flexible and maintainable software This practical guide shows programmers how to exploit the full power of the template features in C++.

Mastering Regular Expressions


Jeffrey E.F. Friedl - 1997
    They are now standard features in a wide range of languages and popular tools, including Perl, Python, Ruby, Java, VB.NET and C# (and any language using the .NET Framework), PHP, and MySQL.If you don't use regular expressions yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you'll appreciate this book's unprecedented detail and breadth of coverage. If you think you know all you need to know about regularexpressions, this book is a stunning eye-opener.As this book shows, a command of regular expressions is an invaluable skill. Regular expressions allow you to code complex and subtle text processing that you never imagined could be automated. Regular expressions can save you time and aggravation. They can be used to craft elegant solutions to a wide range of problems. Once you've mastered regular expressions, they'll become an invaluable part of your toolkit. You will wonder how you ever got by without them.Yet despite their wide availability, flexibility, and unparalleled power, regular expressions are frequently underutilized. Yet what is power in the hands of an expert can be fraught with peril for the unwary. Mastering Regular Expressions will help you navigate the minefield to becoming an expert and help you optimize your use of regular expressions.Mastering Regular Expressions, Third Edition, now includes a full chapter devoted to PHP and its powerful and expressive suite of regular expression functions, in addition to enhanced PHP coverage in the central "core" chapters. Furthermore, this edition has been updated throughout to reflect advances in other languages, including expanded in-depth coverage of Sun's java.util.regex package, which has emerged as the standard Java regex implementation.Topics include:A comparison of features among different versions of many languages and toolsHow the regular expression engine worksOptimization (major savings available here!)Matching just what you want, but not what you don't wantSections and chapters on individual languagesWritten in the lucid, entertaining tone that makes a complex, dry topic become crystal-clear to programmers, and sprinkled with solutions to complex real-world problems, Mastering Regular Expressions, Third Edition offers a wealth information that you can put to immediateuse.Reviews of this new edition and the second edition: "There isn't a better (or more useful) book available on regular expressions."--Zak Greant, Managing Director, eZ Systems"A real tour-de-force of a book which not only covers the mechanics of regexes in extraordinary detail but also talks about efficiency and the use of regexes in Perl, Java, and .NET...If you use regular expressions as part of your professional work (even if you already have a good book on whatever language you're programming in) I would strongly recommend this book to you."--Dr. Chris Brown, Linux Format"The author does an outstanding job leading the reader from regexnovice to master. The book is extremely easy to read and chock full ofuseful and relevant examples...Regular expressions are valuable toolsthat every developer should have in their toolbox. Mastering RegularExpressions is the definitive guide to the subject, and an outstandingresource that belongs on every programmer's bookshelf. Ten out of TenHorseshoes."--Jason Menard, Java Ranch

The Essential Turing: Seminal Writings in Computing, Logic, Philosophy, Artificial Intelligence, and Artificial Life Plus the Secrets of Enigma


Alan Turing - 2004
    In 1935, aged 22, he developed the mathematical theory upon which all subsequent stored-program digital computers are modeled.At the outbreak of hostilities with Germany in September 1939, he joined the Government Codebreaking team at Bletchley Park, Buckinghamshire and played a crucial role in deciphering Engima, the code used by the German armed forces to protect their radio communications. Turing's work on the versionof Enigma used by the German navy was vital to the battle for supremacy in the North Atlantic. He also contributed to the attack on the cyphers known as Fish, which were used by the German High Command for the encryption of signals during the latter part of the war. His contribution helped toshorten the war in Europe by an estimated two years.After the war, his theoretical work led to the development of Britain's first computers at the National Physical Laboratory and the Royal Society Computing Machine Laboratory at Manchester University.Turing was also a founding father of modern cognitive science, theorizing that the cortex at birth is an unorganized machine which through training becomes organized into a universal machine or something like it. He went on to develop the use of computers to model biological growth, launchingthe discipline now referred to as Artificial Life.The papers in this book are the key works for understanding Turing's phenomenal contribution across all these fields. The collection includes Turing's declassified wartime Treatise on the Enigma; letters from Turing to Churchill and to codebreakers; lectures, papers, and broadcasts which opened upthe concept of AI and its implications; and the paper which formed the genesis of the investigation of Artifical Life.

Algorithmic Puzzles


Anany V. Levitin - 2011
    This logic extends far beyond the realm of computer science and into the wide and entertaining world of puzzles. In Algorithmic Puzzles, Anany and Maria Levitin use many classic brainteasers as well as newer examples from job interviews with major corporations to show readers how to apply analytical thinking to solve puzzles requiring well-defined procedures.The book's unique collection of puzzles is supplemented with carefully developed tutorials on algorithm design strategies and analysis techniques intended to walk the reader step-by-step through the various approaches to algorithmic problem solving. Mastery of these strategies--exhaustive search, backtracking, and divide-and-conquer, among others--will aid the reader in solving not only the puzzles contained in this book, but also others encountered in interviews, puzzle collections, and throughout everyday life. Each of the 150 puzzles contains hints and solutions, along with commentary onthe puzzle's origins and solution methods. The only book of its kind, Algorithmic Puzzles houses puzzles for all skill levels. Readers with only middle school mathematics will develop their algorithmic problem-solving skills through puzzles at the elementary level, while seasoned puzzle solvers will enjoy the challenge of thinking throughmore difficult puzzles.

Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)


Jiawei Han - 2000
    Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generate data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge.Like the first edition, voted the most popular data mining book by KD Nuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments on mining complex types of data- including stream data, sequence data, graph structured data, social network data, and multi-relational data.A comprehensive, practical look at the concepts and techniques you need to know to get the most out of real business dataUpdates that incorporate input from readers, changes in the field, and more material on statistics and machine learningDozens of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projectsComplete classroom support for instructors at www.mkp.com/datamining2e companion site

Mastering Emacs


Mickey Petersen - 2015
    In the Mastering Emacs ebook you will learn the answers to all the concepts that take weeks, months or even years to truly learn, all in one place.“Emacs is such a hard editor to learn”But why is it so hard to learn? As it turns out, it's almost always the same handful of issues that everyone faces.If you have tried to learn Emacs you will have struggled with the same problems everyone faces, and few tutorials to see you through it.I have dedicated the first half of the book to explaining the essence of Emacs — and in doing so, how to overcome these issues:Memorizing Emacs’s keys: You will learn Emacs one key at a time, starting with the arrow keys. To feel productive in Emacs, it’s important you start on an equal footing — without too many new concepts and keys to memorize. Each chapter will introduce more keys and concepts so you can learn at your own pace. Discovering new modes and features: Emacs is a self-documenting editor, and I will teach you how to use the apropos, info, and describe system to discover new modes and features, or help you find things you forgot! Customizing Emacs: You don’t have to learn Emacs Lisp to alter a lot of Emacs’s functionality. Most changes you want to make are possible using Emacs’s Customize interface and I will show you how to use it efficiently. Understanding the terminology: Emacs is so old it predates almost every other editor and all modern user interfaces. I have an entire chapter dedicated to the unique terminology in Emacs; how it is different from other editors, and what that means to you.