Learning Spark: Lightning-Fast Big Data Analysis


Holden Karau - 2013
    How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates. Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning. Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm Learn how to deploy interactive, batch, and streaming applications Connect to data sources including HDFS, Hive, JSON, and S3 Master advanced topics like data partitioning and shared variables

What Hedge Funds Really Do: An Introduction to Portfolio Management


Philip J. Romero - 2014
    We’ve comea long way since then. With this book, Drs. Romero and Balch liftthe veil from many of these once-opaque concepts in high-techfinance. We can all benefit from learning how the cooperationbetween wetware and software creates fitter models. This bookdoes a fantastic job describing how the latest advances in financialmodeling and data science help today’s portfolio managerssolve these greater riddles. —Michael Himmel, ManagingPartner, Essex Asset ManagementI applaud Phil Romero’s willingness to write about the hedgefund world, an industry that is very private, often flamboyant,and easily misunderstood. As with every sector of the investmentlandscape, the hedge fund industry varies dramaticallyfrom quantitative “black box” technology, to fundamental researchand old-fashioned stock picking. This book helps investorsdistinguish between these diverse opposites and understandtheir place in the new evolving world of finance. —Mick Elfers,Founder and Chief Investment Strategist, Irvington Capital

Introduction to Machine Learning with Python: A Guide for Data Scientists


Andreas C. Müller - 2015
    If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.You'll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Muller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get even more from this book.With this book, you'll learn:Fundamental concepts and applications of machine learningAdvantages and shortcomings of widely used machine learning algorithmsHow to represent data processed by machine learning, including which data aspects to focus onAdvanced methods for model evaluation and parameter tuningThe concept of pipelines for chaining models and encapsulating your workflowMethods for working with text data, including text-specific processing techniquesSuggestions for improving your machine learning and data science skills

Algorithms


Robert Sedgewick - 1983
    This book surveys the most important computer algorithms currently in use and provides a full treatment of data structures and algorithms for sorting, searching, graph processing, and string processing -- including fifty algorithms every programmer should know. In this edition, new Java implementations are written in an accessible modular programming style, where all of the code is exposed to the reader and ready to use.The algorithms in this book represent a body of knowledge developed over the last 50 years that has become indispensable, not just for professional programmers and computer science students but for any student with interests in science, mathematics, and engineering, not to mention students who use computation in the liberal arts.The companion web site, algs4.cs.princeton.edu contains An online synopsis Full Java implementations Test data Exercises and answers Dynamic visualizations Lecture slides Programming assignments with checklists Links to related material The MOOC related to this book is accessible via the "Online Course" link at algs4.cs.princeton.edu. The course offers more than 100 video lecture segments that are integrated with the text, extensive online assessments, and the large-scale discussion forums that have proven so valuable. Offered each fall and spring, this course regularly attracts tens of thousands of registrants.Robert Sedgewick and Kevin Wayne are developing a modern approach to disseminating knowledge that fully embraces technology, enabling people all around the world to discover new ways of learning and teaching. By integrating their textbook, online content, and MOOC, all at the state of the art, they have built a unique resource that greatly expands the breadth and depth of the educational experience.

Bayesian Reasoning and Machine Learning


David Barber - 2012
    They are established tools in a wide range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly. People who know the methods have their choice of rewarding jobs. This hands-on text opens these opportunities to computer science students with modest mathematical backgrounds. It is designed for final-year undergraduates and master's students with limited background in linear algebra and calculus. Comprehensive and coherent, it develops everything from basic reasoning to advanced techniques within the framework of graphical models. Students learn more than a menu of techniques, they develop analytical and problem-solving skills that equip them for the real world. Numerous examples and exercises, both computer based and theoretical, are included in every chapter. Resources for students and instructors, including a MATLAB toolbox, are available online.

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World


Bruce Schneier - 2015
    Your online and in-store purchasing patterns are recorded, and reveal if you're unemployed, sick, or pregnant. Your e-mails and texts expose your intimate and casual friends. Google knows what you’re thinking because it saves your private searches. Facebook can determine your sexual orientation without you ever mentioning it.The powers that surveil us do more than simply store this information. Corporations use surveillance to manipulate not only the news articles and advertisements we each see, but also the prices we’re offered. Governments use surveillance to discriminate, censor, chill free speech, and put people in danger worldwide. And both sides share this information with each other or, even worse, lose it to cybercriminals in huge data breaches.Much of this is voluntary: we cooperate with corporate surveillance because it promises us convenience, and we submit to government surveillance because it promises us protection. The result is a mass surveillance society of our own making. But have we given up more than we’ve gained? In Data and Goliath, security expert Bruce Schneier offers another path, one that values both security and privacy. He brings his bestseller up-to-date with a new preface covering the latest developments, and then shows us exactly what we can do to reform government surveillance programs, shake up surveillance-based business models, and protect our individual privacy. You'll never look at your phone, your computer, your credit cards, or even your car in the same way again.

The Sciences of the Artificial


Herbert A. Simon - 1969
    There are updates throughout the book as well. These take into account important advances in cognitive psychology and the science of design while confirming and extending the book's basic thesis: that a physical symbol system has the necessary and sufficient means for intelligent action. The chapter "Economic Reality" has also been revised to reflect a change in emphasis in Simon's thinking about the respective roles of organizations and markets in economic systems."People sometimes ask me what they should read to find out about artificial intelligence. Herbert Simon's book The Sciences of the Artificial is always on the list I give them. Every page issues a challenge to conventional thinking, and the layman who digests it well will certainly understand what the field of artificial intelligence hopes to accomplish. I recommend it in the same spirit that I recommend Freud to people who ask about psychoanalysis, or Piaget to those who ask about child psychology: If you want to learn about a subject, start by reading its founding fathers." -- George A. Miller

The R Book


Michael J. Crawley - 2007
    The R language is recognised as one of the most powerful and flexible statistical software packages, and it enables the user to apply many statistical techniques that would be impossible without such software to help implement such large data sets.

Modern Information Retrieval


Ricardo Baeza-Yates - 1999
    The timely provision of relevant information with minimal 'noise' is critical to modern society and this is what information retrieval (IR) is all about. It is a dynamic subject, with current changes driven by the expansion of the World Wide Web, the advent of modern and inexpensive graphical user interfaces and the development of reliable and low-cost mass storage devices. Modern Information Retrieval discusses all these changes in great detail and can be used for a first course on IR as well as graduate courses on the topic.The organization of the book, which includes a comprehensive glossary, allows the reader to either obtain a broad overview or detailed knowledge of all the key topics in modern IR. The heart of the book is the nine chapters written by Baeza-Yates and Ribeiro-Neto, two leading exponents in the field. For those wishing to delve deeper into key areas there are further state-of-the-art ch

Amazon Elastic Compute Cloud (EC2) User Guide


Amazon Web Services - 2012
    This is official Amazon Web Services (AWS) documentation for Amazon Compute Cloud (Amazon EC2).This guide explains the infrastructure provided by the Amazon EC2 web service, and steps you through how to configure and manage your virtual servers using the AWS Management Console (an easy-to-use graphical interface), the Amazon EC2 API, or web tools and utilities.Amazon EC2 provides resizable computing capacity—literally, server instances in Amazon's data centers—that you use to build and host your software systems.

Machine Learning: The Art and Science of Algorithms That Make Sense of Data


Peter Flach - 2012
    Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. Flach provides case studies of increasing complexity and variety with well-chosen examples and illustrations throughout. He covers a wide range of logical, geometric and statistical models and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features. The use of established terminology is balanced with the introduction of new and useful concepts, and summaries of relevant background material are provided with pointers for revision if necessary. These features ensure Machine Learning will set a new standard as an introductory textbook.

PCs for Dummies


Dan Gookin - 1992
    They have also sprouted new and wondrous capabilities at a dizzying pace. This 11th Edition of the all-time bestselling PC guide has been polished and honed to deliver everything you need to know about your twenty-first-century PC -- from what plugs into what to adjusting your monitor to burning DVDs, and much more.Whether you want to go online, install a firewall, live the digital life, or finally get a handle on the whole computer software concept, this fun, plain-English handbook is here to answer all your questions PC questions. You'll find out why Windows Vista is the way to go and how to use it to get everywhere else. And, you'll pick up Web and email tricks and learn about all the new levels of PC security. Discover how to: Set up your PC Use Vista menus Store your stuff on Memory Cards Record live TV Download digital photos Connect to a wireless network Explore the Internet safely Print perfect documents, photos, and more Use your PC as the new hub of your digital worldComplete with helpful hints on how to avoid beginner mistakes, a list of extras and accessories you may want for your PC, and insider tips from a PC guru. PCs for Dummies, 11th Edition is the one PC accessory you can't do without.

The Intelligent Web: Search, Smart Algorithms, and Big Data


Gautam Shroff - 2013
    These days, linger over a Web page selling lamps, and they will turn up at the advertising margins as you move around the Internet, reminding you, tempting you to make that purchase. Search engines such as Google can now look deep into the data on the Web to pull out instances of the words you are looking for. And there are pages that collect and assess information to give you a snapshot of changing political opinion. These are just basic examples of the growth of Web intelligence, as increasingly sophisticated algorithms operate on the vast and growing amount of data on the Web, sifting, selecting, comparing, aggregating, correcting; following simple but powerful rules to decide what matters. While original optimism for Artificial Intelligence declined, this new kind of machine intelligence is emerging as the Web grows ever larger and more interconnected.Gautam Shroff takes us on a journey through the computer science of search, natural language, text mining, machine learning, swarm computing, and semantic reasoning, from Watson to self-driving cars. This machine intelligence may even mimic at a basic level what happens in the brain.

The Elements of Data Analytic Style


Jeffrey Leek - 2015
    This book is focused on the details of data analysis that sometimes fall through the cracks in traditional statistics classes and textbooks. It is based in part on the authors blog posts, lecture materials, and tutorials. The author is one of the co-developers of the Johns Hopkins Specialization in Data Science the largest data science program in the world that has enrolled more than 1.76 million people. The book is useful as a companion to introductory courses in data science or data analysis. It is also a useful reference tool for people tasked with reading and critiquing data analyses. It is based on the authors popular open-source guides available through his Github account (https://github.com/jtleek). The paper is also available through Leanpub (https://leanpub.com/datastyle), if the book is purchased on that platform you are entitled to lifetime free updates.

The Node Beginner Book


Manuel Kiessling - 2011
    The aim of The Node Beginner Book is to get you started with developing applications for Node.js, teaching you everything you need to know about advanced JavaScript along the way on 59 pages.