The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

Designing Data-Intensive Applications


Martin Kleppmann - 2015
    Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

Programming Collective Intelligence: Building Smart Web 2.0 Applications


Toby Segaran - 2002
    With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:Collaborative filtering techniques that enable online retailers to recommend products or media Methods of clustering to detect groups of similar items in a large dataset Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm Optimization algorithms that search millions of possible solutions to a problem and choose the best one Bayesian filtering, used in spam filters for classifying documents based on word types and other features Using decision trees not only to make predictions, but to model the way decisions are made Predicting numerical values rather than classifications to build price models Support vector machines to match people in online dating sites Non-negative matrix factorization to find the independent features in a dataset Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. "Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details."-- Dan Russell, Google "Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths."-- Tim Wolters, CTO, Collective Intellect

Information Dashboard Design: The Effective Visual Communication of Data


Stephen Few - 2006
    Although dashboards are potentially powerful, this potential is rarely realized. The greatest display technology in the world won't solve this if you fail to use effective visual design. And if a dashboard fails to tell you precisely what you need to know in an instant, you'll never use it, even if it's filled with cute gauges, meters, and traffic lights. Don't let your investment in dashboard technology go to waste.This book will teach you the visual design skills you need to create dashboards that communicate clearly, rapidly, and compellingly. Information Dashboard Design will explain how to:Avoid the thirteen mistakes common to dashboard design Provide viewers with the information they need quickly and clearly Apply what we now know about visual perception to the visual presentation of information Minimize distractions, cliches, and unnecessary embellishments that create confusion Organize business information to support meaning and usability Create an aesthetically pleasing viewing experience Maintain consistency of design to provide accurate interpretation Optimize the power of dashboard technology by pairing it with visual effectiveness Stephen Few has over 20 years of experience as an IT innovator, consultant, and educator. As Principal of the consultancy Perceptual Edge, Stephen focuses on data visualization for analyzing and communicating quantitative business information. He provides consulting and training services, speaks frequently at conferences, and teaches in the MBA program at the University of California in Berkeley. He is also the author of Show Me the Numbers: Designing Tables and Graphs to Enlighten. Visit his website at www.perceptualedge.com.

The New Kingmakers: How Developers Conquered the World


Stephen O’Grady - 2013
    In a 1995 interview, the late Steve Jobs claimed that the secret to his and Apple’s success was talent. “We’ve gone to exceptional lengths to hire the best people,” he said, believing that the talented resource was twenty-five times more valuable than an average alternative. For Microsoft founder Bill Gates, the multiple was even higher:A great lathe operator commands several times the wage of an average lathe operator, but a great writer of software code is worth 10,000 times the price of an average software writer.While the actual number might be up for debate, the importance of technical talent is not. The most successful companies today are those that understand the strategic role that developers will play in their success or failure. Not just successful technology companies – virtually every company today needs a developer strategy. There’s a reason that ESPN and Sears have rolled out API programs, that companies are being bought not for their products but their people. The reason is that developers are the most valuable resource in business.How did we get here? How did developers become the most important constituency in business seemingly overnight? The New Kingmakers explores the rise of the developer class, its implications and provides suggestions for navigating the new developer-centric landscape.

Deep Learning


Ian Goodfellow - 2016
    Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

Scrum: The Art of Doing Twice the Work in Half the Time


Jeff Sutherland - 2014
    It already drives most of the world’s top technology companies. And now it’s starting to spread to every domain where leaders wrestle with complex projects. If you’ve ever been startled by how fast the world is changing, Scrum is one of the reasons why. Productivity gains of as much as 1200% have been recorded, and there’s no more lucid – or compelling – explainer of Scrum and its bright promise than Jeff Sutherland, the man who put together the first Scrum team more than twenty years ago. The thorny problem Jeff began tackling back then boils down to this: people are spectacularly bad at doing things with agility and efficiency. Best laid plans go up in smoke. Teams often work at cross purposes to each other. And when the pressure rises, unhappiness soars. Drawing on his experience as a West Point-educated fighter pilot, biometrics expert, early innovator of ATM technology, and V.P. of engineering or CTO at eleven different technology companies, Jeff began challenging those dysfunctional realities, looking for solutions that would have global impact. In this book you’ll journey to Scrum’s front lines where Jeff’s system of deep accountability, team interaction, and constant iterative improvement is, among other feats, bringing the FBI into the 21st century, perfecting the design of an affordable 140 mile per hour/100 mile per gallon car, helping NPR report fast-moving action in the Middle East, changing the way pharmacists interact with patients, reducing poverty in the Third World, and even helping people plan their weddings and accomplish weekend chores. Woven with insights from martial arts, judicial decision making, advanced aerial combat, robotics, and many other disciplines, Scrum is consistently riveting. But the most important reason to read this book is that it may just help you achieve what others consider unachievable – whether it be inventing a trailblazing technology, devising a new system of education, pioneering a way to feed the hungry, or, closer to home, a building a foundation for your family to thrive and prosper.

Agile Product Management with Scrum: Creating Products That Customers Love


Roman Pichler - 2008
    He describes a broad range of agile product management practices, including making agile product discovery work, taking advantage of emergent requirements, creating the minimal marketable product, leveraging early customer feedback, and working closely with the development team. Benefitting from Pichler's extensive experience, you'll learn how Scrum product ownership differs from traditional product management and how to avoid and overcome the common challenges that Scrum product owners face. Coverage includesUnderstanding the product owner's role: what product owners do, how they do it, and the surprising implicationsEnvisioning the product: creating a compelling product vision to galvanize and guide the team and stakeholdersGrooming the product backlog: managing the product backlog effectively even for the most complex productsPlanning the release: bringing clarity to scheduling, budgeting, and functionality decisionsCollaborating in sprint meetings: understanding the product owner's role in sprint meetings, including the dos and don'tsTransitioning into product ownership: succeeding as a product owner and establishing the role in the enterprise This book is an indispensable resource for anyone who works as a product owner, or expects to do so, as well as executives and coaches interested in establishing agile product management.

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management


Michael J.A. Berry - 1997
    Packed with more than forty percent new and updated material, this edition shows business managers, marketing analysts, and data mining specialists how to harness fundamental data mining methods and techniques to solve common types of business problemsEach chapter covers a new data mining technique, and then shows readers how to apply the technique for improved marketing, sales, and customer supportThe authors build on their reputation for concise, clear, and practical explanations of complex concepts, making this book the perfect introduction to data miningMore advanced chapters cover such topics as how to prepare data for analysis and how to create the necessary infrastructure for data miningCovers core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, clustering, and survival analysis

Building Data Science Teams


D.J. Patil - 2011
    In this in-depth report, data scientist DJ Patil explains the skills, perspectives, tools and processes that position data science teams for success.Topics include: What it means to be "data driven." The unique roles of data scientists. The four essential qualities of data scientists. Patil's first-hand experience building the LinkedIn data science team.

Cracking the PM Interview: How to Land a Product Manager Job in Technology


Gayle Laakmann McDowell - 2013
    Cracking the PM Interview is a comprehensive book about landing a product management role in a startup or bigger tech company. Learn how the ambiguously-named "PM" (product manager / program manager) role varies across companies, what experience you need, how to make your existing experience translate, what a great PM resume and cover letter look like, and finally, how to master the interview: estimation questions, behavioral questions, case questions, product questions, technical questions, and the super important "pitch."

Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.Use graphics to describe data with one, two, or dozens of variablesDevelop conceptual models using back-of-the-envelope calculations, as well asscaling and probability argumentsMine data with computationally intensive methods such as simulation and clusteringMake your conclusions understandable through reports, dashboards, and other metrics programsUnderstand financial calculations, including the time-value of moneyUse dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situationsBecome familiar with different open source programming environments for data analysisFinally, a concise reference for understanding how to conquer piles of data.--Austin King, Senior Web Developer, MozillaAn indispensable text for aspiring data scientists.--Michael E. Driscoll, CEO/Founder, Dataspora

Shape Up: Stop Running in Circles and Ship Work that Matters


Ryan Singer - 2019
    "This book is a guide to how we do product development at Basecamp. It’s also a toolbox full of techniques that you can apply in your own way to your own process.Whether you’re a founder, CTO, product manager, designer, or developer, you’re probably here because of some common challenges that all software companies have to face."

Pattern Recognition and Machine Learning


Christopher M. Bishop - 2006
    However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years. In particular, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic models. Also, the practical applicability of Bayesian methods has been greatly enhanced through the development of a range of approximate inference algorithms such as variational Bayes and expectation propagation. Similarly, new models based on kernels have had a significant impact on both algorithms and applications. This new textbook reflects these recent developments while providing a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners, and assumes no previous knowledge of pattern recognition or machine learning concepts. Knowledge of multivariate calculus and basic linear algebra is required, and some familiarity with probabilities would be helpful though not essential as the book includes a self-contained introduction to basic probability theory.

Planning for Big Data


Edd Wilder-James - 2004
    From creating new data-driven products through to increasing operational efficiency, big data has the potential to makeyour organization both more competitive and more innovative.As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but the organizational and cultural demands of being data-driven.Written by O'Reilly Radar's experts on big data, this anthology describes:- The broad industry changes heralded by the big data era- What big data is, what it means to your business, and how to start solving data problems- The software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions- The landscape of NoSQL databases and their relative merits- How visualization plays an important part in data work