Book picks similar to
Information Retrieval: Implementing and Evaluating Search Engines by Stefan Büttcher
cs
data-science
search-engine
information-retrieval
Mining of Massive Datasets
Anand Rajaraman - 2011
This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
Probabilistic Graphical Models: Principles and Techniques
Daphne Koller - 2009
The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality.Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.
What is DevOps?
Mike Loukides - 2012
Old-style system administrators may be disappearing in the face of automation and cloud computing, but operations have become more significant than ever. As this O'Reilly Radar Report explains, we're moving into a more complex arrangement known as "DevOps."Mike Loukides, O'Reilly's VP of Content Strategy, provides an incisive look into this new world of operations, where IT specialists are becoming part of the development team. In an environment with thousands of servers, these specialists now write the code that maintains the infrastructure. Even applications that run in the cloud have to be resilient and fault tolerant, need to be monitored, and must adjust to huge swings in load. That was underscored by Amazon's EBS outage last year.From the discussions at O'Reilly's Velocity Conference, it's evident that many operations specialists are quickly adapting to the DevOps reality. But as a whole, the industry has just scratched the surface. This report tells you why.
R for Everyone: Advanced Analytics and Graphics
Jared P. Lander - 2013
R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs, you'll be ready to tackle the statistical problems you care about most. COVERAGE INCLUDES - Exploring R, RStudio, and R packages - Using R for math: variable types, vectors, calling functions, and more - Exploiting data structures, including data.frames, matrices, and lists - Creating attractive, intuitive statistical graphics - Writing user-defined functions - Controlling program flow with if, ifelse, and complex checks - Improving program efficiency with group manipulations - Combining and reshaping multiple datasets - Manipulating strings using R's facilities and regular expressions - Creating normal, binomial, and Poisson probability distributions - Programming basic statistics: mean, standard deviation, and t-tests - Building linear, generalized linear, and nonlinear models - Assessing the quality of models and variable selection - Preventing overfitting, using the Elastic Net and Bayesian methods - Analyzing univariate and multivariate time series data - Grouping data via K-means and hierarchical clustering - Preparing reports, slideshows, and web pages with knitr - Building reusable R packages with devtools and Rcpp - Getting involved with the R global community
Learning SQL
Alan Beaulieu - 2005
If you're working with a relational database--whether you're writing applications, performing administrative tasks, or generating reports--you need to know how to interact with your data. Even if you are using a tool that generates SQL for you, such as a reporting tool, there may still be cases where you need to bypass the automatic generation feature and write your own SQL statements.To help you attain this fundamental SQL knowledge, look to "Learning SQL," an introductory guide to SQL, designed primarily for developers just cutting their teeth on the language."Learning SQL" moves you quickly through the basics and then on to some of the more commonly used advanced features. Among the topics discussed: The history of the computerized databaseSQL Data Statements--those used to create, manipulate, and retrieve data stored in your database; example statements include select, update, insert, and deleteSQL Schema Statements--those used to create database objects, such as tables, indexes, and constraintsHow data sets can interact with queriesThe importance of subqueriesData conversion and manipulation via SQL's built-in functionsHow conditional logic can be used in Data StatementsBest of all, "Learning SQL" talks to you in a real-world manner, discussing various platform differences that you're likely to encounter and offering a series of chapter exercises that walk you through the learning process. Whenever possible, the book sticks to the features included in the ANSI SQL standards. This means you'll be able to apply what you learn to any of several different databases; the book covers MySQL, Microsoft SQL Server, and Oracle Database, but the features and syntax should apply just as well (perhaps with some tweaking) to IBM DB2, Sybase Adaptive Server, and PostgreSQL.Put the power and flexibility of SQL to work. With "Learning SQL" you can master this important skill and know that the SQL statements you write are indeed correct.
Engineering a Compiler
Keith D. Cooper - 2003
No longer is execution speed the sole criterion for judging compiled code. Today, code might be judged on how small it is, how much power it consumes, how well it compresses, or how many page faults it generates. In this evolving environment, the task of building a successful compiler relies upon the compiler writer's ability to balance and blend algorithms, engineering insights, and careful planning. Today's compiler writer must choose a path through a design space that is filled with diverse alternatives, each with distinct costs, advantages, and complexities.Engineering a Compiler explores this design space by presenting some of the ways these problems have been solved, and the constraints that made each of those solutions attractive. By understanding the parameters of the problem and their impact on compiler design, the authors hope to convey both the depth of the problems and the breadth of possible solutions. Their goal is to cover a broad enough selection of material to show readers that real tradeoffs exist, and that the impact of those choices can be both subtle and far-reaching.Authors Keith Cooper and Linda Torczon convey both the art and the science of compiler construction and show best practice algorithms for the major passes of a compiler. Their text re-balances the curriculum for an introductory course in compiler construction to reflect the issues that arise in current practice.
Apprenticeship Patterns: Guidance for the Aspiring Software Craftsman
Dave Hoover - 2009
To grow professionally, you also need soft skills and effective learning techniques. Honing those skills is what this book is all about. Authors Dave Hoover and Adewale Oshineye have cataloged dozens of behavior patterns to help you perfect essential aspects of your craft. Compiled from years of research, many interviews, and feedback from O'Reilly's online forum, these patterns address difficult situations that programmers, administrators, and DBAs face every day. And it's not just about financial success. Apprenticeship Patterns also approaches software development as a means to personal fulfillment. Discover how this book can help you make the best of both your life and your career. Solutions to some common obstacles that this book explores in-depth include:Burned out at work? "Nurture Your Passion" by finding a pet project to rediscover the joy of problem solving.Feeling overwhelmed by new information? Re-explore familiar territory by building something you've built before, then use "Retreat into Competence" to move forward again.Stuck in your learning? Seek a team of experienced and talented developers with whom you can "Be the Worst" for a while. "Brilliant stuff! Reading this book was like being in a time machine that pulled me back to those key learning moments in my career as a professional software developer and, instead of having to learn best practices the hard way, I had a guru sitting on my shoulder guiding me every step towards master craftsmanship. I'll certainly be recommending this book to clients. I wish I had this book 14 years ago!" -Russ Miles, CEO, OpenCredo
The Little Schemer
Daniel P. Friedman - 1974
The authors' enthusiasm for their subject is compelling as they present abstract concepts in a humorous and easy-to-grasp fashion. Together, these books will open new doors of thought to anyone who wants to find out what computing is really about. The Little Schemer introduces computing as an extension of arithmetic and algebra; things that everyone studies in grade school and high school. It introduces programs as recursive functions and briefly discusses the limits of what computers can do. The authors use the programming language Scheme, and interesting foods to illustrate these abstract ideas. The Seasoned Schemer informs the reader about additional dimensions of computing: functions as values, change of state, and exceptional cases. The Little LISPer has been a popular introduction to LISP for many years. It had appeared in French and Japanese. The Little Schemer and The Seasoned Schemer are worthy successors and will prove equally popular as textbooks for Scheme courses as well as companion texts for any complete introductory course in Computer Science.
Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python
Scott Hartshorn - 2016
They are typically used to categorize something based on other data that you have. The purpose of this book is to help you understand how Random Forests work, as well as the different options that you have when using them to analyze a problem. Additionally, since Decision Trees are a fundamental part of Random Forests, this book explains how they work. This book is focused on understanding Random Forests at the conceptual level. Knowing how they work, why they work the way that they do, and what options are available to improve results. This book covers how Random Forests work in an intuitive way, and also explains the equations behind many of the functions, but it only has a small amount of actual code (in python). This book is focused on giving examples and providing analogies for the most fundamental aspects of how random forests and decision trees work. The reason is that those are easy to understand and they stick with you. There are also some really interesting aspects of random forests, such as information gain, feature importances, or out of bag error, that simply cannot be well covered without diving into the equations of how they work. For those the focus is providing the information in a straight forward and easy to understand way.
The Art of Unit Testing: With Examples in .NET
Roy Osherove - 2009
It guides you step by step from simple tests to tests that are maintainable, readable, and trustworthy. It covers advanced subjects like mocks, stubs, and frameworks such as Typemock Isolator and Rhino Mocks. And you'll learn about advanced test patterns and organization, working with legacy code and even untestable code. The book discusses tools you need when testing databases and other technologies. It's written for .NET developers but others will also benefit from this book.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.Table of ContentsThe basics of unit testingA first unit testUsing stubs to break dependenciesInteraction testing using mock objectsIsolation (mock object) frameworksTest hierarchies and organizationThe pillars of good testsIntegrating unit testing into the organizationWorking with legacy code
T-SQL Fundamentals
Itzik Ben-Gan - 2016
Itzik Ben-Gan explains key T-SQL concepts and helps you apply your knowledge with hands-on exercises. The book first introduces T-SQL's roots and underlying logic. Next, it walks you through core topics such as single-table queries, joins, subqueries, table expressions, and set operators. Then the book covers more-advanced data-query topics such as window functions, pivoting, and grouping sets. The book also explains how to modify data, work with temporal tables, and handle transactions, and provides an overview of programmable objects.
Microsoft Data Platform MVP Itzik Ben-Gan shows you how to: Review core SQL concepts and its mathematical roots Create tables and enforce data integrity Perform effective single-table queries by using the SELECT statement Query multiple tables by using joins, subqueries, table expressions, and set operators Use advanced query techniques such as window functions, pivoting, and grouping sets Insert, update, delete, and merge data Use transactions in a concurrent environment Get started with programmable objects-from variables and batches to user-defined functions, stored procedures, triggers, and dynamic SQL
Bandit Algorithms for Website Optimization
John Myles White - 2012
Author John Myles White shows you how this powerful class of algorithms can help you boost website traffic, convert visitors to customers, and increase many other measures of success.This is the first developer-focused book on bandit algorithms, which were previously described only in research papers. You’ll quickly learn the benefits of several simple algorithms—including the epsilon-Greedy, Softmax, and Upper Confidence Bound (UCB) algorithms—by working through code examples written in Python, which you can easily adapt for deployment on your own website.Learn the basics of A/B testing—and recognize when it’s better to use bandit algorithmsDevelop a unit testing framework for debugging bandit algorithmsGet additional code examples written in Julia, Ruby, and JavaScript with supplemental online materials
Growing Object-Oriented Software, Guided by Tests
Steve Freeman - 2009
This one's a keeper." --Robert C. Martin "If you want to be an expert in the state of the art in TDD, you need to understand the ideas in this book."--Michael Feathers Test-Driven Development (TDD) is now an established technique for delivering better software faster. TDD is based on a simple idea: Write tests for your code before you write the code itself. However, this simple idea takes skill and judgment to do well. Now there's a practical guide to TDD that takes you beyond the basic concepts. Drawing on a decade of experience building real-world systems, two TDD pioneers show how to let tests guide your development and "grow" software that is coherent, reliable, and maintainable. Steve Freeman and Nat Pryce describe the processes they use, the design principles they strive to achieve, and some of the tools that help them get the job done. Through an extended worked example, you'll learn how TDD works at multiple levels, using tests to drive the features and the object-oriented structure of the code, and using Mock Objects to discover and then describe relationships between objects. Along the way, the book systematically addresses challenges that development teams encounter with TDD--from integrating TDD into your processes to testing your most difficult features. Coverage includes - Implementing TDD effectively: getting started, and maintaining your momentum throughout the project - Creating cleaner, more expressive, more sustainable code - Using tests to stay relentlessly focused on sustaining quality - Understanding how TDD, Mock Objects, and Object-Oriented Design come together in the context of a real software development project - Using Mock Objects to guide object-oriented designs - Succeeding where TDD is difficult: managing complex test data, and testing persistence and concurrency
97 Things Every Programmer Should Know: Collective Wisdom from the Experts
Kevlin Henney - 2010
With the 97 short and extremely useful tips for programmers in this book, you'll expand your skills by adopting new approaches to old problems, learning appropriate best practices, and honing your craft through sound advice.With contributions from some of the most experienced and respected practitioners in the industry--including Michael Feathers, Pete Goodliffe, Diomidis Spinellis, Cay Horstmann, Verity Stob, and many more--this book contains practical knowledge and principles that you can apply to all kinds of projects.A few of the 97 things you should know:"Code in the Language of the Domain" by Dan North"Write Tests for People" by Gerard Meszaros"Convenience Is Not an -ility" by Gregor Hohpe"Know Your IDE" by Heinz Kabutz"A Message to the Future" by Linda Rising"The Boy Scout Rule" by Robert C. Martin (Uncle Bob)"Beware the Share" by Udi Dahan
Site Reliability Engineering: How Google Runs Production Systems
Betsy Beyer - 2016
So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?In this collection of essays and articles, key members of Google's Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You'll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient--lessons directly applicable to your organization.This book is divided into four sections: Introduction--Learn what site reliability engineering is and why it differs from conventional IT industry practicesPrinciples--Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)Practices--Understand the theory and practice of an SRE's day-to-day work: building and operating large distributed computing systemsManagement--Explore Google's best practices for training, communication, and meetings that your organization can use