Mining of Massive Datasets


Anand Rajaraman - 2011
    This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

MongoDB: The Definitive Guide


Kristina Chodorow - 2010
    Learn how easy it is to handle data as self-contained JSON-style documents, rather than as records in a relational database.Explore ways that document-oriented storage will work for your projectLearn how MongoDB’s schema-free data model handles documents, collections, and multiple databasesExecute basic write operations, and create complex queries to find data with any criteriaUse indexes, aggregation tools, and other advanced query techniquesLearn about monitoring, security and authentication, backup and repair, and moreSet up master-slave and automatic failover replication in MongoDBUse sharding to scale MongoDB horizontally, and learn how it impacts applicationsGet example applications written in Java, PHP, Python, and Ruby

The Linux Programming Interface: A Linux and Unix System Programming Handbook


Michael Kerrisk - 2010
    You'll learn how to:Read and write files efficiently Use signals, clocks, and timers Create processes and execute programs Write secure programs Write multithreaded programs using POSIX threads Build and use shared libraries Perform interprocess communication using pipes, message queues, shared memory, and semaphores Write network applications with the sockets API While The Linux Programming Interface covers a wealth of Linux-specific features, including epoll, inotify, and the /proc file system, its emphasis on UNIX standards (POSIX.1-2001/SUSv3 and POSIX.1-2008/SUSv4) makes it equally valuable to programmers working on other UNIX platforms.The Linux Programming Interface is the most comprehensive single-volume work on the Linux and UNIX programming interface, and a book that's destined to become a new classic.Praise for The Linux Programming Interface "If I had to choose a single book to sit next to my machine when writing software for Linux, this would be it." —Martin Landers, Software Engineer, Google "This book, with its detailed descriptions and examples, contains everything you need to understand the details and nuances of the low-level programming APIs in Linux . . . no matter what the level of reader, there will be something to be learnt from this book." —Mel Gorman, Author of Understanding the Linux Virtual Memory Manager "Michael Kerrisk has not only written a great book about Linux programming and how it relates to various standards, but has also taken care that bugs he noticed got fixed and the man pages were (greatly) improved. In all three ways, he has made Linux programming easier. The in-depth treatment of topics in The Linux Programming Interface . . . makes it a must-have reference for both new and experienced Linux programmers." —Andreas Jaeger, Program Manager, openSUSE, Novell "Michael's inexhaustible determination to get his information right, and to express it clearly and concisely, has resulted in a strong reference source for programmers. While this work is targeted at Linux programmers, it will be of value to any programmer working in the UNIX/POSIX ecosystem." —David Butenhof, Author of Programming with POSIX Threads and Contributor to the POSIX and UNIX Standards ". . . a very thorough—yet easy to read—explanation of UNIX system and network programming, with an emphasis on Linux systems. It's certainly a book I'd recommend to anybody wanting to get into UNIX programming (in general) or to experienced UNIX programmers wanting to know 'what's new' in the popular GNU/Linux system." —Fernando Gont, Network Security Researcher, IETF Participant, and RFC Author ". . . encyclopedic in the breadth and depth of its coverage, and textbook-like in its wealth of worked examples and exercises. Each topic is clearly and comprehensively covered, from theory to hands-on working code. Professionals, students, educators, this is the Linux/UNIX reference that you have been waiting for." —Anthony Robins, Associate Professor of Computer Science, The University of Otago "I've been very impressed by the precision, the quality and the level of detail Michael Kerrisk put in his book. He is a great expert of Linux system calls and lets us share his knowledge and understanding of the Linux APIs." —Christophe Blaess, Author of Programmation systeme en C sous Linux ". . . an essential resource for the serious or professional Linux and UNIX systems programmer. Michael Kerrisk covers the use of all the key APIs across both the Linux and UNIX system interfaces with clear descriptions and tutorial examples and stresses the importance and benefits of following standards such as the Single UNIX Specification and POSIX 1003.1." —Andrew Josey, Director, Standards, The Open Group, and Chair of the POSIX 1003.1 Working Group "What could be better than an encyclopedic reference to the Linux system, from the standpoint of the system programmer, written by none other than the maintainer of the man pages himself? The Linux Programming Interface is comprehensive and detailed. I firmly expect it to become an indispensable addition to my programming bookshelf." —Bill Gallmeister, Author of POSIX.4 Programmer's Guide: Programming for the Real World ". . . the most complete and up-to-date book about Linux and UNIX system programming. If you're new to Linux system programming, if you're a UNIX veteran focused on portability while interested in learning the Linux way, or if you're simply looking for an excellent reference about the Linux programming interface, then Michael Kerrisk's book is definitely the companion you want on your bookshelf." —Loic Domaigne, Chief Software Architect (Embedded), Corpuls.com

Planning for Big Data


Edd Wilder-James - 2004
    From creating new data-driven products through to increasing operational efficiency, big data has the potential to makeyour organization both more competitive and more innovative.As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but the organizational and cultural demands of being data-driven.Written by O'Reilly Radar's experts on big data, this anthology describes:- The broad industry changes heralded by the big data era- What big data is, what it means to your business, and how to start solving data problems- The software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions- The landscape of NoSQL databases and their relative merits- How visualization plays an important part in data work

Algorithms in a Nutshell


George T. Heineman - 2008
    Algorithms in a Nutshell describes a large number of existing algorithms for solving a variety of problems, and helps you select and implement the right algorithm for your needs -- with just enough math to let you understand and analyze algorithm performance. With its focus on application, rather than theory, this book provides efficient code solutions in several programming languages that you can easily adapt to a specific project. Each major algorithm is presented in the style of a design pattern that includes information to help you understand why and when the algorithm is appropriate. With this book, you will:Solve a particular coding problem or improve on the performance of an existing solutionQuickly locate algorithms that relate to the problems you want to solve, and determine why a particular algorithm is the right one to useGet algorithmic solutions in C, C++, Java, and Ruby with implementation tipsLearn the expected performance of an algorithm, and the conditions it needs to perform at its bestDiscover the impact that similar design decisions have on different algorithmsLearn advanced data structures to improve the efficiency of algorithmsWith Algorithms in a Nutshell, you'll learn how to improve the performance of key algorithms essential for the success of your software applications.

Growing Object-Oriented Software, Guided by Tests


Steve Freeman - 2009
    This one's a keeper." --Robert C. Martin "If you want to be an expert in the state of the art in TDD, you need to understand the ideas in this book."--Michael Feathers Test-Driven Development (TDD) is now an established technique for delivering better software faster. TDD is based on a simple idea: Write tests for your code before you write the code itself. However, this simple idea takes skill and judgment to do well. Now there's a practical guide to TDD that takes you beyond the basic concepts. Drawing on a decade of experience building real-world systems, two TDD pioneers show how to let tests guide your development and "grow" software that is coherent, reliable, and maintainable. Steve Freeman and Nat Pryce describe the processes they use, the design principles they strive to achieve, and some of the tools that help them get the job done. Through an extended worked example, you'll learn how TDD works at multiple levels, using tests to drive the features and the object-oriented structure of the code, and using Mock Objects to discover and then describe relationships between objects. Along the way, the book systematically addresses challenges that development teams encounter with TDD--from integrating TDD into your processes to testing your most difficult features. Coverage includes - Implementing TDD effectively: getting started, and maintaining your momentum throughout the project - Creating cleaner, more expressive, more sustainable code - Using tests to stay relentlessly focused on sustaining quality - Understanding how TDD, Mock Objects, and Object-Oriented Design come together in the context of a real software development project - Using Mock Objects to guide object-oriented designs - Succeeding where TDD is difficult: managing complex test data, and testing persistence and concurrency

Microservice Patterns


Chris Richardson - 2017
    However, successful applications have a habit of growing. Eventually the development team ends up in what is known as monolithic hell. All aspects of software development and deployment become painfully slow. The solution is to adopt the microservice architecture, which structures an application as a services, organized around business capabilities. This architecture accelerates software development and enables continuous delivery and deployment of complex software applications.Microservice Patterns teaches enterprise developers and architects how to build applications with the microservice architecture. Rather than simply advocating for the use the microservice architecture, this clearly-written guide takes a balanced, pragmatic approach. You'll discover that the microservice architecture is not a silver bullet and has both benefits and drawbacks. Along the way, you'll learn a pattern language that will enable you to solve the issues that arise when using the microservice architecture. This book also teaches you how to refactor a monolithic application to a microservice architecture.

Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.Use graphics to describe data with one, two, or dozens of variablesDevelop conceptual models using back-of-the-envelope calculations, as well asscaling and probability argumentsMine data with computationally intensive methods such as simulation and clusteringMake your conclusions understandable through reports, dashboards, and other metrics programsUnderstand financial calculations, including the time-value of moneyUse dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situationsBecome familiar with different open source programming environments for data analysisFinally, a concise reference for understanding how to conquer piles of data.--Austin King, Senior Web Developer, MozillaAn indispensable text for aspiring data scientists.--Michael E. Driscoll, CEO/Founder, Dataspora