Book picks similar to
Hadoop in Action by Chuck Lam


programming
big-data
hadoop
data-science

Professional Test Driven Development with C#: Developing Real World Applications with Tdd


James Bender - 2011
    This hands-on guide provides invaluable insight for creating successful test-driven development processes. With source code and examples featured in both C# and .NET, the book walks you through the TDD methodology and shows how it is applied to a real-world application. You'll witness the application built from scratch and details each step that is involved in the development, as well as any problems that were encountered and the solutions that were applied.Clarifies the motivation behind test-driven development (TDD), what it is, and how it works Reviews the various steps involved in developing an application and the testing that is involved prior to implementing the functionality Discusses unit testing and refactoring Professional Test-Driven Development with C# shows you how to create great TDD processes right away.

Systems Performance: Enterprise and the Cloud


Brendan Gregg - 2013
    Now, internationally renowned performance expert Brendan Gregg has brought together proven methodologies, tools, and metrics for analyzing and tuning even the most complex environments. Systems Performance: Enterprise and the Cloud focuses on Linux(R) and Unix(R) performance, while illuminating performance issues that are relevant to all operating systems. You'll gain deep insight into how systems work and perform, and learn methodologies for analyzing and improving system and application performance. Gregg presents examples from bare-metal systems and virtualized cloud tenants running Linux-based Ubuntu(R), Fedora(R), CentOS, and the illumos-based Joyent(R) SmartOS(TM) and OmniTI OmniOS(R). He systematically covers modern systems performance, including the "traditional" analysis of CPUs, memory, disks, and networks, and new areas including cloud computing and dynamic tracing. This book also helps you identify and fix the "unknown unknowns" of complex performance: bottlenecks that emerge from elements and interactions you were not aware of. The text concludes with a detailed case study, showing how a real cloud customer issue was analyzed from start to finish. Coverage includes - Modern performance analysis and tuning: terminology, concepts, models, methods, and techniques - Dynamic tracing techniques and tools, including examples of DTrace, SystemTap, and perf - Kernel internals: uncovering what the OS is doing - Using system observability tools, interfaces, and frameworks - Understanding and monitoring application performance - Optimizing CPUs: processors, cores, hardware threads, caches, interconnects, and kernel scheduling - Memory optimization: virtual memory, paging, swapping, memory architectures, busses, address spaces, and allocators - File system I/O, including caching - Storage devices/controllers, disk I/O workloads, RAID, and kernel I/O - Network-related performance issues: protocols, sockets, interfaces, and physical connections - Performance implications of OS and hardware-based virtualization, and new issues encountered with cloud computing - Benchmarking: getting accurate results and avoiding common mistakes This guide is indispensable for anyone who operates enterprise or cloud environments: system, network, database, and web admins; developers; and other professionals. For students and others new to optimization, it also provides exercises reflecting Gregg's extensive instructional experience.

Functional and Reactive Domain Modeling


Debasish Ghosh - 2016
    Domain modeling is a technique for creating a conceptual map of a problem space such as a business system or a scientific application, so that the developer can write the software more efficiently. The domain model doesn't present a solution to the problem, but instead describes the attributes, roles, and relationships of the entities involved, along with the constraints of the system.Reactive application design, which uses functional programming principles along with asynchronous non-blocking communication, promises to be a potent pattern for developing performant systems that are relatively easy to manage, maintain and evolve. Typically we call such models "reactive" because they are more responsive both to user requests and to system loads. But designing and implementing such models requires a different way of thinking. Because the core behaviors are implemented using pure functions, you can reason about the domain model just like mathematics, so your model becomes verifiable and robust.Functional and Reactive Domain Modeling teaches you how to think of the domain model in terms of pure functions and how to compose them to build larger abstractions. You will start with the basics of functional programming and gradually progress to the advanced concepts and patterns that you need to know to implement complex domain models. The book demonstrates how advanced FP patterns like algebraic data types, typeclass based design, and isolation of side-effects can make your model compose for readability and verifiability.On the subject of reactive modeling, the book focuses on higher order concurrency patterns like actors and futures. It uses the Akka framework as the reference implementation and demonstrates how advanced architectural patterns like event sourcing and CQRS can be put to great use in implementing scalable models. You will learn techniques that are radically different from the standard RDBMS based applications that are based on mutation of records. You'll also pick up important patterns like using asynchronous messaging for interaction based on non blocking concurrency and model persistence, which delivers the speed of in-memory processing along with suitable guarantees of reliability.

Real World Haskell: Code You Can Believe In


Bryan O'Sullivan - 2008
    You'll learn how to use Haskell in a variety of practical ways, from short scripts to large and demanding applications. Real World Haskell takes you through the basics of functional programming at a brisk pace, and then helps you increase your understanding of Haskell in real-world issues like I/O, performance, dealing with data, concurrency, and more as you move through each chapter. With this book, you will:Understand the differences between procedural and functional programming Learn the features of Haskell, and how to use it to develop useful programs Interact with filesystems, databases, and network services Write solid code with automated tests, code coverage, and error handling Harness the power of multicore systems via concurrent and parallel programming You'll find plenty of hands-on exercises, along with examples of real Haskell programs that you can modify, compile, and run. Whether or not you've used a functional language before, if you want to understand why Haskell is coming into its own as a practical language in so many major organizations, Real World Haskell is the best place to start.

R in a Nutshell: A Desktop Quick Reference


Joseph Adler - 2009
    R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.Understand the basics of the language, including the nature of R objectsLearn how to write R functions and build your own packagesWork with data through visualization, statistical analysis, and other methodsExplore the wealth of packages contributed by the R communityBecome familiar with the lattice graphics package for high-level data visualizationLearn about bioinformatics packages provided by Bioconductor"I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."

The Art of Doing Science and Engineering: Learning to Learn


Richard Hamming - 1996
    By presenting actual experiences and analyzing them as they are described, the author conveys the developmental thought processes employed and shows a style of thinking that leads to successful results is something that can be learned. Along with spectacular successes, the author also conveys how failures contributed to shaping the thought processes. Provides the reader with a style of thinking that will enhance a person's ability to function as a problem-solver of complex technical issues. Consists of a collection of stories about the author's participation in significant discoveries, relating how those discoveries came about and, most importantly, provides analysis about the thought processes and reasoning that took place as the author and his associates progressed through engineering problems.

Kubernetes: Up & Running


Kelsey Hightower - 2016
    How's that possible? Google revealed the secret through a project called Kubernetes, an open source cluster orchestrator (based on its internal Borg system) that radically simplifies the task of building, deploying, and maintaining scalable distributed systems in the cloud. This practical guide shows you how Kubernetes and container technology can help you achieve new levels of velocity, agility, reliability, and efficiency.Authors Kelsey Hightower, Brendan Burns, and Joe Beda--who've worked on Kubernetes at Google--explain how this system fits into the lifecycle of a distributed application. You will learn how to use tools and APIs to automate scalable distributed systems, whether it is for online services, machine-learning applications, or a cluster of Raspberry Pi computers.Explore the distributed system challenges that Kubernetes addressesDive into containerized application development, using containers such as DockerCreate and run containers on Kubernetes, using Docker's Image format and container runtimeExplore specialized objects essential for running applications in productionReliably roll out new software versions without downtime or errorsGet examples of how to develop and deploy real-world applications in Kubernetes

The Rust Programming Language


Steve Klabnik
    This is the undisputed go-to guide to Rust, written by two members of the Rust core team, with feedback and contributions from 42 members of the community. The book assumes that you’ve written code in another programming language but makes no assumptions about which one, meaning the material is accessible and useful to developers from a wide variety of programming backgrounds.Known by the Rust community as "The Book," The Rust Programming Language includes concept chapters, where you’ll learn about a particular aspect of Rust, and project chapters, where you’ll apply what you’ve learned so far to build small programs.The Book opens with a quick hands-on project to introduce the basics then explores key concepts in depth, such as ownership, the type system, error handling, and fearless concurrency. Next come detailed explanations of Rust-oriented perspectives on topics like pattern matching, iterators, and smart pointers, with concrete examples and exercises--taking you from theory to practice.The Rust Programming Language will show you how to: Grasp important concepts unique to Rust like ownership, borrowing, and lifetimes Use Cargo, Rust’s built-in package manager, to build and maintain your code, including downloading and building dependencies Effectively use Rust’s zero-cost abstractions and employ your ownYou’ll learn to develop reliable code that’s speed and memory efficient, while avoiding the infamous and arcane programming pitfalls common at the systems level. When you need to dive down into lower-level control, this guide will show you how without taking on the customary risk of crashes or security holes and without requiring you to learn the fine points of a fickle toolchain.You’ll also learn how to create command line programs, build single- and multithreaded web servers, and much more.The Rust Programming Language fully embraces Rust’s potential to empower its users. This friendly and approachable guide will help you build not only your knowledge of Rust but also your ability to program with confidence in a wider variety of domains.

The Human Face of Big Data


Rick Smolan - 2012
    Its enable us to sense, measure, and understand aspects of our existence in ways never before possible. The Human Face of Big Data captures, in glorious photographs and moving essays, an extraordinary revolution sweeping, almost invisibly, through business, academia, government, healthcare, and everyday life. It's already enabling us to provide a healthier life for our children. To provide our seniors with independence while keeping them safe. To help us conserve precious resources like water and energy. To alert us to tiny changes in our health, weeks or years before we develop a life-threatening illness. To peer into our own individual genetic makeup. To create new forms of life.  And soon, as many predict, to re-engineer our own species. And we've barely scratched the surface . . . Over the past decade, Rick Smolan and Jennifer Erwitt, co-founders of Against All Odds Productions, have produced a series of ambitious global projects in collaboration with hundreds of the world's leading photographers, writers, and graphic designers. Their Day in the Life projects were credited for creating a mass market for large-format illustrated books (rare was the coffee table book without one).  Today their projects aim at sparking global conversations about emerging topics ranging from the Internet (24 Hours in Cyberspace), to Microprocessors (One Digital Day), to how the human race is learning to heal itself, (The Power to Heal) to the global water crisis (Blue Planet Run). This year Smolan and Erwitt dispatched photographers and writers in every corner of the globe to explore the world of “Big Data” and to determine if it truly does, as many in the field claim, represent a brand new toolset for humanity, helping address the biggest challenges facing our species. The book features 10 essays by noted writers:Introduction: OCEANS OF DATA by Dan GardnerChapter 1: REFLECTIONS IN A DIGITAL MIRROR by Juan Enriquez, CEO, BiotechnomomyChapter 2: OUR DATA OURSELVES by Kate Green, the EconomistChapter 3: QUANTIFYING MYSELF by AJ Jacobs, EsquireChapter 4: DARK DATA by Marc Goodman, Future Crime InstituteChapter 5:  THE SENTIENT SENSOR MESH by Susan Karlin, Fast CompanyChapter 6: TAKING THE PULSE OF THE PLANET by Esther Dyson, EDventureChapter 7: CITIZEN SCIENCE by Gareth Cook, the Boston GlobeChapter 8: A DEMOGRAPH OF ONE by Michael Malone, Forbes magazineChapter 9: THE ART OF DATA by Aaron Koblin, Google Artist in ResidenceChapter 10: DATA DRIVEN by Jonathan Harris, Cowbird The book will also feature stunning info graphics from NIGEL HOLMES.1) GOOGLING GOOGLE: all the ways Google uses Data to help humanity2) DATA IS THE NEW OIL3) THE WORLD ACCORDING TO TWITTER4) AUCTIONING EYEBALLS: The world of Internet advertising5) FACEBOOK: A Billion Friends

Practical SQL: A Beginner's Guide to Storytelling with Data


Anthony DeBarros - 2022
    An approachable guide to programming in SQL (Structured Query Language) that will teach even beginning programmers how to build powerful databases and analyze data to find meaningful information.Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language) written by longtime professional journalist Anthony DeBarros. SQL is the primary tool that programmers, web developers, researchers, journalists, and others use to explore data in a database. DeBarros focuses on using SQL to find the story in data, with the aid of the popular open-source database PostgreSQL and the pgAdmin interface.This thoroughly revised second edition includes a new chapter describing how to set up PostgreSQL and more extensive discussion of pgAdmin's best features. The author has also added a chapter on the JSON data format that shows readers how to store and query JSON data. DeBarros has also updated the data in the book throughout, added coverage of additional topics, and perfected the book's examples.Readers love DeBarros's use of exercises and real-world examples that demonstrate how to:- Create databases and related tables using your own data - Correctly define data typesAggregate, sort, and filter data to find patterns - Clean their data and transfer data as text files - Create advanced queries and automate tasksThis book uses PostgreSQL, but the SQL syntax is applicable to many database applications, including Microsoft SQL Server and MySQL.

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management


Michael J.A. Berry - 1997
    Packed with more than forty percent new and updated material, this edition shows business managers, marketing analysts, and data mining specialists how to harness fundamental data mining methods and techniques to solve common types of business problemsEach chapter covers a new data mining technique, and then shows readers how to apply the technique for improved marketing, sales, and customer supportThe authors build on their reputation for concise, clear, and practical explanations of complex concepts, making this book the perfect introduction to data miningMore advanced chapters cover such topics as how to prepare data for analysis and how to create the necessary infrastructure for data miningCovers core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, clustering, and survival analysis

Head First Object-Oriented Analysis and Design: A Brain Friendly Guide to OOA&D


Brett McLaughlin - 2006
    What sets this book apart is its focus on learning. The authors have made the content of OOAD accessible, usable for the practitioner." Ivar Jacobson, Ivar Jacobson Consulting"I just finished reading HF OOA&D and I loved it! The thing I liked most about this book was its focus on why we do OOA&D-to write great software!" Kyle Brown, Distinguished Engineer, IBM"Hidden behind the funny pictures and crazy fonts is a serious, intelligent, extremely well-crafted presentation of OO Analysis and Design. As I read the book, I felt like I was looking over the shoulder of an expert designer who was explaining to me what issues were important at each step, and why." Edward Sciore, Associate Professor, Computer Science Department, Boston College Tired of reading Object Oriented Analysis and Design books that only makes sense after you're an expert? You've heard OOA&D can help you write great software every time-software that makes your boss happy, your customers satisfied and gives you more time to do what makes you happy.But how?Head First Object-Oriented Analysis & Design shows you how to analyze, design, and write serious object-oriented software: software that's easy to reuse, maintain, and extend; software that doesn't hurt your head; software that lets you add new features without breaking the old ones. Inside you will learn how to:Use OO principles like encapsulation and delegation to build applications that are flexible Apply the Open-Closed Principle (OCP) and the Single Responsibility Principle (SRP) to promote reuse of your code Leverage the power of design patterns to solve your problems more efficiently Use UML, use cases, and diagrams to ensure that all stakeholders are communicating clearly to help you deliver the right software that meets everyone's needs.By exploiting how your brain works, Head First Object-Oriented Analysis & Design compresses the time it takes to learn and retain complex information. Expect to have fun, expect to learn, expect to be writing great software consistently by the time you're finished reading this!

Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions


Gregor Hohpe - 2003
    The authors also include examples covering a variety of different integration technologies, such as JMS, MSMQ, TIBCO ActiveEnterprise, Microsoft BizTalk, SOAP, and XSL. A case study describing a bond trading system illustrates the patterns in practice, and the book offers a look at emerging standards, as well as insights into what the future of enterprise integration might hold. This book provides a consistent vocabulary and visual notation framework to describe large-scale integration solutions across many technologies. It also explores in detail the advantages and limitations of asynchronous messaging architectures. The authors present practical advice on designing code that connects an application to a messaging system, and provide extensive information to help you determine when to send a message, how to route it to the proper destination, and how to monitor the health of a messaging system. If you want to know how to manage, monitor, and maintain a messaging system once it is in use, get this book.

In the Beginning...Was the Command Line


Neal Stephenson - 1999
    And considering that the "one man" is Neal Stephenson, "the hacker Hemingway" (Newsweek) -- acclaimed novelist, pragmatist, seer, nerd-friendly philosopher, and nationally bestselling author of groundbreaking literary works (Snow Crash, Cryptonomicon, etc., etc.) -- the word is well worth hearing. Mostly well-reasoned examination and partial rant, Stephenson's In the Beginning... was the Command Line is a thoughtful, irreverent, hilarious treatise on the cyber-culture past and present; on operating system tyrannies and downloaded popular revolutions; on the Internet, Disney World, Big Bangs, not to mention the meaning of life itself.

Modern Operating Systems


Andrew S. Tanenbaum - 1992
    What makes an operating system modern? According to author Andrew Tanenbaum, it is the awareness of high-demand computer applications--primarily in the areas of multimedia, parallel and distributed computing, and security. The development of faster and more advanced hardware has driven progress in software, including enhancements to the operating system. It is one thing to run an old operating system on current hardware, and another to effectively leverage current hardware to best serve modern software applications. If you don't believe it, install Windows 3.0 on a modern PC and try surfing the Internet or burning a CD. Readers familiar with Tanenbaum's previous text, Operating Systems, know the author is a great proponent of simple design and hands-on experimentation. His earlier book came bundled with the source code for an operating system called Minux, a simple variant of Unix and the platform used by Linus Torvalds to develop Linux. Although this book does not come with any source code, he illustrates many of his points with code fragments (C, usually with Unix system calls). The first half of Modern Operating Systems focuses on traditional operating systems concepts: processes, deadlocks, memory management, I/O, and file systems. There is nothing groundbreaking in these early chapters, but all topics are well covered, each including sections on current research and a set of student problems. It is enlightening to read Tanenbaum's explanations of the design decisions made by past operating systems gurus, including his view that additional research on the problem of deadlocks is impractical except for "keeping otherwise unemployed graph theorists off the streets." It is the second half of the book that differentiates itself from older operating systems texts. Here, each chapter describes an element of what constitutes a modern operating system--awareness of multimedia applications, multiple processors, computer networks, and a high level of security. The chapter on multimedia functionality focuses on such features as handling massive files and providing video-on-demand. Included in the discussion on multiprocessor platforms are clustered computers and distributed computing. Finally, the importance of security is discussed--a lively enumeration of the scores of ways operating systems can be vulnerable to attack, from password security to computer viruses and Internet worms. Included at the end of the book are case studies of two popular operating systems: Unix/Linux and Windows 2000. There is a bias toward the Unix/Linux approach, not surprising given the author's experience and academic bent, but this bias does not detract from Tanenbaum's analysis. Both operating systems are dissected, describing how each implements processes, file systems, memory management, and other operating system fundamentals. Tanenbaum's mantra is simple, accessible operating system design. Given that modern operating systems have extensive features, he is forced to reconcile physical size with simplicity. Toward this end, he makes frequent references to the Frederick Brooks classic The Mythical Man-Month for wisdom on managing large, complex software development projects. He finds both Windows 2000 and Unix/Linux guilty of being too complicated--with a particular skewering of Windows 2000 and its "mammoth Win32 API." A primary culprit is the attempt to make operating systems more "user-friendly," which Tanenbaum views as an excuse for bloated code. The solution is to have smart people, the smallest possible team, and well-defined interactions between various operating systems components. Future operating system design will benefit if the advice in this book is taken to heart. --Pete Ostenson