Hadoop: The Definitive Guide


Tom White - 2009
    Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Modern Operating Systems


Andrew S. Tanenbaum - 1992
    What makes an operating system modern? According to author Andrew Tanenbaum, it is the awareness of high-demand computer applications--primarily in the areas of multimedia, parallel and distributed computing, and security. The development of faster and more advanced hardware has driven progress in software, including enhancements to the operating system. It is one thing to run an old operating system on current hardware, and another to effectively leverage current hardware to best serve modern software applications. If you don't believe it, install Windows 3.0 on a modern PC and try surfing the Internet or burning a CD. Readers familiar with Tanenbaum's previous text, Operating Systems, know the author is a great proponent of simple design and hands-on experimentation. His earlier book came bundled with the source code for an operating system called Minux, a simple variant of Unix and the platform used by Linus Torvalds to develop Linux. Although this book does not come with any source code, he illustrates many of his points with code fragments (C, usually with Unix system calls). The first half of Modern Operating Systems focuses on traditional operating systems concepts: processes, deadlocks, memory management, I/O, and file systems. There is nothing groundbreaking in these early chapters, but all topics are well covered, each including sections on current research and a set of student problems. It is enlightening to read Tanenbaum's explanations of the design decisions made by past operating systems gurus, including his view that additional research on the problem of deadlocks is impractical except for "keeping otherwise unemployed graph theorists off the streets." It is the second half of the book that differentiates itself from older operating systems texts. Here, each chapter describes an element of what constitutes a modern operating system--awareness of multimedia applications, multiple processors, computer networks, and a high level of security. The chapter on multimedia functionality focuses on such features as handling massive files and providing video-on-demand. Included in the discussion on multiprocessor platforms are clustered computers and distributed computing. Finally, the importance of security is discussed--a lively enumeration of the scores of ways operating systems can be vulnerable to attack, from password security to computer viruses and Internet worms. Included at the end of the book are case studies of two popular operating systems: Unix/Linux and Windows 2000. There is a bias toward the Unix/Linux approach, not surprising given the author's experience and academic bent, but this bias does not detract from Tanenbaum's analysis. Both operating systems are dissected, describing how each implements processes, file systems, memory management, and other operating system fundamentals. Tanenbaum's mantra is simple, accessible operating system design. Given that modern operating systems have extensive features, he is forced to reconcile physical size with simplicity. Toward this end, he makes frequent references to the Frederick Brooks classic The Mythical Man-Month for wisdom on managing large, complex software development projects. He finds both Windows 2000 and Unix/Linux guilty of being too complicated--with a particular skewering of Windows 2000 and its "mammoth Win32 API." A primary culprit is the attempt to make operating systems more "user-friendly," which Tanenbaum views as an excuse for bloated code. The solution is to have smart people, the smallest possible team, and well-defined interactions between various operating systems components. Future operating system design will benefit if the advice in this book is taken to heart. --Pete Ostenson

Hands-On Programming with R: Write Your Own Functions and Simulations


Garrett Grolemund - 2014
    With this book, you'll learn how to load data, assemble and disassemble data objects, navigate R's environment system, write your own functions, and use all of R's programming tools.RStudio Master Instructor Garrett Grolemund not only teaches you how to program, but also shows you how to get more from R than just visualizing and modeling data. You'll gain valuable programming skills and support your work as a data scientist at the same time.Work hands-on with three practical data analysis projects based on casino gamesStore, retrieve, and change data values in your computer's memoryWrite programs and simulations that outperform those written by typical R usersUse R programming tools such as if else statements, for loops, and S3 classesLearn how to write lightning-fast vectorized R codeTake advantage of R's package system and debugging toolsPractice and apply R programming concepts as you learn them

Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference


Cameron Davidson-Pilon - 2014
    However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice-freeing you to get results using computing power. Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You'll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you've mastered these techniques, you'll constantly turn to this guide for the working PyMC code you need to jumpstart future projects. Coverage includes - Learning the Bayesian "state of mind" and its practical implications - Understanding how computers perform Bayesian inference - Using the PyMC Python library to program Bayesian analyses - Building and debugging models with PyMC - Testing your model's "goodness of fit" - Opening the "black box" of the Markov Chain Monte Carlo algorithm to see how and why it works - Leveraging the power of the "Law of Large Numbers" - Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning - Using loss functions to measure an estimate's weaknesses based on your goals and desired outcomes - Selecting appropriate priors and understanding how their influence changes with dataset size - Overcoming the "exploration versus exploitation" dilemma: deciding when "pretty good" is good enough - Using Bayesian inference to improve A/B testing - Solving data science problems when only small amounts of data are available Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.

Probability Theory: The Logic of Science


E.T. Jaynes - 1999
    It discusses new results, along with applications of probability theory to a variety of problems. The book contains many exercises and is suitable for use as a textbook on graduate-level courses involving data analysis. Aimed at readers already familiar with applied mathematics at an advanced undergraduate level or higher, it is of interest to scientists concerned with inference from incomplete information.

Storytelling with Data: A Data Visualization Guide for Business Professionals


Cole Nussbaumer Knaflic - 2015
    You'll discover the power of storytelling and the way to make data a pivotal point in your story. The lessons in this illuminative text are grounded in theory, but made accessible through numerous real-world examples--ready for immediate application to your next graph or presentation.Storytelling is not an inherent skill, especially when it comes to data visualization, and the tools at our disposal don't make it any easier. This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative, compelling story. Specifically, you'll learn how to:Understand the importance of context and audience Determine the appropriate type of graph for your situation Recognize and eliminate the clutter clouding your information Direct your audience's attention to the most important parts of your data Think like a designer and utilize concepts of design in data visualization Leverage the power of storytelling to help your message resonate with your audience Together, the lessons in this book will help you turn your data into high impact visual stories that stick with your audience. Rid your world of ineffective graphs, one exploding 3D pie chart at a time. There is a story in your data--Storytelling with Data will give you the skills and power to tell it!

Information Dashboard Design: The Effective Visual Communication of Data


Stephen Few - 2006
    Although dashboards are potentially powerful, this potential is rarely realized. The greatest display technology in the world won't solve this if you fail to use effective visual design. And if a dashboard fails to tell you precisely what you need to know in an instant, you'll never use it, even if it's filled with cute gauges, meters, and traffic lights. Don't let your investment in dashboard technology go to waste.This book will teach you the visual design skills you need to create dashboards that communicate clearly, rapidly, and compellingly. Information Dashboard Design will explain how to:Avoid the thirteen mistakes common to dashboard design Provide viewers with the information they need quickly and clearly Apply what we now know about visual perception to the visual presentation of information Minimize distractions, cliches, and unnecessary embellishments that create confusion Organize business information to support meaning and usability Create an aesthetically pleasing viewing experience Maintain consistency of design to provide accurate interpretation Optimize the power of dashboard technology by pairing it with visual effectiveness Stephen Few has over 20 years of experience as an IT innovator, consultant, and educator. As Principal of the consultancy Perceptual Edge, Stephen focuses on data visualization for analyzing and communicating quantitative business information. He provides consulting and training services, speaks frequently at conferences, and teaches in the MBA program at the University of California in Berkeley. He is also the author of Show Me the Numbers: Designing Tables and Graphs to Enlighten. Visit his website at www.perceptualedge.com.

Psychology of Learning for Instruction


Marcy P. Driscoll - 1993
    Psychology of Learning for Instruction, Third Edition, focuses on the applications and implications of the learning theories. Using excellent examples ranging from primary school instruction to corporate training, this text combines the latest thinking and research to give readers the opportunity to explore the individual theories as viewed by the experts. Readers are encouraged to apply "reflective practice," which is designed to foster a critical and reflective mode of thinking when considering any particular approach to learning and instruction. Provides readers with the practical knowledge needed to apply learning theories to instruction. KEY TOPICS: This text addresses learning as it relates to behavior, cognition, development, biology, motivation and instruction. MARKET: Pre-service and in-service teachers, and educational psychologists.

Computer Systems: A Programmer's Perspective


Randal E. Bryant - 2002
    Often, computer science and computer engineering curricula don't provide students with a concentrated and consistent introduction to the fundamental concepts that underlie all computer systems. Traditional computer organization and logic design courses cover some of this material, but they focus largely on hardware design. They provide students with little or no understanding of how important software components operate, how application programs use systems, or how system attributes affect the performance and correctness of application programs. - A more complete view of systems - Takes a broader view of systems than traditional computer organization books, covering aspects of computer design, operating systems, compilers, and networking, provides students with the understanding of how programs run on real systems. - Systems presented from a programmers perspective - Material is presented in such a way that it has clear benefit to application programmers, students learn how to use this knowledge to improve program performance and reliability. They also become more effective in program debugging, because t

Text Mining with R: A Tidy Approach


Julia Silge - 2017
    With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective.The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media.Learn how to apply the tidy text format to NLPUse sentiment analysis to mine the emotional content of textIdentify a document's most important terms with frequency measurementsExplore relationships and connections between words with the ggraph and widyr packagesConvert back and forth between R's tidy and non-tidy text formatsUse topic modeling to classify document collections into natural groupsExamine case studies that compare Twitter archives, dig into NASA metadata, and analyze thousands of Usenet messages

The Art of Computer Programming, Volumes 1-3 Boxed Set


Donald Ervin Knuth - 1998
    For the first time, these books are available as a boxed, three-volume set. The handsome slipcase makes this set an ideal gift for the recent computer science graduate or professional programmer. Offering a description of classical computer science, this multi-volume work is a useful resource in programming theory and practice for students, researchers, and practitioners alike. For programmers, it offers cookbook solutions to their day-to-day problems.

A Student's Guide to Maxwell's Equations


Daniel Fleisch - 2007
    In this guide for students, each equation is the subject of an entire chapter, with detailed, plain-language explanations of the physical meaning of each symbol in the equation, for both the integral and differential forms. The final chapter shows how Maxwell's equations may be combined to produce the wave equation, the basis for the electromagnetic theory of light. This book is a wonderful resource for undergraduate and graduate courses in electromagnetism and electromagnetics. A website hosted by the author at www.cambridge.org/9780521701471 contains interactive solutions to every problem in the text as well as audio podcasts to walk students through each chapter.

Fundamentals of Deep Learning: Designing Next-Generation Artificial Intelligence Algorithms


Nikhil Buduma - 2015
    

The Signal and the Noise: Why So Many Predictions Fail—But Some Don't


Nate Silver - 2012
    He solidified his standing as the nation's foremost political forecaster with his near perfect prediction of the 2012 election. Silver is the founder and editor in chief of FiveThirtyEight.com. Drawing on his own groundbreaking work, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. Most predictions fail, often at great cost to society, because most of us have a poor understanding of probability and uncertainty. Both experts and laypeople mistake more confident predictions for more accurate ones. But overconfidence is often the reason for failure. If our appreciation of uncertainty improves, our predictions can get better too. This is the "prediction paradox": The more humility we have about our ability to make predictions, the more successful we can be in planning for the future.In keeping with his own aim to seek truth from data, Silver visits the most successful forecasters in a range of areas, from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA. He explains and evaluates how these forecasters think and what bonds they share. What lies behind their success? Are they good-or just lucky? What patterns have they unraveled? And are their forecasts really right? He explores unanticipated commonalities and exposes unexpected juxtapositions. And sometimes, it is not so much how good a prediction is in an absolute sense that matters but how good it is relative to the competition. In other cases, prediction is still a very rudimentary-and dangerous-science.Silver observes that the most accurate forecasters tend to have a superior command of probability, and they tend to be both humble and hardworking. They distinguish the predictable from the unpredictable, and they notice a thousand little details that lead them closer to the truth. Because of their appreciation of probability, they can distinguish the signal from the noise.

Mining of Massive Datasets


Anand Rajaraman - 2011
    This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.