Mining of Massive Datasets


Anand Rajaraman - 2011
    This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

A New Kind of Science


Stephen Wolfram - 1997
    Wolfram lets the world see his work in A New Kind of Science, a gorgeous, 1,280-page tome more than a decade in the making. With patience, insight, and self-confidence to spare, Wolfram outlines a fundamental new way of modeling complex systems. On the frontier of complexity science since he was a boy, Wolfram is a champion of cellular automata--256 "programs" governed by simple nonmathematical rules. He points out that even the most complex equations fail to accurately model biological systems, but the simplest cellular automata can produce results straight out of nature--tree branches, stream eddies, and leopard spots, for instance. The graphics in A New Kind of Science show striking resemblance to the patterns we see in nature every day. Wolfram wrote the book in a distinct style meant to make it easy to read, even for nontechies; a basic familiarity with logic is helpful but not essential. Readers will find themselves swept away by the elegant simplicity of Wolfram's ideas and the accidental artistry of the cellular automaton models. Whether or not Wolfram's revolution ultimately gives us the keys to the universe, his new science is absolutely awe-inspiring. --Therese Littleton

The Guru's Guide to Transact-Sql


Ken Henderson - 2000
    Beginners and intermediate developers will appreciate the comprehensive tutorial that walks step-by-step through building a real client/server database, from concept to deployment and beyond -- and points out key pitfalls to avoid throughout the process. Experienced users will appreciate the book's comprehensive coverage of the Transact-SQL language, from basic to advanced level; detailed ODBC database access information; expert coverage of concurrency control, and more. The book includes thorough, up-to-the-minute guidance on building multi-tier applications; SQL Server performance tuning; and other crucial issues for advanced developers. For all database developers, system administrators, and Web application developers who interact with databases in Microsoft-centric environments.

Spark: The Definitive Guide: Big Data Processing Made Simple


Bill Chambers - 2018
    With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library. Get a gentle overview of big data and Spark Learn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames Understand how Spark runs on a cluster Debug, monitor, and tune Spark clusters and applications Learn the power of Structured Streaming, Spark’s stream-processing engine Learn how you can apply MLlib to a variety of problems, including classification or recommendation

Hadoop Explained


Aravind Shenoy - 2014
    Hadoop allowed small and medium sized companies to store huge amounts of data on cheap commodity servers in racks. The introduction of Big Data has allowed businesses to make decisions based on quantifiable analysis. Hadoop is now implemented in major organizations such as Amazon, IBM, Cloudera, and Dell to name a few. This book introduces you to Hadoop and to concepts such as ‘MapReduce’, ‘Rack Awareness’, ‘Yarn’ and ‘HDFS Federation’, which will help you get acquainted with the technology.

You Don't Know JS: Up & Going


Kyle Simpson - 2015
    With the "You Don’t Know JS" book series, you’ll get a more complete understanding of JavaScript, including trickier parts of the language that many experienced JavaScript programmers simply avoid.The series’ first book, Up & Going, provides the necessary background for those of you with limited programming experience. By learning the basic building blocks of programming, as well as JavaScript’s core mechanisms, you’ll be prepared to dive into the other, more in-depth books in the series—and be well on your way toward true JavaScript.With this book you will: Learn the essential programming building blocks, including operators, types, variables, conditionals, loops, and functions Become familiar with JavaScript's core mechanisms such as values, function closures, this, and prototypes Get an overview of other books in the series—and learn why it’s important to understand all parts of JavaScript

Managing the Testing Process: Practical Tools and Techniques for Managing Hardware and Software Testing


Rex Black - 1999
    The preeminent expert in his field, Mr.Black draws upon years of experience as president of both theInternational and American Software Testing Qualifications boardsto offer this extensive resource of all the standards, methods, andtools you'll need.The book covers core testing concepts and thoroughly examinesthe best test management practices and tools of leading hardwareand software vendors. Step-by-step guidelines and real-worldscenarios help you follow all necessary processes and avoidmistakes.Producing high-quality computer hardware and software requirescareful, professional testing; Managing the Testing Process, Third Edition explains how to achieve that by following adisciplined set of carefully managed and monitored practices andprocessesThe book covers all standards, methods, and tools you need forprojects large and smallPresents the business case for testing products and reviews theauthor's latest test assessmentsTopics include agile testing methods, risk-based testing, IEEEstandards, ISTQB certification, distributed and outsourced testing, and moreOver 100 pages of new material and case studies have been addedto this new editionIf you're responsible for managing testing in the real world, Managing the Testing Process, Third Edition is the valuablereference and guide you need.

Applied Predictive Modeling


Max Kuhn - 2013
    Non- mathematical readers will appreciate the intuitive explanations of the techniques while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. Dr. Kuhn is a Director of Non-Clinical Statistics at Pfizer Global R&D in Groton Connecticut. He has been applying predictive models in the pharmaceutical and diagnostic industries for over 15 years and is the author of a number of R packages. Dr. Johnson has more than a decade of statistical consulting and predictive modeling experience in pharmaceutical research and development. He is a co-founder of Arbor Analytics, a firm specializing in predictive modeling and is a former Director of Statistics at Pfizer Global R&D. His scholarly work centers on the application and development of statistical methodology and learning algorithms. Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems. Addressing practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance-all of which are problems that occur frequently in practice. The text illustrates all parts of the modeling process through many hands-on, real-life examples. And every chapter contains extensive R code f

Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python


Scott Hartshorn - 2016
    They are typically used to categorize something based on other data that you have. The purpose of this book is to help you understand how Random Forests work, as well as the different options that you have when using them to analyze a problem. Additionally, since Decision Trees are a fundamental part of Random Forests, this book explains how they work. This book is focused on understanding Random Forests at the conceptual level. Knowing how they work, why they work the way that they do, and what options are available to improve results. This book covers how Random Forests work in an intuitive way, and also explains the equations behind many of the functions, but it only has a small amount of actual code (in python). This book is focused on giving examples and providing analogies for the most fundamental aspects of how random forests and decision trees work. The reason is that those are easy to understand and they stick with you. There are also some really interesting aspects of random forests, such as information gain, feature importances, or out of bag error, that simply cannot be well covered without diving into the equations of how they work. For those the focus is providing the information in a straight forward and easy to understand way.

APIs: A Strategy Guide


Daniel Jacobson - 2011
    Salesforce.com (more than 50%) and Twitter (more than 75% fall into this category. Ebay gets more than 8 billion API calls a month. Facebook and Google, have dozens of APIs that enable both free services and e-commerce, get more than 5 billion API calls each day. Other companies like NetFlix have expanded their service of streaming movies over the the web to dozens of devices using API. At peak times, more than 20 percent of all traffic is accounted for by Netflix through its APIs. Companies like Sears and E-Trade are opening up their catalogs and other services to allow developers and entrepreneurs to create new marketing experiences. Making an API work to create a new channel is not just a matter of technology. An API must be considered in terms of business strategy, marketing, and operations as well as the technical aspects of programming. This book, written by Greg Brail, CTO of Apigee, and Brian Mulloy, VP of Products, captures the knowledge of all these areas gained by Apigee, the leading company in supporting the rollout of high traffic APIs.

How Linux Works: What Every Superuser Should Know


Brian Ward - 2004
    Some books try to give you copy-and-paste instructions for how to deal with every single system issue that may arise, but How Linux Works actually shows you how the Linux system functions so that you can come up with your own solutions. After a guided tour of filesystems, the boot sequence, system management basics, and networking, author Brian Ward delves into open-ended topics such as development tools, custom kernels, and buying hardware, all from an administrator's point of view. With a mixture of background theory and real-world examples, this book shows both "how" to administer Linux, and "why" each particular technique works, so that you will know how to make Linux work for you.

Kubernetes in Action


Marko Luksa - 2017
    Each layer in their application is decoupled from other layers so they can scale, update, and maintain them independently.Kubernetes in Action teaches developers how to use Kubernetes to deploy self-healing scalable distributed applications. By the end, readers will be able to build and deploy applications in a proper way to take full advantage of the Kubernetes platform.Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

Engineering a Compiler


Keith D. Cooper - 2003
    No longer is execution speed the sole criterion for judging compiled code. Today, code might be judged on how small it is, how much power it consumes, how well it compresses, or how many page faults it generates. In this evolving environment, the task of building a successful compiler relies upon the compiler writer's ability to balance and blend algorithms, engineering insights, and careful planning. Today's compiler writer must choose a path through a design space that is filled with diverse alternatives, each with distinct costs, advantages, and complexities.Engineering a Compiler explores this design space by presenting some of the ways these problems have been solved, and the constraints that made each of those solutions attractive. By understanding the parameters of the problem and their impact on compiler design, the authors hope to convey both the depth of the problems and the breadth of possible solutions. Their goal is to cover a broad enough selection of material to show readers that real tradeoffs exist, and that the impact of those choices can be both subtle and far-reaching.Authors Keith Cooper and Linda Torczon convey both the art and the science of compiler construction and show best practice algorithms for the major passes of a compiler. Their text re-balances the curriculum for an introductory course in compiler construction to reflect the issues that arise in current practice.

The Passionate Programmer


Chad Fowler - 2009
    In this book, you'll learn how to become an entrepreneur, driving your career in the direction of your choosing. You'll learn how to build your software development career step by step, following the same path that you would follow if you were building, marketing, and selling a product. After all, your skills themselves are a product. The choices you make about which technologies to focus on and which business domains to master have at least as much impact on your success as your technical knowledge itself--don't let those choices be accidental. We'll walk through all aspects of the decision-making process, so you can ensure that you're investing your time and energy in the right areas. You'll develop a structured plan for keeping your mind engaged and your skills fresh. You'll learn how to assess your skills in terms of where they fit on the value chain, driving you away from commodity skills and toward those that are in high demand. Through a mix of high-level, thought-provoking essays and tactical "Act on It" sections, you will come away with concrete plans you can put into action immediately. You'll also get a chance to read the perspectives of several highly successful members of our industry from a variety of career paths. As with any product or service, if nobody knows what you're selling, nobody will buy. We'll walk through the often-neglected world of marketing, and you'll create a plan to market yourself both inside your company and to the industry in general. Above all, you'll see how you can set the direction of your career, leading to a more fulfilling and remarkable professional life.

Problem Solving with Algorithms and Data Structures Using Python


Bradley N. Miller - 2005
    It is also about Python. However, there is much more. The study of algorithms and data structures is central to understanding what computer science is all about. Learning computer science is not unlike learning any other type of difficult subject matter. The only way to be successful is through deliberate and incremental exposure to the fundamental ideas. A beginning computer scientist needs practice so that there is a thorough understanding before continuing on to the more complex parts of the curriculum. In addition, a beginner needs to be given the opportunity to be successful and gain confidence. This textbook is designed to serve as a text for a first course on data structures and algorithms, typically taught as the second course in the computer science curriculum. Even though the second course is considered more advanced than the first course, this book assumes you are beginners at this level. You may still be struggling with some of the basic ideas and skills from a first computer science course and yet be ready to further explore the discipline and continue to practice problem solving. We cover abstract data types and data structures, writing algorithms, and solving problems. We look at a number of data structures and solve classic problems that arise. The tools and techniques that you learn here will be applied over and over as you continue your study of computer science.