Book picks similar to
Practical Data Science with R by Nina Zumel


data-science
data
computer-science
programming

Designing Data-Intensive Applications


Martin Kleppmann - 2015
    Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

The Well-Grounded Rubyist


David A. Black - 2008
    It's a beautifully written tutorial that begins with the basic steps to get your first Ruby program up and running and goes on to explore sophisticated topics like callable objects, reflection, and threading. Whether the topic is simple or tough, the book's easy-to-follow examples and explanations will give you immediate confidence as you build your Ruby programming skills.The Well-Grounded Rubyist is a thoroughly revised and updated edition of the best-selling Ruby for Rails. In this new book, expert author David A. Black moves beyond Rails and presents a broader view of Ruby. It covers Ruby 1.9, and keeps the same sharp focus and clear writing that made Ruby for Rails stand out.Starting with the basics, The Well-Grounded Rubyist explains Ruby objects and their interactions from the ground up. In the middle chapters, the book turns to an examination of Ruby's built-in, core classes, showing the reader how to manipulate strings, numbers, arrays, ranges, hashes, sets, and more. Regular expressions get attention, as do file and other I/O operations.Along the way, the reader is introduced to numerous tools included in the standard Ruby distribution--tools like the task manager Rake and the interactive Ruby console-based interpreter Irb--that facilitate Ruby development and make it an integrated and pleasant experience.The book encompasses advanced topics, like the design of Ruby's class and module system, and the use of Ruby threads, taking even the new Rubyist deep into the language and giving every reader the foundations necessary to use, explore, and enjoy this unusually popular and versatile language.It's no wonder one reader commented: "The technical depth is just right to not distract beginners, yet detailed enough for more advanced readers."Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

Computer Age Statistical Inference: Algorithms, Evidence, and Data Science


Bradley Efron - 2016
    'Big data', 'data science', and 'machine learning' have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? This book takes us on an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. The book ends with speculation on the future direction of statistics and data science.

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale


Neha Narkhede - 2017
    And how to move all of this data becomes nearly as important as the data itself. If you� re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you� ll learn Kafka� s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.Understand publish-subscribe messaging and how it fits in the big data ecosystem.Explore Kafka producers and consumers for writing and reading messagesUnderstand Kafka patterns and use-case requirements to ensure reliable data deliveryGet best practices for building data pipelines and applications with KafkaManage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasksLearn the most critical metrics among Kafka� s operational measurementsExplore how Kafka� s stream delivery capabilities make it a perfect source for stream processing systems

Programming Erlang


Joe Armstrong - 2007
    It's used worldwide by companies who need to produce reliable, efficient, and scalable applications. Invest in learning Erlang now.Moore's Law is the observation that the amount you can do on a single chip doubles every two years. But Moore's Law is taking a detour. Rather than producing faster and faster processors, companies such as Intel and AMD are producing multi-core devices: single chips containing two, four, or more processors. If your programs aren't concurrent, they'll only run on a single processor at a time. Your users will think that your code is slow.Erlang is a programming language designed for building highly parallel, distributed, fault-tolerant systems. It has been used commercially for many years to build massive fault-tolerated systems that run for years with minimal failures.Erlang programs run seamlessly on multi-core computers: this means your Erlang program should run a lot faster on a 4 core processor than on a single core processor, all without you having to change a line of code.Erlang combines ideas from the world of functional programming with techniques for building fault-tolerant systems to make a powerful language for building the massively parallel, networked applications of the future.This book presents Erlang and functional programming in the familiar Pragmatic style. And it's written by Joe Armstrong, one of the creators of Erlang.It includes example code you'll be able to build upon. In addition, the book contains the full source code for two interesting applications:A SHOUTcast server which you can use to stream music to every computer in your house, and a full-text indexing and search engine that can index gigabytes of data. Learn how to write programs that run on dozens or even hundreds of local and remote processors. See how to write robust applications that run even in the face of network and hardware failure, using the Erlang programming language.

Structure and Interpretation of Computer Programs


Harold Abelson - 1984
    This long-awaited revision contains changes throughout the text. There are new implementations of most of the major programming systems in the book, including the interpreters and compilers, and the authors have incorporated many small changes that reflect their experience teaching the course at MIT since the first edition was published. A new theme has been introduced that emphasizes the central role played by different approaches to dealing with time in computational models: objects with state, concurrent programming, functional programming and lazy evaluation, and nondeterministic programming. There are new example sections on higher-order procedures in graphics and on applications of stream processing in numerical programming, and many new exercises. In addition, all the programs have been reworked to run in any Scheme implementation that adheres to the IEEE standard.

Mining of Massive Datasets


Anand Rajaraman - 2011
    This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

Planning for Big Data


Edd Wilder-James - 2004
    From creating new data-driven products through to increasing operational efficiency, big data has the potential to makeyour organization both more competitive and more innovative.As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but the organizational and cultural demands of being data-driven.Written by O'Reilly Radar's experts on big data, this anthology describes:- The broad industry changes heralded by the big data era- What big data is, what it means to your business, and how to start solving data problems- The software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions- The landscape of NoSQL databases and their relative merits- How visualization plays an important part in data work

Cryptography Engineering: Design Principles and Practical Applications


Niels Ferguson - 2010
    Cryptography is vital to keeping information safe, in an era when the formula to do so becomes more and more challenging. Written by a team of world-renowned cryptography experts, this essential guide is the definitive introduction to all major areas of cryptography: message security, key negotiation, and key management. You'll learn how to think like a cryptographer. You'll discover techniques for building cryptography into products from the start and you'll examine the many technical changes in the field.After a basic overview of cryptography and what it means today, this indispensable resource covers such topics as block ciphers, block modes, hash functions, encryption modes, message authentication codes, implementation issues, negotiation protocols, and more. Helpful examples and hands-on exercises enhance your understanding of the multi-faceted field of cryptography.An author team of internationally recognized cryptography experts updates you on vital topics in the field of cryptography Shows you how to build cryptography into products from the start Examines updates and changes to cryptography Includes coverage on key servers, message security, authentication codes, new standards, block ciphers, message authentication codes, and more Cryptography Engineering gets you up to speed in the ever-evolving field of cryptography.

Head First Python


Paul Barry - 2010
    You'll quickly learn the language's fundamentals, then move onto persistence, exception handling, web development, SQLite, data wrangling, and Google App Engine. You'll also learn how to write mobile apps for Android, all thanks to the power that Python gives you.We think your time is too valuable to waste struggling with new concepts. Using the latest research in cognitive science and learning theory to craft a multi-sensory learning experience, Head First Python uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.

Python Cookbook


David Beazley - 2002
    Packed with practical recipes written and tested with Python 3.3, this unique cookbook is for experienced Python programmers who want to focus on modern tools and idioms.Inside, you’ll find complete recipes for more than a dozen topics, covering the core Python language as well as tasks common to a wide variety of application domains. Each recipe contains code samples you can use in your projects right away, along with a discussion about how and why the solution works.Topics include:Data Structures and AlgorithmsStrings and TextNumbers, Dates, and TimesIterators and GeneratorsFiles and I/OData Encoding and ProcessingFunctionsClasses and ObjectsMetaprogrammingModules and PackagesNetwork and Web ProgrammingConcurrencyUtility Scripting and System AdministrationTesting, Debugging, and ExceptionsC Extensions

The Art of Computer Programming, Volumes 1-3 Boxed Set


Donald Ervin Knuth - 1998
    For the first time, these books are available as a boxed, three-volume set. The handsome slipcase makes this set an ideal gift for the recent computer science graduate or professional programmer. Offering a description of classical computer science, this multi-volume work is a useful resource in programming theory and practice for students, researchers, and practitioners alike. For programmers, it offers cookbook solutions to their day-to-day problems.

Learning From Data: A Short Course


Yaser S. Abu-Mostafa - 2012
    Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title `learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover. ---- Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems. ---- Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own. ---- The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.

Docker in Action


Jeff Nickoloff - 2015
    Create a tiny virtual environment, called a container, for your application that includes only its particular set of dependencies. The Docker engine accounts for, manages, and builds these containers through functionality provided by the host operating system. Software running inside containers share the Linux OS and other resources, such as libraries, making their footprints radically smaller, and the containerized applications are easy to install, manage, and remove. Developers can package their applications without worrying about environment-specific deployment concerns, and the operations team gets cleaner, more efficient systems across the board. Better still, Docker is free and open source.Docker in Action teaches readers how to create, deploy, and manage applications hosted in Docker containers. The book starts with a clear explanation of the Docker model of virtualization, comparing this approach to the traditional hypervisor model. Developers will learn how to package applications in containers, including specific techniques for testing and distributing applications via Docker Hub and other registries. Readers will learn how to take advantage of the Linux OS features that Docker uses to run programs securely, and how to manage shared resources. Using carefully-designed examples, the book teaches you how to orchestrate containers and applications from installation to removal. Along the way, you'll learn techniques for using Docker on systems ranging from your personal dev-and-test machine to full-scale cloud deployments.

97 Things Every Programmer Should Know: Collective Wisdom from the Experts


Kevlin Henney - 2010
    With the 97 short and extremely useful tips for programmers in this book, you'll expand your skills by adopting new approaches to old problems, learning appropriate best practices, and honing your craft through sound advice.With contributions from some of the most experienced and respected practitioners in the industry--including Michael Feathers, Pete Goodliffe, Diomidis Spinellis, Cay Horstmann, Verity Stob, and many more--this book contains practical knowledge and principles that you can apply to all kinds of projects.A few of the 97 things you should know:"Code in the Language of the Domain" by Dan North"Write Tests for People" by Gerard Meszaros"Convenience Is Not an -ility" by Gregor Hohpe"Know Your IDE" by Heinz Kabutz"A Message to the Future" by Linda Rising"The Boy Scout Rule" by Robert C. Martin (Uncle Bob)"Beware the Share" by Udi Dahan