The Art of Data Science: A Guide for Anyone Who Works with Data


Roger D. Peng - 2015
    The authors have extensive experience both managing data analysts and conducting their own data analyses, and have carefully observed what produces coherent results and what fails to produce useful insights into data. This book is a distillation of their experience in a format that is applicable to both practitioners and managers in data science.

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale


Neha Narkhede - 2017
    And how to move all of this data becomes nearly as important as the data itself. If you� re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you� ll learn Kafka� s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.Understand publish-subscribe messaging and how it fits in the big data ecosystem.Explore Kafka producers and consumers for writing and reading messagesUnderstand Kafka patterns and use-case requirements to ensure reliable data deliveryGet best practices for building data pipelines and applications with KafkaManage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasksLearn the most critical metrics among Kafka� s operational measurementsExplore how Kafka� s stream delivery capabilities make it a perfect source for stream processing systems

Learning Java


Patrick Niemeyer - 1996
    With Java 5.0, you'll not only find substantial changes in the platform, but to the language itself-something that developers of Java took five years to complete. The main goal of Java 5.0 is to make it easier for you to develop safe, powerful code, but none of these improvements makes Java any easier to learn, even if you've programmed with Java for years. And that means our bestselling hands-on tutorial takes on even greater significance."Learning Java" is the most widely sought introduction to the programming language that's changed the way we think about computing. Our updated third edition takes an objective, no-nonsense approach to the new features in Java 5.0, some of which are drastically different from the way things were done in any previous versions. The most essential change is the addition of "generics," a feature that allows developers to write, test, and deploy code once, and then reuse the code again and again for different data types. The beauty of generics is that more problems will be caught during development, and "Learning Java" will show you exactly how it's done.Java 5.0 also adds more than 1,000 new classes to the Java library. That means 1,000 new things you can do without having to program it in yourself. That's a huge change. With our book's practical examples, you'll come up to speed quickly on this and other new features such as loops and threads. The new edition also includes an introduction to Eclipse, the open source IDE that is growing in popularity. "Learning Java," 3rd Edition addresses all of the important uses of Java, such as web applications, servlets, and XML that are increasingly driving enterprise applications.

Deep Learning


Ian Goodfellow - 2016
    Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics


Paul Teetor - 2011
    The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process.Create vectors, handle variables, and perform other basic functionsInput and output dataTackle data structures such as matrices, lists, factors, and data framesWork with probability, probability distributions, and random variablesCalculate statistics and confidence intervals, and perform statistical testsCreate a variety of graphic displaysBuild statistical models with linear regressions and analysis of variance (ANOVA)Explore advanced statistical techniques, such as finding clusters in your dataWonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language--one practical example at a time.--Jeffrey Ryan, software consultant and R package author

97 Things Every Programmer Should Know: Collective Wisdom from the Experts


Kevlin Henney - 2010
    With the 97 short and extremely useful tips for programmers in this book, you'll expand your skills by adopting new approaches to old problems, learning appropriate best practices, and honing your craft through sound advice.With contributions from some of the most experienced and respected practitioners in the industry--including Michael Feathers, Pete Goodliffe, Diomidis Spinellis, Cay Horstmann, Verity Stob, and many more--this book contains practical knowledge and principles that you can apply to all kinds of projects.A few of the 97 things you should know:"Code in the Language of the Domain" by Dan North"Write Tests for People" by Gerard Meszaros"Convenience Is Not an -ility" by Gregor Hohpe"Know Your IDE" by Heinz Kabutz"A Message to the Future" by Linda Rising"The Boy Scout Rule" by Robert C. Martin (Uncle Bob)"Beware the Share" by Udi Dahan

The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios


Steve Wexler - 2017
    It's great to have theory and evidenced-based research at your disposal, but what will you do when somebody asks you to make your dashboard 'cooler' by adding packed bubbles and donut charts?The expert authors have a combined 30-plus years of hands-on experience helping people in hundreds of organizations build effective visualizations. They have fought many 'best practices' battles and having endured bring an uncommon empathy to help you, the reader of this book, survive and thrive in the data visualization world.A well-designed dashboard can point out risks, opportunities, and more; but common challenges and misconceptions can make your dashboard useless at best, and misleading at worst. The Big Book of Dashboards gives you the tools, guidance, and models you need to produce great dashboards that inform, enlighten, and engage.

Statistics Done Wrong: The Woefully Complete Guide


Alex Reinhart - 2013
    Politicians and marketers present shoddy evidence for dubious claims all the time. But smart people make mistakes too, and when it comes to statistics, plenty of otherwise great scientists--yes, even those published in peer-reviewed journals--are doing statistics wrong."Statistics Done Wrong" comes to the rescue with cautionary tales of all-too-common statistical fallacies. It'll help you see where and why researchers often go wrong and teach you the best practices for avoiding their mistakes.In this book, you'll learn: - Why "statistically significant" doesn't necessarily imply practical significance- Ideas behind hypothesis testing and regression analysis, and common misinterpretations of those ideas- How and how not to ask questions, design experiments, and work with data- Why many studies have too little data to detect what they're looking for-and, surprisingly, why this means published results are often overestimates- Why false positives are much more common than "significant at the 5% level" would suggestBy walking through colorful examples of statistics gone awry, the book offers approachable lessons on proper methodology, and each chapter ends with pro tips for practicing scientists and statisticians. No matter what your level of experience, "Statistics Done Wrong" will teach you how to be a better analyst, data scientist, or researcher.

The Elements of Data Analytic Style


Jeffrey Leek - 2015
    This book is focused on the details of data analysis that sometimes fall through the cracks in traditional statistics classes and textbooks. It is based in part on the authors blog posts, lecture materials, and tutorials. The author is one of the co-developers of the Johns Hopkins Specialization in Data Science the largest data science program in the world that has enrolled more than 1.76 million people. The book is useful as a companion to introductory courses in data science or data analysis. It is also a useful reference tool for people tasked with reading and critiquing data analyses. It is based on the authors popular open-source guides available through his Github account (https://github.com/jtleek). The paper is also available through Leanpub (https://leanpub.com/datastyle), if the book is purchased on that platform you are entitled to lifetime free updates.

The Hundred-Page Machine Learning Book


Andriy Burkov - 2019
    During that week, you will learn almost everything modern machine learning has to offer. The author and other practitioners have spent years learning these concepts.Companion wiki — the book has a continuously updated wiki that extends some book chapters with additional information: Q&A, code snippets, further reading, tools, and other relevant resources.Flexible price and formats — choose from a variety of formats and price options: Kindle, hardcover, paperback, EPUB, PDF. If you buy an EPUB or a PDF, you decide the price you pay!Read first, buy later — download book chapters for free, read them and share with your friends and colleagues. Only if you liked the book or found it useful in your work, study or business, then buy it.

Head First PMP


Jennifer Greene.PMP & Andrew Stellman, PMP - 2007
    The second edition of this book helps you prepare for the PMP certification exam using a visually rich format designed for the way your brain works. You'll find a full-length sample exam included inside the book. More than just proof of passing a test, a PMP certification means that you have the knowledge to solve most common project problems. But studying for a difficult four-hour exam on project management isn't easy, even for experienced project managers. Drawing on the latest research in neurobiology, cognitive science, and learning theory, Head First PMP offers you a multi-sensory experience that helps the material stick, not a text-heavy approach that puts you to sleep. This book will help you:Learn PMP's underlying concepts to help you understand the PMBOK principles and pass the certification exam with flying colorsGet 100% coverage of the latest principles and certification objectives in The PMBOK Guide, Fourth Edition, including two new processes: Collect Requirements and Identify StakeholdersMake use of a thorough and effective preparation guide with hundreds of practice questions and exam strategiesExplore the material through puzzles, games, problems, and exercises that make learning easy and entertainingHead First PMP puts project management principles into context to help you understand, remember, and apply them -- not just on the exam, but also on the job.

Getting Started with SQL: A Hands-On Approach for Beginners


Thomas Nield - 2016
    If you're a business or IT professional, this short hands-on guide teaches you how to pull and transform data with SQL in significant ways. You will quickly master the fundamentals of SQL and learn how to create your own databases.Author Thomas Nield provides exercises throughout the book to help you practice your newfound SQL skills at home, without having to use a database server environment. Not only will you learn how to use key SQL statements to find and manipulate your data, but you'll also discover how to efficiently design and manage databases to meet your needs.You'll also learn how to:Explore relational databases, including lightweight and centralized modelsUse SQLite and SQLiteStudio to create lightweight databases in minutesQuery and transform data in meaningful ways by using SELECT, WHERE, GROUP BY, and ORDER BYJoin tables to get a more complete view of your business dataBuild your own tables and centralized databases by using normalized design principlesManage data by learning how to INSERT, DELETE, and UPDATE records

UNIX Power Tools


Jerry Peek - 1993
    It also covers add-on utilities and how to take advantage of clever features in the most popular UNIX utilities.Loaded with even more practical advice about almost every aspect of UNIX, this edition addresses the technology that UNIX users face today, differing from the first edition in a number of important ways.First, it slants the blend of options and commands more toward the POSIX utilities, including the GNU versions; the bash and tcsh shells have greater coverage, but we've kept the first edition's emphasis on the core concepts of sh and csh that will help you use all UNIX shells; and, Perl is more important than awk these days, so we've de-emphasized awk in this edition.This is a browser's book...like a magazine that you don't read from start to finish, but leaf through repeatedly until you realize that you've read it all. The book is structured so that it bursts at the seams with cross references. Interesting "sidebars" explore syntax or point out other directions for exploration, including relevant technical details that might not be immediately apparent. You'll find articles abstracted from other O'Reilly books, new information that highlights program "tricks" and "gotchas," tips posted to the Net over the years, and other accumulated wisdom.The 53 chapters in this book discuss topics like file management, text editors, shell programming -- even office automation. Overall, there's plenty of material here to satisfy even the most voracious appetites. The bottom line? UNIX Power Tools is loaded with practical advice about almost every aspect of UNIX. It will help you think creatively about UNIX, and will help you get to the point where you can analyze your own problems. Your own solutions won't be far behind.The CD-ROM includes all of the scripts and aliases from the book, plus perl, GNU emacs, netpbm (graphics manipulation utilities), ispell,screen, the sc spreadsheet, and about 60 other freeware programs. In addition to the source code, all the software is precompiled for Sun4, Digital UNIX, IBM AIX, HP/UX, Red Hat Linux, Solaris, and SCO UNIX.

Fluent Python: Clear, Concise, and Effective Programming


Luciano Ramalho - 2015
    With this hands-on guide, you'll learn how to write effective, idiomatic Python code by leveraging its best and possibly most neglected features. Author Luciano Ramalho takes you through Python's core language features and libraries, and shows you how to make your code shorter, faster, and more readable at the same time.Many experienced programmers try to bend Python to fit patterns they learned from other languages, and never discover Python features outside of their experience. With this book, those Python programmers will thoroughly learn how to become proficient in Python 3.This book covers:Python data model: understand how special methods are the key to the consistent behavior of objectsData structures: take full advantage of built-in types, and understand the text vs bytes duality in the Unicode ageFunctions as objects: view Python functions as first-class objects, and understand how this affects popular design patternsObject-oriented idioms: build classes by learning about references, mutability, interfaces, operator overloading, and multiple inheritanceControl flow: leverage context managers, generators, coroutines, and concurrency with the concurrent.futures and asyncio packagesMetaprogramming: understand how properties, attribute descriptors, class decorators, and metaclasses work"

Python Data Science Handbook: Tools and Techniques for Developers


Jake Vanderplas - 2016
    Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.With this handbook, you’ll learn how to use: * IPython and Jupyter: provide computational environments for data scientists using Python * NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python * Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python * Matplotlib: includes capabilities for a flexible range of data visualizations in Python * Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms