Book picks similar to
Data Science by John D. Kelleher
data-science
non-fiction
science
tech
Grokking Algorithms An Illustrated Guide For Programmers and Other Curious People
Aditya Y. Bhargava - 2015
The algorithms you'll use most often as a programmer have already been discovered, tested, and proven. If you want to take a hard pass on Knuth's brilliant but impenetrable theories and the dense multi-page proofs you'll find in most textbooks, this is the book for you. This fully-illustrated and engaging guide makes it easy for you to learn how to use algorithms effectively in your own programs.Grokking Algorithms is a disarming take on a core computer science topic. In it, you'll learn how to apply common algorithms to the practical problems you face in day-to-day life as a programmer. You'll start with problems like sorting and searching. As you build up your skills in thinking algorithmically, you'll tackle more complex concerns such as data compression or artificial intelligence. Whether you're writing business software, video games, mobile apps, or system utilities, you'll learn algorithmic techniques for solving problems that you thought were out of your grasp. For example, you'll be able to:Write a spell checker using graph algorithmsUnderstand how data compression works using Huffman codingIdentify problems that take too long to solve with naive algorithms, and attack them with algorithms that give you an approximate answer insteadEach carefully-presented example includes helpful diagrams and fully-annotated code samples in Python. By the end of this book, you will know some of the most widely applicable algorithms as well as how and when to use them.
The C++ Programming Language
Bjarne Stroustrup - 1986
For this special hardcover edition, two new appendixes on locales and standard library exception safety (also available at www.research.att.com/ bs/) have been added. The result is complete, authoritative coverage of the C++ language, its standard library, and key design techniques. Based on the ANSI/ISO C++ standard, The C++ Programming Language provides current and comprehensive coverage of all C++ language features and standard library components. For example:abstract classes as interfaces class hierarchies for object-oriented programming templates as the basis for type-safe generic software exceptions for regular error handling namespaces for modularity in large-scale software run-time type identification for loosely coupled systems the C subset of C++ for C compatibility and system-level work standard containers and algorithms standard strings, I/O streams, and numerics C compatibility, internationalization, and exception safety Bjarne Stroustrup makes C++ even more accessible to those new to the language, while adding advanced information and techniques that even expert C++ programmers will find invaluable.
The Formula: How Algorithms Solve all our Problems … and Create More
Luke Dormehl - 2014
What if everything in life could be reduced to a simple formula? What if numbers were able to tell us which partners we were best matched with – not just in terms of attractiveness, but for a long-term committed marriage? Or if they could say which films would be the biggest hits at the box office, and what changes could be made to those films to make them even more successful? Or even who out of us is likely to commit certain crimes, and when? This may sound like the world of science-fiction, but in fact it is just the tip of the iceberg in a world that is increasingly ruled by complex algorithms and neural networks.In The Formula, Luke Dormehl takes you inside the world of numbers, asking how we came to believe in the all-conquering power of algorithms; introducing the mathematicians, artificial intelligence experts and Silicon Valley entrepreneurs who are shaping this brave new world, and ultimately asking how we survive in an era where numbers can sometimes seem to create as many problems as they solve.
Python for Data Analysis
Wes McKinney - 2011
It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical cases studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.Use the IPython interactive shell as your primary development environmentLearn basic and advanced NumPy (Numerical Python) featuresGet started with data analysis tools in the pandas libraryUse high-performance tools to load, clean, transform, merge, and reshape dataCreate scatter plots and static or interactive visualizations with matplotlibApply the pandas groupby facility to slice, dice, and summarize datasetsMeasure data by points in time, whether it's specific instances, fixed periods, or intervalsLearn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples
Structure and Interpretation of Computer Programs
Harold Abelson - 1984
This long-awaited revision contains changes throughout the text. There are new implementations of most of the major programming systems in the book, including the interpreters and compilers, and the authors have incorporated many small changes that reflect their experience teaching the course at MIT since the first edition was published. A new theme has been introduced that emphasizes the central role played by different approaches to dealing with time in computational models: objects with state, concurrent programming, functional programming and lazy evaluation, and nondeterministic programming. There are new example sections on higher-order procedures in graphics and on applications of stream processing in numerical programming, and many new exercises. In addition, all the programs have been reworked to run in any Scheme implementation that adheres to the IEEE standard.
A Mind at Play: How Claude Shannon Invented the Information Age
Jimmy Soni - 2017
He constructed a fleet of customized unicycles and a flamethrowing trumpet, outfoxed Vegas casinos, and built juggling robots. He also wrote the seminal text of the digital revolution, which has been called “the Magna Carta of the Information Age.” His discoveries would lead contemporaries to compare him to Albert Einstein and Isaac Newton. His work anticipated by decades the world we’d be living in today—and gave mathematicians and engineers the tools to bring that world to pass.In this elegantly written, exhaustively researched biography, Jimmy Soni and Rob Goodman reveal Claude Shannon’s full story for the first time. It’s the story of a small-town Michigan boy whose career stretched from the era of room-sized computers powered by gears and string to the age of Apple. It’s the story of the origins of our digital world in the tunnels of MIT and the “idea factory” of Bell Labs, in the “scientists’ war” with Nazi Germany, and in the work of Shannon’s collaborators and rivals, thinkers like Alan Turing, John von Neumann, Vannevar Bush, and Norbert Wiener.And it’s the story of Shannon’s life as an often reclusive, always playful genius. With access to Shannon’s family and friends, A Mind at Play brings this singular innovator and creative genius to life.
Concrete Mathematics: A Foundation for Computer Science
Ronald L. Graham - 1988
"More concretely," the authors explain, "it is the controlled manipulation of mathematical formulas, using a collection of techniques for solving problems."
Practical Statistics for Data Scientists: 50 Essential Concepts
Peter Bruce - 2017
Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.With this book, you'll learn:Why exploratory data analysis is a key preliminary step in data scienceHow random sampling can reduce bias and yield a higher quality dataset, even with big dataHow the principles of experimental design yield definitive answers to questionsHow to use regression to estimate outcomes and detect anomaliesKey classification techniques for predicting which categories a record belongs toStatistical machine learning methods that "learn" from dataUnsupervised learning methods for extracting meaning from unlabeled data
Information Theory, Inference and Learning Algorithms
David J.C. MacKay - 2002
These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography. This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction. A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, are developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks. The final part of the book describes the state of the art in error-correcting codes, including low-density parity-check codes, turbo codes, and digital fountain codes -- the twenty-first century standards for satellite communications, disk drives, and data broadcast. Richly illustrated, filled with worked examples and over 400 exercises, some with detailed solutions, David MacKay's groundbreaking book is ideal for self-learning and for undergraduate or graduate courses. Interludes on crosswords, evolution, and sex provide entertainment along the way. In sum, this is a textbook on information, communication, and coding for a new generation of students, and an unparalleled entry point into these subjects for professionals in areas as diverse as computational biology, financial engineering, and machine learning.
The Art of Computer Programming, Volumes 1-3 Boxed Set
Donald Ervin Knuth - 1998
For the first time, these books are available as a boxed, three-volume set. The handsome slipcase makes this set an ideal gift for the recent computer science graduate or professional programmer. Offering a description of classical computer science, this multi-volume work is a useful resource in programming theory and practice for students, researchers, and practitioners alike. For programmers, it offers cookbook solutions to their day-to-day problems.
Tubes: A Journey to the Center of the Internet
Andrew Blum - 2012
But what is it physically? And where is it really? Our mental map of the network is as blank as the map of the ocean that Columbus carried on his first Atlantic voyage. The Internet, its material nuts and bolts, is an unexplored territory. Until now.In Tubes, journalist Andrew Blum goes inside the Internet's physical infrastructure and flips on the lights, revealing an utterly fresh look at the online world we think we know. It is a shockingly tactile realm of unmarked compounds, populated by a special caste of engineer who pieces together our networks by hand; where glass fibers pulse with light and creaky telegraph buildings, tortuously rewired, become communication hubs once again. From the room in Los Angeles where the Internet first flickered to life to the caverns beneath Manhattan where new fiber-optic cable is buried; from the coast of Portugal, where a ten-thousand-mile undersea cable just two thumbs wide connects Europe and Africa, to the wilds of the Pacific Northwest, where Google, Microsoft, and Facebook have built monumental data centers—Blum chronicles the dramatic story of the Internet's development, explains how it all works, and takes the first-ever in-depth look inside its hidden monuments.This is a book about real places on the map: their sounds and smells, their storied pasts, their physical details, and the people who live there. For all the talk of the "placelessness" of our digital age, the Internet is as fixed in real, physical spaces as the railroad or telephone. You can map it and touch it, and you can visit it. Is the Internet in fact "a series of tubes" as Ted Stevens, the late senator from Alaska, once famously described it? How can we know the Internet's possibilities if we don't know its parts?Like Tracy Kidder's classic The Soul of a New Machine or Tom Vanderbilt's recent bestseller Traffic, Tubes combines on-the-ground reporting and lucid explanation into an engaging, mind-bending narrative to help us understand the physical world that underlies our digital lives.
Stuff Matters: Exploring the Marvelous Materials That Shape Our Man-Made World
Mark Miodownik - 2013
Why is glass see-through? What makes elastic stretchy? Why does a paper clip bend? Why does any material look and behave the way it does? These are the sorts of questions that Mark Miodownik a globally-renowned materials scientist has spent his life exploring In this book he examines the materials he encounters in a typical morning, from the steel in his razor and the graphite in his pencil to the foam in his sneakers and the concrete in a nearby skyscraper.
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
Ralph Kimball - 1996
Here is a complete library of dimensional modeling techniques-- the most comprehensive collection ever written. Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition to Ralph Kimball's classic guide is more than sixty percent updated.The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:* Retail sales and e-commerce* Inventory management* Procurement* Order management* Customer relationship management (CRM)* Human resources management* Accounting* Financial services* Telecommunications and utilities* Education* Transportation* Health care and insuranceBy the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.This book is also available as part of the Kimball's Data Warehouse Toolkit Classics Box Set (ISBN: 9780470479575) with the following 3 books:The Data Warehouse Toolkit, 2nd Edition (9780471200246)The Data Warehouse Lifecycle Toolkit, 2nd Edition (9780470149775)The Data Warehouse ETL Toolkit (9780764567575)
Mining of Massive Datasets
Anand Rajaraman - 2011
This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
Open Access
Peter Suber - 2012
We take advantage of this revolutionary opportunity when we make our work "open access" digital, online, free of charge, and free of most copyright and licensing restrictions. Open access is made possible by the Internet and copyright-holder consent, and many authors, musicians, filmmakers, and other creators who depend on royalties are understandably unwilling to give their consent. But for 350 years, scholars have written peer-reviewed journal articles for impact, not for money, and are free to consent to open access without losing revenue.In this concise introduction, Peter Suber tells us what open access is and isn't, how it benefits authors and readers of research, how we pay for it, how it avoids copyright problems, how it has moved from the periphery to the mainstream, and what its future may hold. Distilling a decade of Suber's influential writing and thinking about open access, this is the indispensable book on the subject for researchers, librarians, administrators, funders, publishers, and policy makers.ContentsSeries Foreword viiPreface ix1 What Is Open Access? 12 Motivation 293 Varieties 494 Policies 775 Scope 976 Copyright 1257 Economics 1338 Casualties 1499 Future 16310 Self-Help 169Glossary 175Notes 177Additional Resources 219Index 223