Big Data: A Revolution That Will Transform How We Live, Work, and Think


Viktor Mayer-Schönberger - 2013
    “Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it. This emerging science can translate myriad phenomena—from the price of airline tickets to the text of millions of books—into searchable form, and uses our increasing computing power to unearth epiphanies that we never could have seen before. A revolution on par with the Internet or perhaps even the printing press, big data will change the way we think about business, health, politics, education, and innovation in the years to come. It also poses fresh threats, from the inevitable end of privacy as we know it to the prospect of being penalized for things we haven’t even done yet, based on big data’s ability to predict our future behavior.In this brilliantly clear, often surprising work, two leading experts explain what big data is, how it will change our lives, and what we can do to protect ourselves from its hazards. Big Data is the first big book about the next big thing.www.big-data-book.com

Python Essential Reference (Developer's Library)


David Beazley - 1999
    This text concisely describes the Python language and its programming environment for those readers already familiar with languages such as C and C++.

Data Science from Scratch: First Principles with Python


Joel Grus - 2015
    In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out. Get a crash course in Python Learn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data science Collect, explore, clean, munge, and manipulate data Dive into the fundamentals of machine learning Implement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

What Is Data Science?


Mike Loukides - 2011
    Five years ago, in What is Web 2.0, Tim O'Reilly said that "data is the next Intel Inside." But what does that statement mean? Why do we suddenly care about statistics and about data? This report examines the many sides of data science -- the technologies, the companies and the unique skill sets.The web is full of "data-driven apps." Almost any e-commerce application is a data-driven application. There's a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn't really what we mean by "data science." A data application acquires its value from the data itself, and creates more data as a result. It's not just an application with data; it's a data product. Data science enables the creation of data products.

Deep Learning with Python


François Chollet - 2017
    It is the technology behind photo tagging systems at Facebook and Google, self-driving cars, speech recognition systems on your smartphone, and much more.In particular, Deep learning excels at solving machine perception problems: understanding the content of image data, video data, or sound data. Here's a simple example: say you have a large collection of images, and that you want tags associated with each image, for example, "dog," "cat," etc. Deep learning can allow you to create a system that understands how to map such tags to images, learning only from examples. This system can then be applied to new images, automating the task of photo tagging. A deep learning model only has to be fed examples of a task to start generating useful results on new data.

Code: The Hidden Language of Computer Hardware and Software


Charles Petzold - 1999
    And through CODE, we see how this ingenuity and our very human compulsion to communicate have driven the technological innovations of the past two centuries. Using everyday objects and familiar language systems such as Braille and Morse code, author Charles Petzold weaves an illuminating narrative for anyone who’s ever wondered about the secret inner life of computers and other smart machines. It’s a cleverly illustrated and eminently comprehensible story—and along the way, you’ll discover you’ve gained a real context for understanding today’s world of PCs, digital media, and the Internet. No matter what your level of technical savvy, CODE will charm you—and perhaps even awaken the technophile within.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

Python Algorithms: Mastering Basic Algorithms in the Python Language


Magnus Lie Hetland - 2010
    Written by Magnus Lie Hetland, author of Beginning Python, this book is sharply focused on classical algorithms, but it also gives a solid understanding of fundamental algorithmic problem-solving techniques.The book deals with some of the most important and challenging areas of programming and computer science, but in a highly pedagogic and readable manner. The book covers both algorithmic theory and programming practice, demonstrating how theory is reflected in real Python programs. Well-known algorithms and data structures that are built into the Python language are explained, and the user is shown how to implement and evaluate others himself.

Networking for Systems Administrators (IT Mastery Book 5)


Michael W. Lucas - 2015
    Servers give sysadmins a incredible visibility into the network—once they know how to unlock it. Most sysadmins don’t need to understand window scaling, or the differences between IPv4 and IPv6 echo requests, or other intricacies of the TCP/IP protocols. You need only enough to deploy your own applications and get easy support from the network team.This book teaches you:•How modern networks really work•The essentials of TCP/IP•The next-generation protocol, IPv6•The right tools to diagnose network problems, and how to use them•Troubleshooting everything from the physical wire to DNS•How to see the traffic you send and receive•Connectivity testing•How to communicate with your network team to quickly resolve problemsA systems administrator doesn’t need to know the innards of TCP/IP, but knowing enough to diagnose your own network issues transforms a good sysadmin into a great one.

Hadoop Explained


Aravind Shenoy - 2014
    Hadoop allowed small and medium sized companies to store huge amounts of data on cheap commodity servers in racks. The introduction of Big Data has allowed businesses to make decisions based on quantifiable analysis. Hadoop is now implemented in major organizations such as Amazon, IBM, Cloudera, and Dell to name a few. This book introduces you to Hadoop and to concepts such as ‘MapReduce’, ‘Rack Awareness’, ‘Yarn’ and ‘HDFS Federation’, which will help you get acquainted with the technology.

Programming in Python 3: A Complete Introduction to the Python Language


Mark Summerfield - 2008
    It brings together all the knowledge needed to write any program, use any standard or third-party Python 3 library, and create new library modules of your own.

SQL Cookbook


Anthony Molinaro - 2005
    You'd like to learn how to do more work with SQL inside the database before pushing data across the network to your applications. You'd like to take your SQL skills to the next level.Let's face it, SQL is a deceptively simple language to learn, and many database developers never go far beyond the simple statement: SELECT columns FROM table WHERE conditions. But there is so much more you can do with the language. In the SQL Cookbook, experienced SQL developer Anthony Molinaro shares his favorite SQL techniques and features. You'll learn about:Window functions, arguably the most significant enhancement to SQL in the past decade. If you're not using these, you're missing outPowerful, database-specific features such as SQL Server's PIVOT and UNPIVOT operators, Oracle's MODEL clause, and PostgreSQL's very useful GENERATE_SERIES functionPivoting rows into columns, reverse-pivoting columns into rows, using pivoting to facilitate inter-row calculations, and double-pivoting a result setBucketization, and why you should never use that term in Brooklyn.How to create histograms, summarize data into buckets, perform aggregations over a moving range of values, generate running-totals and subtotals, and other advanced, data warehousing techniquesThe technique of walking a string, which allows you to use SQL to parse through the characters, words, or delimited elements of a stringWritten in O'Reilly's popular Problem/Solution/Discussion style, the SQL Cookbook is sure to please. Anthony's credo is: When it comes down to it, we all go to work, we all have bills to pay, and we all want to go home at a reasonable time and enjoy what's still available of our days. The SQL Cookbook moves quickly from problem to solution, saving you time each step of the way.

Python Data Science Handbook: Tools and Techniques for Developers


Jake Vanderplas - 2016
    Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.With this handbook, you’ll learn how to use: * IPython and Jupyter: provide computational environments for data scientists using Python * NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python * Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python * Matplotlib: includes capabilities for a flexible range of data visualizations in Python * Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling


Ralph Kimball - 1996
    Here is a complete library of dimensional modeling techniques-- the most comprehensive collection ever written. Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition to Ralph Kimball's classic guide is more than sixty percent updated.The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:* Retail sales and e-commerce* Inventory management* Procurement* Order management* Customer relationship management (CRM)* Human resources management* Accounting* Financial services* Telecommunications and utilities* Education* Transportation* Health care and insuranceBy the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.This book is also available as part of the Kimball's Data Warehouse Toolkit Classics Box Set (ISBN: 9780470479575) with the following 3 books:The Data Warehouse Toolkit, 2nd Edition (9780471200246)The Data Warehouse Lifecycle Toolkit, 2nd Edition (9780470149775)The Data Warehouse ETL Toolkit (9780764567575)

The Hundred-Page Machine Learning Book


Andriy Burkov - 2019
    During that week, you will learn almost everything modern machine learning has to offer. The author and other practitioners have spent years learning these concepts.Companion wiki — the book has a continuously updated wiki that extends some book chapters with additional information: Q&A, code snippets, further reading, tools, and other relevant resources.Flexible price and formats — choose from a variety of formats and price options: Kindle, hardcover, paperback, EPUB, PDF. If you buy an EPUB or a PDF, you decide the price you pay!Read first, buy later — download book chapters for free, read them and share with your friends and colleagues. Only if you liked the book or found it useful in your work, study or business, then buy it.