Web Scraping with Python: Collecting Data from the Modern Web
Ryan Mitchell - 2015
With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once.
Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice.
Learn how to parse complicated HTML pages
Traverse multiple pages and sites
Get a general overview of APIs and how they work
Learn several methods for storing the data you scrape
Download, read, and extract data from documents
Use tools and techniques to clean badly formatted data
Read and write natural languages
Crawl through forms and logins
Understand how to scrape JavaScript
Learn image processing and text recognition
Real World Haskell: Code You Can Believe In
Bryan O'Sullivan - 2008
You'll learn how to use Haskell in a variety of practical ways, from short scripts to large and demanding applications. Real World Haskell takes you through the basics of functional programming at a brisk pace, and then helps you increase your understanding of Haskell in real-world issues like I/O, performance, dealing with data, concurrency, and more as you move through each chapter. With this book, you will:Understand the differences between procedural and functional programming Learn the features of Haskell, and how to use it to develop useful programs Interact with filesystems, databases, and network services Write solid code with automated tests, code coverage, and error handling Harness the power of multicore systems via concurrent and parallel programming You'll find plenty of hands-on exercises, along with examples of real Haskell programs that you can modify, compile, and run. Whether or not you've used a functional language before, if you want to understand why Haskell is coming into its own as a practical language in so many major organizations, Real World Haskell is the best place to start.
Designing Data-Intensive Applications
Martin Kleppmann - 2015
Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures
Naked Statistics: Stripping the Dread from the Data
Charles Wheelan - 2012
How can we catch schools that cheat on standardized tests? How does Netflix know which movies you’ll like? What is causing the rising incidence of autism? As best-selling author Charles Wheelan shows us in Naked Statistics, the right data and a few well-chosen statistical tools can help us answer these questions and more.For those who slept through Stats 101, this book is a lifesaver. Wheelan strips away the arcane and technical details and focuses on the underlying intuition that drives statistical analysis. He clarifies key concepts such as inference, correlation, and regression analysis, reveals how biased or careless parties can manipulate or misrepresent data, and shows us how brilliant and creative researchers are exploiting the valuable data from natural experiments to tackle thorny questions.And in Wheelan’s trademark style, there’s not a dull page in sight. You’ll encounter clever Schlitz Beer marketers leveraging basic probability, an International Sausage Festival illuminating the tenets of the central limit theorem, and a head-scratching choice from the famous game show Let’s Make a Deal—and you’ll come away with insights each time. With the wit, accessibility, and sheer fun that turned Naked Economics into a bestseller, Wheelan defies the odds yet again by bringing another essential, formerly unglamorous discipline to life.
The Art of Doing Science and Engineering: Learning to Learn
Richard Hamming - 1996
By presenting actual experiences and analyzing them as they are described, the author conveys the developmental thought processes employed and shows a style of thinking that leads to successful results is something that can be learned. Along with spectacular successes, the author also conveys how failures contributed to shaping the thought processes. Provides the reader with a style of thinking that will enhance a person's ability to function as a problem-solver of complex technical issues. Consists of a collection of stories about the author's participation in significant discoveries, relating how those discoveries came about and, most importantly, provides analysis about the thought processes and reasoning that took place as the author and his associates progressed through engineering problems.
Kingpin: How One Hacker Took Over the Billion-Dollar Cybercrime Underground
Kevin Poulsen - 2011
Max 'Vision' Butler was a white-hat hacker and a celebrity throughout the programming world, even serving as a consultant to the FBI. But there was another side to Max. As the black-hat 'Iceman', he'd seen the fraudsters around him squabble, their ranks riddled with infiltrators, their methods inefficient, and in their dysfunction was the ultimate challenge: he would stage a coup and steal their ill-gotten gains from right under their noses.Through the story of Max Butler's remarkable rise, KINGPIN lays bare the workings of a silent crime wave affecting millions worldwide. It exposes vast online-fraud supermarkets stocked with credit card numbers, counterfeit cheques, hacked bank accounts and fake passports. Thanks to Kevin Poulsen's remarkable access to both cops and criminals, we step inside the quiet,desperate battle that law enforcement fights against these scammers. And learn that the boy next door may not be all he seems.
UNIX and Linux System Administration Handbook
Evi Nemeth - 2010
This is one of those cases. The UNIX System Administration Handbook is one of the few books we ever measured ourselves against." -From the Foreword by Tim O'Reilly, founder of O'Reilly Media "This book is fun and functional as a desktop reference. If you use UNIX and Linux systems, you need this book in your short-reach library. It covers a bit of the systems' history but doesn't bloviate. It's just straightfoward information delivered in colorful and memorable fashion." -Jason A. Nunnelley"This is a comprehensive guide to the care and feeding of UNIX and Linux systems. The authors present the facts along with seasoned advice and real-world examples. Their perspective on the variations among systems is valuable for anyone who runs a heterogeneous computing facility." -Pat Parseghian The twentieth anniversary edition of the world's best-selling UNIX system administration book has been made even better by adding coverage of the leading Linux distributions: Ubuntu, openSUSE, and RHEL. This book approaches system administration in a practical way and is an invaluable reference for both new administrators and experienced professionals. It details best practices for every facet of system administration, including storage management, network design and administration, email, web hosting, scripting, software configuration management, performance analysis, Windows interoperability, virtualization, DNS, security, management of IT service organizations, and much more. UNIX(R) and Linux(R) System Administration Handbook, Fourth Edition, reflects the current versions of these operating systems: Ubuntu(R) LinuxopenSUSE(R) LinuxRed Hat(R) Enterprise Linux(R)Oracle America(R) Solaris(TM) (formerly Sun Solaris)HP HP-UX(R)IBM AIX(R)
A Common-Sense Guide to Data Structures and Algorithms: Level Up Your Core Programming Skills
Jay Wengrow - 2017
If you have received one of these copies, please contact the Pragmatic Bookshelf at support@pragprog.com, and we will replace it for you.Algorithms and data structures are much more than abstract concepts. Mastering them enables you to write code that runs faster and more efficiently, which is particularly important for today's web and mobile apps. This book takes a practical approach to data structures and algorithms, with techniques and real-world scenarios that you can use in your daily production code. Graphics and examples make these computer science concepts understandable and relevant. You can use these techniques with any language; examples in the book are in JavaScript, Python, and Ruby.Use Big O notation, the primary tool for evaluating algorithms, to measure and articulate the efficiency of your code, and modify your algorithm to make it faster. Find out how your choice of arrays, linked lists, and hash tables can dramatically affect the code you write. Use recursion to solve tricky problems and create algorithms that run exponentially faster than the alternatives. Dig into advanced data structures such as binary trees and graphs to help scale specialized applications such as social networks and mapping software. You'll even encounter a single keyword that can give your code a turbo boost. Jay Wengrow brings to this book the key teaching practices he developed as a web development bootcamp founder and educator.Use these techniques today to make your code faster and more scalable.
Data Science at the Command Line: Facing the Future with Time-Tested Tools
Jeroen Janssens - 2014
You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.To get you started--whether you're on Windows, OS X, or Linux--author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.Obtain data from websites, APIs, databases, and spreadsheetsPerform scrub operations on plain text, CSV, HTML/XML, and JSONExplore data, compute descriptive statistics, and create visualizationsManage your data science workflow using DrakeCreate reusable tools from one-liners and existing Python or R codeParallelize and distribute data-intensive pipelines using GNU ParallelModel data with dimensionality reduction, clustering, regression, and classification algorithms
The C++ Programming Language
Bjarne Stroustrup - 1986
For this special hardcover edition, two new appendixes on locales and standard library exception safety (also available at www.research.att.com/ bs/) have been added. The result is complete, authoritative coverage of the C++ language, its standard library, and key design techniques. Based on the ANSI/ISO C++ standard, The C++ Programming Language provides current and comprehensive coverage of all C++ language features and standard library components. For example:abstract classes as interfaces class hierarchies for object-oriented programming templates as the basis for type-safe generic software exceptions for regular error handling namespaces for modularity in large-scale software run-time type identification for loosely coupled systems the C subset of C++ for C compatibility and system-level work standard containers and algorithms standard strings, I/O streams, and numerics C compatibility, internationalization, and exception safety Bjarne Stroustrup makes C++ even more accessible to those new to the language, while adding advanced information and techniques that even expert C++ programmers will find invaluable.
React: Up and Running
Stoyan Stefanov - 2015
With "React: Up and Running" you'll learn how to get off the ground with React, with no prior knowledge.This book teaches you how to build components, the building blocks of your apps, as well as how to organize the components into large-scale apps. In addition, you ll learn about unit testing and optimizing performance, while focusing on the application s data (and letting the UI take care of itself)."
Practical SQL: A Beginner's Guide to Storytelling with Data
Anthony DeBarros - 2022
An approachable guide to programming in SQL (Structured Query Language) that will teach even beginning programmers how to build powerful databases and analyze data to find meaningful information.Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language) written by longtime professional journalist Anthony DeBarros. SQL is the primary tool that programmers, web developers, researchers, journalists, and others use to explore data in a database. DeBarros focuses on using SQL to find the story in data, with the aid of the popular open-source database PostgreSQL and the pgAdmin interface.This thoroughly revised second edition includes a new chapter describing how to set up PostgreSQL and more extensive discussion of pgAdmin's best features. The author has also added a chapter on the JSON data format that shows readers how to store and query JSON data. DeBarros has also updated the data in the book throughout, added coverage of additional topics, and perfected the book's examples.Readers love DeBarros's use of exercises and real-world examples that demonstrate how to:- Create databases and related tables using your own data - Correctly define data typesAggregate, sort, and filter data to find patterns - Clean their data and transfer data as text files - Create advanced queries and automate tasksThis book uses PostgreSQL, but the SQL syntax is applicable to many database applications, including Microsoft SQL Server and MySQL.
Understanding Computation: From Simple Machines to Impossible Programs
Tom Stuart - 2013
Understanding Computation explains theoretical computer science in a context you’ll recognize, helping you appreciate why these ideas matter and how they can inform your day-to-day programming.Rather than use mathematical notation or an unfamiliar academic programming language like Haskell or Lisp, this book uses Ruby in a reductionist manner to present formal semantics, automata theory, and functional programming with the lambda calculus. It’s ideal for programmers versed in modern languages, with little or no formal training in computer science.* Understand fundamental computing concepts, such as Turing completeness in languages* Discover how programs use dynamic semantics to communicate ideas to machines* Explore what a computer can do when reduced to its bare essentials* Learn how universal Turing machines led to today’s general-purpose computers* Perform complex calculations, using simple languages and cellular automata* Determine which programming language features are essential for computation* Examine how halting and self-referencing make some computing problems unsolvable* Analyze programs by using abstract interpretation and type systems
Feynman Lectures On Computation
Richard P. Feynman - 1996
Feynman gave his famous course on computation at the California Institute of Technology, he asked Tony Hey to adapt his lecture notes into a book. Although led by Feynman, the course also featured, as occasional guest speakers, some of the most brilliant men in science at that time, including Marvin Minsky, Charles Bennett, and John Hopfield. Although the lectures are now thirteen years old, most of the material is timeless and presents a “Feynmanesque” overview of many standard and some not-so-standard topics in computer science such as reversible logic gates and quantum computers.
Machine Learning: A Visual Starter Course For Beginner's
Oliver Theobald - 2017
If you have ever found yourself lost halfway through other introductory materials on this topic, this is the book for you. If you don't understand set terminology such as vectors, hyperplanes, and centroids, then this is also the book for you. This starter course isn't a picture story book but does include many visual examples that break algorithms down into a digestible and practical format. As a starter course, this book connects the dots and offers the crash course I wish I had when I first started. The kind of guide I wish had before I started taking on introductory courses that presume you’re two days away from an advanced mathematics exam. That’s why this introductory course doesn’t go further on the subject than other introductory books, but rather, goes a step back. A half-step back in order to help everyone make his or her first strides in machine learning and is an ideal study companion for the visual learner. In this step-by-step guide you will learn: - How to download free datasets - What tools and software packages you need - Data scrubbing techniques, including one-hot encoding, binning and dealing with missing data - Preparing data for analysis, including k-fold Validation - Regression analysis to create trend lines - Clustering, including k-means and k-nearest Neighbors - Naive Bayes Classifier to predict new classes - Anomaly detection and SVM algorithms to combat anomalies and outliers - The basics of Neural Networks - Bias/Variance to improve your machine learning model - Decision Trees to decode classification
Please feel welcome to join this starter course by buying a copy, or sending a free sample to your preferred device.