Web Scraping with Python: Collecting Data from the Modern Web


Ryan Mitchell - 2015
    With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once. Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice. Learn how to parse complicated HTML pages Traverse multiple pages and sites Get a general overview of APIs and how they work Learn several methods for storing the data you scrape Download, read, and extract data from documents Use tools and techniques to clean badly formatted data Read and write natural languages Crawl through forms and logins Understand how to scrape JavaScript Learn image processing and text recognition

UNIX and Linux System Administration Handbook


Evi Nemeth - 2010
    This is one of those cases. The UNIX System Administration Handbook is one of the few books we ever measured ourselves against." -From the Foreword by Tim O'Reilly, founder of O'Reilly Media "This book is fun and functional as a desktop reference. If you use UNIX and Linux systems, you need this book in your short-reach library. It covers a bit of the systems' history but doesn't bloviate. It's just straightfoward information delivered in colorful and memorable fashion." -Jason A. Nunnelley"This is a comprehensive guide to the care and feeding of UNIX and Linux systems. The authors present the facts along with seasoned advice and real-world examples. Their perspective on the variations among systems is valuable for anyone who runs a heterogeneous computing facility." -Pat Parseghian The twentieth anniversary edition of the world's best-selling UNIX system administration book has been made even better by adding coverage of the leading Linux distributions: Ubuntu, openSUSE, and RHEL. This book approaches system administration in a practical way and is an invaluable reference for both new administrators and experienced professionals. It details best practices for every facet of system administration, including storage management, network design and administration, email, web hosting, scripting, software configuration management, performance analysis, Windows interoperability, virtualization, DNS, security, management of IT service organizations, and much more. UNIX(R) and Linux(R) System Administration Handbook, Fourth Edition, reflects the current versions of these operating systems: Ubuntu(R) LinuxopenSUSE(R) LinuxRed Hat(R) Enterprise Linux(R)Oracle America(R) Solaris(TM) (formerly Sun Solaris)HP HP-UX(R)IBM AIX(R)

The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

Machine Learning: The Art and Science of Algorithms That Make Sense of Data


Peter Flach - 2012
    Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. Flach provides case studies of increasing complexity and variety with well-chosen examples and illustrations throughout. He covers a wide range of logical, geometric and statistical models and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features. The use of established terminology is balanced with the introduction of new and useful concepts, and summaries of relevant background material are provided with pointers for revision if necessary. These features ensure Machine Learning will set a new standard as an introductory textbook.

Python for Everybody: Exploring Data in Python 3


Charles Severance - 2016
    You can think of the Python programming language as your tool to solve data problems that are beyond the capability of a spreadsheet.Python is an easy to use and easy to learn programming language that is freely available on Macintosh, Windows, or Linux computers. So once you learn Python you can use it for the rest of your career without needing to purchase any software.This book uses the Python 3 language. The earlier Python 2 version of this book is titled "Python for Informatics: Exploring Information".

Programming Collective Intelligence: Building Smart Web 2.0 Applications


Toby Segaran - 2002
    With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:Collaborative filtering techniques that enable online retailers to recommend products or media Methods of clustering to detect groups of similar items in a large dataset Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm Optimization algorithms that search millions of possible solutions to a problem and choose the best one Bayesian filtering, used in spam filters for classifying documents based on word types and other features Using decision trees not only to make predictions, but to model the way decisions are made Predicting numerical values rather than classifications to build price models Support vector machines to match people in online dating sites Non-negative matrix factorization to find the independent features in a dataset Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. "Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details."-- Dan Russell, Google "Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths."-- Tim Wolters, CTO, Collective Intellect

Agile Web Development with Rails: A Pragmatic Guide


Dave Thomas - 2005
    A full Rails application probably has less total code than the XML you'd need to configure the same application in other frameworks. With this book you'll learn how to use "ActiveRecord" to connect business objects and database tables. No more painful object-relational mapping. Just create your business objects and let Rails do the rest. You'll learn how to use the "Action Pack" framework to route incoming requests and render pages using easy-to-write templates and components. See how to exploit the Rails service frameworks to send emails, implement web services, and create dynamic, user-centric web-pages using built-in Javascript and Ajax support. There are extensive chapters on testing, deployment, and scaling. You'll see how easy it is to install Rails using your web server of choice (such as Apache or lighttpd) or using its own included web server. You'll be writing applications that work with your favorite database (MySQL, Oracle, Postgres, and more) in no time at all. You'll create a complete online store application in the extended tutorial section, so you'll see how a full Rails application is developed---iteratively and rapidly. Rails strives to honor the Pragmatic Programmer's "DRY Principle" by avoiding the extra work of configuration files and code annotations. You can develop in real-time: make a change, and watch it work immediately. Forget XML. Everything in Rails, from templates to control flow to business logic, is written in Ruby, the language of choice for programmers who like to get the job done well (and leave work ontime for a change). Rails is the framework of choice for the new generation of Web 2.0 developers. Agile Web Development with Rails is the book for that generation, written by Dave Thomas (Pragmatic Programmer and author of Programming Ruby) and David Heinemeier Hansson, who created Rails.

Discrete Mathematics and Its Applications


Kenneth H. Rosen - 2000
    These themes include mathematical reasoning, combinatorial analysis, discrete structures, algorithmic thinking, and enhanced problem-solving skills through modeling. Its intent is to demonstrate the relevance and practicality of discrete mathematics to all students. The Fifth Edition includes a more thorough and linear presentation of logic, proof types and proof writing, and mathematical reasoning. This enhanced coverage will provide students with a solid understanding of the material as it relates to their immediate field of study and other relevant subjects. The inclusion of applications and examples to key topics has been significantly addressed to add clarity to every subject. True to the Fourth Edition, the text-specific web site supplements the subject matter in meaningful ways, offering additional material for students and instructors. Discrete math is an active subject with new discoveries made every year. The continual growth and updates to the web site reflect the active nature of the topics being discussed. The book is appropriate for a one- or two-term introductory discrete mathematics course to be taken by students in a wide variety of majors, including computer science, mathematics, and engineering. College Algebra is the only explicit prerequisite.

Database Management Systems


Raghu Ramakrishnan - 1997
    Coherent explanations and practical examples have made this one of the leading texts in the field. The third edition continues in this tradition, enhancing it with more practical material. The new edition has been reorganized to allow more flexibility in the way the course is taught. Now, instructors can easily choose whether they would like to teach a course which emphasizes database application development or a course that emphasizes database systems issues. New overview chapters at the beginning of parts make it possible to skip other chapters in the part if you don't want the detail.More applications and examples have been added throughout the book, including SQL and Oracle examples. The applied flavor is further enhanced by the two new database applications chapters.

The Tangled Web: A Guide to Securing Modern Web Applications


Michal Zalewski - 2011
    Every piece of the web application stack, from HTTP requests to browser-side scripts, comes with important yet subtle security consequences. To keep users safe, it is essential for developers to confidently navigate this landscape.In The Tangled Web, Michal Zalewski, one of the world's top browser security experts, offers a compelling narrative that explains exactly how browsers work and why they're fundamentally insecure. Rather than dispense simplistic advice on vulnerabilities, Zalewski examines the entire browser security model, revealing weak points and providing crucial information for shoring up web application security. You'll learn how to:Perform common but surprisingly complex tasks such as URL parsing and HTML sanitization Use modern security features like Strict Transport Security, Content Security Policy, and Cross-Origin Resource Sharing Leverage many variants of the same-origin policy to safely compartmentalize complex web applications and protect user credentials in case of XSS bugs Build mashups and embed gadgets without getting stung by the tricky frame navigation policy Embed or host user-supplied content without running into the trap of content sniffing For quick reference, "Security Engineering Cheat Sheets" at the end of each chapter offer ready solutions to problems you're most likely to encounter. With coverage extending as far as planned HTML5 features, The Tangled Web will help you create secure web applications that stand the test of time.

Fundamentals of Software Architecture: An Engineering Approach


Mark Richards - 2020
    Until now. This practical guide provides the first comprehensive overview of software architecture's many aspects. You'll examine architectural characteristics, architectural patterns, component determination, diagramming and presenting architecture, evolutionary architecture, and many other topics.Authors Neal Ford and Mark Richards help you learn through examples in a variety of popular programming languages, such as Java, C#, JavaScript, and others. You'll focus on architecture principles with examples that apply across all technology stacks.

Growing Object-Oriented Software, Guided by Tests


Steve Freeman - 2009
    This one's a keeper." --Robert C. Martin "If you want to be an expert in the state of the art in TDD, you need to understand the ideas in this book."--Michael Feathers Test-Driven Development (TDD) is now an established technique for delivering better software faster. TDD is based on a simple idea: Write tests for your code before you write the code itself. However, this simple idea takes skill and judgment to do well. Now there's a practical guide to TDD that takes you beyond the basic concepts. Drawing on a decade of experience building real-world systems, two TDD pioneers show how to let tests guide your development and "grow" software that is coherent, reliable, and maintainable. Steve Freeman and Nat Pryce describe the processes they use, the design principles they strive to achieve, and some of the tools that help them get the job done. Through an extended worked example, you'll learn how TDD works at multiple levels, using tests to drive the features and the object-oriented structure of the code, and using Mock Objects to discover and then describe relationships between objects. Along the way, the book systematically addresses challenges that development teams encounter with TDD--from integrating TDD into your processes to testing your most difficult features. Coverage includes - Implementing TDD effectively: getting started, and maintaining your momentum throughout the project - Creating cleaner, more expressive, more sustainable code - Using tests to stay relentlessly focused on sustaining quality - Understanding how TDD, Mock Objects, and Object-Oriented Design come together in the context of a real software development project - Using Mock Objects to guide object-oriented designs - Succeeding where TDD is difficult: managing complex test data, and testing persistence and concurrency

Cryptography Engineering: Design Principles and Practical Applications


Niels Ferguson - 2010
    Cryptography is vital to keeping information safe, in an era when the formula to do so becomes more and more challenging. Written by a team of world-renowned cryptography experts, this essential guide is the definitive introduction to all major areas of cryptography: message security, key negotiation, and key management. You'll learn how to think like a cryptographer. You'll discover techniques for building cryptography into products from the start and you'll examine the many technical changes in the field.After a basic overview of cryptography and what it means today, this indispensable resource covers such topics as block ciphers, block modes, hash functions, encryption modes, message authentication codes, implementation issues, negotiation protocols, and more. Helpful examples and hands-on exercises enhance your understanding of the multi-faceted field of cryptography.An author team of internationally recognized cryptography experts updates you on vital topics in the field of cryptography Shows you how to build cryptography into products from the start Examines updates and changes to cryptography Includes coverage on key servers, message security, authentication codes, new standards, block ciphers, message authentication codes, and more Cryptography Engineering gets you up to speed in the ever-evolving field of cryptography.

Pearls of Functional Algorithm Design


Richard S. Bird - 2010
    These 30 short chapters each deal with a particular programming problem drawn from sources as diverse as games and puzzles, intriguing combinatorial tasks, and more familiar areas such as data compression and string matching. Each pearl starts with the statement of the problem expressed using the functional programming language Haskell, a powerful yet succinct language for capturing algorithmic ideas clearly and simply. The novel aspect of the book is that each solution is calculated from an initial formulation of the problem in Haskell by appealing to the laws of functional programming. Pearls of Functional Algorithm Design will appeal to the aspiring functional programmer, students and teachers interested in the principles of algorithm design, and anyone seeking to master the techniques of reasoning about programs in an equational style.

Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites


Matthew A. Russell - 2011
    You’ll learn how to combine social web data, analysis techniques, and visualization to find what you’ve been looking for in the social haystack—as well as useful information you didn’t know existed.Each standalone chapter introduces techniques for mining data in different areas of the social Web, including blogs and email. All you need to get started is a programming background and a willingness to learn basic Python tools.Get a straightforward synopsis of the social web landscapeUse adaptable scripts on GitHub to harvest data from social network APIs such as Twitter, Facebook, LinkedIn, and Google+Learn how to employ easy-to-use Python tools to slice and dice the data you collectExplore social connections in microformats with the XHTML Friends NetworkApply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detectionBuild interactive visualizations with web technologies based upon HTML5 and JavaScript toolkits"A rich, compact, useful, practical introduction to a galaxy of tools, techniques, and theories for exploring structured and unstructured data." --Alex Martelli, Senior Staff Engineer, Google