Web Scraping with Python: Collecting Data from the Modern Web


Ryan Mitchell - 2015
    With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once. Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice. Learn how to parse complicated HTML pages Traverse multiple pages and sites Get a general overview of APIs and how they work Learn several methods for storing the data you scrape Download, read, and extract data from documents Use tools and techniques to clean badly formatted data Read and write natural languages Crawl through forms and logins Understand how to scrape JavaScript Learn image processing and text recognition

Programming Groovy


Venkat Subramaniam - 2008
    But recently, the industry has turned to dynamic languages for increased productivity and speed to market.Groovy is one of a new breed of dynamic languages that run on the Java platform. You can use these new languages on the JVM and intermix them with your existing Java code. You can leverage your Java investments while benefiting from advanced features including true Closures, Meta Programming, the ability to create internal DSLs, and a higher level of abstraction.If you're an experienced Java developer, Programming Groovy will help you learn the necessary fundamentals of programming in Groovy. You'll see how to use Groovy to do advanced programming including using Meta Programming, Builders, Unit Testing with Mock objects, processing XML, working with Databases and creating your own Domain-Specific Languages (DSLs).

HTML & XHTML: The Definitive Guide


Chuck Musciano - 1996
    For nearly a decade, hundreds of thousands of web developers have turned to HTML & XHTML: The Definitive Guide to master standards-based web development. Truly a definitive guide, the book combines a unique balance of tutorial material with a comprehensive reference that even the most experienced web professionals keep close at hand. From basic syntax and semantics to guidelines aimed at helping you develop your own distinctive style, this classic is all you need to become fluent in the language of web design.The new sixth edition guides you through every element of HTML and XHTML in detail, explaining how each element works and how it interacts with other elements. You'll also find detailed discussions of CSS (Cascading Style Sheets), which is intricately related to web page development. The most all-inclusive, up-to-date book on these languages available, this edition covers HTML 4.01, XHTML 1.0, and CSS2, with a preview of the upcoming XHTML2 and CSS3. Other topics include the newer initiatives in XHTML (XForms, XFrames, and modularization) and the essentials of XML for advanced readers. You'll learn how to:Use style sheets to control your document's appearance Work with programmatically generated HTML Create tables, both simple and complex Use frames to coordinate sets of documents Design and build interactive forms and dynamic documents Insert images, sound files, video, Java applets, and JavaScript programs Create documents that look good on a variety of browsersThe authors apply a natural learning approach that uses straightforward language and plenty of examples. Throughout the book, they offer suggestions for style and composition to help you decide how to best use HTML and XHTML to accomplish a variety of tasks. You'll learn what works and what doesn't, and what makes sense to those who view your web pages and what might be confusing. Written for anyone who wants to learn the language of the Web--from casual users to the full-time design professionals--this is the single most important book on HTML and XHTML you can own.Bill Kennedy is chief technical officer of MobileRobots, Inc. When not hacking new HTML pages or writing about them, "Dr. Bill" (Ph.D. in biophysics from Loyola University of Chicago) is out promoting the company's line of mobile, autonomous robots that can be used for artificial intelligence, fuzzy logic research, and education.Chuck Musciano began his career as a compiler writer and crafter of tools at Harris Corporations' Advanced Technology Group and is now a manager of Unix Systems in Harris' Corporate Data Center.

sed & awk


Dale Dougherty - 1990
    The most common operation done with sed is substitution, replacing one block of text with another. awk is a complete programming language. Unlike many conventional languages, awk is "data driven" -- you specify what kind of data you are interested in and the operations to be performed when that data is found. awk does many things for you, including automatically opening and closing data files, reading records, breaking the records up into fields, and counting the records. While awk provides the features of most conventional programming languages, it also includes some unconventional features, such as extended regular expression matching and associative arrays. sed & awk describes both programs in detail and includes a chapter of example sed and awk scripts. This edition covers features of sed and awk that are mandated by the POSIX standard. This most notably affects awk, where POSIX standardized a new variable, CONVFMT, and new functions, toupper() and tolower(). The CONVFMT variable specifies the conversion format to use when converting numbers to strings (awk used to use OFMT for this purpose). The toupper() and tolower() functions each take a (presumably mixed case) string argument and return a new version of the string with all letters translated to the corresponding case. In addition, this edition covers GNU sed, newly available since the first edition. It also updates the first edition coverage of Bell Labs nawk and GNU awk (gawk), covers mawk, an additional freely available implementation of awk, and briefly discusses three commercial versions of awk, MKS awk, Thompson Automation awk (tawk), and Videosoft (VSAwk).

Don't Make Me Think, Revisited: A Common Sense Approach to Web Usability


Steve Krug - 2000
    And it’s still short, profusely illustrated…and best of all–fun to read.If you’ve read it before, you’ll rediscover what made Don’t Make Me Think so essential to Web designers and developers around the world. If you’ve never read it, you’ll see why so many people have said it should be required reading for anyone working on Web sites.