Book picks similar to
The Data Engineering Cookbook by Andreas Kretz
tech
computer-science
data-engineering
0-dec
Algorithms of the Intelligent Web
Haralambos Marmanis - 2009
They use powerful techniques to process information intelligently and offer features based on patterns and relationships in data. Algorithms of the Intelligent Web shows readers how to use the same techniques employed by household names like Google Ad Sense, Netflix, and Amazon to transform raw data into actionable information.Algorithms of the Intelligent Web is an example-driven blueprint for creating applications that collect, analyze, and act on the massive quantities of data users leave in their wake as they use the web. Readers learn to build Netflix-style recommendation engines, and how to apply the same techniques to social-networking sites. See how click-trace analysis can result in smarter ad rotations. All the examples are designed both to be reused and to illustrate a general technique- an algorithm-that applies to a broad range of scenarios.As they work through the book's many examples, readers learn about recommendation systems, search and ranking, automatic grouping of similar objects, classification of objects, forecasting models, and autonomous agents. They also become familiar with a large number of open-source libraries and SDKs, and freely available APIs from the hottest sites on the internet, such as Facebook, Google, eBay, and Yahoo.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.
How to Measure Anything in Cybersecurity Risk
Douglas W. Hubbard - 2016
In his bestselling book How to Measure Anything, author Douglas W. Hubbard opened the business world's eyes to the critical need for better measurement. This book expands upon that premise and draws from The Failure of Risk Management to sound the alarm in the cybersecurity realm. Some of the field's premier risk management approaches actually create more risk than they mitigate, and questionable methods have been duplicated across industries and embedded in the products accepted as gospel. This book sheds light on these blatant risks, and provides alternate techniques that can help improve your current situation. You'll also learn which approaches are too risky to save, and are actually more damaging than a total lack of any security.Dangerous risk management methods abound; there is no industry more critically in need of solutions than cybersecurity. This book provides solutions where they exist, and advises when to change tracks entirely.Discover the shortcomings of cybersecurity's best practices Learn which risk management approaches actually create risk Improve your current practices with practical alterations Learn which methods are beyond saving, and worse than doing nothing Insightful and enlightening, this book will inspire a closer examination of your company's own risk management practices in the context of cybersecurity. The end goal is airtight data protection, so finding cracks in the vault is a positive thing--as long as you get there before the bad guys do. How to Measure Anything in Cybersecurity Risk is your guide to more robust protection through better quantitative processes, approaches, and techniques.
Artificial Intelligence: The Insights You Need from Harvard Business Review (HBR Insights Series)
Harvard Business Review - 2019
What should you and your company be doing today to ensure that you're poised for success and keeping up with your competitors in the age of AI?Artificial Intelligence: The Insights You Need from Harvard Business Review brings you today's most essential thinking on AI and explains how to launch the right initiatives at your company to capitalize on the opportunity of the machine intelligence revolution.Business is changing. Will you adapt or be left behind?Get up to speed and deepen your understanding of the topics that are shaping your company's future with the Insights You Need from Harvard Business Review series. Featuring HBR's smartest thinking on fast-moving issues--blockchain, cybersecurity, AI, and more--each book provides the foundational introduction and practical case studies your organization needs to compete today and collects the best research, interviews, and analysis to get it ready for tomorrow. You can't afford to ignore how these issues will transform the landscape of business and society. The Insights You Need series will help you grasp these critical ideas--and prepare you and your company for the future.
The Human Face of Big Data
Rick Smolan - 2012
Its enable us to sense, measure, and understand aspects of our existence in ways never before possible. The Human Face of Big Data captures, in glorious photographs and moving essays, an extraordinary revolution sweeping, almost invisibly, through business, academia, government, healthcare, and everyday life. It's already enabling us to provide a healthier life for our children. To provide our seniors with independence while keeping them safe. To help us conserve precious resources like water and energy. To alert us to tiny changes in our health, weeks or years before we develop a life-threatening illness. To peer into our own individual genetic makeup. To create new forms of life. And soon, as many predict, to re-engineer our own species. And we've barely scratched the surface . . . Over the past decade, Rick Smolan and Jennifer Erwitt, co-founders of Against All Odds Productions, have produced a series of ambitious global projects in collaboration with hundreds of the world's leading photographers, writers, and graphic designers. Their Day in the Life projects were credited for creating a mass market for large-format illustrated books (rare was the coffee table book without one). Today their projects aim at sparking global conversations about emerging topics ranging from the Internet (24 Hours in Cyberspace), to Microprocessors (One Digital Day), to how the human race is learning to heal itself, (The Power to Heal) to the global water crisis (Blue Planet Run). This year Smolan and Erwitt dispatched photographers and writers in every corner of the globe to explore the world of “Big Data” and to determine if it truly does, as many in the field claim, represent a brand new toolset for humanity, helping address the biggest challenges facing our species. The book features 10 essays by noted writers:Introduction: OCEANS OF DATA by Dan GardnerChapter 1: REFLECTIONS IN A DIGITAL MIRROR by Juan Enriquez, CEO, BiotechnomomyChapter 2: OUR DATA OURSELVES by Kate Green, the EconomistChapter 3: QUANTIFYING MYSELF by AJ Jacobs, EsquireChapter 4: DARK DATA by Marc Goodman, Future Crime InstituteChapter 5: THE SENTIENT SENSOR MESH by Susan Karlin, Fast CompanyChapter 6: TAKING THE PULSE OF THE PLANET by Esther Dyson, EDventureChapter 7: CITIZEN SCIENCE by Gareth Cook, the Boston GlobeChapter 8: A DEMOGRAPH OF ONE by Michael Malone, Forbes magazineChapter 9: THE ART OF DATA by Aaron Koblin, Google Artist in ResidenceChapter 10: DATA DRIVEN by Jonathan Harris, Cowbird The book will also feature stunning info graphics from NIGEL HOLMES.1) GOOGLING GOOGLE: all the ways Google uses Data to help humanity2) DATA IS THE NEW OIL3) THE WORLD ACCORDING TO TWITTER4) AUCTIONING EYEBALLS: The world of Internet advertising5) FACEBOOK: A Billion Friends
Penetration Testing: A Hands-On Introduction to Hacking
Georgia Weidman - 2014
This beginner-friendly book opens with some basics of programming and helps you navigate Kali Linux, an operating system that comes preloaded with useful computer security tools like Wireshark and Metasploit. You'll learn about gathering information on a target, social engineering, capturing network traffic, analyzing vulnerabilities, developing exploits, and more. Hands-on examples discuss even advanced topics like mobile device security and bypassing anti-virus software.
The Implementation (TCP/IP Illustrated, Volume 2)
Gary R. Wright - 1995
"TCP/IP Illustrated, Volume 2" contains a thorough explanation of how TCP/IP protocols are implemented. There isn't a more practical or up-to-date bookothis volume is the only one to cover the de facto standard implementation from the 4.4BSD-Lite release, the foundation for TCP/IP implementations run daily on hundreds of thousands of systems worldwide. Combining 500 illustrations with 15,000 lines of real, working code, "TCP/IP Illustrated, Volume 2" uses a teach-by-example approach to help you master TCP/IP implementation. You will learn about such topics as the relationship between the sockets API and the protocol suite, and the differences between a host implementation and a router. In addition, the book covers the newest features of the 4.4BSD-Lite release, including multicasting, long fat pipe support, window scale, timestamp options, and protection against wrapped sequence numbers, and many other topics. Comprehensive in scope, based on a working standard, and thoroughly illustrated, this book is an indispensable resource for anyone working with TCP/IP.
Linux Bible
Christopher Negus - 2005
Whether you're new to Linux or need a reliable update and reference, this is an excellent resource. Veteran bestselling author Christopher Negus provides a complete tutorial packed with major updates, revisions, and hands-on exercises so that you can confidently start using Linux today. Offers a complete restructure, complete with exercises, to make the book a better learning tool Places a strong focus on the Linux command line tools and can be used with all distributions and versions of Linux Features in-depth coverage of the tools that a power user and a Linux administrator need to get startedThis practical learning tool is ideal for anyone eager to set up a new Linux desktop system at home or curious to learn how to manage Linux server systems at work.
Python Pocket Reference
Mark Lutz - 1998
Hundreds of thousands of Python developers around the world rely on Python for general-purpose tasks, Internet scripting, systems programming, user interfaces, and product customization. Available on all major computing platforms, including commercial versions of Unix, Linux, Windows, and Mac OS X, Python is portable, powerful and remarkable easy to use.With its convenient, quick-reference format, "Python Pocket Reference," 3rd Edition is the perfect on-the-job reference. More importantly, it's now been refreshed to cover the language's latest release, Python 2.4. For experienced Python developers, this book is a compact toolbox that delivers need-to-know information at the flip of a page. This third edition also includes an easy-lookup index to help developers find answers fast!Python 2.4 is more than just optimization and library enhancements; it's also chock full of bug fixes and upgrades. And these changes are addressed in the "Python Pocket Reference," 3rd Edition. New language features, new and upgraded built-ins, and new and upgraded modules and packages--they're all clarified in detail.The "Python Pocket Reference," 3rd Edition serves as the perfect companion to "Learning Python" and "Programming Python."
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Trevor Hastie - 2001
With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.
Graph Databases
Ian Robinson - 2013
With this practical book, you’ll learn how to design and implement a graph database that brings the power of graphs to bear on a broad range of problem domains. Whether you want to speed up your response to user queries or build a database that can adapt as your business evolves, this book shows you how to apply the schema-free graph model to real-world problems.Learn how different organizations are using graph databases to outperform their competitors. With this book’s data modeling, query, and code examples, you’ll quickly be able to implement your own solution.Model data with the Cypher query language and property graph modelLearn best practices and common pitfalls when modeling with graphsPlan and implement a graph database solution in test-driven fashionExplore real-world examples to learn how and why organizations use a graph databaseUnderstand common patterns and components of graph database architectureUse analytical techniques and algorithms to mine graph database information
Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference
Cameron Davidson-Pilon - 2014
However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice-freeing you to get results using computing power.
Bayesian Methods for Hackers
illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You'll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you've mastered these techniques, you'll constantly turn to this guide for the working PyMC code you need to jumpstart future projects. Coverage includes - Learning the Bayesian "state of mind" and its practical implications - Understanding how computers perform Bayesian inference - Using the PyMC Python library to program Bayesian analyses - Building and debugging models with PyMC - Testing your model's "goodness of fit" - Opening the "black box" of the Markov Chain Monte Carlo algorithm to see how and why it works - Leveraging the power of the "Law of Large Numbers" - Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning - Using loss functions to measure an estimate's weaknesses based on your goals and desired outcomes - Selecting appropriate priors and understanding how their influence changes with dataset size - Overcoming the "exploration versus exploitation" dilemma: deciding when "pretty good" is good enough - Using Bayesian inference to improve A/B testing - Solving data science problems when only small amounts of data are available Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.
The Web Application Hacker's Handbook: Discovering and Exploiting Security Flaws
Dafydd Stuttard - 2007
The authors explain each category of vulnerability using real-world examples, screen shots and code extracts. The book is extremely practical in focus, and describes in detail the steps involved in detecting and exploiting each kind of security weakness found within a variety of applications such as online banking, e-commerce and other web applications. The topics covered include bypassing login mechanisms, injecting code, exploiting logic flaws and compromising other users. Because every web application is different, attacking them entails bringing to bear various general principles, techniques and experience in an imaginative way. The most successful hackers go beyond this, and find ways to automate their bespoke attacks. This handbook describes a proven methodology that combines the virtues of human intelligence and computerized brute force, often with devastating results.The authors are professional penetration testers who have been involved in web application security for nearly a decade. They have presented training courses at the Black Hat security conferences throughout the world. Under the alias "PortSwigger," Dafydd developed the popular Burp Suite of web application hack tools.
Head First Data Analysis: A Learner's Guide to Big Numbers, Statistics, and Good Decisions
Michael G. Milton - 2009
If your job requires you to manage and analyze all kinds of data, turn to Head First Data Analysis, where you'll quickly learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others. Whether you're a product developer researching the market viability of a new product or service, a marketing manager gauging or predicting the effectiveness of a campaign, a salesperson who needs data to support product presentations, or a lone entrepreneur responsible for all of these data-intensive functions and more, the unique approach in Head First Data Analysis is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool. You'll learn how to:Determine which data sources to use for collecting information Assess data quality and distinguish signal from noise Build basic data models to illuminate patterns, and assimilate new information into the models Cope with ambiguous information Design experiments to test hypotheses and draw conclusions Use segmentation to organize your data within discrete market groups Visualize data distributions to reveal new relationships and persuade others Predict the future with sampling and probability models Clean your data to make it useful Communicate the results of your analysis to your audience Using the latest research in cognitive science and learning theory to craft a multi-sensory learning experience, Head First Data Analysis uses a visually rich format designed for the way your brain works, not a text-heavy approach that puts you to sleep.
Gray Hat Python: Python Programming for Hackers and Reverse Engineers
Justin Seitz - 2008
But until now, there has been no real manual on how to use Python for a variety of hacking tasks. You had to dig through forum posts and man pages, endlessly tweaking your own code to get everything working. Not anymore.Gray Hat Python explains the concepts behind hacking tools and techniques like debuggers, trojans, fuzzers, and emulators. But author Justin Seitz goes beyond theory, showing you how to harness existing Python-based security tools - and how to build your own when the pre-built ones won't cut it.You'll learn how to:Automate tedious reversing and security tasks Design and program your own debugger Learn how to fuzz Windows drivers and create powerful fuzzers from scratch Have fun with code and library injection, soft and hard hooking techniques, and other software trickery Sniff secure traffic out of an encrypted web browser session Use PyDBG, Immunity Debugger, Sulley, IDAPython, PyEMU, and more The world's best hackers are using Python to do their handiwork. Shouldn't you?
Hadoop: The Definitive Guide
Tom White - 2009
Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!