Book picks similar to
The Data Engineering Cookbook by Andreas Kretz
tech
data-engineering
computer-science
audio_wanted
R for Everyone: Advanced Analytics and Graphics
Jared P. Lander - 2013
R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs, you'll be ready to tackle the statistical problems you care about most. COVERAGE INCLUDES - Exploring R, RStudio, and R packages - Using R for math: variable types, vectors, calling functions, and more - Exploiting data structures, including data.frames, matrices, and lists - Creating attractive, intuitive statistical graphics - Writing user-defined functions - Controlling program flow with if, ifelse, and complex checks - Improving program efficiency with group manipulations - Combining and reshaping multiple datasets - Manipulating strings using R's facilities and regular expressions - Creating normal, binomial, and Poisson probability distributions - Programming basic statistics: mean, standard deviation, and t-tests - Building linear, generalized linear, and nonlinear models - Assessing the quality of models and variable selection - Preventing overfitting, using the Elastic Net and Bayesian methods - Analyzing univariate and multivariate time series data - Grouping data via K-means and hierarchical clustering - Preparing reports, slideshows, and web pages with knitr - Building reusable R packages with devtools and Rcpp - Getting involved with the R global community
Data Structures Using C++
D.S. Malik - 2003
D.S. Malik is ideal for a one-semester course focused on data structures. Clearly written with the student in mind, this text focuses on Data Structures and includes advanced topics in C++ such as Linked Lists and the Standard Template Library (STL). This student-friendly text features abundant Programming Examples and extensive use of visual diagrams to reinforce difficult topics. Students will find Dr. Malik's use of complete programming code and clear display of syntax, explanation, and example easy to read and conducive to learning.
Database Internals: A deep-dive into how distributed data systems work
Alex Petrov - 2019
But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed.This book examines:Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable log structured storage engines, with differences and use-cases for eachDistributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns, from UDP to reliable consensus protocolsDatabase clusters: Discover how to achieve consistent models for replicated data
R Packages
Hadley Wickham - 2015
This practical book shows you how to bundle reusable R functions, sample data, and documentation together by applying author Hadley Wickham’s package development philosophy. In the process, you’ll work with devtools, roxygen, and testthat, a set of R packages that automate common development tasks. Devtools encapsulates best practices that Hadley has learned from years of working with this programming language.
Ideal for developers, data scientists, and programmers with various backgrounds, this book starts you with the basics and shows you how to improve your package writing over time. You’ll learn to focus on what you want your package to do, rather than think about package structure.
Learn about the most useful components of an R package, including vignettes and unit tests
Automate anything you can, taking advantage of the years of development experience embodied in devtools
Get tips on good style, such as organizing functions into files
Streamline your development process with devtools
Learn the best way to submit your package to the Comprehensive R Archive Network (CRAN)
Learn from a well-respected member of the R community who created 30 R packages, including ggplot2, dplyr, and tidyr
What Is Data Science?
Mike Loukides - 2011
Five years ago, in What is Web 2.0, Tim O'Reilly said that "data is the next Intel Inside." But what does that statement mean? Why do we suddenly care about statistics and about data? This report examines the many sides of data science -- the technologies, the companies and the unique skill sets.The web is full of "data-driven apps." Almost any e-commerce application is a data-driven application. There's a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn't really what we mean by "data science." A data application acquires its value from the data itself, and creates more data as a result. It's not just an application with data; it's a data product. Data science enables the creation of data products.
The Book of PoC||GTFO
Manul Laphroaig - 2017
Until now, the journal has only been available online or printed and distributed for free at hacker conferences worldwide.Consistent with the journal's quirky, biblical style, this book comes with all the trimmings: a leatherette cover, ribbon bookmark, bible paper, and gilt-edged pages. The book features more than 80 technical essays from numerous famous hackers, authors of classics like "Reliable Code Execution on a Tamagotchi," "ELFs are Dorky, Elves are Cool," "Burning a Phone," "Forget Not the Humble Timing Attack," and "A Sermon on Hacker Privilege." Twenty-four full-color pages by Ange Albertini illustrate many of the clever tricks described in the text.
We Are Anonymous: Inside the Hacker World of LulzSec, Anonymous, and the Global Cyber Insurgency
Parmy Olson - 2012
WE ARE ANONYMOUS is the first full account of how a loosely assembled group of hackers scattered across the globe formed a new kind of insurgency, seized headlines, and tortured the feds-and the ultimate betrayal that would eventually bring them down. Parmy Olson goes behind the headlines and into the world of Anonymous and LulzSec with unprecedented access, drawing upon hundreds of conversations with the hackers themselves, including exclusive interviews with all six core members of LulzSec. In late 2010, thousands of hacktivists joined a mass digital assault on the websites of VISA, MasterCard, and PayPal to protest their treatment of WikiLeaks. Other targets were wide ranging-the websites of corporations from Sony Entertainment and Fox to the Vatican and the Church of Scientology were hacked, defaced, and embarrassed-and the message was that no one was safe. Thousands of user accounts from pornography websites were released, exposing government employees and military personnel.Although some attacks were perpetrated by masses of users who were rallied on the message boards of 4Chan, many others were masterminded by a small, tight-knit group of hackers who formed a splinter group of Anonymous called LulzSec. The legend of Anonymous and LulzSec grew in the wake of each ambitious hack. But how were they penetrating intricate corporate security systems? Were they anarchists or activists? Teams or lone wolves? A cabal of skilled hackers or a disorganized bunch of kids?WE ARE ANONYMOUS delves deep into the internet's underbelly to tell the incredible full story of the global cyber insurgency movement, and its implications for the future of computer security.
Head First Statistics
Dawn Griffiths - 2008
Whether you're a student, a professional, or just curious about statistical analysis, Head First's brain-friendly formula helps you get a firm grasp of statistics so you can understand key points and actually use them. Learn to present data visually with charts and plots; discover the difference between taking the average with mean, median, and mode, and why it's important; learn how to calculate probability and expectation; and much more.Head First Statistics is ideal for high school and college students taking statistics and satisfies the requirements for passing the College Board's Advanced Placement (AP) Statistics Exam. With this book, you'll:Study the full range of topics covered in first-year statistics Tackle tough statistical concepts using Head First's dynamic, visually rich format proven to stimulate learning and help you retain knowledge Explore real-world scenarios, ranging from casino gambling to prescription drug testing, to bring statistical principles to life Discover how to measure spread, calculate odds through probability, and understand the normal, binomial, geometric, and Poisson distributions Conduct sampling, use correlation and regression, do hypothesis testing, perform chi square analysis, and moreBefore you know it, you'll not only have mastered statistics, you'll also see how they work in the real world. Head First Statistics will help you pass your statistics course, and give you a firm understanding of the subject so you can apply the knowledge throughout your life.
Creating a Data-Driven Organization: Practical Advice from the Trenches
Carl Anderson - 2015
This practical book shows you how true data-drivenness involves processes that require genuine buy-in across your company, from analysts and management to the C-Suite and the board.Through interviews and examples from data scientists and analytics leaders in a variety of industries, author Carl Anderson explains the analytics value chain you need to adopt when building predictive business models—from data collection and analysis to the insights and leadership that drive concrete actions. You’ll learn what works and what doesn’t, and why creating a data-driven culture throughout your organization is essential.
Start from the bottom up: learn how to collect the right data the right way
Hire analysts with the right skills, and organize them into teams
Examine statistical and visualization tools, and fact-based story-telling methods
Collect and analyze data while respecting privacy and ethics
Understand how analysts and their managers can help spur a data-driven culture
Learn the importance of data leadership and C-level positions such as chief data officer and chief analytics officer
Learning JavaScript
Shelley Powers - 2006
JavaScript lets designers add sparkle and life to web pages, while more complex JavaScript has led to the rise of Ajax -- the latest rage in web development that allows developers to create powerful and more responsive applications in the browser window."Learning JavaScript" introduces this powerful scripting language to web designers and developers in easy-to-understand terms. Using the latest examples from modern browser development practices, this book teaches you how to integrate the language with the browser environment, and how to practice proper coding techniques for standards-compliant web sites. By the end of the book, you'll be able to use all of the JavaScript language and many of the object models provided by web browsers, and you'll even be able to create a basic Ajax application.
Silence on the Wire: A Field Guide to Passive Reconnaissance and Indirect Attacks
Michal Zalewski - 2005
Silence on the Wire uncovers these silent attacks so that system administrators can defend against them, as well as better understand and monitor their systems.Silence on the Wire dissects several unique and fascinating security and privacy problems associated with the technologies and protocols used in everyday computing, and shows how to use this knowledge to learn more about others or to better defend systems. By taking an indepth look at modern computing, from hardware on up, the book helps the system administrator to better understand security issues, and to approach networking from a new, more creative perspective. The sys admin can apply this knowledge to network monitoring, policy enforcement, evidence analysis, IDS, honeypots, firewalls, and forensics.
Designing Event-Driven Systems
Ben Stopford - 2018
Many of these patterns are successful by themselves, but as this practical ebook demonstrates, they provide a more holistic and compelling approach when applied together.Author Ben Stopford explains how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems.* Learn why streaming beats request-response based architectures in complex, contemporary use cases* Understand why replayable logs such as Kafka provide a backbone for both service communication and shared datasets* Explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches* Apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as “inside out databases” and “event streams as a source of truth”* Build service ecosystems that blend event-driven and request-driven interfaces using a replayable log and Kafka's Streams API* Scale beyond individual teams into larger, department- and company-sized architectures, using event streams as a source of truth
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Eric Siegel - 2013
Rather than a "how to" for hands-on techies, the book entices lay-readers and experts alike by covering new case studies and the latest state-of-the-art techniques.You have been predicted — by companies, governments, law enforcement, hospitals, and universities. Their computers say, "I knew you were going to do that!" These institutions are seizing upon the power to predict whether you're going to click, buy, lie, or die.Why? For good reason: predicting human behavior combats financial risk, fortifies healthcare, conquers spam, toughens crime fighting, and boosts sales.How? Prediction is powered by the world's most potent, booming unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn.Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future — lifting a bit of the fog off our hazy view of tomorrow — means pay dirt.In this rich, entertaining primer, former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction: -What type of mortgage risk Chase Bank predicted before the recession. -Predicting which people will drop out of school, cancel a subscription, or get divorced before they are even aware of it themselves. -Why early retirement decreases life expectancy and vegetarians miss fewer flights. -Five reasons why organizations predict death, including one health insurance company. -How U.S. Bank, European wireless carrier Telenor, and Obama's 2012 campaign calculated the way to most strongly influence each individual. -How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy! -How companies ascertain untold, private truths — how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job. -How judges and parole boards rely on crime-predicting computers to decide who stays in prison and who goes free. -What's predicted by the BBC, Citibank, ConEd, Facebook, Ford, Google, IBM, the IRS, Match.com, MTV, Netflix, Pandora, PayPal, Pfizer, and Wikipedia. A truly omnipresent science, predictive analytics affects everyone, every day. Although largely unseen, it drives millions of decisions, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate.Predictive analytics transcends human perception. This book's final chapter answers the riddle: What often happens to you that cannot be witnessed, and that you can't even be sure has happened afterward — but that can be predicted in advance?Whether you are a consumer of it — or consumed by it — get a handle on the power of Predictive Analytics.
Hacking: The Art of Exploitation
Jon Erickson - 2003
This book explains the technical aspects of hacking, including stack based overflows, heap based overflows, string exploits, return-into-libc, shellcode, and cryptographic attacks on 802.11b.
Designing Data-Intensive Applications
Martin Kleppmann - 2015
Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures