Book picks similar to
Data Pipelines with Apache Airflow by Bas P. Harenslak
data
data-engineering
data-eng
software-engineering
Problem Solving with Algorithms and Data Structures Using Python
Bradley N. Miller - 2005
It is also about Python. However, there is much more. The study of algorithms and data structures is central to understanding what computer science is all about. Learning computer science is not unlike learning any other type of difficult subject matter. The only way to be successful is through deliberate and incremental exposure to the fundamental ideas. A beginning computer scientist needs practice so that there is a thorough understanding before continuing on to the more complex parts of the curriculum. In addition, a beginner needs to be given the opportunity to be successful and gain confidence. This textbook is designed to serve as a text for a first course on data structures and algorithms, typically taught as the second course in the computer science curriculum. Even though the second course is considered more advanced than the first course, this book assumes you are beginners at this level. You may still be struggling with some of the basic ideas and skills from a first computer science course and yet be ready to further explore the discipline and continue to practice problem solving. We cover abstract data types and data structures, writing algorithms, and solving problems. We look at a number of data structures and solve classic problems that arise. The tools and techniques that you learn here will be applied over and over as you continue your study of computer science.
Learning Perl
Randal L. Schwartz - 1993
Written by three prominent members of the Perl community who each have several years of experience teaching Perl around the world, this edition has been updated to account for all the recent changes to the language up to Perl 5.8.Perl is the language for people who want to get work done. It started as a tool for Unix system administrators who needed something powerful for small tasks. Since then, Perl has blossomed into a full-featured programming language used for web programming, database manipulation, XML processing, and system administration--on practically all platforms--while remaining the favorite tool for the small daily tasks it was designed for. You might start using Perl because you need it, but you'll continue to use it because you love it.Informed by their years of success at teaching Perl as consultants, the authors have re-engineered the Llama to better match the pace and scope appropriate for readers getting started with Perl, while retaining the detailed discussion, thorough examples, and eclectic wit for which the Llama is famous.The book includes new exercises and solutions so you can practice what you've learned while it's still fresh in your mind. Here are just some of the topics covered:Perl variable typessubroutinesfile operationsregular expressionstext processingstrings and sortingprocess managementusing third party modulesIf you ask Perl programmers today what book they relied on most when they were learning Perl, you'll find that an overwhelming majority will point to the Llama. With good reason. Other books may teach you to program in Perl, but this book will turn you into a Perl programmer.
Introduction to Computation and Programming Using Python
John V. Guttag - 2013
It provides students with skills that will enable them to make productive use of computational techniques, including some of the tools and techniques of "data science" for using computation to model and interpret data. The book is based on an MIT course (which became the most popular course offered through MIT's OpenCourseWare) and was developed for use not only in a conventional classroom but in in a massive open online course (or MOOC) offered by the pioneering MIT--Harvard collaboration edX.Students are introduced to Python and the basics of programming in the context of such computational concepts and techniques as exhaustive enumeration, bisection search, and efficient approximation algorithms. The book does not require knowledge of mathematics beyond high school algebra, but does assume that readers are comfortable with rigorous thinking and not intimidated by mathematical concepts. Although it covers such traditional topics as computational complexity and simple algorithms, the book focuses on a wide range of topics not found in most introductory texts, including information visualization, simulations to model randomness, computational techniques to understand data, and statistical techniques that inform (and misinform) as well as two related but relatively advanced topics: optimization problems and dynamic programming.Introduction to Computation and Programming Using Python can serve as a stepping-stone to more advanced computer science courses, or as a basic grounding in computational problem solving for students in other disciplines.
Confident Data Skills: Master the Fundamentals of Working with Data and Supercharge Your Career
Kirill Eremenko - 2018
From entertainment to politics, from technology to advertising and from science to the business world, understanding and using data is now one of the most transferable and transferable skills out there. Learning how to work with data may seem intimidating or difficult but with
Confident Data Skills
you will be able to master the fundamentals and supercharge your professional abilities. This essential book covers data mining, preparing data, analysing data, communicating data, financial modelling, visualizing insights and presenting data through film making and dynamic simulations.In-depth international case studies from a wide range of organizations, including Netflix, LinkedIn, Goodreads, Deep Blue, Alpha Go and Mike's Hard Lemonade Co. show successful data techniques in practice and inspire you to turn knowledge into innovation.
Confident Data Skills
also provides insightful guidance on how you can use data skills to enhance your employability and improve how your industry or company works through your data skills. Expert author and instructor, Kirill Eremenko, is committed to making the complex simple and inspiring you to have the confidence to develop an understanding, adeptness and love of data.
Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy
George Gilder - 2018
Gilder says or writes is ever delivered at anything less than the fullest philosophical decibel... Mr. Gilder sounds less like a tech guru than a poet, and his words tumble out in a romantic cascade." “Google’s algorithms assume the world’s future is nothing more than the next moment in a random process. George Gilder shows how deep this assumption goes, what motivates people to make it, and why it’s wrong: the future depends on human action.” — Peter Thiel, founder of PayPal and Palantir Technologies and author of Zero to One: Notes on Startups, or How to Build the Future The Age of Google, built on big data and machine intelligence, has been an awesome era. But it’s coming to an end. In Life after Google, George Gilder—the peerless visionary of technology and culture—explains why Silicon Valley is suffering a nervous breakdown and what to expect as the post-Google age dawns. Google’s astonishing ability to “search and sort” attracts the entire world to its search engine and countless other goodies—videos, maps, email, calendars….And everything it offers is free, or so it seems. Instead of paying directly, users submit to advertising. The system of “aggregate and advertise” works—for a while—if you control an empire of data centers, but a market without prices strangles entrepreneurship and turns the Internet into a wasteland of ads. The crisis is not just economic. Even as advances in artificial intelligence induce delusions of omnipotence and transcendence, Silicon Valley has pretty much given up on security. The Internet firewalls supposedly protecting all those passwords and personal information have proved hopelessly permeable. The crisis cannot be solved within the current computer and network architecture. The future lies with the “cryptocosm”—the new architecture of the blockchain and its derivatives. Enabling cryptocurrencies such as bitcoin and ether, NEO and Hashgraph, it will provide the Internet a secure global payments system, ending the aggregate-and-advertise Age of Google. Silicon Valley, long dominated by a few giants, faces a “great unbundling,” which will disperse computer power and commerce and transform the economy and the Internet. Life after Google is almost here. For fans of "Wealth and Poverty," "Knowledge and Power," and "The Scandal of Money."
Data Modeling Essentials
Graeme Simsion - 1992
In order to enable students to apply the basics of data modeling to real models, the book addresses the realities of developing systems in real-world situations by assessing the merits of a variety of possible solutions as well as using language and diagramming methods that represent industry practice.This revised edition has been given significantly expanded coverage and reorganized for greater reader comprehension even as it retains its distinctive hallmarks of readability and usefulness. Beginning with the basics, the book provides a thorough grounding in theory before guiding the reader through the various stages of applied data modeling and database design. Later chapters address advanced subjects, including business rules, data warehousing, enterprise-wide modeling and data management. It includes an entirely new section discussing the development of logical and physical modeling, along with new material describing a powerful technique for model verification. It also provides an excellent resource for additional lectures and exercises.This text is the ideal reference for data modelers, data architects, database designers, DBAs, and systems analysts, as well as undergraduate and graduate-level students looking for a real-world perspective.
Machine Learning for Hackers
Drew Conway - 2012
Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyze sample datasets and write simple machine learning algorithms. "Machine Learning for Hackers" is ideal for programmers from any background, including business, government, and academic research.Develop a naive Bayesian classifier to determine if an email is spam, based only on its textUse linear regression to predict the number of page views for the top 1,000 websitesLearn optimization techniques by attempting to break a simple letter cipherCompare and contrast U.S. Senators statistically, based on their voting recordsBuild a "whom to follow" recommendation system from Twitter data
Exploring CQRS and Event Sourcing
Dominic Betts - 2012
It presents a learning journey, not definitive guidance. It describes the experiences of a development team with no prior CQRS proficiency in building, deploying (to Windows Azure), and maintaining a sample real-world, complex, enterprise system to showcase various CQRS and ES concepts, challenges, and techniques.The development team did not work in isolation; we actively sought input from industry experts and from a wide group of advisors to ensure that the guidance is both detailed and practical.The CQRS pattern and event sourcing are not mere simplistic solutions to the problems associated with large-scale, distributed systems. By providing you with both a working application and written guidance, we expect you’ll be well prepared to embark on your own CQRS journey.
Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation
Jez Humble - 2010
This groundbreaking new book sets out the principles and technical practices that enable rapid, incremental delivery of high quality, valuable new functionality to users. Through automation of the build, deployment, and testing process, and improved collaboration between developers, testers, and operations, delivery teams can get changes released in a matter of hours-- sometimes even minutes-no matter what the size of a project or the complexity of its code base. Jez Humble and David Farley begin by presenting the foundations of a rapid, reliable, low-risk delivery process. Next, they introduce the "deployment pipeline," an automated process for managing all changes, from check-in to release. Finally, they discuss the "ecosystem" needed to support continuous delivery, from infrastructure, data and configuration management to governance. The authors introduce state-of-the-art techniques, including automated infrastructure management and data migration, and the use of virtualization. For each, they review key issues, identify best practices, and demonstrate how to mitigate risks. Coverage includes - Automating all facets of building, integrating, testing, and deploying software - Implementing deployment pipelines at team and organizational levels - Improving collaboration between developers, testers, and operations - Developing features incrementally on large and distributed teams - Implementing an effective configuration management strategy - Automating acceptance testing, from analysis to implementation - Testing capacity and other non-functional requirements - Implementing continuous deployment and zero-downtime releases - Managing infrastructure, data, components and dependencies - Navigating risk management, compliance, and auditing Whether you're a developer, systems administrator, tester, or manager, this book will help your organization move from idea to release faster than ever--so you can deliver value to your business rapidly and reliably.
Learning OpenCV: Computer Vision with the OpenCV Library
Gary Bradski - 2008
Freeman, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of TechnologyLearning OpenCV puts you in the middle of the rapidly expanding field of computer vision. Written by the creators of the free open source OpenCV library, this book introduces you to computer vision and demonstrates how you can quickly build applications that enable computers to "see" and make decisions based on that data. Computer vision is everywhere-in security systems, manufacturing inspection systems, medical image analysis, Unmanned Aerial Vehicles, and more. It stitches Google maps and Google Earth together, checks the pixels on LCD screens, and makes sure the stitches in your shirt are sewn properly. OpenCV provides an easy-to-use computer vision framework and a comprehensive library with more than 500 functions that can run vision code in real time.Learning OpenCV will teach any developer or hobbyist to use the framework quickly with the help of hands-on exercises in each chapter. This book includes:A thorough introduction to OpenCV Getting input from cameras Transforming images Segmenting images and shape matching Pattern recognition, including face detection Tracking and motion in 2 and 3 dimensions 3D reconstruction from stereo vision Machine learning algorithms Getting machines to see is a challenging but entertaining goal. Whether you want to build simple or sophisticated vision applications, Learning OpenCV is the book you need to get started.
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Eric Siegel - 2013
Rather than a "how to" for hands-on techies, the book entices lay-readers and experts alike by covering new case studies and the latest state-of-the-art techniques.You have been predicted — by companies, governments, law enforcement, hospitals, and universities. Their computers say, "I knew you were going to do that!" These institutions are seizing upon the power to predict whether you're going to click, buy, lie, or die.Why? For good reason: predicting human behavior combats financial risk, fortifies healthcare, conquers spam, toughens crime fighting, and boosts sales.How? Prediction is powered by the world's most potent, booming unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn.Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future — lifting a bit of the fog off our hazy view of tomorrow — means pay dirt.In this rich, entertaining primer, former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction: -What type of mortgage risk Chase Bank predicted before the recession. -Predicting which people will drop out of school, cancel a subscription, or get divorced before they are even aware of it themselves. -Why early retirement decreases life expectancy and vegetarians miss fewer flights. -Five reasons why organizations predict death, including one health insurance company. -How U.S. Bank, European wireless carrier Telenor, and Obama's 2012 campaign calculated the way to most strongly influence each individual. -How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy! -How companies ascertain untold, private truths — how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job. -How judges and parole boards rely on crime-predicting computers to decide who stays in prison and who goes free. -What's predicted by the BBC, Citibank, ConEd, Facebook, Ford, Google, IBM, the IRS, Match.com, MTV, Netflix, Pandora, PayPal, Pfizer, and Wikipedia. A truly omnipresent science, predictive analytics affects everyone, every day. Although largely unseen, it drives millions of decisions, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate.Predictive analytics transcends human perception. This book's final chapter answers the riddle: What often happens to you that cannot be witnessed, and that you can't even be sure has happened afterward — but that can be predicted in advance?Whether you are a consumer of it — or consumed by it — get a handle on the power of Predictive Analytics.
Machine Learning: A Visual Starter Course For Beginner's
Oliver Theobald - 2017
If you have ever found yourself lost halfway through other introductory materials on this topic, this is the book for you. If you don't understand set terminology such as vectors, hyperplanes, and centroids, then this is also the book for you. This starter course isn't a picture story book but does include many visual examples that break algorithms down into a digestible and practical format. As a starter course, this book connects the dots and offers the crash course I wish I had when I first started. The kind of guide I wish had before I started taking on introductory courses that presume you’re two days away from an advanced mathematics exam. That’s why this introductory course doesn’t go further on the subject than other introductory books, but rather, goes a step back. A half-step back in order to help everyone make his or her first strides in machine learning and is an ideal study companion for the visual learner. In this step-by-step guide you will learn: - How to download free datasets - What tools and software packages you need - Data scrubbing techniques, including one-hot encoding, binning and dealing with missing data - Preparing data for analysis, including k-fold Validation - Regression analysis to create trend lines - Clustering, including k-means and k-nearest Neighbors - Naive Bayes Classifier to predict new classes - Anomaly detection and SVM algorithms to combat anomalies and outliers - The basics of Neural Networks - Bias/Variance to improve your machine learning model - Decision Trees to decode classification
Please feel welcome to join this starter course by buying a copy, or sending a free sample to your preferred device.
Peopleware: Productive Projects and Teams
Tom DeMarco - 1987
The answers aren't easy -- just incredibly successful.
RESTful Web APIs
Leonard Richardson - 2013
With this practical guide, you’ll learn what it takes to design usable REST APIs that evolve over time. By focusing on solutions that cross a variety of domains, this book shows you how to create powerful and secure applications, using the tools designed for the world’s most successful distributed computing system: the World Wide Web.You’ll explore the concepts behind REST, learn different strategies for creating hypermedia-based APIs, and then put everything together with a step-by-step guide to designing a RESTful Web API.Examine API design strategies, including the collection pattern and pure hypermediaUnderstand how hypermedia ties representations together into a coherent APIDiscover how XMDP and ALPS profile formats can help you meet the Web API "semantic challenge"Learn close to two-dozen standardized hypermedia data formatsApply best practices for using HTTP in API implementationsCreate Web APIs with the JSON-LD standard and other the Linked Data approachesUnderstand the CoAP protocol for using REST in embedded systems
Designing Event-Driven Systems
Ben Stopford - 2018
Many of these patterns are successful by themselves, but as this practical ebook demonstrates, they provide a more holistic and compelling approach when applied together.Author Ben Stopford explains how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems.* Learn why streaming beats request-response based architectures in complex, contemporary use cases* Understand why replayable logs such as Kafka provide a backbone for both service communication and shared datasets* Explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches* Apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as “inside out databases” and “event streams as a source of truth”* Build service ecosystems that blend event-driven and request-driven interfaces using a replayable log and Kafka's Streams API* Scale beyond individual teams into larger, department- and company-sized architectures, using event streams as a source of truth