Find a book to read

Book picks similar to
Seeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman

devops

sre

tech

Prometheus: Up & Running: Infrastructure and Application Performance Monitoring

devops

tech

computing

Brian Brazil - 2018

Get up to speed with Prometheus, the metrics-based monitoring system used by tens of thousands of organizations in production.

This practical guide provides application developers, sysadmins, and DevOps practitioners with a hands-on introduction to the most important aspects of Prometheus, including dashboarding and alerting, direct code instrumentation, and metric collection from third-party systems with exporters.This open source system has gained popularity over the past few years for good reason. With its simple yet powerful data model and query language, Prometheus does one thing, and it does it well. Author and Prometheus developer Brian Brazil guides you through Prometheus setup, the Node exporter, and the Alertmanager, then demonstrates how to use them for application and infrastructure monitoring.Know where and how much to apply instrumentation to your application codeIdentify metrics with labels using unique key-value pairsGet an introduction to Grafana, a popular tool for building dashboardsLearn how to use the Node Exporter to monitor your infrastructureUse service discovery to provide different views of your machines and servicesUse Prometheus with Kubernetes and examine exporters you can use with containersConvert data from other monitoring systems into the Prometheus format

Terraform: Up & Running: Writing Infrastructure as Code

Yevgeniy Brikman - 2019

Terraform has become a key player in the DevOps world for defining, launching, and managing infrastructure as code (IaC) across a variety of cloud and virtualization platforms, including AWS, Google Cloud, Azure, and more.

This hands-on second edition, expanded and thoroughly updated for Terraform version 0.12 and beyond, shows you the fastest way to get up and running.Gruntwork cofounder Yevgeniy (Jim) Brikman walks you through code examples that demonstrate Terraform's simple, declarative programming language for deploying and managing infrastructure with a few commands. Veteran sysadmins, DevOps engineers, and novice developers will quickly go from Terraform basics to running a full stack that can support a massive amount of traffic and a large team of developers.Explore changes from Terraform 0.9 through 0.12, including backends, workspaces, and first-class expressionsLearn how to write production-grade Terraform modulesDive into manual and automated testing for Terraform codeCompare Terraform to Chef, Puppet, Ansible, CloudFormation, and Salt StackDeploy server clusters, load balancers, and databasesUse Terraform to manage the state of your infrastructureCreate reusable infrastructure with Terraform modulesUse advanced Terraform syntax to achieve zero-downtime deployment

Practical Monitoring

Mike Julian - 2017

Do you have a nagging feeling that your monitoring could be improved, but you just aren t sure how? This is the book for you.

"Monitoring Monitoring" explains what makes your monitoring less than stellar, and provides a practical approach to designing and implementing a monitoring strategy, from the application down to the hardware in the datacenter and everything in between.In the world of technical operations, monitoring is core to everything you do. In today s changing landscape of microservices, cloud infrastructure, and more, monitoring is experiencing a new surge of growth, bringing along new methodologies, new ways of thinking, and new tools.Complete with a primer on statistics and a monitoring vocabulary, this book helps you identify the main areas you need to monitor and shows you how to approach them. It s ideal for operations engineers, system administrators, system and software engineers, site reliability engineers, network engineers, and other operations professionals."

Infrastructure as Code: Managing Servers in the Cloud

Kief Morris - 2015

Virtualization, cloud, containers, server automation, and software-defined networking are meant to simplify IT operations.

But many organizations adopting these technologies have found that it only leads to a faster-growing sprawl of unmanageable systems. This is where infrastructure as code can help. With this practical guide, author Kief Morris of ThoughtWorks shows you how to effectively use principles, practices, and patterns pioneered through the DevOps movement to manage cloud age infrastructure.Ideal for system administrators, infrastructure engineers, team leads, and architects, this book demonstrates various tools, techniques, and patterns you can use to implement infrastructure as code. In three parts, you'll learn about the platforms and tooling involved in creating and configuring infrastructure elements, patterns for using these tools, and practices for making infrastructure as code work in your environment.Examine the pitfalls that organizations fall into when adopting the new generation of infrastructure technologiesUnderstand the capabilities and service models of dynamic infrastructure platformsLearn about tools that provide, provision, and configure core infrastructure resourcesExplore services and tools for managing a dynamic infrastructureLearn specific patterns and practices for provisioning servers, building server templates, and updating running servers

Site Reliability Engineering: How Google Runs Production Systems

Betsy Beyer - 2016

The overwhelming majority of a software system's lifespan is spent in use, not in design or implementation.

So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?In this collection of essays and articles, key members of Google's Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You'll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient--lessons directly applicable to your organization.This book is divided into four sections: Introduction--Learn what site reliability engineering is and why it differs from conventional IT industry practicesPrinciples--Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)Practices--Understand the theory and practice of an SRE's day-to-day work: building and operating large distributed computing systemsManagement--Explore Google's best practices for training, communication, and meetings that your organization can use

Effective Devops: Building a Culture of Collaboration, Affinity, and Tooling at Scale

devops

tech

work

Jennifer Davis - 2015

This practical guide addresses technical, cultural, and managerial challenges of implementing and maintaining a DevOps culture by describing failures and successes.

Authors Katherine Daniels and Jennifer Davis provide with actionable strategies you can use to engineer sustainable changes in your environment regardless of your level within your organization.

Kubernetes: Up & Running

Kelsey Hightower - 2016

Legend has it that Google deploys over two billion application containers a week.

How's that possible? Google revealed the secret through a project called Kubernetes, an open source cluster orchestrator (based on its internal Borg system) that radically simplifies the task of building, deploying, and maintaining scalable distributed systems in the cloud. This practical guide shows you how Kubernetes and container technology can help you achieve new levels of velocity, agility, reliability, and efficiency.Authors Kelsey Hightower, Brendan Burns, and Joe Beda--who've worked on Kubernetes at Google--explain how this system fits into the lifecycle of a distributed application. You will learn how to use tools and APIs to automate scalable distributed systems, whether it is for online services, machine-learning applications, or a cluster of Raspberry Pi computers.Explore the distributed system challenges that Kubernetes addressesDive into containerized application development, using containers such as DockerCreate and run containers on Kubernetes, using Docker's Image format and container runtimeExplore specialized objects essential for running applications in productionReliably roll out new software versions without downtime or errorsGet examples of how to develop and deploy real-world applications in Kubernetes

Cloud Native Infrastructure: Patterns for Scalable Infrastructure and Applications in a Dynamic Environment

devops

tech

technical

Justin Garrison - 2017

If you're considering the use of cloud native applications and microservices, you also need an infrastructure with the same elasticity and scalability of the applications they're running.

This practical guide shows you how to design and maintain infrastructure capable of managing the full lifecycle of these implementations.Engineers Justin Garrison (Walt Disney Animation Studios) and Kris Nova (Dies, Inc.) reveal hard-earned lessons on architecting infrastructure for massive scale and best in class monitoring, alerting, and troubleshooting. The authors focus on Cloud Native Computing Foundation projects and explain where each is crucial to managing modern applications.Understand the fundamentals of cloud native application design, and how it differs from traditional application designLearn how cloud native infrastructure is different from traditional infrastructureManage application lifecycles running on cloud native infrastructure, using Kubernetes for application deployment, scaling, and upgradesMonitor cloud native infrastructure and applications, using fluentd for logging and prometheus + graphana for visualizing dataDebug running applications and learn how to trace a distributed application and dig deep into a running system with OpenTracing

The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations

Gene Kim - 2015

Increase profitability, elevate work culture, and exceed productivity goals through DevOps practices.More than ever, the effective management of technology is critical for business competitiveness.

For decades, technology leaders have struggled to balance agility, reliability, and security. The consequences of failure have never been greater whether it's the healthcare.gov debacle, cardholder data breaches, or missing the boat with Big Data in the cloud.And yet, high performers using DevOps principles, such as Google, Amazon, Facebook, Etsy, and Netflix, are routinely and reliably deploying code into production hundreds, or even thousands, of times per day.Following in the footsteps of The Phoenix Project, The DevOps Handbook shows leaders how to replicate these incredible outcomes, by showing how to integrate Product Management, Development, QA, IT Operations, and Information Security to elevate your company and win in the marketplace."Table of contentsPrefaceSpreading the Aha! MomentIntroductionPART I: THE THREE WAYS1. Agile, continuous delivery and the three ways2. The First Way: The Principles of Flow3. The Second Way: The Principle of Feedback4. The Third Way: The Principles of Continual LearningPART II: WHERE TO START5. Selecting which value stream to start with6. Understanding the work in our value stream…7. How to design our organization and architecture8. How to get great outcomes by integrating operations into the daily work for developmentPART III: THE FIRST WAY: THE TECHNICAL PRACTICES OF FLOW9. Create the foundations of our deployment pipeline10. Enable fast and reliable automated testing11. Enable and practice continuous integration12. Automate and enable low-risk releases13. Architect for low-risk releasesPART IV: THE SECOND WAY: THE TECHNICAL PRACTICES OF FEEDBACK14*. Create telemetry to enable seeing abd solving problems15. Analyze telemetry to better anticipate problems16. Enable feedbackso development and operation can safely deploy code17. Integrate hypothesis-driven development and A/B testing into our daily work18. Create review and coordination processes to increase quality of our current workPART V: THE THRID WAY: THE TECHNICAL PRACTICES OF CONTINUAL LEARNING19. Enable and inject learning into daily work20. Convert local discoveries into global improvements21. Reserve time to create organizational learning22. Information security as everyone’s job, every day23. Protecting the deployment pipelinePART VI: CONCLUSIONA call to actionConclusion to the DevOps HandbookAPPENDICES1. The convergence of Devops2. The theory of constraints and core chronic conflicts3. Tabular form of downward spiral4. The dangers of handoffs and queues5. Myths of industrial safety6. The Toyota Andon Cord7. COTS Software8. Post-mortem meetings9. The Simian Army10. Transparent uptimeAdditional ResourcesEndnotes

Building Microservices: Designing Fine-Grained Systems

Sam Newman - 2014

Distributed systems have become more fine-grained in the past 10 years, shifting from code-heavy monolithic applications to smaller, self-contained microservices.

But developing these systems brings its own set of headaches. With lots of examples and practical advice, this book takes a holistic view of the topics that system architects and administrators must consider when building, managing, and evolving microservice architectures.Microservice technologies are moving quickly. Author Sam Newman provides you with a firm grounding in the concepts while diving into current solutions for modeling, integrating, testing, deploying, and monitoring your own autonomous services. You'll follow a fictional company throughout the book to learn how building a microservice architecture affects a single domain.Discover how microservices allow you to align your system design with your organization's goalsLearn options for integrating a service with the rest of your systemTake an incremental approach when splitting monolithic codebasesDeploy individual microservices through continuous integrationExamine the complexities of testing and monitoring distributed servicesManage security with user-to-service and service-to-service modelsUnderstand the challenges of scaling microservice architectures

Jenkins: The Definitive Guide

John Ferguson Smart - 2011

Streamline software development with Jenkins, the popular Java-based open source tool that has revolutionized the way teams think about Continuous Integration (CI).

This complete guide shows you how to automate your build, integration, release, and deployment processes with Jenkins—and demonstrates how CI can save you time, money, and many headaches. Ideal for developers, software architects, and project managers, Jenkins: The Definitive Guide is both a CI tutorial and a comprehensive Jenkins reference. Through its wealth of best practices and real-world tips, you'll discover how easy it is to set up a CI service with Jenkins. Learn how to install, configure, and secure your Jenkins server Organize and monitor general-purpose build jobs Integrate automated tests to verify builds, and set up code quality reporting Establish effective team notification strategies and techniques Configure build pipelines, parameterized jobs, matrix builds, and other advanced jobs Manage a farm of Jenkins servers to run distributed builds Implement automated deployment and continuous delivery

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale

Neha Narkhede - 2017

Every enterprise application creates data, whether it� s log messages, metrics, user activity, outgoing messages, or something else.

And how to move all of this data becomes nearly as important as the data itself. If you� re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you� ll learn Kafka� s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.Understand publish-subscribe messaging and how it fits in the big data ecosystem.Explore Kafka producers and consumers for writing and reading messagesUnderstand Kafka patterns and use-case requirements to ensure reliable data deliveryGet best practices for building data pipelines and applications with KafkaManage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasksLearn the most critical metrics among Kafka� s operational measurementsExplore how Kafka� s stream delivery capabilities make it a perfect source for stream processing systems

Dynamic Reteaming: The Art and Wisdom of Changing Teams

Heidi Helfand - 2019

Dynamic Reteaming shares real stories of how successful software companies have thrived through changing their teams as opposed to keeping them the same.

Team change will happen whether we like it or not. People will come and go from our teams. Our companies might double in size or even get acquired. We can catalyze team change to reduce the risk of attrition, learning and career stagnation and the development of knowledge silos. Dynamic Reteaming describes practices for effective reteaming as well as antipatterns. In this book you'll learn how to integrate new people into an existing team, how to deal with the loss of team members, when to split a team, how to isolate teams for focused innovation, how to rotate team members for knowledge sharing, how to break through organizational stagnation and much more. Learn to apply the five team change patterns: Isolation, One by One, Grow and Split, Merging and Switching. Get practical tips and tricks for managing team change.

Streaming Systems

Tyler Akidau - 2018

Streaming data is a big deal in big data these days.

As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.Expanded from Tyler Akidau's popular blog posts Streaming 101 and Streaming 102, this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You'll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.You'll explore:How streaming and batch data processing patterns compareThe core principles and concepts behind robust out-of-order data processingHow watermarks track progress and completeness in infinite datasetsHow exactly-once data processing techniques ensure correctnessHow the concepts of streams and tables form the foundations of both batch and streaming data processingThe practical motivations behind a powerful persistent state mechanism, driven by a real-world exampleHow time-varying relations provide a link between stream processing and the world of SQL and relational algebra

Beyond the Twelve-Factor App Exploring the DNA of Highly Scalable, Resilient Cloud Applications

Kevin Hoffman - 2016

Understanding how to design systems to run in the cloud has never been more important than it is today.

Cloud computing is rapidly transitioning from a niche technology embraced by startups and tech-forward companies to the foundation upon which enterprise systems build their future. In order to compete in today’s marketplace, organizations large and small are embracing cloud architectures and practices.

Book picks similar toSeeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman

Prometheus: Up & Running: Infrastructure and Application Performance Monitoring

Terraform: Up & Running: Writing Infrastructure as Code

Practical Monitoring

Infrastructure as Code: Managing Servers in the Cloud

Site Reliability Engineering: How Google Runs Production Systems

Effective Devops: Building a Culture of Collaboration, Affinity, and Tooling at Scale

Kubernetes: Up & Running

Cloud Native Infrastructure: Patterns for Scalable Infrastructure and Applications in a Dynamic Environment

The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations

Building Microservices: Designing Fine-Grained Systems

Jenkins: The Definitive Guide

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale

Dynamic Reteaming: The Art and Wisdom of Changing Teams

Streaming Systems

Beyond the Twelve-Factor App Exploring the DNA of Highly Scalable, Resilient Cloud Applications

Book picks similar to
Seeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman