You're reading from The DevOps 2.2 Toolkit Self-Sufficient Docker Clusters

Product type Paperback

Published in Mar 2018

Publisher Packt

ISBN-13 9781788991278

Length 360 pages

Edition 1st Edition

Tools

Docker

Concepts

DevOps

Author (1):

Viktor Farcic

Setting up the objectives

We need to define the scope of what we want to accomplish through instrumentation. We'll keep it small by limiting ourselves to a single goal: we'll scale services if their response times are above an upper limit and de-scale them if they're below a lower limit. Any other alert will result in a Slack notification. That does not mean that Slack notifications should exist forever. Instead, they should be treated as a temporary solution until we find a way to translate manual corrective actions into automated responses performed by the system.
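The scaling rule above can be sketched as a small decision function. This is only an illustration of the logic, not the mechanism used later in the book: the threshold values, the replica bounds, and the names `ScalingPolicy` and `desired_replicas` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy; every value here is illustrative."""
    lower_limit_s: float = 0.025  # de-scale when response time is below this
    upper_limit_s: float = 0.1    # scale when response time is above this
    min_replicas: int = 1         # assumed floor, not stated in the text
    max_replicas: int = 10        # assumed ceiling, not stated in the text


def desired_replicas(current: int, response_time_s: float,
                     policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the replica count the rule asks for, one step at a time."""
    if response_time_s > policy.upper_limit_s:
        # Too slow: add a replica, but never exceed the ceiling.
        return min(current + 1, policy.max_replicas)
    if response_time_s < policy.lower_limit_s:
        # Faster than needed: remove a replica, but keep the floor.
        return max(current - 1, policy.min_replicas)
    # Within both limits: leave the service as it is.
    return current
```

For instance, `desired_replicas(3, 0.5)` asks for four replicas, while `desired_replicas(3, 0.05)` leaves the count at three. The replica bounds are an addition for safety; without them, a persistent alert could scale a service indefinitely in either direction.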

A good example of alerts that are often handled manually is responses with errors (status codes 500 and above). We'll send alerts whenever they reach a threshold over a specified period. They will result in Slack notifications that become pending tasks for humans. An internal rule should be to fix the problem first, evaluate why it happened, and then write a script that repeats the same set of steps. With such a script, we...