Chapter 15. Self-Adaptation Applied to Infrastructure
Our goal is within reach. We adopted schedulers (Docker Swarm in this case) that provide self-healing applied to services. We saw how Docker For AWS accomplishes a similar goal but on the infrastructure level. We used Prometheus, Alertmanager, and Jenkins to build a system that automatically adapts services to ever-changing conditions. The metrics we're storing in Prometheus are a combination of those gathered through exporters and those we added to our services through instrumentation. The only thing we're missing is self-adaptation applied to infrastructure. If we manage to build it, we'll close the circle and witness a self-sufficient system capable of running without (almost) any human intervention.
The logic behind self-adaptation applied to infrastructure is not much different from the one we used with services. We need metrics, alerts, and scripts that will adapt cluster capacity whenever conditions change.
We already have all the...