30-second summary of today's CloudPro for you:
Running your app in Kubernetes doesn’t automatically make it highly available. This walkthrough shows how ReplicaSets handle pod failures, node loss, and unhealthy containers, and what really happens behind the scenes when things go wrong. Adapted from The Kubernetes Bible.
> 8-minute read
> Hands-on commands included
> Bonus at the end for readers like you
Cheers,
Editor-in-Chief
Let’s say you’ve got a stateless NGINX app deployed in a multi-node Kubernetes cluster using a ReplicaSet. You think you’re covered because there are 4 replicas. But then you delete a pod by mistake, lose an entire node, or end up with a container that’s running but no longer healthy.
In all three cases, you’re expecting automatic recovery. But it’s not magic. It’s the ReplicaSet (and sometimes liveness probes) doing the heavy lifting.
Let’s walk through all three failure modes and see what Kubernetes does.
This scenario demonstrates how a ReplicaSet restores deleted pods to maintain the desired number of replicas.
Here's a step-by-step walkthrough:
1. Define the ReplicaSet manifest: Save the following YAML as nginx-replicaset-example.yaml:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicaset-example
  namespace: rs-ns
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
      environment: test
  template:
    metadata:
      labels:
        app: nginx
        environment: test
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        ports:
        - containerPort: 80
2. Create the namespace: This ensures all your resources are scoped properly.
kubectl create -f ns-rs.yaml
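The walkthrough doesn't show the contents of ns-rs.yaml. If you don't already have it, a minimal manifest for the rs-ns namespace referenced by the ReplicaSet above would look like this (save it as ns-rs.yaml):
apiVersion: v1
kind: Namespace
metadata:
  name: rs-ns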
3. Deploy the ReplicaSet: The manifest defines a ReplicaSet with 4 NGINX pods.
kubectl apply -f nginx-replicaset-example.yaml
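Before simulating any failures, it's worth confirming the ReplicaSet reports all four replicas as ready (an optional sanity check, not part of the original steps):
kubectl get rs nginx-replicaset-example -n rs-ns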
4. Delete a pod manually: Simulate a pod failure by deleting one of the running pods.
kubectl delete pod <pod-name> -n rs-ns
5. Verify that the ReplicaSet restores the pod: The controller detects the change and automatically spins up a new pod to maintain the desired count.
kubectl get pods -n rs-ns
kubectl describe rs/nginx-replicaset-example -n rs-ns
Within seconds, the ReplicaSet controller notices the missing pod and recreates it to meet the declared replica count.
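If you repeat the experiment, you can keep a watch running in a second terminal to see the replacement pod appear in real time (an optional extra, not part of the original steps):
kubectl get pods -n rs-ns -w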
Takeaway: ReplicaSets automatically maintain the desired number of pods, making recovery from manual deletions fast and hands-free.
This scenario demonstrates how ReplicaSets maintain high availability when a node goes down by rescheduling pods onto available nodes:
Here's a step-by-step walkthrough:
1. Expose your app with a Service:
kubectl apply -f nginx-service.yaml
This creates a service to access your app across pods.
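The contents of nginx-service.yaml aren't shown in the walkthrough. A minimal Service that selects the pods created above (matching their app and environment labels, exposing port 80) could look like this:
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: rs-ns
spec:
  selector:
    app: nginx
    environment: test
  ports:
  - port: 80
    targetPort: 80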
2. Forward traffic from your local machine to the Kubernetes Service:
kubectl port-forward svc/nginx-service 8080:80 -n rs-ns
curl localhost:8080
Run the curl in a second terminal, since port-forward keeps running in the foreground. This confirms your Service is working and traffic is flowing to the pods.
3. Check where the pods are currently running:
kubectl get pods -n rs-ns -o wide
This shows which node each pod is scheduled on.
4. Simulate node failure by cordoning and draining the node:
kubectl cordon kind-worker
Prevents new pods from being scheduled on this node.
kubectl drain kind-worker --ignore-daemonsets
Evicts all running pods from the node while ignoring daemonsets.
kubectl delete node kind-worker
Removes the node from the cluster to simulate a full node failure.
Within moments, the ReplicaSet detects the missing pods and spins up new ones on the remaining healthy nodes. Your Service automatically reroutes traffic to these new pods.
5. Verify that everything is still working:
kubectl get pods -n rs-ns -o wide
curl localhost:8080
You’ll see that traffic still flows and the app remains accessible without downtime. (If the pod behind your port-forward happened to be evicted, re-run the kubectl port-forward command: the forward is pinned to a single pod, while the Service itself keeps routing to the healthy replicas.)
Takeaway:
The ReplicaSet ensures that the desired number of pod replicas is always maintained, even when a node goes offline. It creates replacement pods automatically, and the scheduler places them on the remaining nodes, as long as there's sufficient capacity in your cluster.
Let’s see how Kubernetes handles an unhealthy container using liveness probes.
Here's a step-by-step walkthrough:
1. Add the following liveness probe to your ReplicaSet pod spec. It instructs the kubelet to start checking container health 2 seconds after the container starts and to repeat the check every 2 seconds:
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 2
  periodSeconds: 2
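For context, here's roughly how that probe slots into the pod template of the earlier ReplicaSet manifest (a sketch; judging by the cleanup commands below, the book names this variant nginx-replicaset-livenessprobe-example):
    spec:
      containers:
      - name: nginx
        image: nginx:1.17
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 2
          periodSeconds: 2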
2. Simulate an unhealthy container: Remove the default NGINX index page so the probe's HTTP check starts failing.
kubectl exec -it <pod-name> -n rs-ns -- rm /usr/share/nginx/html/index.html
3. Inspect the pod events:
kubectl describe pod <pod-name> -n rs-ns
You’ll see Liveness probe failed events, followed by automatic container restarts.
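You can also watch the restart count climb; the RESTARTS column increments each time the kubelet restarts the failed container:
kubectl get pod <pod-name> -n rs-ns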
Takeaway:
The kubelet, not the ReplicaSet, manages container health. But when used with ReplicaSets, probes help create a resilient system that self-heals when a container goes bad.
You can delete the ReplicaSet and its pods:
kubectl delete rs/nginx-replicaset-livenessprobe-example
Or just delete the controller, leaving pods untouched:
kubectl delete rs/nginx-replicaset-livenessprobe-example --cascade=orphan
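After the orphan delete, the pods keep running but no longer have an owner, so nothing will recreate them if they fail or are deleted. You can confirm they're still there with:
kubectl get pods -n rs-ns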
👋 This walkthrough was adapted from just one chapter of The Kubernetes Bible, Second Edition: a 720-page, hands-on guide to mastering Kubernetes across cloud and on-prem environments.
If you’re tackling real production workloads or preparing for certs like CKA/CKAD/CKS, the book dives deeper into everything from ReplicaSets and Deployments to StatefulSets, autoscaling, Helm, traffic routing, and advanced security practices.
For the next 72 hours, CloudPro readers get 30% off the ebook and 20% off print.
Sponsored:
Curious how AI is changing secure coding? Join Sonya Moisset from Snyk on Aug 28 to explore real-world strategies for protecting your AI-driven SDLC and earn a CPE credit while you're at it. Register now.
Want faster builds and better mobile apps? Learn proven CI/CD tips from Bitrise and Embrace experts to speed up development and ship higher-quality apps. Register here.
📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply to this email.
Thanks for reading and have a great day!