Kubernetes High Availability Architecture: Complete Guide for DevOps Engineers 2025
TL;DR + Key Takeaways
What You’ll Learn & Do:
- How to design a 3-master HA control plane with etcd quorum best practices
- kubeadm HA lab: exact commands to bootstrap a self-managed HA cluster
- Checklist & emergency recovery commands for on-call incidents
Estimated Read Time: 12-15 minutes | Difficulty: Intermediate to Advanced
Introduction: Kubernetes High Availability Architecture
Picture this: It’s 3 AM, and your production Kubernetes cluster just lost its single master node due to a hardware failure. Your entire application stack is down, customers can’t access services, and you’re frantically trying to restore from backups while calculating the revenue impact of each passing minute.
Now imagine the same scenario, but with a properly configured High Availability Kubernetes cluster. One master node fails, but the other two seamlessly take over. Your applications continue running, users remain unaffected, and you can address the failed node during normal business hours.
This is why Kubernetes High Availability architecture isn’t just a nice-to-have feature—it’s absolutely critical for any production environment where downtime translates to lost revenue, damaged reputation, and sleepless nights for your engineering team.
In this comprehensive guide, we’ll dive deep into everything you need to know about designing, implementing, and maintaining highly available Kubernetes clusters that can withstand component failures while keeping your applications running smoothly.
What Does High Availability Mean in Kubernetes?
High Availability in Kubernetes refers to the architectural approach of eliminating single points of failure across both the control plane and data plane components. When properly implemented, your cluster can continue operating normally even when individual nodes, components, or entire availability zones experience failures.
Let’s illustrate this with a concrete example:
Single Master Scenario (Non-HA):

When the master node fails, your entire cluster becomes unmanageable. You can’t deploy new pods, scale applications, or perform any cluster operations, even though worker nodes might still be running existing workloads.
Multi-Master HA Scenario:

In this setup, if Master1 fails, the load balancer redirects traffic to Master2 and Master3, ensuring continuous cluster operation.
The key difference is resilience through redundancy. Instead of hoping nothing breaks, we design systems that gracefully handle failures as a normal part of operations.
Kubernetes HA Architecture Overview
Understanding the complete picture of Kubernetes HA architecture requires examining both the control plane and worker node components, along with how they interact to provide seamless failover capabilities.
Control Plane in HA Mode
The Kubernetes control plane consists of several critical components that must be made highly available:
API Server: Multiple API server instances run simultaneously behind a load balancer. Each instance is stateless, making horizontal scaling straightforward.
etcd Cluster: The distributed key-value store runs in cluster mode with multiple nodes to ensure data consistency and availability.
Controller Manager: Runs in active-passive mode with leader election. Only one instance is active at a time, but others stand ready to take over.
Scheduler: Also uses leader election for active-passive operation, ensuring only one scheduler makes pod placement decisions at any given time.
Worker Nodes in HA Setup
Worker nodes contribute to overall cluster availability through:
Multi-Zone Distribution: Spreading worker nodes across multiple availability zones prevents single zone failures from taking down your entire application stack.
Node Redundancy: Running enough worker nodes to handle the failure of several nodes without capacity issues.
Pod Replicas: Applications deployed with multiple replicas across different nodes ensure service continuity during node failures.
Complete HA Architecture Diagram

This architecture provides multiple layers of redundancy:
- Zone-level redundancy: Components spread across availability zones
- Node-level redundancy: Multiple nodes of each type
- Component-level redundancy: Multiple instances of critical services
- Network-level redundancy: Load balancers provide traffic distribution and failover
Key Components in HA Setup
Multiple API Servers Behind Load Balancer
The API server is the gateway to your Kubernetes cluster, handling all REST API requests from kubectl, kubelet, and other components. In an HA setup:
Load Balancer Configuration:
# Example HAProxy configuration for API servers
global
log stdout local0
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660
defaults
mode http
timeout connect 5s
timeout client 30s
timeout server 30s
frontend k8s-api-frontend
bind *:6443
mode tcp
default_backend k8s-api-backend
backend k8s-api-backend
mode tcp
balance roundrobin
option tcp-check
server master1 10.0.1.10:6443 check
server master2 10.0.1.11:6443 check
server master3 10.0.1.12:6443 check
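Health Check Configuration: With option tcp-check and no further rules, as in the configuration above, HAProxy only verifies that a TCP connection to port 6443 succeeds. To remove an instance from rotation based on the API server's own /livez and /readyz endpoints, the load balancer needs an HTTPS health check against one of them.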
etcd Cluster Configuration
etcd is perhaps the most critical component requiring careful HA design. It stores all cluster state, including:
- Pod specifications and status
- Service definitions
- ConfigMaps and Secrets
- RBAC policies
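Quorum Requirements: etcd uses the Raft consensus algorithm, which requires a majority (quorum) of members to be available before the cluster can accept writes. An even-sized cluster tolerates no more failures than the odd-sized cluster one member smaller (4 members still tolerate only 1 failure), which is why odd member counts are recommended: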
- 3 nodes: Can tolerate 1 failure (2 nodes = majority)
- 5 nodes: Can tolerate 2 failures (3 nodes = majority)
- 7 nodes: Can tolerate 3 failures (4 nodes = majority)
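The majority rule behind those numbers can be checked with a little arithmetic. The sketch below is illustrative Python, not part of etcd; the function names quorum and fault_tolerance are invented for this example. It shows why a 4-member cluster tolerates no more failures than a 3-member one.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


def describe(members: int) -> str:
    """One-line summary for a given cluster size."""
    return (f"{members} members: quorum={quorum(members)}, "
            f"tolerates {fault_tolerance(members)} failure(s)")
```

Calling describe(n) for n from 1 to 7 prints the full table, including the even sizes the list above skips.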
etcd Cluster Bootstrap Example:
# Node 1 (10.0.1.10)
# TLS flags are omitted for brevity: https URLs also require --cert-file, --key-file,
# --trusted-ca-file and the matching --peer-cert-file, --peer-key-file, --peer-trusted-ca-file
etcd --name=etcd-1 \
  --data-dir=/var/lib/etcd \
  --initial-advertise-peer-urls=https://10.0.1.10:2380 \
  --listen-peer-urls=https://10.0.1.10:2380 \
  --advertise-client-urls=https://10.0.1.10:2379 \
  --listen-client-urls=https://10.0.1.10:2379,https://127.0.0.1:2379 \
  --initial-cluster=etcd-1=https://10.0.1.10:2380,etcd-2=https://10.0.1.11:2380,etcd-3=https://10.0.1.12:2380 \
  --initial-cluster-state=new \
  --initial-cluster-token=k8s-etcd-cluster
Controller Manager & Scheduler Redundancy
Unlike API servers, the controller manager and scheduler use leader election to ensure only one active instance at a time, preventing conflicting operations:
Leader Election in Practice:
# Controller Manager configuration with leader election
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
spec:
  containers:
    - command:
        - kube-controller-manager
        - --leader-elect=true
        - --leader-elect-lease-duration=15s
        - --leader-elect-renew-deadline=10s
        - --leader-elect-retry-period=2s
      name: kube-controller-manager
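When the active controller manager fails, it stops renewing its lease, and a standby instance takes over once the lease expires: roughly 15 seconds with the settings above, plus up to one retry period. Cluster operations then continue under the new leader.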
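To make the takeover timing concrete, here is a simplified model of lease-based leader election. It is a sketch only, not the client-go implementation: the Lease class and try_acquire_or_renew function are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    renew_time: float = 0.0  # seconds on whatever clock the caller uses


def try_acquire_or_renew(lease: Lease, candidate: str, now: float,
                         lease_duration: float = 15.0) -> bool:
    """Return True if `candidate` holds the lease after this attempt."""
    expired = lease.holder is None or now - lease.renew_time >= lease_duration
    if lease.holder == candidate or expired:
        # Either a renewal by the current leader or a takeover of a stale lease.
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False


def takeover_time(last_renewal: float, first_retry: float,
                  retry_period: float = 2.0, lease_duration: float = 15.0) -> float:
    """Time at which a standby that retries every `retry_period` wins the lease."""
    lease = Lease(holder="old-leader", renew_time=last_renewal)
    now = first_retry
    while not try_acquire_or_renew(lease, "standby", now, lease_duration):
        now += retry_period
    return now
```

With a last renewal at t=0 and a standby retrying every 2 seconds starting at t=1, takeover_time(0.0, 1.0) returns 15.0; starting at t=2 it returns 16.0, which illustrates the "lease duration plus up to one retry period" bound.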
Worker Node Scaling and Pod Replicas
Worker nodes in HA clusters should be:
Distributed Across Zones: Prevents single availability zone failures from impacting application availability.
Sized for N+1 Redundancy or Better: N+1 means the workload still fits after one node is lost. For more margin, add headroom: if you need capacity for 100 pods, design for 150 pods across multiple nodes so you can lose several nodes without capacity issues.
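The sizing rule can be turned into a quick calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the 25-pods-per-node figure used in the usage note is an illustrative assumption, not a number from this guide.

```python
import math


def nodes_required(pods_needed: int, pods_per_node: int, spare_nodes: int = 1) -> int:
    """Nodes to provision so the workload still fits after `spare_nodes` failures."""
    if pods_per_node < 1:
        raise ValueError("pods_per_node must be positive")
    return math.ceil(pods_needed / pods_per_node) + spare_nodes


def headroom_after_failures(nodes: int, pods_per_node: int,
                            pods_needed: int, failed: int) -> int:
    """Spare pod slots left once `failed` nodes are gone (negative means a shortfall)."""
    return (nodes - failed) * pods_per_node - pods_needed
```

For example, with 25 pods per node, 100 pods need nodes_required(100, 25) = 5 nodes for N+1. Six such nodes give the 150-pod capacity mentioned above and still fit the workload after two failures (headroom_after_failures(6, 25, 100, 2) is 0).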
Example Deployment with Anti-Affinity:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - web-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web-app
          image: nginx:1.21
This configuration ensures pod replicas are distributed across different nodes, preventing single node failures from taking down the entire application.
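One consequence of a required anti-affinity rule keyed on kubernetes.io/hostname is that each node can host at most one replica, so a replica stays Pending when there are fewer eligible nodes than replicas. The sketch below models that constraint; the function names and the pod-to-node mapping format are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def nodes_violating_anti_affinity(placement: Dict[str, str]) -> List[str]:
    """Return nodes hosting more than one replica; `placement` maps pod name to node name."""
    per_node = Counter(placement.values())
    return sorted(node for node, count in per_node.items() if count > 1)


def max_schedulable_replicas(replicas: int, nodes: int) -> int:
    """With a required per-hostname rule, at most one replica fits on each node."""
    return min(replicas, nodes)
```

With three replicas and only two schedulable nodes, max_schedulable_replicas(3, 2) returns 2: the third pod waits until another node is available.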
HA Deployment Models
When designing your Kubernetes HA architecture, you’ll need to choose between two primary etcd deployment topologies, each with distinct advantages and trade-offs.
Stacked etcd Topology
In the stacked topology, etcd runs as a static pod on the same nodes as other control plane components.
Architecture:

Pros:
- Simplified deployment: Fewer nodes to manage and configure
- Lower infrastructure costs: Uses existing control plane nodes
- Easier networking: No additional network configuration required
- Built-in co-location: Control plane and etcd failures are coupled, making troubleshooting more straightforward
Cons:
- Coupled failures: If a control plane node fails, you lose both API server capacity and etcd capacity
- Resource contention: etcd competes with other control plane components for CPU and memory
- Less isolation: etcd performance can be impacted by other control plane workloads
Kubernetes official HA documentation
External etcd Topology
In the external topology, etcd runs on dedicated nodes separate from the control plane.
Architecture:

Pros:
- Better isolation: etcd gets dedicated resources and doesn’t compete with other components
- Independent scaling: Can scale etcd cluster independently of control plane
- Decoupled failures: Control plane node failure doesn’t affect etcd capacity
- Performance optimization: Can tune etcd nodes specifically for database workloads
Cons:
- Higher complexity: More nodes to manage and configure
- Increased costs: Requires additional infrastructure
- Network overhead: Additional network hops between control plane and etcd
- More failure modes: Additional network and node failure scenarios to handle
Cost Comparison: Single vs HA Architecture
| Configuration | Infrastructure | Monthly Cost* | Downtime Risk | ROI After First Outage |
|---|---|---|---|---|
| Single Master | 1 master + 3 workers | $480/month | 45min = $50K loss | N/A |
| Stacked HA | 3 masters + 3 workers | $960/month | <30sec = $0 loss | 10,400% ROI |
| External etcd HA | 3 masters + 3 etcd + 3 workers | $1,440/month | <30sec = $0 loss | 5,200% ROI |
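*Based on AWS m5.large instances ($0.096/hr on-demand); the monthly totals are rough estimates rather than exact instance pricing. The downtime and ROI columns assume a 45-minute outage costs about $50K in revenue, with ROI calculated as that avoided loss divided by the additional monthly cost over the single-master setup.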
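The ROI figures in the table can be reproduced with simple arithmetic, assuming ROI here means the avoided outage loss divided by the additional monthly cost over the single-master baseline. The function name below is hypothetical; the inputs are the table's own numbers.

```python
def roi_percent(avoided_loss: float, ha_monthly_cost: float,
                baseline_monthly_cost: float) -> float:
    """Avoided outage loss divided by the extra monthly spend on HA, as a percentage."""
    extra = ha_monthly_cost - baseline_monthly_cost
    if extra <= 0:
        raise ValueError("the HA option must cost more than the baseline")
    return avoided_loss / extra * 100


stacked_roi = roi_percent(50_000, 960, 480)     # stacked HA vs single master
external_roi = roi_percent(50_000, 1_440, 480)  # external etcd HA vs single master
```

Rounded to the nearest hundred, the two results match the table's 10,400% and 5,200%.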
Comparison Table
| Aspect | Stacked etcd | External etcd |
|---|---|---|
| Infrastructure Cost | Lower (3 nodes) | Higher (6+ nodes) |
| Operational Complexity | Simpler | More complex |
| Failure Tolerance | Coupled failures | Independent failures |
| Resource Isolation | Shared resources | Dedicated resources |
| Network Latency | Lower (local) | Higher (network) |
| Scalability | Limited | Independent |
| Recommended Use Case | Dev/Test, Small prod | Large prod, Critical systems |
Load Balancer & Networking Considerations
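The load balancer sits at the heart of your Kubernetes HA architecture, serving as the single point of entry for all API server communication. That also makes it a potential single point of failure, so use a managed cloud load balancer or run at least two load balancer instances behind a virtual IP (for example with keepalived). Getting this right is crucial for both availability and performance.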
API Server HA with External Load Balancer
Layer 4 (TCP) vs Layer 7 (HTTP) Load Balancing:
For Kubernetes API servers, Layer 4 TCP load balancing is typically preferred because the API server uses both HTTP and WebSocket protocols, TLS termination should happen at the API server for security, and it provides lower latency compared to Layer 7 inspection.
Cloud Provider Load Balancer Examples
AWS Network Load Balancer (NLB) Configuration:
{
"Type": "network",
"Scheme": "internal",
"IpAddressType": "ipv4",
"Listeners": [
{
"Protocol": "TCP",
"Port": 6443,
"TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/k8s-api-servers"
}
],
"HealthCheck": {
"Protocol": "TCP",
"Port": "6443",
"HealthyThreshold": 2,
"UnhealthyThreshold": 2,
"Interval": 10
}
}
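GCP Load Balancer with Health Checks (GKE-style manifests; note that ManagedCertificate resources attach to GKE Ingress and terminate TLS at the load balancer, which conflicts with the TCP passthrough recommended above, so treat the certificate portion as optional for a self-managed API endpoint):
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: k8s-api-ssl-cert
spec:
  domains:
    - k8s-api.example.com
---
apiVersion: v1
kind: Service
metadata:
  name: k8s-api-lb
  annotations:
    cloud.google.com/load-balancer-type: "External"
    networking.gke.io/managed-certificates: "k8s-api-ssl-cert"
spec:
  type: LoadBalancer
  ports:
    - port: 6443
      targetPort: 6443
      protocol: TCP
  selector:
    component: kube-apiserver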
DNS and Service Discovery
Internal DNS Configuration: Set up internal DNS records that point to your load balancer:
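# Example DNS entry
k8s-api.internal.example.com. 300 IN A 10.0.1.100
# kubernetes.default.svc.cluster.local is served inside the cluster by CoreDNS and resolves to the
# service ClusterIP (for example 10.96.0.1), so it needs no record pointing at the load balancer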
kubeconfig for HA Cluster:
apiVersion: v1
kind: Config
clusters:
  - cluster:
      certificate-authority-data: LS0tLS1CRUdJTi...
      server: https://k8s-api.internal.example.com:6443
    name: ha-cluster
contexts:
  - context:
      cluster: ha-cluster
      user: admin
    name: ha-cluster-admin
current-context: ha-cluster-admin
users:
  - name: admin
    user:
      client-certificate-data: LS0tLS1CRUdJTi...
      client-key-data: LS0tLS1CRUdJTi...
Health Checks and Failover
API Server Health Endpoints:
- /livez – Liveness check (is the server running?)
- /readyz – Readiness check (is the server ready to serve traffic?)
- /healthz – General health check (deprecated but still available)
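These endpoints are easy to poll from a script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, certificate verification is disabled as a lab shortcut (the equivalent of curl -k), and it assumes the endpoints are readable without credentials, which is the default unless anonymous access has been restricted.

```python
import ssl
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 3.0) -> int:
    """GET `url` and return the HTTP status code, or 0 if no response was received."""
    # Lab shortcut: skip certificate verification. Use the cluster CA in real setups.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but not with a 2xx status
    except Exception:
        return 0  # connection refused, timeout, TLS failure, and so on


def probe(servers: Iterable[str], path: str = "/readyz",
          fetch: Callable[[str], int] = fetch_status) -> Dict[str, bool]:
    """Map each host:port to True when its endpoint answers with HTTP 200."""
    return {server: fetch(f"https://{server}{path}") == 200 for server in servers}
```

Calling probe(["10.0.1.10:6443", "10.0.1.11:6443", "10.0.1.12:6443"]) checks each example API server directly; the fetch parameter exists so the logic can be exercised without a live cluster.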
Advanced Health Check Configuration:
# HAProxy HTTPS health check against /readyz
# (a plain-text HTTP probe would fail here because port 6443 only speaks TLS)
backend k8s-api-backend
    mode tcp
    balance roundrobin
    option httpchk GET /readyz
    http-check expect status 200
    server master1 10.0.1.10:6443 check check-ssl verify none inter 5s rise 2 fall 3
    server master2 10.0.1.11:6443 check check-ssl verify none inter 5s rise 2 fall 3
    server master3 10.0.1.12:6443 check check-ssl verify none inter 5s rise 2 fall 3
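The inter, rise, and fall settings determine how quickly a failed API server leaves the rotation and how soon it returns. The sketch below is a simplified model of that counting logic, not HAProxy's actual implementation; the function names are invented for illustration.

```python
from typing import Iterable, List


def track_health(results: Iterable[bool], rise: int = 2, fall: int = 3,
                 initially_up: bool = True) -> List[bool]:
    """Replay check results and return the server state (True = UP) after each check."""
    up = initially_up
    streak = 0  # consecutive results that disagree with the current state
    states = []
    for passed in results:
        if passed == up:
            streak = 0
        else:
            streak += 1
            threshold = fall if up else rise
            if streak >= threshold:
                up = not up
                streak = 0
        states.append(up)
    return states


def detection_seconds(interval: float = 5.0, fall: int = 3) -> float:
    """Rough time to mark a dead server DOWN: `fall` failed checks, `interval` apart."""
    return interval * fall
```

With inter 5s and fall 3, a dead API server is removed after about 15 seconds; with rise 2 it returns after two consecutive successful checks.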
Connection Draining: When an API server needs maintenance, graceful shutdown allows existing connections to complete while preventing new connections.
Cloud vs On-Premises HA Architectures
The choice between cloud-managed and self-managed Kubernetes HA depends on your requirements for control, cost, and operational complexity.
Cloud Provider Managed HA
AWS EKS (Elastic Kubernetes Service): EKS provides fully managed control plane HA out of the box:
# Create EKS cluster with HA control plane
aws eks create-cluster \
--name production-cluster \
--version 1.28 \
--role-arn arn:aws:iam::123456789012:role/eks-service-role \
--resources-vpc-config subnetIds=subnet-12345,subnet-67890,subnet-abcde
What AWS manages:
- Multiple API servers across AZs
- etcd cluster with automated backups
- Control plane scaling and updates
- Load balancer for API server access
- Certificate rotation and security patches
GCP GKE (Google Kubernetes Engine):
# Create regional GKE cluster (multi-zone HA)
gcloud container clusters create production-cluster \
--region=us-central1 \
--num-nodes=2 \
--enable-autoscaling \
--min-nodes=1 \
--max-nodes=10 \
--enable-autorepair \
--enable-autoupgrade
Azure AKS (Azure Kubernetes Service):
# Create AKS cluster with availability zones
az aks create \
--resource-group myResourceGroup \
--name production-cluster \
--node-count 3 \
--zones 1 2 3 \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 5
On-Premises HA with kubeadm
For on-premises deployments, you have complete control but also complete responsibility for HA setup.
Infrastructure Prerequisites:
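- Stacked etcd topology: minimum 3 control plane nodes, plus worker nodes (3 masters + 3 workers = 6 nodes in the examples here)
- External etcd topology: minimum 3 control plane nodes and 3 dedicated etcd nodes, plus worker nodes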
- External load balancer (HAProxy, NGINX, F5, etc.)
- Shared storage for persistent volumes (optional but recommended)
Real-World Example: HA Setup in AWS with Self-Managed Nodes
Step 1: Infrastructure Setup
# Create VPC and subnets across 3 AZs
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=k8s-vpc}]'
# Create subnets in different AZs
aws ec2 create-subnet --vpc-id vpc-12345 --cidr-block 10.0.1.0/24 --availability-zone us-west-2a
aws ec2 create-subnet --vpc-id vpc-12345 --cidr-block 10.0.2.0/24 --availability-zone us-west-2b
aws ec2 create-subnet --vpc-id vpc-12345 --cidr-block 10.0.3.0/24 --availability-zone us-west-2c
# Create Network Load Balancer
aws elbv2 create-load-balancer \
--name k8s-api-nlb \
--scheme internal \
--type network \
--subnets subnet-12345 subnet-67890 subnet-abcde
Step 2: Launch EC2 Instances
# Auto Scaling Group for master nodes
MasterASG:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    VPCZoneIdentifier:
      - !Ref PrivateSubnet1
      - !Ref PrivateSubnet2
      - !Ref PrivateSubnet3
    LaunchTemplate:
      LaunchTemplateId: !Ref MasterLaunchTemplate
      Version: !GetAtt MasterLaunchTemplate.LatestVersionNumber
    MinSize: 3
    MaxSize: 3
    DesiredCapacity: 3
    TargetGroupARNs:
      - !Ref APIServerTargetGroup
    Tags:
      - Key: Name
        Value: k8s-master
        PropagateAtLaunch: true
      - Key: kubernetes.io/role
        Value: master
        PropagateAtLaunch: true
Step 3: Load Balancer Target Group
{
"TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/k8s-api-servers",
"Targets": [
{"Id": "i-1234567890abcdef0", "Port": 6443},
{"Id": "i-0987654321fedcba0", "Port": 6443},
{"Id": "i-abcdef1234567890", "Port": 6443}
],
"HealthCheckProtocol": "TCP",
"HealthCheckPort": "6443",
"HealthyThresholdCount": 2,
"UnhealthyThresholdCount": 2
}
The three steps above outline the infrastructure for a production-grade HA cluster on AWS EC2 instances.
Benefits of Self-Managed HA:
- Complete control over Kubernetes version and configuration
- Custom networking and security policies
- Cost optimization through reserved instances and spot pricing
- Integration with existing infrastructure and monitoring
Challenges:
- Operational overhead for updates, backups, and monitoring
- Need for deep Kubernetes expertise
- Responsibility for security patching and compliance
- Higher complexity in troubleshooting
Security in HA Clusters
High availability inherently increases your attack surface by introducing more components and network connections. Implementing robust security measures is essential.
TLS Everywhere
Every component in your HA cluster must communicate over encrypted channels.
API Server TLS configuration:
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  containers:
    - command:
        - kube-apiserver
        - --advertise-address=10.0.1.10
        - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
        - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
        - --client-ca-file=/etc/kubernetes/pki/ca.crt
        - --etcd-servers=https://10.0.1.10:2379,https://10.0.1.11:2379,https://10.0.1.12:2379
        - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
        - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
        - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
Certificate Management for HA:
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[v3_req]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
subjectAltName = @alt_names
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
DNS.5 = k8s-api.internal.example.com
IP.1 = 10.96.0.1
IP.2 = 10.0.1.10
IP.3 = 10.0.1.11
IP.4 = 10.0.1.12
IP.5 = 10.0.1.100
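The SAN list matters because every address a client might use to reach the API server must appear in the certificate: each control plane IP, the load balancer IP and DNS name, and the first address of the service CIDR (10.96.0.1 above), which is assigned to the kubernetes Service. The helper below is a rough sketch for checking a planned SAN list; the function names are hypothetical.

```python
import ipaddress
from typing import List


def first_service_ip(service_cidr: str) -> str:
    """First usable address of the service CIDR, used by the `kubernetes` Service."""
    network = ipaddress.ip_network(service_cidr)
    return str(network.network_address + 1)


def missing_sans(cert_sans: List[str], required: List[str]) -> List[str]:
    """Entries in `required` that the certificate does not list (exact match only)."""
    present = set(cert_sans)
    return [name for name in required if name not in present]
```

For instance, first_service_ip("10.96.0.0/16") returns "10.96.0.1", and missing_sans flags a load balancer IP that was left out of the certificate request.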
RBAC (Role-Based Access Control)
Multi-Master RBAC Considerations:
# ClusterRole for HA cluster management
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-admin-ha
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-ha-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin-ha
subjects:
  - kind: ServiceAccount
    name: cluster-admin-ha
    namespace: kube-system
etcd Encryption at Rest
Enable etcd Encryption:
# EncryptionConfiguration for etcd
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
    providers:
      - aescbc:
          keys:
            - name: key1
              # Example value only: generate your own random key and base64-encode it
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
      - identity: {}
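The secret value in the example decodes to a readable phrase, so it should never be reused. A real key is random bytes, base64-encoded. Here is a minimal sketch for generating one; the function name is hypothetical, and the 32-byte default is a choice made for this example.

```python
import base64
import secrets


def new_encryption_key(num_bytes: int = 32) -> str:
    """Return a random key, base64-encoded for the `secret` field."""
    if num_bytes not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24, or 32 bytes long")
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def decoded_length(encoded_key: str) -> int:
    """Number of raw key bytes behind a base64-encoded secret."""
    return len(base64.b64decode(encoded_key))
```

Use the same key on every API server: each instance has to decrypt what the others wrote.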
Apply encryption config to all API servers:
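# Update API server configuration on all master nodes
sudo cp encryption-config.yaml /etc/kubernetes/
# Add the flag as its own list item directly after the "- kube-apiserver" command entry (GNU sed)
sudo sed -i 's|^\(\s*\)- kube-apiserver$|&\n\1- --encryption-provider-config=/etc/kubernetes/encryption-config.yaml|' /etc/kubernetes/manifests/kube-apiserver.yaml
# The file must also be mounted into the API server pod (hostPath volume plus volumeMount in the same manifest)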
Network Security
Network Policies for Control Plane:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: control-plane-access
  namespace: kube-system
spec:
  # Note: most CNI plugins do not apply NetworkPolicy to hostNetwork pods,
  # which includes the static control plane pods created by kubeadm
  podSelector:
    matchLabels:
      component: kube-apiserver
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: TCP
          port: 6443
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: TCP
          port: 2379
        - protocol: TCP
          port: 2380
Hands-On Lab: kubeadm HA Setup
⚠️ SAFETY WARNING
This lab modifies etcd configuration and certificates. Never run these commands on production clusters. Use dedicated test/lab environments only. Always back up etcd before making changes: etcdctl snapshot save backup-$(date +%Y%m%d-%H%M%S).db
Now let’s walk through creating a production-style HA cluster using kubeadm. This lab assumes seven Ubuntu 20.04 nodes (three control plane, three worker, one load balancer), with a container runtime, kubeadm, kubelet, and kubectl installed on the six cluster nodes.
Prerequisites Setup
Node Specifications:
- 3 Master nodes: 2 vCPU, 4GB RAM, 20GB disk each
- 3 Worker nodes: 2 vCPU, 4GB RAM, 50GB disk each
- 1 Load balancer node: 1 vCPU, 2GB RAM, 10GB disk
Network Requirements:
# Ensure all nodes can communicate
# Master nodes: 10.0.1.10, 10.0.1.11, 10.0.1.12
# Worker nodes: 10.0.1.20, 10.0.1.21, 10.0.1.22
# Load balancer: 10.0.1.100
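Before bootstrapping, it helps to confirm that the nodes can actually reach each other on the ports the cluster uses. The sketch below checks TCP reachability with the standard library; the function names are hypothetical, and the port list in the usage note (6443 for the API server, 2379 and 2380 for etcd, 10250 for the kubelet) reflects common defaults rather than anything specific to this lab.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_matrix(targets: Iterable[Tuple[str, int]],
                 timeout: float = 2.0) -> Dict[Tuple[str, int], bool]:
    """Check every (host, port) pair and report which ones accepted a connection."""
    return {target: port_open(target[0], target[1], timeout) for target in targets}
```

After Step 2, running check_matrix([("10.0.1.10", 6443), ("10.0.1.10", 2379), ("10.0.1.100", 6443)]) from another node gives a quick view of firewall or routing problems. Ports only answer once the corresponding service is listening.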
Step 1: Provision and Configure Load Balancer
Install and Configure HAProxy:
# On load balancer node (10.0.1.100)
sudo apt update && sudo apt install -y haproxy
# Configure HAProxy
sudo tee /etc/haproxy/haproxy.cfg > /dev/null <<EOF
global
    log /dev/log local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option log-health-checks
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 20s
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
# Kubernetes API Server
frontend k8s-api-frontend
    bind *:6443
    mode tcp
    default_backend k8s-api-backend
backend k8s-api-backend
    mode tcp
    balance roundrobin
    option tcp-check
    # All three servers are active; marking any of them "backup" would send traffic to master1 only
    server master1 10.0.1.10:6443 check inter 5s rise 2 fall 3
    server master2 10.0.1.11:6443 check inter 5s rise 2 fall 3
    server master3 10.0.1.12:6443 check inter 5s rise 2 fall 3
# HAProxy Stats
listen stats
    bind *:8080
    stats enable
    stats uri /stats
    stats refresh 30s
EOF
# Validate the configuration, then enable HAProxy and restart it so the new file is loaded
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl enable haproxy
sudo systemctl restart haproxy
sudo systemctl status haproxy
Step 2: Initialize First Control Plane Node
Create kubeadm configuration:
# On master1 (10.0.1.10)
cat > kubeadm-config.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.1.10
  bindPort: 6443
# nodeRegistration:
#   kubeletExtraArgs:
#     cloud-provider: external   # only when a cloud controller manager will be deployed
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.2
controlPlaneEndpoint: "k8s-api.internal.example.com:6443" # Must resolve to the load balancer (10.0.1.100) on every node
networking:
  serviceSubnet: "10.96.0.0/16"
  podSubnet: "10.244.0.0/16"
  dnsDomain: "cluster.local"
etcd:
  local:
    dataDir: "/var/lib/etcd"
apiServer:
  extraArgs:
    authorization-mode: "Node,RBAC"
    # NodeRestriction is kubeadm's default plugin and is lost if this list omits it
    enable-admission-plugins: "NodeRestriction,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota"
  certSANs:
    - "k8s-api.internal.example.com"
    - "10.0.1.100" # Load balancer IP
    - "10.0.1.10"
    - "10.0.1.11"
    - "10.0.1.12"
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
# Initialize the first control plane node
sudo kubeadm init --config=kubeadm-config.yaml --upload-certs
Save join commands and certificates:
# The output will include commands like these (save for later steps):
# For additional control plane nodes:
kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH> \
--control-plane --certificate-key <CERT_KEY> \
--apiserver-advertise-address=<NODE_IP>
# For worker nodes:
kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
# Replace placeholders with actual values from kubeadm init output
Step 3: Configure kubectl and Network
Set up kubectl access:
# On master1
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install network plugin (Flannel; its default pod network 10.244.0.0/16 matches podSubnet above)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
Step 4: Join Additional Control Plane Nodes
On master2 (10.0.1.11):
# Join as control plane node using the command from Step 2
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <CERT_KEY> \
  --apiserver-advertise-address=10.0.1.11
On master3 (10.0.1.12):
# Join as control plane node
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <CERT_KEY> \
  --apiserver-advertise-address=10.0.1.12
Step 5: Join Worker Nodes
On worker1, worker2, worker3:
# Join as worker nodes
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>
Step 6: Verify HA Setup
Check cluster status:
# Verify all nodes are ready
kubectl get nodes -o wide
# Check control plane components
kubectl get pods -n kube-system
# Verify etcd cluster health
kubectl exec -n kube-system etcd-master1 -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
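# Test API server HA by taking one API server offline
# (stopping the kubelet alone leaves the API server container running)
# On master1: moving the static pod manifest makes the kubelet stop the API server
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
kubectl get nodes # From another machine: should still work via load balancer
# On master1: restore the manifest when done
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/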
Load Balancer Health Check:
# Check HAProxy stats
curl http://10.0.1.100:8080/stats
# Test API server through load balancer
curl -k https://10.0.1.100:6443/livez
curl -k https://10.0.1.100:6443/readyz
Common Pitfalls & Troubleshooting
Even with careful planning, HA clusters can encounter issues. Here are the most common problems and their solutions.
Split-Brain in etcd
Quick Diagnosis Commands:
# Check etcd member status (copy-ready)
kubectl exec -n kube-system etcd-master1 -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
# Expected healthy output:
# 8e9e05c52164694d, started, etcd-1, https://10.0.1.10:2380, https://10.0.1.10:2379, false
# 294b66c6fd4a7c55, started, etcd-2, https://10.0.1.11:2380, https://10.0.1.11:2379, false
# 3e7b3f6d7a8b9c2e, started, etcd-3, https://10.0.1.12:2380, https://10.0.1.12:2379, false
# Check for leader across all members (exactly one row should show IS LEADER = true)
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster --write-out=table
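For scripted checks, the comma-separated member list output shown above is simple to parse. The sketch below assumes the default output format with one peer URL and one client URL per member; the function names are hypothetical. Note that member list reports membership status, not liveness, so a "started" member can still be unreachable.

```python
from typing import Dict, List

FIELDS = ("id", "status", "name", "peer_urls", "client_urls", "is_learner")


def parse_member_list(output: str) -> List[Dict[str, str]]:
    """Parse the default comma-separated `etcdctl member list` output."""
    members = []
    for line in output.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) == len(FIELDS):
            members.append(dict(zip(FIELDS, parts)))
    return members


def voting_quorum(members: List[Dict[str, str]]) -> int:
    """Majority size among voting (non-learner) members."""
    voters = [m for m in members if m["is_learner"] == "false"]
    return len(voters) // 2 + 1


def not_started(members: List[Dict[str, str]]) -> List[str]:
    """IDs of members that were added but have not started yet."""
    return [m["id"] for m in members if m["status"] != "started"]
```

A non-empty result from not_started usually means a member add was issued but the new etcd process never joined.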
Emergency Recovery (copy-ready):
# etcdctl needs the same --endpoints, --cacert, --cert, and --key flags as the diagnosis commands
# 1. Create backup before any changes
etcdctl snapshot save /tmp/etcd-backup-$(date +%Y%m%d-%H%M%S).db
# 2. If a member is unhealthy or out of sync (Raft's majority rule prevents a true split-brain),
#    remove it by its ID from the member list (here etcd-2)
etcdctl member remove 294b66c6fd4a7c55
# 3. Re-add the node with fresh data
etcdctl member add etcd-2 --peer-urls=https://10.0.1.11:2380
# 4. On the problematic node, reset and rejoin
sudo kubeadm reset --force
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> --control-plane...
API Server Connection Refused
Quick Diagnosis Commands:
# Check API server process (expected: running)
sudo systemctl status kubelet
# Expected output: Active: active (running)
# Check API server container (expected: 1 running container)
sudo crictl ps | grep apiserver
# Expected: CONTAINER ID IMAGE STATE NAME
# Test API server health (expected: HTTP 200 OK)
curl -k https://127.0.0.1:6443/livez
# Expected output: ok
# Check load balancer status
curl -s http://10.0.1.100:8080/stats | grep -A5 k8s-api-backend
# Expected: server status UP for healthy nodes
Emergency Fix Commands:
# 1. Restart kubelet (often enough to recover a stuck API server static pod)
sudo systemctl restart kubelet
sleep 30
kubectl get nodes # Test if working
# 2. If still failing, check etcd connectivity
sudo netstat -tulpn | grep :2379
# Expected: tcp LISTEN 127.0.0.1:2379
# 3. Last resort: reset and rejoin master node
sudo kubeadm reset --force
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> --control-plane...
Certificate Expiration
Quick Check Commands:
# Check all certificate expiration (copy-ready)
sudo kubeadm certs check-expiration
# Expected: All certs valid for >30 days
# Manual check for API server cert
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout | grep -A 2 Validity
# Expected: Not After date in future
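To turn the Not After line into an alertable number, the date can be parsed and compared against the current time. This is a small sketch that assumes the openssl text format shown by the command above, with a GMT timestamp; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse an openssl line such as 'Not After : Jan  1 00:00:00 2026 GMT'."""
    # Keep the part after the first colon and collapse repeated spaces.
    value = " ".join(line.split(":", 1)[1].split())
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def days_remaining(not_after: datetime, now: datetime) -> int:
    """Whole days left before the certificate expires (negative if already expired)."""
    return (not_after - now).days
```

Passing datetime.now(timezone.utc) as now gives the live value; anything under 30 days is a reasonable trigger for renewal.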
Emergency Renewal (copy-ready):
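# 1. Renew all certificates (run on ALL master nodes)
sudo kubeadm certs renew all
# 2. Restart the control plane static pods so they load the new certs:
#    move the manifests out, wait for the kubelet to stop the pods, then move them back
sudo mkdir -p /tmp/k8s-manifests
sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /tmp/k8s-manifests/'
sleep 20
sudo sh -c 'mv /tmp/k8s-manifests/*.yaml /etc/kubernetes/manifests/'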
# 3. Update your kubeconfig
sudo cp /etc/kubernetes/admin.conf ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
# 4. Verify cluster access
kubectl get nodes
# Expected: All nodes listed without authentication errors
Load Balancer Down/Misconfigured
Quick Check Commands:
# Test load balancer health (expected: stats page loads)
curl -s http://10.0.1.100:8080/stats | head -20
# Test API server through LB (expected: ok)
curl -k https://10.0.1.100:6443/livez
# Check HAProxy process (expected: running)
sudo systemctl status haproxy
# Expected: Active: active (running)
Emergency Fix:
# 1. Restart HAProxy
sudo systemctl restart haproxy
# 2. If config issues, update and reload
sudo vim /etc/haproxy/haproxy.cfg # Fix config
sudo systemctl reload haproxy
# 3. Test each backend individually
curl -k https://10.0.1.10:6443/livez # Direct to master1
curl -k https://10.0.1.11:6443/livez # Direct to master2
curl -k https://10.0.1.12:6443/livez # Direct to master3
Recovery Procedures
Complete Cluster Recovery:
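# 1. Stop the control plane static pods on every master
#    (stopping the kubelet alone leaves the etcd container running)
sudo mkdir -p /tmp/k8s-manifests
sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /tmp/k8s-manifests/'
# 2. Restore etcd from the same backup on each member, using that member's own name and peer URL
sudo etcdctl snapshot restore /backup/etcd-backup.db \
  --name master1 \
  --data-dir /var/lib/etcd-restore \
  --initial-cluster master1=https://10.0.1.10:2380,master2=https://10.0.1.11:2380,master3=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls https://10.0.1.10:2380
# 3. Update etcd data directory
sudo mv /var/lib/etcd /var/lib/etcd.old
sudo mv /var/lib/etcd-restore /var/lib/etcd
# 4. Start services by restoring the manifests
sudo sh -c 'mv /tmp/k8s-manifests/*.yaml /etc/kubernetes/manifests/'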
Monitoring and Alerting:
# Prometheus alerts for HA cluster
groups:
  - name: kubernetes-ha
    rules:
      - alert: APIServerDown
        expr: up{job="kubernetes-apiservers"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Kubernetes API server is down"
      - alert: etcdNoLeader
        expr: etcd_server_has_leader{job="kubernetes-etcd"} == 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "etcd cluster has no leader"
      - alert: etcdHighNumberOfFailedGRPCRequests
        expr: sum(rate(grpc_server_handled_total{job="kubernetes-etcd",grpc_code!="OK"}[5m])) / sum(rate(grpc_server_handled_total{job="kubernetes-etcd"}[5m])) > 0.01
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "etcd cluster has high number of failed gRPC requests"
HA in Exams & Real-World DevOps
Understanding Kubernetes High Availability is crucial not just for production environments, but also for advancing your career through certifications and real-world problem-solving.
Kubernetes Certification Relevance
CKA (Certified Kubernetes Administrator) Topics:
- Cluster installation and configuration (25% of exam)
- Managing highly available Kubernetes clusters
- Understanding etcd backup and restore procedures
- Troubleshooting cluster components
CKS (Certified Kubernetes Security Specialist) Areas:
- Securing cluster communications with TLS
- Implementing RBAC in multi-master environments
- etcd encryption configuration
- Network security policies for control plane
Sample CKA HA Question: “Your 3-node Kubernetes cluster is experiencing issues with the control plane. One of the master nodes has failed, but the cluster is still functional. Describe the steps to replace the failed master node while maintaining high availability.”
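Answer Approach:
# 1. Identify the failed node
kubectl get nodes
kubectl describe node master2 # Shows NotReady status
# 2. Remove failed node from cluster
kubectl delete node master2
# 3. Look up the failed member's ID, then remove it from the etcd cluster
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member remove 8e9e05c52164694d # Example ID; use the one "member list" shows for master2
# 4. Provision new node and join to cluster
sudo kubeadm join 10.0.1.100:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>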
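The member ID passed to `member remove` differs on every cluster, so it is safer to look it up than to copy it from an example. Here is a minimal sketch, assuming the default comma-separated output of `etcdctl member list` (ID, status, name, peer URLs, client URLs, learner flag). The helper name `member_id` is hypothetical.

```shell
# member_id NAME: reads `etcdctl member list` output on stdin and prints
# the hex ID of the member with that name (prints nothing if not found).
member_id() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}
```

A possible use is to pipe the `member list` command from step 3 into `member_id master2` and pass the result to `member remove`. An empty result means the name did not match, in which case the raw `member list` output should be checked by hand before removing anything.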
Real-World Case Study: E-commerce Platform Outage Prevention
Scenario: A major e-commerce company running Black Friday sales on Kubernetes.
Challenge: Single master node cluster couldn’t handle the traffic spike and went down, causing $2M in lost revenue during peak shopping hours.
HA Solution Implemented:
Architecture Changes:
- Migrated from single master to 3-master HA setup
- Implemented external etcd cluster with 5 nodes
- Added multiple availability zones
- Configured automated scaling for worker nodes
Technical Implementation:
# Production deployment with anti-affinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ecommerce-frontend
spec:
  replicas: 12
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 2
  selector:
    matchLabels:
      app: ecommerce-frontend
  template:
    metadata:
      labels:
        app: ecommerce-frontend
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: one pod per node, so 12 replicas need at least 12 schedulable nodes
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - ecommerce-frontend
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Well-known zone label (a bare kubernetes.io/zone label is not set on nodes)
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-west-2a
                - us-west-2b
                - us-west-2c
      containers:
      - name: frontend
        image: ecommerce/frontend:v2.1
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
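Once a Deployment like this is rolled out, it is worth confirming that the required anti-affinity rule on `kubernetes.io/hostname` actually spread the pods. The sketch below counts pods per node from `kubectl` output. The helper name `max_pods_per_node` is hypothetical, and pods that are still Pending (shown with node `<none>`) are ignored.

```shell
# max_pods_per_node: reads "POD NODE" pairs on stdin and prints the largest
# number of scheduled pods that share a single node.
max_pods_per_node() {
  awk '$2 != "<none>" { count[$2]++ }
       END {
         max = 0
         for (n in count) if (count[n] > max) max = count[n]
         print max
       }'
}
```

It can be fed with `kubectl get pods -l app=ecommerce-frontend --no-headers -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName | max_pods_per_node`. With the hard anti-affinity rule in place the result should be 1; a higher number suggests the rule was not applied to the running pods.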
Monitoring Setup:
# SLI/SLO monitoring for HA effectiveness
apiVersion: v1
kind: ConfigMap
metadata:
  name: slo-config
data:
  config.yaml: |
    slis:
    - name: api_availability
      description: "API server availability"
      query: 'avg_over_time(up{job="kubernetes-apiservers"}[5m])'
      target: 0.999 # 99.9% uptime SLO
    - name: etcd_availability
      description: "etcd cluster availability"
      query: 'min(etcd_server_has_leader)'
      target: 1.0 # 100% leader availability
    - name: pod_startup_time
      description: "Time for pods to become ready"
      # histogram_quantile needs rate() over the _bucket series of a histogram (here the kubelet's)
      query: 'histogram_quantile(0.95, sum by (le) (rate(kubelet_pod_start_duration_seconds_bucket[5m])))'
      target: 30 # 95th percentile under 30 seconds
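An availability target is easier to reason about once it is converted into an error budget, that is, the downtime the SLO permits over a given window. The following sketch does that arithmetic for the 99.9% API availability target above. The 30-day window and the helper name `error_budget_minutes` are assumptions made for illustration.

```shell
# error_budget_minutes SLO DAYS: prints the minutes of downtime an
# availability target (e.g. 0.999) allows over a window of DAYS days.
error_budget_minutes() {
  awk -v slo="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", (1 - slo) * days * 24 * 60 }'
}

error_budget_minutes 0.999 30   # prints 43.2
```

At 99.9%, a single 45-minute control plane outage would already exceed the whole 30-day budget, which is one way to frame why the single-master setup could not meet this target.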
Results After HA Implementation:
- Zero control plane downtime during next Black Friday (20x traffic increase)
- Recovery time reduced from 45 minutes to under 30 seconds for node failures
- Cost optimization through better resource utilization and spot instances
- Monitoring improvements with 95% reduction in false positive alerts
Key Lessons Learned:
- Don’t wait for failure – Implement HA before you need it
- Test failure scenarios – Regularly conduct chaos engineering exercises
- Monitor everything – Use comprehensive metrics and alerting
- Document procedures – Ensure team can respond quickly to incidents
- Automate recovery – Reduce manual intervention during failures
Business Impact of HA
Cost-Benefit Analysis:
Without HA:
- Single master failure = 45 minutes downtime
- Peak traffic outage cost = $2M/hour
- Engineer overtime for manual recovery = $5K/incident
- Customer trust impact = Immeasurable
With HA:
- Infrastructure cost increase = 200% (3 masters vs 1)
- Operational complexity increase = 150%
- Downtime reduction = 99%
- ROI = 400% after first major outage prevention
Career Development Benefits:
For DevOps Engineers:
- Deep understanding of distributed systems concepts
- Experience with production-grade architecture decisions
- Troubleshooting skills for complex failure scenarios
- Leadership opportunities in incident response
For Organizations:
- Reduced operational risk and improved SLAs
- Better resource utilization and cost optimization
- Improved team confidence in production deployments
- Foundation for advanced features like multi-cluster, disaster recovery
Frequently Asked Questions (FAQ)
What is Kubernetes High Availability?
Kubernetes High Availability ensures that control plane and worker nodes are deployed redundantly so the cluster keeps running even if some nodes fail. This eliminates single points of failure and provides continuous cluster operations during hardware or software failures.
How many control plane nodes are recommended for HA?
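At least three control plane nodes are recommended for an HA cluster. etcd needs a majority of its members (quorum, floor(n/2) + 1) to accept writes, so three members tolerate one failure and five tolerate two. Odd sizes (3, 5, 7) are a recommendation rather than a hard requirement: adding a fourth member raises quorum to three without tolerating any additional failure. Majority voting is also what prevents split-brain, because two network partitions can never both hold a majority.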
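The quorum arithmetic can be tabulated in a few lines of shell. This is a small illustrative sketch; the function names `quorum_size` and `fault_tolerance` are made up for the example.

```shell
# quorum_size N: members that must agree in an N-member etcd cluster.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# fault_tolerance N: members that can fail while quorum is still held.
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5 6 7; do
  echo "members=$n quorum=$(quorum_size "$n") tolerates=$(fault_tolerance "$n")"
done
```

The output shows that 3 and 4 members both tolerate one failure, and 5 and 6 both tolerate two, which is why even cluster sizes add cost without adding resilience.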
What is the difference between stacked and external etcd topologies?
In stacked topology, etcd runs alongside control plane nodes on the same machines, making it simpler and more cost-effective. In external topology, etcd runs on a separate dedicated cluster for better isolation, performance, and scalability but requires more infrastructure.
How is API server HA achieved?
Multiple API servers are placed behind a load balancer so traffic continues even if one server fails. The load balancer distributes requests across healthy API server instances and automatically removes failed servers from rotation using health checks.
What happens when an etcd node fails in an HA cluster?
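When one etcd node fails in a 3-node cluster, the remaining 2 nodes maintain quorum and the cluster continues operating normally. The failed node can be replaced without downtime. However, if 2 nodes fail simultaneously, the cluster loses quorum: etcd stops accepting writes and can no longer serve consistent reads, so the API server cannot process changes. Pods that are already running keep running, but nothing can be scheduled, scaled, or updated until quorum is restored.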
Conclusion & Next Steps
Kubernetes High Availability architecture represents the foundation of production-ready container orchestration. Throughout this comprehensive guide, we’ve explored how eliminating single points of failure through redundant control planes, clustered etcd deployments, and proper load balancing creates resilient systems that can withstand component failures while maintaining service availability.
The key takeaways from our deep dive include:
Multi-master setups with at least three control plane nodes provide the redundancy needed for production workloads, while etcd clustering with odd-numbered nodes ensures data consistency and quorum-based decision making. External load balancers serve as the critical entry point that enables seamless failover between API server instances.
Choosing between stacked and external etcd topologies depends on your specific requirements—stacked offers simplicity and cost savings, while external provides better isolation and scalability. Cloud-managed services like EKS, GKE, and AKS deliver HA capabilities out of the box, while self-managed clusters offer complete control at the cost of operational complexity.
Security remains paramount in HA environments, requiring TLS encryption for all component communications, proper RBAC implementation, and etcd encryption at rest. Regular certificate rotation and comprehensive monitoring ensure your cluster remains both available and secure.
The hands-on kubeadm lab demonstrated that implementing HA isn’t just theoretical—with proper planning and execution, you can build production-grade clusters that handle real-world failures gracefully. Understanding common pitfalls like split-brain scenarios, certificate expiration, and load balancer misconfiguration prepares you to troubleshoot effectively when issues arise.
Your journey toward Kubernetes mastery continues here. HA concepts form the foundation, but production environments require additional layers like monitoring, logging, backup strategies, and disaster recovery planning.
Next recommended reading: Kubernetes Control Plane Deep Dive Guide – Master the inner workings of API servers, controllers, and schedulers.
Downloadable HA Cheat Sheet
Quick Reference: Kubernetes HA Architecture
🔧 HA Topology Decision Matrix
| Requirement | Stacked etcd | External etcd |
|---|---|---|
| Small clusters (<50 nodes) | ✅ Recommended | ⚠️ Overkill |
| Large clusters (>100 nodes) | ❌ Not recommended | ✅ Recommended |
| Cost optimization | ✅ Lower cost | ❌ Higher cost |
| Performance isolation | ❌ Shared resources | ✅ Dedicated |
| Operational simplicity | ✅ Easier | ❌ More complex |
| Maximum availability | ⚠️ Good | ✅ Excellent |
🔍 Essential Commands Cheat Sheet
# Check cluster health
kubectl get nodes
kubectl get pods -n kube-system
kubectl get componentstatuses # Deprecated since v1.19; prefer the /livez and /readyz endpoints below
# etcd cluster status
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# Certificate expiration check
sudo kubeadm certs check-expiration
# API server health endpoints
curl -k https://API_SERVER:6443/livez
curl -k https://API_SERVER:6443/readyz
curl -k https://API_SERVER:6443/healthz # Deprecated since v1.16; kept for older tooling
⚡ Emergency Recovery Procedures
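# etcd backup (TLS flags as used on a kubeadm control plane node)
sudo etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
# etcd restore (run on each member with its own --name and peer URL)
sudo etcdctl snapshot restore backup.db \
  --name master1 \
  --data-dir /var/lib/etcd-restore \
  --initial-cluster master1=https://10.0.1.10:2380,master2=https://10.0.1.11:2380,master3=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls https://10.0.1.10:2380
# Certificate renewal (the control plane static pods must also be restarted to load the new certificates)
sudo kubeadm certs renew all
sudo systemctl restart kubelet
# Reset and rejoin node
sudo kubeadm reset
sudo kubeadm join LOAD_BALANCER:6443 --token TOKEN --control-plane...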
📊 Monitoring Metrics
- up{job="kubernetes-apiservers"} – API server availability
- etcd_server_has_leader – etcd leader status
- kube_node_status_condition{condition="Ready"} – Node health
- kube_pod_container_status_ready – Pod readiness
🎯 Production Checklist
- ✅ Odd number of etcd nodes (3, 5, 7)
- ✅ Load balancer with health checks
- ✅ TLS certificates for all components
- ✅ Regular etcd backups automated
- ✅ Multi-zone node distribution
- ✅ Resource requests/limits on critical pods
- ✅ Monitoring and alerting configured
- ✅ Runbook for common failure scenarios
- ✅ Disaster recovery procedures tested
- ✅ Regular chaos engineering exercises
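For the "regular etcd backups automated" item, a scheduled job usually pairs a snapshot with a retention rule. The sketch below is one possible shape, not a tested production script. The `/backup` directory, the timestamped file naming, and the function names `take_snapshot` and `prune_snapshots` are all assumptions; the certificate paths are the kubeadm defaults used elsewhere in this guide.

```shell
BACKUP_DIR=${BACKUP_DIR:-/backup}   # assumed backup location

# take_snapshot: saves a timestamped snapshot from the local etcd member.
take_snapshot() {
  etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$BACKUP_DIR/etcd-$(date +%Y%m%d-%H%M%S).db"
}

# prune_snapshots DIR KEEP: deletes the oldest etcd-*.db files so that at
# most KEEP remain. Relies on the timestamped names sorting chronologically.
prune_snapshots() {
  dir=$1
  keep=$2
  total=$(find "$dir" -maxdepth 1 -name 'etcd-*.db' | wc -l)
  excess=$(( total - keep ))
  [ "$excess" -gt 0 ] || return 0
  find "$dir" -maxdepth 1 -name 'etcd-*.db' | sort | head -n "$excess" |
    while IFS= read -r f; do rm -f -- "$f"; done
}
```

A root cron entry could then run `take_snapshot && prune_snapshots "$BACKUP_DIR" 7`, so old snapshots are only pruned after a new one has been written. Copying snapshots off the node is still needed for the backup to survive the loss of that machine.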
Save this cheat sheet for quick reference during production deployments and troubleshooting sessions.
