Kubernetes High Availability Architecture: Complete Guide for DevOps Engineers 2025

TL;DR + Key Takeaways

What You’ll Learn & Do:

How to design a 3-master HA control plane with etcd quorum best practices
kubeadm HA lab: exact commands to bootstrap a self-managed HA cluster
Checklist & emergency recovery commands for on-call incidents

Estimated Read Time: 12-15 minutes | Difficulty: Intermediate to Advanced

Introduction: Kubernetes High Availability Architecture

Picture this: It’s 3 AM, and your production Kubernetes cluster just lost its single master node due to a hardware failure. Your entire application stack is down, customers can’t access services, and you’re frantically trying to restore from backups while calculating the revenue impact of each passing minute.

Now imagine the same scenario, but with a properly configured High Availability Kubernetes cluster. One master node fails, but the other two seamlessly take over. Your applications continue running, users remain unaffected, and you can address the failed node during normal business hours.

This is why Kubernetes High Availability architecture is not a nice-to-have feature: it is a baseline requirement for any production environment where downtime translates to lost revenue, damaged reputation, and sleepless nights for your engineering team.

In this comprehensive guide, we’ll dive deep into everything you need to know about designing, implementing, and maintaining highly available Kubernetes clusters that can withstand component failures while keeping your applications running smoothly.

What Does High Availability Mean in Kubernetes?

High Availability in Kubernetes refers to the architectural approach of eliminating single points of failure across both the control plane and data plane components. When properly implemented, your cluster can continue operating normally even when individual nodes, components, or entire availability zones experience failures.

Let’s illustrate this with a concrete example:

Single Master Scenario (Non-HA):

When the master node fails, your entire cluster becomes unmanageable. You can’t deploy new pods, scale applications, or perform any cluster operations, even though worker nodes might still be running existing workloads.

Multi-Master HA Scenario:

In this setup, if Master1 fails, the load balancer redirects traffic to Master2 and Master3, ensuring continuous cluster operation.

The key difference is resilience through redundancy. Instead of hoping nothing breaks, we design systems that gracefully handle failures as a normal part of operations.

Kubernetes HA Architecture Overview

Understanding the complete picture of Kubernetes HA architecture requires examining both the control plane and worker node components, along with how they interact to provide seamless failover capabilities.

Control Plane in HA Mode

The Kubernetes control plane consists of several critical components that must be made highly available:

API Server: Multiple API server instances run simultaneously behind a load balancer. Each instance is stateless, making horizontal scaling straightforward.

etcd Cluster: The distributed key-value store runs in cluster mode with multiple nodes to ensure data consistency and availability.

Controller Manager: Runs in active-passive mode with leader election. Only one instance is active at a time, but others stand ready to take over.

Scheduler: Also uses leader election for active-passive operation, ensuring only one scheduler makes pod placement decisions at any given time.

Worker Nodes in HA Setup

Worker nodes contribute to overall cluster availability through:

Multi-Zone Distribution: Spreading worker nodes across multiple availability zones prevents single zone failures from taking down your entire application stack.

Node Redundancy: Running enough worker nodes to handle the failure of several nodes without capacity issues.

Pod Replicas: Applications deployed with multiple replicas across different nodes ensure service continuity during node failures.

Complete HA Architecture Diagram

This architecture provides multiple layers of redundancy:

Zone-level redundancy: Components spread across availability zones
Node-level redundancy: Multiple nodes of each type
Component-level redundancy: Multiple instances of critical services
Network-level redundancy: Load balancers provide traffic distribution and failover

Key Components in HA Setup

Multiple API Servers Behind Load Balancer

The API server is the gateway to your Kubernetes cluster, handling all REST API requests from kubectl, kubelet, and other components. In an HA setup:

Load Balancer Configuration:

# Example HAProxy configuration for API servers
global
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend k8s-api-frontend
    bind *:6443
    mode tcp
    default_backend k8s-api-backend

backend k8s-api-backend
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.0.1.10:6443 check
    server master2 10.0.1.11:6443 check
    server master3 10.0.1.12:6443 check

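Health Check Configuration: The load balancer continuously monitors API server health and automatically removes failed instances from rotation. The "check" keyword in the configuration above only verifies that a TCP connection to port 6443 succeeds; to act on the API server’s own /livez and /readyz endpoints, the load balancer needs an HTTPS health check.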

etcd Cluster Configuration

etcd is perhaps the most critical component requiring careful HA design. It stores all cluster state, including:

Pod specifications and status
Service definitions
ConfigMaps and Secrets
RBAC policies

Quorum Requirements: etcd uses the Raft consensus algorithm, which requires a majority of nodes to be available for the cluster to function. This is why you need an odd number of etcd nodes:

3 nodes: Can tolerate 1 failure (2 nodes = majority)
5 nodes: Can tolerate 2 failures (3 nodes = majority)
7 nodes: Can tolerate 3 failures (4 nodes = majority)
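The quorum arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration of the majority rule described above; the function names are made up for this example and are not part of etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Running it shows that 4 members tolerate no more failures than 3, and 6 no more than 5, which is why even cluster sizes add cost without adding resilience.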

etcd Cluster Bootstrap Example:

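# Node 1 (10.0.1.10)
# The https:// URLs require TLS material; the certificate paths below are assumed (kubeadm-style layout)
etcd --name=etcd-1 \
  --data-dir=/var/lib/etcd \
  --initial-advertise-peer-urls=https://10.0.1.10:2380 \
  --listen-peer-urls=https://10.0.1.10:2380 \
  --advertise-client-urls=https://10.0.1.10:2379 \
  --listen-client-urls=https://10.0.1.10:2379,https://127.0.0.1:2379 \
  --initial-cluster=etcd-1=https://10.0.1.10:2380,etcd-2=https://10.0.1.11:2380,etcd-3=https://10.0.1.12:2380 \
  --initial-cluster-state=new \
  --initial-cluster-token=k8s-etcd-cluster \
  --cert-file=/etc/kubernetes/pki/etcd/server.crt \
  --key-file=/etc/kubernetes/pki/etcd/server.key \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
  --client-cert-auth=true \
  --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
  --peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
  --peer-client-cert-auth=true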

Controller Manager & Scheduler Redundancy

Unlike API servers, the controller manager and scheduler use leader election to ensure only one active instance at a time, preventing conflicting operations:

Leader Election in Practice:

# Controller Manager configuration with leader election
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
spec:
  containers:
  - command:
    - kube-controller-manager
    - --leader-elect=true
    - --leader-elect-lease-duration=15s
    - --leader-elect-renew-deadline=10s
    - --leader-elect-retry-period=2s
    name: kube-controller-manager

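When the active controller manager fails, another instance takes over once the current lease expires. With the settings above that is roughly the 15-second lease duration plus up to one 2-second retry period, so cluster operations resume after about 15 to 20 seconds rather than instantly.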
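To make the lease mechanics concrete, here is a deliberately simplified Python model of lease-based leader election. It is a sketch for intuition only, not the client-go implementation: the Lease class and its try_acquire method are hypothetical names, and real components store the lease in a coordination.k8s.io Lease object and add jitter to their retries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    duration: float                  # seconds, like --leader-elect-lease-duration
    holder: Optional[str] = None     # identity of the current leader, if any
    renewed_at: float = 0.0          # time of the last successful acquire/renew

    def try_acquire(self, candidate: str, now: float) -> bool:
        """Acquire or renew the lease; returns True if `candidate` is leader afterwards."""
        expired = self.holder is None or now - self.renewed_at >= self.duration
        if self.holder == candidate or expired:
            self.holder = candidate
            self.renewed_at = now
            return True
        return False


lease = Lease(duration=15.0)
print(lease.try_acquire("master1", now=0.0))   # master1 becomes leader
print(lease.try_acquire("master2", now=2.0))   # rejected: lease held by master1
print(lease.try_acquire("master1", now=10.0))  # master1 renews
# master1 crashes after its renewal at t=10
print(lease.try_acquire("master2", now=24.0))  # rejected: only 14s since last renewal
print(lease.try_acquire("master2", now=26.0))  # accepted: lease expired, master2 leads
```

The standby instance keeps retrying every retry period, so the gap between the old leader's last renewal and the takeover is bounded by roughly the lease duration plus one retry interval.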

Worker Node Scaling and Pod Replicas

Worker nodes in HA clusters should be:

Distributed Across Zones: Prevents single availability zone failures from impacting application availability.

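Sized for N+1 (or Better) Redundancy: N+1 means one spare node beyond what peak load requires. If you need capacity for 100 pods, provisioning for about 150 pods across multiple nodes gives you more headroom than that: with six equally sized nodes, for example, you can lose two and still fit the full workload.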
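The sizing rule can be checked with simple arithmetic. The sketch below assumes equally sized nodes and a hypothetical figure of 25 pods per node; both function names are invented for this example.

```python
import math


def nodes_needed(required_pods: int, pods_per_node: int, tolerated_failures: int) -> int:
    """Nodes to provision so the workload still fits after `tolerated_failures` nodes are lost."""
    base = math.ceil(required_pods / pods_per_node)
    return base + tolerated_failures


def surviving_capacity(nodes: int, pods_per_node: int, failed: int) -> int:
    """Pod capacity that remains after `failed` nodes go down."""
    return max(nodes - failed, 0) * pods_per_node


PODS_PER_NODE = 25  # assumed node size, for illustration only

for spare in (0, 1, 2):
    n = nodes_needed(100, PODS_PER_NODE, spare)
    print(f"N+{spare}: {n} nodes, total capacity {n * PODS_PER_NODE} pods, "
          f"capacity after {spare} failure(s): {surviving_capacity(n, PODS_PER_NODE, spare)}")
```

Under these assumptions, N+2 sizing lands on six nodes and 150 pods of capacity, which matches the 100-to-150 example above.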

Example Deployment with Anti-Affinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-app
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web-app
        image: nginx:1.21

This configuration ensures pod replicas are distributed across different nodes, preventing single node failures from taking down the entire application.

HA Deployment Models

When designing your Kubernetes HA architecture, you’ll need to choose between two primary etcd deployment topologies, each with distinct advantages and trade-offs.

Stacked etcd Topology

In the stacked topology, etcd runs as a static pod on the same nodes as other control plane components.

Architecture:

Pros:

Simplified deployment: Fewer nodes to manage and configure
Lower infrastructure costs: Uses existing control plane nodes
Easier networking: No additional network configuration required
Built-in co-location: Each API server talks to the etcd member on its own node, so there are fewer moving parts to inspect when troubleshooting

Cons:

Coupled failures: If a control plane node fails, you lose both API server capacity and etcd capacity
Resource contention: etcd competes with other control plane components for CPU and memory
Less isolation: etcd performance can be impacted by other control plane workloads

Kubernetes official HA documentation

External etcd Topology

In the external topology, etcd runs on dedicated nodes separate from the control plane.

Architecture:

Pros:

Better isolation: etcd gets dedicated resources and doesn’t compete with other components
Independent scaling: Can scale etcd cluster independently of control plane
Decoupled failures: Control plane node failure doesn’t affect etcd capacity
Performance optimization: Can tune etcd nodes specifically for database workloads

Cons:

Higher complexity: More nodes to manage and configure
Increased costs: Requires additional infrastructure
Network overhead: Additional network hops between control plane and etcd
More failure modes: Additional network and node failure scenarios to handle

Cost Comparison: Single vs HA Architecture

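Configuration	Infrastructure	Monthly Cost*	Outage Impact	ROI After First Outage
Single Master	1 master + 3 workers	$480/month	45 min ≈ $50K loss	N/A
Stacked HA	3 masters + 3 workers	$960/month	<30 sec ≈ $0 loss	~10,400%
External etcd HA	3 masters + 3 etcd + 3 workers	$1,440/month	<30 sec ≈ $0 loss	~5,200%

*Illustrative figures. Assumes a single 45-minute control plane outage costs about $50K; ROI is that avoided loss divided by the added monthly cost over the single-master baseline. Instance pricing is based on AWS m5.large on-demand ($0.096/hr), which works out to roughly $70 per node per month for compute alone, so read the monthly totals as rough estimates rather than exact bills.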
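The ROI column can be reproduced from the other columns. The sketch below assumes ROI here means the avoided $50K outage loss divided by the extra monthly spend over the single-master baseline; that definition is inferred from the table's numbers, and the variable names are illustrative.

```python
BASELINE_MONTHLY = 480       # single-master cluster, from the table
AVOIDED_LOSS = 50_000        # cost of the one outage the HA setup avoids

options = {
    "Stacked HA": 960,
    "External etcd HA": 1440,
}


def roi_percent(monthly_cost: float,
                baseline: float = BASELINE_MONTHLY,
                avoided_loss: float = AVOIDED_LOSS) -> float:
    """Avoided loss as a percentage of the added monthly cost."""
    extra = monthly_cost - baseline
    return avoided_loss / extra * 100


for name, cost in options.items():
    print(f"{name}: +${cost - BASELINE_MONTHLY}/month, ROI = {roi_percent(cost):,.0f}%")
```

The results round to the 10,400% and 5,200% shown in the table. Plugging in your own outage cost and instance prices is more useful than the absolute numbers here.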

Comparison Table

Aspect	Stacked etcd	External etcd
Infrastructure Cost	Lower (3 nodes)	Higher (6+ nodes)
Operational Complexity	Simpler	More complex
Failure Tolerance	Coupled failures	Independent failures
Resource Isolation	Shared resources	Dedicated resources
Network Latency	Lower (local)	Higher (network)
Scalability	Limited	Independent
Recommended Use Case	Dev/Test, Small prod	Large prod, Critical systems

Load Balancer & Networking Considerations

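The load balancer sits at the heart of your Kubernetes HA architecture, serving as the single entry point for all API server traffic. That also makes it a potential single point of failure, so run it redundantly (for example, a managed cloud load balancer or a pair of HAProxy nodes sharing a virtual IP). Getting this right is crucial for both availability and performance.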

API Server HA with External Load Balancer

Layer 4 (TCP) vs Layer 7 (HTTP) Load Balancing:

For Kubernetes API servers, Layer 4 TCP load balancing is typically preferred because:

The API server carries long-lived and upgraded connections (watches, exec, port-forward) in addition to plain HTTP requests
TLS termination should happen at the API server for security, which also keeps client certificate authentication working
It provides lower latency compared to Layer 7 inspection

Cloud Provider Load Balancer Examples

AWS Network Load Balancer (NLB) Configuration:

{
  "Type": "network",
  "Scheme": "internal",
  "IpAddressType": "ipv4",
  "Listeners": [
    {
      "Protocol": "TCP",
      "Port": 6443,
      "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/k8s-api-servers"
    }
  ],
  "HealthCheck": {
    "Protocol": "TCP",
    "Port": "6443",
    "HealthyThreshold": 2,
    "UnhealthyThreshold": 2,
    "Interval": 10
  }
}

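GCP Internal TCP Load Balancer with Health Checks:

# Sketch for a self-managed cluster; resource names, region, and instance group are placeholders
gcloud compute health-checks create tcp k8s-api-hc \
  --region=us-central1 \
  --port=6443

gcloud compute backend-services create k8s-api-backend \
  --load-balancing-scheme=internal \
  --protocol=tcp \
  --region=us-central1 \
  --health-checks=k8s-api-hc \
  --health-checks-region=us-central1

# Add one instance group of control plane nodes per zone
gcloud compute backend-services add-backend k8s-api-backend \
  --region=us-central1 \
  --instance-group=k8s-masters-a \
  --instance-group-zone=us-central1-a

# Uses the default network unless --network and --subnet are given
gcloud compute forwarding-rules create k8s-api-lb \
  --load-balancing-scheme=internal \
  --region=us-central1 \
  --ports=6443 \
  --backend-service=k8s-api-backend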

DNS and Service Discovery

Internal DNS Configuration: Set up internal DNS records that point to your load balancer:

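# Example DNS entry for the API load balancer
k8s-api.internal.example.com.    300    IN    A    10.0.1.100
# Do not publish kubernetes.default.svc.cluster.local here: cluster DNS (CoreDNS)
# serves that name inside the cluster and resolves it to the Service ClusterIP.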

kubeconfig for HA Cluster:

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTi...
    server: https://k8s-api.internal.example.com:6443
  name: ha-cluster
contexts:
- context:
    cluster: ha-cluster
    user: admin
  name: ha-cluster-admin
current-context: ha-cluster-admin
users:
- name: admin
  user:
    client-certificate-data: LS0tLS1CRUdJTi...
    client-key-data: LS0tLS1CRUdJTi...

Health Checks and Failover

API Server Health Endpoints:

/livez – Liveness check (is the server running?)
/readyz – Readiness check (is the server ready to serve traffic?)
/healthz – General health check (deprecated but still available)

Advanced Health Check Configuration:

# HAProxy health check over HTTPS against /readyz
# (port 6443 speaks TLS, so a plaintext HTTP probe would never receive a 200 response)
backend k8s-api-backend
    mode tcp
    balance roundrobin
    option httpchk GET /readyz
    http-check expect status 200
    server master1 10.0.1.10:6443 check check-ssl verify none inter 5s rise 2 fall 3
    server master2 10.0.1.11:6443 check check-ssl verify none inter 5s rise 2 fall 3
    server master3 10.0.1.12:6443 check check-ssl verify none inter 5s rise 2 fall 3
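The rise and fall parameters determine how quickly a backend is marked down or brought back. The following Python sketch models that behaviour as a small state machine. It is a simplified illustration of the semantics, not HAProxy's actual code, and track_server is a hypothetical name.

```python
def track_server(results, rise=2, fall=3, initially_up=True):
    """Return the UP/DOWN state after each health check result (True = check passed)."""
    up = initially_up
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in results:
        if ok == up:
            streak = 0
        else:
            streak += 1
            threshold = fall if up else rise
            if streak >= threshold:
                up = not up
                streak = 0
        states.append("UP" if up else "DOWN")
    return states


# One passing check, four failures, then two passing checks
checks = [True, False, False, False, False, True, True]
print(track_server(checks))
```

With inter 5s and fall 3, a dead API server stays in rotation for up to about 15 seconds before being removed; rise 2 means it needs two consecutive good checks, about 10 seconds, to return.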

Connection Draining: When an API server needs maintenance, graceful shutdown allows existing connections to complete while preventing new connections.

Cloud vs On-Premises HA Architectures

The choice between cloud-managed and self-managed Kubernetes HA depends on your requirements for control, cost, and operational complexity.

Cloud Provider Managed HA

AWS EKS (Elastic Kubernetes Service): EKS provides fully managed control plane HA out of the box:

# Create EKS cluster with HA control plane
aws eks create-cluster \
  --name production-cluster \
  --kubernetes-version 1.28 \
  --role-arn arn:aws:iam::123456789012:role/eks-service-role \
  --resources-vpc-config subnetIds=subnet-12345,subnet-67890,subnet-abcde

What AWS manages:

Multiple API servers across AZs
etcd cluster with automated backups
Control plane scaling and updates
Load balancer for API server access
Certificate rotation and security patches

GCP GKE (Google Kubernetes Engine):

# Create regional GKE cluster (multi-zone HA)
gcloud container clusters create production-cluster \
  --region=us-central1 \
  --num-nodes=2 \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=10 \
  --enable-autorepair \
  --enable-autoupgrade

Azure AKS (Azure Kubernetes Service):

# Create AKS cluster with availability zones
az aks create \
  --resource-group myResourceGroup \
  --name production-cluster \
  --node-count 3 \
  --zones 1 2 3 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5

On-Premises HA with kubeadm

For on-premises deployments, you have complete control but also complete responsibility for HA setup.

Infrastructure Prerequisites:

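Stacked etcd topology: minimum 3 control plane nodes, plus worker nodes (6 nodes with 3 workers)
External etcd topology: minimum 3 control plane nodes and 3 dedicated etcd nodes, plus worker nodes (9 nodes with 3 workers)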
External load balancer (HAProxy, NGINX, F5, etc.)
Shared storage for persistent volumes (optional but recommended)

Real-World Example: HA Setup in AWS with Self-Managed Nodes

Step 1: Infrastructure Setup

# Create VPC and subnets across 3 AZs
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=k8s-vpc}]'

# Create subnets in different AZs
aws ec2 create-subnet --vpc-id vpc-12345 --cidr-block 10.0.1.0/24 --availability-zone us-west-2a
aws ec2 create-subnet --vpc-id vpc-12345 --cidr-block 10.0.2.0/24 --availability-zone us-west-2b
aws ec2 create-subnet --vpc-id vpc-12345 --cidr-block 10.0.3.0/24 --availability-zone us-west-2c

# Create Network Load Balancer
aws elbv2 create-load-balancer \
  --name k8s-api-nlb \
  --scheme internal \
  --type network \
  --subnets subnet-12345 subnet-67890 subnet-abcde

Step 2: Launch EC2 Instances

# Auto Scaling Group for master nodes
MasterASG:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    VPCZoneIdentifier:
      - !Ref PrivateSubnet1
      - !Ref PrivateSubnet2  
      - !Ref PrivateSubnet3
    LaunchTemplate:
      LaunchTemplateId: !Ref MasterLaunchTemplate
      Version: !GetAtt MasterLaunchTemplate.LatestVersionNumber
    MinSize: 3
    MaxSize: 3
    DesiredCapacity: 3
    TargetGroupARNs:
      - !Ref APIServerTargetGroup
    Tags:
      - Key: Name
        Value: k8s-master
        PropagateAtLaunch: true
      - Key: kubernetes.io/role
        Value: master
        PropagateAtLaunch: true

Step 3: Load Balancer Target Group

{
  "TargetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/k8s-api-servers",
  "Targets": [
    {"Id": "i-1234567890abcdef0", "Port": 6443},
    {"Id": "i-0987654321fedcba0", "Port": 6443},
    {"Id": "i-abcdef1234567890", "Port": 6443}
  ],
  "HealthCheckProtocol": "TCP",
  "HealthCheckPort": "6443",
  "HealthyThresholdCount": 2,
  "UnhealthyThresholdCount": 2
}

The steps above outline a production-grade HA control plane on AWS EC2 instances. Running it yourself has clear trade-offs.

Benefits of Self-Managed HA:

Complete control over Kubernetes version and configuration
Custom networking and security policies
Cost optimization through reserved instances and spot pricing
Integration with existing infrastructure and monitoring

Challenges:

Operational overhead for updates, backups, and monitoring
Need for deep Kubernetes expertise
Responsibility for security patching and compliance
Higher complexity in troubleshooting

Security in HA Clusters

High availability inherently increases your attack surface by introducing more components and network connections. Implementing robust security measures is essential.

TLS Everywhere

Every component in your HA cluster must communicate over encrypted channels.

API Server TLS configuration:

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.0.1.10
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --etcd-servers=https://10.0.1.10:2379,https://10.0.1.11:2379,https://10.0.1.12:2379
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key

Certificate Management for HA:

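# Write the OpenSSL request config for the API server certificate (file name is an example)
cat > apiserver-openssl.cnf <<EOF
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[v3_req]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names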
[alt_names]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster.local
DNS.5 = k8s-api.internal.example.com
IP.1 = 10.96.0.1
IP.2 = 10.0.1.10
IP.3 = 10.0.1.11
IP.4 = 10.0.1.12
IP.5 = 10.0.1.100
EOF

RBAC (Role-Based Access Control)

Multi-Master RBAC Considerations:

# ClusterRole for HA cluster management
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-admin-ha
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch", "patch"]
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-admin-ha-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin-ha
subjects:
- kind: ServiceAccount
  name: cluster-admin-ha
  namespace: kube-system

etcd Encryption at Rest

Enable etcd Encryption:

# EncryptionConfiguration for etcd
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  - configmaps
  providers:
  - aescbc:
      keys:
      - name: key1
        # Example value only: generate your own random key and keep it out of version control
        secret: c2VjcmV0IGlzIHNlY3VyZQ==
  - identity: {}
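The secret field expects a base64-encoded random key. One way to produce one is sketched below in Python, assuming the 32-byte key size that the Kubernetes documentation recommends for the aescbc provider; new_encryption_key is a hypothetical helper name.

```python
import base64
import secrets


def new_encryption_key(num_bytes: int = 32) -> str:
    """Return a base64-encoded random key suitable for the `secret` field."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


key = new_encryption_key()
print(key)  # paste into the EncryptionConfiguration; never commit it to version control
```

Every API server in the HA cluster must use the same key list, otherwise an instance may be unable to decrypt data written by another.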

Apply encryption config to all API servers:

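# Update API server configuration on all master nodes (kubeadm already mounts /etc/kubernetes/pki into the API server pod)
sudo cp encryption-config.yaml /etc/kubernetes/pki/encryption-config.yaml
# Add the flag as its own list item under the kube-apiserver command, then save;
# the kubelet restarts the static pod when the manifest changes:
#     - --encryption-provider-config=/etc/kubernetes/pki/encryption-config.yaml
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml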

Network Security

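Network Policies for Control Plane: The policy below is illustrative. Network policies are enforced by the CNI plugin, and kubeadm runs the API server as a host-network static pod, which most plugins do not filter, so protect the control plane nodes with host firewalls or cloud security groups as well.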

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: control-plane-access
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      component: kube-apiserver
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: TCP
      port: 6443
  # No rule for port 8080: the insecure API port no longer exists in current Kubernetes releases
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: TCP
      port: 2379
    - protocol: TCP
      port: 2380

Hands-On Lab: kubeadm HA Setup

⚠️ SAFETY WARNING
This lab modifies etcd configuration and certificates. Never run these commands on production clusters. Use dedicated test/lab environments only. Always backup etcd before making changes: etcdctl snapshot save backup-$(date +%Y%m%d-%H%M%S).db

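Now let’s walk through creating an HA cluster using kubeadm, following the same layout you would use in production. This lab assumes you have seven Ubuntu 20.04 nodes (three control plane, three worker, one load balancer), with a container runtime such as containerd plus kubeadm, kubelet, and kubectl installed on the six cluster nodes.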

Prerequisites Setup

Node Specifications:

3 Master nodes: 2 vCPU, 4GB RAM, 20GB disk each
3 Worker nodes: 2 vCPU, 4GB RAM, 50GB disk each
1 Load balancer node: 1 vCPU, 2GB RAM, 10GB disk

Network Requirements:

# Ensure all nodes can communicate
# Master nodes: 10.0.1.10, 10.0.1.11, 10.0.1.12
# Worker nodes: 10.0.1.20, 10.0.1.21, 10.0.1.22
# Load balancer: 10.0.1.100
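Before provisioning, it can help to sanity-check the address plan. The Python sketch below assumes all lab nodes share the 10.0.1.0/24 subnet, which this section does not state explicitly; validate_plan is a hypothetical helper.

```python
import ipaddress

SUBNET = ipaddress.ip_network("10.0.1.0/24")  # assumed lab subnet

NODES = {
    "master1": "10.0.1.10",
    "master2": "10.0.1.11",
    "master3": "10.0.1.12",
    "worker1": "10.0.1.20",
    "worker2": "10.0.1.21",
    "worker3": "10.0.1.22",
    "loadbalancer": "10.0.1.100",
}


def validate_plan(nodes, subnet):
    """Return a list of problems: addresses outside the subnet or assigned twice."""
    problems = []
    seen = {}
    for name, ip in nodes.items():
        addr = ipaddress.ip_address(ip)
        if addr not in subnet:
            problems.append(f"{name} ({ip}) is outside {subnet}")
        if addr in seen:
            problems.append(f"{name} reuses {ip}, already assigned to {seen[addr]}")
        else:
            seen[addr] = name
    return problems


print(validate_plan(NODES, SUBNET) or "address plan looks consistent")
```

This only validates the plan on paper; actual reachability between nodes on ports 6443, 2379, and 2380 still has to be tested on the hosts.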

Step 1: Provision and Configure Load Balancer

Install and Configure HAProxy:

# On load balancer node (10.0.1.100)
sudo apt update && sudo apt install -y haproxy

# Configure HAProxy
sudo tee /etc/haproxy/haproxy.cfg > /dev/null <<EOF
global
    log stdout local0
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option log-health-checks
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 20s
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s

# Kubernetes API Server
frontend k8s-api-frontend
    bind *:6443
    mode tcp
    default_backend k8s-api-backend

backend k8s-api-backend
    mode tcp
    balance roundrobin
    option tcp-check
    # All three API servers are active; marking servers as "backup" would send traffic only to master1
    server master1 10.0.1.10:6443 check inter 5s rise 2 fall 3
    server master2 10.0.1.11:6443 check inter 5s rise 2 fall 3
    server master3 10.0.1.12:6443 check inter 5s rise 2 fall 3

# HAProxy Stats
listen stats
    bind *:8080
    stats enable
    stats uri /stats
    stats refresh 30s
EOF

# Enable and start HAProxy
sudo systemctl enable haproxy
sudo systemctl start haproxy
sudo systemctl status haproxy

Step 2: Initialize First Control Plane Node

Create kubeadm configuration:

# On master1 (10.0.1.10)
cat > kubeadm-config.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.1.10
  bindPort: 6443
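# No cloud-provider kubelet flag is set here: this lab has no cloud controller manager,
# and cloud-provider=external would leave every node tainted as uninitialized.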
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.2
controlPlaneEndpoint: "k8s-api.internal.example.com:6443"  # Use FQDN for production
networking:
  serviceSubnet: "10.96.0.0/16"
  podSubnet: "10.244.0.0/16"
  dnsDomain: "cluster.local"
etcd:
  local:
    dataDir: "/var/lib/etcd"
apiServer:
  # The advertise address belongs in InitConfiguration.localAPIEndpoint (set above), not here
  extraArgs:
    authorization-mode: "Node,RBAC"
    enable-admission-plugins: "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota"
  certSANs:
  - "k8s-api.internal.example.com"
  - "10.0.1.100"  # Load balancer IP
  - "10.0.1.10"
  - "10.0.1.11"
  - "10.0.1.12"
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF

# Initialize the first control plane node
sudo kubeadm init --config=kubeadm-config.yaml --upload-certs

Save join commands and certificates:

# The output will include commands like these (save for later steps):

# For additional control plane nodes:
kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <CERT_KEY> \
  --apiserver-advertise-address=<NODE_IP>

# For worker nodes:  
kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>

# Replace placeholders with actual values from kubeadm init output

Step 3: Configure kubectl and Network

Set up kubectl access:

# On master1
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install network plugin (Flannel); its default pod network, 10.244.0.0/16, matches podSubnet above
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Step 4: Join Additional Control Plane Nodes

On master2 (10.0.1.11):

# Join as control plane node using the command from Step 2
# (by default the certificate key expires after 2 hours and the token after 24 hours)
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <CERT_KEY> \
  --apiserver-advertise-address=10.0.1.11

On master3 (10.0.1.12):

# Join as control plane node
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <CERT_KEY> \
  --apiserver-advertise-address=10.0.1.12

Step 5: Join Worker Nodes

On worker1, worker2, worker3:

# Join as worker nodes
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH>

Step 6: Verify HA Setup

Check cluster status:

# Verify all nodes are ready
kubectl get nodes -o wide

# Check control plane components
kubectl get pods -n kube-system

# Verify etcd cluster health
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list

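# Test API server HA by taking one API server offline (on master1).
# Stopping the kubelet alone leaves the static pod containers running, so move the manifest instead:
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
kubectl get nodes  # Should still work via load balancer
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/  # Restore master1 afterwards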

Load Balancer Health Check:

# Check HAProxy stats
curl http://10.0.1.100:8080/stats

# Test API server through load balancer
curl -k https://10.0.1.100:6443/livez
curl -k https://10.0.1.100:6443/readyz

kubeadm command reference

Common Pitfalls & Troubleshooting

Even with careful planning, HA clusters can encounter issues. Here are the most common problems and their solutions.

Split-Brain in etcd

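Raft’s majority requirement prevents a true split-brain, so in practice this shows up as quorum loss or as a member that has dropped out of the cluster.

Quick Diagnosis Commands: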

# Check etcd member status (copy-ready)
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list

# Expected healthy output:
# 8e9e05c52164694d, started, etcd-1, https://10.0.1.10:2380, https://10.0.1.10:2379, false
# 294b66c6fd4a7c55, started, etcd-2, https://10.0.1.11:2380, https://10.0.1.11:2379, false  
# 3e7b3f6d7a8b9c2e, started, etcd-3, https://10.0.1.12:2380, https://10.0.1.12:2379, false

# Check for leader (exactly one member should show IS LEADER = true)
# --cluster queries every member instead of only the local endpoint
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster --write-out=table
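During an incident it is easy to misread the member list. The Python sketch below parses the comma-separated output format shown above and reports whether enough voting members are started to form a quorum. The function names are hypothetical, and a "started" status only means the member has joined, so pair this with an endpoint health check.

```python
def parse_members(output: str):
    """Parse `etcdctl member list` simple output into a list of dicts."""
    members = []
    for line in output.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 6:
            continue  # skip blank or unexpected lines
        members.append({
            "id": fields[0],
            "status": fields[1],
            "name": fields[2],
            "peer_url": fields[3],
            "client_url": fields[4],
            "is_learner": fields[5] == "true",
        })
    return members


def summarize(members):
    """Count voting members and check whether the started ones form a majority."""
    voters = [m for m in members if not m["is_learner"]]
    started = [m for m in voters if m["status"] == "started"]
    quorum = len(voters) // 2 + 1
    return {
        "voters": len(voters),
        "started": len(started),
        "quorum": quorum,
        "has_quorum": len(started) >= quorum,
    }


sample = """
8e9e05c52164694d, started, etcd-1, https://10.0.1.10:2380, https://10.0.1.10:2379, false
294b66c6fd4a7c55, unstarted, , https://10.0.1.11:2380, , false
3e7b3f6d7a8b9c2e, started, etcd-3, https://10.0.1.12:2380, https://10.0.1.12:2379, false
"""
print(summarize(parse_members(sample)))
```

In this sample one member was added but never started, so the cluster still has quorum (2 of 3) but cannot tolerate another failure.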

Emergency Recovery (copy-ready):

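# Run as root on a healthy control plane node; certificate paths are the kubeadm defaults
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# 1. Create backup before any changes
etcdctl snapshot save /tmp/etcd-backup-$(date +%Y%m%d-%H%M%S).db

# 2. Remove the unhealthy member, using its ID from "member list" (here: etcd-2)
etcdctl member remove 294b66c6fd4a7c55

# 3. On the problematic node, reset and rejoin; kubeadm adds the new etcd member during
#    the control plane join, so a manual "etcdctl member add" is not needed
sudo kubeadm reset --force
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> --control-plane...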

API Server Connection Refused

Quick Diagnosis Commands:

# Check API server process (expected: running)
sudo systemctl status kubelet
# Expected output: Active: active (running)

# Check API server container (expected: 1 running container)
sudo crictl ps | grep apiserver
# Expected: CONTAINER ID IMAGE STATE NAME

# Test API server health (expected: HTTP 200 OK)
curl -k https://127.0.0.1:6443/livez
# Expected output: ok

# Check load balancer status
curl -s http://10.0.1.100:8080/stats | grep -A5 k8s-api-backend
# Expected: server status UP for healthy nodes

Emergency Fix Commands:

# 1. Restart kubelet (a quick first step that clears many transient static pod problems)
sudo systemctl restart kubelet
sleep 30
kubectl get nodes  # Test if working

# 2. If still failing, check that etcd is listening
sudo ss -tlnp | grep :2379
# Expected: a LISTEN entry on 127.0.0.1:2379

# 3. Last resort: reset and rejoin master node
sudo kubeadm reset --force
sudo kubeadm join k8s-api.internal.example.com:6443 --token <TOKEN> --control-plane...

Certificate Expiration

Quick Check Commands:

# Check all certificate expiration (copy-ready)
sudo kubeadm certs check-expiration
# Expected: All certs valid for >30 days

# Manual check for API server cert
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout | grep -A 2 Validity
# Expected: Not After date in future
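For automated checks, the expiry date can be turned into a number of remaining days. The Python sketch below assumes the one-line output of "openssl x509 -enddate -noout" (a line starting with notAfter=) and an English locale for month names; days_until_expiry is a hypothetical helper.

```python
from datetime import datetime


def days_until_expiry(enddate_line: str, now: datetime) -> int:
    """Days between `now` (naive UTC) and the notAfter date printed by openssl."""
    # enddate_line looks like: notAfter=Jun  1 12:00:00 2026 GMT
    value = enddate_line.strip().split("=", 1)[1]
    not_after = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    return (not_after - now).days


line = "notAfter=Jun  1 12:00:00 2026 GMT"
remaining = days_until_expiry(line, now=datetime(2026, 5, 2, 12, 0, 0))
print(f"{remaining} days left")
if remaining <= 30:
    print("renew soon")
```

A negative result means the certificate has already expired. In a real check you would pass the current UTC time instead of the fixed date used here.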

Emergency Renewal (copy-ready):

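# 1. Renew all certificates (run on ALL master nodes)
sudo kubeadm certs renew all

# 2. Restart the control plane static pods so they load the new certs
#    (restarting the kubelet alone does not restart them); do one node at a time
sudo mkdir -p /tmp/manifests-backup
sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /tmp/manifests-backup/'
sleep 20
sudo sh -c 'mv /tmp/manifests-backup/*.yaml /etc/kubernetes/manifests/'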

# 3. Update your kubeconfig
sudo cp /etc/kubernetes/admin.conf ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config

# 4. Verify cluster access
kubectl get nodes
# Expected: All nodes listed without authentication errors

Load Balancer Down/Misconfigured

Quick Check Commands:

# Test load balancer health (expected: stats page loads)
curl -s http://10.0.1.100:8080/stats | head -20

# Test API server through LB (expected: ok)
curl -k https://10.0.1.100:6443/livez

# Check HAProxy process (expected: running)
sudo systemctl status haproxy
# Expected: Active: active (running)

Emergency Fix:

# 1. Restart HAProxy
sudo systemctl restart haproxy

# 2. If config issues, fix the config, validate it, then reload
sudo vim /etc/haproxy/haproxy.cfg  # Fix config
sudo haproxy -c -f /etc/haproxy/haproxy.cfg  # Validate before reloading
sudo systemctl reload haproxy

# 3. Test each backend individually
curl -k https://10.0.1.10:6443/livez  # Direct to master1
curl -k https://10.0.1.11:6443/livez  # Direct to master2
curl -k https://10.0.1.12:6443/livez  # Direct to master3
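When you are checking backends under pressure, a small loop that reports UP or DOWN per API server is easier to read than three separate curl calls. The following is a minimal sketch: probe and check_backends are hypothetical helper names, and it assumes /livez is reachable anonymously and returns the body "ok" when healthy.

```shell
# Probe one API server; prints the /livez body ("ok" when healthy).
# Kept as its own function so it can be swapped out when testing the loop.
probe() {
  curl -ks --max-time 3 "https://$1:6443/livez"
}

# Print UP/DOWN for each backend IP and return the number of failed backends.
check_backends() (
  failed=0
  for ip in "$@"; do
    if [ "$(probe "$ip")" = "ok" ]; then
      echo "$ip UP"
    else
      echo "$ip DOWN"
      failed=$((failed + 1))
    fi
  done
  exit "$failed"
)
```

Run it as check_backends 10.0.1.10 10.0.1.11 10.0.1.12. A non-zero exit status tells you how many masters failed the probe, which makes it easy to wire into a cron job or a runbook script.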

Recovery Procedures

Complete Cluster Recovery:

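# Run on every control plane node, changing --name and the peer URL per node
# (shown here for master1).

# 1. Stop the control plane static pods by moving their manifests aside
#    (stopping kubelet alone leaves the etcd and API server containers running)
sudo mkdir -p /etc/kubernetes/manifests.bak
sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /etc/kubernetes/manifests.bak/'

# 2. Restore etcd from backup (--name must match an entry in --initial-cluster)
sudo etcdctl snapshot restore /backup/etcd-backup.db \
  --name master1 \
  --data-dir /var/lib/etcd-restore \
  --initial-cluster master1=https://10.0.1.10:2380,master2=https://10.0.1.11:2380,master3=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls https://10.0.1.10:2380

# 3. Update etcd data directory
sudo mv /var/lib/etcd /var/lib/etcd.old
sudo mv /var/lib/etcd-restore /var/lib/etcd

# 4. Start services by putting the manifests back
sudo sh -c 'mv /etc/kubernetes/manifests.bak/*.yaml /etc/kubernetes/manifests/'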
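A snapshot restore is run once per etcd member, and each run needs that member's own --name and --initial-advertise-peer-urls alongside the shared --initial-cluster list. The sketch below only prints the command for a given member so you can review it before running it; initial_cluster and restore_cmd are hypothetical helpers, and the node names and IPs are the example values used in this guide.

```shell
# Build the --initial-cluster value from name=ip pairs.
initial_cluster() (
  out=""
  for pair in "$@"; do
    name=${pair%%=*}
    ip=${pair#*=}
    out="${out:+$out,}$name=https://$ip:2380"
  done
  echo "$out"
)

# Print (do not run) the restore command for one member.
# $1 = member name, $2 = member IP, $3 = snapshot path, rest = all name=ip pairs
restore_cmd() (
  name=$1; ip=$2; snap=$3; shift 3
  echo "etcdctl snapshot restore $snap --name $name" \
    "--data-dir /var/lib/etcd-restore" \
    "--initial-cluster $(initial_cluster "$@")" \
    "--initial-advertise-peer-urls https://$ip:2380"
)
```

For instance, restore_cmd master2 10.0.1.11 /backup/etcd-backup.db master1=10.0.1.10 master2=10.0.1.11 master3=10.0.1.12 prints the command to run on master2. Generating the three commands from one list keeps the member names and peer URLs consistent across nodes.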

Monitoring and Alerting:

# Prometheus alerts for HA cluster
groups:
- name: kubernetes-ha
  rules:
  - alert: APIServerDown
    expr: up{job="kubernetes-apiservers"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Kubernetes API server is down"
  
  - alert: etcdNoLeader
    expr: etcd_server_has_leader{job="kubernetes-etcd"} == 0
    for: 10s
    labels:
      severity: critical
    annotations:
      summary: "etcd cluster has no leader"
      
  - alert: etcdHighNumberOfFailedGRPCRequests
    expr: sum(rate(grpc_server_handled_total{job="kubernetes-etcd",grpc_code!="OK"}[5m])) / sum(rate(grpc_server_handled_total{job="kubernetes-etcd"}[5m])) > 0.01
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "etcd cluster has high number of failed gRPC requests"

HA in Exams & Real-World DevOps

Understanding Kubernetes High Availability is crucial not just for production environments, but also for advancing your career through certifications and real-world problem-solving.

Kubernetes Certification Relevance

CKA (Certified Kubernetes Administrator) Topics:

Cluster Architecture, Installation & Configuration (25% of exam)
Managing highly available Kubernetes clusters
Understanding etcd backup and restore procedures
Troubleshooting cluster components

CKS (Certified Kubernetes Security Specialist) Areas:

Securing cluster communications with TLS
Implementing RBAC in multi-master environments
etcd encryption configuration
Network security policies for control plane

Sample CKA HA Question: “Your 3-node Kubernetes cluster is experiencing issues with the control plane. One of the master nodes has failed, but the cluster is still functional. Describe the steps to replace the failed master node while maintaining high availability.”

Answer Approach:

# 1. Identify the failed node
kubectl get nodes
kubectl describe node master-2  # Shows NotReady status

# 2. Remove failed node from cluster
kubectl delete node master-2

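# 3. Remove the failed member from etcd (look up its ID with member list first)
kubectl exec -n kube-system etcd-master-1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
kubectl exec -n kube-system etcd-master-1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member remove 8e9e05c52164694d  # example ID; use the one reported by member list

# 4. Provision new node and join to cluster
# (generate fresh values on a healthy master with
#  "kubeadm token create --print-join-command" and
#  "sudo kubeadm init phase upload-certs --upload-certs")
sudo kubeadm join 10.0.1.100:6443 --token <TOKEN> \
  --discovery-token-ca-cert-hash sha256:<HASH> \
  --control-plane --certificate-key <CERTIFICATE_KEY>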
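The member ID in step 3 has to be looked up, not guessed. As a rough sketch, the helper below picks the ID for a given member name out of etcdctl member list output. It assumes the default comma-separated output format (ID, status, name, peer URL, client URL, learner flag); member_id_for is a hypothetical name.

```shell
# Read `etcdctl member list` output on stdin and print the ID of the
# member whose name (third field) equals $1.
member_id_for() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}
```

Pipe the member list command from step 3 into member_id_for master-2 and pass the result to member remove. If the helper prints nothing, the member is already gone or the output format differs from the assumption above.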

Real-World Case Study: E-commerce Platform Outage Prevention

Scenario: A major e-commerce company running Black Friday sales on Kubernetes.

Challenge: Single master node cluster couldn’t handle the traffic spike and went down, causing $2M in lost revenue during peak shopping hours.

HA Solution Implemented:

Architecture Changes:

Migrated from single master to 3-master HA setup
Implemented external etcd cluster with 5 nodes
Added multiple availability zones
Configured automated scaling for worker nodes

Technical Implementation:

# Production deployment with anti-affinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ecommerce-frontend
spec:
  replicas: 12
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 2
  selector:
    matchLabels:
      app: ecommerce-frontend
  template:
    metadata:
      labels:
        app: ecommerce-frontend
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - ecommerce-frontend
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-west-2a
                - us-west-2b
                - us-west-2c
      containers:
      - name: frontend
        image: ecommerce/frontend:v2.1
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
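One consequence of the required hostname anti-affinity above is worth spelling out: every frontend pod needs its own node, so the replica count and maxSurge translate directly into node counts. Here is a small sketch of that arithmetic; nodes_needed and pending_pods are illustrative helper names.

```shell
# nodes_needed REPLICAS MAX_SURGE
# One pod per node in steady state; surge pods need additional free nodes.
nodes_needed() {
  echo "steady_state=$1 full_surge=$(( $1 + $2 ))"
}

# pending_pods REPLICAS SCHEDULABLE_NODES
# Pods that cannot be scheduled when there are fewer nodes than replicas.
pending_pods() {
  if [ "$1" -gt "$2" ]; then
    echo $(( $1 - $2 ))
  else
    echo 0
  fi
}
```

With the manifest's values, nodes_needed 12 3 prints steady_state=12 full_surge=15. If the node pool is smaller than that, a preferred (soft) anti-affinity rule may be a better fit than a required one.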

Monitoring Setup:

# SLI/SLO monitoring for HA effectiveness
apiVersion: v1
kind: ConfigMap
metadata:
  name: slo-config
data:
  config.yaml: |
    slis:
      - name: api_availability
        description: "API server availability"
        query: 'avg_over_time(up{job="kubernetes-apiservers"}[5m])'
        target: 0.999  # 99.9% uptime SLO
      
      - name: etcd_availability
        description: "etcd cluster availability" 
        query: 'min(etcd_server_has_leader)'
        target: 1.0  # 100% leader availability
        
      - name: pod_startup_time
        description: "Time for pods to become ready"
        query: 'histogram_quantile(0.95, sum(rate(kubelet_pod_start_duration_seconds_bucket[5m])) by (le))'
        target: 30  # 95th percentile under 30 seconds
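An SLO target is easier to reason about once it is converted into an error budget, meaning the downtime you can afford in a given window. The helper below is a minimal sketch of that conversion; error_budget_minutes is a hypothetical name and the 30-day window is an assumption, not something the config above defines.

```shell
# error_budget_minutes SLO_FRACTION WINDOW_DAYS
# Allowed downtime in minutes for an availability target over a window.
error_budget_minutes() {
  awk -v slo="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", (1 - slo) * days * 24 * 60 }'
}
```

For the 99.9% API availability target, error_budget_minutes 0.999 30 prints 43.2, so a single 45-minute outage would already use up the whole month's budget.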

Results After HA Implementation:

Zero control plane downtime during next Black Friday (20x traffic increase)
Recovery time reduced from 45 minutes to under 30 seconds for node failures
Cost optimization through better resource utilization and spot instances
Monitoring improvements with 95% reduction in false positive alerts

Key Lessons Learned:

Don’t wait for failure – Implement HA before you need it
Test failure scenarios – Regularly conduct chaos engineering exercises
Monitor everything – Use comprehensive metrics and alerting
Document procedures – Ensure team can respond quickly to incidents
Automate recovery – Reduce manual intervention during failures

Business Impact of HA

Cost-Benefit Analysis:

Without HA:

Single master failure = 45 minutes downtime
Peak traffic outage cost = $2M/hour
Engineer overtime for manual recovery = $5K/incident
Customer trust impact = Immeasurable

With HA:

Control plane infrastructure cost increase = 200% (3 masters vs 1)
Operational complexity increase = 150%
Downtime reduction = 99%
ROI = 400% after first major outage prevention
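To make the comparison concrete, you can plug the "Without HA" figures into a quick calculation. This is only a sketch using the numbers listed above; outage_cost is an illustrative helper and the inputs will differ for your business.

```shell
# outage_cost DOWNTIME_MINUTES LOSS_PER_HOUR_USD FIXED_COST_PER_INCIDENT_USD
# Revenue lost during the outage plus the fixed recovery cost.
outage_cost() {
  echo $(( $1 * $2 / 60 + $3 ))
}
```

With the figures above, outage_cost 45 2000000 5000 prints 1505000, roughly $1.5M for one 45-minute peak-hour outage. Comparing that against the yearly cost of two extra control plane nodes gives a grounded starting point for the ROI discussion.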

Career Development Benefits:

For DevOps Engineers:

Deep understanding of distributed systems concepts
Experience with production-grade architecture decisions
Troubleshooting skills for complex failure scenarios
Leadership opportunities in incident response

For Organizations:

Reduced operational risk and improved SLAs
Better resource utilization and cost optimization
Improved team confidence in production deployments
Foundation for advanced features like multi-cluster, disaster recovery

Frequently Asked Questions (FAQ)

What is Kubernetes High Availability?

Kubernetes High Availability ensures that control plane and worker nodes are deployed redundantly so the cluster keeps running even if some nodes fail. This eliminates single points of failure and provides continuous cluster operations during hardware or software failures.

How many control plane nodes are recommended for HA?

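At least three control plane nodes are recommended for HA clusters so that etcd keeps quorum when one node fails. Odd member counts (3, 5, 7) are a recommendation rather than a hard requirement: an even count adds cost without adding fault tolerance, because a 4-member cluster still tolerates only one failure, the same as a 3-member cluster.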

What is the difference between stacked and external etcd topologies?

In stacked topology, etcd runs alongside control plane nodes on the same machines, making it simpler and more cost-effective. In external topology, etcd runs on a separate dedicated cluster for better isolation, performance, and scalability but requires more infrastructure.

How is API server HA achieved?

Multiple API servers are placed behind a load balancer so traffic continues even if one server fails. The load balancer distributes requests across healthy API server instances and automatically removes failed servers from rotation using health checks.

What happens when an etcd node fails in an HA cluster?

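When one etcd node fails in a 3-node cluster, the remaining 2 nodes maintain quorum and the cluster continues operating normally. The failed node can be replaced without downtime. However, if 2 nodes fail simultaneously, the cluster loses quorum: etcd stops accepting writes, so the API server can no longer change cluster state. Workloads that are already running keep running, but nothing can be scheduled or updated until quorum is restored.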
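The quorum rule behind this answer is simple integer arithmetic: a cluster of n members needs n/2 + 1 of them (rounded down) to agree, and can lose the rest. The sketch below prints the numbers for small clusters; quorum and fault_tolerance are illustrative function names.

```shell
# Members required for quorum in a cluster of $1 members.
quorum() { echo $(( $1 / 2 + 1 )); }

# Members that can fail while the cluster still has quorum.
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5 6 7; do
  echo "members=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
done
```

The output includes members=3 quorum=2 tolerates=1 and members=4 quorum=3 tolerates=1, which shows why adding a fourth member buys no extra fault tolerance.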

Conclusion & Next Steps

Kubernetes High Availability architecture represents the foundation of production-ready container orchestration. Throughout this comprehensive guide, we’ve explored how eliminating single points of failure through redundant control planes, clustered etcd deployments, and proper load balancing creates resilient systems that can withstand component failures while maintaining service availability.

The key takeaways from our deep dive include:

Multi-master setups with at least three control plane nodes provide the redundancy needed for production workloads, while etcd clustering with odd-numbered nodes ensures data consistency and quorum-based decision making. External load balancers serve as the critical entry point that enables seamless failover between API server instances.

Choosing between stacked and external etcd topologies depends on your specific requirements—stacked offers simplicity and cost savings, while external provides better isolation and scalability. Cloud-managed services like EKS, GKE, and AKS deliver HA capabilities out of the box, while self-managed clusters offer complete control at the cost of operational complexity.

Security remains paramount in HA environments, requiring TLS encryption for all component communications, proper RBAC implementation, and etcd encryption at rest. Regular certificate rotation and comprehensive monitoring ensure your cluster remains both available and secure.

The hands-on kubeadm lab demonstrated that implementing HA isn’t just theoretical—with proper planning and execution, you can build production-grade clusters that handle real-world failures gracefully. Understanding common pitfalls like split-brain scenarios, certificate expiration, and load balancer misconfiguration prepares you to troubleshoot effectively when issues arise.

Your journey toward Kubernetes mastery continues here. HA concepts form the foundation, but production environments require additional layers like monitoring, logging, backup strategies, and disaster recovery planning.

Ready to take your Kubernetes expertise to the next level?

Next recommended reading: Kubernetes Control Plane Deep Dive Guide – Master the inner workings of API servers, controllers, and schedulers.

Downloadable HA Cheat Sheet

Quick Reference: Kubernetes HA Architecture

🔧 HA Topology Decision Matrix

Requirement	Stacked etcd	External etcd
Small clusters (<50 nodes)	✅ Recommended	⚠️ Overkill
Large clusters (>100 nodes)	❌ Not recommended	✅ Recommended
Cost optimization	✅ Lower cost	❌ Higher cost
Performance isolation	❌ Shared resources	✅ Dedicated
Operational simplicity	✅ Easier	❌ More complex
Maximum availability	⚠️ Good	✅ Excellent

🔍 Essential Commands Cheat Sheet

# Check cluster health
kubectl get nodes
kubectl get pods -n kube-system
kubectl get --raw='/readyz?verbose'  # replaces "kubectl get componentstatuses", deprecated since v1.19

# etcd cluster status
kubectl exec -n kube-system etcd-master1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list

# Certificate expiration check
sudo kubeadm certs check-expiration

# API server health endpoints
curl -k https://API_SERVER:6443/livez
curl -k https://API_SERVER:6443/readyz
curl -k https://API_SERVER:6443/healthz  # deprecated since v1.16; prefer /livez and /readyz

⚡ Emergency Recovery Procedures

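# etcd backup (run on a control plane node)
sudo etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# etcd restore (run per member; set --name and the peer URL for each node)
sudo etcdctl snapshot restore backup.db \
  --name master1 \
  --data-dir /var/lib/etcd-restore \
  --initial-cluster master1=https://10.0.1.10:2380,master2=https://10.0.1.11:2380,master3=https://10.0.1.12:2380 \
  --initial-advertise-peer-urls https://10.0.1.10:2380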

# Certificate renewal
sudo kubeadm certs renew all
sudo systemctl restart kubelet
# Also restart the control plane static pods so they load the renewed certs

# Reset and rejoin node
sudo kubeadm reset
kubeadm join LOAD_BALANCER:6443 --token TOKEN --control-plane...

📊 Monitoring Metrics

up{job="kubernetes-apiservers"} – API server availability
etcd_server_has_leader – etcd leader status
kube_node_status_condition{condition="Ready"} – Node health
kube_pod_container_status_ready – Pod readiness

🎯 Production Checklist

✅ Odd number of etcd nodes (3, 5, 7)
✅ Load balancer with health checks
✅ TLS certificates for all components
✅ Regular etcd backups automated
✅ Multi-zone node distribution
✅ Resource requests/limits on critical pods
✅ Monitoring and alerting configured
✅ Runbook for common failure scenarios
✅ Disaster recovery procedures tested
✅ Regular chaos engineering exercises

Save this cheat sheet for quick reference during production deployments and troubleshooting sessions.
