Time to fix: 2 minutes
Difficulty: Easy
Occurs when: Your container starts, crashes immediately, and Kubernetes keeps trying to restart it

CrashLoopBackOff Kubernetes Error: How to Fix It Quickly (30 seconds)

# See why it's crashing (this solves 80% of cases):
kubectl logs <pod-name> -n <namespace> --previous
# Note: -n <namespace> can be omitted when the pod is in your current context's namespace

# If you see the error, fix it directly:
kubectl edit deployment/<deployment-name> -n <namespace>
# Look for: wrong image, missing env vars, bad commands, aggressive liveness probes

What This Error Means

CrashLoopBackOff means your container starts successfully but exits/crashes immediately. Kubernetes tries to restart it, but it keeps crashing, so K8s implements an exponential back-off delay between restart attempts (10s, 20s, 40s, 80s… up to 5 minutes). Your application is essentially in a crash-restart-crash loop.
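As a rough illustration of that schedule, the sketch below computes the delay before each restart attempt. It assumes only what is stated above: a 10-second base delay that doubles each time and is capped at 5 minutes. The function name `backoff_delay` is hypothetical and is not part of kubectl.

```shell
# backoff_delay N: print the delay in seconds before restart attempt N (1-based).
# Assumes a 10s base delay that doubles each time and is capped at 300s (5 minutes).
backoff_delay() {
  attempt=$1
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 300 ]; then
      delay=300
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
}

# Print the delays for the first seven restart attempts on one line.
for n in 1 2 3 4 5 6 7; do
  printf '%ss ' "$(backoff_delay "$n")"
done
echo
```

Under these assumptions the cap is reached on the sixth attempt, so a pod that has restarted many times can take up to 5 minutes to pick up a fix unless you delete it and let the Deployment recreate it.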

⚠️ Important: This error can be caused by either application failures OR misconfigured liveness probes that kill healthy containers.

To get a broader perspective on how Kubernetes handles pod lifecycle and status codes, check out the Ultimate Kubernetes Tutorial for Beginners, which walks you through pod states and lifecycle concepts.

For more details on how Kubernetes handles pod states, see the official Pod Lifecycle documentation.

The 4 Most Common Causes

1. Application Error or Missing Configuration (60% of cases)

Check:

kubectl logs <pod-name> -n <namespace> --previous
# Or use short form:
kubectl logs <pod-name> -n <namespace> -p

# Look for errors like:
# - "Error: Config file not found"
# - "Cannot connect to database"
# - "Missing required environment variable: API_KEY"

Fix:

# For missing ConfigMap (with safety check):
kubectl create configmap app-config --from-file=config.yaml \
  --dry-run=client -o yaml | kubectl apply -f -

# For missing Secret (safe creation):
kubectl create secret generic app-secret \
  --from-literal=api-key=your-value \
  --dry-run=client -o yaml | kubectl apply -f -

# For missing environment variable:
kubectl set env deployment/<deployment-name> -n <namespace> API_KEY=your-value
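One way to make this class of failure easier to spot is to have the container's entrypoint script check its required variables before starting the app, so that `kubectl logs --previous` shows a single clear message. This is a minimal sketch, assuming the image ships a POSIX shell; `require_env` is a hypothetical helper and the variable names are only examples.

```shell
# require_env NAME...: succeed only if every named variable is set and non-empty.
# All missing names are reported on stderr in a single line.
require_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names must be trusted identifiers).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing required environment variable(s):$missing" >&2
    return 1
  fi
  return 0
}
```

In an entrypoint script this could be called as `require_env API_KEY || exit 1` just before `exec your-app`. The container still crashes when the variable is absent, but the cause shows up on the first line of the previous logs.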

2. Wrong Command or Entrypoint (20% of cases)

Check:

kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Command"
# Verify the command actually exists in your container

Fix:

# Update the deployment with correct command:
kubectl edit deployment/<deployment-name> -n <namespace>
# Fix the command/args section:
# command: ["/bin/sh"]
# args: ["-c", "your-correct-command"]

3. Aggressive Liveness Probe (15% of cases)

Check:

kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Liveness"
# Look for "Liveness probe failed" messages

Fix:

# Edit deployment to adjust probe timing:
kubectl edit deployment/<deployment-name> -n <namespace>
# Increase initialDelaySeconds and failureThreshold

⚠️ Caution: Overly aggressive liveness probes can CAUSE CrashLoopBackOff. Always use failureThreshold: 3 and adequate initialDelaySeconds.
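To judge whether a probe is too aggressive, it helps to estimate how long the application has to become healthy before the kubelet restarts it. The sketch below is a rough lower bound only: it assumes the first probe runs right after `initialDelaySeconds` and that every probe fails, and it ignores `timeoutSeconds` and scheduling jitter. The function name `liveness_budget` is hypothetical.

```shell
# liveness_budget INITIAL_DELAY PERIOD FAILURE_THRESHOLD
# Prints the approximate number of seconds between container start and the
# probe failure that reaches failureThreshold (a rough lower bound).
liveness_budget() {
  initial_delay=$1
  period=$2
  failure_threshold=$3
  echo $((initial_delay + (failure_threshold - 1) * period))
}

# initialDelaySeconds=0, periodSeconds=10, failureThreshold=3
liveness_budget 0 10 3
# initialDelaySeconds=60, periodSeconds=10, failureThreshold=3
liveness_budget 60 10 3
```

If your application needs longer than this estimate to start, the liveness probe is a likely cause of the restart loop.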

Kubernetes also provides a troubleshooting guide for pods that covers other failure scenarios you might encounter.

4. Insufficient Resources / OOM (5% of cases)

Check:

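# Check whether the last container termination was an OOM kill
# (OOMKilled is a container termination reason, not an event reason):
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

# Or check the pod's last state and events:
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Last State"
kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Events"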

Fix:

# Increase memory and CPU limits:
kubectl set resources deployment/<deployment-name> -n <namespace> \
  --requests=memory=256Mi,cpu=250m \
  --limits=memory=512Mi,cpu=500m

# Note: 256Mi = 268,435,456 bytes; 250m = 0.25 CPU cores
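The unit arithmetic in that note can be checked with a couple of small helpers. This is only a sketch; `mi_to_bytes` and `millicores_to_cores` are hypothetical names and are not kubectl commands.

```shell
# mi_to_bytes N: convert N mebibytes (the "Mi" suffix) to bytes; 1 Mi = 1024 * 1024 bytes.
mi_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# millicores_to_cores N: convert N millicores (the "m" suffix) to CPU cores,
# printed with three decimal places.
millicores_to_cores() {
  printf '%d.%03d\n' $(( $1 / 1000 )) $(( $1 % 1000 ))
}

mi_to_bytes 256
millicores_to_cores 250
```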

Real Production Debugging Example

Here’s an actual debugging session from a production incident:

The Problem:

$ kubectl get pods -n production
NAME                              READY   STATUS             RESTARTS   AGE
payment-service-7d4b4b59c-x8kmh   0/1     CrashLoopBackOff   8          15m

Step 1: Check which node and get basic info:

$ kubectl get pod payment-service-7d4b4b59c-x8kmh -n production -o wide
NAME                              READY   STATUS             NODE
payment-service-7d4b4b59c-x8kmh   0/1     CrashLoopBackOff   node-us-east-1a

# This shows it's on a specific node (useful for node-specific issues)

Step 2: Check the logs:

$ kubectl logs payment-service-7d4b4b59c-x8kmh -n production -p
Error: Missing required environment variable: STRIPE_API_KEY
    at validateEnv (/app/src/config.js:15:11)
    at Object.<anonymous> (/app/src/index.js:3:1)
npm ERR! code ELIFECYCLE
npm ERR! errno 1

Step 3: Verify the secret exists:

$ kubectl get secrets -n production | grep stripe
# No output - secret is missing!

Step 4: Create the missing secret safely:

$ kubectl create secret generic stripe-secret \
  --from-literal=STRIPE_API_KEY=sk_live_xxxxx \
  --dry-run=client -o yaml -n production | kubectl apply -f -
secret/stripe-secret created

Step 5: Update deployment to use the secret:

$ kubectl edit deployment/payment-service -n production
# Added:
env:
- name: STRIPE_API_KEY
  valueFrom:
    secretKeyRef:
      name: stripe-secret
      key: STRIPE_API_KEY

Step 6: Verify the fix:

$ kubectl rollout status deployment/payment-service -n production
deployment "payment-service" successfully rolled out

$ kubectl get pods -n production
NAME                              READY   STATUS    RESTARTS   AGE
payment-service-8c5f5f6b7-n9kmh   1/1     Running   0          45s

Complete Step-by-Step Solution

1. Identify the crashing pod with full details:

kubectl get pods --all-namespaces -o wide | grep CrashLoopBackOff
# Shows pod, namespace, node, and IP
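Because `grep` matches anywhere on the line, it can also pick up pods whose names happen to contain the search text. A slightly stricter variant is sketched below. It assumes the default column order of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS) and matches on the STATUS column only; `crashloop_pods` is a hypothetical function name.

```shell
# crashloop_pods: print "namespace/pod" for every pod whose STATUS column
# contains CrashLoopBackOff (this also catches Init:CrashLoopBackOff).
# Assumes the default column order: NAMESPACE NAME READY STATUS ...
crashloop_pods() {
  kubectl get pods --all-namespaces --no-headers \
    | awk '$4 ~ /CrashLoopBackOff/ { print $1 "/" $2 }'
}
```

Calling `crashloop_pods` prints one `namespace/pod` pair per line, which is convenient to feed into the log and describe commands in the next steps.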

2. Check the logs from the previous crash:

kubectl logs <pod-name> -n <namespace> -p
# Use -p or --previous to see crash reason

3. Get detailed pod information:

kubectl describe pod <pod-name> -n <namespace>
# Pay attention to:
# - Image: (is it correct and not using 'latest'?)
# - Command: (does it exist?)
# - Environment: (are all variables set?)
# - Mounts: (are volumes mounted correctly?)
# - Liveness/Readiness: (are probes too aggressive?)
# - Events: (what's the exact failure?)

4. Modern debugging with ephemeral containers (K8s 1.23+):

# Debug with ephemeral container:
kubectl debug <pod-name> -n <namespace> -it --image=busybox:latest

# Or copy the pod with different command:
kubectl debug <pod-name> -n <namespace> -it --copy-to=debug-pod --container=app -- sh

5. Apply the appropriate fix:

For missing ConfigMap/Secret:

# List existing configs:
kubectl get configmaps,secrets -n <namespace>

# Create if missing (with safety):
kubectl create configmap app-config --from-literal=key=value \
  --dry-run=client -o yaml -n <namespace> | kubectl apply -f -

For wrong image or tag:

# Check if image is pullable:
kubectl describe pod <pod-name> -n <namespace> | grep -i "pull"

# Update image (never use 'latest' in production):
kubectl set image deployment/<deployment-name> -n <namespace> \
  <container-name>=myregistry/my-app:v1.2.3

For permission issues:

# Add security context (the UID/GID must be able to read and write the app's files):
kubectl patch deployment <deployment-name> -n <namespace> --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/securityContext",
       "value": {"runAsUser": 1000, "runAsGroup": 1000, "fsGroup": 1000}}]'

# ⚠️ Warning: Kubernetes does not require the UID to be listed in the container's /etc/passwd, but some applications fail without a passwd entry or when the UID cannot access their files

6. Verify the fix worked:

# Watch pod status in real-time:
kubectl get pod <pod-name> -n <namespace> -w

# Check rollout status:
kubectl rollout status deployment/<deployment-name> -n <namespace>

# Verify logs are clean:
kubectl logs <pod-name> -n <namespace> --follow
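A pod can show Running briefly between crashes, so it is worth confirming that the restart count has stopped climbing. The sketch below compares the count before and after a wait. It assumes a single-container pod (it reads `containerStatuses[0]`), and `restarts_stable` is a hypothetical helper name.

```shell
# restarts_stable POD NAMESPACE SECONDS
# Succeeds (exit 0) if the first container's restartCount is unchanged after
# SECONDS, fails with 1 if it changed, and with 2 if kubectl itself failed.
restarts_stable() {
  pod=$1
  ns=$2
  wait_seconds=$3
  query='{.status.containerStatuses[0].restartCount}'
  before=$(kubectl get pod "$pod" -n "$ns" -o jsonpath="$query") || return 2
  sleep "$wait_seconds"
  after=$(kubectl get pod "$pod" -n "$ns" -o jsonpath="$query") || return 2
  [ "$before" = "$after" ]
}
```

For example, `restarts_stable <pod-name> <namespace> 120 && echo "no restarts in 2 minutes"`. Choose a wait longer than the current back-off delay, otherwise a pod that is merely waiting to restart will look stable.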

Still Not Working?

Advanced debugging steps:

# Check resource quotas:
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>

# Review deployment history:
kubectl rollout history deployment/<deployment-name> -n <namespace>

# Check recent events sorted by time:
kubectl get events --sort-by='.lastTimestamp' -n <namespace>

# Debug with a sleep container:
kubectl run debug-pod --image=<your-image> -n <namespace> \
  --command -- sleep 3600
kubectl exec -it debug-pod -n <namespace> -- /bin/sh

Prevent This Error

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
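spec:
  selector:
    matchLabels:
      app: my-app        # Required in apps/v1; must match the pod template labels
  template:
    metadata:
      labels:
        app: my-app
    spec: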
      containers:
      - name: my-app
        image: myregistry/my-app:v1.2.3  # NEVER use :latest in production
        
        # Properly configured health checks
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 60  # Give app time to start
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3      # Don't be too aggressive
          successThreshold: 1
        
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
          successThreshold: 1
        
        # Resource limits with proper units
        resources:
          requests:
            memory: "256Mi"  # 268,435,456 bytes
            cpu: "250m"      # 0.25 CPU cores
          limits:
            memory: "512Mi"  # 536,870,912 bytes
            cpu: "500m"      # 0.5 CPU cores
        
        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
        
        # Log to stdout/stderr; "exec" makes the app PID 1, so its output reaches
        # the container log directly and it receives SIGTERM on shutdown
        command: ["/bin/sh"]
        args: ["-c", "exec your-app"]
        
        # Security context (the UID must be able to access the app's files)
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000    # Numeric UID; Kubernetes does not require an /etc/passwd entry
          runAsGroup: 1000
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false

FAQ Section

What is CrashLoopBackOff in Kubernetes?

CrashLoopBackOff is a status indicating that a pod’s container is repeatedly crashing and Kubernetes is waiting (with exponential backoff) before trying to restart it again. The container exits immediately or shortly after starting.

How do I fix CrashLoopBackOff quickly?

First, check the logs with kubectl logs <pod-name> -p to see why it crashed. 80% of the time it’s missing configuration, wrong commands, or aggressive liveness probes. Fix the specific issue and the pod will recover.

Why does Kubernetes keep restarting my pod?

With the default restartPolicy: Always, the kubelet restarts a container whenever it exits, whether the exit code is zero or non-zero. Common reasons include application crashes, missing dependencies, configuration errors, OOM kills, or failed liveness probes.

What’s the difference between CrashLoopBackOff and Error status?

Error means the container exited with an error code. CrashLoopBackOff means it’s repeatedly failing and Kubernetes is backing off before retry attempts.

How long does CrashLoopBackOff last?

The backoff delay increases exponentially: 10s, 20s, 40s, 80s, up to 5 minutes maximum. After each successful run of at least 10 minutes, the delay resets.

Can liveness probes cause CrashLoopBackOff?

Yes! Aggressive liveness probes (low initialDelaySeconds or failureThreshold) can kill healthy containers during startup, causing CrashLoopBackOff.

Common Error Messages and Solutions

Error in Logs	Root Cause	Solution
`Cannot connect to database`	Network/DNS/Credentials	Check service discovery, connection string
`Permission denied`	Security context	Add proper runAsUser/fsGroup
`No such file or directory`	Wrong workdir/missing files	Verify Dockerfile WORKDIR and image contents
`bind: address already in use`	Port conflict	Check for duplicate services or wrong port
`Segmentation fault`	Memory corruption/limit	Increase memory limits, fix application bug
`Module not found`	Missing dependencies	Rebuild image with all requirements
`signal: killed`	OOM or manual termination	Check memory limits and events
`exec format error`	Wrong architecture	Build for correct platform (amd64/arm64)
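The lookup in the table above can be sketched as a small shell function that maps log text to the listed root cause. Matching is case-sensitive and uses only the messages from the table; `diagnose_log_line` is a hypothetical helper, not a kubectl feature.

```shell
# diagnose_log_line TEXT: print the root cause from the table above for the
# first known message found in TEXT. Prints "Unknown" and returns 1 otherwise.
diagnose_log_line() {
  case "$1" in
    *"Cannot connect to database"*) echo "Network/DNS/Credentials" ;;
    *"Permission denied"*)          echo "Security context" ;;
    *"No such file or directory"*)  echo "Wrong workdir/missing files" ;;
    *"address already in use"*)     echo "Port conflict" ;;
    *"Segmentation fault"*)         echo "Memory corruption/limit" ;;
    *"Module not found"*)           echo "Missing dependencies" ;;
    *"signal: killed"*)             echo "OOM or manual termination" ;;
    *"exec format error"*)          echo "Wrong architecture" ;;
    *)                              echo "Unknown"; return 1 ;;
  esac
}
```

Because `case` patterns also match across newlines, the whole previous log can be passed at once, for example `diagnose_log_line "$(kubectl logs <pod-name> -n <namespace> -p)"`. The first pattern that matches wins.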

Exit Codes Reference

Exit Code	Meaning	Common Fix
0	Success (check app logic)	Application exiting when it shouldn’t
1	General errors	Check application logs
125	Docker run failed	Invalid container configuration
126	Command not executable	Add execute permissions
127	Command not found	Fix command path or install missing binary
137	SIGKILL (often OOM)	Check for OOMKilled, then increase memory limits
139	Segmentation fault	Debug application or increase memory
143	SIGTERM (graceful shutdown)	Normal during updates
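The codes above 128 in this table follow the usual convention of 128 plus the signal number (137 = 128 + 9 for SIGKILL, 139 = 128 + 11 for SIGSEGV, 143 = 128 + 15 for SIGTERM). A minimal sketch of that decoding is shown below; `explain_exit_code` is a hypothetical helper name.

```shell
# explain_exit_code CODE: print the meaning of a container exit code, following
# the table above. Codes above 128 are decoded as 128 + signal number.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "Success (check app logic)" ;;
    1)   echo "General errors" ;;
    125) echo "Docker run failed" ;;
    126) echo "Command not executable" ;;
    127) echo "Command not found" ;;
    *)
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "Killed by signal $((code - 128))"
      else
        echo "Application-specific exit code"
      fi
      ;;
  esac
}
```

The exit code of the last crash can be read with `kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'` and passed to the function.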

💡 Pro Tips:

Always use kubectl logs <pod> -p for crash reasons (current logs just show restart)
Set imagePullPolicy: IfNotPresent for local testing to avoid registry issues
Use kubectl debug for modern debugging (K8s 1.23+)
Keep the preStop delay plus your app's shutdown time below terminationGracePeriodSeconds (default 30s), or the container is force-killed with SIGKILL

🔥 Quick Win: Check liveness probe configuration first – it’s often the hidden cause!

⚡ Debug Faster: kubectl get events --sort-by='.lastTimestamp' -n <namespace> shows all recent events in order

🚨 Related Errors:

ImagePullBackOff – Can’t pull container image
OOMKilled – Container exceeded memory limit
CreateContainerConfigError – Missing ConfigMap/Secret
Error – Container exited with error
Init:CrashLoopBackOff – Init container crashing

Last updated: January 2025 | Kubernetes versions: 1.26-1.29 tested | Cloud providers: EKS, GKE, AKS compatible
