Kubernetes CrashLoopBackOff Fix: Proven & Complete Guide 2025

Time to fix: 2 minutes
Difficulty: Easy
Occurs when: Your container starts, crashes immediately, and Kubernetes keeps trying to restart it

CrashLoopBackOff Kubernetes Error: How to Fix It Quickly (30 seconds)

# See why it's crashing (this solves 80% of cases):
kubectl logs <pod-name> -n <namespace> --previous
# Note: -n <namespace> is optional if using default namespace

# If you see the error, fix it directly:
kubectl edit deployment/<deployment-name> -n <namespace>
# Look for: wrong image, missing env vars, bad commands, aggressive liveness probes

What This Error Means

CrashLoopBackOff means your container starts but then exits or crashes shortly afterward. The kubelet restarts it, it crashes again, and Kubernetes applies an exponential back-off delay between restart attempts (10s, 20s, 40s, 80s, and so on, capped at 5 minutes). The pod is stuck in a crash-restart-crash loop.
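
The back-off schedule can be reproduced with a few lines of shell. This is a minimal sketch, assuming the 10-second base delay and 5-minute cap described above; `backoff_delay` is a hypothetical helper name, not a kubectl command.

```shell
# Hypothetical helper: delay in seconds before restart attempt N (1-based),
# assuming a 10s base that doubles each time and is capped at 300s.
backoff_delay() {
  n=$1
  delay=10
  i=1
  while [ "$i" -lt "$n" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

for attempt in 1 2 3 4 5 6 7; do
  echo "restart $attempt: wait $(backoff_delay "$attempt")s"
done
```

The loop prints 10, 20, 40, 80, 160, 300, 300, which is why a pod that has been crashing for a while appears to "hang" for minutes between attempts.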

⚠️ Important: This error can be caused by either application failures OR misconfigured liveness probes that kill healthy containers.

To get a broader perspective on how Kubernetes handles pod lifecycle and status codes, check out the Ultimate Kubernetes Tutorial for Beginners, which walks you through pod states and lifecycle concepts.

For more details on how Kubernetes handles pod states, see the official Pod Lifecycle documentation.

The 4 Most Common Causes

1. Application Error or Missing Configuration (60% of cases)

Check:

kubectl logs <pod-name> -n <namespace> --previous
# Or use short form:
kubectl logs <pod-name> -n <namespace> -p

# Look for errors like:
# - "Error: Config file not found"
# - "Cannot connect to database"
# - "Missing required environment variable: API_KEY"

Fix:

# For missing ConfigMap (with safety check):
kubectl create configmap app-config --from-file=config.yaml \
  --dry-run=client -o yaml | kubectl apply -f -

# For missing Secret (safe creation):
kubectl create secret generic app-secret \
  --from-literal=api-key=your-value \
  --dry-run=client -o yaml | kubectl apply -f -

# For missing environment variable:
kubectl set env deployment/<deployment-name> -n <namespace> API_KEY=your-value

2. Wrong Command or Entrypoint (20% of cases)

Check:

kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Command"
# Verify the command actually exists in your container

Fix:

# Update the deployment with correct command:
kubectl edit deployment/<deployment-name> -n <namespace>
# Fix the command/args section:
# command: ["/bin/sh"]
# args: ["-c", "your-correct-command"]

3. Aggressive Liveness Probe (15% of cases)

Check:

kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Liveness"
# Look for "Liveness probe failed" messages

Fix:

# Edit deployment to adjust probe timing:
kubectl edit deployment/<deployment-name> -n <namespace>
# Increase initialDelaySeconds and failureThreshold

⚠️ Caution: Overly aggressive liveness probes can CAUSE CrashLoopBackOff. Use a failureThreshold of at least 3 and an initialDelaySeconds longer than your app's worst-case startup time.
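
To judge whether a probe is too aggressive, it helps to estimate how long the container has before failed probes can trigger a restart. The sketch below is an approximation only: it assumes the first probe runs at initialDelaySeconds and failures are spaced periodSeconds apart, and it ignores timeoutSeconds and kubelet jitter. `liveness_budget` is a hypothetical helper name.

```shell
# Hypothetical helper: earliest time (seconds after container start) at which
# failed liveness probes can trigger a restart. Ignores timeoutSeconds and
# kubelet jitter, so treat the result as a rough lower bound.
liveness_budget() {
  initial_delay=$1
  period=$2
  failure_threshold=$3
  echo $(( initial_delay + (failure_threshold - 1) * period ))
}

# Aggressive probe: 5s delay, 5s period, 1 failure allowed
echo "aggressive: $(liveness_budget 5 5 1)s"
# Relaxed probe: 60s delay, 10s period, 3 failures allowed
echo "relaxed: $(liveness_budget 60 10 3)s"
```

If your application needs longer than the printed budget to become healthy, the probe will kill it during startup.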

Kubernetes also provides a troubleshooting guide for pods that covers other failure scenarios you might encounter.

4. Insufficient Resources / OOM (5% of cases)

Check:

# Check the last termination reason (OOMKilled is recorded in the
# container status, not usually as an event reason):
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

# Or check the pod's last state and events:
kubectl describe pod <pod-name> -n <namespace> | grep -A10 -E "Last State|Events"

Fix:

# Increase memory and CPU limits:
kubectl set resources deployment/<deployment-name> -n <namespace> \
  --requests=memory=256Mi,cpu=250m \
  --limits=memory=512Mi,cpu=500m

# Note: 256Mi = 268,435,456 bytes; 250m = 0.25 CPU cores
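
The unit conversions in the note can be checked with shell arithmetic. This is a small sketch; `mi_to_bytes` and `millicores_to_cores` are hypothetical helper names used only for illustration.

```shell
# Mi is a binary unit: 1Mi = 1024 * 1024 bytes.
mi_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# m means millicores: 1000m = 1 CPU core.
millicores_to_cores() {
  LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 1000 }'
}

echo "256Mi = $(mi_to_bytes 256) bytes"
echo "250m  = $(millicores_to_cores 250) cores"
```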

Real Production Debugging Example

Here’s an actual debugging session from a production incident:

The Problem:

$ kubectl get pods -n production
NAME                              READY   STATUS             RESTARTS   AGE
payment-service-7d4b4b59c-x8kmh   0/1     CrashLoopBackOff   8          15m

Step 1: Check which node and get basic info:

$ kubectl get pod payment-service-7d4b4b59c-x8kmh -n production -o wide
NAME                              READY   STATUS             NODE
payment-service-7d4b4b59c-x8kmh   0/1     CrashLoopBackOff   node-us-east-1a

# This shows it's on a specific node (useful for node-specific issues)

Step 2: Check the logs:

$ kubectl logs payment-service-7d4b4b59c-x8kmh -n production -p
Error: Missing required environment variable: STRIPE_API_KEY
    at validateEnv (/app/src/config.js:15:11)
    at Object.<anonymous> (/app/src/index.js:3:1)
npm ERR! code ELIFECYCLE
npm ERR! errno 1

Step 3: Verify the secret exists:

$ kubectl get secrets -n production | grep stripe
# No output - secret is missing!

Step 4: Create the missing secret safely:

$ kubectl create secret generic stripe-secret \
  --from-literal=STRIPE_API_KEY=sk_live_xxxxx \
  --dry-run=client -o yaml -n production | kubectl apply -f -
secret/stripe-secret created

Step 5: Update deployment to use the secret:

$ kubectl edit deployment/payment-service -n production
# Added:
env:
- name: STRIPE_API_KEY
  valueFrom:
    secretKeyRef:
      name: stripe-secret
      key: STRIPE_API_KEY

Step 6: Verify the fix:

$ kubectl rollout status deployment/payment-service -n production
deployment "payment-service" successfully rolled out

$ kubectl get pods -n production
NAME                              READY   STATUS    RESTARTS   AGE
payment-service-8c5f5f6b7-n9kmh   1/1     Running   0          45s

Complete Step-by-Step Solution

1. Identify the crashing pod with full details:

kubectl get pods --all-namespaces -o wide | grep CrashLoopBackOff
# Shows pod, namespace, node, and IP
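
A plain grep matches the string anywhere on the line. If you want only the namespace and pod name for scripting, an awk filter on the STATUS column is one option. This is a sketch that assumes the default column order of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, ...); `crashlooping_pods` is a hypothetical helper, and the sample rows stand in for real output.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on stdin
# and print "namespace/pod" for each pod whose STATUS mentions CrashLoopBackOff.
# Assumes the default column order: NAMESPACE NAME READY STATUS ...
crashlooping_pods() {
  awk 'NR > 1 && $4 ~ /CrashLoopBackOff/ { print $1 "/" $2 }'
}

# Sample input standing in for real kubectl output:
printf '%s\n' \
  'NAMESPACE    NAME                              READY   STATUS             RESTARTS   AGE' \
  'production   payment-service-7d4b4b59c-x8kmh   0/1     CrashLoopBackOff   8          15m' \
  'production   web-5f7d8c9b6-abcde               1/1     Running            0          2d' \
  | crashlooping_pods
```

Against a live cluster the usage would be `kubectl get pods --all-namespaces | crashlooping_pods`.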

2. Check the logs from the previous crash:

kubectl logs <pod-name> -n <namespace> -p
# Use -p or --previous to see crash reason

3. Get detailed pod information:

kubectl describe pod <pod-name> -n <namespace>
# Pay attention to:
# - Image: (is it correct and not using 'latest'?)
# - Command: (does it exist?)
# - Environment: (are all variables set?)
# - Mounts: (are volumes mounted correctly?)
# - Liveness/Readiness: (are probes too aggressive?)
# - Events: (what's the exact failure?)

4. Modern debugging with ephemeral containers (K8s 1.23+):

# Debug with ephemeral container:
kubectl debug <pod-name> -n <namespace> -it --image=busybox:latest

# Or copy the pod with different command:
kubectl debug <pod-name> -n <namespace> -it --copy-to=debug-pod --container=app -- sh

5. Apply the appropriate fix:

For missing ConfigMap/Secret:

# List existing configs:
kubectl get configmaps,secrets -n <namespace>

# Create if missing (with safety):
kubectl create configmap app-config --from-literal=key=value \
  --dry-run=client -o yaml -n <namespace> | kubectl apply -f -

For wrong image or tag:

# Check if image is pullable:
kubectl describe pod <pod-name> -n <namespace> | grep -i "pull"

# Update image (never use 'latest' in production):
kubectl set image deployment/<deployment-name> -n <namespace> \
  <container-name>=myregistry/my-app:v1.2.3

For permission issues:

# Add security context (check that the image works as this UID):
kubectl patch deployment <deployment-name> -n <namespace> --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/securityContext",
       "value": {"runAsUser": 1000, "runAsGroup": 1000, "fsGroup": 1000}}]'

# ⚠️ Warning: some applications fail to start if runAsUser has no entry in the container's /etc/passwd

6. Verify the fix worked:

# Watch pod status in real-time:
kubectl get pod <pod-name> -n <namespace> -w

# Check rollout status:
kubectl rollout status deployment/<deployment-name> -n <namespace>

# Verify logs are clean:
kubectl logs <pod-name> -n <namespace> --follow
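
A pod can show Running briefly between crashes, so it can be worth confirming that the restart count has stopped climbing. The sketch below samples the count twice; `restart_count` and `restarts_stable` are hypothetical helper names, the 60-second default window is an arbitrary assumption, and only the first container in the pod is checked.

```shell
# Hypothetical helper: print the restart count of the pod's first container.
restart_count() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].restartCount}'
}

# Hypothetical helper: succeed only if the restart count is unchanged
# after a watch window (default 60s). Returns 2 if kubectl itself fails.
restarts_stable() {
  pod=$1
  ns=$2
  window=${3:-60}
  before=$(restart_count "$pod" "$ns") || return 2
  sleep "$window"
  after=$(restart_count "$pod" "$ns") || return 2
  [ "$before" = "$after" ]
}

# Usage (after a rollout, use the new pod's name):
#   restarts_stable <pod-name> <namespace> 120 && echo "no new restarts"
```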

Still Not Working?

Advanced debugging steps:

# Check resource quotas:
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>

# Review deployment history:
kubectl rollout history deployment/<deployment-name> -n <namespace>

# Check recent events sorted by time:
kubectl get events --sort-by='.lastTimestamp' -n <namespace>

# Debug with a sleep container:
kubectl run debug-pod --image=<your-image> -n <namespace> \
  --command -- sleep 3600
kubectl exec -it debug-pod -n <namespace> -- /bin/sh

Prevent This Error

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
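spec:
  selector:
    matchLabels:
      app: my-app        # illustrative label; must match the template labels
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers: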
      - name: my-app
        image: myregistry/my-app:v1.2.3  # NEVER use :latest in production
        
        # Properly configured health checks
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 60  # Give app time to start
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3      # Don't be too aggressive
          successThreshold: 1
        
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
          successThreshold: 1
        
        # Resource limits with proper units
        resources:
          requests:
            memory: "256Mi"  # 268,435,456 bytes
            cpu: "250m"      # 0.25 CPU cores
          limits:
            memory: "512Mi"  # 536,870,912 bytes
            cpu: "500m"      # 0.5 CPU cores
        
        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
        
        # Log to stdout/stderr so kubectl logs can read the output.
        # exec makes your-app PID 1; the redirects target PID 1's own streams.
        command: ["/bin/sh"]
        args: ["-c", "exec your-app >> /proc/1/fd/1 2>> /proc/1/fd/2"]
        
        # Security context (verify the image runs correctly as this UID)
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000    # Some apps need this UID in the container's /etc/passwd
          runAsGroup: 1000
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false

FAQ Section

What is CrashLoopBackOff in Kubernetes?

CrashLoopBackOff is a status indicating that a pod’s container is repeatedly crashing and Kubernetes is waiting (with exponential backoff) before trying to restart it again. The container exits immediately or shortly after starting.

How do I fix CrashLoopBackOff quickly?

First, check the logs with kubectl logs <pod-name> -p to see why it crashed. Most of the time the cause is missing configuration, a wrong command, or an aggressive liveness probe. Fix the specific issue and the pod will recover.

Why does Kubernetes keep restarting my pod?

With restartPolicy: Always (the default, and the only policy a Deployment allows), the kubelet restarts a container whenever it exits, even with exit code 0. Common reasons include application crashes, missing dependencies, configuration errors, OOM kills, or failed liveness probes.

What’s the difference between CrashLoopBackOff and Error status?

Error means the container exited with an error code. CrashLoopBackOff means it’s repeatedly failing and Kubernetes is backing off before retry attempts.

How long does CrashLoopBackOff last?

The backoff delay increases exponentially: 10s, 20s, 40s, 80s, up to 5 minutes maximum. After each successful run of at least 10 minutes, the delay resets.

Can liveness probes cause CrashLoopBackOff?

Yes! Aggressive liveness probes (low initialDelaySeconds or failureThreshold) can kill healthy containers during startup, causing CrashLoopBackOff.

Common Error Messages and Solutions

| Error in Logs | Root Cause | Solution |
|---|---|---|
| Cannot connect to database | Network/DNS/Credentials | Check service discovery, connection string |
| Permission denied | Security context | Add proper runAsUser/fsGroup |
| No such file or directory | Wrong workdir/missing files | Verify Dockerfile WORKDIR and image contents |
| bind: address already in use | Port conflict | Check for duplicate services or wrong port |
| Segmentation fault | Memory corruption/limit | Increase memory limits, fix application bug |
| Module not found | Missing dependencies | Rebuild image with all requirements |
| signal: killed | OOM or manual termination | Check memory limits and events |
| exec format error | Wrong architecture | Build for correct platform (amd64/arm64) |
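
The table above can be turned into a quick log filter. This is a rough sketch: `classify_crash_log` is a hypothetical helper, the patterns are case-sensitive copies of the messages in the table, and real log wording will vary by runtime and framework.

```shell
# Hypothetical helper: read crash logs on stdin and print a likely root cause
# for each line that matches a message from the table above.
classify_crash_log() {
  while IFS= read -r line; do
    case $line in
      *"Cannot connect to database"*) echo "network/DNS/credentials: $line" ;;
      *"Permission denied"*)          echo "security context: $line" ;;
      *"No such file or directory"*)  echo "wrong workdir or missing files: $line" ;;
      *"address already in use"*)     echo "port conflict: $line" ;;
      *"Segmentation fault"*)         echo "memory corruption or limit: $line" ;;
      *"Module not found"*)           echo "missing dependencies: $line" ;;
      *"signal: killed"*)             echo "OOM or manual termination: $line" ;;
      *"exec format error"*)          echo "wrong architecture: $line" ;;
    esac
  done
}

# Sample log lines standing in for real output:
printf '%s\n' \
  'starting server' \
  'listen tcp :8080: bind: address already in use' \
  | classify_crash_log
```

Against a live pod the usage would be `kubectl logs <pod-name> -n <namespace> -p | classify_crash_log`. Lines that match nothing are dropped, so an empty result means the cause is not one of the listed patterns.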

Exit Codes Reference

| Exit Code | Meaning | Common Fix |
|---|---|---|
| 0 | Success (check app logic) | Application exiting when it shouldn’t |
| 1 | General errors | Check application logs |
| 125 | Docker run failed | Invalid container configuration |
| 126 | Command not executable | Add execute permissions |
| 127 | Command not found | Fix command path or install missing binary |
| 137 | SIGKILL (OOM) | Increase memory limits |
| 139 | Segmentation fault | Debug application or increase memory |
| 143 | SIGTERM (graceful shutdown) | Normal during updates |
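
Exit codes above 128 follow the common convention of 128 plus the signal number, which is how 137, 139, and 143 map to SIGKILL (9), SIGSEGV (11), and SIGTERM (15). A minimal sketch of that arithmetic, using a hypothetical `describe_exit_code` helper:

```shell
# Hypothetical helper: interpret a container exit code using the
# "128 + signal number" convention for signal-terminated processes.
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "exited with error $code"
  fi
}

for code in 0 1 137 139 143; do
  echo "$code: $(describe_exit_code "$code")"
done
```

The exit code of the last crash appears as "Exit Code" under "Last State" in `kubectl describe pod`, or in the pod's `.status.containerStatuses[*].lastState.terminated.exitCode` field.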

💡 Pro Tips:

  • Always use kubectl logs <pod> -p for crash reasons (current logs just show restart)
  • Set imagePullPolicy: IfNotPresent for local testing to avoid registry issues
  • Use kubectl debug for modern debugging (K8s 1.23+)
  • Keep probe periodSeconds less than terminationGracePeriodSeconds

🔥 Quick Win: Check liveness probe configuration first – it’s often the hidden cause!

Debug Faster: kubectl get events --sort-by='.lastTimestamp' -n <namespace> shows all recent events in order



Last updated: January 2025 | Kubernetes versions: 1.26-1.29 tested | Cloud providers: EKS, GKE, AKS compatible
