Kubernetes CrashLoopBackOff Fix: Proven & Complete Guide 2025
Time to fix: 2 minutes
Difficulty: Easy
Occurs when: Your container starts, crashes immediately, and Kubernetes keeps trying to restart it
CrashLoopBackOff Kubernetes Error: How to Fix It Quickly (30 seconds)
# See why it's crashing (this solves 80% of cases):
kubectl logs <pod-name> -n <namespace> --previous
# Note: -n <namespace> is optional if using default namespace
# If you see the error, fix it directly:
kubectl edit deployment/<deployment-name> -n <namespace>
# Look for: wrong image, missing env vars, bad commands, aggressive liveness probes
What This Error Means
CrashLoopBackOff means your container starts, then exits or crashes shortly afterward, over and over. Each time, the kubelet restarts it and waits longer before the next attempt, using an exponential back-off delay (10s, 20s, 40s, 80s… capped at 5 minutes). Your application is stuck in a crash-restart-crash loop.
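To make the back-off schedule concrete, here is a minimal shell sketch that reproduces the delays quoted above. The function name `backoff_delay` is hypothetical, and the rule it uses (start at 10s, double each time, cap at 300s) is taken from the numbers in this guide; the kubelet's real timing can differ slightly.

```shell
#!/bin/sh
# Sketch: approximate CrashLoopBackOff delay before restart attempt N.
# Assumption: delays start at 10s, double each time, and cap at 300s (5 minutes).
backoff_delay() {
  attempt=$1
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

# Print the schedule for the first seven restarts
for n in 1 2 3 4 5 6 7; do
  echo "restart $n: wait $(backoff_delay "$n")s"
done
```

This is why a pod with many restarts can appear to sit idle for minutes: it is waiting out the capped delay, not hanging.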
⚠️ Important: This error can be caused by either application failures OR misconfigured liveness probes that kill healthy containers.
To get a broader perspective on how Kubernetes handles pod lifecycle and status codes, check out the Ultimate Kubernetes Tutorial for Beginners, which walks you through pod states and lifecycle concepts.
For more details on how Kubernetes handles pod states, see the official Pod Lifecycle documentation.

The 4 Most Common Causes
1. Application Error or Missing Configuration (60% of cases)
Check:
kubectl logs <pod-name> -n <namespace> --previous
# Or use short form:
kubectl logs <pod-name> -n <namespace> -p
# Look for errors like:
# - "Error: Config file not found"
# - "Cannot connect to database"
# - "Missing required environment variable: API_KEY"
Fix:
# For missing ConfigMap (with safety check):
kubectl create configmap app-config --from-file=config.yaml \
--dry-run=client -o yaml | kubectl apply -f -
# For missing Secret (safe creation):
kubectl create secret generic app-secret \
--from-literal=api-key=your-value \
--dry-run=client -o yaml | kubectl apply -f -
# For missing environment variable:
kubectl set env deployment/<deployment-name> -n <namespace> API_KEY=your-value
2. Wrong Command or Entrypoint (20% of cases)
Check:
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Command"
# Verify the command actually exists in your container
Fix:
# Update the deployment with correct command:
kubectl edit deployment/<deployment-name> -n <namespace>
# Fix the command/args section:
# command: ["/bin/sh"]
# args: ["-c", "your-correct-command"]
3. Aggressive Liveness Probe (15% of cases)
Check:
kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Liveness"
# Look for "Liveness probe failed" messages
Fix:
# Edit deployment to adjust probe timing:
kubectl edit deployment/<deployment-name> -n <namespace>
# Increase initialDelaySeconds and failureThreshold
⚠️ Caution: Overly aggressive liveness probes can CAUSE CrashLoopBackOff. Keep failureThreshold at 3 (the Kubernetes default) or higher, and set initialDelaySeconds longer than your application's worst-case startup time.
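As a rough way to sanity-check probe settings, the sketch below estimates how many seconds a container gets before a continuously failing liveness probe triggers a restart. The helper name `liveness_budget` is hypothetical, and the formula (initialDelaySeconds + periodSeconds × failureThreshold) is an approximation, not the kubelet's exact timing.

```shell
#!/bin/sh
# Sketch: approximate seconds available before a failing liveness probe
# causes a restart. Assumption: probes fail continuously from the first check.
liveness_budget() {
  initial_delay=$1      # initialDelaySeconds
  period=$2             # periodSeconds
  failure_threshold=$3  # failureThreshold
  echo $((initial_delay + period * failure_threshold))
}

# Example: initialDelaySeconds=60, periodSeconds=10, failureThreshold=3
echo "approximate startup budget: $(liveness_budget 60 10 3)s"
```

If your application can take longer than this to become healthy, raise initialDelaySeconds or failureThreshold.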
Kubernetes also provides a troubleshooting guide for pods that covers other failure scenarios you might encounter.
4. Insufficient Resources / OOM (5% of cases)
Check:
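# Check whether the last container termination was an OOM kill
# (OOMKilled is recorded in the pod status and is typically not an event reason):
kubectl get pod <pod-name> -n <namespace> \
-o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
# Prints "OOMKilled" if the container exceeded its memory limit
# Or check pod events:
kubectl describe pod <pod-name> -n <namespace> | grep -A10 "Events"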
Fix:
# Increase memory and CPU limits:
kubectl set resources deployment/<deployment-name> -n <namespace> \
--requests=memory=256Mi,cpu=250m \
--limits=memory=512Mi,cpu=500m
# Note: 256Mi = 268,435,456 bytes; 250m = 0.25 CPU cores
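The unit conversions in the note above can be checked with a little shell arithmetic. This is a minimal sketch; the helper names `mi_to_bytes` and `millicores_to_percent` are hypothetical and not part of kubectl.

```shell
#!/bin/sh
# Sketch: convert Kubernetes resource units for a quick sanity check.

# Mebibytes (Mi) to bytes: 1Mi = 1024 * 1024 bytes
mi_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Millicores (m) to percent of one CPU core: 1000m = 100%
millicores_to_percent() {
  echo $(( $1 / 10 ))
}

echo "256Mi = $(mi_to_bytes 256) bytes"
echo "250m  = $(millicores_to_percent 250)% of one core"
```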
Real Production Debugging Example
Here’s an actual debugging session from a production incident:
The Problem:
$ kubectl get pods -n production
NAME READY STATUS RESTARTS AGE
payment-service-7d4b4b59c-x8kmh 0/1 CrashLoopBackOff 8 15m
Step 1: Check which node and get basic info:
$ kubectl get pod payment-service-7d4b4b59c-x8kmh -n production -o wide
NAME READY STATUS NODE
payment-service-7d4b4b59c-x8kmh 0/1 CrashLoopBackOff node-us-east-1a
# This shows it's on a specific node (useful for node-specific issues)
Step 2: Check the logs:
$ kubectl logs payment-service-7d4b4b59c-x8kmh -n production -p
Error: Missing required environment variable: STRIPE_API_KEY
at validateEnv (/app/src/config.js:15:11)
at Object.<anonymous> (/app/src/index.js:3:1)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
Step 3: Verify the secret exists:
$ kubectl get secrets -n production | grep stripe
# No output - secret is missing!
Step 4: Create the missing secret safely:
$ kubectl create secret generic stripe-secret \
--from-literal=STRIPE_API_KEY=sk_live_xxxxx \
--dry-run=client -o yaml -n production | kubectl apply -f -
secret/stripe-secret created
Step 5: Update deployment to use the secret:
$ kubectl edit deployment/payment-service -n production
# Added:
env:
  - name: STRIPE_API_KEY
    valueFrom:
      secretKeyRef:
        name: stripe-secret
        key: STRIPE_API_KEY
Step 6: Verify the fix:
$ kubectl rollout status deployment/payment-service -n production
deployment "payment-service" successfully rolled out
$ kubectl get pods -n production
NAME READY STATUS RESTARTS AGE
payment-service-8c5f5f6b7-n9kmh 1/1 Running 0 45s
Complete Step-by-Step Solution
1. Identify the crashing pod with full details:
kubectl get pods --all-namespaces -o wide | grep CrashLoopBackOff
# Shows pod, namespace, node, and IP
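If you prefer a list of namespace/pod pairs that can feed further commands, the filter below is one possible sketch. The function name `crashlooping_pods` is hypothetical, and it assumes the default column order of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS). The sample input is illustrative data, not real cluster output.

```shell
#!/bin/sh
# Sketch: read `kubectl get pods --all-namespaces` output on stdin and
# print "namespace/pod" for every pod whose STATUS contains CrashLoopBackOff.
# Assumption: columns are NAMESPACE NAME READY STATUS ... with a header row.
crashlooping_pods() {
  awk 'NR > 1 && $4 ~ /CrashLoopBackOff/ { print $1 "/" $2 }'
}

# Real usage (requires cluster access):
#   kubectl get pods --all-namespaces | crashlooping_pods

# Demo with sample data:
crashlooping_pods <<'EOF'
NAMESPACE    NAME                              READY   STATUS             RESTARTS   AGE
production   payment-service-7d4b4b59c-x8kmh   0/1     CrashLoopBackOff   8          15m
default      web-1                             1/1     Running            0          2d
EOF
```

Matching with `~` instead of `==` also catches init containers reported as Init:CrashLoopBackOff.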
2. Check the logs from the previous crash:
kubectl logs <pod-name> -n <namespace> -p
# Use -p or --previous to see crash reason
3. Get detailed pod information:
kubectl describe pod <pod-name> -n <namespace>
# Pay attention to:
# - Image: (is it correct and not using 'latest'?)
# - Command: (does it exist?)
# - Environment: (are all variables set?)
# - Mounts: (are volumes mounted correctly?)
# - Liveness/Readiness: (are probes too aggressive?)
# - Events: (what's the exact failure?)
4. Modern debugging with ephemeral containers (K8s 1.23+):
# Debug with ephemeral container:
kubectl debug <pod-name> -n <namespace> -it --image=busybox:latest
# Or copy the pod with different command:
kubectl debug <pod-name> -n <namespace> -it --copy-to=debug-pod --container=app -- sh
5. Apply the appropriate fix:
For missing ConfigMap/Secret:
# List existing configs:
kubectl get configmaps,secrets -n <namespace>
# Create if missing (with safety):
kubectl create configmap app-config --from-literal=key=value \
--dry-run=client -o yaml -n <namespace> | kubectl apply -f -
For wrong image or tag:
# Check if image is pullable:
kubectl describe pod <pod-name> -n <namespace> | grep -i "pull"
# Update image (never use 'latest' in production):
kubectl set image deployment/<deployment-name> -n <namespace> \
<container-name>=myregistry/my-app:v1.2.3
For permission issues:
# Add security context (the user must be able to read and write the app's files):
kubectl patch deployment <deployment-name> -n <namespace> --type='json' \
-p='[{"op": "add", "path": "/spec/template/spec/securityContext",
"value": {"runAsUser": 1000, "runAsGroup": 1000, "fsGroup": 1000}}]'
# ⚠️ Warning: a numeric UID does not need an /etc/passwd entry to run, but some applications fail without one
6. Verify the fix worked:
# Watch pod status in real-time:
kubectl get pod <pod-name> -n <namespace> -w
# Check rollout status:
kubectl rollout status deployment/<deployment-name> -n <namespace>
# Verify logs are clean:
kubectl logs <pod-name> -n <namespace> --follow
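For scripted verification, for example in a deploy pipeline, the check below is a minimal sketch that turns the pod listing into a pass/fail result. The function name `pods_all_ready` is hypothetical, and it assumes the default columns of `kubectl get pods -n <namespace>` (NAME, READY, STATUS). Pods from completed Jobs would count as failures here.

```shell
#!/bin/sh
# Sketch: read `kubectl get pods -n <namespace>` output on stdin.
# Succeeds only if at least one pod is listed and every pod is Running
# with all containers ready (READY column like 1/1 or 2/2).
pods_all_ready() {
  awk '
    NR > 1 {
      seen = 1
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Real usage (requires cluster access):
#   kubectl get pods -n <namespace> | pods_all_ready && echo "fix verified"

# Demo with sample data:
if printf '%s\n' 'NAME READY STATUS RESTARTS AGE' \
     'payment-service-8c5f5f6b7-n9kmh 1/1 Running 0 45s' | pods_all_ready; then
  echo "all pods ready"
else
  echo "some pods not ready"
fi
```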
Still Not Working?
Advanced debugging steps:
# Check resource quotas:
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>
# Review deployment history:
kubectl rollout history deployment/<deployment-name> -n <namespace>
# Check recent events sorted by time:
kubectl get events --sort-by='.lastTimestamp' -n <namespace>
# Debug with a sleep container:
kubectl run debug-pod --image=<your-image> -n <namespace> \
--command -- sleep 3600
kubectl exec -it debug-pod -n <namespace> -- /bin/sh
Prevent This Error
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  selector:
    matchLabels:
      app: my-app # Required; must match the pod template labels
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myregistry/my-app:v1.2.3 # NEVER use :latest in production
          # Properly configured health checks
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60 # Give app time to start
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3 # Don't be too aggressive
            successThreshold: 1
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
            successThreshold: 1
          # Resource limits with proper units
          resources:
            requests:
              memory: "256Mi" # 268,435,456 bytes
              cpu: "250m" # 0.25 CPU cores
            limits:
              memory: "512Mi" # 536,870,912 bytes
              cpu: "500m" # 0.5 CPU cores
          # Graceful shutdown
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
          # Proper logging to stdout/stderr
          command: ["/bin/sh"]
          args: ["-c", "exec your-app >> /proc/1/fd/1 2>> /proc/1/fd/2"]
          # Security context (container level; the user needs access to the app's files)
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000 # Some apps also need a matching /etc/passwd entry
            runAsGroup: 1000
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
FAQ Section
What is CrashLoopBackOff in Kubernetes?
CrashLoopBackOff is a status indicating that a pod’s container is repeatedly crashing and Kubernetes is waiting (with exponential backoff) before trying to restart it again. The container exits immediately or shortly after starting.
How do I fix CrashLoopBackOff quickly?
First, check the logs with kubectl logs <pod-name> -p to see why it crashed. 80% of the time it’s missing configuration, wrong commands, or aggressive liveness probes. Fix the specific issue and the pod will recover.
Why does Kubernetes keep restarting my pod?
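Kubernetes restarts containers (not whole pods) according to the pod's restartPolicy. With the default policy of Always, which Deployments require, a container is restarted whenever it exits, even with exit code 0. Common reasons include application crashes, missing dependencies, configuration errors, OOM kills, or failed liveness probes.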
What’s the difference between CrashLoopBackOff and Error status?
Error means the container exited with an error code. CrashLoopBackOff means it’s repeatedly failing and Kubernetes is backing off before retry attempts.
How long does CrashLoopBackOff last?
The backoff delay increases exponentially: 10s, 20s, 40s, 80s, up to 5 minutes maximum. After each successful run of at least 10 minutes, the delay resets.
Can liveness probes cause CrashLoopBackOff?
Yes! Aggressive liveness probes (low initialDelaySeconds or failureThreshold) can kill healthy containers during startup, causing CrashLoopBackOff.
Common Error Messages and Solutions
| Error in Logs | Root Cause | Solution |
|---|---|---|
| Cannot connect to database | Network/DNS/Credentials | Check service discovery, connection string |
| Permission denied | Security context | Add proper runAsUser/fsGroup |
| No such file or directory | Wrong workdir/missing files | Verify Dockerfile WORKDIR and image contents |
| bind: address already in use | Port conflict | Check for duplicate services or wrong port |
| Segmentation fault | Memory corruption/limit | Increase memory limits, fix application bug |
| Module not found | Missing dependencies | Rebuild image with all requirements |
| signal: killed | OOM or manual termination | Check memory limits and events |
| exec format error | Wrong architecture | Build for correct platform (amd64/arm64) |
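The table can also be applied mechanically to crash logs. The filter below is a minimal sketch: the function name `suggest_cause` is hypothetical, and the patterns are the exact, case-sensitive strings from the table, so real log lines with different wording or casing will not match.

```shell
#!/bin/sh
# Sketch: read log lines on stdin and print the likely root cause
# from the table above for each line that matches a known message.
suggest_cause() {
  while IFS= read -r line; do
    case "$line" in
      *"Cannot connect to database"*) echo "Network/DNS/Credentials" ;;
      *"Permission denied"*)          echo "Security context" ;;
      *"No such file or directory"*)  echo "Wrong workdir/missing files" ;;
      *"address already in use"*)     echo "Port conflict" ;;
      *"Segmentation fault"*)         echo "Memory corruption/limit" ;;
      *"Module not found"*)           echo "Missing dependencies" ;;
      *"signal: killed"*)             echo "OOM or manual termination" ;;
      *"exec format error"*)          echo "Wrong architecture" ;;
    esac
  done
}

# Real usage (requires cluster access):
#   kubectl logs <pod-name> -n <namespace> -p | suggest_cause

# Demo with a sample log line:
printf '%s\n' 'listen tcp :8080: bind: address already in use' | suggest_cause
```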
Exit Codes Reference
| Exit Code | Meaning | Common Fix |
|---|---|---|
| 0 | Success (check app logic) | Application exiting when it shouldn’t |
| 1 | General errors | Check application logs |
| 125 | Docker run failed | Invalid container configuration |
| 126 | Command not executable | Add execute permissions |
| 127 | Command not found | Fix command path or install missing binary |
| 137 | SIGKILL (usually OOM, or a forced kill after the grace period) | Check memory limits and events |
| 139 | Segmentation fault | Debug application or increase memory |
| 143 | SIGTERM (graceful shutdown) | Normal during updates |
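Exit codes above 128 follow the shell convention of 128 plus the signal number, which is how 137, 139 and 143 map to SIGKILL (9), SIGSEGV (11) and SIGTERM (15). The sketch below decodes a code along those lines; the function name `describe_exit_code` is hypothetical, and the descriptions paraphrase the table above.

```shell
#!/bin/sh
# Sketch: turn a container exit code into a short description.
# Codes above 128 are treated as 128 + signal number.
describe_exit_code() {
  code=$1
  case "$code" in
    0)   echo "exited cleanly" ;;
    125) echo "container failed to run" ;;
    126) echo "command not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ]; then
        echo "terminated by signal $((code - 128))"
      else
        echo "application error"
      fi
      ;;
  esac
}

# The exit code of the last crash can be read from the pod status, e.g.:
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

for c in 1 127 137 139 143; do
  echo "$c: $(describe_exit_code "$c")"
done
```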
💡 Pro Tips:
- Always use kubectl logs <pod> -p for crash reasons (current logs just show the restart)
- Set imagePullPolicy: IfNotPresent for local testing to avoid registry issues
- Use kubectl debug for modern debugging (K8s 1.23+)
- Keep probe periodSeconds less than terminationGracePeriodSeconds
🔥 Quick Win: Check liveness probe configuration first – it’s often the hidden cause!
⚡ Debug Faster: kubectl get events --sort-by='.lastTimestamp' -n <namespace> shows all recent events in order
🚨 Related Errors:
- ImagePullBackOff – Can’t pull container image
- OOMKilled – Container exceeded memory limit
- CreateContainerConfigError – Missing ConfigMap/Secret
- Error – Container exited with error
- Init:CrashLoopBackOff – Init container crashing
Last updated: January 2025 | Kubernetes versions: 1.26-1.29 tested | Cloud providers: EKS, GKE, AKS compatible
