Job pod failure matched a pod failure policy rule
Production Risk
Medium — a misconfigured FailJob rule can permanently fail a batch job on the first transient error.
A Job pod terminated and its exit code or pod condition matched a rule in the Job's podFailurePolicy. Depending on the matched rule's action, the Job controller counts the failure toward backoffLimit and retries (Count), retries without counting the failure (Ignore), or marks the Job as failed without further retries (FailJob).
- Application container exited with a specific exit code defined in a podFailurePolicy rule.
- Pod condition (e.g. DisruptionTarget) matched a policy rule with action FailJob or Ignore.
- A policy rule with action Count matched, so the failure was counted toward the Job's backoffLimit.
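The causes listed here correspond to the rule actions FailJob, Ignore, and Count. A minimal sketch of a Job that uses each of them follows. The Job name, the container name main, the image, the command, and the exit codes 42 and 1 are placeholders chosen for illustration, not values from a real workload.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job                   # placeholder name
spec:
  backoffLimit: 6                # retry budget consumed by counted failures
  podFailurePolicy:
    rules:                       # evaluated in order; the first matching rule wins
    - action: FailJob            # non-retriable exit code: fail the Job immediately
      onExitCodes:
        containerName: main      # optional; omit to match any container
        operator: In
        values: [42]             # assumed "do not retry" exit code
    - action: Ignore             # disruption: retry without consuming backoffLimit
      onPodConditions:
      - type: DisruptionTarget
    - action: Count              # explicitly count this failure toward backoffLimit
      onExitCodes:
        operator: In
        values: [1]              # assumed "transient error" exit code
  template:
    spec:
      restartPolicy: Never       # podFailurePolicy requires restartPolicy: Never
      containers:
      - name: main
        image: busybox:1.36      # placeholder image
        command: ["sh", "-c", "exit 42"]   # simulates the FailJob case
```

A failure that matches no rule falls back to the default handling and counts toward backoffLimit.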
Job pod exits and the controller evaluates podFailurePolicy rules against the pod's exit code or conditions.
kubectl describe job my-job # Events: PodFailurePolicyMatch rule matched, action: FailJob
expected output
Status:
  Failed: 1
Conditions:
  Type: Failed
  Reason: PodFailurePolicyMatch
Fix 1
Review and adjust podFailurePolicy rules
WHEN Job is failing faster than expected due to a policy match
kubectl get job my-job -o yaml | grep -A 20 podFailurePolicy
Why this works
Inspecting the policy shows which exit codes trigger FailJob vs. retry so you can tune the rules to match your application's exit code semantics.
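As an illustration of that tuning, the sketch below narrows a broad FailJob rule to the exit codes an application reserves for non-retriable errors. The container name main and the codes 42 and 43 are assumptions; substitute whatever your application actually returns.

```yaml
podFailurePolicy:
  rules:
  # Risky: fails the Job on ANY non-zero exit, including transient errors.
  # - action: FailJob
  #   onExitCodes:
  #     operator: NotIn
  #     values: [0]
  #
  # Narrower: fail fast only on codes known to be non-retriable.
  - action: FailJob
    onExitCodes:
      containerName: main        # assumed container name
      operator: In
      values: [42, 43]           # assumed non-retriable exit codes
```

With the narrower rule, every other non-zero exit code matches no rule, so it is retried and counted toward backoffLimit as usual.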
Fix 2
Add an Ignore rule for node-level disruptions
WHEN Node evictions are counting against job retries
podFailurePolicy:
  rules:
  - action: Ignore
    onPodConditions:
    - type: DisruptionTarget
Why this works
Ignoring DisruptionTarget failures prevents node evictions from consuming the Job's backoffLimit.
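Rule order matters for this fix, because rules are evaluated top to bottom and evaluation stops at the first match. The sketch below assumes the policy also contains a hypothetical FailJob rule on exit code 137 (SIGKILL), which an evicted pod can also produce. Placing the Ignore rule first keeps disrupted pods from reaching it.

```yaml
podFailurePolicy:
  rules:
  - action: Ignore               # checked first, so disruptions stop here
    onPodConditions:
    - type: DisruptionTarget
      status: "True"             # the default value, shown for clarity
  - action: FailJob              # hypothetical rule, only reached by other failures
    onExitCodes:
      operator: In
      values: [137]              # assumed exit code for this illustration
```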
Kubernetes 1.25: podFailurePolicy introduced as alpha.
Kubernetes 1.26: podFailurePolicy promoted to beta.
Kubernetes 1.26 — Job Pod Failure Policy