PodFailurePolicyMatch
Kubernetes · WARNING · Notable · Workloads · HIGH confidence

Job pod failure matched a pod failure policy rule

Production Risk

Medium — a misconfigured FailJob rule can permanently fail a batch job on the first transient error.

What this means

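A Job pod failed and its container exit code or pod condition matched a rule in the Job's podFailurePolicy. Rules are checked in order and the first match decides the outcome. FailJob marks the Job as failed immediately and terminates its remaining pods. Ignore creates a replacement pod without counting the failure toward backoffLimit. Count treats the failure like an unmatched one: it counts toward backoffLimit, and the pod is replaced only while that limit has not been exceeded.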

Why it happens
  1. The application container exited with an exit code listed in a podFailurePolicy rule (onExitCodes).
  2. A pod condition (for example, DisruptionTarget) matched a rule (onPodConditions) with action FailJob or Ignore.
  3. A rule with action Count matched, so the failure counted toward the Job's backoffLimit.
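As a sketch of how these causes map onto a Job spec, the hypothetical Job below has one rule per cause. It assumes a container named `main`, treats exit code 42 as fatal and exit code 1 as an ordinary failure, and uses a placeholder image `example.com/batch-app:1.0`; all of these are illustrative, not taken from a real workload. Kubernetes only accepts podFailurePolicy when the pod template sets `restartPolicy: Never`.

```yaml
# Hypothetical Job; names, image, and exit codes are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    # Cause 2: pod disrupted (e.g. preemption, eviction); not counted, pod is replaced.
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
    # Cause 1: exit code 42 is treated as non-retriable; the Job fails at once.
    - action: FailJob
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
    # Cause 3: exit code 1 counts toward backoffLimit (same as no matching rule).
    - action: Count
      onExitCodes:
        containerName: main
        operator: In
        values: [1]
  template:
    spec:
      restartPolicy: Never  # required when podFailurePolicy is set
      containers:
      - name: main
        image: example.com/batch-app:1.0  # hypothetical image
```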
How to reproduce

A Job pod fails and the Job controller checks the podFailurePolicy rules, in order, against the pod's container exit codes and pod conditions.
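A minimal reproducer, assuming a cluster where podFailurePolicy is available and the public `busybox:1.36` image can be pulled: the container always exits with code 42, which the single FailJob rule matches. The Job name `my-job` matches the command below; the container name and exit code are arbitrary choices for illustration.

```yaml
# Minimal reproducer: the container always exits 42, which the FailJob rule matches.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  backoffLimit: 6  # not reached: FailJob ends the Job on the first matching failure
  podFailurePolicy:
    rules:
    - action: FailJob
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
  template:
    spec:
      restartPolicy: Never  # required when podFailurePolicy is set
      containers:
      - name: main
        image: busybox:1.36
        command: ["sh", "-c", "echo 'simulated fatal error' >&2; exit 42"]
```

Apply it with `kubectl apply -f my-job.yaml` (the file name is arbitrary). The Job should be marked Failed after a single pod, even though backoffLimit would otherwise allow retries.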

trigger — this will error
kubectl describe job my-job
# Events: look for a Warning with reason PodFailurePolicy whose message names the container, its exit code, and the matching FailJob rule

expected output

Status:
  Failed: 1
  Conditions:
    Type: Failed
    Reason: PodFailurePolicy

Fix 1

Review and adjust podFailurePolicy rules

WHEN Job is failing faster than expected due to a policy match

kubectl get job my-job -o yaml | grep -A 20 podFailurePolicy

Why this works

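Inspecting the policy shows which exit codes and pod conditions map to FailJob, Ignore, or Count, so you can align the rules with your application's exit-code semantics: reserve FailJob for codes that signal a non-retriable error, and let transient failures count toward backoffLimit.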
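A sketch of the kind of tuning this usually means, assuming a container named `main` and that the application documents exit code 42 (a placeholder) as its only fatal code. The first fragment is a hypothetical over-broad rule; the second narrows it so other failures fall through to the default behavior.

```yaml
# Too broad (hypothetical): any failure of "main" with an exit code other than 143
# fails the Job, so a transient error that exits 1 is never retried.
podFailurePolicy:
  rules:
  - action: FailJob
    onExitCodes:
      containerName: main
      operator: NotIn
      values: [143]
---
# Narrower: only codes the application documents as fatal fail the Job.
# Any other failure matches no rule and counts toward backoffLimit.
podFailurePolicy:
  rules:
  - action: FailJob
    onExitCodes:
      containerName: main
      operator: In
      values: [42]
```

Switching from NotIn to In makes FailJob opt-in per exit code, which is the safer default when the full set of transient exit codes is not known.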

Fix 2

Add an Ignore rule for the DisruptionTarget pod condition

WHEN Node evictions are counting against job retries

podFailurePolicy:
  rules:
  - action: Ignore
    onPodConditions:
    - type: DisruptionTarget

Why this works

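Kubernetes sets the DisruptionTarget condition when a pod is disrupted by, for example, scheduler preemption, API-initiated eviction, or taint-based eviction. Ignoring failures that carry this condition keeps such disruptions from consuming the Job's backoffLimit; the Job controller creates a replacement pod instead.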
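Where the Ignore rule sits in the list can matter, because rules are evaluated in order and the first match wins. The fragment below is a sketch that places the rule inside a Job spec ahead of a hypothetical FailJob rule (container `main` and exit code 42 are placeholders), so a disrupted pod is ignored even if its container happened to exit with a code an exit-code rule would match. The container list is omitted.

```yaml
# Fragment of a Job spec; containers omitted, names and codes are placeholders.
spec:
  podFailurePolicy:
    rules:
    # Listed first: the first matching rule wins, so disrupted pods are
    # ignored before any exit-code rule is checked.
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
    - action: FailJob
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
  template:
    spec:
      restartPolicy: Never  # required when podFailurePolicy is set
```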

What not to do

Version notes
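Kubernetes 1.25

podFailurePolicy introduced as alpha (JobPodFailurePolicy feature gate, disabled by default).

Kubernetes 1.26

podFailurePolicy promoted to beta and enabled by default.

Kubernetes 1.31

podFailurePolicy graduated to stable (GA).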

Sources
Official documentation

Kubernetes 1.26 — Job Pod Failure Policy

Content generated with AI assistance and reviewed for accuracy. Found an error? hello@errcodes.dev
