CLUSTERDOWN
Redis | CRITICAL | Cluster | HIGH confidence

The cluster is down

Production Risk

Critical. Nodes that report this state reject writes and, unless `cluster-allow-reads-when-down` is enabled, reads as well, which typically results in a total service outage.

What this means

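This error is returned when a Redis Cluster node refuses a command because it considers the cluster state to be `fail` rather than `ok`. A node enters this state when it cannot reach a majority of primary nodes, or, with the default `cluster-require-full-coverage yes`, when at least one of the 16384 hash slots is not served by a reachable node.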
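To confirm what a given node believes, the `cluster_state` field of `CLUSTER INFO` can be extracted with a small shell helper. This is a minimal sketch: the function name `cluster_state` and the address in the usage comment are illustrative and not part of Redis.

```shell
# Reads CLUSTER INFO text on stdin and prints the value of cluster_state
# ("ok" or "fail"). redis-cli emits CRLF line endings here, so strip "\r".
cluster_state() {
  tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Usage against a live node (hypothetical address):
#   redis-cli -h 10.0.0.1 -p 6379 CLUSTER INFO | cluster_state
```

A node that prints `fail` is one that will answer ordinary commands with CLUSTERDOWN.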

Why it happens
  1. A network partition is preventing nodes from communicating with each other.
  2. More than half of the primary nodes in the cluster have crashed or are unreachable.
  3. A misconfigured `cluster-node-timeout` is causing nodes to be marked as failed, and failed over, prematurely.
How to reproduce

In a 3-primary cluster, two of the primaries go offline. A client connected to the remaining primary tries to run a command.

trigger — this will error
# Two out of three primaries are down
GET mykey

expected output

(error) CLUSTERDOWN The cluster is down.

Fix 1

Restore failed nodes and fix network partitions

WHEN This is the fundamental solution

# On each node, check its view of the cluster state
# (add -h <node-ip> -p <port> to query a node remotely)
redis-cli CLUSTER INFO
redis-cli CLUSTER NODES

Why this works

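Redis Cluster needs a majority of primary nodes to be reachable (a quorum) in order to agree on failures and authorize replica promotions. Without that majority the cluster cannot heal itself, so you must restart the failed Redis processes or resolve the network issues that prevent the nodes from communicating.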
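The majority rule can be checked by hand from `CLUSTER NODES` output. The sketch below assumes the documented line layout (flags in the third field, slot ranges from the ninth field onward) and counts only primaries that own slots. The helper name `primary_quorum` is hypothetical, and the result reflects only the view of the node that was queried.

```shell
# Reads CLUSTER NODES text on stdin and prints how many slot-owning primaries
# this node considers reachable, and how many are needed for a majority.
primary_quorum() {
  awk '
    {
      n = split($3, flags, ",")
      is_master = 0; is_down = 0
      for (i = 1; i <= n; i++) {
        if (flags[i] == "master") is_master = 1
        if (flags[i] == "fail" || flags[i] == "fail?") is_down = 1
      }
      # Only primaries that own at least one slot (field 9 onward) are counted.
      if (is_master && NF >= 9) {
        total++
        if (!is_down) up++
      }
    }
    END {
      needed = int(total / 2) + 1
      printf "up=%d total=%d needed=%d\n", up, total, needed
    }'
}

# Usage against a live node:
#   redis-cli CLUSTER NODES | primary_quorum
```

If `up` is lower than `needed`, the queried node cannot see a majority, and restoring nodes or connectivity is the only way forward.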

Fix 2

Use `redis-cli --cluster fix` to attempt an automatic repair

WHEN Nodes are reachable again but hash slots are still uncovered or stuck in a migrating/importing state

redis-cli --cluster fix <any-node-ip>:<port>

Why this works

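`redis-cli` includes a cluster management utility. Its `fix` subcommand connects to the given node, discovers the rest of the cluster, and repairs slot-level problems such as hash slots left in a migrating or importing state or slots not assigned to any node. It does not bring unreachable nodes back online.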
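Before running a repair, it can help to see whether slot coverage is actually the problem. The sketch below reads `CLUSTER INFO` output and reports how many of the 16384 hash slots are unassigned; the helper name `unassigned_slots` is hypothetical. For a fuller read-only report, `redis-cli --cluster check <any-node-ip>:<port>` inspects the cluster without changing it.

```shell
# Reads CLUSTER INFO text on stdin and prints the number of hash slots
# (out of 16384) that are not assigned to any node.
unassigned_slots() {
  tr -d '\r' | awk -F: '$1 == "cluster_slots_assigned" { print 16384 - $2 }'
}

# Usage against a live node:
#   redis-cli CLUSTER INFO | unassigned_slots
```

A non-zero result suggests uncovered slots, which is one of the conditions the `fix` subcommand is meant to address.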

What not to do

Immediately start promoting replicas to primaries manually

This can lead to a 'split-brain' scenario where different parts of the cluster disagree on who the primaries are, leading to data loss. Always let the cluster's failure detection algorithm manage failovers.

Sources
Official documentation

Redis Cluster failure detection and quorum logic

Redis Cluster Tutorial

