Find and fix a stuck observer

A read model looks stale, or a reactor’s side effect never fired. In an event-sourced system that usually means an observer — a projection, reducer, or reactor — hit an event it couldn’t process and paused that partition. The rest of the system keeps running; only the failing event source is stuck. Here’s how to find it and get it moving again.

Start with the health check — it counts failed partitions across the whole store in one shot:

cratis chronicle diagnose

If it reports failed partitions, list the observers and look at their state and sequence position:

cratis chronicle observers list --output plain

An observer whose sequence number is behind the event log tail, or whose state isn’t active, is your suspect.

List the failed partitions to see which event source failed and on which observer:

cratis chronicle failed-partitions list
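
The three read-only checks above can be run in one pass. This is a minimal sketch, not part of the Chronicle CLI: the function name triage is hypothetical, and it assumes cratis is on your PATH and already configured for your store. It uses only the commands shown above and deliberately ignores exit codes, because this guide does not say what diagnose returns when it finds failures.

```shell
# Hypothetical helper: run the read-only checks from this guide in order.
# Nothing here changes observer state.
triage() {
  echo "== health check =="
  cratis chronicle diagnose

  echo "== observers (state and sequence position) =="
  cratis chronicle observers list --output plain

  echo "== failed partitions =="
  cratis chronicle failed-partitions list
}
```

Source the function into your shell and call triage; the output is the three reports, one after another, under the section labels printed by echo.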

Then read the actual error — the failing sequence number, the exception, and the attempt history:

cratis chronicle failed-partitions show <OBSERVER_ID> <PARTITION> --detailed

<PARTITION> is the event source id (for example user-42). The --detailed flag gives you the full stack trace; add -o json if you want to pipe it somewhere.
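
If you want to keep the error details while you work on the fix, you can write the JSON output to a file. The sketch below is an illustration only: save_failure_report and the default file name are hypothetical, and it assumes --detailed and -o json can be combined as the paragraph above suggests.

```shell
# Hypothetical helper: save the detailed failure report for one partition.
# Arguments: <OBSERVER_ID> <PARTITION> [OUT_FILE]
save_failure_report() {
  local observer_id="$1" partition="$2"
  # The default file name is an assumption; pass a third argument to override it.
  local out_file="${3:-failure-$partition.json}"

  if [ -z "$observer_id" ] || [ -z "$partition" ]; then
    echo "usage: save_failure_report <OBSERVER_ID> <PARTITION> [OUT_FILE]" >&2
    return 2
  fi

  # Same command as above, with the JSON output redirected to the file.
  cratis chronicle failed-partitions show "$observer_id" "$partition" --detailed -o json > "$out_file" || return 1
  echo "$out_file"
}
```

For the example partition user-42, calling save_failure_report <OBSERVER_ID> user-42 would write failure-user-42.json to the current directory and print that file name.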

The failure is almost always a bug in the observer’s code — a null it didn’t expect, a missing case. Fix that in your application and redeploy. Then resume the partition without losing its state:

cratis chronicle observers retry-partition <OBSERVER_ID> <PARTITION>

Retry re-processes the event that failed. For a transient error or a corrected bug, this is the right recovery path — the observer picks up where it stopped.
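
Retry and the follow-up health check can be paired in a small wrapper. This is a sketch under assumptions: retry_and_check is a hypothetical name, and the guide does not say whether the retry has finished by the time the command returns, so the health check may still list the partition if you run it immediately.

```shell
# Hypothetical helper: retry one failed partition, then re-run the health check.
# Arguments: <OBSERVER_ID> <PARTITION>
retry_and_check() {
  local observer_id="$1" partition="$2"

  if [ -z "$observer_id" ] || [ -z "$partition" ]; then
    echo "usage: retry_and_check <OBSERVER_ID> <PARTITION>" >&2
    return 2
  fi

  # Stop here if the retry command itself fails.
  cratis chronicle observers retry-partition "$observer_id" "$partition" || return 1

  # Assumption: the retry may still be in progress at this point.
  cratis chronicle diagnose
}
```

Run it only after the corrected observer code is deployed; retrying against the old code would most likely fail on the same event again.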

4. If the state is corrupt, replay instead

If the partition’s read-model state is wrong (not just stuck) — say a bug wrote bad data before it threw — retrying won’t fix what’s already there. Rebuild just that partition from sequence zero:

cratis chronicle observers replay-partition <OBSERVER_ID> <PARTITION>

This discards the partition’s accumulated state and rebuilds it from history. Other partitions are untouched.
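
Because replay throws away the partition's state, a confirmation step can guard against replaying the wrong partition. The wrapper below is a hypothetical sketch (replay_with_confirm is not a Chronicle command); it calls only the replay-partition command shown above, and only after you type the partition id back.

```shell
# Hypothetical helper: ask for confirmation before replaying a partition.
# Arguments: <OBSERVER_ID> <PARTITION>
replay_with_confirm() {
  local observer_id="$1" partition="$2" answer

  # Prompt on stderr so stdout carries only the CLI output.
  printf 'Replay discards the state of partition "%s". Type the partition id to confirm: ' "$partition" >&2
  read -r answer

  if [ "$answer" != "$partition" ]; then
    echo "Aborted; nothing was replayed." >&2
    return 1
  fi

  cratis chronicle observers replay-partition "$observer_id" "$partition"
}
```

Typing anything other than the exact partition id aborts with a non-zero exit status and leaves the partition as it was.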

Finally, run the health check again:

cratis chronicle diagnose

The partition is recovered once it reports no failed partitions and the observer’s sequence number has caught up to the tail. If a whole observer is broken (a schema change, not one bad partition), see Replay a projection for a full rebuild.