Find and fix a stuck observer
A read model looks stale, or a reactor’s side effect never fired. In an event-sourced system that usually means an observer — a projection, reducer, or reactor — hit an event it couldn’t process and paused that partition. The rest of the system keeps running; only the failing event source is stuck. Here’s how to find it and get it moving again.
1. Spot the stuck observer
Section titled “1. Spot the stuck observer”Start with the health check — it counts failed partitions across the whole store in one shot:
cratis chronicle diagnoseIf it reports failed partitions, list the observers and look at their state and sequence position:
cratis chronicle observers list --output plainAn observer whose sequence number is behind the event log tail, or whose state isn’t active, is your suspect.
2. Read the failure
Section titled “2. Read the failure”List the failed partitions to see which event source failed and on which observer:
cratis chronicle failed-partitions listThen read the actual error — the failing sequence number, the exception, and the attempt history:
cratis chronicle failed-partitions show <OBSERVER_ID> <PARTITION> --detailed<PARTITION> is the event source id (for example user-42). The --detailed flag gives you the full
stack trace; add -o json if you want to pipe it somewhere.
3. Fix the cause, then retry
Section titled “3. Fix the cause, then retry”The failure is almost always a bug in the observer’s code — a null it didn’t expect, a missing case. Fix that in your application and redeploy. Then resume the partition without losing its state:
cratis chronicle observers retry-partition <OBSERVER_ID> <PARTITION>Retry re-processes the event that failed. For a transient error or a corrected bug, this is the right recovery path — the observer picks up where it stopped.
4. If the state is corrupt, replay instead
Section titled “4. If the state is corrupt, replay instead”If the partition’s read-model state is wrong (not just stuck) — say a bug wrote bad data before it threw — retrying won’t fix what’s already there. Rebuild just that partition from sequence zero:
cratis chronicle observers replay-partition <OBSERVER_ID> <PARTITION>This discards the partition’s accumulated state and rebuilds it from history. Other partitions are untouched.
Done when
Section titled “Done when”cratis chronicle diagnosereports no failed partitions and the observer’s sequence number has caught up to the tail. If a whole observer is broken (a schema change, not one bad partition), see Replay a projection for a full rebuild.