Event Sourcing

The Tech Strategy Tool stores all state changes as an append-only sequence of events. The current state of the strategy document is derived by replaying these events, optionally starting from a checkpoint snapshot. This page explains the event model, processing flow, checkpointing, and replay mechanics.

Core Concepts

Events as facts

An event represents something that happened — a fact, not a request. Events are immutable once stored. The event log is the source of truth for the system state.

Key types

| Type | Purpose |
| --- | --- |
| EventEnvelope | Inbound command: event type, actor, target, data dictionary, optional lastSeenSequence |
| ProcessingResult | Outcome: sequence number, timestamp, status (applied/rejected/no-change), rejection reason, notifications, emitted events |
| StoredEvent | Persisted record: all envelope fields plus sequence number, timestamp, and status |

Event Processing Flow

Every event submission goes through the same serialized path. Here is the complete sequence from user action to confirmed state update:

sequenceDiagram
    participant Browser
    participant API as API Host
    participant Lock as SemaphoreSlim(1,1)
    participant EP as Event Processor
    participant DB as PostgreSQL
    participant SSE as SSE Manager

    Browser->>API: POST /api/events
    API->>Lock: WaitAsync()
    Lock-->>API: Acquired
    API->>EP: ProcessEvent(envelope)
    EP->>EP: Validate, apply to in-memory state
    EP-->>API: ProcessingResult

    alt Event applied
        API->>DB: AppendAsync(emittedEvents)
        API->>DB: SaveBatchAsync(associations)
        opt ShouldCheckpoint
            API->>DB: Save checkpoint
            API->>EP: ResetCheckpointCounter()
        end
        API->>SSE: BroadcastNotification(s)
    else Event rejected
        API->>DB: AppendAsync(raw request as audit)
    end

    API->>Lock: Release()
    API-->>Browser: SubmitEventResponse

    Note over Browser,SSE: State update comes through SSE, not the HTTP response

    SSE-->>Browser: event: card-changed
    Browser->>API: GET /api/... (re-fetch card)
    API->>EP: Read in-memory state
    API-->>Browser: Updated card data

Step by step

  1. User action — The user edits a field (e.g., changes a principle title) and focus leaves the field. The field enters "Persisting" state with a spinner overlay.
  2. HTTP POST — The Blazor client sends POST /api/events with eventType, targetId, data, and lastSeenSequence. The request includes the session cookie and X-CSRF-Token: 1 header.
  3. Lock acquisition — The API acquires a SemaphoreSlim(1,1) lock. If another event is being processed, this request waits.
  4. Envelope construction — The API builds an EventEnvelope from the request body plus the authenticated user's identity.
  5. Processing — EventProcessor.ProcessEvent(envelope) runs in-memory: assigns the next sequence number, validates the event, checks for conflicts, and applies the change (or rejects it).
  6. Persistence — Applied events are persisted via IEventStore.AppendAsync. Rejected events are persisted as audit records.
  7. Checkpoint check — If ShouldCheckpoint is true, the document is serialized and saved.
  8. SSE broadcast — Still inside the lock, notifications are broadcast to all connected clients.
  9. Lock release — The semaphore is released.
  10. HTTP response — The client receives a SubmitEventResponse with status and (if rejected) the reason and current server value.
  11. SSE confirmation — All connected clients (including the submitter) receive the SSE notification and re-fetch the affected card from the API.

Key design insight

The submitting client has no special treatment. It learns about its own change the same way every other client does — through the SSE notification triggering a card reload. This eliminates an entire class of consistency bugs where the submitter's local state diverges from what other clients see. The HTTP response only tells the client "accepted or rejected" — the actual state update comes through the SSE channel.

Single-writer constraint

The event processor is not thread-safe. The SemaphoreSlim(1,1) in EventEndpoints ensures that only one event is processed at a time. Never call ProcessEvent outside this lock.
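The serialized submission path can be sketched as follows. This is an illustrative Python model, not the real implementation (which is C# using SemaphoreSlim(1,1)); the class name SerializedSubmitter and the callable collaborators are simplified stand-ins.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class Result:
    status: str                        # "applied" | "rejected" | "no-change"
    emitted_events: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

class SerializedSubmitter:
    """Sketch of the single-writer event path: process, persist, broadcast,
    all under one lock (a stand-in for SemaphoreSlim(1,1))."""

    def __init__(self, process, append, checkpoint, broadcast):
        self._lock = threading.Lock()
        self._process, self._append = process, append
        self._checkpoint, self._broadcast = checkpoint, broadcast

    def submit(self, envelope) -> Result:
        with self._lock:               # never call process() outside the lock
            result = self._process(envelope)
            if result.status == "applied":
                self._append(result.emitted_events)    # persist emitted events
                self._checkpoint()                     # may be a no-op
                self._broadcast(result.notifications)  # still inside the lock
            elif result.status == "rejected":
                self._append([envelope])               # audit record
            # "no-change": nothing persisted or broadcast
        return result
```

The returned result is only the "accepted or rejected" answer; as described above, the actual state update reaches the submitter through the SSE channel.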

Event Statuses

An event can have one of three outcomes:

| Status | Meaning |
| --- | --- |
| Applied | The event was valid and the state was updated. Emitted events are persisted. |
| Rejected | The event failed validation (e.g., entity not found, invalid data). The raw request is persisted as an audit record. |
| NoChange | The event was valid but represents a no-op (e.g., setting a field to its current value). Nothing is persisted or broadcast. |

No-op detection

Update handlers compare the new value against the current value (trimmed). If they are identical, the processor returns NoChange — no event is stored, no SSE notification is broadcast, and the sequence number is not incremented. This prevents unnecessary noise in the event log.
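A minimal sketch of the comparison an update handler performs (illustrative Python; the function name is hypothetical):

```python
def classify_update(current: str, incoming: str) -> str:
    """Return "no-change" when the trimmed new value equals the trimmed
    current value; otherwise the update proceeds as "applied"."""
    if incoming.strip() == current.strip():
        return "no-change"   # nothing stored, nothing broadcast, no sequence used
    return "applied"
```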

Checkpointing

Replaying the entire event log on every startup would be impractical. Checkpoints are periodic snapshots of the strategy document that serve as a starting point for replay.

graph LR
    CP1["Checkpoint<br/>seq 200"] -.-> CP2["Checkpoint<br/>seq 400"]
    CP2 -.-> CP3["Checkpoint<br/>seq 500"]
    CP3 -->|"Replay events 501-547"| Current["Current state<br/>seq 547"]

    style CP1 fill:#e8eaf6,stroke:#7986cb
    style CP2 fill:#e8eaf6,stroke:#7986cb
    style CP3 fill:#c8e6c9,stroke:#66bb6a
    style Current fill:#fff9c4,stroke:#fdd835

A checkpoint contains a full JSON serialization of the Strategy document at a specific sequence number — all teams, groups, principles, objectives, initiatives, their ordering, and all field values.

When checkpoints are taken

Checkpoints are triggered in three situations:

Periodic (every 100 applied events): The processor tracks applied events since the last checkpoint reset. When the count exceeds 100, ShouldCheckpoint returns true. The API serializes the document, persists the checkpoint, and calls ResetCheckpointCounter().

After restore: When a restore_history event replaces the in-memory document, the processor immediately signals for a checkpoint. This captures the restored state so future replays do not need to repeat the backward seek through history.

On cold start: After replay completes during startup, a checkpoint is saved unconditionally. This ensures that any old non-decomposed events (from before the decomposition pattern was introduced) are never replayed on subsequent restarts.
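The bookkeeping behind these triggers can be sketched as a small counter (illustrative Python; the real processor is C#, and the exact boundary comparison at the threshold is an assumption based on the description above):

```python
class CheckpointCounter:
    """Tracks applied events since the last reset and a force flag
    set by restore; field and method names are simplified stand-ins."""
    THRESHOLD = 100

    def __init__(self):
        self._applied_since_reset = 0
        self._forced = False

    def on_event_applied(self):
        self._applied_since_reset += 1

    def force(self):
        # e.g. immediately after a restore_history event
        self._forced = True

    @property
    def should_checkpoint(self) -> bool:
        return self._forced or self._applied_since_reset >= self.THRESHOLD

    def reset(self):
        self._applied_since_reset = 0
        self._forced = False
```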

How checkpoints affect replay

  • Cold start: The processor loads the latest checkpoint, deserializes the document, and replays only events after that checkpoint's sequence number. Without checkpoints, every startup would replay the entire event history.
  • Restore: BuildDocumentAtSequence finds the nearest checkpoint at or before the target sequence and replays from there. Restore performance depends on checkpoint density, not total history length.

Example checkpoint document

A checkpoint's document_json is produced by JsonSerializer.Serialize(processor.CurrentDocument) using default System.Text.Json settings (PascalCase property names — different from the API's camelCase responses). Here is a realistic example showing one team with principles, a group, objectives, and initiatives:

{
  "Teams": [
    {
      "Id": "a1b2c3d4-0000-0000-0000-000000000001",
      "Name": "Platform Engineering",
      "Color": "#3498db",
      "FieldSequences": { "name": 3, "color": 5 },
      "Principles": [
        {
          "Id": "b1b2c3d4-0000-0000-0000-000000000001",
          "Title": "*Security* is non-negotiable",
          "Description": "All services must follow zero-trust principles",
          "FieldSequences": { "title": 12, "description": 15 }
        },
        {
          "Id": "b1b2c3d4-0000-0000-0000-000000000002",
          "Title": "Prefer *managed services* over self-hosted",
          "Description": "Reduce operational burden by using cloud-managed infrastructure",
          "FieldSequences": { "title": 8, "description": 9 }
        }
      ],
      "Groups": [
        {
          "Id": "c1b2c3d4-0000-0000-0000-000000000001",
          "Name": "Q1 Priorities",
          "Description": "Must-complete objectives for Q1",
          "FieldSequences": { "name": 20, "description": 22 }
        }
      ],
      "Objectives": [
        {
          "Id": "d1b2c3d4-0000-0000-0000-000000000001",
          "Title": "Migrate auth to OpenID Connect",
          "GroupId": "c1b2c3d4-0000-0000-0000-000000000001",
          "PrincipleIds": ["b1b2c3d4-0000-0000-0000-000000000001"],
          "Initiatives": [
            {
              "Id": "e1b2c3d4-0000-0000-0000-000000000001",
              "Name": "Evaluate identity providers",
              "Progress": 75,
              "JiraIssueKey": "PLAT-123",
              "FieldSequences": { "name": 30, "progress": 45 }
            },
            {
              "Id": "e1b2c3d4-0000-0000-0000-000000000002",
              "Name": "Implement OIDC integration",
              "Progress": 20,
              "JiraIssueKey": null,
              "FieldSequences": { "name": 31, "progress": 42 }
            }
          ],
          "FieldSequences": { "title": 25 },
          "TotalProgress": 48
        },
        {
          "Id": "d1b2c3d4-0000-0000-0000-000000000002",
          "Title": "Reduce CI build times by 50%",
          "GroupId": null,
          "PrincipleIds": ["b1b2c3d4-0000-0000-0000-000000000002"],
          "Initiatives": [],
          "FieldSequences": { "title": 27 },
          "TotalProgress": 0
        }
      ]
    }
  ]
}

Key structural details:

  • The JSON mirrors the domain model directly — no custom converters or transformations. Teams is an array of team objects, each containing nested arrays of Principles, Groups, and Objectives, with Initiatives nested inside objectives.
  • Array order is display order — Principles, Groups, Objectives, and Initiatives are all List<T> in the domain model. Their position in the JSON array determines the display order in the UI.
  • FieldSequences is a Dictionary<string, long> on every entity type (including Team), mapping field names to the sequence number of the last event that modified that field. This drives conflict detection and history associations.
  • PrincipleIds on objectives is a List<Guid> of references to principles in the same team.
  • GroupId on objectives is null for ungrouped objectives.
  • TotalProgress is a computed property (arithmetic mean of initiative progress values, rounded). It is serialized into the checkpoint but recalculated when initiatives are modified at runtime.
  • The entity lookup index (_index on Strategy) is marked [JsonIgnore] and rebuilt via RebuildIndex() after deserialization.
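As an illustration of how FieldSequences could drive conflict detection against an envelope's lastSeenSequence, consider this hypothetical simplification (the actual C# check may differ):

```python
def field_conflicts(field_sequences: dict, field: str,
                    last_seen_sequence: int) -> bool:
    """A field edit conflicts if the field was last modified by an event
    with a sequence number later than the one the client had seen."""
    return field_sequences.get(field, 0) > last_seen_sequence
```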

Checkpoint storage

Checkpoints are stored in the checkpoints table's document_json column, which uses PostgreSQL's JSONB type. The application treats this column as opaque — it always reads and writes the full document as a single value. No PostgreSQL JSON operators are used anywhere in the codebase.

Why JSONB?

Alternatives considered and why they were not chosen:

| Approach | Trade-off |
| --- | --- |
| Normalized tables (one row per team, principle, etc.) | Writing a checkpoint becomes a multi-table transaction across 5+ tables with deletion handling. Reading becomes a multi-join query. This adds significant complexity for zero benefit — checkpoints are always loaded as a whole document, never queried relationally. |
| Binary serialization (Protocol Buffers, MessagePack) | Potentially smaller and faster, but opaque to database tooling. You cannot inspect a checkpoint in a SQL client for debugging. Schema evolution requires versioned deserializers. Document sizes are small enough that JSON performance is not a concern. |
| Separate document store (MongoDB, S3) | Introduces an additional infrastructure dependency for a single use case. PostgreSQL is the only persistence dependency, and keeping it that way aligns with the project's simplicity goals. |

JSONB specifically (rather than plain text JSON) provides: automatic validation on write, binary storage with TOAST compression for values over ~2 KB, and the option to use JSON path operators for ad-hoc debugging queries — even though the application does not use them.

Document sizing

| Scenario | Approximate Size |
| --- | --- |
| Empty strategy | ~15 bytes |
| Single team, no content | ~120 bytes |
| Moderate (1 team, 2 principles, 1 group, 2 objectives, 2 initiatives) | 2-3 KB |
| Typical real-world (5 teams, 5-10 principles each, 3-5 groups, 10-15 objectives, 30-50 initiatives) | 20-50 KB |
| Large (10 teams, 20+ principles each, long descriptions at the 2000-char limit) | 200-500 KB |

The dominant size factors are PrincipleDescription (max 2,000 characters) and total entity count. FieldSequences add roughly 50-100 bytes per entity. PostgreSQL's TOAST compression further reduces on-disk size for documents over ~2 KB.

Checkpoint accumulation

Old checkpoints are never cleaned up. The application has no deletion logic for the checkpoints table. Checkpoints accumulate at the rate of:

  • 1 per 100 applied events (the checkpoint threshold)
  • 1 per restore operation
  • 1 per application startup (if events were replayed)

At typical usage (100 events per day), this means roughly one checkpoint per day. Even at the "large" document size of 500 KB, 1,000 checkpoints would occupy only 500 MB — well within PostgreSQL's comfort zone.

Why keep old checkpoints?

Old checkpoints make point-in-time restore efficient. The restore mechanism finds the nearest checkpoint at or before the target sequence number and replays from there. Without historical checkpoints, restoring to an old point would require replaying from the beginning of time. Checkpoint retention is therefore coupled to event retention — if you implement a data retention policy, keep at least one checkpoint at the retention boundary.

Ad-hoc inspection

Although the application treats checkpoints as opaque, JSONB makes them queryable for debugging and operations. An administrator could inspect checkpoint contents directly in SQL:

-- What teams existed at checkpoint sequence 500?
SELECT document_json->'Teams' FROM checkpoints WHERE sequence_number = 500;

-- How many principles does each team have in the latest checkpoint?
SELECT
  team->>'Name' AS team_name,
  jsonb_array_length(team->'Principles') AS principle_count
FROM checkpoints,
  jsonb_array_elements(document_json->'Teams') AS team
WHERE sequence_number = (SELECT MAX(sequence_number) FROM checkpoints);

This is an operational convenience provided by the JSONB choice, not an application feature.

Event Decomposition

Some user actions trigger multiple state changes. Rather than creating one complex event, the processor decomposes the action into discrete, self-contained events.

The problem

When a user assigns a principle from another team to their objective, the mechanics require: (a) create an independent copy of the principle in the owning team, and (b) link it. A single monolithic event would obscure the individual state changes in history.

The solution

Event handlers can return DerivedCommands — a list of additional commands to process. The processor processes each sequentially, collecting emitted events. Each derived command becomes its own stored event with its own sequence number.
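The loop can be sketched as a work queue (illustrative Python; the handler signature, returning emitted events plus derived commands, is an assumption based on the description above):

```python
def process_with_derivation(handle, command) -> list:
    """Process a command; any derived commands it returns are processed
    in turn, each producing its own stored event(s)."""
    emitted, queue = [], [command]
    while queue:
        cmd = queue.pop(0)
        events, derived = handle(cmd)   # a handler may emit events and/or
        emitted.extend(events)          # return additional derived commands
        queue.extend(derived)
    return emitted
```

With a handler that decomposes a cross-team assignment into a create and a link command, the original command yields exactly the two derived events, matching the example below.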

Example: cross-team principle assignment

sequenceDiagram
    participant Client
    participant EP as Event Processor

    Client->>EP: assign_principle_to_objective (source from Team A)
    Note over EP: Detects cross-team assignment
    EP->>EP: Derived: create_principle (in Team B, with title + description + sourceTeamId)
    EP->>EP: Derived: assign_principle_to_objective (record link)
    EP-->>Client: Result with 2 emitted events

Why decompose?

  • Readable history — each stored event is a simple fact: "principle created," "title set," "principle assigned"
  • Per-entity history — the new principle's history shows its creation; the objective's history shows the assignment
  • Re-apply support — individual events can be re-applied without re-running the entire cross-team copy logic

EmittedEvents as source of truth

ProcessingResult.EmittedEvents is the authoritative list of what gets persisted. The API layer persists EmittedEvents, never the raw request data. For rejected events (where EmittedEvents is empty), the raw request is stored as an audit record.

Cold Start Replay

On application startup, ProcessorInitializer reconstructs the in-memory state:

sequenceDiagram
    participant Init as ProcessorInitializer
    participant CP as CheckpointStore
    participant ES as EventStore
    participant EP as EventProcessor

    Init->>CP: GetLatestAsync()
    CP-->>Init: Checkpoint (document JSON + seq number)
    Init->>Init: Deserialize Strategy from JSON
    Init->>EP: LoadFromCheckpoint(document, seqNumber)
    Init->>ES: GetEventsAfterAsync(seqNumber)
    ES-->>Init: Events since checkpoint
    loop Each event
        Init->>EP: ReplayEvent(storedEvent)
    end
    Init->>EP: ResetCheckpointCounter()
    Init->>CP: SaveAsync(new checkpoint)
    Note over Init: Ready to accept requests

ReplayEvent differs from ProcessEvent in that it skips rejected events (they are already stored) and does not generate notifications.
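The initializer flow above can be condensed into a few lines (illustrative Python, with the collaborators passed as callables; names are simplified stand-ins for the C# services):

```python
def cold_start(load_latest_checkpoint, load_events_after, replay,
               save_checkpoint):
    """Load the latest checkpoint, replay only the events after it,
    then save a fresh checkpoint unconditionally."""
    document, seq = load_latest_checkpoint()
    for stored_event in load_events_after(seq):
        document, seq = replay(document, stored_event)
    save_checkpoint(document, seq)      # unconditional on cold start
    return document, seq
```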

Point-in-Time Restore

The system can reconstruct state at any historical sequence number using BuildDocumentAtSequence:

  1. Find the nearest checkpoint at or before the target sequence number
  2. Replay events from the checkpoint to the target sequence
  3. Return the resulting document

This is used by the Admin Site's "Restore" feature, which submits a restore_history event that replaces the current document with the reconstructed state and forces an immediate checkpoint.
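The three steps can be sketched as follows (illustrative Python; checkpoints and events are modeled as dictionaries keyed by sequence number, which is a simplification of the store queries):

```python
def build_document_at(checkpoints: dict, events: dict, target_seq: int,
                      replay, empty):
    """Start from the nearest checkpoint at or before target_seq,
    then replay forward to the target sequence."""
    base_seqs = [s for s in checkpoints if s <= target_seq]
    if base_seqs:
        seq = max(base_seqs)
        doc = checkpoints[seq]
    else:
        doc, seq = empty, 0             # no checkpoint: replay from the start
    for s in range(seq + 1, target_seq + 1):
        if s in events:
            doc = replay(doc, events[s])
    return doc
```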

History Associations

Entity history queries cannot rely on TargetId alone because create events use TargetId for the parent entity (not the created entity), and cascaded effects (e.g., deleting a principle that was assigned to objectives) don't reference the affected entities in TargetId.

The history_associations table solves this by explicitly mapping each event to all entities it affects:

| Column | Type | Description |
| --- | --- | --- |
| entity_id | UUID | The entity whose history this entry belongs to |
| event_sequence | BIGINT | The triggering event's sequence number |
| previous_sequence | BIGINT (nullable) | For field edits: the previous event that modified the same field (enables before/after display) |
| is_transitive | BOOLEAN | false for direct actions, true for cascaded effects |

How it works

Each event handler emits HistoryAssociation records alongside notifications. Three patterns:

  • Pattern A (field edits): Captures PreviousSequence from FieldSequences before mutation, enabling old→new value display without duplicating event data
  • Pattern B (structural): Simple direct association (create, delete, reorder, assign/remove)
  • Pattern C (cascaded): Direct association for the target entity plus transitive associations for affected entities (e.g., delete_principle → direct for principle + transitive for each objective that referenced it)
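Pattern A hinges on capturing the previous sequence before FieldSequences is mutated; a sketch (illustrative Python, hypothetical function name):

```python
def field_edit_association(entity_id: str, field: str,
                           field_sequences: dict, new_seq: int) -> dict:
    """Capture the field's previous sequence *before* updating it,
    so history can show the old value alongside the new one."""
    prev = field_sequences.get(field)   # None for the first edit of a field
    field_sequences[field] = new_seq    # mutate only after capturing
    return {"entity_id": entity_id, "event_sequence": new_seq,
            "previous_sequence": prev, "is_transitive": False}
```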

Cross-team copy provenance

When assign_principle_to_objective triggers a cross-team copy, the derived create_principle event carries sourceTeamId in its data dictionary. The history display uses this to show "Created principle (copied from 'Team Name')" instead of a generic creation message.

Backfillability

The association table can be rebuilt from the event stream at any time by replaying events through the processor and collecting the emitted associations. If association persistence fails, events are still correct — associations can be backfilled later.

Restore cleanup

When a restore_history event is processed, all associations with event_sequence greater than the target sequence are deleted, keeping the association table consistent with the restored state.

History Rebuild (Reindexing)

The history associations and checkpoints can be fully rebuilt from the event stream via POST /api/admin/rebuild-history. This is useful for:

  • Backfilling associations for events created before the association system existed
  • Recovering from corrupted checkpoints
  • Reapplying updated processing logic to the entire event log

Rebuild flow

sequenceDiagram
    participant Admin
    participant API
    participant RS as RebuildState
    participant PL as ProcessingLock
    participant DB as PostgreSQL

    Admin->>API: POST /api/admin/rebuild-history
    API->>RS: Suspend() — live path skips associations & checkpoints

    API->>DB: DELETE all checkpoints
    API->>DB: DELETE all history_associations

    loop Each stored event
        API->>API: ReplayEventWithAssociations()
        opt Every 100 applied events
            API->>DB: SaveBatch(associations)
            API->>DB: Save checkpoint
        end
    end

    Note over API,PL: Edge handover
    API->>PL: WaitAsync() — acquire processing lock
    API->>DB: Read gap events (arrived during rebuild)
    loop Each gap event
        API->>API: ReplayEventWithAssociations()
    end
    API->>DB: SaveBatch(gap associations)
    API->>DB: Save final checkpoint
    API->>RS: Resume() — re-enable live path
    API->>PL: Release()

    API-->>Admin: { eventsProcessed, checkpointsSaved, associationsSaved }

Suspension mechanism

During rebuild, the live event processing path continues normally (document mutation, event persistence, SSE notifications) but skips two operations:

  • Association persistence — checked via RebuildState.IsSuspended
  • Checkpoint saves — checked via RebuildState.IsSuspended

This means users can keep working during a rebuild — they just temporarily lose entity history visibility.
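The gate can be sketched like this (illustrative Python; only the behavior described above is modeled, and the function name is hypothetical):

```python
class RebuildState:
    """Minimal stand-in for the suspension and concurrency flags."""
    def __init__(self):
        self.is_suspended = False
        self.is_running = False

def persist_side_effects(state: RebuildState, save_associations,
                         save_checkpoint, checkpoint_due: bool) -> str:
    # The live path (document mutation, event persistence, SSE) proceeds
    # as usual; only association and checkpoint writes honor the flag.
    if state.is_suspended:
        return "skipped"
    save_associations()
    if checkpoint_due:
        save_checkpoint()
    return "saved"
```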

Edge handover

At the end of the rebuild, the service acquires the ProcessingLock to process any events that were submitted during the rebuild. This ensures no events are missed. The lock hold time is minimal — just the gap events.

Concurrency guard

RebuildState.IsRunning prevents concurrent rebuilds. If a rebuild is already in progress, the endpoint returns 409 Conflict.

TargetId Conventions

Events use TargetId with different semantics depending on the event type:

| Event Type | TargetId Meaning | New Entity ID |
| --- | --- | --- |
| create_team | (none) | data["id"] |
| create_group | Parent team ID | data["id"] |
| create_principle | Parent team ID | data["id"] |
| create_objective | Parent team ID | data["id"] |
| create_initiative | Parent objective ID | data["id"] |
| update_*, delete_* | The entity being modified | (n/a) |

Note

For create events, TargetId is the parent entity, not the new entity. The new entity's ID is always in data["id"].
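The convention in the table can be expressed as a small resolver (illustrative Python; the function is hypothetical, but the rules it encodes are exactly those above):

```python
def resolve_entity_ids(event_type: str, target_id, data: dict) -> dict:
    """For create events the new entity's id lives in data["id"] and
    target_id names the parent (or nothing, for create_team); for
    update_*/delete_* events target_id is the affected entity itself."""
    if event_type.startswith("create_"):
        parent = None if event_type == "create_team" else target_id
        return {"new_entity": data["id"], "parent": parent}
    return {"affected": target_id}
```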

Further Reading