Event Sourcing¶
The Tech Strategy Tool stores all state changes as an append-only sequence of events. The current state of the strategy document is derived by replaying these events, optionally starting from a checkpoint snapshot. This page explains the event model, processing flow, checkpointing, and replay mechanics.
Core Concepts¶
Events as facts¶
An event represents something that happened — a fact, not a request. Events are immutable once stored. The event log is the source of truth for the system state.
Key types¶
| Type | Purpose |
|---|---|
| `EventEnvelope` | Inbound command: event type, actor, target, data dictionary, optional `lastSeenSequence` |
| `ProcessingResult` | Outcome: sequence number, timestamp, status (applied/rejected/no-change), rejection reason, notifications, emitted events |
| `StoredEvent` | Persisted record: all envelope fields plus sequence number, timestamp, and status |
Event Processing Flow¶
Every event submission goes through the same serialized path. Here is the complete sequence from user action to confirmed state update:
```mermaid
sequenceDiagram
    participant Browser
    participant API as API Host
    participant Lock as SemaphoreSlim(1,1)
    participant EP as Event Processor
    participant DB as PostgreSQL
    participant SSE as SSE Manager

    Browser->>API: POST /api/events
    API->>Lock: WaitAsync()
    Lock-->>API: Acquired
    API->>EP: ProcessEvent(envelope)
    EP->>EP: Validate, apply to in-memory state
    EP-->>API: ProcessingResult
    alt Event applied
        API->>DB: AppendAsync(emittedEvents)
        API->>DB: SaveBatchAsync(associations)
        opt ShouldCheckpoint
            API->>DB: Save checkpoint
            API->>EP: ResetCheckpointCounter()
        end
        API->>SSE: BroadcastNotification(s)
    else Event rejected
        API->>DB: AppendAsync(raw request as audit)
    end
    API->>Lock: Release()
    API-->>Browser: SubmitEventResponse
    Note over Browser,SSE: State update comes through SSE, not the HTTP response
    SSE-->>Browser: event: card-changed
    Browser->>API: GET /api/... (re-fetch card)
    API->>EP: Read in-memory state
    API-->>Browser: Updated card data
```
Step by step¶
- **User action** — The user edits a field (e.g., changes a principle title) and focus leaves the field. The field enters "Persisting" state with a spinner overlay.
- **HTTP POST** — The Blazor client sends `POST /api/events` with `eventType`, `targetId`, `data`, and `lastSeenSequence`. The request includes the session cookie and `X-CSRF-Token: 1` header.
- **Lock acquisition** — The API acquires a `SemaphoreSlim(1,1)` lock. If another event is being processed, this request waits.
- **Envelope construction** — The API builds an `EventEnvelope` from the request body plus the authenticated user's identity.
- **Processing** — `EventProcessor.ProcessEvent(envelope)` runs in-memory: assigns the next sequence number, validates the event, checks for conflicts, and applies the change (or rejects it).
- **Persistence** — Applied events are persisted via `IEventStore.AppendAsync`. Rejected events are persisted as audit records.
- **Checkpoint check** — If `ShouldCheckpoint` is true, the document is serialized and saved.
- **SSE broadcast** — Still inside the lock, notifications are broadcast to all connected clients.
- **Lock release** — The semaphore is released.
- **HTTP response** — The client receives a `SubmitEventResponse` with status and (if rejected) the reason and current server value.
- **SSE confirmation** — All connected clients (including the submitter) receive the SSE notification and re-fetch the affected card from the API.
Key design insight
The submitting client has no special treatment. It learns about its own change the same way every other client does — through the SSE notification triggering a card reload. This eliminates an entire class of consistency bugs where the submitter's local state diverges from what other clients see. The HTTP response only tells the client "accepted or rejected" — the actual state update comes through the SSE channel.
Single-writer constraint
The event processor is not thread-safe. The `SemaphoreSlim(1,1)` in `EventEndpoints` ensures that only one event is processed at a time. Never call `ProcessEvent` outside this lock.
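The endpoint logic implied by this flow might look like the following sketch. The method shape and member names (the store, broadcaster, and checkpoint calls) are illustrative, based on the diagram above, not the actual API:

```csharp
// Sketch of the serialized submission path from the diagram above.
// Parameter types and member names are illustrative, not the actual API.
private static readonly SemaphoreSlim ProcessingLock = new(1, 1);

public static async Task<SubmitEventResponse> SubmitEventAsync(
    EventEnvelope envelope,
    EventProcessor processor,
    IEventStore eventStore,
    SseManager sse)
{
    await ProcessingLock.WaitAsync();   // serialize: one event at a time
    try
    {
        // Validation and state mutation happen in memory, single-threaded.
        ProcessingResult result = processor.ProcessEvent(envelope);

        if (result.Status == EventStatus.Applied)
        {
            // Persist the emitted events, never the raw request.
            await eventStore.AppendAsync(result.EmittedEvents);

            if (processor.ShouldCheckpoint)
            {
                await eventStore.SaveCheckpointAsync(processor.CurrentDocument);
                processor.ResetCheckpointCounter();
            }

            // Broadcast while still holding the lock so clients observe
            // notifications in sequence order.
            await sse.BroadcastAsync(result.Notifications);
        }
        else if (result.Status == EventStatus.Rejected)
        {
            // Rejected submissions are kept as audit records.
            await eventStore.AppendRejectedAsync(envelope, result);
        }
        // NoChange: nothing persisted, nothing broadcast.

        return new SubmitEventResponse(result.Status, result.RejectionReason);
    }
    finally
    {
        ProcessingLock.Release();
    }
}
```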
Event Statuses¶
An event can have one of three outcomes:
| Status | Meaning |
|---|---|
| Applied | The event was valid and the state was updated. Emitted events are persisted. |
| Rejected | The event failed validation (e.g., entity not found, invalid data). The raw request is persisted as an audit record. |
| NoChange | The event was valid but represents a no-op (e.g., setting a field to its current value). Nothing is persisted or broadcast. |
No-op detection¶
Update handlers compare the new value against the current value (trimmed). If they are identical, the processor returns `NoChange` — no event is stored, no SSE notification is broadcast, and the sequence number is not incremented. This prevents unnecessary noise in the event log.
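In handler terms, the check might look like this sketch (the handler shape and the `ProcessingResult.NoChange` factory are assumptions):

```csharp
// Illustrative no-op check inside an update-title handler.
// The handler shape and the NoChange factory are assumed, not the actual API.
string newTitle = envelope.Data["title"].Trim();

if (string.Equals(newTitle, principle.Title.Trim(), StringComparison.Ordinal))
{
    // Identical after trimming: no stored event, no SSE broadcast,
    // and the sequence number is not consumed.
    return ProcessingResult.NoChange();
}

principle.Title = newTitle;
```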
Checkpointing¶
Replaying the entire event log on every startup would be impractical. Checkpoints are periodic snapshots of the strategy document that serve as a starting point for replay.
```mermaid
graph LR
    CP1["Checkpoint<br/>seq 200"] -.-> CP2["Checkpoint<br/>seq 400"]
    CP2 -.-> CP3["Checkpoint<br/>seq 500"]
    CP3 -->|"Replay events 501-547"| Current["Current state<br/>seq 547"]

    style CP1 fill:#e8eaf6,stroke:#7986cb
    style CP2 fill:#e8eaf6,stroke:#7986cb
    style CP3 fill:#c8e6c9,stroke:#66bb6a
    style Current fill:#fff9c4,stroke:#fdd835
```
A checkpoint contains a full JSON serialization of the Strategy document at a specific sequence number — all teams, groups, principles, objectives, initiatives, their ordering, and all field values.
When checkpoints are taken¶
Checkpoints are triggered in three situations:
**Periodic (every 100 applied events)**: The processor tracks applied events since the last checkpoint reset. When the count exceeds 100, `ShouldCheckpoint` returns true. The API serializes the document, persists the checkpoint, and calls `ResetCheckpointCounter()`.

**After restore**: When a `restore_history` event replaces the in-memory document, the processor immediately signals for a checkpoint. This captures the restored state so future replays do not need to repeat the backward seek through history.

**On cold start**: After replay completes during startup, a checkpoint is saved unconditionally. This ensures that any old non-decomposed events (from before the decomposition pattern was introduced) are never replayed on subsequent restarts.
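A plausible shape for the periodic trigger, as a sketch (field and constant names are assumptions):

```csharp
// Sketch of the periodic checkpoint trigger; names are illustrative.
private int _appliedSinceCheckpoint;
private const int CheckpointThreshold = 100;

// Incremented for each applied event (not for NoChange or Rejected).
public void OnEventApplied() => _appliedSinceCheckpoint++;

// Checked by the API after each applied event, while holding the lock.
public bool ShouldCheckpoint => _appliedSinceCheckpoint > CheckpointThreshold;

// Called by the API once the checkpoint has been persisted.
public void ResetCheckpointCounter() => _appliedSinceCheckpoint = 0;
```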
How checkpoints affect replay¶
- **Cold start**: The processor loads the latest checkpoint, deserializes the document, and replays only events after that checkpoint's sequence number. Without checkpoints, every startup would replay the entire event history.
- **Restore**: `BuildDocumentAtSequence` finds the nearest checkpoint at or before the target sequence and replays from there. Restore performance depends on checkpoint density, not total history length.
Example checkpoint document¶
A checkpoint's `document_json` is produced by `JsonSerializer.Serialize(processor.CurrentDocument)` using default System.Text.Json settings (PascalCase property names — different from the API's camelCase responses). Here is a realistic example showing one team with principles, a group, objectives, and initiatives:
```json
{
"Teams": [
{
"Id": "a1b2c3d4-0000-0000-0000-000000000001",
"Name": "Platform Engineering",
"Color": "#3498db",
"FieldSequences": { "name": 3, "color": 5 },
"Principles": [
{
"Id": "b1b2c3d4-0000-0000-0000-000000000001",
"Title": "*Security* is non-negotiable",
"Description": "All services must follow zero-trust principles",
"FieldSequences": { "title": 12, "description": 15 }
},
{
"Id": "b1b2c3d4-0000-0000-0000-000000000002",
"Title": "Prefer *managed services* over self-hosted",
"Description": "Reduce operational burden by using cloud-managed infrastructure",
"FieldSequences": { "title": 8, "description": 9 }
}
],
"Groups": [
{
"Id": "c1b2c3d4-0000-0000-0000-000000000001",
"Name": "Q1 Priorities",
"Description": "Must-complete objectives for Q1",
"FieldSequences": { "name": 20, "description": 22 }
}
],
"Objectives": [
{
"Id": "d1b2c3d4-0000-0000-0000-000000000001",
"Title": "Migrate auth to OpenID Connect",
"GroupId": "c1b2c3d4-0000-0000-0000-000000000001",
"PrincipleIds": ["b1b2c3d4-0000-0000-0000-000000000001"],
"Initiatives": [
{
"Id": "e1b2c3d4-0000-0000-0000-000000000001",
"Name": "Evaluate identity providers",
"Progress": 75,
"JiraIssueKey": "PLAT-123",
"FieldSequences": { "name": 30, "progress": 45 }
},
{
"Id": "e1b2c3d4-0000-0000-0000-000000000002",
"Name": "Implement OIDC integration",
"Progress": 20,
"JiraIssueKey": null,
"FieldSequences": { "name": 31, "progress": 42 }
}
],
"FieldSequences": { "title": 25 },
"TotalProgress": 48
},
{
"Id": "d1b2c3d4-0000-0000-0000-000000000002",
"Title": "Reduce CI build times by 50%",
"GroupId": null,
"PrincipleIds": ["b1b2c3d4-0000-0000-0000-000000000002"],
"Initiatives": [],
"FieldSequences": { "title": 27 },
"TotalProgress": 0
}
]
}
]
}
```
Key structural details:

- **The JSON mirrors the domain model directly** — no custom converters or transformations. `Teams` is an array of team objects, each containing nested arrays of `Principles`, `Groups`, and `Objectives`, with `Initiatives` nested inside objectives.
- **Array order is display order** — Principles, Groups, Objectives, and Initiatives are all `List<T>` in the domain model. Their position in the JSON array determines the display order in the UI.
- **`FieldSequences`** is a `Dictionary<string, long>` on every entity type (including Team), mapping field names to the sequence number of the last event that modified that field. This drives conflict detection and history associations.
- **`PrincipleIds`** on objectives is a `List<Guid>` of references to principles in the same team.
- **`GroupId`** on objectives is `null` for ungrouped objectives.
- **`TotalProgress`** is a computed property (arithmetic mean of initiative progress values, rounded). It is serialized into the checkpoint but recalculated when initiatives are modified at runtime.
- **The entity lookup index** (`_index` on Strategy) is marked `[JsonIgnore]` and rebuilt via `RebuildIndex()` after deserialization.
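A sketch of the checkpoint round-trip under these rules (`Strategy`, `CurrentDocument`, and `RebuildIndex()` are named above; the surrounding scaffolding is illustrative):

```csharp
using System.Text.Json;

// Serialize the live document into a checkpoint payload. Default
// System.Text.Json options yield the PascalCase names shown above.
string documentJson = JsonSerializer.Serialize(processor.CurrentDocument);

// Later (e.g., on cold start), rehydrate the document from the checkpoint.
Strategy document = JsonSerializer.Deserialize<Strategy>(documentJson)!;

// The [JsonIgnore] lookup index is not in the payload and must be
// rebuilt before the document is usable.
document.RebuildIndex();
```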
Checkpoint storage¶
Checkpoints are stored in the `checkpoints` table's `document_json` column, which uses PostgreSQL's JSONB type. The application treats this column as opaque — it always reads and writes the full document as a single value. No PostgreSQL JSON operators are used anywhere in the codebase.
Why JSONB?¶
Alternatives considered and why they were not chosen:
| Approach | Trade-off |
|---|---|
| Normalized tables (one row per team, principle, etc.) | Writing a checkpoint becomes a multi-table transaction across 5+ tables with deletion handling. Reading becomes a multi-join query. This adds significant complexity for zero benefit — checkpoints are always loaded as a whole document, never queried relationally. |
| Binary serialization (Protocol Buffers, MessagePack) | Potentially smaller and faster, but opaque to database tooling. You cannot inspect a checkpoint in a SQL client for debugging. Schema evolution requires versioned deserializers. Document sizes are small enough that JSON performance is not a concern. |
| Separate document store (MongoDB, S3) | Introduces an additional infrastructure dependency for a single use case. PostgreSQL is the only persistence dependency, and keeping it that way aligns with the project's simplicity goals. |
JSONB specifically (rather than plain text JSON) provides: automatic validation on write, binary storage with TOAST compression for values over ~2 KB, and the option to use JSON path operators for ad-hoc debugging queries — even though the application does not use them.
Document sizing¶
| Scenario | Approximate Size |
|---|---|
| Empty strategy | ~15 bytes |
| Single team, no content | ~120 bytes |
| Moderate (1 team, 2 principles, 1 group, 2 objectives, 2 initiatives) | 2-3 KB |
| Typical real-world (5 teams, 5-10 principles each, 3-5 groups, 10-15 objectives, 30-50 initiatives) | 20-50 KB |
| Large (10 teams, 20+ principles each, long descriptions at the 2000-char limit) | 200-500 KB |
The dominant size factors are principle `Description` fields (max 2,000 characters) and total entity count. `FieldSequences` adds roughly 50-100 bytes per entity. PostgreSQL's TOAST compression further reduces on-disk size for documents over ~2 KB.
Checkpoint accumulation¶
Old checkpoints are never cleaned up. The application has no deletion logic for the `checkpoints` table. Checkpoints accumulate at the rate of:
- 1 per 100 applied events (the checkpoint threshold)
- 1 per restore operation
- 1 per application startup (if events were replayed)
At typical usage (100 events per day), this means roughly one checkpoint per day. Even at the "large" document size of 500 KB, 1,000 checkpoints would occupy only 500 MB — well within PostgreSQL's comfort zone.
Why keep old checkpoints?
Old checkpoints make point-in-time restore efficient. The restore mechanism finds the nearest checkpoint at or before the target sequence number and replays from there. Without historical checkpoints, restoring to an old point would require replaying from the beginning of time. Checkpoint retention is therefore coupled to event retention — if you implement a data retention policy, keep at least one checkpoint at the retention boundary.
Ad-hoc inspection¶
Although the application treats checkpoints as opaque, JSONB makes them queryable for debugging and operations. An administrator could inspect checkpoint contents directly in SQL:
```sql
-- What teams existed at checkpoint sequence 500?
SELECT document_json->'Teams' FROM checkpoints WHERE sequence_number = 500;

-- How many principles does each team have in the latest checkpoint?
SELECT
    team->>'Name' AS team_name,
    jsonb_array_length(team->'Principles') AS principle_count
FROM checkpoints,
    jsonb_array_elements(document_json->'Teams') AS team
WHERE sequence_number = (SELECT MAX(sequence_number) FROM checkpoints);
```
This is an operational convenience provided by the JSONB choice, not an application feature.
Event Decomposition¶
Some user actions trigger multiple state changes. Rather than creating one complex event, the processor decomposes the action into discrete, self-contained events.
The problem¶
When a user assigns a principle from another team to their objective, the mechanics require: (a) create an independent copy of the principle in the owning team, and (b) link it. A single monolithic event would obscure the individual state changes in history.
The solution¶
Event handlers can return `DerivedCommands` — a list of additional commands to process. The processor handles each command sequentially, collecting emitted events. Each derived command becomes its own stored event with its own sequence number.
Example: cross-team principle assignment¶
```mermaid
sequenceDiagram
    participant Client
    participant EP as Event Processor

    Client->>EP: assign_principle_to_objective (source from Team A)
    Note over EP: Detects cross-team assignment
    EP->>EP: Derived: create_principle (in Team B, with title + description + sourceTeamId)
    EP->>EP: Derived: assign_principle_to_objective (record link)
    EP-->>Client: Result with 2 emitted events
```
Why decompose?¶
- Readable history — each stored event is a simple fact: "principle created," "title set," "principle assigned"
- Per-entity history — the new principle's history shows its creation; the objective's history shows the assignment
- Re-apply support — individual events can be re-applied without re-running the entire cross-team copy logic
EmittedEvents as source of truth

`ProcessingResult.EmittedEvents` is the authoritative list of what gets persisted. The API layer persists `EmittedEvents`, never the raw request data. For rejected events (where `EmittedEvents` is empty), the raw request is stored as an audit record.
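A sketch of how the cross-team handler might express this decomposition (all shapes here, including `HandlerResult`, `DerivedCommand`, and the `Find*` helpers, are assumptions, not the actual signatures):

```csharp
// Hypothetical sketch of the cross-team decomposition shown above.
HandlerResult HandleAssignPrinciple(EventEnvelope envelope, Strategy doc)
{
    Guid principleId = Guid.Parse(envelope.Data["principleId"]);
    Principle source = doc.FindPrinciple(principleId);
    Team sourceTeam = doc.FindTeamOwningPrinciple(principleId);
    Objective objective = doc.FindObjective(envelope.TargetId);
    Team owningTeam = doc.FindTeamOwningObjective(objective.Id);

    if (sourceTeam.Id != owningTeam.Id)
    {
        // Cross-team: decompose into two discrete, self-contained events
        // rather than one monolithic "copy and assign" event.
        Guid copyId = Guid.NewGuid();
        return HandlerResult.Derived(
            new DerivedCommand("create_principle", targetId: owningTeam.Id,
                data: new Dictionary<string, string>
                {
                    ["id"] = copyId.ToString(),
                    ["title"] = source.Title,
                    ["description"] = source.Description,
                    ["sourceTeamId"] = sourceTeam.Id.ToString(), // provenance for history
                }),
            new DerivedCommand("assign_principle_to_objective", targetId: objective.Id,
                data: new Dictionary<string, string> { ["principleId"] = copyId.ToString() }));
    }

    // Same-team assignment applies directly as a single event.
    objective.PrincipleIds.Add(principleId);
    return HandlerResult.Applied();
}
```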
Cold Start Replay¶
On application startup, `ProcessorInitializer` reconstructs the in-memory state:
```mermaid
sequenceDiagram
    participant Init as ProcessorInitializer
    participant CP as CheckpointStore
    participant ES as EventStore
    participant EP as EventProcessor

    Init->>CP: GetLatestAsync()
    CP-->>Init: Checkpoint (document JSON + seq number)
    Init->>Init: Deserialize Strategy from JSON
    Init->>EP: LoadFromCheckpoint(document, seqNumber)
    Init->>ES: GetEventsAfterAsync(seqNumber)
    ES-->>Init: Events since checkpoint
    loop Each event
        Init->>EP: ReplayEvent(storedEvent)
    end
    Init->>EP: ResetCheckpointCounter()
    Init->>CP: SaveAsync(new checkpoint)
    Note over Init: Ready to accept requests
```
`ReplayEvent` differs from `ProcessEvent` in that it skips rejected events (they are already stored) and does not generate notifications.
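In code, the initializer might look like this sketch (the store interfaces mirror the method names in the diagram; `Checkpoint` properties and `CurrentSequence` are assumptions):

```csharp
using System.Text.Json;

// Sketch of the cold-start replay, following the diagram above.
public async Task InitializeAsync(
    ICheckpointStore checkpoints,
    IEventStore events,
    EventProcessor processor)
{
    Checkpoint? latest = await checkpoints.GetLatestAsync();
    long startAfter = 0;

    if (latest is not null)
    {
        // Start from the snapshot instead of replaying all history.
        Strategy document = JsonSerializer.Deserialize<Strategy>(latest.DocumentJson)!;
        document.RebuildIndex();
        processor.LoadFromCheckpoint(document, latest.SequenceNumber);
        startAfter = latest.SequenceNumber;
    }

    foreach (StoredEvent stored in await events.GetEventsAfterAsync(startAfter))
    {
        // ReplayEvent skips rejected events and emits no notifications.
        processor.ReplayEvent(stored);
    }

    // Save an unconditional checkpoint so old events are never replayed again.
    processor.ResetCheckpointCounter();
    await checkpoints.SaveAsync(processor.CurrentDocument, processor.CurrentSequence);
}
```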
Point-in-Time Restore¶
The system can reconstruct state at any historical sequence number using `BuildDocumentAtSequence`, sketched below:
- Find the nearest checkpoint at or before the target sequence number
- Replay events from the checkpoint to the target sequence
- Return the resulting document
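A sketch of those three steps (the checkpoint query method and the `ApplyToDocument` helper are assumptions):

```csharp
// Sketch of point-in-time reconstruction; store and member names are assumed.
public async Task<Strategy> BuildDocumentAtSequence(long targetSequence)
{
    // 1. Nearest checkpoint at or before the target.
    Checkpoint? cp = await _checkpoints.GetLatestAtOrBeforeAsync(targetSequence);

    Strategy document;
    long startAfter;
    if (cp is not null)
    {
        document = JsonSerializer.Deserialize<Strategy>(cp.DocumentJson)!;
        document.RebuildIndex();
        startAfter = cp.SequenceNumber;
    }
    else
    {
        document = new Strategy(); // no checkpoint: replay from the beginning
        startAfter = 0;
    }

    // 2. Replay events from the checkpoint up to the target sequence.
    foreach (StoredEvent e in await _events.GetEventsAfterAsync(startAfter))
    {
        if (e.SequenceNumber > targetSequence) break;
        ApplyToDocument(document, e); // same handler logic as live processing
    }

    // 3. The resulting document reflects state as of targetSequence.
    return document;
}
```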
This is used by the Admin Site's "Restore" feature, which submits a `restore_history` event that replaces the current document with the reconstructed state and forces an immediate checkpoint.
History Associations¶
Entity history queries cannot rely on `TargetId` alone: create events use `TargetId` for the parent entity (not the created entity), and cascaded effects (e.g., deleting a principle that was assigned to objectives) do not reference the affected entities in `TargetId`.

The `history_associations` table solves this by explicitly mapping each event to all entities it affects:
| Column | Type | Description |
|---|---|---|
| `entity_id` | UUID | The entity whose history this entry belongs to |
| `event_sequence` | BIGINT | The triggering event's sequence number |
| `previous_sequence` | BIGINT (nullable) | For field edits: the previous event that modified the same field (enables before/after display) |
| `is_transitive` | BOOLEAN | `false` for direct actions, `true` for cascaded effects |
How it works¶
Each event handler emits `HistoryAssociation` records alongside notifications. Three patterns:

- **Pattern A (field edits)**: Captures `PreviousSequence` from `FieldSequences` before mutation, enabling old→new value display without duplicating event data (sketched below)
- **Pattern B (structural)**: Simple direct association (create, delete, reorder, assign/remove)
- **Pattern C (cascaded)**: Direct association for the target entity plus transitive associations for affected entities (e.g., `delete_principle` → direct for the principle + transitive for each objective that referenced it)
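Pattern A, sketched (the `HistoryAssociation` record shape is inferred from the table above; the surrounding handler code is illustrative):

```csharp
// Sketch of Pattern A (field edit): capture the previous sequence
// before mutating, so history can show old -> new values.
long? previous = principle.FieldSequences.TryGetValue("title", out long prev)
    ? prev
    : null; // first edit of this field has no predecessor

principle.Title = newTitle;
principle.FieldSequences["title"] = newSequence;

associations.Add(new HistoryAssociation(
    EntityId: principle.Id,
    EventSequence: newSequence,
    PreviousSequence: previous,
    IsTransitive: false));
```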
Cross-team copy provenance¶
When `assign_principle_to_objective` triggers a cross-team copy, the derived `create_principle` event carries `sourceTeamId` in its data dictionary. The history display uses this to show "Created principle (copied from 'Team Name')" instead of a generic creation message.
Backfillability¶
The association table can be rebuilt from the event stream at any time by replaying events through the processor and collecting the emitted associations. If association persistence fails, events are still correct — associations can be backfilled later.
Restore cleanup¶
When a `restore_history` event is processed, all associations with `event_sequence` greater than the target sequence are deleted, keeping the association table consistent with the restored state.
History Rebuild (Reindexing)¶
The history associations and checkpoints can be fully rebuilt from the event stream via `POST /api/admin/rebuild-history`. This is useful for:
- Backfilling associations for events created before the association system existed
- Recovering from corrupted checkpoints
- Reapplying updated processing logic to the entire event log
Rebuild flow¶
```mermaid
sequenceDiagram
    participant Admin
    participant API
    participant RS as RebuildState
    participant PL as ProcessingLock
    participant DB as PostgreSQL

    Admin->>API: POST /api/admin/rebuild-history
    API->>RS: Suspend() — live path skips associations & checkpoints
    API->>DB: DELETE all checkpoints
    API->>DB: DELETE all history_associations
    loop Each stored event
        API->>API: ReplayEventWithAssociations()
        opt Every 100 applied events
            API->>DB: SaveBatch(associations)
            API->>DB: Save checkpoint
        end
    end
    Note over API,PL: Edge handover
    API->>PL: WaitAsync() — acquire processing lock
    API->>DB: Read gap events (arrived during rebuild)
    loop Each gap event
        API->>API: ReplayEventWithAssociations()
    end
    API->>DB: SaveBatch(gap associations)
    API->>DB: Save final checkpoint
    API->>RS: Resume() — re-enable live path
    API->>PL: Release()
    API-->>Admin: { eventsProcessed, checkpointsSaved, associationsSaved }
```
Suspension mechanism¶
During rebuild, the live event processing path continues normally (document mutation, event persistence, SSE notifications) but skips two operations:

- **Association persistence** — checked via `RebuildState.IsSuspended`
- **Checkpoint saves** — checked via `RebuildState.IsSuspended`
This means users can keep working during a rebuild — they just temporarily lose entity history visibility.
Edge handover¶
At the end of the rebuild, the service acquires the `ProcessingLock` to process any events that were submitted during the rebuild. This ensures no events are missed. The lock hold time is minimal — just the gap events.
Concurrency guard¶
`RebuildState.IsRunning` prevents concurrent rebuilds. If a rebuild is already in progress, the endpoint returns `409 Conflict`.
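A minimal sketch of such a guard, assuming an atomic flag inside `RebuildState` (the internals and the `TryStart`/`Finish` names are assumptions):

```csharp
using System.Threading;

// Sketch of the concurrency guard; RebuildState internals are assumed.
public sealed class RebuildState
{
    private int _running; // 0 = idle, 1 = rebuild in progress

    public bool IsRunning => Volatile.Read(ref _running) == 1;

    // Atomically claims the rebuild slot; false if one is already running.
    public bool TryStart() => Interlocked.CompareExchange(ref _running, 1, 0) == 0;

    public void Finish() => Volatile.Write(ref _running, 0);
}

// In the endpoint:
if (!rebuildState.TryStart())
    return Results.Conflict("A rebuild is already in progress.");
```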
TargetId Conventions¶
Events use `TargetId` with different semantics depending on the event type:

| Event Type | TargetId Meaning | New Entity ID |
|---|---|---|
| `create_team` | (none) | `data["id"]` |
| `create_group` | Parent team ID | `data["id"]` |
| `create_principle` | Parent team ID | `data["id"]` |
| `create_objective` | Parent team ID | `data["id"]` |
| `create_initiative` | Parent objective ID | `data["id"]` |
| `update_*`, `delete_*` | The entity being modified | (n/a) |
Note

For create events, `TargetId` is the parent entity, not the new entity. The new entity's ID is always in `data["id"]`.
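For illustration, a `create_principle` submission could be assembled like this (field values are hypothetical; the envelope fields follow the Key types table above):

```csharp
// Hypothetical create_principle envelope: TargetId carries the parent
// team's ID, while the new principle's ID travels in data["id"].
var envelope = new EventEnvelope(
    EventType: "create_principle",
    Actor: "alice@example.com",             // authenticated user (hypothetical)
    TargetId: parentTeamId,                 // parent team, not the new entity
    Data: new Dictionary<string, string>
    {
        ["id"] = Guid.NewGuid().ToString(), // the new principle's own ID
        ["title"] = "Prefer managed services",
    },
    LastSeenSequence: 547);
```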
Further Reading¶
- Event Processing — Handler details and the 29 event types
- Conflict Resolution — How concurrent edits are detected and resolved
- Entities — The domain model and entity relationships