Roadmap

Completed

| Milestone | Summary |
| --- | --- |
| M1–M3 | Single-node log with Intention DAG/HLC, DAG conflict resolution (LWW), async actor pattern, two-node sync via Iroh, Mesh API with token-based join |
| M4 | Generic RSM platform: extracted lattice-model, lattice-kvstore, WAL-first persistence, gRPC introspection |
| M5 | Atomic multi-key transactions: batch().put(k1,v1).put(k2,v2).commit() |
| M6 | Multi-store: root store as control plane, StoreManager with live reconciliation, per-store gossip |
| Decoupling | Node/Net separation via NodeProvider traits, NetEvent channel owned by net layer, unified StoreManager |
| Kernel Audit | Orphan resolution refactor, MAX_CAUSAL_DEPS limit, zero .unwrap() policy, clean module hierarchy |
| M7 | Client/Daemon split: library extraction, lattice daemon with UDS gRPC, LatticeBackend trait, RPC event streaming, socket security |
| M8 | Reflection & introspection: lattice-api crate, deep type schemas, ReflectValue structured results, store watchers & FFI stream bindings |
| M10 | Fractal Store Model: replaced Mesh struct with recursive store hierarchy, TABLE_SYSTEM with HeadList CRDTs, RecursiveWatcher for child discovery, invite/revoke peer management, PeerStrategy (Independent/Inherited), flattened StoreManager, NetworkService (renamed from MeshService), proper Node::shutdown() |
| M11 | Weaver Migration & Protocol Sync: Intention DAG (IntentionStore), Negentropy set reconciliation, Smart Chain Fetch, stream-based Bootstrap (Clone) Protocol without gossip storms |
| M12 | Network Abstraction & Simulation: Transport/GossipLayer traits, IrohTransport extracted to lattice-net-iroh, ChannelTransport/BroadcastGossip in lattice-net-sim, gossip lag tracking, event-driven gap handling, SessionTracker decoupling, symmetric Negentropy sync |
| M13 | Crate Dependency Architecture: removed lattice-kernel from lattice-net, lattice-net-types, lattice-systemstore; moved proto types to lattice-proto; elevated shared types to lattice-store-base; removed phantom deps from lattice-bindings and lattice-api; moved WatchEvent/WatchEventKind from kvstore-api to kvstore (flipped dependency); kvstore-api demoted to dev-dep in lattice-node; RuntimeBuilder::with_opener() plugin mechanism with core-stores feature flag; store type constants removed from lattice-node re-exports |
| M14 | Slim Down Persistent State: DagQueries trait with find_lca/get_path/is_ancestor (causal-only BFS), KVTable unified state engine (shared by KvState + SystemTable), write-time LWW resolution, slim on-disk format (materialized value + intention hash pointers), ScopedDag wrapper, conflict detection on read (get/list return conflicted flag), Inspect command for full conflict state, branch inspection (store debug branch) via LCA + path traversal |
| M15 | Review & cleanup: (A) typed error enums replacing ~50 map_err + all String error variants, IntoStatus gRPC mapping, fixed discarded witness() result; (B) eliminated all production sleeps (event-driven auto-sync, Notify-based store lookup), standardized 26 test sleeps to tokio::time::timeout; (C) store_type threaded through JoinResponse proto → join flow, removed STORE_TYPE_KVSTORE hardcodes, dropped lattice-kvstore dev-dep from lattice-net-iroh; (D) removed sync_all thundering-herd fallback from handle_missing_dep — 2-tier recovery (fetch_chain → sync_with_peer) with sequential alternative-peer fallback via acceptable authors; ChannelNetwork::disconnect() for partition testing |
| M16 | Uniform Store Traits + Witness-First Architecture: (A) extracted SystemState from SystemLayer, unified transaction ownership, removed PersistentState<T>, slimmed StateMachine to apply() only; (B) ScopedDb for domain crates, StateBackend owned by SystemLayer, StateFactory folded into StateLogic::create(), SystemState implements StateLogic; (C) witness-first — witness log is WAL, project_new_entries() is single path into state machine, projection cursor tracks progress; (D) StoreRegistry absorbed into StoreManager, meta.db STORES_TABLE populated, recovery from persisted state; (E) bootstrap security tests (substituted intentions, invalid signatures); (F) ProjectionStatus stall reporting, non-blocking projection; (G) apply_witnessed_batch collapsed into apply_ingested_batch, projection cursor uses witness content hash, scan_witness_log takes seq; (H) updated StateMachine error contract for witness-first semantics; (I) deleted StoreTypeProvider (merged into StateMachine/StateLogic), removed backfill_store_types migration, removed StoreRegistry |
| M17 | Housekeeping: (A) test infrastructure unification — consolidated test_node_builder, TestHarness, TestStore, TestCtx, MockProvider, helper functions into lattice-mockkernel; replaced 6 mock state machines with NullState + TrackingState; (B) test scope review — replaced KvState with NullState in non-KV tests, coverage audit; (C) proto definition audit; (D) crate dependency graph review — removed dead deps and overly broad re-exports |

Milestone 18: Store & Log Lifecycle

Store/mesh lifecycle operations and log growth management. Two themes: (1) how users create, join, leave, and delete stores/meshes; (2) how the system manages unbounded log growth via epochs, pruning, and finality.

See: Connected DAG & Genesis Commitment for the epoch/pruning design. See also: docs/content/design/revocation-*.dot for DAG diagrams (render with dot -Tpdf).

Open Questions

These require design decisions before the relevant sub-milestones can be implemented.

1. Concurrent epoch creation. If two nodes simultaneously author epoch-triggering system ops (e.g., A revokes C while B revokes D at the same time), both actors create epoch N with the same seq but different hashes, different frontiers, and different required_acks. The current design has no resolution. Options:

2. Child store epochs on parent revocation. Child stores with PeerStrategy::Inherited share the parent’s PeerManager (same Arc). Gossip filtering (can_accept_gossip) therefore already reflects the parent’s revocation immediately — no separate peer governance needed in the child.

However, the child still needs its own epoch intention in its own DAG for: pruning boundary, snapshot computation, gossip topic rotation, and negentropy sync scoping. The child’s epoch must cite the child’s own genesis and author tips, with required_acks from the shared PeerManager.

Proposed mechanism: After the parent’s revocation creates epoch N on the parent’s actor, StoreManager propagates the epoch to inherited children. For each open child store with PeerStrategy::Inherited, it submits a kernel-level EpochOp directly to the child’s actor (not via the SSM — no system batch needed). The child’s epoch cites the child’s genesis + child’s current author tips. required_acks is read from the shared PeerManager.
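As a sketch, the propagation step might look like the following. StoreManager, PeerStrategy, and EpochOp are named in the text above, but every field and signature here is an illustrative assumption, not the actual API:

```rust
// Hypothetical sketch of epoch propagation to inherited children.
// Fields and signatures are assumptions for illustration only.

#[derive(Clone, Copy, PartialEq)]
enum PeerStrategy { Independent, Inherited }

struct EpochOp { seq: u64, cites_genesis: [u8; 32], required_acks: usize }

struct ChildStore { strategy: PeerStrategy, genesis: [u8; 32] }

struct StoreManager { children: Vec<ChildStore>, shared_required_acks: usize }

impl StoreManager {
    /// After the parent commits epoch `seq`, build a kernel-level EpochOp
    /// for every open child that inherits the parent's PeerManager.
    fn propagate_epoch(&self, seq: u64) -> Vec<EpochOp> {
        self.children
            .iter()
            .filter(|c| c.strategy == PeerStrategy::Inherited)
            .map(|c| EpochOp {
                seq,
                // The child's epoch cites the child's own genesis, not the parent's.
                cites_genesis: c.genesis,
                // required_acks is read from the shared PeerManager.
                required_acks: self.shared_required_acks,
            })
            .collect()
    }
}
```

Independent children are skipped entirely: they run their own peer governance and do not see the parent's revocation.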

Open sub-questions:

3. Offline node returns after multiple epochs. A node offline during epoch 1 and epoch 2 receives both on reconnect. It must ack both. If it acks epoch 2 (which transitively cites epoch 1 via the DAG), does that satisfy the epoch 1 required_acks check? The answer depends on whether the settlement check is “the node’s tip cites epoch N directly” or “cites it transitively.” If transitive, one ack for epoch 2 satisfies both — simpler. Either way, the rule needs to be made explicit.
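The transitive reading reduces to a plain ancestry check over the citation DAG — a minimal sketch, assuming a simple parent-pointer map (in the real codebase, DagQueries::is_ancestor from M14 would play this role):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// True if `ancestor` is reachable from `tip` by following cited parents.
/// Causal-only BFS, mirroring the DagQueries design; the `cites` map and
/// string node names are illustrative.
fn is_ancestor(cites: &HashMap<&str, Vec<&str>>, tip: &str, ancestor: &str) -> bool {
    let mut queue = VecDeque::from([tip]);
    let mut seen = HashSet::new();
    while let Some(h) = queue.pop_front() {
        if h == ancestor {
            return true;
        }
        if seen.insert(h) {
            queue.extend(cites.get(h).into_iter().flatten().copied());
        }
    }
    false
}
```

With epoch 2 citing epoch 1, an ack whose tip cites epoch 2 is a transitive ack of epoch 1 as well, even though it never cites epoch 1 directly.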

18-Lifecycle: Store & Mesh Lifecycle

Complete the store/mesh lifecycle operations. Currently only create, join, and archive-child exist.

Join & Leave:

Add & Remove Stores:

Peer Management:

18A: Witness-Only Sync ✅

Negentropy set reconciliation now operates on witnessed intentions only. Floating (unwitnessed) intentions are excluded from fingerprints and range queries.

18B: Meta Table Separation

Prerequisite for epoch indexes and headless replication. Separate store metadata into two tables reflecting the log.db / state.db split. log.db is the durable backbone that always exists; state.db is optional (only present when an opener is available and projection is active).

Both databases retain store_id and store_type for independent identity verification — if files are moved or corrupted, each database rejects mismatches on open. log.db is the authoritative source (used by peek_info() to decide which opener to use); state.db uses them as a consistency check.

Rename TABLE_META → TABLE_STATE_META (stays in state.db):

New TABLE_LOG_META (in log.db, alongside TABLE_INTENTIONS, TABLE_WITNESS, etc.):

This separation enables headless replication: a node can participate in sync and witnessing for a store it doesn’t have an opener for. log.db + TABLE_LOG_META is self-sufficient for the replication layer. state.db + TABLE_STATE_META is only needed for projection.
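A minimal sketch of the open-time identity check, assuming hypothetical StoreIdentity and open_store names: log.db is consulted first and is authoritative, state.db (when present) must agree, and a missing state.db is simply the headless case:

```rust
// Illustrative types; the roadmap specifies only that both databases store
// store_id and store_type and reject mismatches on open.
#[derive(Debug, PartialEq)]
struct StoreIdentity { store_id: [u8; 32], store_type: String }

#[derive(Debug, PartialEq)]
enum OpenError { IdentityMismatch }

/// `log_meta` is read from TABLE_LOG_META (authoritative, used by peek_info()
/// to pick an opener); `state_meta` is the optional cross-check from state.db.
fn open_store(
    log_meta: &StoreIdentity,
    state_meta: Option<&StoreIdentity>,
) -> Result<String, OpenError> {
    if let Some(s) = state_meta {
        if s != log_meta {
            return Err(OpenError::IdentityMismatch);
        }
    }
    // Headless replication: no state.db at all is fine.
    Ok(log_meta.store_type.clone())
}
```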

18B½: Self-Contained State (Pruning Prerequisite)

The state machine’s materialized state must be fully self-contained — no DAG lookups during conflict resolution or value reads. This is a hard prerequisite for pruning: once intentions are deleted from log.db, any code that reaches back into the DAG to resolve state breaks.

Currently KvTable::apply_head() calls dag.get_intention() to fetch the current winner’s (timestamp, author) for LWW tiebreaking. After pruning, that intention may be gone.

Fix: Store (hash, timestamp, author) per head in the Value proto rather than bare hashes. Conflict resolution becomes self-contained.

Note: This also applies to SystemTable — any system key with concurrent writes (e.g., concurrent SetPeerStatus) resolves via the same LWW mechanism. Same fix applies.
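Under the proposed format, resolution needs nothing beyond the head entry itself. A hedged sketch — the exact tiebreak ordering is an assumption; the point is that no dag.get_intention() call is required:

```rust
// Illustrative head entry carrying its own LWW metadata, so conflict
// resolution never reaches back into the DAG.
#[derive(Clone, Debug, PartialEq)]
struct HeadEntry { hash: [u8; 32], timestamp: u64, author: [u8; 32] }

/// Pick the LWW winner: highest HLC timestamp, with author id as a
/// deterministic tiebreaker (the real ordering may differ).
fn lww_winner<'a>(current: &'a HeadEntry, incoming: &'a HeadEntry) -> &'a HeadEntry {
    if (incoming.timestamp, incoming.author) > (current.timestamp, current.author) {
        incoming
    } else {
        current
    }
}
```

After pruning deletes the losing intentions from log.db, this comparison still works, because everything it reads lives in the materialized value.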

Migration: Existing state.db values are in the old bare-hash format. Full replay is required: bump KEY_SCHEMA_VERSION in TABLE_META, detect the old version on startup, delete state.db, let the projection loop rebuild from log.db with the new format. All values written during replay will be in the new head_entries format. This reuses the existing state.db recovery mechanism (test_state_db_recovery already validates this path).
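The version gate itself is small — a sketch with a hypothetical constant value and illustrative names:

```rust
// KEY_SCHEMA_VERSION exists in TABLE_META per the text above; the value 2
// and the StartupAction enum are assumptions for illustration.
const KEY_SCHEMA_VERSION: u32 = 2;

#[derive(Debug, PartialEq)]
enum StartupAction { UseExisting, DeleteAndReplay }

fn startup_action(on_disk_version: u32) -> StartupAction {
    if on_disk_version < KEY_SCHEMA_VERSION {
        // Old bare-hash format: drop state.db and let the projection loop
        // rebuild it from log.db in the new head_entries format.
        StartupAction::DeleteAndReplay
    } else {
        StartupAction::UseExisting
    }
}
```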

18C: Epoch Intentions

Epoch intentions bridge the system and data partitions, enable revocation enforcement at the projection layer, and define pruning boundaries. See connected-dag.md Part 3 for the full design.

Kernel-level ops (UniversalOp variants):

Epoch creation flow:

Settlement and pruning:

Projection filter:

18D: Pruning Execution

Physical deletion of intentions below a settled epoch. The coordination protocol (epoch + epoch ack) is in 18C. This milestone covers the actual deletion and sync implications.

18E: Epoch-Aware Bootstrap Snapshots

Materialized state snapshots for efficient bootstrap of new peers joining a pruned store. Distinct from epoch intentions (which are DAG structural checkpoints) — these carry the projected state.

18E½: Store-Level Pruning Hints

Store-defined intention metadata that guides consensus pruning. Both hints are advisory — pruning only happens once all peers have attested past the relevant intentions (standard frontier rules).

18F: Checkpointing / Finality

18G: Hash Index Optimization ✅

18H: Advanced Sync Optimization (Future)


Milestone 19: Store Bootstrap

Rework the bootstrap protocol for both root and child stores. Witness records are the trust chain — they prove each intention was authorized at the time it was witnessed.

Root store bootstrap requires witness records because the joining node has no peer list yet and can’t independently verify intention signatures. It relies on the inviter’s witness signatures to vouch for each intention.

Child stores currently use Negentropy (set reconciliation) instead of the bootstrap protocol. This works only because peer revocation is unimplemented. Once peers can be revoked, a syncing node can’t distinguish “authored by someone authorized at the time” from “authored by a now-revoked peer.” Witness records from an authorized peer prove the intention was valid when witnessed.
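A minimal sketch of the acceptance rule this implies, with illustrative types — the real check would verify witness signatures cryptographically, not merely match peer ids:

```rust
use std::collections::HashSet;

// Illustrative structures: a syncing node cannot evaluate "was the author
// authorized back then", but it can require that a peer it currently trusts
// witnessed the intention.
struct Witness<'a> { witness_peer: &'a str }
struct Intention<'a> { author: &'a str, witnesses: Vec<Witness<'a>> }

/// Accept during bootstrap iff at least one witness is a currently trusted peer.
fn accept_on_bootstrap(i: &Intention, trusted: &HashSet<&str>) -> bool {
    i.witnesses.iter().any(|w| trusted.contains(w.witness_peer))
}
```

Plain Negentropy reconciliation skips this step, which is exactly why it stops being safe once revocation lands.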

19A: Split Witness and Intention Transfer

19B: Child Store Bootstrap

19C: Bootstrap Controller

19D: Pruning-Aware Bootstrap


App Hosting

Serve web applications at subdomains, backed by Lattice stores. Independent of milestone numbering — can be worked in parallel with M18/M19.

See: App Hosting Design for the full design.

A: Node-Local App Bindings ✅

AppBinding struct, MetaStore CRUD, LatticeBackend trait methods, REST API (/api/apps), subdomain routing, embedded app bundles, app shell HTML with SDK bootstrap.

A½: App Hosting Hardening ✅

Security, architecture, and web UI improvements.

A⅔: Web UI Modernization ✅

ES modules, path-based routing, event-driven state, dashboard.

B: Store Claiming via SystemOp

Tag stores with an app-id so apps can discover which stores belong to them. A store can be claimed by one app type; multiple stores can share the same app-id.

C: Claim-Aware Discovery

Apps discover their stores by filtering on app-id claims.

D: Auto-Provisioning

One-step app registration: create store, claim it, bind subdomain.

E: SDK List Response Convention

The SDK must not auto-unwrap repeated fields from protobuf responses. Implicit unwrapping breaks the moment a response gains pagination, metadata, or a second repeated field.


Milestone 20: Content-Addressable Store (CAS)

Node-local content-addressable blob storage. Replication policy managed separately. Requires M11 and M12.

20A: Low-Level Storage (lattice-cas)

20B: Replication & Safety (CasManager)

20C: Wasm & FUSE Integration

20D: CLI & Observability


Milestone 21: Lattice File Sync MVP

File sync over Lattice. Requires M11 (Sync) and M20 (CAS).

21A: Filesystem Logic

21B: FUSE Interface


Milestone 22: Wasm Runtime

Replace hardcoded state machines with dynamic Wasm modules.

22A: Wasm Integration

22B: Data Structures & Verification


Milestone 23: N-Node Simulator

Scriptable simulation framework for testing Lattice networking at scale. Built on the lattice-net-sim crate (M12B).


Milestone 24: Embedded Proof (“Lattice Nano”)

Run the kernel on the RP2350.

Because CLI is already separated from Daemon (M7) and storage is abstracted (M9), only the Daemon needs porting. Note: Requires substantial refactoring of lattice-kernel to support no_std.

24A: no_std Refactoring

24B: Hardware Demo


Technical Debt


Discussion


Future