Architecture · UNBLOCK docs — what's running, what's wired, what's planned

10 load-bearing tables · 45 deployed · 26 planned · 13 block types.

Deployed (prod D1 unblock-catalog-v01, schema v30 · bumped 2026-05-05). 45 tables across four tiers (substrate · economic · cognition + ops · typed memory). blocks carries the load (~890k rows); 10 tables are wired into request paths; the rest are schema-only or background-fill. Per ADR-008, conversation + utterance are tier-1 block types: passive capture lives in the same table as everything else. The 13 block_type values: note, snippet, doc, code, trace, decision, anti-pattern, dataset, exploit, kg, conversation, utterance, other (per migrations/001_init.sql CHECK).

Migration 030 (2026-05-05, ADR-037) added 4 typed-memory tables plus 2 FTS5 virtuals: working_memory (per-user 2000-cap hot tier with composite-score eviction), rules (typed memory: rule/pattern/anti-pattern/preference/constraint/exception/convention with forward-pointer supersession), rule_scopes (multi-scope applicability), and rule_violations (observability for confidence decay). The schema borrows from Mem0 (working-memory shape), Memanto (typed-memory schema, 89.8% LongMemEval with one typed table), and Hindsight (confidence/bank separation). There is no KG layer; ADR-037 explicitly skips it.
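The "2000-cap hot tier with composite-score eviction" on working_memory can be pictured as a capped per-user table that drops its lowest-scoring rows. The sketch below is illustrative only: the column names (wm_id, salience, last_access) and the score formula (salience discounted by age in days) are assumptions, not the ADR-037 definition, and Python's sqlite3 stands in for D1.

```python
import sqlite3

CAP = 2000  # per-user cap stated in the doc

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE working_memory (
        wm_id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        salience REAL NOT NULL,       -- hypothetical score input
        last_access INTEGER NOT NULL  -- hypothetical score input (unix seconds)
    )""")
    return db

def evict_over_cap(db, user_id, now, cap=CAP):
    """Delete the lowest-scoring rows so the user keeps at most `cap` rows.

    Hypothetical composite score: salience / (1 + age_in_days).
    Returns the number of rows evicted.
    """
    cur = db.execute("""
        DELETE FROM working_memory
        WHERE wm_id IN (
            SELECT wm_id FROM working_memory
            WHERE user_id = ?
            ORDER BY salience / (1.0 + (? - last_access) / 86400.0) DESC,
                     wm_id DESC
            LIMIT -1 OFFSET ?   -- skip the top `cap` rows, select the rest
        )""", (user_id, now, cap))
    return cur.rowcount
```

Running the eviction as a single DELETE keeps the cap check and the removal in one statement, so a concurrent writer cannot slip a row in between a count and a delete.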

Planned (codegen at migrations/000_dev_db.sql, not deployed). 26 tables generated by scripts/schema/def.py per ADR-034 + ADR-036 (per-user D1 + shared marketplace D1). Two tables (users, extraction_jobs) appear in both deployed and planned and are counted once. The integrated ERD therefore shows 69 distinct tables: 43 deployed-only + 24 planned-only + 2 in both (evolving) = 69 nodes, i.e. 45 + 26 − 2. The "Full ERD" tab below renders all of them, color-coded.


Table	What it holds	Key columns
blocks	Every signed unit of knowledge. 13 block_types: `note`, `snippet`, `doc`, `code`, `trace`, `decision`, `anti-pattern`, `dataset`, `exploit`, `kg`, `conversation`, `utterance`, `other`.	block_id · user_id · parent_block · content_hash · scope · embedding_id
users	Tenant identity. Privy ID · wallet address · API-key hash collapse here.	user_id · privy_id · wallet_addr · created_at
grants	Read/write delegation between users on specific blocks. Issued by `share`; revocable.	grant_id · block_id · grantor · grantee · permissions · expires_at
listings	Marketplace state — block + price + tier + seller. Off-chain mirror; on-chain settlement via `relay`.	listing_id · block_id · seller · price_unblock · tier · status
purchases	Settled buys. tx_hash links to Base Sepolia.	purchase_id · listing_id · buyer · tx_hash · paid_at
cap_tokens	Ed25519-signed delegation tokens · 3 classes (read-only · scoped-write · admin-handoff). DDL deployed; verifier middleware on `/v1/query`; revocation gossip and EIP-1271-wallet → Ed25519-cap minting bridge are Phase 4 (see `revoked_cap_tokens`).	cap_token_id · block_id · issuer · recipient · permissions · revoked_at
outcome_traces (0 rows · writers Phase 4)	Did the buyer's task succeed using this block? Schema deployed. Writers unwired — no settlement path is producing traces yet. Reputation rollup endpoint exists in OpenAPI but reads against an empty table.	trace_id · block_id · agent_id · verdict · evidence_hash
escalations (0 rows · endpoint live)	When an agent can't resolve a query — pop up to a higher-tier validator or human reviewer. POST endpoint live; verifier wiring is Phase 4.	escalation_id · query · originating_agent · resolver · resolved_at
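The 13 block_type values on blocks are described as enforced by a CHECK in migrations/001_init.sql. A minimal sketch of what such a constraint does, using a trimmed three-column blocks table (the real DDL is not reproduced here; the helper names are hypothetical) and Python's sqlite3 as a stand-in for D1:

```python
import sqlite3

# The 13 values listed in this doc.
BLOCK_TYPES = (
    "note", "snippet", "doc", "code", "trace", "decision", "anti-pattern",
    "dataset", "exploit", "kg", "conversation", "utterance", "other",
)

def create_blocks(db):
    """Create a trimmed blocks table whose block_type is CHECK-constrained."""
    allowed = ", ".join(f"'{t}'" for t in BLOCK_TYPES)
    db.execute(f"""CREATE TABLE blocks (
        block_id   TEXT PRIMARY KEY,
        user_id    TEXT NOT NULL,
        block_type TEXT NOT NULL CHECK (block_type IN ({allowed}))
    )""")

def try_insert(db, block_id, block_type):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        db.execute("INSERT INTO blocks VALUES (?, 'usr_demo', ?)",
                   (block_id, block_type))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, `try_insert(db, "blk_01", "utterance")` is accepted and an unlisted value such as `"memo"` is rejected at write time rather than in app code.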

The tree shape matters. Conversation → utterance → derived (decision / anti-pattern / kg / dataset) is the load-bearing distinction. Every block knows its parent_block, so a decision can cite the exact utterance that minted it. Even rejected_alternatives stay linked — the platinum signal is the full deliberation, not just the picked outcome (ADR-008).
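Because every block carries parent_block, the citation chain from a derived block back to its conversation is a recursive walk. A minimal sketch, assuming a trimmed blocks table with only block_id, block_type, and parent_block (column names from the ERD; the `lineage` helper is hypothetical) and an acyclic parent chain:

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE blocks (
        block_id     TEXT PRIMARY KEY,
        block_type   TEXT NOT NULL,
        parent_block TEXT REFERENCES blocks(block_id) ON DELETE SET NULL
    )""")
    return db

def lineage(db, block_id):
    """Return [(block_id, block_type), ...] from the given block up to the root.

    Assumes parent_block never forms a cycle; a cycle would recurse forever.
    """
    return db.execute("""
        WITH RECURSIVE chain(block_id, block_type, parent_block, depth) AS (
            SELECT block_id, block_type, parent_block, 0
            FROM blocks WHERE block_id = ?
            UNION ALL
            SELECT b.block_id, b.block_type, b.parent_block, c.depth + 1
            FROM blocks b JOIN chain c ON b.block_id = c.parent_block
        )
        SELECT block_id, block_type FROM chain ORDER BY depth
    """, (block_id,)).fetchall()
```

Starting from a decision block, the result lists the decision, then the utterance that minted it, then the enclosing conversation.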

Integrated schema · table color = where it sits today.

"Implemented" in this doc means schema deployed in prod D1. It does not mean writers are firing or reads are externally usable. The honest split-by-axis is in the table below; the diagram color reduces to three visual states for legibility.

SCHEMA IN PROD · DDL on prod D1 v30 (45 tables; subset has rows)

PLANNED · ADR-036 codegen (26 tables, not deployed)

EVOLVING · in both, schema diverges

Per-table maturity (the 7-axis split, collapsed to a quick reference):

Tier	DDL deployed	Writes live	Reads live (external)	Prod traffic	Net / mainnet
Substrate (`blocks`, `extracted_facts`, FTS shadows)	Yes (8/8)	Yes (8/8)	Yes — `/v1/remember`, `/v1/query`, `/v1/extract`	Single-tenant dogfood	—
Economic (`listings`, `purchases`, `grants`, `cap_tokens`, `attestations`, `verifications`, `request_costs`)	Yes (11/11)	Partial · `request_costs` 2,420 rows; 10/11 are 0 rows	Endpoints live; reads return empty for 10/11	None	Base Sepolia testnet
Cognition (`belief_states`, `block_reconsolidations`, 12 others)	Yes (14/14)	Partial · 3/14 firing (`block_reconsolidations` 7,442 · `belief_states` 1,081 · `precision_state` 15)	Only `block_reconsolidations` reads back into `/v1/query`	None	—
Ops (`workspace_state`, `workspace_audit`, `escalations`, `consumer_tasks`, `webhooks`, `webhook_deliveries`)	Yes (6/6)	No (0/6)	POST endpoints live for `escalations`, `webhooks`	None	—
Typed memory · ADR-037 (`working_memory`, `rules`, `rule_scopes`, `rule_violations`) [mig 030, today]	Yes (4/4 · v30)	No (0/4)	No — wiring planned next sprint into `/v1/remember` + `/v1/query`	None	—

Test coverage and revenue-bearing axes are tracked separately in docs/handoff/COVERAGE-AUDIT-20260428.md; today there is no revenue and no mainnet.

Integrated tier overview · all 69 tables · color = status

flowchart LR subgraph T1["Tenancy + Identity"] direction TB USERS["users"]:::evolve UEA["user_external_accounts"]:::plan ORGS["orgs"]:::plan AGENTS["agents"]:::plan PERSONAS["personas"]:::plan AUTH["auth_nonces"]:::impl end subgraph T2["Substrate · prod (collapsed) vs dev (split)"] direction TB BLOCKS["blocks (13 types)"]:::impl BFTS["blocks_fts"]:::impl CONV["conversations"]:::plan UTT["utterances"]:::plan SP["source_provenance"]:::plan CP["conversation_participants"]:::plan CA["conversation_annotations"]:::plan UA["utterance_attributions"]:::plan EQP["eval_qa_pairs"]:::plan EQE["eval_qa_evidence"]:::plan end subgraph T3["Per-utterance extraction"] direction TB EFACTS["extracted_facts"]:::impl EFFTS["extracted_facts_fts"]:::impl UCLS["utterance_classifications"]:::plan EM["entity_mentions"]:::plan GFACTS["granite_facts"]:::plan end subgraph T4["Per-conversation derived"] direction TB SC["synthesis_cache"]:::impl TS["topic_segments"]:::plan KFACTS["kimi_facts"]:::plan KFE["kimi_fact_evidence"]:::plan end subgraph T5["Cross-conversation"] direction TB CE["canonical_entities"]:::plan PAT["patterns"]:::plan PE["pattern_evidence"]:::plan end subgraph T5b["User wiki / KG"] direction TB UW["user_wiki"]:::plan UWE["user_wiki_evidence"]:::plan end subgraph ECON["Economic / Marketplace"] direction TB LIST["listings"]:::impl PUR["purchases"]:::impl GR["grants"]:::impl ATT["attestations"]:::impl VER["verifications"]:::impl RCT["revoked_cap_tokens"]:::impl OPC["op_costs"]:::impl RC["request_costs"]:::impl UCR["user_costs_rollup"]:::impl FQ["forget_queue"]:::impl FA["forget_audit"]:::impl end subgraph BRAIN["Cognition · brain primitives"] direction TB ABS["abstractions"]:::impl BS["belief_states"]:::impl BOR["block_outcome_rollup"]:::impl BR["block_reconsolidations"]:::impl EXEC["executions"]:::impl MJ["metacog_judgments"]:::impl OT["outcome_traces"]:::impl PLN["plans"]:::impl PD["policy_decisions"]:::impl PS["precision_state"]:::impl RR["replay_runs"]:::impl 
RS["replay_sequences"]:::impl SM["self_model"]:::impl SIM["simulations"]:::impl end subgraph OPS["Ops"] direction TB CT["consumer_tasks"]:::impl ESC["escalations"]:::impl WH["webhooks"]:::impl WD["webhook_deliveries"]:::impl WS["workspace_state"]:::impl WA["workspace_audit"]:::impl end subgraph TM["Typed memory · ADR-037 · mig 030 (just landed)"] direction TB WM["working_memory"]:::evolve WMFTS["working_memory_fts"]:::evolve RUL["rules"]:::evolve RULFTS["rules_fts"]:::evolve RSC["rule_scopes"]:::evolve RVI["rule_violations"]:::evolve end subgraph T6["Orchestration + GC"] direction TB EJ["extraction_jobs"]:::evolve EXP["experiments"]:::impl SV["schema_version"]:::impl EPD["embeddings_pending_delete"]:::plan end USERS --> AGENTS USERS --> UEA USERS --> BLOCKS USERS --> CONV USERS --> UW ORGS --> CONV AGENTS --> CONV PERSONAS -.-> USERS PERSONAS -.-> AGENTS BLOCKS --> EFACTS BLOCKS --> EJ BLOCKS -.->|FTS5 shadow| BFTS EFACTS -.->|FTS5 shadow| EFFTS CONV --> UTT CONV --> SP CONV --> CP CONV --> CA CONV --> EQP EQP --> EQE UTT --> UA UTT --> UCLS UTT --> EM UTT --> GFACTS CONV --> TS CONV --> KFACTS KFACTS --> KFE EM --> CE PAT --> PE KFACTS -.-> PE UTT -.-> PE UW --> UWE KFACTS -.-> UWE PAT -.-> UWE CE -.-> UWE BLOCKS -.->|cap-token gates| LIST BLOCKS -.-> ATT BLOCKS -.-> VER BLOCKS -.-> GR BLOCKS -.-> FQ LIST --> PUR USERS -.-> RC USERS -.-> UCR BLOCKS -.->|hard FK| BR BS -.-> BS RR --> RS PLN -.-> EXEC PD -.-> EXEC PD -.-> MJ CT -.-> ESC CT -.-> OT WH --> WD WS --> WA EJ -.->|drains| GFACTS EJ -.->|drains| KFACTS EJ -.->|drains| UCLS EJ -.->|drains| EM EJ -.->|drains| TS EJ -.->|drains| PAT EPD -.->|GC queue| GFACTS EPD -.->|GC queue| KFACTS EPD -.->|GC queue| UTT EPD -.->|GC queue| CE EPD -.->|GC queue| UW BLOCKS -->|CASCADE| WM WM -.->|FTS5 shadow| WMFTS USERS -.-> WM BLOCKS -.->|source provenance| RUL RUL -->|forward-ptr supersedes| RUL RUL -->|CASCADE| RSC RUL -.->|FTS5 shadow| RULFTS RUL -->|CASCADE| RVI BLOCKS -.->|context_block SET NULL| RVI classDef impl 
fill:#1d4720,stroke:#5ade88,color:#5ade88,stroke-width:1.5px classDef plan fill:#4a3a1d,stroke:#e8b65a,color:#f0d090,stroke-width:1.5px classDef evolve fill:#2a1d4a,stroke:#a583ff,color:#c8a0e8,stroke-width:1.5px

Solid arrows = hard FK (CASCADE / SET NULL / RESTRICT). Dashed = soft / denorm / cross-tier. Six tables are EVOLVING: users (current adds display_name / persona_id / tz / working_style; planned splits external_accounts JSON into user_external_accounts); extraction_jobs (current is conversation-scoped per ADR-033; planned generalizes to target_kind + job_kind covering granite-triple / kimi-reason / classify / ner / topic-segment / pattern-scan / embed / wiki-sync); plus the four typed-memory tables that just landed via migration 030 (working_memory, rules, rule_scopes, rule_violations) — DDL on prod D1 v30, writers wiring next sprint. Live counts (prod): blocks 890k · extracted_facts 63k · block_reconsolidations 7.4k · request_costs 2.4k · synthesis_cache 1.5k · belief_states 1.1k · users 28 · working_memory 0 · rules 0 · everything else 0.

Implemented · 45 tables

Prod D1 unblock-catalog-v01 · schema_version 30 · last migration 030_working_memory_and_rules.sql (today, 2026-05-05) · ~966k total rows · 0 rows in the 4 new typed-memory tables.

Planned · 26 tables

ADR-034 codegen · migrations/000_dev_db.sql · ADR-036 per-user D1 + shared marketplace D1 · 0 rows (not deployed). Migration 031 (api_keys + revocation, TRACK-F) proposed but not on main HEAD.

Evolving · 6 tables

users + extraction_jobs (ADR-036 splits); working_memory + rules + rule_scopes + rule_violations (ADR-037 mig 030, schema deployed today, writers next sprint).
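The rules table is described as using forward-pointer supersession: an old rule points at the rule that replaced it instead of being overwritten. A minimal sketch of that idea, where the column name superseded_by and the helper functions are assumptions (the migration 030 DDL is not shown in this doc); the kind values are the seven listed for rules:

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("""CREATE TABLE rules (
        rule_id TEXT PRIMARY KEY,
        kind    TEXT NOT NULL CHECK (kind IN (
            'rule','pattern','anti-pattern','preference',
            'constraint','exception','convention')),
        body    TEXT NOT NULL,
        superseded_by TEXT REFERENCES rules(rule_id)  -- hypothetical column name
    )""")
    return db

def supersede(db, old_id, new_id, kind, body):
    """Insert the replacement rule, then point the old rule forward at it."""
    with db:  # one transaction: both statements commit or neither does
        db.execute("INSERT INTO rules (rule_id, kind, body) VALUES (?,?,?)",
                   (new_id, kind, body))
        db.execute("""UPDATE rules SET superseded_by = ?
                      WHERE rule_id = ? AND superseded_by IS NULL""",
                   (new_id, old_id))

def current_version(db, rule_id):
    """Follow forward pointers until reaching a rule that is not superseded."""
    while True:
        row = db.execute("SELECT superseded_by FROM rules WHERE rule_id = ?",
                         (rule_id,)).fetchone()
        if row is None:
            raise KeyError(rule_id)
        if row[0] is None:
            return rule_id
        rule_id = row[0]

def active(db):
    """Rules with no forward pointer are the live set."""
    return [r[0] for r in db.execute(
        "SELECT rule_id FROM rules WHERE superseded_by IS NULL ORDER BY rule_id")]
```

The old rows stay in place, so history remains queryable while reads filter on `superseded_by IS NULL`.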

The shape change · prod → planned.

blocks (12 semantic shapes, 890k rows) splits into conversations + utterances + source_provenance + conversation_participants + conversation_annotations + utterance_attributions + eval_qa_pairs + eval_qa_evidence. The 14 columns currently NULL on atomic rows (agent_id, workspace_path, git_branch, affect_*, etc.) get proper homes.
extracted_facts (63k rows, single triple-store) splits into granite_facts (per-utterance synchronous, IBM Granite extraction) + kimi_facts (per-conversation async, K2.5 reasoning with ts_start/ts_end EVENT-time anchors). Plus kimi_fact_evidence + pattern_evidence as proper join tables instead of JSON arrays.
New per-utterance classification + NER tier: utterance_classifications (8-emotion Plutchik + 6-axis toxicity + 9 boolean signals), entity_mentions (sparse NER), topic_segments (per-conversation boundaries).
New cross-conversation tier — canonical_entities (de-duped KG nodes per user with FTS5) + patterns (temporal/thematic/linguistic/behavioral/co-occurrence/instinct rules; instinct rules are imported from operator memory at user creation).
New user_wiki tier — 19 section_kinds (identity / location / profession / projects / relationships / preferences / beliefs / goals / skills / health / timeline / communication-style / working-style / technical-stack / finances / education / hobbies / moods / general). Auto-marked needs_refresh=1 on every utterance ingest; cron synthesizes content_md from referenced facts/patterns/entities.
Tenant integrity moves from app-code to FK layer — composite UNIQUE on parents + composite FKs on every extraction child make cross-user / cross-tier writes impossible at write time. No more soft user_id string keys.
Vectorize cascade GC — new embeddings_pending_delete queue + DELETE/UPDATE/soft-delete triggers on every embedding-bearing source table. GDPR forget actually propagates to FTS shadow + Vectorize.
ADR-036 sharded tenancy — instead of one prod D1 with soft user_id, planned design is per-user D1 + shared marketplace D1. The 26-table schema is a SINGLE per-user template; marketplace D1 holds public listings + cross-user signals only. Scales linearly with users instead of contesting one DB.
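The tenant-integrity item in the list above (composite UNIQUE on parents, composite FKs on extraction children) can be sketched as follows. This is a reduced illustration, not the generated DDL: utterances and granite_facts are trimmed to their key columns, and the `add_fact` helper is hypothetical.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    db.executescript("""
    CREATE TABLE utterances (
        utterance_id TEXT PRIMARY KEY,
        user_id      TEXT NOT NULL,
        UNIQUE (user_id, utterance_id)      -- gives the composite FK a key to target
    );
    CREATE TABLE granite_facts (
        fact_id      TEXT PRIMARY KEY,
        user_id      TEXT NOT NULL,
        utterance_id TEXT NOT NULL,
        FOREIGN KEY (user_id, utterance_id)
            REFERENCES utterances (user_id, utterance_id) ON DELETE CASCADE
    );
    """)
    return db

def add_fact(db, fact_id, user_id, utterance_id):
    """Return True if the fact is stored, False if the composite FK rejects it."""
    try:
        db.execute("INSERT INTO granite_facts VALUES (?,?,?)",
                   (fact_id, user_id, utterance_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

A fact written under one user_id but pointing at another user's utterance has no matching (user_id, utterance_id) parent, so the database refuses it without any app-code check.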

Where to drill down.

Implemented (prod v29): docs/architecture/erd-current-20260503.md — 644 LOC, table-by-table ERD against live D1.
Planned (codegen): scripts/schema/def.py (declarative tables) → scripts/schema/gen.py (mixin-emitted SQL) → migrations/000_dev_db.sql (artifact). Lint at scripts/schema/lint.py enforces invariants on every regeneration.
ADR-036 sharding: docs/decisions/ADR-036-per-user-db-architecture.md — per-user D1 + shared marketplace D1.
Evolution rationale: ADR-008 (conversation tier), ADR-031 (two-tier extraction), ADR-033 (async extraction jobs), ADR-034 (dev DB schema codegen), ADR-036 (per-user DB).

Detailed table-level ERDs (drill-down)

Per-tier erDiagram blocks below show full column lists, types, and relationship cardinalities. Section borders match status colors above.

Implemented · prod D1 v29 · table-by-table

Current · Substrate tier (12 tables · ~956k rows)

erDiagram USERS { TEXT user_id PK "GLOB usr_[0-9a-f]*" TEXT privy_id TEXT wallet_address TEXT email TEXT kyc_status "none|pending|verified|rejected" TEXT sanctions_status "clear|flagged|banned" REAL attestation_score INT verified_count INT blocks_listed INT created_at } BLOCKS { TEXT block_id PK "GLOB blk_[0-9a-f]*" TEXT user_id "soft FK to users" TEXT content_hash TEXT block_type "12-enum incl. conversation|utterance" TEXT scope "private|team|public" TEXT encryption_mode TEXT parent_block FK "self · ON DELETE SET NULL" TEXT tags "JSON" TEXT metadata "JSON ≤16KiB" TEXT cid TEXT embedding_id "Vectorize id" REAL salience_score REAL affect_valence REAL affect_arousal TEXT agent_id TEXT session_id INT ts_absolute INT deleted_at INT is_listed INT is_forgotten } BLOCKS_FTS { TEXT block_id "UNINDEXED" TEXT content TEXT tags } EXTRACTED_FACTS { TEXT fact_id PK "GLOB fct_[0-9a-f]*" TEXT user_id "soft FK" TEXT source_block_id FK "ON DELETE CASCADE" TEXT conversation_id "blk_* (no FK)" TEXT subject TEXT predicate TEXT object TEXT subject_type "10-enum" TEXT object_type "10-enum" TEXT speech_act "6-enum" TEXT discourse_role "13-enum" TEXT topical_domain "12-enum" TEXT privacy_class "7-enum" TEXT behavioral_signal "9-enum" REAL confidence TEXT extraction_model TEXT embedding_id } EXTRACTED_FACTS_FTS { TEXT fact_id "UNINDEXED" TEXT subject TEXT predicate TEXT object TEXT raw_text } EXTRACTION_JOBS { TEXT job_id PK "GLOB job_[0-9a-f]*" TEXT conversation_id FK "CASCADE → blocks" TEXT user_id "soft FK" TEXT job_kind TEXT model TEXT status "pending|running|done|failed|cancelled" INT chunks_total INT chunks_done INT attempts INT facts_inserted } AUTH_NONCES { TEXT nonce PK TEXT intended_for "0x-lowercase wallet" INT expires_at INT consumed } SYNTHESIS_CACHE { TEXT cache_key PK "sha256 hex (64)" TEXT answer INT abstain REAL confidence TEXT model INT expires_at } EXPERIMENTS { TEXT experiment_id PK "GLOB exp_*" TEXT parent_id "self-ref (no FK)" TEXT hypothesis TEXT config "JSON" TEXT 
commit_sha INT seed TEXT dataset_name TEXT dataset_sha256 TEXT metrics "JSON" REAL cost_usd TEXT outcome "kept|reverted|inconclusive|pending" } SCHEMA_VERSION { INT version PK TEXT name INT applied_at TEXT applied_by TEXT note } BLOCKS ||--o{ BLOCKS : "parent_block (SET NULL)" BLOCKS ||--o{ EXTRACTED_FACTS : "source_block_id (CASCADE)" BLOCKS ||--o{ EXTRACTION_JOBS : "conversation_id (CASCADE)" BLOCKS ||..o{ BLOCKS_FTS : "shadow FTS (manual sync)" EXTRACTED_FACTS ||..o{ EXTRACTED_FACTS_FTS : "shadow FTS (manual sync)" USERS ||..o{ BLOCKS : "user_id soft (no FK)" USERS ||..o{ EXTRACTED_FACTS : "user_id soft (no FK)" USERS ||..o{ EXTRACTION_JOBS : "user_id soft (no FK)" EXPERIMENTS ||..o{ EXPERIMENTS : "parent_id soft (no FK)"

blocks is overloaded — 12 semantic shapes; conv/utt = 99.998% of rows. No FK from any user_id column → users.user_id (multi-tenant isolation is app-code only). FTS5 hand-synced — drift-prone. Source of truth for migrations is schema_version (29 rows); wrangler's d1_migrations is empty/bypassed.

Current · Economic tier (11 tables · 2,420 rows · 10 of 11 are 0)

erDiagram LISTINGS { TEXT listing_id PK TEXT block_id FK TEXT user_id "seller, soft FK" REAL price_unblock INT tier "1..5" TEXT category TEXT royalty_share_with "JSON" INT token_id "NFT id when minted" TEXT tx_hash INT is_delisted INT listed_at } PURCHASES { TEXT purchase_id PK "doubles as relay token" TEXT listing_id FK TEXT block_id FK TEXT buyer_user_id "soft FK" TEXT seller_user_id "soft FK" REAL paid_unblock TEXT tx_hash TEXT status "pending|settled|failed|expired" TEXT payment_method "STALE: still allows privy" INT expires_at } GRANTS { TEXT grant_id PK TEXT cap_token_id "soft FK cap_tokens" TEXT audience "user:|team:|project:|org:|public" TEXT subject_user_id "soft FK" TEXT scope_type "7-enum" TEXT block_id FK "nullable" TEXT scope_filter_json TEXT permissions "csv {read,write,share,admin}" INT issued_at INT expires_at INT is_revoked } ATTESTATIONS { TEXT attestation_id PK TEXT block_id FK TEXT validator_id "soft FK users" REAL score "0..1" TEXT attestation_text TEXT signature TEXT metadata "JSON" INT attested_at } VERIFICATIONS { TEXT verification_id PK TEXT block_id FK TEXT user_id "soft FK" TEXT content_hash TEXT signature INT verified_at } REVOKED_CAP_TOKENS { TEXT jti PK INT status_index INT revoked_at TEXT revoked_by TEXT reason } OP_COSTS { TEXT op_name PK "14-enum" TEXT user_id PK "NULL = global default" REAL cost_compute_ms REAL cost_dollars REAL cost_latency_ms REAL expected_accuracy INT sample_count INT updated_at } REQUEST_COSTS { TEXT request_id PK TEXT user_id "soft FK" INT ts TEXT endpoint TEXT provider TEXT model_id INT embed_tok INT rerank_tok INT synth_in_tok INT synth_out_tok REAL cost_usd TEXT usage_source "cf_reported|estimated" } USER_COSTS_ROLLUP { TEXT user_id PK "soft FK" TEXT period PK "day|week" INT period_start PK INT requests REAL cost_usd INT computed_at } FORGET_QUEUE { TEXT block_id PK,FK TEXT user_id "soft FK" INT forgotten_at INT hard_delete_eligible_at "+30d" TEXT reason INT cascade_count INT hard_deleted } FORGET_AUDIT { 
TEXT audit_id PK TEXT block_id "soft FK blocks" TEXT user_id "soft FK" TEXT mode "soft|hard" TEXT reason INT cascade_count INT occurred_at } LISTINGS ||--o{ PURCHASES : "purchase of"

Soft cross-tier FKs to blocks, users, cap_tokens (no hard CASCADE — economic tier is logical-only). Marketplace + grants + attestations + verifications all gated on Phase 4 wiring. purchases.payment_method CHECK still allows 'privy' — stale per feedback_no_privy.md; tighten to wallet|relay. user_costs_rollup cron isn't running (0 rows despite 2,420 live request_costs).
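For the missing user_costs_rollup cron, the daily rollup is a single aggregate over request_costs. A minimal sketch using the column names from the ERD above (tables trimmed to the columns involved); the `rollup_daily` function name and the UTC day-bucketing rule are assumptions, not the planned job:

```python
import sqlite3

DAY = 86400  # seconds per UTC day

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE request_costs (
        request_id TEXT PRIMARY KEY,
        user_id    TEXT NOT NULL,
        ts         INTEGER NOT NULL,   -- unix seconds
        cost_usd   REAL NOT NULL
    );
    CREATE TABLE user_costs_rollup (
        user_id      TEXT NOT NULL,
        period       TEXT NOT NULL,    -- 'day' | 'week'
        period_start INTEGER NOT NULL,
        requests     INTEGER NOT NULL,
        cost_usd     REAL NOT NULL,
        computed_at  INTEGER NOT NULL,
        PRIMARY KEY (user_id, period, period_start)
    );
    """)
    return db

def rollup_daily(db, now):
    """Recompute per-user daily totals; safe to re-run (rows are replaced)."""
    db.execute("""
        INSERT OR REPLACE INTO user_costs_rollup
            (user_id, period, period_start, requests, cost_usd, computed_at)
        SELECT user_id, 'day', (ts / ?) * ?, COUNT(*), SUM(cost_usd), ?
        FROM request_costs
        GROUP BY user_id, ts / ?      -- integer division buckets by day
    """, (DAY, DAY, now, DAY))
```

Keying the rollup on (user_id, period, period_start) makes the job idempotent, so a late or repeated cron run overwrites rather than double-counts.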

Current · Cognition+Ops · brain primitives (14 tables · 8,538 rows · 11 of 14 are 0)

erDiagram ABSTRACTIONS { TEXT abstraction_id PK TEXT source_block_id PK TEXT user_id TEXT abstraction_kind TEXT emitted_block_id INT position REAL confidence TEXT extra INT created_at } BELIEF_STATES { TEXT belief_id PK TEXT owner_user_id TEXT proposition_text TEXT status "committed|doubted|revised|withdrawn" REAL confidence TEXT evidence_block_ids TEXT supersedes_belief_id FK INT committed_at INT doubted_at INT revised_at INT withdrawn_at } BLOCK_OUTCOME_ROLLUP { TEXT block_id PK INT total_uses INT successful_uses INT failed_uses REAL weighted_score REAL attested_share INT rollup_at } BLOCK_RECONSOLIDATIONS { TEXT reconsolidation_id PK TEXT block_id FK TEXT triggered_by_query_id INT triggered_at INT applied_at TEXT original_text_hash TEXT rewritten_text_hash REAL confidence_drift TEXT method TEXT error } EXECUTIONS { TEXT execution_id PK TEXT owner_user_id TEXT plan_id "soft FK plans" TEXT action_id TEXT predicted_outcome TEXT actual_outcome REAL deviation_score TEXT status TEXT decision_id "soft FK" TEXT task_id "soft FK" } METACOG_JUDGMENTS { TEXT judgment_id PK TEXT owner_user_id TEXT target_block_id "soft FK" TEXT target_decision_id "soft FK" TEXT kind TEXT verdict REAL confidence REAL cost_of_error_usd INT judged_at INT calibration_outcome } OUTCOME_TRACES { TEXT trace_id PK TEXT task_id FK "consumer_tasks" TEXT block_id TEXT cap_token_id TEXT outcome_kind REAL weight TEXT attested_by TEXT attestation_sig INT recorded_at } PLANS { TEXT plan_id PK TEXT owner_user_id TEXT seed_state_hash INT horizon INT n_branches TEXT action_sequence TEXT expected_outcomes REAL total_evc REAL total_cost_usd } POLICY_DECISIONS { TEXT decision_id PK TEXT owner_user_id TEXT candidate_ops TEXT evc_scores TEXT selected_op } PRECISION_STATE { TEXT precision_id PK TEXT user_id TEXT context_id REAL pi_top_down REAL pi_bottom_up REAL unexpected_uncertainty_signal REAL re_orient_threshold INT last_re_orient_at } REPLAY_RUNS { TEXT replay_id PK TEXT user_id INT blocks_visited REAL 
compression_ratio TEXT triggered_by REAL salience_threshold } REPLAY_SEQUENCES { TEXT replay_id PK,FK INT sequence_position PK TEXT block_id REAL salience_at_replay TEXT emitted_block_id } SELF_MODEL { TEXT owner_user_id PK TEXT topic PK REAL strength_at_topic REAL weakness_at_topic INT samples } SIMULATIONS { TEXT simulation_id PK TEXT owner_user_id TEXT seed_state_hash TEXT kind INT horizon TEXT trajectory TEXT termination } BELIEF_STATES ||--o{ BELIEF_STATES : "supersedes" REPLAY_RUNS ||--o{ REPLAY_SEQUENCES : "contains" PLANS ||--o{ EXECUTIONS : "soft plan_id" POLICY_DECISIONS ||--o{ EXECUTIONS : "soft decision_id" POLICY_DECISIONS ||--o{ METACOG_JUDGMENTS : "soft target_decision_id" EXECUTIONS ||--o{ EXECUTIONS : "soft course_corrected_from"

Production status: 14 tables, 3 of them have rows. block_reconsolidations (7,442) is the only one wired into a request path — the sensing-gate + reconsolidation hook on /v1/query writes here. belief_states (1,081) and precision_state (15) are written by background standalone scripts; nothing reads them back into a user-facing flow. outcome_traces, plans, simulations, self_model, policy_decisions, metacog_judgments, executions, abstractions, replay_runs, replay_sequences, block_outcome_rollup = 0 rows. The schema is the bet on what these surfaces will look like when Phase 4 wires writers; treat the prose accordingly. Foot-gun: outcome_traces.task_id CASCADE from consumer_tasks would nuke settlement evidence — switch to SET NULL or RESTRICT before Phase 4 turns writers on. Only hard FK leaving this tier today: block_reconsolidations.block_id → blocks.
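The outcome_traces foot-gun is easy to reproduce. A minimal sketch comparing the three ON DELETE actions on reduced versions of consumer_tasks and outcome_traces (only task_id, trace_id, and outcome_kind are kept from the ERD; the `surviving_traces` helper is hypothetical), with Python's sqlite3 standing in for D1:

```python
import sqlite3

def surviving_traces(on_delete):
    """Delete a task and return the outcome_traces rows left afterwards.

    on_delete is one of 'CASCADE', 'SET NULL', 'RESTRICT'.
    """
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(f"""
    CREATE TABLE consumer_tasks (task_id TEXT PRIMARY KEY);
    CREATE TABLE outcome_traces (
        trace_id     TEXT PRIMARY KEY,
        task_id      TEXT REFERENCES consumer_tasks(task_id) ON DELETE {on_delete},
        outcome_kind TEXT
    );
    INSERT INTO consumer_tasks VALUES ('tsk_1');
    INSERT INTO outcome_traces VALUES ('trc_1', 'tsk_1', 'success');
    """)
    try:
        db.execute("DELETE FROM consumer_tasks WHERE task_id = 'tsk_1'")
    except sqlite3.IntegrityError:
        pass  # RESTRICT refuses the delete while a trace still references the task
    return db.execute("SELECT trace_id, task_id FROM outcome_traces").fetchall()
```

CASCADE leaves no trace rows, SET NULL keeps the evidence but detaches it from the task, and RESTRICT keeps both by blocking the delete.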

Current · Cognition+Ops · ops (6 tables · all 0 rows pre-Phase-4)

erDiagram CONSUMER_TASKS { TEXT task_id PK TEXT consumer_agent_id TEXT task_descriptor INT budget_cents INT spent_cents TEXT status REAL self_confidence TEXT parent_task_id "soft FK self" INT started_at INT ended_at } ESCALATIONS { TEXT escalation_id PK TEXT task_id "soft FK consumer_tasks" TEXT agent_id TEXT reason REAL confidence INT cost_committed_cents INT cost_proposed_cents TEXT artifact_url TEXT question TEXT proposed_action TEXT priority INT deadline INT resolved_at TEXT resolution TEXT resolver_user_id } WEBHOOKS { TEXT webhook_id PK TEXT user_id TEXT url TEXT events TEXT filter_json TEXT secret_hash INT active INT last_delivered_at } WEBHOOK_DELIVERIES { TEXT delivery_id PK TEXT subscription_id FK TEXT payload_id TEXT event_name TEXT payload_json INT attempts TEXT status TEXT last_error INT next_retry_at } WORKSPACE_STATE { TEXT workspace_id PK TEXT user_id INT k_capacity TEXT current_chunks INT ignited_at INT version } WORKSPACE_AUDIT { TEXT audit_id PK TEXT workspace_id TEXT user_id TEXT event "ignite|evict|chunk|refuse" TEXT reason TEXT chunk_id REAL evc INT k_at_event } CONSUMER_TASKS ||--o{ CONSUMER_TASKS : "soft parent_task_id" CONSUMER_TASKS ||--o{ ESCALATIONS : "soft task_id" WEBHOOKS ||--o{ WEBHOOK_DELIVERIES : "subscription_id" WORKSPACE_STATE ||--o{ WORKSPACE_AUDIT : "soft workspace_id"

workspace_state + workspace_audit implement the global-workspace "ignition" primitive (chunks compete for K-slot capacity). Escalations queue is the human-in-the-loop surface for cap-token-blocked or low-confidence agent decisions. All 0 rows — Phase 4 wires writers.
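One way to read "chunks compete for K-slot capacity" is a bounded set where a new chunk must beat the weakest occupant. The sketch below is an interpretation, not the deployed logic: the competition rule (higher evc wins) and the mapping onto the ignite / evict / refuse audit events are assumptions, and the `offer` function is hypothetical.

```python
def offer(workspace, chunk_id, evc, k_capacity):
    """Offer a chunk to a workspace holding at most k_capacity chunks.

    workspace maps chunk_id -> evc and is mutated in place.
    Returns the audit events as a list of (event, chunk_id) tuples.
    """
    if len(workspace) < k_capacity:
        workspace[chunk_id] = evc
        return [("ignite", chunk_id)]

    # Full: the newcomer competes against the lowest-EVC occupant.
    weakest = min(workspace, key=workspace.get)
    if evc <= workspace[weakest]:
        return [("refuse", chunk_id)]

    del workspace[weakest]
    workspace[chunk_id] = evc
    return [("evict", weakest), ("ignite", chunk_id)]
```

Each returned tuple would correspond to one workspace_audit row, with k_at_event recording the capacity in force when the decision was made.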

Planned · ADR-036 dev codegen · table-by-table (not deployed)

Schema is generated · 2026-05-03. The 26-table dev schema is produced by Python codegen — the SQL is the build artifact, not the source of truth.

scripts/schema/def.py — declarative tables: tier, columns, mixins (tenant, lifecycle, audit, FTS5, embedding, soft-delete).
scripts/schema/gen.py — emits the migration (26 tables, 6 FTS5 virtuals, triggers, indexes).
scripts/schema/lint.py — invariant lint for tenancy, FK shape, GDPR-cascade.
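The def.py → gen.py → lint.py split can be pictured with a toy version. Everything below is a hypothetical sketch: the `Table` dataclass, the `MIXINS` mapping, and the single lint rule are illustrative and do not reproduce the real scripts.

```python
from dataclasses import dataclass, field

# Hypothetical mixins: each one contributes column definitions.
MIXINS = {
    "tenant": ["user_id TEXT NOT NULL"],
    "lifecycle": ["created_at INTEGER NOT NULL", "updated_at INTEGER NOT NULL"],
    "soft_delete": ["is_forgotten INTEGER NOT NULL DEFAULT 0", "forgotten_at INTEGER"],
}

@dataclass
class Table:
    """Declarative table definition (the def.py role)."""
    name: str
    tier: int
    columns: list[str]
    mixins: list[str] = field(default_factory=list)

def emit(table):
    """Render CREATE TABLE from own columns plus mixin columns (the gen.py role)."""
    cols = list(table.columns)
    for mixin in table.mixins:
        cols.extend(MIXINS[mixin])
    body = ",\n  ".join(cols)
    return f"CREATE TABLE {table.name} (\n  {body}\n);"

def lint(table):
    """Return invariant violations; an empty list means the table passes (the lint.py role)."""
    problems = []
    if "tenant" not in table.mixins:
        problems.append(f"{table.name}: missing tenant mixin")
    return problems
```

The point of the split is that the SQL is never edited by hand: a table that omits a required mixin fails lint before any migration text is produced.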

Structural fixes folded into codegen (review by Codex + Gemini Pro 3.1; details in repo, not here):

Composite FKs in every join table (kimi_fact_evidence, pattern_evidence, user_wiki_evidence) — cross-tenant / cross-tier writes impossible at FK layer.
Composite UNIQUEs on every parent (conversations, utterances, granite_facts, kimi_facts, canonical_entities, patterns, user_wiki) — composite FKs above have a key to point at.
Vectorize cascade — 3 triggers per embedding-bearing table (DELETE / UPDATE / soft-delete) feed embeddings_pending_delete.
FTS5 GDPR sync split — _au_del + _au_ins with WHEN NEW.is_forgotten = 0 guard; soft-delete actually removes from FTS shadow.
Wiki invalidation triggers on every memory-source table; uw_mark_refresh debounces.
Soft-delete cascade conversations → utterances; eval_qa_evidence promoted to a real join table.
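The Vectorize cascade item in the list above (three triggers per embedding-bearing table feeding embeddings_pending_delete) might look like this for one table. It is a sketch under assumptions: trigger names, the queue's columns, and the trimmed utterances table are illustrative, not the generated DDL.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE utterances (
        utterance_id TEXT PRIMARY KEY,
        embedding_id TEXT,
        is_forgotten INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE embeddings_pending_delete (
        embedding_id TEXT PRIMARY KEY,
        source_table TEXT NOT NULL
    );

    -- 1. Hard delete: queue the row's embedding.
    CREATE TRIGGER utt_ad_embed AFTER DELETE ON utterances
    WHEN OLD.embedding_id IS NOT NULL
    BEGIN
        INSERT OR IGNORE INTO embeddings_pending_delete
        VALUES (OLD.embedding_id, 'utterances');
    END;

    -- 2. Embedding replaced: queue the old vector id.
    CREATE TRIGGER utt_au_embed AFTER UPDATE OF embedding_id ON utterances
    WHEN OLD.embedding_id IS NOT NULL AND OLD.embedding_id IS NOT NEW.embedding_id
    BEGIN
        INSERT OR IGNORE INTO embeddings_pending_delete
        VALUES (OLD.embedding_id, 'utterances');
    END;

    -- 3. Soft delete: queue the vector when the row is marked forgotten.
    CREATE TRIGGER utt_au_forget AFTER UPDATE OF is_forgotten ON utterances
    WHEN NEW.is_forgotten = 1 AND OLD.is_forgotten = 0
         AND NEW.embedding_id IS NOT NULL
    BEGIN
        INSERT OR IGNORE INTO embeddings_pending_delete
        VALUES (NEW.embedding_id, 'utterances');
    END;
    """)
    return db

def pending(db):
    """Embedding ids waiting for the GC worker, sorted for stable output."""
    return [r[0] for r in db.execute(
        "SELECT embedding_id FROM embeddings_pending_delete ORDER BY embedding_id")]
```

A separate worker would drain the queue and call the vector store's delete; keeping that out of the trigger means the SQL write never blocks on a network call.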

Migration artifact: migrations/000_dev_db.sql. Adding a table = declare it in def.py with the right mixins; lint enforces compliance.

Dev DB · 2026-05-03 · 26 base tables across 6 + 1 tiers + 6 FTS5 virtuals + auto-sync triggers. Includes user_wiki + user_wiki_evidence for per-user living knowledge synthesized from kimi_facts + patterns + canonical_entities; auto-marked needs_refresh=1 on every new utterance via trigger. Full DDL in the repo: migrations/000_dev_db.sql.
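The "auto-marked needs_refresh=1 on every new utterance via trigger" behavior can be sketched as an AFTER INSERT trigger. The table shapes below are trimmed, and the trigger name and the `needs_refresh = 0` guard (a simple way to skip already-flagged rows) are assumptions rather than the generated uw_mark_refresh logic.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE user_wiki (
        user_id       TEXT NOT NULL,
        section_kind  TEXT NOT NULL,
        content_md    TEXT,
        needs_refresh INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (user_id, section_kind)
    );
    CREATE TABLE utterances (
        utterance_id TEXT PRIMARY KEY,
        user_id      TEXT NOT NULL,
        text         TEXT
    );
    -- Flag only this user's sections, and only those not already flagged.
    CREATE TRIGGER utt_ai_wiki AFTER INSERT ON utterances
    BEGIN
        UPDATE user_wiki SET needs_refresh = 1
        WHERE user_id = NEW.user_id AND needs_refresh = 0;
    END;
    """)
    return db

def stale_sections(db, user_id):
    """Sections the wiki-sync job would need to re-synthesize for this user."""
    return [r[0] for r in db.execute(
        """SELECT section_kind FROM user_wiki
           WHERE user_id = ? AND needs_refresh = 1 ORDER BY section_kind""",
        (user_id,))]
```

The cron side then reads the flagged rows, rewrites content_md, and resets needs_refresh to 0.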

Skeleton · 6 tiers · 24 tables (post-Codex)

flowchart LR subgraph T1["Tier 1 · Tenancy + Identity"] USERS([users]) UEA([user_external_accounts]) ORGS([orgs]) AGENTS([agents]) PERSONAS([personas]) end subgraph T2["Tier 2 · Substrate"] CONV([conversations]) SRC_PROV([source_provenance]) CONV_PART([conversation_participants]) CONV_ANNOT([conversation_annotations]) UTT([utterances]) UTT_ATTR([utterance_attributions]) EVAL_QA([eval_qa_pairs]) end subgraph T3["Tier 3 · Per-utterance extraction"] UTT_CLASS([utterance_classifications]) ENT_MENT([entity_mentions]) GFACTS([granite_facts]) end subgraph T4["Tier 4 · Per-conversation derived"] TOPIC([topic_segments]) KFACTS([kimi_facts]) KFE([kimi_fact_evidence]) end subgraph T5["Tier 5 · Cross-conversation derived"] CANON([canonical_entities]) PATTERNS([patterns]) PE([pattern_evidence]) end subgraph T5b["Tier 5b · User wiki / KG"] UWIKI([user_wiki]) UWE([user_wiki_evidence]) end subgraph T6["Tier 6 · Orchestration + GC"] JOBS([extraction_jobs]) EPD([embeddings_pending_delete]) end USERS --> UEA USERS --> AGENTS USERS --> CONV ORGS --> CONV AGENTS --> CONV PERSONAS -.-> USERS PERSONAS -.-> AGENTS PERSONAS -.-> CONV_PART CONV --> SRC_PROV CONV --> CONV_PART CONV --> CONV_ANNOT CONV --> UTT CONV --> EVAL_QA UTT --> UTT_ATTR UTT --> UTT_CLASS UTT --> ENT_MENT UTT --> GFACTS CONV --> TOPIC CONV --> KFACTS KFACTS --> KFE UTT --> KFE ENT_MENT --> CANON PATTERNS --> PE KFACTS -.-> PE UTT -.-> PE UTT_CLASS -.-> PE JOBS -.-> UTT_CLASS JOBS -.-> ENT_MENT JOBS -.-> GFACTS JOBS -.-> KFACTS JOBS -.-> TOPIC JOBS -.-> PATTERNS EPD -.-> UTT EPD -.-> GFACTS EPD -.-> KFACTS EPD -.-> CANON EPD -.-> UWIKI USERS --> UWIKI UWIKI --> UWE KFACTS -.-> UWE PATTERNS -.-> UWE CANON -.-> UWE UTT -.-> UWIKI classDef tenancy fill:#1c2231,stroke:#6c4cff,color:#a583ff classDef substrate fill:#161b27,stroke:#d8965a,color:#d8965a classDef perutt fill:#0f1320,stroke:#5ade88,color:#5ade88 classDef perconv fill:#0f1320,stroke:#3aa5ad,color:#7fd6dc classDef crossconv fill:#0f1320,stroke:#a36cd6,color:#c8a0e8 
classDef wiki fill:#0f1320,stroke:#e8b65a,color:#f0d090 classDef ops fill:#1c2231,stroke:#d8965a,color:#d8965a class USERS,UEA,ORGS,AGENTS,PERSONAS tenancy class CONV,SRC_PROV,CONV_PART,CONV_ANNOT,UTT,UTT_ATTR,EVAL_QA substrate class UTT_CLASS,ENT_MENT,GFACTS perutt class KFACTS,KFE,TOPIC perconv class CANON,PATTERNS,PE crossconv class UWIKI,UWE wiki class JOBS,EPD ops

Solid arrows = hard FK CASCADE. Dashed = soft / denorm / cascade-target. New post-Codex: 7 normalization tables (split-from-JSON + cascade GC + eval). New user_wiki layer (gold): per-user living knowledge across 19 section_kinds (identity / location / profession / projects / relationships / preferences / beliefs / goals / skills / health / timeline / communication-style / working-style / technical-stack / finances / education / hobbies / moods / general). Auto-marked needs_refresh=1 on every utterance ingest; cron synthesizes via job_kind='wiki-sync'.

Tier 1 · Tenancy + Identity (4 tables)

erDiagram USERS { TEXT user_id PK "GLOB usr_[0-9a-f]*" TEXT display_name "Viraj" TEXT email TEXT wallet_address TEXT location "Seattle WA" TEXT tz "America/Los_Angeles" TEXT external_accounts "JSON {kaggle,hf,github,...}" TEXT working_style "brutally-honest, no-placeholder-data" TEXT persona_id FK INT is_forgotten } ORGS { TEXT org_id PK TEXT name INT created_at } AGENTS { TEXT agent_id PK "GLOB agt_[0-9a-f]*" TEXT user_id FK "CASCADE" TEXT agent_kind "claude-code|cursor|codex|perplexity|custom" TEXT persona_id FK "SET NULL" TEXT name TEXT version TEXT fingerprint "stable across reinstalls" INT first_seen_at INT last_seen_at } PERSONAS { TEXT persona_id PK "GLOB per_[0-9a-f]*" TEXT user_id FK "owner; NULL = system-defined" TEXT name "CEO|CTO|senior-engineer|specialist-codex|maintainer|contributor|reviewer|triager|tsc-member|emeritus|bot" TEXT description TEXT responsibilities "JSON array" TEXT operating_rules "JSON array · CLAUDE.md style" TEXT quality_bar "goldman-sachs|production|prototype|hackathon" TEXT communication_cadence "JSON {silence_threshold, requires_status_line, ...}" TEXT escalation_policy "JSON {escalate_when, escalate_to}" TEXT authority_scope "autonomous|autonomous-on-implementation-only|requires-approval-for-all" } USERS ||--o{ AGENTS : "runs" PERSONAS ||--o{ USERS : "assigns role" PERSONAS ||--o{ AGENTS : "assigns role"

Personas encode CEO/CTO/engineer (CLAUDE.md alpha-as-CTO model) and OSS roles (maintainer/contributor/reviewer). Same persona pool used by users and agents. Operating rules + quality bar + escalation policy travel with the persona.

Tier 2 · Substrate (4 tables) · conversations

erDiagram
    USERS {
        TEXT user_id PK
    }
    ORGS {
        TEXT org_id PK
    }
    AGENTS {
        TEXT agent_id PK
    }
    CONVERSATIONS {
        TEXT conversation_id PK "GLOB conv_[0-9a-f]*"
        TEXT user_id FK "CASCADE"
        TEXT org_id FK "SET NULL · nullable"
        TEXT agent_id FK "SET NULL · nullable"
        TEXT provider "anthropic|openai|google|cf-workers-ai|xai|meta|mistral|eval|unknown"
        TEXT model_id "claude-opus-4-7|gpt-5"
        TEXT tier "prod|eval"
        TEXT source_kind "locomo|claude-code|cursor"
        TEXT source_id "LoCoMo sample_id|chat id"
        TEXT ingest_path "passive-middleware|explicit-post|bulk-import|eval-loader"
        INT started_at "REAL source date · UTC sec"
        INT last_utterance_at "REAL source date · UTC sec"
        INT closed_at "outcome != ongoing"
        TEXT started_at_tz "IANA tz · default UTC"
        TEXT started_at_precision "second|minute|day|month"
        INT ingested_at
        INT forgotten_at
        INT hard_delete_eligible_at "+30d cron picker"
        TEXT workspace_path
        TEXT git_branch
        TEXT git_sha
        TEXT user_tz
        TEXT locale "BCP-47 default en"
        TEXT participants "JSON array"
        INT utterance_count
        INT total_tokens_in
        INT total_tokens_out
        REAL total_cost_usd
        TEXT tools_used "JSON array"
        TEXT outcome "ongoing|completed|abandoned|errored"
        TEXT scope "private|team|public"
        INT is_forgotten
        TEXT metadata "JSON residual"
        INT created_at
        INT updated_at
    }
    USERS ||--o{ CONVERSATIONS : "owns"
    ORGS ||--o{ CONVERSATIONS : "team-scope"
    AGENTS ||--o{ CONVERSATIONS : "produced by"

UNIQUE constraint (SQL only): (user_id, source_kind, source_id) — idempotent re-ingest of LoCoMo by sample_id.
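The idempotent re-ingest that this UNIQUE constraint enables might look like the following. It is a minimal sketch against an in-memory SQLite database, not the project's loader: the table is trimmed to four columns, and the `ingest` helper and the sample IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down conversations table; only the columns needed for the constraint.
conn.execute("""
    CREATE TABLE conversations (
        conversation_id TEXT PRIMARY KEY,
        user_id         TEXT NOT NULL,
        source_kind     TEXT NOT NULL,
        source_id       TEXT NOT NULL,
        UNIQUE (user_id, source_kind, source_id)
    )
""")

def ingest(conversation_id, user_id, source_kind, source_id):
    """Hypothetical helper: insert once, then return whichever row owns the source triple."""
    with conn:
        conn.execute(
            "INSERT INTO conversations VALUES (?, ?, ?, ?) "
            "ON CONFLICT (user_id, source_kind, source_id) DO NOTHING",
            (conversation_id, user_id, source_kind, source_id),
        )
    row = conn.execute(
        "SELECT conversation_id FROM conversations "
        "WHERE user_id = ? AND source_kind = ? AND source_id = ?",
        (user_id, source_kind, source_id),
    ).fetchone()
    return row[0]

first = ingest("conv_0a", "usr_01", "locomo", "sample-1")
again = ingest("conv_0b", "usr_01", "locomo", "sample-1")  # same source, new candidate id
total = conn.execute("SELECT COUNT(*) FROM conversations").fetchone()[0]
```

Re-running the loader with the same `(user_id, source_kind, source_id)` leaves the original row in place, so downstream foreign keys keep pointing at a stable `conversation_id`.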

source_provenance · OSS-mining sidecar (1:1 with conversations)

erDiagram
    CONVERSATIONS {
        TEXT conversation_id PK
    }
    SOURCE_PROVENANCE {
        TEXT conversation_id PK "FK CASCADE · 1:1"
        TEXT source_repo "vllm-project/vllm"
        TEXT source_commit "sha"
        TEXT source_url
        TEXT source_author_name
        TEXT source_author_email
        INT source_author_date
        TEXT source_committer_name
        TEXT source_committer_email
        TEXT source_license "Apache-2.0|MIT|BSD-3|BSD-2|ISC|CC0-1.0|proprietary|unknown"
        TEXT original_rights_holder
        TEXT derivation_chain "JSON array"
        TEXT ingest_method "oss-mining-deterministic|oss-mining-llm|manual-import|github-sync"
        INT created_at
    }
    CONVERSATIONS ||--o| SOURCE_PROVENANCE : "OSS-sourced"

Captures OSS authorship + license + commit/repo + Git-style attribution. Per sop_oss_block_mining.md: accept Apache-2.0 / MIT / BSD-3 / BSD-2 / ISC / CC0; reject GPL / LGPL / AGPL.
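The accept/reject rule above could be expressed as a small gate. This is an illustrative sketch only: the function name and the three-way result are assumptions, and the source does not say how `proprietary` or `unknown` licenses are handled, so they are reported as `unlisted` rather than decided here.

```python
# Allow-list taken from the source_license enum and the SOP sentence above.
ACCEPTED = {"Apache-2.0", "MIT", "BSD-3", "BSD-2", "ISC", "CC0-1.0"}
# Copyleft families the SOP rejects; matched by prefix so versions are covered.
REJECTED_PREFIXES = ("GPL", "LGPL", "AGPL")

def license_decision(source_license: str) -> str:
    """Hypothetical helper: classify a license string as accept / reject / unlisted."""
    lic = source_license.strip()
    if lic in ACCEPTED:
        return "accept"
    if lic.upper().startswith(REJECTED_PREFIXES):
        return "reject"
    # Anything else (e.g. proprietary, unknown) is not covered by the stated rule.
    return "unlisted"

decisions = {lic: license_decision(lic) for lic in ("MIT", "LGPL-2.1", "AGPL-3.0", "unknown")}
```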

conversation_annotations · sidecar for eval ground-truth + document-kind taxonomy

erDiagram
    CONVERSATIONS {
        TEXT conversation_id PK
    }
    CONVERSATION_ANNOTATIONS {
        TEXT conversation_id PK "FK CASCADE"
        TEXT kind PK "eval: event_summary|observation|session_summary · doc: readme|adr|rfc|runbook|postmortem|design-doc|changelog-entry|commit-message|pr-description|issue|general"
        TEXT payload "JSON"
        INT created_at
    }
    CONVERSATIONS ||--o{ CONVERSATION_ANNOTATIONS : "annotated by"

utterances · turn-by-turn · sessions collapsed

erDiagram
    CONVERSATIONS {
        TEXT conversation_id PK
    }
    USERS {
        TEXT user_id PK
    }
    AGENTS {
        TEXT agent_id PK
    }
    UTTERANCES {
        TEXT utterance_id PK "GLOB utt_[0-9a-f]*"
        TEXT conversation_id FK "CASCADE"
        TEXT user_id FK "CASCADE · denormed"
        TEXT tier "prod|eval · denormed"
        INT turn_index "0-based across whole conv"
        INT session_index "0-based within conv"
        INT intra_session_index "0-based within session"
        TEXT role "user|assistant|system|tool|speaker_a|speaker_b"
        TEXT speaker_label "LoCoMo Caroline; live NULL"
        TEXT speaker_persona_id FK "SET NULL · CEO/CTO/engineer"
        TEXT author_name
        TEXT author_email "Git-style"
        TEXT signed_off_by "JSON array of {name,email}"
        TEXT reviewed_by "JSON array"
        TEXT co_authored_by "JSON array"
        TEXT text "plain · NOT JSON-wrapped"
        TEXT text_hash "sha256 64-char"
        TEXT modality "text|image|audio|code|mixed"
        TEXT text_format "plain|markdown|code|json"
        TEXT language "BCP-47"
        INT ts "REAL source wall-clock UTC sec"
        TEXT ts_precision "second|minute|day|month"
        INT session_started_at "LoCoMo session_N_date_time epoch"
        TEXT source_msg_id "LoCoMo dia_id D1:1"
        TEXT client_msg_id "agent uuid for retries"
        TEXT agent_id FK "SET NULL · per-turn override"
        TEXT model_id "per-turn override"
        TEXT tool_name
        TEXT tool_call_id
        TEXT parent_utterance_id FK "self · SET NULL"
        TEXT attachments "JSON array"
        INT tokens_in
        INT tokens_out
        REAL cost_usd
        TEXT embedding_id "Vectorize id · NULL until embedded"
        INT is_forgotten
        TEXT metadata "JSON residual"
        INT ingested_at
        INT created_at
        INT updated_at
    }
    CONVERSATIONS ||--o{ UTTERANCES : "contains"
    USERS ||--o{ UTTERANCES : "owns (denormed)"
    AGENTS ||--o{ UTTERANCES : "generated (assistant turns)"
    UTTERANCES ||--o{ UTTERANCES : "tool-result thread"

UNIQUE constraints (SQL only): (conversation_id, turn_index) · (conversation_id, source_msg_id). Companion FTS5 virtual: utterances_fts(text, speaker_label).

Tier 3 · Per-utterance extraction (3 tables) · utterance_classifications

erDiagram
    UTTERANCES {
        TEXT utterance_id PK
    }
    UTTERANCE_CLASSIFICATIONS {
        TEXT classification_id PK "GLOB cls_[0-9a-f]*"
        TEXT utterance_id FK "CASCADE"
        TEXT user_id FK "denormed"
        TEXT tier "denormed"
        TEXT conversation_id "denormed"
        INT ts "denormed · REAL source date"
        TEXT sentiment "positive|negative|neutral|mixed"
        REAL valence "-1..1"
        REAL arousal "0..1"
        TEXT dominant_emotion "joy|trust|fear|surprise|sadness|disgust|anger|anticipation"
        REAL emotion_intensity
        REAL toxicity_score "Perspective-API style 0..1"
        REAL severe_toxicity_score
        REAL insult_score
        REAL threat_score
        REAL identity_attack_score
        REAL profanity_score
        TEXT abuse_target "self|other-speaker|third-party|concept"
        TEXT speech_act "statement|question|command|promise|exclamation|greeting|closing"
        TEXT intent "request-help|share-info|give-command|apologize|praise|criticize"
        INT is_praise
        TEXT praise_target
        INT is_acknowledgment
        INT is_humor
        INT is_complaint
        INT is_correction
        INT is_disagreement
        INT is_sarcasm
        TEXT formality "formal|informal|mixed"
        REAL certainty
        INT hedging
        INT questions_asked
        INT imperatives
        INT exclamations
        TEXT classifier_model
        TEXT classifier_version
        TEXT extraction_run_id FK
        REAL confidence
        TEXT raw_scores "JSON dump"
    }
    UTTERANCES ||--o{ UTTERANCE_CLASSIFICATIONS : "classified"

One row per utterance per classifier_run. Powers "find toxic / sarcastic / praising / questioning / hedged" filters. UNIQUE on (utterance_id, classifier_model, classifier_version).

entity_mentions · sparse per-utterance NER

erDiagram
    UTTERANCES {
        TEXT utterance_id PK
    }
    CANONICAL_ENTITIES {
        TEXT entity_id PK
    }
    ENTITY_MENTIONS {
        TEXT mention_id PK "GLOB men_[0-9a-f]*"
        TEXT utterance_id FK "CASCADE"
        TEXT conversation_id "denormed"
        TEXT user_id "denormed · CASCADE"
        TEXT tier "denormed"
        INT ts "denormed"
        TEXT surface_text "as it appears"
        TEXT entity_type "Person|Place|Org|Event|Time|Product|Concept|File|Code|Other"
        INT char_start
        INT char_end
        TEXT canonical_entity_id FK "SET NULL · linking"
        REAL link_confidence
        TEXT extraction_model
        TEXT extraction_run_id FK
        REAL confidence
    }
    UTTERANCES ||--o{ ENTITY_MENTIONS : "mentions"
    CANONICAL_ENTITIES ||--o{ ENTITY_MENTIONS : "linked to"

0..N rows per utterance for each named entity surface form. Linked to canonical_entities for cross-conversation dedup. Char offsets preserved for highlighting.
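As an illustration of why the offsets are kept, the sketch below wraps each mention span in markers. It assumes `char_start` is inclusive and `char_end` is exclusive (the source does not state the convention), and the `highlight` helper, the bracket markers, and the sample sentence are all made up for the example.

```python
def highlight(text, spans, open_mark="[", close_mark="]"):
    """Wrap each (char_start, char_end) span of `text` in markers.

    Assumes half-open offsets; spans that overlap an earlier one are skipped.
    """
    out, cursor = [], 0
    for start, end in sorted(spans):
        if start < cursor:
            continue  # overlapping mention: keep the earlier span only
        out.append(text[cursor:start])
        out.append(open_mark + text[start:end] + close_mark)
        cursor = end
    out.append(text[cursor:])  # trailing text after the last mention
    return "".join(out)

# Spans may arrive in any order; they are sorted before rendering.
marked = highlight("Caroline moved to Seattle.", [(18, 25), (0, 8)])
```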

granite_facts · per-utterance triple extraction · memory-framed

erDiagram
    UTTERANCES {
        TEXT utterance_id PK
    }
    CONVERSATIONS {
        TEXT conversation_id PK
    }
    USERS {
        TEXT user_id PK
    }
    GRANITE_FACTS {
        TEXT fact_id PK "GLOB gft_[0-9a-f]*"
        TEXT source_utterance_id FK "CASCADE"
        TEXT conversation_id FK "CASCADE · denormed"
        TEXT user_id FK "CASCADE · denormed"
        TEXT tier "prod|eval · denormed"
        TEXT subject
        TEXT predicate
        TEXT object
        TEXT raw_text "excerpt that produced fact"
        TEXT subject_type "Person|Org|Place|Time|Event|Concept|Product|File|Code|Other"
        TEXT object_type
        TEXT speech_act "assertive|directive|commissive|expressive|declarative|interrogative"
        TEXT discourse_role "decision|hypothesis|correction|tangent|wrap-up"
        TEXT topical_domain "code|design|finance|legal|personal|health|relationship"
        TEXT privacy_class "pii|payment|health|credentials|safe"
        TEXT behavioral_signal "preference|intent|constraint|value|belief"
        TEXT memory_kind "user|feedback|project|reference|sop"
        TEXT why_text "past incident or motivation"
        TEXT how_to_apply_text "when this kicks in"
        INT is_stale
        TEXT stale_reason
        INT last_verified_at
        INT ts "denormed from utterance ts · REAL source date"
        TEXT ts_precision
        TEXT extraction_model "granite-4.0-h-micro"
        TEXT extraction_schema_version "triple-v1"
        TEXT extraction_run_id
        REAL confidence "0..1"
        TEXT embedding_id
        INT is_superseded
        TEXT superseded_by FK "self · SET NULL"
        INT is_forgotten
        TEXT metadata "JSON residual"
        INT created_at
        INT updated_at
    }
    UTTERANCES ||--o{ GRANITE_FACTS : "extracted from"
    CONVERSATIONS ||--o{ GRANITE_FACTS : "scoped to (denormed)"
    USERS ||--o{ GRANITE_FACTS : "owns (denormed)"
    GRANITE_FACTS ||--o{ GRANITE_FACTS : "supersedes"

Companion FTS5: granite_facts_fts(subject, predicate, object, raw_text). Vectorize embedding ID per fact.

Tier 4 · Per-conversation derived (2 tables) · kimi_facts · 8 fact_kinds + event-time anchors · memory-framed

erDiagram
    CONVERSATIONS {
        TEXT conversation_id PK
    }
    USERS {
        TEXT user_id PK
    }
    KIMI_FACTS {
        TEXT fact_id PK "GLOB kft_[0-9a-f]*"
        TEXT source_conversation_id FK "CASCADE"
        TEXT user_id FK "CASCADE · denormed"
        TEXT tier "prod|eval · denormed"
        TEXT evidence_utterance_ids "JSON array of utt_[hex]"
        TEXT fact_kind "entity|relationship|event|preference|belief|goal|trait|summary"
        TEXT subject "core entity / actor"
        TEXT predicate "nullable for entity/summary"
        TEXT object "nullable"
        TEXT raw_text "evidence excerpt(s)"
        INT ts_start "EVENT-time start (when it happened)"
        INT ts_end "EVENT-time end · for ranged events"
        TEXT ts_precision "second|minute|day|month|year"
        TEXT subject_type
        TEXT object_type
        TEXT topical_domain
        TEXT privacy_class
        TEXT behavioral_signal
        TEXT extraction_model "kimi-k2.5"
        TEXT extraction_schema_version "kimi-v1"
        TEXT extraction_run_id
        REAL confidence
        TEXT reasoning_chain "K2.5 chain-of-thought · optional"
        TEXT memory_kind "user|feedback|project|reference|sop"
        TEXT why_text
        TEXT how_to_apply_text
        INT is_stale
        TEXT stale_reason
        INT last_verified_at
        TEXT embedding_id
        INT is_superseded
        TEXT superseded_by FK "self · SET NULL"
        INT is_forgotten
        TEXT metadata "JSON residual"
        INT created_at
        INT updated_at
    }
    CONVERSATIONS ||--o{ KIMI_FACTS : "extracted from (per-conv)"
    USERS ||--o{ KIMI_FACTS : "owns (denormed)"
    KIMI_FACTS ||--o{ KIMI_FACTS : "supersedes"

Key difference from granite_facts: ts_start/ts_end record EVENT time, not utterance-mention time, so the cat-2 (temporal) fix surface lives here. kimi_facts carries the same memory framing (memory_kind / why_text / how_to_apply_text) as granite_facts, which keeps the two fact tables isomorphic for downstream "give me memories" retrieval. Extraction runs from async cron only, because a K2.5 subrequest can exceed the 3-min Workers AI cap.
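To make the event-time distinction concrete, here is a hedged sketch of a window query over ts_start/ts_end. The table is cut down to a few columns, the two rows are invented sample data, and `facts_in_window` is a hypothetical helper; treating a NULL ts_end as a point event is also an assumption.

```python
import sqlite3
from datetime import datetime, timezone

def epoch(year, month, day):
    """UTC midnight as integer seconds, matching the INT ts columns."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE kimi_facts (
        fact_id  TEXT PRIMARY KEY,
        subject  TEXT, predicate TEXT, object TEXT,
        ts_start INTEGER,   -- when the event happened
        ts_end   INTEGER    -- NULL for point events (assumption)
    )
""")
conn.executemany(
    "INSERT INTO kimi_facts VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Could be mentioned in a later conversation; stored at event time.
        ("kft_01", "alice", "attended", "conference", epoch(2023, 5, 20), epoch(2023, 5, 21)),
        ("kft_02", "alice", "started", "new job", epoch(2023, 6, 5), None),
    ],
)

def facts_in_window(lo, hi):
    """Facts whose event interval overlaps the half-open window [lo, hi)."""
    rows = conn.execute(
        "SELECT fact_id FROM kimi_facts "
        "WHERE ts_start < ? AND COALESCE(ts_end, ts_start) >= ? "
        "ORDER BY fact_id",
        (hi, lo),
    )
    return [r[0] for r in rows]

may_facts = facts_in_window(epoch(2023, 5, 1), epoch(2023, 6, 1))
june_facts = facts_in_window(epoch(2023, 6, 1), epoch(2023, 7, 1))
```

Because the filter never touches the utterance timestamp, a question about May is answered from the May event even if the user only talked about it weeks later.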

topic_segments · per-conversation topic boundaries

erDiagram
    CONVERSATIONS {
        TEXT conversation_id PK
    }
    UTTERANCES {
        TEXT utterance_id PK
    }
    TOPIC_SEGMENTS {
        TEXT segment_id PK "GLOB seg_[0-9a-f]*"
        TEXT conversation_id FK "CASCADE"
        TEXT user_id "denormed · CASCADE"
        TEXT tier "denormed"
        TEXT start_utterance_id FK
        TEXT end_utterance_id FK
        INT start_ts "denormed"
        INT end_ts "denormed"
        INT utterance_count
        TEXT topic_label
        TEXT topic_keywords "JSON array"
        TEXT topical_domain "code|design|finance|legal|personal|health|relationship|strategy|ops|education|research|other"
        TEXT segmentation_method "kimi-llm|embedding-similarity|lexical-cohesion"
        REAL confidence
        TEXT embedding_id
    }
    CONVERSATIONS ||--o{ TOPIC_SEGMENTS : "segmented into"
    UTTERANCES ||--o{ TOPIC_SEGMENTS : "boundary"

Tier 5 · Cross-conversation derived (2 tables) · canonical_entities

erDiagram
    USERS {
        TEXT user_id PK
    }
    CANONICAL_ENTITIES {
        TEXT entity_id PK "GLOB ent_[0-9a-f]*"
        TEXT user_id FK "CASCADE"
        TEXT tier
        TEXT entity_type "Person|Place|Org|Event|Time|Product|Concept|File|Code|Other"
        TEXT canonical_name
        TEXT description
        TEXT aliases "JSON array"
        INT mention_count "denorm aggregate"
        INT first_mentioned_at
        INT last_mentioned_at
        TEXT first_mention_id
        TEXT external_refs "JSON · {wikidata, github_url, address}"
        TEXT embedding_id
        INT is_stale
        TEXT stale_reason
        INT last_verified_at
        INT is_forgotten
    }
    USERS ||--o{ CANONICAL_ENTITIES : "owns"

Deduplicated entities per user. UNIQUE on (user_id, entity_type, canonical_name). Companion FTS5 over canonical_name + aliases + description.
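One way the UNIQUE key could drive dedup while maintaining the mention_count aggregate is an upsert, sketched below. The `record_mention` helper, the reduced column set, and the sample values are assumptions for illustration; the real linker may well work differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE canonical_entities (
        entity_id         TEXT PRIMARY KEY,
        user_id           TEXT NOT NULL,
        entity_type       TEXT NOT NULL,
        canonical_name    TEXT NOT NULL,
        mention_count     INTEGER NOT NULL DEFAULT 0,
        last_mentioned_at INTEGER,
        UNIQUE (user_id, entity_type, canonical_name)
    )
""")

def record_mention(entity_id, user_id, entity_type, name, ts):
    """Hypothetical helper: create the entity on first sight, else bump its aggregates."""
    with conn:
        conn.execute(
            """
            INSERT INTO canonical_entities
                (entity_id, user_id, entity_type, canonical_name,
                 mention_count, last_mentioned_at)
            VALUES (?, ?, ?, ?, 1, ?)
            ON CONFLICT (user_id, entity_type, canonical_name) DO UPDATE SET
                mention_count     = mention_count + 1,
                last_mentioned_at = MAX(last_mentioned_at, excluded.last_mentioned_at)
            """,
            (entity_id, user_id, entity_type, name, ts),
        )

record_mention("ent_0a", "usr_01", "Place", "Seattle", 1700000300)
record_mention("ent_0b", "usr_01", "Place", "Seattle", 1700000100)  # same entity, older mention
rows = conn.execute(
    "SELECT entity_id, mention_count, last_mentioned_at FROM canonical_entities"
).fetchall()
```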

patterns · recurring temporal/thematic/linguistic/behavioral/instinct patterns

erDiagram
    USERS {
        TEXT user_id PK
    }
    PATTERNS {
        TEXT pattern_id PK "GLOB pat_[0-9a-f]*"
        TEXT user_id FK "CASCADE"
        TEXT tier
        TEXT pattern_kind "temporal|thematic|linguistic|behavioral|co-occurrence|instinct"
        TEXT pattern_signature "compact dedup key"
        TEXT description "human-readable"
        TEXT trigger_text "for instinct kind"
        TEXT action_text "for instinct kind"
        TEXT category "preferences|deployment|tools|communication|debugging|workflow"
        TEXT cadence "daily|weekly|monthly|irregular"
        INT first_observed_at
        INT last_observed_at
        INT next_predicted_at
        INT occurrence_count
        TEXT evidence_kimi_fact_ids "JSON"
        TEXT evidence_utterance_ids "JSON"
        TEXT evidence_classification_ids "JSON"
        TEXT memory_kind "user|feedback|project|reference|sop"
        TEXT why_text
        TEXT how_to_apply_text
        REAL confidence
        TEXT detection_model
        TEXT detection_run_id FK
        TEXT embedding_id
        INT is_active
        INT is_stale
        INT is_forgotten
    }
    USERS ||--o{ PATTERNS : "owns"
    PATTERNS ||--o{ PATTERNS : "supersedes"

Absorbs instincts.md (trigger→action→confidence) via pattern_kind='instinct'. CHECK constraint: instinct rows MUST have trigger_text + action_text. UNIQUE (user_id, pattern_signature).
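The instinct CHECK might be written as below. This is a sketch, not the generated DDL: the exact CHECK expression is an assumption, the table is reduced to the relevant columns, and the `try_insert` helper and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE patterns (
        pattern_id        TEXT PRIMARY KEY,
        user_id           TEXT NOT NULL,
        pattern_kind      TEXT NOT NULL,
        pattern_signature TEXT NOT NULL,
        trigger_text      TEXT,
        action_text       TEXT,
        UNIQUE (user_id, pattern_signature),
        -- instinct rows must carry both halves of trigger -> action
        CHECK (pattern_kind <> 'instinct'
               OR (trigger_text IS NOT NULL AND action_text IS NOT NULL))
    )
""")

def try_insert(row):
    """Return True if the row satisfies every constraint, False otherwise."""
    try:
        with conn:
            conn.execute("INSERT INTO patterns VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_temporal = try_insert(("pat_01", "usr_01", "temporal", "sig-a", None, None))
bad_instinct = try_insert(("pat_02", "usr_01", "instinct", "sig-b", "deploy fails", None))
ok_instinct = try_insert(("pat_03", "usr_01", "instinct", "sig-c", "deploy fails", "check logs first"))
dup_signature = try_insert(("pat_04", "usr_01", "thematic", "sig-a", None, None))
```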

Tier 6 · Orchestration (1 table) · extraction_jobs

erDiagram
    USERS {
        TEXT user_id PK
    }
    EXTRACTION_JOBS {
        TEXT job_id PK "GLOB job_[0-9a-f]*"
        TEXT target_kind "utterance|conversation|user|cross-conv"
        TEXT target_id "utt_/conv_/usr_"
        TEXT user_id FK "CASCADE · denormed"
        TEXT tier "prod|eval"
        TEXT conversation_id "denormed"
        TEXT job_kind "granite-triple|kimi-reason|classify|ner|topic-segment|pattern-scan|embed"
        TEXT extraction_model
        TEXT extraction_schema_version
        TEXT config "JSON · prompt variant, temp"
        TEXT status "pending|running|done|failed|cancelled|superseded"
        INT enqueued_at
        INT started_at
        INT finished_at
        INT attempts
        INT max_attempts
        TEXT last_error
        INT chunks_total "for chunked K2.5 jobs"
        INT chunks_done
        INT chunk_size
        INT facts_emitted
        INT tokens_in
        INT tokens_out
        REAL cost_usd
        TEXT dedupe_key "hash · prevents duplicate enqueue"
        TEXT supersedes "prior job_id this replaces"
    }
    USERS ||--o{ EXTRACTION_JOBS : "owns"
    EXTRACTION_JOBS ||--o{ EXTRACTION_JOBS : "supersedes"

Single orchestration table for ALL extraction surfaces (granite per-utt, kimi per-conv, classify, ner, topic-segment, pattern-scan, embed). Per ADR-033 generalized: cron drainer reads pending jobs, processes chunks, persists progress between cron ticks. UNIQUE (dedupe_key, status IN pending/running) prevents duplicate work.
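The dedupe rule and the chunked drain can be sketched together. The code below reads "UNIQUE (dedupe_key, status IN pending/running)" as a SQLite partial unique index, which is an interpretation rather than something the source spells out; the index name, the `enqueue` and `drain_tick` helpers, and the per-tick budget are likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE extraction_jobs (
        job_id       TEXT PRIMARY KEY,
        job_kind     TEXT NOT NULL,
        status       TEXT NOT NULL DEFAULT 'pending',
        dedupe_key   TEXT NOT NULL,
        chunks_total INTEGER NOT NULL,
        chunks_done  INTEGER NOT NULL DEFAULT 0
    )
""")
# At most one live (pending or running) job per dedupe_key; finished jobs are exempt.
conn.execute("""
    CREATE UNIQUE INDEX one_live_job_per_key
    ON extraction_jobs (dedupe_key)
    WHERE status IN ('pending', 'running')
""")

def enqueue(job_id, job_kind, dedupe_key, chunks_total):
    """Return False when an equivalent job is already pending or running."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO extraction_jobs (job_id, job_kind, dedupe_key, chunks_total) "
                "VALUES (?, ?, ?, ?)",
                (job_id, job_kind, dedupe_key, chunks_total),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def drain_tick(budget):
    """One cron tick: advance the oldest live job by up to `budget` chunks."""
    row = conn.execute(
        "SELECT job_id, chunks_total, chunks_done FROM extraction_jobs "
        "WHERE status IN ('pending', 'running') ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, total, done = row
    done = min(total, done + budget)          # real chunk processing would go here
    status = "done" if done == total else "running"
    with conn:                                # persist progress for the next tick
        conn.execute(
            "UPDATE extraction_jobs SET chunks_done = ?, status = ? WHERE job_id = ?",
            (done, status, job_id),
        )
    return status

queued = enqueue("job_01", "kimi-reason", "key-1", 5)
duplicate = enqueue("job_02", "kimi-reason", "key-1", 5)
tick_1 = drain_tick(3)
tick_2 = drain_tick(3)
requeued = enqueue("job_03", "kimi-reason", "key-1", 5)  # allowed once job_01 is done
```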

Specific structural fixes folded into the codegen (record, not credit).

GLOB pattern bug. SQLite's 'usr_[0-9a-f]*'-style CHECKs were decorative — * is wildcard, not "repeat previous class." Replaced all 14 ID-prefix CHECKs (and agent_kind) with length() = N AND substr(...) NOT GLOB '*[^0-9a-f]*'. Bad IDs now actually rejected at write.
Tenant/tier integrity via composite FKs. Parent tables (conversations, utterances) expose a composite UNIQUE on (their own ID column, user_id, tier); every extraction child references the composite key. Cross-user / cross-tier writes are now impossible at the FK layer (no triggers needed).
UNIQUE missing tier — fixed. conversations(user_id, tier, source_kind, source_id), canonical_entities(user_id, tier, entity_type, canonical_name), patterns(user_id, tier, pattern_signature) — prod/eval can no longer collide.
conversation_annotations PK was collapsing LoCoMo session data. Redesigned with annotation_id PK + (conversation_id, kind, session_index, ordinal) UNIQUE. Multiple observation / session_summary rows per conversation now coexist properly.
FTS5 + Vectorize GDPR shadow data. External-content FTS5 with AFTER INSERT/DELETE/UPDATE triggers; new embeddings_pending_delete table + per-source DELETE triggers. is_forgotten=1 now actually propagates to all search side-channels.
JSON evidence arrays violated provenance. Split patterns.evidence_*_ids and kimi_facts.evidence_utterance_ids into pattern_evidence and kimi_fact_evidence join tables with real FK CASCADE.
Lifecycle nullability tightened. is_superseded=0 OR superseded_by IS NOT NULL; is_stale=0 OR length(trim(stale_reason))>0; is_forgotten=0 OR forgotten_at IS NOT NULL. superseded_by ON DELETE switched to RESTRICT + no self-supersession.
Patterns lifecycle contradictions. Dropped is_active; live = is_forgotten=0 AND is_superseded=0 AND is_stale=0.
JSON-as-column splits. users.external_accounts → user_external_accounts. conversations.participants → conversation_participants. utterances.{signed_off_by, reviewed_by, co_authored_by} → utterance_attributions.
Eval data first-class. New eval_qa_pairs table for LoCoMo's 1,986 QA pairs (was being shoved into conversation_annotations.payload).
Money in integer micros. cost_usd REAL → cost_micros INTEGER (1 USD = 1,000,000) for stable accounting.
JSON validation. CHECK (col IS NULL OR json_valid(col)) on every JSON-bearing column.
Auto-updated timestamps. AFTER UPDATE triggers on every table with updated_at — no app reliance.
extraction_run_id immutability. ON DELETE switched to RESTRICT. Jobs outlive their facts (provenance preservation).
ts_precision design vs DDL mismatch fixed. Added 'hour' and 'year' to the enum.
utterances.text NOT NULL relaxed for image/audio/code/mixed modalities (text-only turns still require non-empty).
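The GLOB item in the list above is easy to reproduce. The sketch below compares the old pattern with a stricter check in the spirit of the fix; the 8-character hex length, the explicit prefix comparison, and both helper names are assumptions, since the source gives only the general shape `length() = N AND substr(...) NOT GLOB '*[^0-9a-f]*'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def old_check(value):
    """The original CHECK: '*' matches anything after one hex character."""
    return bool(conn.execute("SELECT ? GLOB 'usr_[0-9a-f]*'", (value,)).fetchone()[0])

def strict_check(value, hex_len=8):
    """Fixed length, fixed prefix, and no non-hex character in the suffix.

    hex_len is a placeholder; the real N is not given in the source.
    """
    row = conn.execute(
        "SELECT length(?) = ? "
        "AND substr(?, 1, 4) = 'usr_' "
        "AND substr(?, 5) NOT GLOB '*[^0-9a-f]*'",
        (value, 4 + hex_len, value, value),
    ).fetchone()
    return bool(row[0])

bad_id = "usr_0zzzzzzz"    # only the first suffix character is hex
good_id = "usr_0123abcd"

old_accepts_bad = old_check(bad_id)
strict_accepts_bad = strict_check(bad_id)
strict_accepts_good = strict_check(good_id)
```

The old pattern only constrains the first character after the prefix, which is why malformed IDs slipped through; the negated-class form rejects any suffix containing a character outside `[0-9a-f]`.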

26 base tables · 6 FTS5 virtuals · 6 + 1 tiers · trigger-driven cascades. Out of scope here: the deployed prod schema (41 tables, 890k blocks) which still uses the monolithic blocks shape. Live prod ERD: docs/architecture/erd-current-20260503.md. Full DDL: migrations/000_dev_db.sql.
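For the composite-FK tenant/tier fix listed among the structural fixes above, a minimal sketch follows. Only the key columns are modelled, the sample IDs are invented, and `add_utterance` is a hypothetical helper; it is meant to show the mechanism, not the generated DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("""
    CREATE TABLE conversations (
        conversation_id TEXT PRIMARY KEY,
        user_id         TEXT NOT NULL,
        tier            TEXT NOT NULL,
        UNIQUE (conversation_id, user_id, tier)   -- target for composite FKs
    )
""")
conn.execute("""
    CREATE TABLE utterances (
        utterance_id    TEXT PRIMARY KEY,
        conversation_id TEXT NOT NULL,
        user_id         TEXT NOT NULL,
        tier            TEXT NOT NULL,
        FOREIGN KEY (conversation_id, user_id, tier)
            REFERENCES conversations (conversation_id, user_id, tier)
            ON DELETE CASCADE
    )
""")
with conn:
    conn.execute("INSERT INTO conversations VALUES ('conv_01', 'usr_a', 'prod')")

def add_utterance(utterance_id, conversation_id, user_id, tier):
    """Return False when the denormed user_id/tier disagree with the parent row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO utterances VALUES (?, ?, ?, ?)",
                (utterance_id, conversation_id, user_id, tier),
            )
        return True
    except sqlite3.IntegrityError:
        return False

same_tenant = add_utterance("utt_01", "conv_01", "usr_a", "prod")
cross_user = add_utterance("utt_02", "conv_01", "usr_b", "prod")
cross_tier = add_utterance("utt_03", "conv_01", "usr_a", "eval")
```

Because the child row's denormed user_id and tier are part of the foreign key, a mismatch with the parent fails like any other dangling reference, with no trigger involved.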

Category	iter-16	today (n≈96)	Δ	What changed
cat-1 · multi-hop list	0.105	0.282	+0.177	Refined ANSWER-TYPE & SCOPE prompt; `_who_relation_subqueries` helper; multi-hop sub-query expansion.
cat-2 · temporal	0.054	0.413	+0.359	TIME RESOLUTION Step 5: forbid ISO; match evidence specificity (date-format prompt fix).
cat-3 · open-domain	0.000	0.050	+0.050	H2 inferential force-answer carve-out for `does/do/would/is-likely` patterns (uncommitted).
cat-4 · single-hop	0.269	0.447	+0.178	Harness `--retrieval-user-id` auto-detect (was filtering to non-existent tenant); refusal phrase fix.
cat-5 · adversarial / refusal	0.913	0.826	−0.087	Canonical refusal phrase `"No information available."`; confidence floor 0.4 → 0.55; H3 entity-overlap hallucination guard (uncommitted, untested at n=100).

What is running, what is wired, what is planned.

10 verbs · ingest substrate · cap-tokens · outcome trace.

10 load-bearing tables · 45 deployed · 26 planned · 13 block types.

Integrated tier overview · all 65 tables · color = status

Current · Substrate tier (12 tables · ~956k rows)

Current · Economic tier (11 tables · 2,420 rows · 10 of 11 are 0)

Current · Cognition+Ops · brain primitives (14 tables · 8,538 rows · 11 of 14 are 0)

Current · Cognition+Ops · ops (6 tables · all 0 rows pre-Phase-4)

Skeleton · 6 tiers · 24 tables (post-Codex)

Tier 1 · Tenancy + Identity (4 tables)

Tier 2 · Substrate (4 tables) · conversations

source_provenance · OSS-mining sidecar (1:1 with conversations)

conversation_annotations · sidecar for eval ground-truth + document-kind taxonomy

utterances · turn-by-turn · sessions collapsed

Tier 3 · Per-utterance extraction (3 tables) · utterance_classifications

entity_mentions · sparse per-utterance NER

granite_facts · per-utterance triple extraction · memory-framed

Tier 4 · Per-conversation derived (2 tables) · kimi_facts · 8 fact_kinds + event-time anchors · memory-framed

topic_segments · per-conversation topic boundaries

Tier 5 · Cross-conversation derived (2 tables) · canonical_entities

patterns · recurring temporal/thematic/linguistic/behavioral/instinct patterns

Tier 6 · Orchestration (1 table) · extraction_jobs

The five-way fan-out behind `/v1/query`.

Today's F1 (LoCoMo10 · 2026-05-04)

The same primitive handles private memory and a paid market.

The five-way fan-out behind `/v1/query`.