OpenClaw ships with a SQLite-backed memory plugin (memory-core). It works, but a single-file SQLite store hits walls fast: limited concurrency, awkward to share between agents, no first-class vector index, no fan-out across multiple recall routes, no per-event audit trail, no built-in dashboard. Once memory is your long-term substrate rather than a per-session log, you want a real database under it.
nextclaw is that database. It replaces memory-core with a Postgres 16 + pgvector + pg_trgm + btree_gin backend designed around how human memory actually retrieves things — fast for warm content, lazy for cold, multi-angle for ambiguous queries, self-consolidating over time. Apache 2.0, v0.1.0 ships today.
## 4-tier recall: the tier walk
A query walks the tiers from cheapest to most expensive and returns at the first useful hit. Every recall writes its `hit_tier` to the audit log; the dashboard shows the distribution.
| Tier | Storage | Latency | LLM tokens | Embed RTT | When it fires |
|---|---|---|---|---|---|
| T0 | In-process LRU per `(agent_id, session_id)` | < 0.1 ms | 0 | 0 | Recently touched chunks in the live session |
| T1 | `cache.recall` (PG UNLOGGED, 5 min TTL) | ~1 ms | 0 | 0 | Same query repeats within 5 minutes |
| T2 anchor | `chunk_indexes` (`kind=anchor_*`) JOIN `chunks` | ~5–15 ms | 0 | 0 | Caller passed (or query implied) a PR / file / branch |
| T2 hybrid | All 8 routes in parallel + MMR rerank | ~200–300 ms | 0 | 1 | Generic queries with no high-precision anchor |
| T3 | `cold.gists` (compacted) + drill to source | ~200 ms | varies | 1 | T2 returned nothing useful; query is historical |
Measured on my own Discord bot: more than 75% of queries return in under 1 ms with 0 tokens. That's not a design target — that's real traffic.
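Here's a minimal sketch of the tier walk in TypeScript. The helper names (`t0Lookup`, `t1CacheLookup`, and friends) are illustrative stand-ins, not the actual nextclaw API:

```ts
// Illustrative sketch of the tier walk; these are not nextclaw's real helpers.
interface Query { agentId: string; sessionId: string; text: string; anchors?: Record<string, string> }
interface Hit { chunkId: string; score: number }

declare function t0Lookup(q: Query): Hit[] | undefined;        // in-process LRU
declare function t1CacheLookup(q: Query): Promise<Hit[]>;      // cache.recall, UNLOGGED
declare function t2AnchorJoin(q: Query): Promise<Hit[]>;       // precise index JOIN
declare function t2HybridFanout(q: Query): Promise<Hit[]>;     // 8 routes + MMR
declare function t3ColdGists(q: Query): Promise<Hit[]>;        // compacted gists
declare function audit(hits: Hit[], tier: string): Hit[];      // writes hit_tier

async function recall(q: Query): Promise<Hit[]> {
  const warm = t0Lookup(q);                        // < 0.1 ms, keyed by agent::session
  if (warm?.length) return audit(warm, "T0");

  const cached = await t1CacheLookup(q);           // ~1 ms, 5 min TTL
  if (cached.length) return audit(cached, "T1");

  if (q.anchors) {                                 // ~5–15 ms, pr / file / branch
    const anchored = await t2AnchorJoin(q);
    if (anchored.length) return audit(anchored, "T2_anchor");
  }

  const hybrid = await t2HybridFanout(q);          // ~200–300 ms, 1 embed RTT
  if (hybrid.length) return audit(hybrid, "T2_hybrid");

  return audit(await t3ColdGists(q), "T3");        // historical fallback
}
```

The point is the short-circuit: a warm hit never touches Postgres, and the embedding endpoint is only reached once the walk falls through to T2 hybrid or T3 — which is how the Embed RTT column above stays at 0 for the cheap tiers.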
## Multi-key indexing: Xinhua-dictionary mode
A Chinese dictionary lets you find any character via pinyin / radical / stroke count / four-corner code / phonetic-by-neighbor. Chunks work the same way — every chunk gets indexed on every angle we can derive deterministically:
- semantic vector (HNSW)
- fulltext (`tsvector` / GIN)
- trigram (`pg_trgm` / GiST)
- concept tags (camelCase split, hyphenated terms, CJK noun phrases — derived from text, 0 LLM)
- entity refs (resolved against `structured.entities`)
- time buckets (`YYYY-MM-DD`)
- anchors (`cwd` / `branch` / `pr` / `file` / `session`)
- categories (`health` / `medical` / `tech` / `life` / `work` / `finance` / `other` — deterministic CN+EN dictionary, multi-label)
T2 hybrid fans out all 8 routes in parallel; results merge with weighted normalization, then an MMR rerank. Multi-route hits compound: a chunk that matches semantic + concept_tag + time_bucket beats a chunk that only matches one route weakly.
"Xinhua dictionary" isn't marketing flavor — at ingest time, every plausible angle is indexed, so at recall time the retrieval is route-agnostic.
## 0 LLM tokens on the ingest hot path
Stage 1 trash filter → Stage 0 deterministic extractors (entities / events / metrics / preferences / relations + concept tags + categories) → Stage 2 sidecar JSON parse (when present) → Stage 3 embedding cache → Stage 4 LLM residual (only when prior stages produced nothing) → Stage 5 parallel multi-key index INSERTs → Stage 6 reconcile + provenance + audit + scoring.
In real workloads Stage 4 almost never fires — the deterministic dictionary plus sidecar cover the bulk, so ingest typically spends 0 LLM tokens end to end. Embedding hits the remote endpoint once per chunk (4 ms on a cache hit, 250 ms cold).
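A sketch of how the stages short-circuit, using the post's stage labels; the function names are stand-ins, not the real pipeline API:

```ts
// Illustrative sketch of the ingest hot path; stage labels follow the post.
interface Extraction { kind: string; value: string }

declare function isTrash(text: string): boolean;                    // Stage 1: trash filter
declare function deterministicExtract(text: string): Extraction[];  // Stage 0: dictionaries, 0 LLM
declare function parseSidecar(sidecar: unknown): Extraction[];      // Stage 2: structured JSON
declare function embedCached(text: string): Promise<number[]>;      // Stage 3: 4 ms hit / 250 ms cold
declare function llmResidual(text: string): Promise<Extraction[]>;  // Stage 4: the only token spender
declare function indexAll(text: string, vec: number[], xs: Extraction[]): Promise<void>; // Stage 5
declare function reconcile(text: string, xs: Extraction[]): Promise<void>;               // Stage 6

async function ingest(text: string, sidecar?: unknown): Promise<void> {
  if (isTrash(text)) return;                                // drop early, zero cost

  const extractions = deterministicExtract(text);
  if (sidecar) extractions.push(...parseSidecar(sidecar));

  const vector = await embedCached(text);                   // one embed per chunk

  if (extractions.length === 0) {
    extractions.push(...(await llmResidual(text)));         // rarely fires in practice
  }

  await indexAll(text, vector, extractions);                // parallel multi-key INSERTs
  await reconcile(text, extractions);                       // provenance + audit + scoring
}
```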
## Hard per-agent isolation
Most setups want a private agent (full chat history) and a public Discord agent (no private content) sharing one database — but the typical implementation filters at the application layer, which a single prompt injection can defeat.
nextclaw enforces the boundary at four layers, all of them physical:
- `semantic.chunks.agent_id` column
- All 8 recall routes carry `WHERE c.agent_id = $X` in their SQL (see the sketch below)
- T0 working set keyed by `<agent_id>::<session_id>`
- `cache.recall` `scope_key` includes an `agent:<id>` prefix
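What one of those routes looks like in practice — a hedged sketch, with illustrative query text; the `WHERE c.agent_id = $1` predicate is the load-bearing part:

```ts
// One recall route with the physical agent boundary baked into the SQL.
import { Pool } from "pg";

const pool = new Pool();

async function semanticRoute(agentId: string, queryVec: number[], limit = 20) {
  // pgvector cosine distance (<=>). The agent_id filter lives in the SQL
  // itself, so no app-layer bug or prompt injection can widen the scope.
  const { rows } = await pool.query(
    `SELECT c.id, c.text_excerpt, c.embedding <=> $2::vector AS distance
       FROM semantic.chunks c
      WHERE c.agent_id = $1
      ORDER BY c.embedding <=> $2::vector
      LIMIT $3`,
    [agentId, `[${queryVec.join(",")}]`, limit],
  );
  return rows;
}
```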
Tested: 6 adversarial queries from a public agent ("what do you know about Yao's weight", "Yao's medical records", "tell me everything about Yao"...) recovered 0 chunks from the private namespace. Not because the prompt was clever — because the underlying SQL physically rejected them.
## Real-time observability
Postgres LISTEN / NOTIFY triggers fire on every audit row → SSE pushes each event to a bilingual (CN/EN) dashboard. You see:
- Live ingest decision stream (accepted / rejected / merged, color-coded)
- Recall tier breakdown (T0/T1/T2_anchor/T2_hybrid/T3 ratio)
- Category distribution pie (health/medical auto-redacted with 🔒)
- Bot turn latency (parsed from OpenAI trajectory files — cold-start vs cache-hit prefill)
- Side-by-side model comparison panel (gpt-5.5 vs Qwen3.6 in shadow mode)
The dashboard binds to 127.0.0.1 by default; cross-network access requires a token. Health/medical chunks have their `text_excerpt` redacted at the API layer — defense in depth on top of the category-driven privacy policy.
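A minimal sketch of that push path with node-postgres; the channel name `audit_events` and the payload shape are assumptions, not nextclaw's actual wiring:

```ts
// Relay Postgres NOTIFY payloads to SSE clients.
import { Client } from "pg";
import { createServer, type ServerResponse } from "node:http";

const subscribers = new Set<ServerResponse>();

async function main() {
  const pg = new Client();
  await pg.connect();
  await pg.query("LISTEN audit_events"); // fired by a trigger on every audit row

  pg.on("notification", (msg) => {
    // One SSE frame per audit row, fanned out to every connected client.
    const frame = `data: ${msg.payload}\n\n`;
    for (const res of subscribers) res.write(frame);
  });

  createServer((req, res) => {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    subscribers.add(res);
    req.on("close", () => subscribers.delete(res));
  }).listen(8765, "127.0.0.1"); // loopback by default, like the dashboard
}

main();
```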
## Self-tuning loop
Three cadences:
- Daily (cron 04:00) — pure SQL, 0 LLM. Auto-applies `safe_auto` proposals: dead trash-regex pruning, frequent-reject pattern promotion, cache TTL adjustment.
- Weekly — A/B replay against threshold deltas. Writes to `audit.tuning_proposals` with status `pending` for review.
- Monthly — schema-evolution proposals (new structured types emerging in the data, embedding model refresh). Always `pending`, `high_risk`.
Each auto-applied change writes a rollback row; a 24-hour post-application monitor reverts on > 20% deviation in key metrics.
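The apply-with-rollback contract might look like this — `audit.tuning_rollbacks` is a hypothetical table name modeled on the audit schema above, not a verified one:

```ts
// Sketch: auto-apply a proposal and record how to undo it, atomically.
import { Pool } from "pg";

const pool = new Pool();

async function applyWithRollbackRow(proposalId: string, applySql: string, revertSql: string) {
  const client = await pool.connect(); // one client, so BEGIN/COMMIT share a session
  try {
    await client.query("BEGIN");
    await client.query(applySql);
    // Every auto-applied change writes its own undo instructions.
    await client.query(
      `INSERT INTO audit.tuning_rollbacks (proposal_id, revert_sql, applied_at)
       VALUES ($1, $2, now())`,
      [proposalId, revertSql],
    );
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// 24 h later: revert if a key metric drifted by more than 20%.
async function postApplyMonitor(proposalId: string, before: number, after: number) {
  if (Math.abs(after - before) / before > 0.2) {
    const { rows } = await pool.query(
      "SELECT revert_sql FROM audit.tuning_rollbacks WHERE proposal_id = $1",
      [proposalId],
    );
    if (rows[0]) await pool.query(rows[0].revert_sql);
  }
}
```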
## Universal HTTP ingest gateway
Anything that can curl can write to memory:
```bash
curl -X POST http://127.0.0.1:8765/api/ingest \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"text":"...","source":"cron","agentId":"main","anchors":{"pr":"1234"}}'
```
Skills, cron jobs, GitHub Actions, monitoring scripts — all of them get the same Stage 0–6 pipeline (trash filter, dedup, multi-key indexes, scoring, audit) without the calling agent having to think about it. This is the switch that turns memory from "conversation byproduct" into "system state".
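The same call from a Node skill or cron job, mirroring the curl payload above (Node >= 22 has `fetch` built in):

```ts
// Write to memory from any Node script; payload shape matches the curl example.
const res = await fetch("http://127.0.0.1:8765/api/ingest", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "deploy finished, 0 errors",  // whatever the job wants remembered
    source: "cron",
    agentId: "main",
    anchors: { pr: "1234" },
  }),
});
if (!res.ok) throw new Error(`ingest failed: ${res.status}`);
```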
## Why open source
Two reasons:
One, long-term memory is the bottleneck for AI agents — not the LLM. As GPT-5 / Claude / Qwen converge in raw capability, the differentiator becomes "do you remember what we talked about yesterday", "do you know I'm allergic to X from six months ago", "do you have project context". This substrate shouldn't be locked into a few closed vendors.
Two, OpenClaw itself is Apache 2.0 and self-hosted; building a closed plugin on top of an open foundation makes no sense. I've been running nextclaw for a month — confidence is high enough to release.
## Install
Full 0→1 walkthrough at `docs/INSTALL.md` (covers OpenClaw / Postgres / Ollama / Discord bot end to end).
```bash
# 1. OpenClaw
git clone https://github.com/openclaw/openclaw ~/openclaw
cd ~/openclaw && pnpm install && pnpm build

# 2. nextclaw
git clone https://github.com/NextAgentBC/nextclaw ~/openclaw/extensions/memory-postgres
cd ~/openclaw/extensions/memory-postgres/dev && docker compose up -d

# 3. Embedding endpoint
curl -fsSL https://ollama.com/install.sh | sh
ollama pull nomic-embed-text

# 4. Configure ~/.openclaw/openclaw.json, then start
export NEXTCLAW_DASH_TOKEN=$(openssl rand -hex 24)
cd ~/openclaw && pnpm openclaw gateway start

# 5. Dashboard
open "http://127.0.0.1:8765/?token=$NEXTCLAW_DASH_TOKEN"
```
## Compatibility

- OpenClaw `>= 2026.4.25`
- Node `>= 22`
- Postgres `>= 16` with pgvector `>= 0.7.0`
- Embedding: any OpenAI- or Ollama-compatible endpoint. `nomic-embed-text` (768d), `qwen3-embedding:0.6b` (1024d), and `qwen3-embedding:4b` (4096d) are all tested. The dimension is detected on the first embed and locked into the HNSW index.
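Detect-and-lock could look roughly like this — the DDL follows pgvector's documented syntax, while the flow itself is my assumption about nextclaw's behavior:

```ts
// Sketch: fix the vector dimension from the first embedding, then build HNSW.
import { Pool } from "pg";

const pool = new Pool();

async function ensureHnsw(firstEmbedding: number[]) {
  const dim = firstEmbedding.length; // 768 / 1024 / 4096 depending on the model
  // The column type fixes the dimension: switching embedding models later
  // means re-embedding and re-indexing everything.
  await pool.query(
    `ALTER TABLE semantic.chunks ADD COLUMN IF NOT EXISTS embedding vector(${dim})`,
  );
  await pool.query(
    `CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
       ON semantic.chunks USING hnsw (embedding vector_cosine_ops)`,
  );
}
```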
## Roadmap
- v0.2: optional cross-encoder reranker + relation-graph traversal as a 9th route
- v0.3: distributed deployment (multiple gateways, one shared PG)
- v1.0: stable SDK + performance SLOs + full migration tests
Issues / PRs / discussions are open. Throw rocks.
→ GitHub: github.com/NextAgentBC/nextclaw
→ v0.1.0 release notes: github.com/NextAgentBC/nextclaw/releases/tag/v0.1.0
→ License: Apache 2.0