OpenClaw ships with a SQLite-backed memory plugin (memory-core). It works, but a single-file SQLite store hits walls fast: limited concurrency, awkward to share between agents, no first-class vector index, no fan-out across multiple recall routes, no per-event audit trail, no built-in dashboard. Once memory is your long-term substrate rather than a per-session log, you want a real database under it.
nextclaw is that database. It replaces memory-core with a Postgres 16 + pgvector + pg_trgm + btree_gin backend designed around how human memory actually retrieves things — fast for warm content, lazy for cold, multi-angle for ambiguous queries, self-consolidating over time. Apache 2.0, v0.1.0 ships today.
## 4-tier recall: the tier walk
A query walks the tiers from cheapest to most expensive and returns at the first useful hit. Every recall writes its `hit_tier` to the audit log; the dashboard shows the distribution.
| Tier | Storage | Latency | LLM tokens | Embed RTT | When it fires |
|---|---|---|---|---|---|
| T0 | In-process LRU per `(agent_id, session_id)` | < 0.1 ms | 0 | 0 | Recently touched chunks in the live session |
| T1 | `cache.recall` (PG UNLOGGED, 5 min TTL) | ~1 ms | 0 | 0 | Same query repeats within 5 minutes |
| T2 anchor | `chunk_indexes` (`kind=anchor_*`) JOIN `chunks` | ~5–15 ms | 0 | 0 | Caller passed (or query implied) a PR / file / branch |
| T2 hybrid | All 8 routes in parallel + MMR rerank | ~200–300 ms | 0 | 1 | Generic queries with no high-precision anchor |
| T3 | `cold.gists` (compacted) + drill to source | ~200 ms | varies | 1 | T2 returned nothing useful; query is historical |
Measured on my own Discord bot: more than 75% of queries return in under 1 ms with 0 tokens. That's not a design target — that's real traffic.
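Here's a minimal sketch of the tier walk in TypeScript. The helper names (`t0Lookup`, `t1CacheLookup`, and friends) are illustrative stand-ins, not the actual nextclaw API:

```ts
// Illustrative sketch of the tier walk; these are not nextclaw's real helpers.
interface Query { agentId: string; sessionId: string; text: string; anchors?: Record<string, string> }
interface Hit { chunkId: string; score: number }

declare function t0Lookup(q: Query): Hit[] | undefined;        // in-process LRU
declare function t1CacheLookup(q: Query): Promise<Hit[]>;      // cache.recall, UNLOGGED
declare function t2AnchorJoin(q: Query): Promise<Hit[]>;       // precise index JOIN
declare function t2HybridFanout(q: Query): Promise<Hit[]>;     // 8 routes + MMR
declare function t3ColdGists(q: Query): Promise<Hit[]>;        // compacted gists
declare function audit(hits: Hit[], tier: string): Hit[];      // writes hit_tier

async function recall(q: Query): Promise<Hit[]> {
  const warm = t0Lookup(q);                        // < 0.1 ms, keyed by agent::session
  if (warm?.length) return audit(warm, "T0");

  const cached = await t1CacheLookup(q);           // ~1 ms, 5 min TTL
  if (cached.length) return audit(cached, "T1");

  if (q.anchors) {                                 // ~5–15 ms, pr / file / branch
    const anchored = await t2AnchorJoin(q);
    if (anchored.length) return audit(anchored, "T2_anchor");
  }

  const hybrid = await t2HybridFanout(q);          // ~200–300 ms, 1 embed RTT
  if (hybrid.length) return audit(hybrid, "T2_hybrid");

  return audit(await t3ColdGists(q), "T3");        // historical fallback
}
```

The point is the short-circuit: a warm hit never touches Postgres, and the embedding endpoint is only reached once the walk falls through to T2 hybrid or T3 — which is how the Embed RTT column above stays at 0 for the cheap tiers.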
## Multi-key indexing: Xinhua-dictionary mode
A Chinese dictionary lets you find any character via pinyin / radical / stroke count / four-corner code / phonetic-by-neighbor. Chunks work the same way — every chunk gets indexed on every angle we can derive deterministically:
- semantic vector (HNSW)
- fulltext (`tsvector` / GIN)
- trigram (`pg_trgm` / GiST)
- concept tags (camelCase split, hyphenated terms, CJK noun phrases — derived from text, 0 LLM)
- entity refs (resolved against `structured.entities`)
- time buckets (`YYYY-MM-DD`)
- anchors (`cwd` / `branch` / `pr` / `file` / `session`)
- categories (`health` / `medical` / `tech` / `life` / `work` / `finance` / `other` — deterministic CN+EN dictionary, multi-label)
T2 hybrid fans out all 8 routes in parallel; results merge with weighted normalization, then an MMR rerank. Multi-route hits compound: a chunk that matches semantic + concept_tag + time_bucket beats a chunk that only matches one route weakly.
"Xinhua dictionary" isn't marketing flavor — at ingest time, every plausible angle is indexed, so at recall time the retrieval is route-agnostic.
## 0 LLM tokens on the ingest hot path
Stage 1 trash filter → Stage 0 deterministic extractors (entities / events / metrics / preferences / relations + concept tags + categories) → Stage 2 sidecar JSON parse (when present) → Stage 3 embedding cache → Stage 4 LLM residual (only when prior stages produced nothing) → Stage 5 parallel multi-key index INSERTs → Stage 6 reconcile + provenance + audit + scoring.
In real workloads Stage 4 almost never fires — the deterministic dictionary plus sidecar cover the bulk, so ingest typically spends 0 LLM tokens end to end. Embedding hits the remote endpoint once per chunk (4 ms on a cache hit, 250 ms cold).
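A sketch of how the stages short-circuit, using the post's stage labels; the function names are stand-ins, not the real pipeline API:

```ts
// Illustrative sketch of the ingest hot path; stage labels follow the post.
interface Extraction { kind: string; value: string }

declare function isTrash(text: string): boolean;                    // Stage 1: trash filter
declare function deterministicExtract(text: string): Extraction[];  // Stage 0: dictionaries, 0 LLM
declare function parseSidecar(sidecar: unknown): Extraction[];      // Stage 2: structured JSON
declare function embedCached(text: string): Promise<number[]>;      // Stage 3: 4 ms hit / 250 ms cold
declare function llmResidual(text: string): Promise<Extraction[]>;  // Stage 4: the only token spender
declare function indexAll(text: string, vec: number[], xs: Extraction[]): Promise<void>; // Stage 5
declare function reconcile(text: string, xs: Extraction[]): Promise<void>;               // Stage 6

async function ingest(text: string, sidecar?: unknown): Promise<void> {
  if (isTrash(text)) return;                                // drop early, zero cost

  const extractions = deterministicExtract(text);
  if (sidecar) extractions.push(...parseSidecar(sidecar));

  const vector = await embedCached(text);                   // one embed per chunk

  if (extractions.length === 0) {
    extractions.push(...(await llmResidual(text)));         // rarely fires in practice
  }

  await indexAll(text, vector, extractions);                // parallel multi-key INSERTs
  await reconcile(text, extractions);                       // provenance + audit + scoring
}
```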
## Hard per-agent isolation
Most setups want a private agent (full chat history) and a public Discord agent (no private content) sharing one database — but the typical implementation filters at the application layer, which a single prompt injection can defeat.
nextclaw enforces the boundary at four layers, all of them physical:
- `semantic.chunks.agent_id` column
- All 8 recall routes carry `WHERE c.agent_id = $X` in their SQL (see the sketch below)
- T0 working set keyed by `<agent_id>::<session_id>`
- `cache.recall` `scope_key` includes an `agent:<id>` prefix
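What one of those routes looks like in practice — a hedged sketch, with illustrative query text; the `WHERE c.agent_id = $1` predicate is the load-bearing part:

```ts
// One recall route with the physical agent boundary baked into the SQL.
import { Pool } from "pg";

const pool = new Pool();

async function semanticRoute(agentId: string, queryVec: number[], limit = 20) {
  // pgvector cosine distance (<=>). The agent_id filter lives in the SQL
  // itself, so no app-layer bug or prompt injection can widen the scope.
  const { rows } = await pool.query(
    `SELECT c.id, c.text_excerpt, c.embedding <=> $2::vector AS distance
       FROM semantic.chunks c
      WHERE c.agent_id = $1
      ORDER BY c.embedding <=> $2::vector
      LIMIT $3`,
    [agentId, `[${queryVec.join(",")}]`, limit],
  );
  return rows;
}
```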
Tested: 6 adversarial queries from a public agent ("what do you know about Yao's weight", "Yao's medical records", "tell me everything about Yao"...) recovered 0 chunks from the private namespace. Not because the prompt was clever — because the underlying SQL physically rejected them.
## Real-time observability
Postgres LISTEN / NOTIFY triggers fire on every audit row → SSE pushes each event to a bilingual (CN/EN) dashboard. You see:
- Live ingest decision stream (accepted / rejected / merged, color-coded)
- Recall tier breakdown (T0/T1/T2_anchor/T2_hybrid/T3 ratio)
- Category distribution pie (health/medical auto-redacted with 🔒)
- Bot turn latency (parsed from OpenAI trajectory files — cold-start vs cache-hit prefill)
- Side-by-side model comparison panel (gpt-5.5 vs Qwen3.6 in shadow mode)
The dashboard binds to 127.0.0.1 by default; cross-network access requires a token. Health/medical chunks have their `text_excerpt` redacted at the API layer — defense in depth on top of the category-driven privacy policy.
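A minimal sketch of that push path with node-postgres; the channel name `audit_events` and the payload shape are assumptions, not nextclaw's actual wiring:

```ts
// Relay Postgres NOTIFY payloads to SSE clients.
import { Client } from "pg";
import { createServer, type ServerResponse } from "node:http";

const subscribers = new Set<ServerResponse>();

async function main() {
  const pg = new Client();
  await pg.connect();
  await pg.query("LISTEN audit_events"); // fired by a trigger on every audit row

  pg.on("notification", (msg) => {
    // One SSE frame per audit row, fanned out to every connected client.
    const frame = `data: ${msg.payload}\n\n`;
    for (const res of subscribers) res.write(frame);
  });

  createServer((req, res) => {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    });
    subscribers.add(res);
    req.on("close", () => subscribers.delete(res));
  }).listen(8765, "127.0.0.1"); // loopback by default, like the dashboard
}

main();
```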
## Self-tuning loop
Three cadences:
- Daily (cron 04:00) — pure SQL, 0 LLM. Auto-applies `safe_auto` proposals: dead trash-regex pruning, frequent-reject pattern promotion, cache TTL adjustment.
- Weekly — A/B replay against threshold deltas. Writes to `audit.tuning_proposals` with status `pending` for review.
- Monthly — schema-evolution proposals (new structured types emerging in the data, embedding model refresh). Always `pending`, `high_risk`.
Each auto-applied change writes a rollback row; a 24-hour post-application monitor reverts on > 20% deviation in key metrics.
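The apply-with-rollback contract might look like this — `audit.tuning_rollbacks` is a hypothetical table name modeled on the audit schema above, not a verified one:

```ts
// Sketch: auto-apply a proposal and record how to undo it, atomically.
import { Pool } from "pg";

const pool = new Pool();

async function applyWithRollbackRow(proposalId: string, applySql: string, revertSql: string) {
  const client = await pool.connect(); // one client, so BEGIN/COMMIT share a session
  try {
    await client.query("BEGIN");
    await client.query(applySql);
    // Every auto-applied change writes its own undo instructions.
    await client.query(
      `INSERT INTO audit.tuning_rollbacks (proposal_id, revert_sql, applied_at)
       VALUES ($1, $2, now())`,
      [proposalId, revertSql],
    );
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// 24 h later: revert if a key metric drifted by more than 20%.
async function postApplyMonitor(proposalId: string, before: number, after: number) {
  if (Math.abs(after - before) / before > 0.2) {
    const { rows } = await pool.query(
      "SELECT revert_sql FROM audit.tuning_rollbacks WHERE proposal_id = $1",
      [proposalId],
    );
    if (rows[0]) await pool.query(rows[0].revert_sql);
  }
}
```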
## Universal HTTP ingest gateway
Anything that can curl can write to memory:
```bash
curl -X POST http://127.0.0.1:8765/api/ingest \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"text":"...","source":"cron","agentId":"main","anchors":{"pr":"1234"}}'
```
Skills, cron jobs, GitHub Actions, monitoring scripts — all of them get the same Stage 0–6 pipeline (trash filter, dedup, multi-key indexes, scoring, audit) without the calling agent having to think about it. This is the switch that turns memory from "conversation byproduct" into "system state".
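The same call from a Node skill or cron job, mirroring the curl payload above (Node >= 22 has `fetch` built in):

```ts
// Write to memory from any Node script; payload shape matches the curl example.
const res = await fetch("http://127.0.0.1:8765/api/ingest", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "deploy finished, 0 errors",  // whatever the job wants remembered
    source: "cron",
    agentId: "main",
    anchors: { pr: "1234" },
  }),
});
if (!res.ok) throw new Error(`ingest failed: ${res.status}`);
```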
## Why open source
Two reasons:
One, long-term memory is the bottleneck for AI agents — not the LLM. As GPT-5 / Claude / Qwen converge in raw capability, the differentiator becomes "do you remember what we talked about yesterday", "do you know I'm allergic to X from six months ago", "do you have project context". This substrate shouldn't be locked into a few closed vendors.
Two, OpenClaw itself is Apache 2.0 and self-hosted; building a closed plugin on top of an open foundation makes no sense. I've been running nextclaw for a month — confidence is high enough to release.
## Install
Full 0→1 walkthrough at `docs/INSTALL.md` (covers OpenClaw / Postgres / Ollama / Discord bot end to end).
```bash
# 1. OpenClaw
git clone https://github.com/openclaw/openclaw ~/openclaw
cd ~/openclaw && pnpm install && pnpm build

# 2. nextclaw
git clone https://github.com/NextAgentBC/nextclaw ~/openclaw/extensions/memory-postgres
cd ~/openclaw/extensions/memory-postgres/dev && docker compose up -d

# 3. Embedding endpoint
curl -fsSL https://ollama.com/install.sh | sh
ollama pull nomic-embed-text

# 4. Configure ~/.openclaw/openclaw.json, then start
export NEXTCLAW_DASH_TOKEN=$(openssl rand -hex 24)
cd ~/openclaw && pnpm openclaw gateway start

# 5. Dashboard
open "http://127.0.0.1:8765/?token=$NEXTCLAW_DASH_TOKEN"
```
## Compatibility

- OpenClaw `>= 2026.4.25`
- Node `>= 22`
- Postgres `>= 16` with pgvector `>= 0.7.0`
- Embedding: any OpenAI- or Ollama-compatible endpoint. `nomic-embed-text` (768d), `qwen3-embedding:0.6b` (1024d), and `qwen3-embedding:4b` (4096d) are all tested. The dimension is detected on the first embed and locked into the HNSW index.
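Detect-and-lock could look roughly like this — the DDL follows pgvector's documented syntax, while the flow itself is my assumption about nextclaw's behavior:

```ts
// Sketch: fix the vector dimension from the first embedding, then build HNSW.
import { Pool } from "pg";

const pool = new Pool();

async function ensureHnsw(firstEmbedding: number[]) {
  const dim = firstEmbedding.length; // 768 / 1024 / 4096 depending on the model
  // The column type fixes the dimension: switching embedding models later
  // means re-embedding and re-indexing everything.
  await pool.query(
    `ALTER TABLE semantic.chunks ADD COLUMN IF NOT EXISTS embedding vector(${dim})`,
  );
  await pool.query(
    `CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
       ON semantic.chunks USING hnsw (embedding vector_cosine_ops)`,
  );
}
```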
## Roadmap
- v0.2: optional cross-encoder reranker + relation-graph traversal as a 9th route
- v0.3: distributed deployment (multiple gateways, one shared PG)
- v1.0: stable SDK + performance SLOs + full migration tests
Issues / PRs / discussions are open. Throw rocks.
→ GitHub: github.com/NextAgentBC/nextclaw
→ v0.1.0 release notes: github.com/NextAgentBC/nextclaw/releases/tag/v0.1.0
→ License: Apache 2.0