Goodbye Docker: A Full Infrastructure Migration to Native systemd
Sometimes you finish the migration and realize — you never really needed Docker in the first place.
April 2, 2026 was a busy day for OpenClaw's infrastructure. It wasn't the kind of migration that takes six months of planning and a hundred-slide deck; it was the kind where you wake up, decide it's time, and push it live before dinner. Looking back, that single day drew a clear before/after line in OpenClaw's evolution from "works" to "works elegantly."
Incident 1: Lingxiao Lost exec Access
Early in the morning, Lingxiao (OpenClaw's AI assistant) reported a strange problem over Discord: "I only have 4 tools."
Normally it has 17 tools, including exec, read, write, and other core capabilities. That morning, its toolbox contained only memory_search, memory_get, web_search, and web_fetch.
The root cause was three stacked configuration errors:
Layer 1: Wrong statement in AGENTS.md
AGENTS.md contained the line "Discord sessions don't have exec permission", a leftover from a more conservative configuration that was being treated as ground truth.
Layer 2: Incomplete tools.allow whitelist
Before (only the web and automation groups):

```json
{
  "allow": ["group:web", "group:automation"]
}
```

After (fs and runtime groups added):

```json
{
  "allow": ["group:web", "group:automation", "group:fs", "group:runtime"]
}
```
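To see why two missing groups could strip out exec entirely, here is a minimal sketch of how a group-based allow list might expand into concrete tool names. The group-to-tool mapping is hypothetical (only the tool names come from this incident), and OpenClaw's real resolution logic may differ.

```javascript
// Hypothetical group membership for illustration; the real OpenClaw groups
// are not listed in this post. Automation tools are omitted from the sketch.
const TOOL_GROUPS = {
  'group:web': ['web_search', 'web_fetch'],
  'group:fs': ['read', 'write'],
  'group:runtime': ['exec'],
  'group:automation': [],
};

// Expand an allow list into a sorted, de-duplicated list of tool names.
// Plain entries (no "group:" prefix) are treated as individual tool names.
function expandAllow(allow, groups = TOOL_GROUPS) {
  const tools = new Set();
  for (const entry of allow) {
    if (Object.prototype.hasOwnProperty.call(groups, entry)) {
      for (const tool of groups[entry]) tools.add(tool);
    } else if (entry.startsWith('group:')) {
      // An unknown group grants nothing, so make that visible.
      console.warn(`unknown tool group: ${entry}`);
    } else {
      tools.add(entry);
    }
  }
  return [...tools].sort();
}

const before = expandAllow(['group:web', 'group:automation']);
const after = expandAllow(['group:web', 'group:automation', 'group:fs', 'group:runtime']);
console.log(before.includes('exec'), after.includes('exec')); // false true
```

In this sketch, exec only becomes reachable once group:runtime is in the list, which is why the whitelist had to change alongside the profile.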
Layer 3: Profile pointed to empty object
profile: "full" resolved to an empty config object; the actual full toolset lived in profile: "coding". Fixing the profile reference restored the tool count from 4 to 17.
Small fix, real impact: without exec, Lingxiao couldn't run automation directly and had to route everything through task-queue workarounds.
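The profile bug was dangerous because an empty profile degraded silently instead of failing. A minimal guard, sketched under the assumption that profiles are a plain name-to-object map (PROFILES and resolveProfile are hypothetical names, not OpenClaw's API), is to refuse empty profiles at startup:

```javascript
// Hypothetical profile table reproducing the failure: "full" exists but is empty.
// The "coding" tool list is abbreviated; the real profile enables 17 tools.
const PROFILES = {
  full: {},
  coding: {
    tools: ['exec', 'read', 'write', 'web_search', 'web_fetch', 'memory_search', 'memory_get'],
  },
};

// Resolve a profile by name and fail loudly instead of degrading silently.
function resolveProfile(name, profiles = PROFILES) {
  if (!Object.prototype.hasOwnProperty.call(profiles, name)) {
    throw new Error(`unknown profile: ${name}`);
  }
  const profile = profiles[name];
  if (profile === null || typeof profile !== 'object' || Object.keys(profile).length === 0) {
    throw new Error(`profile "${name}" resolved to an empty config object`);
  }
  return profile;
}

try {
  resolveProfile('full');
} catch (err) {
  console.error(err.message); // profile "full" resolved to an empty config object
}
console.log(resolveProfile('coding').tools.length); // 7
```

Failing at load time turns a confusing "I only have 4 tools" report into an explicit configuration error.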
Incident 2: OpenClaw 3.28 → 3.31 → 4.1
While reconfiguring permissions, we also pushed through the version upgrade from 3.28 to 4.1.
What's New in 3.31
- SQLite task registry: persistent storage for the background task queue
- exec approval redesign: cleaner flow, better Discord approval buttons
- Discord plugin commands doubled: from ~10 to ~20
What's Fixed in 4.1
- SQLite sync deadlock: fixed WAL-mode issues that surfaced under concurrent writes
- New /tasks command: view the background task queue directly from Discord
- exec allow-always persistence: no more re-authorizing after every restart
A Custom Patch: task-store-pg.mjs
Since production already runs PostgreSQL, we didn't want to pull in SQLite as a new dependency, so we wrote a drop-in replacement:
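```javascript
// task-store-pg.mjs: PostgreSQL replacement for the SQLite task store
import pg from 'pg';

export class PgTaskStore {
  constructor(connStr) {
    // One shared connection pool for all task-store queries
    this.pool = new pg.Pool({ connectionString: connStr });
  }

  // Create the task table if it does not exist yet; safe to run on every start
  async init() {
    await this.pool.query(`
      CREATE TABLE IF NOT EXISTS oc_tasks (
        id         TEXT PRIMARY KEY,
        type       TEXT NOT NULL,
        status     TEXT DEFAULT 'pending',
        payload    JSONB,
        result     JSONB,
        created_at TIMESTAMPTZ DEFAULT NOW(),
        updated_at TIMESTAMPTZ DEFAULT NOW()
      )
    `);
  }

  // Add a task in the default 'pending' state. The payload is serialized
  // explicitly so arrays are stored as JSON rather than as PostgreSQL arrays.
  async enqueue(id, type, payload) {
    await this.pool.query(
      'INSERT INTO oc_tasks (id, type, payload) VALUES ($1, $2, $3)',
      [id, type, JSON.stringify(payload)]
    );
  }

  // Close pooled connections on shutdown
  async close() {
    await this.pool.end();
  }
}
```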
This keeps the entire stack on PostgreSQL and avoids introducing a second database engine.
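The store covers table setup and enqueueing. A worker also needs to claim tasks without two processes grabbing the same row; in PostgreSQL the usual pattern is an UPDATE whose subquery uses FOR UPDATE SKIP LOCKED. The sketch below assumes the oc_tasks schema from init() and a pg.Pool (or anything with an async query method); claimNextTask, completeTask, and the 'running'/'done' status values are hypothetical, not part of the original patch.

```javascript
// Hypothetical helper: atomically claim the oldest pending task, or return null.
// FOR UPDATE SKIP LOCKED lets several workers poll at once: a row locked by one
// worker is skipped by the others instead of being claimed twice or blocking.
async function claimNextTask(pool) {
  const { rows } = await pool.query(`
    UPDATE oc_tasks
       SET status = 'running', updated_at = NOW()
     WHERE id = (
       SELECT id FROM oc_tasks
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED
     )
    RETURNING id, type, payload
  `);
  return rows[0] ?? null;
}

// Hypothetical helper: store the result and mark the task as done.
async function completeTask(pool, id, result) {
  await pool.query(
    "UPDATE oc_tasks SET status = 'done', result = $2, updated_at = NOW() WHERE id = $1",
    [id, JSON.stringify(result)]
  );
}
```

A worker loop would call claimNextTask, run the task, then call completeTask. Tasks left in 'running' after a crash would need a separate sweep, which this sketch does not handle.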
Incident 3: Docker → Native systemd (The Main Event)
This is what the day was really about.
Before Migration
OpenClaw's infrastructure services ran in 5 Docker containers:
| Container | Service | Port |
|---|---|---|
| oc-db | PostgreSQL 14 | 5434 |
| oc-redis | Redis 7 | 6380 |
| oc-minio | MinIO | 9002 |
| oc-api | Flask/Gunicorn | 4001 |
| oc-monitor | Node.js monitor | — |
Docker itself isn't wrong. But on a bare-metal AMD Strix Halo machine with 128GB of RAM running a handful of local services, maintaining Docker networking, volume mounts, and container lifecycles felt like overkill. The direct catalyst was wanting to upgrade from PostgreSQL 14 to 17 for a newer pgvector build, and a clean slate seemed better than an in-place upgrade.
Migration Steps
Step 1: Add official PGDG repo and install PG17
```bash
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt update
sudo apt install -y postgresql-17 postgresql-17-pgvector
```
Step 2: Dump and restore data
```bash
# Export from the Docker container (PostgreSQL 14)
docker exec oc-db pg_dump -U postgres openclaw > /tmp/openclaw_backup.sql
# Result: 14MB dump, 17 tables

# Import into native PG17
sudo -u postgres psql -c "CREATE DATABASE openclaw;"
sudo -u postgres psql openclaw < /tmp/openclaw_backup.sql
```
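Before tearing down the container, it is worth confirming that every table made it across. A minimal sketch, assuming two pg.Pool-like objects pointed at the old container (port 5434) and the new native server (port 5432); compareTables is a hypothetical helper, and it compares table names only, not row counts:

```javascript
// Lists ordinary tables in the public schema.
const LIST_TABLES_SQL = `
  SELECT table_name
    FROM information_schema.tables
   WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
   ORDER BY table_name`;

// Compare public tables between the old (Docker) and new (native) databases.
// oldPool / newPool are assumed to expose an async query(text) like pg.Pool.
async function compareTables(oldPool, newPool) {
  const [oldRes, newRes] = await Promise.all([
    oldPool.query(LIST_TABLES_SQL),
    newPool.query(LIST_TABLES_SQL),
  ]);
  const oldNames = new Set(oldRes.rows.map((r) => r.table_name));
  const newNames = new Set(newRes.rows.map((r) => r.table_name));
  return {
    missing: [...oldNames].filter((t) => !newNames.has(t)), // in old, not restored
    extra: [...newNames].filter((t) => !oldNames.has(t)), // only in new
    count: newNames.size,
  };
}
```

An empty missing list with a count of 17 would line up with the dump above, assuming all 17 tables live in the public schema.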
Step 3: Create systemd service units
Example for oc-api:
```ini
# /etc/systemd/system/oc-api.service
[Unit]
Description=OpenClaw API (Flask/Gunicorn)
After=network.target postgresql.service redis.service
Requires=postgresql.service

[Service]
# Gunicorn signals readiness to systemd via sd_notify
Type=notify
User=borui
WorkingDirectory=/home/borui/openclaw/api
EnvironmentFile=/home/borui/openclaw/api/.env
ExecStart=/home/borui/openclaw/api/venv/bin/gunicorn \
    --workers 4 \
    --bind 0.0.0.0:4001 \
    --timeout 120 \
    app:app
# SIGHUP makes Gunicorn restart its workers gracefully (systemctl reload oc-api)
ExecReload=/bin/kill -s HUP $MAINPID
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Step 4: Update config files (6 locations)
```bash
# .env (before: Docker host ports)
DATABASE_URL=postgresql://postgres:password@localhost:5434/openclaw
REDIS_URL=redis://localhost:6380/0

# .env (after: native default ports)
DATABASE_URL=postgresql://postgres:password@localhost:5432/openclaw
REDIS_URL=redis://localhost:6379/0
```
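With six files to touch, a small rewrite function is less error-prone than hand edits. A minimal sketch, assuming every affected value is a scheme://[user:pass@]localhost:PORT URL like the ones above; rewriteEnvPorts and PORT_MAP are hypothetical names, and since the six file paths are not listed in this post, none are hard-coded:

```javascript
// Old Docker host ports -> native default ports (from the migration table).
const PORT_MAP = { '5434': '5432', '6380': '6379' };

// Rewrite scheme://[userinfo@]localhost:PORT (or 127.0.0.1) inside .env text.
// Ports not in the map, and every other character, are left untouched.
function rewriteEnvPorts(text, portMap = PORT_MAP) {
  return text.replace(
    /(\b[a-z][a-z0-9+.-]*:\/\/(?:[^@\s\/]*@)?(?:localhost|127\.0\.0\.1)):(\d+)(?=[\/?#\s]|$)/gim,
    (match, prefix, port) =>
      Object.prototype.hasOwnProperty.call(portMap, port) ? `${prefix}:${portMap[port]}` : match
  );
}

const envBefore = [
  'DATABASE_URL=postgresql://postgres:password@localhost:5434/openclaw',
  'REDIS_URL=redis://localhost:6380/0',
  'MINIO_ENDPOINT=http://localhost:9002',
].join('\n');
console.log(rewriteEnvPorts(envBefore));
```

Reading each file, passing its contents through rewriteEnvPorts, and writing it back keeps the change mechanical; MinIO's 9002 passes through unchanged because it is not in the map.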
Step 5: Verify and remove Docker
```bash
# Health check
curl http://localhost:4001/api/health
# {"status":"ok"}

# Verify the pgvector extension
sudo -u postgres psql openclaw -c "SELECT extversion FROM pg_extension WHERE extname='vector';"
# 0.8.0

# Remove the old containers and all unused images
docker stop oc-db oc-redis oc-minio oc-api oc-monitor
docker rm oc-db oc-redis oc-minio oc-api oc-monitor
docker system prune -af  # freed ~2GB
```
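A single curl right after systemctl start can race the service's startup. A minimal sketch of a retrying check to run before removing the containers, assuming Node 18+ for the global fetch; waitForHealthy, its retry count, and its delay are hypothetical choices, and only the URL and the {"status":"ok"} body come from the steps above:

```javascript
// Poll the health endpoint until it returns {"status":"ok"} or attempts run out.
// fetchFn defaults to the global fetch (Node 18+) and is injectable for tests.
async function waitForHealthy(
  url = 'http://localhost:4001/api/health',
  { attempts = 10, delayMs = 1000, fetchFn = fetch } = {}
) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchFn(url);
      if (res.ok) {
        const body = await res.json();
        if (body.status === 'ok') return true;
      }
    } catch {
      // Connection refused or non-JSON body while the service starts; retry.
    }
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

Ending a script with waitForHealthy().then((ok) => process.exit(ok ? 0 : 1)) gives a shell-friendly exit code, so the docker stop and docker rm lines only run once the native API is actually up.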
After Migration
| Service | Port (Before) | Port (After) | Runtime |
|---|---|---|---|
| PostgreSQL 17 + pgvector 0.8 | 5434 | 5432 | systemd |
| Redis 7 | 6380 | 6379 | systemd |
| MinIO | 9002 | 9002 | systemd |
| Flask/Gunicorn oc-api | 4001 | 4001 | systemd |
| Node.js oc-monitor | — | — | systemd |
Zero containers. Everything runs under systemd, PostgreSQL and Redis are back on their standard ports, and docker ps lists nothing. Clean.
Incident 4: Blog Auto-Publishing Pipeline
As the day's final piece, the blog auto-publishing pipeline went live.
The architecture is straightforward: fetcher.py scrapes AI news at 6 AM Vancouver time, and auto_publisher.py calls Claude via the Anthropic API to generate bilingual articles, then posts them through the Blog Bot API.
```bash
# crontab entry (cron interprets the schedule in the host's local time zone)
0 6 * * * /home/borui/openclaw/blog/run_pipeline.sh >> /var/log/blog_auto.log 2>&1
```
This article is the first one to go through that same pipeline, triggered manually.
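Because cron fires at 06:00 in the host's local time zone, "6 AM Vancouver time" only holds if the host is set to America/Vancouver. If it isn't, one option is to schedule the job hourly and let a small guard decide whether to proceed. The sketch below is an assumption, not the pipeline's actual code; vancouverHour and shouldRunNow are hypothetical helpers built on Intl's time-zone data, which also handles the PST/PDT switch.

```javascript
// Wall-clock hour (0-23) in America/Vancouver for a given instant.
// Intl applies the DST rules, so 06:00 local time is found in both PST and PDT.
function vancouverHour(date = new Date()) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: 'America/Vancouver',
    hour: 'numeric',
    hourCycle: 'h23',
  }).formatToParts(date);
  return Number(parts.find((p) => p.type === 'hour').value);
}

// Guard for an hourly schedule: proceed only during the 06:xx Vancouver hour.
function shouldRunNow(date = new Date(), targetHour = 6) {
  return vancouverHour(date) === targetHour;
}

console.log(vancouverHour(new Date('2026-04-02T13:00:00Z'))); // 6 (PDT, UTC-7)
```

The simpler alternative is to set the host's time zone to America/Vancouver and keep the crontab entry as it is.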
Takeaway
In one sentence: today turned OpenClaw from a working prototype into a system that works elegantly.
Docker isn't wrong. But when your services need no isolation from the host, when your team is one person, and when you want systemctl status to show you everything at a glance, native systemd is the better choice.
Standard database ports, cleaner config, one less abstraction layer, about 2GB of reclaimed disk space, and a setup that is far easier to read. Worth it.