Goodbye Docker: A Full Infrastructure Migration to Native systemd

Sometimes you finish the migration and realize — you never really needed Docker in the first place.

April 2, 2026 was a busy day for OpenClaw's infrastructure. Not the kind of migration that takes six months of planning and a hundred-slide deck — the kind where you wake up, decide it's time, and push it live before dinner. Looking back, this single day marked a clear before/after line in OpenClaw's evolution from "works" to "works elegantly."

Incident 1: Lingxiao Lost exec Access

Early in the morning, Lingxiao (OpenClaw's AI assistant) reported a strange problem from Discord: "I only have 4 tools."

Normal operation means 17 tools, including exec, read, write, and other core capabilities. But that morning, the toolbox contained only: memory_search, memory_get, web_search, web_fetch.

The root cause was three stacked configuration errors:

Layer 1: Wrong statement in AGENTS.md

AGENTS.md contained the line "Discord sessions don't have exec permission" — a leftover conservative config that got treated as ground truth.

Layer 2: Incomplete tools.allow whitelist

// Before (only web and automation groups)
{
  "allow": ["group:web", "group:automation"]
}

// After (added fs and runtime)
{
  "allow": ["group:web", "group:automation", "group:fs", "group:runtime"]
}

Layer 3: Profile pointed to empty object

profile: "full" resolved to an empty config object. The actual full toolset lived in profile: "coding". Fixing the profile reference restored the tool count from 4 to 17.

Small fix, real impact — without exec, Lingxiao couldn't run automation directly and had to route everything through task-queue workarounds.
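
A small lint along these lines could flag layers 2 and 3 before they stack up again (layer 1 is prose in AGENTS.md and needs a human read). This is a minimal sketch that assumes a hypothetical config layout in which tools.profile names an entry in a profiles map and tools.allow lists group names. The key names, the REQUIRED_GROUPS set, and the lint_tools_config function are illustrative and are not OpenClaw's documented schema.

```python
# Hypothetical config lint. Key names are assumptions, not OpenClaw's real schema.
REQUIRED_GROUPS = {"group:fs", "group:runtime"}  # the groups that had to be added back


def lint_tools_config(config: dict) -> list[str]:
    """Return human-readable warnings for a tools config dictionary."""
    warnings = []
    tools = config.get("tools", {})
    profiles = config.get("profiles", {})

    # Layer 3: the selected profile is missing or resolves to an empty object.
    profile_name = tools.get("profile")
    if not profiles.get(profile_name):
        warnings.append(f"profile {profile_name!r} is missing or empty")

    # Layer 2: the allow list lacks groups that core tools depend on.
    missing = REQUIRED_GROUPS - set(tools.get("allow", []))
    if missing:
        warnings.append("allow list is missing: " + ", ".join(sorted(missing)))

    return warnings


if __name__ == "__main__":
    broken = {
        "tools": {"profile": "full", "allow": ["group:web", "group:automation"]},
        "profiles": {"full": {}, "coding": {"tools": ["exec", "read", "write"]}},
    }
    for line in lint_tools_config(broken):
        print(line)
```

Run at startup or in CI, a check like this turns a silent drop in tool count into an explicit warning.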

Incident 2: OpenClaw 3.28 → 3.31 → 4.1

While reconfiguring permissions, we also pushed through the version upgrade from 3.28 to 4.1.

What's New in 3.31

SQLite task registry: persistent storage for the background task queue
exec approval redesign: cleaner flow, better Discord approval buttons
Discord plugin command count doubled: ~10 commands grew to ~20

What's Fixed in 4.1

SQLite sync deadlock: fixed a WAL-mode deadlock under concurrent writes
New /tasks command: view background task queue directly from Discord
exec allow-always persistence: no more re-authorizing after every restart

A Custom Patch: task-store-pg.mjs

Since production already runs PostgreSQL, we didn't want to pull in SQLite as a new dependency, so we wrote a drop-in replacement:

// task-store-pg.mjs — PostgreSQL replacement for SQLite task store
import pg from 'pg';

export class PgTaskStore {
  constructor(connStr) {
    this.pool = new pg.Pool({ connectionString: connStr });
  }

  async init() {
    await this.pool.query(`
      CREATE TABLE IF NOT EXISTS oc_tasks (
        id TEXT PRIMARY KEY,
        type TEXT NOT NULL,
        status TEXT DEFAULT 'pending',
        payload JSONB,
        result JSONB,
        created_at TIMESTAMPTZ DEFAULT NOW(),
        updated_at TIMESTAMPTZ DEFAULT NOW()
      )
    `);
  }

  async enqueue(id, type, payload) {
    await this.pool.query(
      'INSERT INTO oc_tasks (id, type, payload) VALUES ($1, $2, $3)',
      [id, type, JSON.stringify(payload)]
    );
  }
}

This keeps the entire stack on PostgreSQL and avoids introducing a second database engine.

Incident 3: Docker → Native systemd (The Main Event)

This is what the day was really about.

Before Migration

OpenClaw's infrastructure services ran in 5 Docker containers:

Container	Service	Port
oc-db	PostgreSQL 14	5434
oc-redis	Redis 7	6380
oc-minio	MinIO	9002
oc-api	Flask/Gunicorn	4001
oc-monitor	Node.js monitor	—

Docker itself isn't wrong, but on a bare-metal AMD Strix Halo machine with 128GB RAM running a handful of local services, maintaining Docker networking, volume mounts, and container lifecycle felt like overkill. The direct catalyst was wanting to upgrade from PostgreSQL 14 to PostgreSQL 17 for a newer pgvector build. A clean slate seemed better than an in-place upgrade.

Migration Steps

Step 1: Add official PGDG repo and install PG17

sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt update
sudo apt install -y postgresql-17 postgresql-17-pgvector

Step 2: Dump and restore data

# Export from Docker container
docker exec oc-db pg_dump -U postgres openclaw > /tmp/openclaw_backup.sql
# Result: 14MB dump, 17 tables

# Import to native PG17
sudo -u postgres psql -c "CREATE DATABASE openclaw;"
sudo -u postgres psql openclaw < /tmp/openclaw_backup.sql
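
Before trusting the restore, it helps to confirm the dump actually contains the tables you expect. The sketch below counts CREATE TABLE statements in a plain-format SQL dump. The path in the usage section is the one from the step above; the tables_in_dump helper and its regex are our own illustration, and they assume the plain pg_dump layout where each table definition starts a line with CREATE TABLE.

```python
import re
from pathlib import Path

# Matches the start of each table definition in a plain-format SQL dump.
CREATE_TABLE_RE = re.compile(
    r'^CREATE TABLE\s+(?:IF NOT EXISTS\s+)?([\w."]+)', re.MULTILINE
)


def tables_in_dump(sql_text: str) -> list[str]:
    """Return the table names defined in a plain-format SQL dump."""
    return CREATE_TABLE_RE.findall(sql_text)


if __name__ == "__main__":
    dump = Path("/tmp/openclaw_backup.sql")
    if dump.exists():
        names = tables_in_dump(dump.read_text(encoding="utf-8", errors="replace"))
        print(f"{len(names)} tables found")
        for name in names:
            print(" ", name)
```

Comparing that count against the table list in the restored database is a cheap way to catch a truncated dump.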

Step 3: Create systemd service units

Example for oc-api:

# /etc/systemd/system/oc-api.service
[Unit]
Description=OpenClaw API (Flask/Gunicorn)
After=network.target postgresql.service redis.service
Requires=postgresql.service

[Service]
Type=notify
User=borui
WorkingDirectory=/home/borui/openclaw/api
EnvironmentFile=/home/borui/openclaw/api/.env
ExecStart=/home/borui/openclaw/api/venv/bin/gunicorn \
  --workers 4 \
  --bind 0.0.0.0:4001 \
  --timeout 120 \
  app:app
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
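
One detail worth checking in units like this: Requires= pulls a dependency in but does not order it, so anything in Requires= should normally also appear in After=. The unit above does this correctly. The sketch below is a rough lint for that rule using Python's configparser; the requires_without_after function is our own illustration, and configparser only approximates systemd's unit syntax (for repeated keys it keeps just the last value).

```python
import configparser


def requires_without_after(unit_text: str) -> set[str]:
    """Return units named in Requires= that are not also listed in After=."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case; systemd keys are case-sensitive
    parser.read_string(unit_text)

    unit = parser["Unit"] if parser.has_section("Unit") else {}
    requires = set(unit.get("Requires", "").split())
    after = set(unit.get("After", "").split())
    return requires - after


if __name__ == "__main__":
    sample = """\
[Unit]
After=network.target
Requires=postgresql.service

[Service]
ExecStart=/bin/true
"""
    for name in sorted(requires_without_after(sample)):
        print(f"{name} is required but not ordered with After=")
```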

Step 4: Update config files (6 locations)

# .env — before
DATABASE_URL=postgresql://postgres:password@localhost:5434/openclaw
REDIS_URL=redis://localhost:6380/0

# .env — after  
DATABASE_URL=postgresql://postgres:password@localhost:5432/openclaw
REDIS_URL=redis://localhost:6379/0
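
With six locations to touch, doing the port swap by script is less error-prone than editing by hand. Here is a minimal sketch that rewrites connection URLs in .env-style text. The port mapping comes from the before/after values above; the remap_port and remap_env helpers are our own illustration, not part of OpenClaw.

```python
from urllib.parse import urlsplit, urlunsplit

# Docker-mapped port -> native default port, taken from the values above.
PORT_MAP = {5434: 5432, 6380: 6379}


def remap_port(url: str, port_map: dict[int, int] = PORT_MAP) -> str:
    """Return url with its port swapped according to port_map, if it matches."""
    parts = urlsplit(url)
    new_port = port_map.get(parts.port)
    if new_port is None:
        return url
    # Keep user info and host exactly as written; replace only the port suffix.
    netloc_without_port = parts.netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{netloc_without_port}:{new_port}"))


def remap_env(text: str) -> str:
    """Apply remap_port to every KEY=url line in .env-style text."""
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and "://" in value:
            line = f"{key}={remap_port(value)}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    before = (
        "DATABASE_URL=postgresql://postgres:password@localhost:5434/openclaw\n"
        "REDIS_URL=redis://localhost:6380/0"
    )
    print(remap_env(before))
```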

Step 5: Verify and remove the Docker containers

# Health check
curl http://localhost:4001/api/health
# {"status":"ok"}

# Verify pgvector extension
sudo -u postgres psql openclaw -c "SELECT extversion FROM pg_extension WHERE extname='vector';"
# 0.8.0

# Remove the containers, then prune unused images, networks, and build cache
docker stop oc-db oc-redis oc-minio oc-api oc-monitor
docker rm oc-db oc-redis oc-minio oc-api oc-monitor
docker system prune -af  # freed ~2GB
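
The verification half of this step can be scripted so the teardown only runs once the native services answer. The sketch below checks that the ports from this post accept connections and that the health endpoint returns the JSON shown above. The function names are our own, and the checks are deliberately shallow: an open port does not prove the service behind it is healthy.

```python
import json
import socket
import urllib.request


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_ok(body: bytes) -> bool:
    """Return True if body is a JSON object whose status field is "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        return False


def api_healthy(url: str = "http://localhost:4001/api/health") -> bool:
    """Fetch the health endpoint and check its payload."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return health_ok(resp.read())
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


if __name__ == "__main__":
    checks = {
        "postgresql:5432": port_open("localhost", 5432),
        "redis:6379": port_open("localhost", 6379),
        "minio:9002": port_open("localhost", 9002),
        "oc-api health": api_healthy(),
    }
    for name, ok in checks.items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```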

After Migration

Service	Port (Before)	Port (After)	Runtime
PostgreSQL 17 + pgvector 0.8	5434	5432	systemd
Redis 7	6380	6379	systemd
MinIO	9002	9002	systemd
Flask/Gunicorn oc-api	4001	4001	systemd
Node.js oc-monitor	—	—	systemd

Zero Docker. Full systemd. PostgreSQL and Redis back on their default ports. docker ps returns nothing. Clean.

Incident 4: Blog Auto-Publishing Pipeline

As the day's final piece, the Blog auto-publishing pipeline went live.

The architecture is straightforward: fetcher.py scrapes AI news at 6AM Vancouver time, auto_publisher.py calls Claude via the Anthropic API to generate bilingual articles, then posts them through the Blog Bot API.

# crontab entry
0 6 * * * /home/borui/openclaw/blog/run_pipeline.sh >> /var/log/blog_auto.log 2>&1
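
The fetch, generate, publish flow described above can be sketched as a small orchestration function. Everything here is hypothetical: the Article shape, the run_pipeline name, and the three injected callables stand in for whatever fetcher.py and auto_publisher.py actually do, and no real Anthropic or Blog Bot API calls are shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Article:
    title: str
    # Language code -> body text; a bilingual article would carry two entries.
    bodies: dict[str, str] = field(default_factory=dict)


def run_pipeline(
    fetch: Callable[[], Iterable[str]],
    generate: Callable[[str], Article],
    publish: Callable[[Article], None],
) -> int:
    """Fetch items, generate one article per item, publish each; return the count published."""
    published = 0
    for item in fetch():
        try:
            publish(generate(item))
        except Exception as exc:  # one bad item should not abort the whole run
            print(f"skipped {item!r}: {exc}")
            continue
        published += 1
    return published


if __name__ == "__main__":
    # Stub stages so the sketch runs without network access.
    count = run_pipeline(
        fetch=lambda: ["example headline"],
        generate=lambda item: Article(title=item, bodies={"en": item}),
        publish=lambda article: print("published:", article.title),
    )
    print(count, "article(s) published")
```

Injecting the stages as callables keeps the loop testable without touching the network, which matters for a job that otherwise only runs once a day from cron.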

This article is the first one manually triggered through that same pipeline.

Takeaway

One-sentence summary: today turned OpenClaw from a working prototype into a system that works elegantly.

Docker isn't wrong. But when your services have no isolation requirements from the host, when your team is one person, when you want systemctl status to show you everything in one shot — native systemd is the better choice.

Standard database ports, cleaner config, one less abstraction layer, 2GB of reclaimed disk space, and a setup that is much easier to read. Worth it.