Self-Hosting Honcho: Agent Memory That Actually Works

Honcho May 12, 2026

I spent two days debugging why my Honcho instance wasn't creating memories. The deriver was running, logs looked fine, but no observations appeared. The fix was one boolean flag that defaults to the wrong value for personal use. Here's the complete working configuration I wish I'd had from day one.

What is Honcho?

Honcho is an agent memory system by Plastic Labs. Unlike simple chat history, it extracts facts about the user, builds a persistent profile, and enables dialectic reasoning — the agent can answer questions about you based on accumulated observations across sessions.

The managed cloud version uses their proprietary Neuromancer model. I self-host because I want my conversation data local and I want to choose my own LLM backend.

Architecture

Hermes Agent → Honcho API (localhost:8000)
                ├── PostgreSQL (messages, sessions)
                ├── Redis (cache)
                ├── LanceDB (vector embeddings)
                └── Deriver (background worker)
                        └── OpenRouter (LLM API)

The deriver is the magic. It processes every message, extracts facts, generates embeddings, and stores observations. Without it running correctly, Honcho is just a chat logger.

The Critical Fix: FLUSH_ENABLED

Here's what blocked me for hours. Honcho's deriver has a FLUSH_ENABLED setting that defaults to false.

When false, the deriver batches representation work until REPRESENTATION_BATCH_MAX_TOKENS (1024 tokens) is reached. In high-volume production, this saves API costs by grouping multiple observations into a single LLM call.

In a personal deployment? Messages trickle in slowly. The 1024-token threshold is never reached. Observations never appear.
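
To see why a slow trickle of messages never triggers processing, here is a toy model of threshold batching. It is not Honcho's deriver code: the ToyBatcher class, its method names, and the 40-token message size are invented for illustration. Only the 1024-token threshold and the flush flag mirror the settings described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatcher:
    """Toy model of threshold batching (hypothetical, not Honcho's code)."""

    max_tokens: int = 1024       # mirrors REPRESENTATION_BATCH_MAX_TOKENS
    flush_enabled: bool = False  # mirrors FLUSH_ENABLED
    pending_tokens: int = 0
    processed_batches: list[int] = field(default_factory=list)

    def add_message(self, tokens: int) -> None:
        """Queue a message, then process the queue if the rules allow it."""
        self.pending_tokens += tokens
        if self.flush_enabled or self.pending_tokens >= self.max_tokens:
            self.processed_batches.append(self.pending_tokens)
            self.pending_tokens = 0


# Assumed workload: ten short messages of about 40 tokens each.
messages = [40] * 10

batched = ToyBatcher(flush_enabled=False)
flushed = ToyBatcher(flush_enabled=True)
for tokens in messages:
    batched.add_message(tokens)
    flushed.add_message(tokens)

print(len(batched.processed_batches), batched.pending_tokens)
print(len(flushed.processed_batches), flushed.pending_tokens)
```

With the flag off, all 400 tokens stay queued and nothing is processed. With it on, each message is processed as soon as it arrives.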

Fix: Set FLUSH_ENABLED = true in config.toml and DERIVER_FLUSH_ENABLED=true in .env.

# config.toml
[deriver]
FLUSH_ENABLED = true

# .env
DERIVER_FLUSH_ENABLED=true

After enabling this, I saw Observation Count: 4 within 20 seconds of sending a message. Problem solved.

Complete Working Configuration

config.toml

[app]
LOG_LEVEL = "INFO"
EMBED_MESSAGES = true
MAX_EMBEDDING_TOKENS = 8192

[db]
CONNECTION_URI = "postgresql+psycopg://honcho:honcho@database:5432/honcho"

[auth]
USE_AUTH = false

[cache]
ENABLED = true
URL = "redis://redis:6379/0?suppress=true"

[vector_store]
TYPE = "lancedb"
DIMENSIONS = 1024
URI = "/app/lancedb_data"

[deriver]
FLUSH_ENABLED = true

[deriver.model_config]
transport = "openai"
model = "deepseek/deepseek-v4-flash"

[deriver.model_config.overrides]
base_url = "https://openrouter.ai/api/v1"
api_key_env = "LLM_OPENAI_API_KEY"

Key Configuration Details

Vector store: I use LanceDB instead of pgvector. Why? Honcho's pgvector backend validates against a hardcoded 1536 dimensions, and that check cannot be overridden without source changes. My embedding model (baai/bge-m3) produces 1024 dimensions. LanceDB is file-based, accepts any dimension, and swaps in by changing TYPE.

TOML nesting: This tripped me up. Overrides must be under [*.model_config.overrides], not [*.overrides]. The model_config layer is essential.

Config mounts: Both api and deriver containers need ./config.toml:/app/config.toml:ro in docker-compose.yml. Without this, the deriver uses a baked-in config from the Docker image and ignores all your changes.

Model Selection

I use DeepSeek v4-flash via OpenRouter for all tiers. Here is how it compares to Gemini 2.5 Flash Lite:

Metric	Gemini Flash Lite	DeepSeek v4 Flash
Tool Calls	1	7
Input Tokens	6,330	21,151
Duration	5.2s	24.9s
Quality	Good	Excellent (more detailed)

DeepSeek is slower but makes seven times as many tool calls (7 vs. 1), producing more thorough memory extractions. For a background worker where latency does not matter, the quality trade-off is worth it.
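
The size of the trade-off is easier to judge as ratios. This small calculation uses only the figures from the table above. The dictionary keys are my own labels.

```python
# Figures copied from the comparison table above.
gemini = {"tool_calls": 1, "input_tokens": 6_330, "duration_s": 5.2}
deepseek = {"tool_calls": 7, "input_tokens": 21_151, "duration_s": 24.9}

# How many times larger each DeepSeek figure is than the Gemini one.
ratios = {key: deepseek[key] / gemini[key] for key in gemini}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1f}x")
```

So the extra thoroughness costs roughly 3.3x the input tokens and 4.8x the wall-clock time per run, which matters for API spend even when latency does not.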

Verification Workflow

After setup, verify everything works:

# 1. Health check
curl -s http://localhost:8000/health
# → {"status":"ok"}

# 2. Create a session and message
curl -s -X POST http://localhost:8000/v3/workspaces/hermes/sessions \
  -H "Content-Type: application/json" \
  -d '{"id":"test","peer_id":"cosmo"}'

curl -s -X POST http://localhost:8000/v3/workspaces/hermes/sessions/test/messages \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"content":"I build agent systems.","peer_id":"cosmo"}]}'

# 3. Wait 20-30 seconds, check deriver logs
docker compose logs deriver --tail 20 | grep "Observation Count"
# → Observation Count: 4 count  ✅

# 4. Test dialectic reasoning
curl -s -X POST http://localhost:8000/v3/workspaces/hermes/peers/cosmo/chat \
  -H "Content-Type: application/json" \
  -d '{"query":"What do you know about this user?","agentic":true}'
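
If you would rather script step 3 than eyeball grep output, a small parser can pull the counts out of the log text. This is a sketch: the regular expression assumes the line shape shown in the sample output above, and the surrounding log lines in sample_log are invented for the example.

```python
import re

# Assumed line shape, based only on the sample output above.
OBSERVATION_RE = re.compile(r"Observation Count:\s*(\d+)")


def observation_counts(log_text: str) -> list[int]:
    """Return every observation count found in the given log text."""
    return [int(match) for match in OBSERVATION_RE.findall(log_text)]


# Invented log excerpt; only the "Observation Count" line mirrors the guide.
sample_log = """\
deriver-1  | Processing batch
deriver-1  | Observation Count: 4 count
deriver-1  | Done
"""

counts = observation_counts(sample_log)
print(counts)
```

In practice the text could come from subprocess.run(["docker", "compose", "logs", "deriver", "--tail", "20"], capture_output=True, text=True).stdout. An empty list after 30 seconds points back to the FLUSH_ENABLED setting.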

Troubleshooting Checklist

No observations? Check FLUSH_ENABLED. This is the #1 cause.
401 AuthenticationError? Check that PROVIDER is not "vllm"; only "openai", "anthropic", and "gemini" are valid.
Embedding dimension mismatch? Switch from pgvector to LanceDB.
Config changes not applying? Verify volume mounts in docker-compose.yml for both api and deriver.
LanceDB permission errors? Use a named Docker volume instead of bind mount.

Key Takeaways

FLUSH_ENABLED=true is mandatory for personal use. The default false is for high-volume production only.
LanceDB > pgvector for flexible dimensions. No hardcoded 1536-dim constraint.
Config must be volume-mounted in both containers. The deriver will not see host changes otherwise.
DeepSeek v4-flash is excellent for deriver work. Slower than Gemini but far more thorough.
Verify end-to-end before declaring victory. Health check → message → observation → dialectic response.

Self-hosted Honcho is powerful once configured correctly. The documentation exists, but critical details are scattered across GitHub issues and source code. I hope this guide saves you the two days I spent debugging.

Gordon Jung
