TL;DR
- Replaced 200+ YAML config files with 40+ PostgreSQL tables — query times dropped 98%
- Idempotent upsert migrations (
ON CONFLICT DO UPDATE) made every migration safely re-runnable - JSONB columns handle schema evolution without breaking changes
- YAML files kept as read-only exports — developers can still browse them, but DB is the source of truth
The Problem: YAML at Scale
If you're building a developer tool that manages projects, sprints, tasks, or configuration, you've probably started with YAML or JSON files. We did too. It works great at first — human-readable, version-controlled, easy to edit.
Then your tool grows. Ours looked like this:
sprints/
project-a/
sprint-01.yaml ... sprint-55.yaml
project-b/
sprint-01.yaml ...
config/
learnings/
2026-01-15-cache-pattern.yaml
...213 files...
events.yaml # 4000+ lines, growing daily
The breaking point? Two automated processes tried to update the same YAML file simultaneously. One process's changes silently overwrote the other's. We lost data that had to be manually re-entered.
The lesson: The moment you have more than one writer — whether that's a developer, a CI pipeline, or an AI agent — file-based storage becomes a race condition waiting to happen.
Why PostgreSQL, Not Just "Any Database"
We chose PostgreSQL specifically for three features that YAML files can't replicate:
- ACID transactions — concurrent writes are safe by default (MVCC handles isolation)
- JSONB columns — schema flexibility without losing query power. New fields go in JSONB first, graduate to real columns when stable
- pgvector extension — semantic search over knowledge entries using embedding vectors, something no file-based approach can match
The Migration Strategy: Idempotent Upserts
The key insight that made migration painless: every SQL statement must be safely re-runnable.
-- Every table uses IF NOT EXISTS
CREATE TABLE IF NOT EXISTS sprints (
id SERIAL PRIMARY KEY,
project_id INTEGER REFERENCES projects(id),
name VARCHAR(200) NOT NULL,
status VARCHAR(50) DEFAULT 'active',
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(project_id, name)
);
-- Every INSERT uses ON CONFLICT
INSERT INTO sprints (project_id, name, status)
VALUES (1, 'sprint-alpha', 'active')
ON CONFLICT (project_id, name) DO UPDATE
SET metadata = sprints.metadata || EXCLUDED.metadata,
updated_at = NOW();
This means you can run any migration file twice (or ten times) and get the same result. No "migration already applied" errors. No manual rollbacks.
Schema Evolution: JSONB as an Escape Hatch
Traditional schema changes are painful — add a column, every config file needs updating, backward compatibility breaks. With JSONB:
-- Need a new field? Just add it to metadata
UPDATE sprints
SET metadata = metadata || '{"priority_score": 85}'
WHERE id = 42;
-- Query it immediately
SELECT name, metadata->>'priority_score' as priority
FROM sprints
WHERE (metadata->>'priority_score')::int > 80;
Our rule: new fields start in metadata JSONB. If they're used consistently across 3+ cycles, promote to a real column.
Performance: The Numbers
| Operation | YAML | PostgreSQL | Gain |
|---|---|---|---|
| List all items | 2.3s | 45ms | 51x |
| Full-text search | 1.8s | 62ms | 29x |
| Update one record | 340ms | 38ms | 9x |
| Concurrent writes | Race condition | ACID | Safe |
| Cross-project query | 4.1s | 78ms | 53x |
Five Rules We Live By
- DB is the only source of truth — if it's not in PostgreSQL, it doesn't exist
- Every INSERT has ON CONFLICT — no migration should fail on a second run
- YAML is export-only — generated from DB, never edited directly
- New fields start in JSONB — graduate to columns when stable
- IF NOT EXISTS on everything — CREATE TABLE, ADD COLUMN, CREATE INDEX
When to Make the Switch
You should consider DB-first when you have more than one writer, need cross-project queries, or your files exceed 50 files or 1000 lines. The migration is worth it earlier than you think.