i've been writing about AI-driven development in bits and pieces across my blog posts - from the early skepticism to the motivation barrier breakthrough to the convergence post where i started talking about testability as the linchpin of AI-written code. i've now been running this setup for about 6 months across devpad, budget-sync, corpus, and a handful of other projects. it's time to write the whole thing down.
this post covers the entire stack: opencode as the terminal AI agent, a multi-agent orchestration workflow i wrote, why corpus and Result<T, E> types are the secret sauce that makes AI-generated code actually reliable, and how the devpad MCP server ties my project management directly into the AI loop.
the problem with vibe coding
in my earlier post on gen-ai, i talked about feeling "robbed of my own progress" when relying solely on AI. the projects weren't structured the way i'd want them, the code was verbose and over-fitted to edge cases, and the AI would fill up its own context window writing massive files. the cracked engineers at Amazon weren't using AI to save time writing code - they were using it to save time finding the right places in the code to change.
that observation led me to a question: what if the problem isn't AI writing code, but AI writing code without constraints? if i could give the AI a strict set of patterns, a way to verify its own output, and a workflow that prevents it from going off the rails - maybe it could actually produce code i'd be proud of.
turns out the answer was three things working together:
- a multi-agent orchestration system that separates planning from coding
- a functional error handling library that makes AI output type-checkable and testable
- MCP integrations that give the AI context about what to build
opencode: the terminal agent
i use opencode as my primary AI coding tool. it's open-source, not coupled to any single provider, and built by the sst team who clearly love terminals as much as i do. i run it inside tmux sessions - usually 2-3 projects going simultaneously, each with their own opencode instance.
the config lives at ~/.config/opencode/opencode.json:
{
  "plugin": ["opencode-anthropic-auth@latest", "ocn"],
  "instructions": [
    "~/.config/opencode/workflow.md",
    "~/.config/opencode/opencode-tooling.md"
  ],
  "mcp": {
    "devpad": {
      "type": "local",
      "command": ["node", "/Users/tom/dev/devpad/packages/mcp/dist/index.js"],
      "environment": {
        "DEVPAD_API_KEY": "..."
      }
    }
  }
}
the two instructions files are where the real magic happens - they get injected into every agent's system prompt. the mcp block connects devpad as a tool the AI can call directly. more on that later.
opencode gives you built-in agents (build, plan, explore) and lets you define custom sub-agents. the build agent is the default orchestrator - it reads files, dispatches work, and communicates with you. the plan agent is read-only, great for exploring unfamiliar code without risk. on top of these, i've defined two custom sub-agents: planner and coder.
here's what the full agent ecosystem looks like:
+-----------------------+
| YOU (terminal) |
+-----------+-----------+
|
v
+-----------+-----------+
| build (orchestrator) |
| reads workflow.md |
| dispatches agents |
+-----------+-----------+
|
+---------------------+---------------------+
| | |
v v v
+----------+--------+ +--------+--------+ +---------+---------+
| explore | | planner | | coder |
| (built-in) | | (custom agent) | | (custom agent) |
| | | | | |
| - file search | | - loads skills | | - loads skills |
| - codebase scan | | - writes plans | | - writes code |
| - read-only | | - .plans/*.md | | - runs tests |
| - quick/med/deep | | - phases work | | - verification |
+-------------------+ +-----------------+ | - git commits |
+-------------------+
the orchestration workflow
the workflow is defined in ~/.config/opencode/workflow.md and gets injected into every agent. it has 4 hard rules:
- never write code directly - the orchestrator is a dispatcher, not a programmer
- always commit after each phase - verification + commit after every phase, never batch
- always use the planner agent for plans - structured output that coder agents consume
- phase execution is strictly sequential - parallel within a phase, sequential between phases
these rules exist because without them, the AI will happily write 500 lines, skip testing, and move on. the orchestrator pattern forces a discipline loop:
+----------+ +----------+ +----------+ +----------+
| EXPLORE |---->| PLANNER |---->| CODER |---->| VERIFY |
| (context)| | (plan) | | (impl) | | (test) |
+----------+ +----+-----+ +----+-----+ +----+-----+
| | |
v v v
.plans/feat.md source code typecheck +
test + lint
|
v
git commit
|
+--------------------+
| next phase
v
+----+-----+
| CODER | (parallel agents OK
| CODER | within a phase)
| CODER |
+----+-----+
|
v
+----+-----+
| VERIFY |---> git commit
+----------+
for a real feature, the flow looks something like this:
User: "add a bookmark system to the app"
1. EXPLORE (medium): scan package structure, data models, test patterns
2. PLANNER (loads plan-feature + testing-strategy):
-> writes .plans/bookmark-system.md
-> Phase 1: Schema changes (sequential)
-> Phase 2: BookmarkService + API route (parallel)
-> Phase 3: UI integration (sequential)
3. CODER Phase 1: schema changes
-> loads drizzle-schema skill
-> adds bookmark table, generates migration
-> VERIFY: typecheck, migrate, COMMIT
4. CODER Phase 2 (parallel):
-> Agent A: BookmarkService (loads corpus-patterns, testing-strategy)
-> Agent B: /bookmarks API route
-> both skip verification (running in parallel)
-> VERIFY: typecheck, test, COMMIT
5. CODER Phase 3: UI wiring
-> VERIFY: full test suite, COMMIT
the key insight is that parallel coders skip verification. type errors are expected when only half the changes have landed. the verification agent runs after all parallel work completes, fixes any integration issues, and creates a single atomic commit.
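the phase loop above boils down to a few lines. here's a toy sketch of the shape - every name is hypothetical (opencode drives this through prompts, not an API), it just illustrates "parallel coders, then one verify, then one commit":

```typescript
type Task = { name: string };

// run one phase: coders in parallel, a single verification pass, one commit
async function runPhase(
  tasks: Task[],
  runCoder: (t: Task) => Promise<void>,
  verify: () => Promise<boolean>,
): Promise<string[]> {
  const log: string[] = [];
  // coders run in parallel and skip verification - type errors are expected
  // while only half the changes have landed
  await Promise.all(
    tasks.map(async (t) => {
      await runCoder(t);
      log.push(`coded: ${t.name}`);
    }),
  );
  // one verification pass after all parallel work, then one atomic commit
  if (await verify()) log.push("git commit");
  return log;
}
```

the important property is that `verify` sees the union of all parallel changes, so integration issues get fixed once instead of per-agent.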
skills: on-demand knowledge
one of the best features of opencode is the skills system. skills are markdown files that agents load on demand when they need specific recipes. instead of stuffing everything into the system prompt (which would eat context), agents call skill({ name: "corpus-patterns" }) right before they need it.
i have 9 skills:
~/.config/opencode/skills/
+-- corpus-patterns/ Result<T,E>, pipe() chains, ok()/err()
+-- drizzle-schema/ schema definition, queries, migrations
+-- monorepo-setup/ bun workspaces, package scaffolding
+-- testing-strategy/ in-memory fakes, Provider pattern
+-- git-workflow/ commit template, branch strategy
+-- plan-new-project/ greenfield planning checklist
+-- plan-feature/ feature planning for existing codebases
+-- plan-refactor/ refactor planning with dependency analysis
+-- plan-api-integration/ third-party API integration planning
the planner agent loads the appropriate planning skill based on task type. the coder agent loads recipe skills before writing code. this means when the coder writes a Drizzle migration, it follows the exact patterns from the drizzle-schema skill. when it writes error handling, it uses @f0rbit/corpus Result types from the corpus-patterns skill. consistency is enforced by knowledge injection, not by hope.
here's the skill loading matrix:
Task Type Planner Loads Coder Loads
+-----------+ +------------------+ +------------------+
| new |---------->| plan-new-project | | monorepo-setup |
| project | | monorepo-setup | | drizzle-schema |
| | | testing-strategy | | corpus-patterns |
+-----------+ +------------------+ | testing-strategy |
| feature |---------->| plan-feature | +------------------+
| addition | | testing-strategy |
+-----------+ +------------------+
| refactor |---------->| plan-refactor |
+-----------+ +------------------+
| API |---------->| plan-api-integ. |
| integ. | | testing-strategy |
+-----------+ +------------------+ +-----------------+
| commit | | |------>| git-workflow |
+-----------+ +------------------+ +-----------------+
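the matrix is really just a lookup table. a sketch of the mapping (the data comes from the matrix above, but the actual selection happens in the planner's prompt, not in code):

```typescript
// which planning skills each task type pulls in - illustrative lookup only
const planner_skills: Record<string, string[]> = {
  "new-project": ["plan-new-project", "monorepo-setup", "testing-strategy"],
  "feature": ["plan-feature", "testing-strategy"],
  "refactor": ["plan-refactor"],
  "api-integration": ["plan-api-integration", "testing-strategy"],
};
```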
corpus & Result types: why AI loves them
this is the part i'm most excited about. corpus started as a functional snapshotting library, but the Result<T, E> type and pipe() chains that it exports have become the backbone of how i structure all my code. and it turns out this pattern is incredibly AI-friendly.
here's the core idea from the corpus-patterns skill:
import { ok, err, pipe, type Result } from "@f0rbit/corpus";

// every fallible function returns Result<T, E>
const getUser = async (id: string): Promise<Result<User, NotFoundError>> => {
  const user = await db.select().from(users).where(eq(users.id, id)).get();
  if (!user) return err({ code: "NOT_FOUND", message: `User ${id} not found` });
  return ok(user);
};

// chain operations - short-circuits on first error
// (nest the pipe so `user` stays in scope when attaching posts)
const result = await pipe(getUser(id))
  .flat_map(user =>
    pipe(getPostsForUser(user.id))
      .map(posts => ({ ...user, posts }))
      .result(),
  )
  .map_err(e => errors.apiError(500, e.message))
  .result();

if (!result.ok) return result; // propagate error
never throw. never try/catch. errors are values.
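under the hood, Result is just a discriminated union on `ok`. a minimal sketch of the shape (corpus's actual definitions may differ):

```typescript
// a Result is either a success carrying a value or a failure carrying an error
type Ok<T> = { ok: true; value: T };
type Err<E> = { ok: false; error: E };
type Result<T, E> = Ok<T> | Err<E>;

// constructors - the only two ways a Result comes into existence
const ok = <T>(value: T): Ok<T> => ({ ok: true, value });
const err = <E>(error: E): Err<E> => ({ ok: false, error });
```

because the union discriminates on `ok`, narrowing with `if (!result.ok)` is the only way to reach `.value` - which is exactly the property the next section leans on.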
why does this matter for AI development? three reasons:
1. the compiler catches what the AI misses
when every function returns Result<T, E>, the typescript compiler enforces that the caller handles both the success and error case. if the AI forgets to check result.ok before accessing .value, it's a type error. the verification agent catches it during tsc --noEmit and fixes it. no runtime surprises.
Traditional code: Result-based code:
+-------------------+ +-------------------+
| try { | | const result = |
| const user = | | await getUser() |
| await getUser | | |
| // happy path | | if (!result.ok) { |
| } catch (e) { | compiler | return result | <-- compiler
| // maybe handle | says NOTHING | // must handle | ENFORCES
| // maybe forget | about this | } | this check
| } | | |
+-------------------+ | result.value.name |
+-------------------+
2. pipe chains are composition-friendly
AI models are great at generating small, isolated functions. they're terrible at integrating 5 functions into a coherent flow with proper error handling. pipe() gives them a mechanical pattern to follow - chain .flat_map() for fallible operations, .map() for transforms, .map_err() to translate error types. the skill teaches this exact pattern, and the AI produces remarkably consistent code.
3. in-memory testing becomes trivial
because errors are values (not exceptions), you can test the entire error path by just checking result.ok === false. no mocking frameworks, no jest.spyOn, no try/catch in tests. combine this with the Provider pattern from the testing-strategy skill:
// interface
interface NotificationProvider {
  send(userId: string, msg: string): Promise<Result<void, SendError>>;
}

// production implementation
class SlackNotificationProvider implements NotificationProvider { ... }

// test implementation - no mocking required
class InMemoryNotificationProvider implements NotificationProvider {
  sent: Array<{ userId: string; msg: string }> = [];
  async send(userId: string, msg: string) {
    this.sent.push({ userId, msg });
    return ok(undefined);
  }
}
this is what i meant in the convergence post when i said "testability is one of the core concepts you want to think about when designing software architecture that is going to be driven by AI development." the entire testing strategy revolves around:
- in-memory representations over mocking - always
- integration tests over unit tests - test user workflows, not implementation details
- Provider pattern for third-party APIs - never mock HTTP calls
Production Stack Test Stack
+------------------+ +------------------+
| API Route | | Test Runner |
+--------+---------+ +--------+---------+
| |
+--------v---------+ +--------v---------+
| BookmarkService | | BookmarkService | <-- same service
+--------+---------+ +--------+---------+ same code
| |
+--------v---------+ +--------v---------+
| Drizzle (SQLite) | | Drizzle (:memory)| <-- in-memory DB
+--------+---------+ +--------+---------+
| |
+--------v---------+ +--------v---------+
| SlackProvider | | InMemoryProvider | <-- no network calls
+------------------+ +------------------+
the AI writes tests that use createTestDb() (SQLite in-memory), swaps in InMemoryProvider for external services, and runs the actual service code against it. when tests pass, i have high confidence the production path works too.
custom commands
opencode supports custom slash commands, which are just markdown files in ~/.config/opencode/commands/. i have four:
| Command | What it does |
|---------|-------------|
| /plan <description> | runs explore-first, then planner - stops before implementation |
| /verify [area] | typecheck + test + lint cycle, fixes issues |
| /review | read-only audit of uncommitted changes |
| /health | scans for test gaps, TODOs, dead code, incomplete plans |
/plan is what i use when i want to think through something without committing to code. it produces a .plans/ file that i can review, annotate, and then hand off to the coder agent later. /verify is the phase-end verification step that the workflow automates, but i can also call it manually whenever i want a sanity check.
AGENTS.md: self-learning per project
each project accumulates knowledge in an AGENTS.md file at its root. this gets auto-loaded by opencode for all agents working in that project. it captures things that are too specific for global skills but too valuable to rediscover every session:
- project structure overview
- conventions that differ from global defaults
- known gotchas
- patterns discovered during development
the evolution workflow is built into the orchestration: after completing a feature, the workflow suggests AGENTS.md updates based on what was learned. updates only happen after i confirm - agents never write to it directly.
Global Config (~/.config/opencode/) Per-Project (./AGENTS.md)
+---------------------------------------+ +---------------------------+
| workflow.md -> all agents | | "this project uses the |
| tooling.md -> all agents | | legacy auth pattern |
| skills/ -> on-demand | | because..." |
| agent/planner -> planner only | | |
| agent/coder -> coder only | | "env vars: DEVPAD_API_KEY |
+---------------------------------------+ | must be set for MCP" |
| | |
| loaded into system prompt | "drizzle migrations live |
+-------------------------------->| in packages/schema/..." |
+---------------------------+
devpad MCP integration
devpad is my project management tool - it tracks projects, tasks, milestones, goals, and even scans codebases for // TODO comments. it has an API and, more importantly, an MCP server that plugs directly into opencode.
this means the AI can:
- list my projects and their current status
- read/create/update tasks with priority, progress, and tags
- manage milestones and goals for roadmap planning
- read/write blog posts (yes, this post could be managed through devpad)
- view project history and recent commits
the MCP server exposes tools like devpad_tasks_list, devpad_projects_get, devpad_milestones_upsert, devpad_blog_posts_create, etc. when i tell the AI "check what tasks are left for devpad", it calls the MCP tool directly:
+-------------------+ MCP Protocol +-------------------+
| opencode |<------------------->| devpad MCP |
| (AI agent) | | server |
| | devpad_tasks_list | |
| "what tasks are |-------------------->| queries devpad |
| left for the | | API with auth |
| devpad project?"| returns tasks[] | |
| |<--------------------| |
+-------------------+ +---+---------------+
|
v
+-------+-------+
| devpad.tools |
| (Astro + |
| Hono API) |
+-------+-------+
|
+-------v-------+
| SQLite DB |
| (projects, |
| tasks, goals,|
| milestones, |
| blog posts) |
+---------------+
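under the hood this is JSON-RPC 2.0. per the MCP spec, a tool call looks roughly like this (the tool name is real, but the `arguments` shape is a guess at devpad's schema):

```typescript
// shape of an MCP `tools/call` request as opencode would send it over the wire
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "devpad_tasks_list",
    arguments: { project: "devpad" },
  },
};
```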
the beautiful thing about this is the feedback loop. the AI reads tasks from devpad, plans implementations using the orchestration workflow, writes code, and could even update the task status when it's done. it closes the gap between "what should i work on" and "implement it" into a single terminal session.
what devpad MCP exposes
here's the full surface area of tools the AI has access to:
Projects Tasks Milestones & Goals
+------------------+ +------------------+ +------------------+
| projects_list | | tasks_list | | milestones_list |
| projects_get | | tasks_get | | milestones_get |
| projects_upsert | | tasks_upsert | | milestones_upsert|
| projects_delete | | tasks_delete | | milestones_delete|
| projects_history | | tasks_history | | milestones_goals |
| projects_config | | tasks_save_tags | | goals_list |
| projects_spec | +------------------+ | goals_get |
+------------------+ | goals_upsert |
| goals_delete |
Blog Activity +------------------+
+------------------+ +------------------+
| blog_posts_list | | activity_ai |
| blog_posts_get | | user_history |
| blog_posts_create| +------------------+
| blog_posts_update|
| blog_posts_delete| GitHub
| blog_tags_list | +------------------+
| blog_categories | | github_repos |
+------------------+ | github_branches |
+------------------+
the full picture
here's everything wired together:
YOU
|
| tmux session(s)
v
+=================================================================+
| OPENCODE |
| |
| +------------------+ +-----------------------------------+ |
| | opencode.json | | instructions (injected globally) | |
| | | | | |
| | plugins: | | workflow.md: | |
| | - anthropic-auth| | orchestration patterns | |
| | - ocn | | hard rules (never code direct) | |
| | | | phase execution model | |
| | mcp: | | | |
| | - devpad server | | opencode-tooling.md: | |
| +------------------+ | tool usage, output style | |
| +-----------------------------------+ |
| |
| agents/ skills/ (loaded on demand) |
| +----------+----------+ +----+----+----+----+----+----+----+ |
| | explore | planner | |corp|driz|mono|test|git |plan|plan| |
| | (search) | (arch.) | |us |zle |repo|ing |wkfl|new |feat| |
| +----------+----------+ +----+----+----+----+----+----+----+ |
| | coder | general | |plan|plan| |
| | (impl.) | (research| |refr|api | 9 skills total |
| +----------+----------+ +----+----+ |
| |
| commands/ |
| +--------+--------+---------+--------+ |
| | /plan | /verify| /review | /health| |
| +--------+--------+---------+--------+ |
| |
+==========================+=====+================================+
| |
MCP tools | | filesystem + git
v v
+-------------------+ +---------+ +-------------------+
| devpad.tools | | project | | AGENTS.md |
| (projects, tasks, | | source | | (per-project |
| milestones, | | code | | learning) |
| blog, timeline) | +---------+ +-------------------+
+-------------------+
daily workflow
here's what a typical evening session looks like:
- open tmux, split into 2-3 panes
- cd into a project, run opencode
- ask it to check devpad for open tasks: "what tasks are left for this project?"
- pick a task, tell it to /plan the implementation
- review the plan in .plans/, give feedback
- tell it to execute - it spawns explore -> planner -> coder agents
- while it cooks, switch to another tmux pane for a different project
- come back, review the diff, maybe tell it to refine tests or refactor
- each phase gets verified (typecheck, test) and committed automatically
- rinse and repeat
the key difference from pure vibe-coding is constraints. the AI isn't freewheeling - it's following a plan, using skills that enforce patterns, writing code that gets type-checked and tested after every phase, and using Result types that make the compiler catch what the AI misses.
this setup has genuinely changed my relationship with AI coding. i went from the skepticism in my gen-ai thoughts post - wanting a "read-only mode" for AI - to actually trusting the output. not because the AI got smarter, but because the system around it got better at catching mistakes.
what's next
i'm looking at a few things to improve this setup:
- extending devpad MCP with media timeline integration (media-timeline) for tracking what i'm actually working on across projects
- better refactoring workflows - as mentioned in the convergence post, getting the AI to do multiple "refactoring passes" after a feature reduces slop significantly
- corpus observations - using corpus's versioned snapshotting to track AI-generated code quality over time
- cloudflare workers deployment - moving more of the devpad stack to cloudflare for cloud-native deployment
if you're interested in any of the tools mentioned here:
- opencode - the open source AI coding agent (118k stars and counting)
- @f0rbit/corpus - functional snapshotting library with Result types (docs)
- devpad - project management + MCP server (live)
- budget-sync - personal finance CLI built entirely with this workflow
- forbit.dev - my personal site where all these projects live
let's see where this goes.