My workbench right now
So welcome to the first building-in-public status report. Four projects. Real problems. Real solutions. Real challenges. No bullshit. These will be shorter in the future, promise.
Here's what's on my workbench right now (if the video doesn't load, here's the link)
- Alfred (from 00:45) — My AI butler that refuses to cooperate
- Dovetail (from 21:30) — The CLI tool that makes me a better engineer without actually becoming one
- Joe (from 39:03) — A voice agent for construction that needs to interrupt people mid-sentence
- Jig (from 51:14) — A cognitive assembly line (like Henry Ford, but for AI)
Alfred: The Butler That Took 9 Months to Train
The Problem
Last August, I declared I was building my own AI butler. Because everyone kept promising that AI would handle your life for you, and it just... doesn't. The hallucinations alone make it unusable for anything that matters.
MCP came out in November and I thought, "Yeah, that's what we need." But even MCP is brittle. Connect Claude to a few servers, ask it to do something complex, and watch it drift off into la-la land the moment it's dealing with more than a handful of data points.
I needed something that could:
- Know things (about my business, my data, my world)
- Do things (reliably, the same way, every time)
- Save money (because I'm not paying Zapier $500/month for basic automations)
The Journey (AKA How Many Times I Pivoted)
Phase 1: n8n Workflows
Started building n8n workflows because "that's easier than coding." I figured Alfred could run these workflows for me and eventually build them for me too.
Then I hit the wall: connection costs. MCP connections to Make.com or Zapier or even n8n Cloud get expensive fast. So I went full self-hosting mode.
Phase 2: AlfredOS
Built a Railway template that deploys a bunch of open-source apps with one click: n8n, NocoDB, Cal.com, LibreChat. The idea: you get your own little stack, fully self-hosted, and Alfred connects to all of it.
Problem: n8n licensing. I can't embed n8n workflows in a SaaS product without paying $50K/year.
So... it's a dead end.
Phase 3: The Breakthrough
Three weeks ago, I finally cracked it. And the solution came from letting go of n8n entirely.
Talk Mode vs Work Mode
The problem with AI agents isn't that they're dumb. It's that they're smart but not logical. They can't reliably follow step 1, 2, 3, 4 in order. There's no validation, no enforcement, no determinism.
So I built a separation of concerns:
Talk Mode is for exploration. Ad hoc conversations. "Hey Alfred, do this, do that." Flexible, agentic, potentially hallucinatory, but great for figuring out what you actually want.
Work Mode is where the magic happens. Once you figure out what works in Talk Mode, you say "learn it." And Alfred doesn't just save it to memory; it creates deterministic code that will execute the exact same way, every single time.
Here's an example workflow:
"Hey Alfred, every Monday pull revenue from Stripe, support tickets from Zendesk, project updates from Linear, and Slack mentions. Send me a summary."
Alfred understands the intent, suggests 6 steps (not 45-node spaghetti), and when I say "learn it," it locks it down. Deterministic orchestration. Agentic execution within each step. In the Operator Bootcamp I call this a "Scaffolded Agent".
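To make that concrete, here's roughly what a learned skill looks like under the hood. This is a toy sketch with made-up helpers, not Alfred's actual code, but it shows the split: the step order is plain code, and only the inside of each step gets to be agentic.

```typescript
// Minimal sketch of a "learned" skill: deterministic orchestration,
// agentic execution inside each step. All helper names are hypothetical.
type Context = Record<string, string>;

interface Step {
  name: string;
  // Each step sees everything gathered so far and returns its own output.
  run: (ctx: Context) => Promise<string>;
}

// Stand-ins for real connectors; in Alfred these would be MCP tool calls.
const fakeFetch = async (source: string) => `data from ${source}`;
const fakeLlm = async (prompt: string) => `summary of: ${prompt.slice(0, 60)}...`;

const mondaySummary: Step[] = [
  { name: "stripe_revenue", run: () => fakeFetch("stripe") },
  { name: "zendesk_tickets", run: () => fakeFetch("zendesk") },
  { name: "linear_updates", run: () => fakeFetch("linear") },
  { name: "slack_mentions", run: () => fakeFetch("slack") },
  { name: "summary", run: (ctx) => fakeLlm(JSON.stringify(ctx)) },
];

async function runSkill(steps: Step[]): Promise<Context> {
  const ctx: Context = {};
  for (const step of steps) {
    // The loop is plain code: step 3 always follows step 2. No drifting off.
    ctx[step.name] = await step.run(ctx);
  }
  return ctx;
}

runSkill(mondaySummary).then((ctx) => console.log(ctx.summary));
```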
How It Actually Works
When you sign up for Alfred:
- You get an MCP server URL and API key
- A server gets provisioned for you automatically (I'm using Contabo while waiting for Hetzner verification, which, by the way, has more requirements than opening a bank account)
- LibreChat and NocoDB get installed
- You add Alfred as a connector to Claude or ChatGPT
- Claude becomes Alfred for that session
The infrastructure is handled. The apps are open source. Your data is yours. And skills are deterministic.
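If you're curious what that amounts to in code, the signup flow is basically this. Every function here is a hypothetical stub, not the real provisioning system:

```typescript
// Sketch of the signup flow; all functions are illustrative stubs.
type Server = { host: string };

const createVps = async (provider: string): Promise<Server> =>
  ({ host: `${provider}-vps.example.com` }); // stand-in for the real provisioning call

const installApp = async (s: Server, app: string) =>
  console.log(`installing ${app} on ${s.host}`);

async function provisionTenant() {
  const server = await createVps("contabo"); // one VPS per user
  await installApp(server, "librechat");     // chat UI
  await installApp(server, "nocodb");        // database UI
  // The user gets back an MCP URL + API key to paste into Claude or ChatGPT:
  return { mcpUrl: `https://${server.host}/mcp`, apiKey: "hypothetical-key" };
}

provisionTenant().then(console.log);
```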
What's Next
Pricing is still TBD. Probably around $40/month for the solo tier (you get a server with full root access, all the apps, and Alfred as your agent). The whole deployment system will stay open source if you just want to self-host without the agent.
For existing customers who paid early: you'll get the choice between Alfred Cloud access or a white-glove onboarding session. We'll figure it out.
If you want to check progress: here's my board for Alfred.
Dovetail: Making Me a Better Engineer (Without Being One)
The Problem
Here's my evolution as a "developer":
- 2014: IFTTT (archiving iPhone photos to Dropbox, feeling like a wizard)
- 2023: Make.com (got expensive, fast)
- 2024: n8n + Jet Admin + Airtable + Supabase (stack expanding, complexity exploding)
- 2025: Lovable + Supabase (felt powerful until it got brittle)
- Now: Claude Code (actually shipping)
The problem with Claude Code isn't capability, it's discipline.
I'd start a session, get excited, let Claude Code run wild, and two hours later realize it had replaced working features with placeholders. Or deleted database columns. Or pushed broken code directly to production.
I needed guardrails. Real ones.
The Journey
First I tried Claude plugins and commands. Kind of worked. Sometimes.
Then I tried sub-agents. Worked or didn't, randomly.
Then I tried various CLAUDE.md system prompts. Same story.
Every existing solution had the same problem: the agent was doing the orchestration. And agents are unreliable orchestrators.
The Solution: Deterministic Hooks
Dovetail is a CLI tool that wraps Claude Code with enforced workflows.
It's open source; you can find it here: https://www.npmjs.com/package/@lumberjack-so/dovetail
You install it via npm, run dovetail init, and it:
- Creates a GitHub repo
- Creates a Linear project (with starter issues)
- Creates a Supabase project
- Creates a Fly.io app (staging + production)
- Scaffolds your codebase
- Installs hooks into Claude Code
Those hooks are the key. Here's what they do:
Session Start Hook: Gets context on the project → what branch you're on, recent commits, Linear issues, latest updates. Claude starts every session informed.
Pre-Prompt Hook: Before you send a prompt, checks if there's an active issue. No issue? Blocks the task and launches a sync agent to create one and branch properly.
Pre-Tool Hook: Before Claude writes to any file, verifies you're on the right branch and working on a real issue. No cowboy commits to main.
Post-Tool Hook: After completing work, creates a pull request, updates Linear, and asks if the issue is done.
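If you've never touched Claude Code hooks: a hook is just a script registered for an event, and a pre-tool hook can block the call by exiting non-zero (exit code 2, by Claude Code's convention, feeds the stderr text back to the model). Here's a stripped-down sketch of a pre-tool guard in that spirit; Dovetail's real hooks do more, but this is the mechanism:

```typescript
// pretool-guard.ts -- simplified sketch of a script registered for Claude
// Code's PreToolUse event. Exit code 2 blocks the tool call, and the stderr
// text is shown to Claude so it can correct course.
import { execSync } from "node:child_process";

const branch = execSync("git rev-parse --abbrev-ref HEAD").toString().trim();

if (branch === "main") {
  // Block the write: no cowboy commits to main.
  console.error(`You are on "${branch}". Create an issue branch first.`);
  process.exit(2);
}

process.exit(0); // any other branch: let the tool call through
```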
The result? Claude Code literally cannot fuck up your production code. Every change is documented, branched, and reviewed. And because everything is externalized to Git and Linear, the agent always has good context—even with no memory.
The Real Power
Yesterday I told Claude: "I want to build this feature. Tell me the steps."
It said: "Six milestones, six steps each."
I said: "Fine. Launch six agents for milestone one. Make them write what needs to be written. Launch a seventh agent to orchestrate and test."
Because the hooks deterministically enforce how work happens, it just... worked. Same way every time.
Dovetail isn't stable yet. I'm releasing it anyway. Check the Linear roadmap for progress.
The Operator Bootcamp is 50% off with the code SANTA this week.
Click here to join the Bootcamp.
If you want to follow progress: here's my board for Dovetail.
Joe: The Voice Agent That Needs to Interrupt You
The Problem
Joe is a voice agent for a construction company. The requirements are brutal:
- Less than $0.10/minute cost
- Less than 500ms latency
- 30 different tasks
- 60%+ task coverage
- 80%+ success rate
BUT: Joe needs to be able to interrupt me mid-conversation.
Normal voice agents are one-directional. You call, you talk, you wait, you hang up. But what if I want to say "Hey Joe, do this task, let me know when you're done" and then keep talking about something else? And when the background task finishes, Joe slides in: "By the way, that's done. Want to hear the results?"
That's a hard problem.
The Journey
Attempt 1: Custom OpenAI Realtime API
Built a voice agent from scratch. Expensive. Couldn't do the interruption thing.
Attempt 2: ElevenLabs Conversational AI
Beautiful. Smooth. Perfect except for one thing: no proactive mode. Can't do the interruption. (FYI: it can now, and it does it a lot better than VAPI.)
Attempt 3: VAPI
Finally. VAPI's API allows it, but it required extra plumbing.
The architecture now: Joe (VAPI) → routes tasks to backend → backend processes → calls back to Joe → Joe interrupts conversation.
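In code, that round-trip is just a webhook plus a callback into the live call. A minimal sketch; the control-URL payload is a placeholder, not VAPI's exact control schema:

```typescript
// Sketch of the async round-trip: Joe hands a task to the backend and keeps
// talking; when the task finishes, the backend pokes the live call so Joe
// can interject. Endpoint shapes are illustrative.
import express from "express";

// Stand-in for the real work (ERP queries, report generation, ...).
const runLongTask = async (id: string) => `task ${id} is done`;

const app = express();
app.use(express.json());

app.post("/tasks", async (req, res) => {
  const { taskId, controlUrl } = req.body; // controlUrl points at the live call
  res.json({ status: "accepted" }); // answer immediately; the conversation goes on

  const result = await runLongTask(taskId);

  // Done: tell the live call to interject.
  await fetch(controlUrl, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ message: `By the way, ${result}. Want to hear the results?` }),
  });
});

app.listen(3000);
```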
The SQL Problem
The company has an ERP system. My first idea was simple: translate every user prompt to SQL, run it, return results.
"What's my bonus going to be based on the last two quarters?"
That's not a simple SQL query. That's multiple complex queries, joins, calculations. I was waiting minutes for responses.
The API Problem
The old API was technical—built for database operations, not human intent. So I built a new one. 84 endpoints, designed for operations: "create status update," "notify stakeholders," "generate financial projection."
Then I turned the endpoints into MCP tools. Now Joe's toolbelt has 48 actions, each with a schema. No SQL translation needed.
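Turning an endpoint into a tool is mostly schema plumbing. Here's what one of them might look like with the TypeScript MCP SDK; the tool name and backend URL are examples, not Joe's real ones:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "joe-ops", version: "1.0.0" });

// One operations endpoint -> one tool with a typed schema the model can read.
server.tool(
  "create_status_update",
  { projectId: z.string(), summary: z.string() },
  async ({ projectId, summary }) => {
    const res = await fetch("https://ops.example.com/status-updates", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ projectId, summary }),
    });
    return { content: [{ type: "text", text: await res.text() }] };
  }
);

await server.connect(new StdioServerTransport());
```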
The Solution
Current stack:
- VAPI for voice handling
- Groq inference running Llama 4 Maverick (17B active parameters)
- Custom operations API (84 endpoints → 48 MCP tools)
- mem0 for memory caching
Results:
- Cost: $0.10/minute ✓
- Latency: 475ms ✓
- Task coverage: 80% ✓ (goes to 95% with Gmail + weather + a few more MCPs)
- Success rate: 80%+ in evals ✓ (probably higher; some logging issues)
The memory layer is clutch. Before Joe tries the expensive MCP calls, it checks: "Have I answered this before?" If yes, instant response. If no, do the work, then cache it.
I built a custom mem0 MCP server for this (the official one wasn't working for me). You can find it here: https://smithery.ai/server/@ssdavidai/mem0-mcp
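The pattern itself is plain cache-aside. A sketch below; the client method names are my best recollection of the mem0 JS SDK, so treat them as illustrative:

```typescript
// Cache-aside over mem0: check memory before burning an expensive MCP call.
import MemoryClient from "mem0ai";

const mem = new MemoryClient({ apiKey: process.env.MEM0_API_KEY! });

// Stand-in for the expensive path (ERP queries via MCP tools).
const expensiveMcpCall = async (q: string) => `computed answer to "${q}"`;

async function answer(question: string, userId: string): Promise<string> {
  // 1. Have I answered this before?
  const hits = await mem.search(question, { user_id: userId });
  if (hits.length > 0 && hits[0].memory) return hits[0].memory; // instant response

  // 2. No: do the expensive work...
  const result = await expensiveMcpCall(question);

  // 3. ...then cache it for next time.
  await mem.add([{ role: "assistant", content: result }], { user_id: userId });
  return result;
}
```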
What's Next
This just happened last week. Just before handover I found another issue with VAPI. After some testing I moved back to ElevenLabs Conversational Agents, which now support "async execution" for MCP and solved the entire problem.
Jig: The Cognitive Assembly Line
The Problem
I keep saying it: n8n workflows are deterministic but require manual plumbing. Agents are flexible but hallucinate.
What if I want both?
With n8n, a 5-step process becomes a 50-node workflow. JSON expressions, aggregates, merges, custom JavaScript. If anything breaks, everything breaks. And debugging at 2am because of an extra comma in a JSON is not my idea of a good time.
With agents, I have zero control. It's a black box. Works or doesn't.
The Insight: Henry Ford Had It Right
When Henry Ford invented the assembly line, he discovered something: his workers were pedestrians 40% of the time. Walking around the factory instead of working.
His solution: don't make people go to the work. Make work come to the people.
In knowledge work, we don't have pedestrians. We have context switching. Different problems require different context, and switching between them is expensive.
The cognitive assembly line flips it: context comes to you, not the other way around. The task structure stays the same, but the context is managed automatically.
The Solution: The ICAO Framework
Every operation in a business is a CRUD operation. You're always creating, reading, updating, or deleting some value in some database (even if that database is in your head).
The ICAO Framework describes every action with four components:
- Intent: What you want to achieve and why
- Context: What you need to know to do it
- Action: The specific thing you do
- Output: How you know it's done
This is fractal. A task can be described with ICAO. A workflow of tasks can be described with ICAO. An entire business unit can be described with ICAO.
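In code terms, ICAO is just a recursive record: a workflow is a node whose action is a list of child nodes. A hypothetical sketch:

```typescript
// ICAO as a recursive data type: a task is a node, a workflow is a node
// whose action is "run these child nodes". All values are illustrative.
interface Icao {
  intent: string;            // what you want to achieve and why
  context: string;           // what you need to know to do it
  action: string | Icao[];   // the thing you do -- or the sub-steps (fractal)
  output: string;            // how you know it's done
}

const reconcile: Icao = {
  intent: "Reconcile yesterday's transactions",
  context: "Stripe payouts, ERP ledger entries",
  action: [
    {
      intent: "Pull both sides",
      context: "API credentials",
      action: "fetch Stripe payouts and ledger rows",
      output: "two normalized CSVs",
    },
    {
      intent: "Match and flag",
      context: "the two CSVs",
      action: "pair rows, flag mismatches",
      output: "exception list",
    },
  ],
  output: "exception list posted to #finance",
};
```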
The Implementation: Jig DSL
Jig uses a YAML-based domain-specific language to describe workflows. Every ICAO component maps to database entries. Orchestration happens through the database, not through code.
Why does this matter? Durability.
If step 3 fails, the database knows. Input, output, error—all logged. Instead of rerunning everything, you just fix step 3 and continue.
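Here's that durability in miniature, with a Map standing in for Jig's real tables:

```typescript
// Orchestration through the database, in miniature: every step writes its
// state to a ledger, and a rerun skips anything already done.
type StepState = { status: "done" | "failed"; output?: string; error?: string };
const ledger = new Map<string, StepState>();

async function runStep(name: string, fn: () => Promise<string>): Promise<string> {
  const prior = ledger.get(name);
  if (prior?.status === "done") return prior.output!; // already done: skip it

  try {
    const output = await fn();
    ledger.set(name, { status: "done", output }); // output logged
    return output;
  } catch (e) {
    ledger.set(name, { status: "failed", error: String(e) }); // error logged
    throw e; // fix the step, rerun the workflow: earlier steps are skipped
  }
}
```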
Intelligence-Agnostic Execution
In Jig it doesn't matter who executes a step.
Human? Works.
Claude agent? Works.
GPT agent? Works.
Manager? Engineer? Doesn't matter.
The context management layer handles compression and expansion automatically. The executor just gets what they need, nothing more.
How It Works
A Jig workflow runs like this:
- Spin up a Claude agent with specific tools and context
- Execute the task, generate output
- Fork the agent (it keeps its memories)
- Give the fork a new task, new tools, new access
- Repeat until final output
Each step is deterministic at the macro level, agentic at the micro level.
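The "fork" is cheaper than it sounds: with a stateless messages API, forking an agent just means copying the transcript into the next step with a new task and toolset. A sketch with the Anthropic SDK; the model ID and tasks are illustrative, and tool definitions are elided:

```typescript
// "Forking" an agent = carrying the transcript forward into the next step.
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the env

type Msg = Anthropic.Messages.MessageParam;

async function step(history: Msg[], task: string): Promise<Msg[]> {
  const messages: Msg[] = [...history, { role: "user", content: task }]; // the fork
  const reply = await anthropic.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages,
    // tools: toolsForThisStep, // each step would get its own tools and access
  });
  // The fork keeps its memories: the full transcript rides along.
  return [...messages, { role: "assistant", content: reply.content }];
}

let history: Msg[] = [];
history = await step(history, "Pull yesterday's Stripe payouts.");
history = await step(history, "Match them against the ledger rows.");
```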
I tested this on a real use case: an operations manager sent me a 5-minute Loom video explaining how they do transaction reconciliation at their e-commerce company. I fed the transcript into Jig's architect (which converts raw descriptions into Jig YAML), and on the first attempt got 95%+ accuracy.
Not a new agent. Not custom code. Just describing the workflow and letting Jig handle the orchestration.
It's a system of externalization, automated.
Current Status
Jig has a full database with execution logging, cost tracking, token usage, and context ledgers. There's a Jig MCP server so you can run workflows from Claude or ChatGPT.
Bonus: No Code No Clue
Because apparently I don't have enough projects, I'm also turning n8n workflows into tutorials and radio plays.
Yes, radio plays.
You feed it an n8n workflow JSON, and it:
- Analyzes what the workflow does
- Generates context about when it's useful (and when it's overkill)
- Creates a sitcom-style script with persistent actors
- Generates the full audio using ElevenLabs V3
- Writes an SEO-optimized tutorial
- Posts it to Ghost as a blog article
The whole thing runs from the terminal. Fully automated.
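Condensed, the whole thing is a chain of API calls. The stubs, voice ID, and model ID below are placeholders, and the ElevenLabs request in particular is simplified:

```typescript
// The pipeline, condensed: workflow JSON in, blog post + radio play out.
import { readFileSync } from "node:fs";

// Stand-ins for the model call and the Ghost Admin API client.
const llm = async (prompt: string) => `llm output for: ${prompt.slice(0, 40)}...`;
const publishToGhost = async (_html: string, _audio: ArrayBuffer) => {};

async function workflowToRadioPlay(path: string) {
  const workflow = JSON.parse(readFileSync(path, "utf8")); // the n8n export

  const analysis = await llm(`Explain what this workflow does: ${JSON.stringify(workflow)}`);
  const script = await llm(`Turn this into a sitcom script with recurring characters: ${analysis}`);
  const tutorial = await llm(`Write an SEO-optimized tutorial based on: ${analysis}`);

  // ElevenLabs text-to-speech; the real call also sets voice settings.
  const audio = await fetch("https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID", {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "content-type": "application/json",
    },
    body: JSON.stringify({ text: script, model_id: "eleven_v3" }),
  });

  await publishToGhost(tutorial, await audio.arrayBuffer());
}

workflowToRadioPlay("workflow.json");
```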
I built it in less than an hour using Dovetail-powered Claude Code. The original n8n workflow I was building to do the same thing was abandoned after hours of debugging.
Follow along
Everything I'm building is trying to solve the same fundamental problem: how do you get AI to be reliable without losing its flexibility?
Alfred does it with Talk Mode vs Work Mode.
Dovetail does it with enforced Git/Linear workflows.
Joe does it with structured API endpoints and memory caching.
Jig does it with database-orchestrated, intelligence-agnostic execution.
Same problem, different angles.
I'll be posting weekly updates here on the Lumberjack. Shorter. More frequent. Less theory, more "here's what I built this week."
If you have questions, I'm always an email away.
