💡
TL;DR: I'm moving away from deep analysis in my content and towards sharing more about the projects I'm working on. This is the first build-in-public memo. Let me know if you like it.

My workbench right now

So welcome to the first building-in-public status report. Four projects. Real problems. Real solutions. Real challenges. No bullshit. These will be shorter in the future, promise.

Here's what's on my workbench right now (if the video doesn't load, here's the link):

  • Alfred (from 00:45) — My AI butler that refuses to cooperate
  • Dovetail (from 21:30) — The CLI tool that makes me a better engineer without actually becoming one
  • Joe (from 39:03) — A voice agent for construction that needs to interrupt people mid-sentence
  • Jig (from 51:14) — A cognitive assembly line (like Henry Ford, but for AI)

💡
Have you seen the new Operator Bootcamp? A 5-module, 40-lesson bootcamp that will teach you my protocol for building apps, agents, and automations. You will become someone who can vibe code products that work, no matter your current level of technical skill. This week you can get it for 50% off by using the discount code SANTA.

Alfred: The Butler That Took 9 Months to Train

The Problem

Last August, I declared I was building my own AI butler. Because everyone kept promising that AI would handle your life for you, and it just... doesn't. The hallucinations alone make it unusable for anything that matters.

MCP came out in November and I thought, "Yeah, that's what we need." But even MCP is brittle. Connect Claude to a few servers, ask it to do something complex, and watch it drift off into la-la land the moment it's dealing with more than a handful of data points.

I needed something that could:

  1. Know things (about my business, my data, my world)
  2. Do things (reliably, the same way, every time)
  3. Save money (because I'm not paying Zapier $500/month for basic automations)

The Journey (AKA How Many Times I Pivoted)

Phase 1: n8n Workflows

Started building n8n workflows because "that's easier than coding." I figured Alfred could run these workflows for me and eventually build them for me too.

Then I hit the wall: connection costs. MCP connections to Make.com or Zapier or even n8n Cloud get expensive fast. So I went full self-hosting mode.

Phase 2: AlfredOS

Built a Railway template that deploys a bunch of open-source apps with one click—n8n, NocoDB, Cal.com, LibreChat. The idea: you get your own little stack, fully self-hosted, and Alfred connects to all of it.

Problem: n8n licensing. I can't embed n8n workflows in a SaaS product without paying $50K/year.

So...it's a dead end.

Phase 3: The Breakthrough

Three weeks ago, I finally cracked it. And the solution came from letting go of n8n entirely.

Talk Mode vs Work Mode

The problem with AI agents isn't that they're dumb. It's that they're smart but not logical. They can't reliably follow steps 1, 2, 3, 4 in order. There's no validation, no enforcement, no determinism.

So I built a separation of concerns:

Talk Mode is for exploration. Ad hoc conversations. "Hey Alfred, do this, do that." Flexible, agentic, potentially hallucinatory, but great for figuring out what you actually want.

Work Mode is where the magic happens. Once you figure out what works in Talk Mode, you say "learn it." And Alfred doesn't just save it to memory: it creates deterministic code that will execute the exact same way, every single time.

Here's an example workflow:

"Hey Alfred, every Monday pull revenue from Stripe, support tickets from Zendesk, project updates from Linear, and Slack mentions. Send me a summary."

Alfred understands the intent, suggests 6 steps (not 45-node spaghetti), and when I say "learn it," it locks it down. Deterministic orchestration. Agentic execution within each step. In the Operator Bootcamp I call this a "Scaffolded Agent".
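To make "deterministic orchestration, agentic execution" concrete, here's a minimal sketch of what a learned skill could compile down to. This is my illustration, not Alfred's actual code; callAgent and sendSummary are stand-ins:

```typescript
// Sketch of a "learned" skill: step order and validation live in plain
// TypeScript, while each step's body stays agentic. All names here are
// illustrative stand-ins, not Alfred's real internals.

type StepResult = { name: string; output: string };

// Stand-in for an LLM call scoped to a single step.
async function callAgent(prompt: string): Promise<string> {
  return `agent output for: ${prompt}`;
}

// Stand-in for the final delivery step.
async function sendSummary(results: StepResult[]): Promise<void> {
  console.log(results.map((r) => `${r.name}: ${r.output}`).join("\n"));
}

const steps = [
  { name: "stripe_revenue", prompt: "Pull this week's revenue from Stripe and summarize it" },
  { name: "zendesk_tickets", prompt: "Summarize open Zendesk support tickets" },
  { name: "linear_updates", prompt: "Summarize recent Linear project updates" },
  { name: "slack_mentions", prompt: "Collect my unread Slack mentions" },
];

export async function mondaySummary(): Promise<void> {
  const results: StepResult[] = [];
  for (const step of steps) {
    // Agentic inside the step...
    const output = await callAgent(step.prompt);
    // ...but the orchestration enforces order and validates every output.
    if (!output.trim()) throw new Error(`Step ${step.name} produced nothing`);
    results.push({ name: step.name, output });
  }
  await sendSummary(results); // deterministic final step, same way every time
}
```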

How It Actually Works

When you sign up for Alfred:

  1. You get an MCP server URL and API key
  2. A server gets provisioned for you automatically (I'm using Contabo while waiting for Hetzner verification, which, by the way, has more requirements than opening a bank account)
  3. LibreChat and NocoDB get installed
  4. You add Alfred as a connector to Claude or ChatGPT
  5. Claude becomes Alfred for that session

The infrastructure is handled. The apps are open source. Your data is yours. And skills are deterministic.

What's Next

Pricing is still TBD. Probably around $40/month for the solo tier (you get a server with full root access, all the apps, and Alfred as your agent). The whole deployment system will stay open source if you just want to self-host without the agent.

For existing customers who paid early: you'll get the choice between Alfred Cloud access or a white-glove onboarding session. We'll figure it out.

If you want to check progress: Here's my board for Alfred.


Dovetail: Making Me a Better Engineer (Without Being One)

The Problem

Here's my evolution as a "developer":

  • 2014: IFTTT (archiving iPhone photos to Dropbox, feeling like a wizard)
  • 2023: Make.com (got expensive fast)
  • 2024: n8n + Jet Admin + Airtable + Supabase (stack expanding, complexity exploding)
  • 2025: Lovable + Supabase (felt powerful until it got brittle)
  • Now: Claude Code (actually shipping)

The problem with Claude Code isn't capability, it's discipline.

I'd start a session, get excited, let Claude Code run wild, and two hours later realize it had replaced working features with placeholders. Or deleted database columns. Or pushed broken code directly to production.

I needed guardrails. Real ones.

The Journey

First I tried Claude plugins and commands. Kind of worked. Sometimes.

Then I tried sub-agents. Worked or didn't, randomly.

Then I tried various CLAUDE.md system prompts. Same story.

Every existing solution had the same problem: the agent was doing the orchestration. And agents are unreliable orchestrators.

The Solution: Deterministic Hooks

Dovetail is a CLI tool that wraps Claude Code with enforced workflows.

It's open source; you can find it here: https://www.npmjs.com/package/@lumberjack-so/dovetail

💡
Dovetail is a WIP. Currently the hooks don't seem to work, but the scaffolding and onboarding do.

You install it via npm, run dovetail init, and it:

  1. Creates a GitHub repo
  2. Creates a Linear project (with starter issues)
  3. Creates a Supabase project
  4. Creates a Fly.io app (staging + production)
  5. Scaffolds your codebase
  6. Installs hooks into Claude Code

Those hooks are the key. Here's what they do:

Session Start Hook: Gets context on the project → what branch you're on, recent commits, Linear issues, latest updates. Claude starts every session informed.

Pre-Prompt Hook: Before you send a prompt, checks if there's an active issue. No issue? Blocks the task and launches a sync agent to create one and branch properly.

Pre-Tool Hook: Before Claude writes to any file, verifies you're on the right branch and working on a real issue. No cowboy commits to main.

Post-Tool Hook: After completing work, creates a pull request, updates Linear, and asks if the issue is done.

The result? Claude Code literally cannot fuck up your production code. Every change is documented, branched, and reviewed. And because everything is externalized to Git and Linear, the agent always has good context—even with no memory.
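For context, Claude Code hooks are shell commands registered for lifecycle events (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse); each receives a JSON payload on stdin, and exit code 2 blocks the action and feeds stderr back to Claude. Here's a minimal pre-tool guard in that spirit. To be clear, this is my sketch, not Dovetail's actual source:

```typescript
#!/usr/bin/env node
// Minimal pre-tool guard in the spirit of Dovetail's hooks (not its actual
// source). Claude Code pipes a JSON payload describing the pending tool call
// on stdin; exit code 2 blocks it and returns stderr to Claude as the reason.
import { execSync } from "node:child_process";

let raw = "";
process.stdin.on("data", (chunk) => (raw += chunk));
process.stdin.on("end", () => {
  const payload = JSON.parse(raw); // includes tool_name and tool_input
  const branch = execSync("git rev-parse --abbrev-ref HEAD").toString().trim();

  // No cowboy commits to main: refuse any file write from the main branch.
  if (branch === "main" || branch === "master") {
    process.stderr.write(
      `Blocked ${payload.tool_name}: create an issue branch before editing files.`
    );
    process.exit(2);
  }
  process.exit(0); // otherwise, let the tool call through
});
```

A script like this gets registered under the PreToolUse event in .claude/settings.json, with a matcher limiting it to file-writing tools.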

The Real Power

Yesterday I told Claude: "I want to build this feature. Tell me the steps."

It said: "Six milestones, six steps each."

I said: "Fine. Launch six agents for milestone one. Make them write what needs to be written. Launch a seventh agent to orchestrate and test."

Because the hooks deterministically enforce how work happens, it just... worked. Same way every time.

Dovetail isn't stable yet. I'm releasing it anyway. Check the Linear roadmap for progress.

💡
I explicitly teach this methodology and framework in the Operator Bootcamp. You're only 30 days away from building things like this for yourself.

50% off with the code SANTA this week
Click here to join the Bootcamp

If you want to follow progress: Here's my board for Dovetail.


Joe: The Voice Agent That Needs to Interrupt You

The Problem

Joe is a voice agent for a construction company. The requirements are brutal:

  • Less than $0.10/minute cost
  • Less than 500ms latency
  • 30 different tasks
  • 60%+ task coverage
  • 80%+ success rate

BUT: Joe needs to be able to interrupt me mid-conversation.

Normal voice agents are one-directional. You call, you talk, you wait, you hang up. But what if I want to say "Hey Joe, do this task, let me know when you're done" and then keep talking about something else? And when the background task finishes, Joe slides in: "By the way, that's done. Want to hear the results?"

That's a hard problem.

The Journey

Attempt 1: Custom OpenAI Realtime API

Built a voice agent from scratch. Expensive. Couldn't do the interruption thing.

Attempt 2: ElevenLabs Conversational AI

Beautiful. Smooth. Perfect except for one thing: no proactive mode. Can't do the interruption. (FYI: it can now, and a lot better than VAPI.)

Attempt 3: VAPI

Finally. VAPI's API allows it—but it required extra plumbing.

The architecture now: Joe (VAPI) → routes tasks to backend → backend processes → calls back to Joe → Joe interrupts conversation.
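Here's roughly what that plumbing could look like on the backend side. The route path and the injectMessage helper are placeholders I made up; the real version goes through VAPI's call-control API:

```typescript
// Hypothetical backend for the callback loop. The route path and
// injectMessage are placeholders, not VAPI's actual API.
import express from "express";

const app = express();
app.use(express.json());

// Joe hands off a long-running task and immediately keeps talking.
app.post("/tasks", (req, res) => {
  const { callId, task } = req.body as { callId: string; task: string };
  res.json({ status: "accepted" }); // acknowledge without blocking the call

  // Process in the background, then barge back into the live conversation.
  runTask(task).then((result) =>
    injectMessage(callId, `By the way, that's done. Want to hear the results? ${result}`)
  );
});

async function runTask(task: string): Promise<string> {
  return `Results for: ${task}`; // stand-in for the real ERP work
}

// Placeholder for whatever live-call-control endpoint the voice platform exposes.
async function injectMessage(callId: string, text: string): Promise<void> {
  console.log(`[call ${callId}] interrupting with: ${text}`);
}

app.listen(3000);
```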

The SQL Problem

The company has an ERP system. My first idea was simple: translate every user prompt to SQL, run it, return results.

"What's my bonus going to be based on the last two quarters?"

That's not a simple SQL query. That's multiple complex queries, joins, calculations. I was waiting minutes for responses.

The API Problem

The old API was technical—built for database operations, not human intent. So I built a new one. 84 endpoints, designed for operations: "create status update," "notify stakeholders," "generate financial projection."

Then I turned each endpoint into an MCP tool. Now Joe's toolbelt has 48 actions with schemas. No SQL translation needed.
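Wrapping an operations endpoint as an MCP tool is mostly schema declaration. Here's a sketch using the official TypeScript SDK; the ERP URL and payload shape are invented for illustration:

```typescript
// Sketch of exposing one operations endpoint as an MCP tool via the official
// TypeScript SDK. The ERP URL and payload shape are invented for illustration.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "joe-operations", version: "0.1.0" });

// A schema-first tool: the model sees the intent and the typed arguments,
// never the SQL underneath.
server.tool(
  "create_status_update",
  { projectId: z.string(), summary: z.string() },
  async ({ projectId, summary }) => {
    const res = await fetch("https://erp.example.com/status-updates", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ projectId, summary }),
    });
    return { content: [{ type: "text", text: `Status update created (${res.status})` }] };
  }
);

await server.connect(new StdioServerTransport());
```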

The Solution

Current stack:

  • VAPI for voice handling
  • Groq inference running Llama 4 Maverick (17B)
  • Custom operations API (84 endpoints → 48 MCP tools)
  • mem0 for memory caching

Results:

  • Cost: $0.10/minute ✓
  • Latency: 475ms ✓
  • Task coverage: 80% (goes to 95% with Gmail + weather + a few more MCPs)
  • Success rate: 80%+ in evals (probably higher—some logging issues)

The memory layer is clutch. Before Joe tries the expensive MCP calls, it checks: "Have I answered this before?" If yes, instant response. If no, do the work, then cache it.

I built a custom mem0 MCP server for this (the official one wasn't working for me). You can find it here: https://smithery.ai/server/@ssdavidai/mem0-mcp
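The pattern itself is simple. Here's a sketch with a hypothetical memory interface; in Joe's stack, the mem0 server above plays this role:

```typescript
// Cache-first answering, sketched with a hypothetical memory interface.
// In Joe's stack the mem0 MCP server plays this role.
interface MemoryStore {
  search(query: string, userId: string): Promise<string | null>;
  add(query: string, answer: string, userId: string): Promise<void>;
}

async function answerQuery(memory: MemoryStore, userId: string, query: string): Promise<string> {
  // 1. Have I answered this before? A hit skips the expensive MCP calls.
  const cached = await memory.search(query, userId);
  if (cached) return cached;

  // 2. Miss: do the real work through the operations tools...
  const fresh = await runMcpTools(query);

  // 3. ...and cache it so the next ask is instant.
  await memory.add(query, fresh, userId);
  return fresh;
}

async function runMcpTools(query: string): Promise<string> {
  return `Fresh answer for: ${query}`; // stand-in for the real tool calls
}
```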

What's Next

This just happened last week. Just before handover I found another issue with VAPI. After some testing I moved back to ElevenLabs Conversational AI agents, which now support "async execution" for MCP and solved the entire problem.


Jig: The Cognitive Assembly Line

The Problem

I keep saying it: n8n workflows are deterministic but require manual plumbing. Agents are flexible but hallucinate.

What if I want both?

With n8n, a 5-step process becomes a 50-node workflow. JSON expressions, aggregates, merges, custom JavaScript. If anything breaks, everything breaks. And debugging at 2am because of an extra comma in a JSON is not my idea of a good time.

With agents, I have zero control. It's a black box. Works or doesn't.

The Insight: Henry Ford Had It Right

When Henry Ford invented the assembly line, he discovered something: his workers were pedestrians 40% of the time. Walking around the factory instead of working.

His solution: don't make people go to the work. Make work come to the people.

In knowledge work, we don't have pedestrians. We have context switching. Different problems require different context, and switching between them is expensive.

The cognitive assembly line flips it: context comes to you, not the other way around. The task structure stays the same, but the context is managed automatically.

The Solution: The ICAO Framework

Every operation in a business is a CRUD operation. You're always creating, reading, updating, or deleting some value in some database (even if that database is in your head).

The ICAO Framework describes every action with four components:

  • Intent: What you want to achieve and why
  • Context: What you need to know to do it
  • Action: The specific thing you do
  • Output: How you know it's done

This is fractal. A task can be described with ICAO. A workflow of tasks can be described with ICAO. An entire business unit can be described with ICAO.
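In code terms, here's one way to read that fractal property. This is my shorthand, not Jig's actual schema:

```typescript
// One reading of ICAO as types. My shorthand, not Jig's actual schema.
interface ICAO {
  intent: string;    // what you want to achieve and why
  context: string[]; // what you need to know to do it
  action: string;    // the specific thing you do
  output: string;    // how you know it's done
}

// Fractal: a workflow is itself an ICAO unit whose steps are ICAO units.
interface Workflow extends ICAO {
  steps: (ICAO | Workflow)[];
}
```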

The Implementation: Jig DSL

Jig uses a YAML-based domain-specific language to describe workflows. Every ICAO component maps to database entries. Orchestration happens through the database, not through code.

Why does this matter? Durability.

If step 3 fails, the database knows. Input, output, error—all logged. Instead of rerunning everything, you just fix step 3 and continue.
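Here's the durability idea in miniature. The field names are invented for illustration, not Jig's schema:

```typescript
// Durability in miniature: because every step's status, input, and output are
// persisted, a rerun skips finished steps and resumes at the failure.
// Field names are invented for illustration, not Jig's actual schema.
type StepRow = {
  name: string;
  status: "pending" | "done" | "failed";
  output?: string;
  error?: string;
};

async function resume(steps: StepRow[], run: (s: StepRow) => Promise<string>): Promise<void> {
  for (const step of steps) {
    if (step.status === "done") continue; // already paid for: don't redo it

    try {
      step.output = await run(step); // persisted to the database in the real system
      step.status = "done";
    } catch (err) {
      step.status = "failed";
      step.error = String(err);
      throw err; // fix this step, call resume() again; everything before it is kept
    }
  }
}
```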

Intelligence-Agnostic Execution

In Jig it doesn't matter who executes a step.

Human? Works.
Claude agent? Works.
GPT agent? Works.
Manager? Engineer? Doesn't matter.

The context management layer handles compression and expansion automatically. The executor just gets what they need, nothing more.

How It Works

A Jig workflow runs like this:

  1. Spin up a Claude agent with specific tools and context
  2. Execute the task, generate output
  3. Fork the agent (it keeps its memories)
  4. Give the fork a new task, new tools, new access
  5. Repeat until final output

Each step is deterministic at the macro level, agentic at the micro level.
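Sketched as a loop, it's something like this. The Agent interface here is invented for illustration; it is not Jig's or Anthropic's actual API:

```typescript
// Conceptual sketch of the fork-and-continue loop. The Agent interface is
// invented for illustration, not Jig's or Anthropic's actual API.
interface Agent {
  run(task: string): Promise<string>;
  fork(tools: string[]): Agent; // new tools and access, same memories
}

async function runAssemblyLine(
  root: Agent,
  stages: Array<{ task: string; tools: string[] }>
): Promise<string> {
  let agent = root;
  let output = "";
  for (const stage of stages) {
    agent = agent.fork(stage.tools);      // deterministic handoff at the macro level
    output = await agent.run(stage.task); // agentic execution at the micro level
  }
  return output;
}
```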

I tested this on a real use case: an operations manager sent me a 5-minute Loom video explaining how they do transaction reconciliation at their e-commerce company. I fed the transcript into Jig's architect (which converts raw descriptions into Jig YAML), and on the first attempt got 95%+ accuracy.

Not a new agent. Not custom code. Just describing the workflow and letting Jig handle the orchestration.

It's a system of externalization, automated.

Current Status

Jig has a full database with execution logging, cost tracking, token usage, and context ledgers. There's a Jig MCP server so you can run workflows from Claude or ChatGPT.


Bonus: No Code No Clue

Because apparently I don't have enough projects, I'm also turning n8n workflows into tutorials and radio plays.

Yes, radio plays.

You feed it an n8n workflow JSON, and it:

  1. Analyzes what the workflow does
  2. Generates context about when it's useful (and when it's overkill)
  3. Creates a sitcom-style script with persistent actors
  4. Generates the full audio using ElevenLabs V3
  5. Writes an SEO-optimized tutorial
  6. Posts it to Ghost as a blog article

The whole thing runs from the terminal. Fully automated.

I built it in less than an hour using Dovetail-powered Claude Code. The original n8n workflow I was building to do the same thing was abandoned after hours of debugging.


Follow along

Everything I'm building is trying to solve the same fundamental problem: how do you get AI to be reliable without losing its flexibility?

Alfred does it with Talk Mode vs Work Mode.
Dovetail does it with enforced Git/Linear workflows.
Joe does it with structured API endpoints and memory caching.
Jig does it with database-orchestrated, intelligence-agnostic execution.

Same problem, different angles.


I'll be posting weekly updates here on the Lumberjack. Shorter. More frequent. Less theory, more "here's what I built this week."

If you have questions, I'm always an email away.