AI Agent Architecture

AI Vibe Coding Needs System Design, Not More Prompts

AI has made building applications feel easier. Ras Mic's useful warning is that most people are not really shipping applications yet. They are shipping prototypes that look convincing until real users, real data, real cost, and real failure modes show up.

That is why this video matters. It is not another "watch me vibe code" demo. It is a system-design walkthrough: clients, server, database, durable workflows, queues, service boundaries, auth, payments, observability, sandboxes, and the tradeoffs behind each choice.

JQ AI SYSTEMS take: AI can generate the code, but the builder still has to decide where the boundaries are. If you cannot describe the client, server, database, workflow, services, logging, auth, payments, and failure paths, you are not ready to trust the app with real users.

Video credit: Ras Mic / Rasmic. This post summarizes the video and adds the JQ AI SYSTEMS architecture checklist.


Source note

Credit to Ras Mic, also listed publicly as Michael Shimeles, who describes himself as a full-stack engineer running a software studio and AI consultancy. The article is based on the supplied transcript, Ras Mic's public links, and official docs for the major tools discussed in the video.

The practical focus here is not "copy this exact stack." Ras Mic is sharing how he thinks about his app Pluto. The useful part for JQ AI SYSTEMS readers is the decision model: pick abstractions that are production-capable, agent-readable, observable, and easy enough to debug when AI-generated code gets messy.


| Layer | Link | Why it matters |
|---|---|---|
| Original video | Ras Mic system design overview | The source walkthrough for Pluto's architecture and the production-app lesson. |
| Creator credit | Ras Mic link hub, official site | Credit and context for Ras Mic / Michael Shimeles. |
| Durable workflows | Convex Workflow | Long-running flows with retries, delays, state persistence, and continuation across interruptions. |
| Queues | Convex Workpool | Prioritized queues, concurrency limits, retries, and completion handling for async work. |
| Agent backend | Convex AI Agent component | Message history, vector search, and long-running agent workflows tied to a reactive database. |
| Component model | Convex Components docs | Reusable backend packages with isolated tables and app-facing APIs. |
| Monorepo deployment | Vercel monorepos, Turborepo | One repository can hold web, mobile, desktop, backend, shared packages, and services. |
| Web app layer | SvelteKit | Ras Mic's chosen web framework for the client surface in the video. |
| Error handling | Effect | Typed error handling, retries, concurrency, logging, tracing, and robust TypeScript patterns. |
| Agent compute | Daytona | Sandbox infrastructure for AI-generated code and agent workflows. |
| Auth and enterprise readiness | WorkOS | Useful when B2B auth, organizations, SSO, and enterprise access controls matter. |
| Observability | Sentry, PostHog | Errors, product analytics, and visibility into what users and systems are doing. |
| Model routing | OpenRouter | One way to route inference across models while keeping provider choice flexible. |

The Main Lesson

The video opens with a blunt distinction: building prototypes with AI is easy; shipping production apps is still hard. A prototype can survive on vibes. A real app needs boundaries.

Ras Mic frames system design as planning the architecture, components, modules, interfaces, and data flow so the system meets real requirements. The AI-era version is less about drawing AWS diagrams for sport and more about deciding what should be abstracted, what should be custom, what should persist, what should retry, and what should be reviewed by a human.

The biggest shift is that the stack is no longer chosen only for the human developer experience. It is also chosen for the agent experience. If the backend is code, the database schema is code, integrations are installed as components, and the monorepo gives the agent full context, then AI can make safer, more coherent changes.

Prototype thinking says:

  • Can the app demo the idea?
  • Can I make the screen look good?
  • Can I get one happy-path flow working?
  • Can I show it on X today?

System thinking says:

  • What happens when ten people use it at once?
  • What keeps running when the tab closes?
  • Where does secret or paid work happen?
  • What retries safely and what must never retry?
  • Where do I see errors, usage, cost, and abuse?
  • Which parts can be changed by an agent without breaking everything?

System Design Is Tradeoffs

One of the healthiest points in the video is that engineering is tradeoffs. There is no perfect stack. There is only a stack that fits your current constraints better than the alternatives.

Ras Mic names four constraints that matter for production apps:

| Constraint | Question to ask | AI-era failure mode |
|---|---|---|
| Scalability | If usage spikes, can I add capacity without redesigning the app? | The app works for the demo, then falls over when ten real users hit the same path. |
| Reliability | Which provider or service is allowed to be a single point of failure? | An agent depends on a fragile API call and silently stops halfway through work. |
| Performance | What has to feel instant, and what can run in the background? | A long model call blocks the UI because the workflow was not separated from the client. |
| Cost | What cost grows with every user, every job, or every generated token? | The builder optimizes for launch speed and discovers unit economics too late. |

This is the part many vibe-coded apps skip. AI makes the first version cheap. It does not make the operating model free.
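One way to make the cost constraint concrete is to estimate the marginal cost of one user per month before launch. The sketch below is illustrative only: every price, token count, and job volume is a made-up placeholder, not a figure from the video or from any provider's price list, and the type and function names are hypothetical.

```typescript
// Hypothetical inputs: replace every number with your own measurements.
interface UsageAssumptions {
  jobsPerUserPerMonth: number;
  inputTokensPerJob: number;
  outputTokensPerJob: number;
  usdPerMillionInputTokens: number;
  usdPerMillionOutputTokens: number;
  sandboxUsdPerJob: number; // e.g. compute time for an agent sandbox
}

// Marginal cost of serving one user for one month, in USD.
function monthlyCostPerUser(u: UsageAssumptions): number {
  const inputCost = (u.inputTokensPerJob / 1_000_000) * u.usdPerMillionInputTokens;
  const outputCost = (u.outputTokensPerJob / 1_000_000) * u.usdPerMillionOutputTokens;
  return u.jobsPerUserPerMonth * (inputCost + outputCost + u.sandboxUsdPerJob);
}

// Gross margin as a fraction of the plan price.
function grossMargin(priceUsd: number, costUsd: number): number {
  return (priceUsd - costUsd) / priceUsd;
}

const placeholderUsage: UsageAssumptions = {
  jobsPerUserPerMonth: 100,
  inputTokensPerJob: 20_000,
  outputTokensPerJob: 2_000,
  usdPerMillionInputTokens: 3,
  usdPerMillionOutputTokens: 15,
  sandboxUsdPerJob: 0.01,
};

const placeholderCost = monthlyCostPerUser(placeholderUsage);
console.log(placeholderCost.toFixed(2), grossMargin(20, placeholderCost).toFixed(2));
```

With these placeholder numbers the marginal cost works out to roughly $10 per user per month, so a $20 plan would keep about half of its revenue before any fixed costs. The point is not the specific figures but that the calculation exists before real users do.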


The AI App Architecture Map

Ras Mic breaks Pluto into a few layers. The exact tools are his choices, but the map is broadly useful.

| Layer | Example from the video | JQ AI SYSTEMS interpretation |
|---|---|---|
| Client surfaces | Web app, desktop app, mobile app, admin site | Name every place users or operators interact with the system. |
| Monorepo | Turborepo-style structure | Keep related surfaces close enough that the agent can inspect shared contracts and components. |
| Control plane | Convex as backend, database, and source of truth | Use one reliable place for state, realtime updates, and application decisions. |
| Durable workflows | Convex Workflow component | Run long work outside the tab, with retry and state persistence. |
| Queues and parallel work | Workpool-style async operations | Separate urgent work from background work and cap concurrency. |
| Special services | iMessage service, inference/payments service | Split out code that has its own lifecycle, risk, team owner, or data model. |
| Auth | WorkOS | B2B apps need organization-aware auth, not just "sign in with Google." |
| Payments and credits | Autumn/Stripe-style billing logic | Payments should be explicit, logged, and reviewed before autonomy increases. |
| Agent sandbox | Daytona | If agents run code or browse, give them an isolated work environment. |
| Observability | Sentry and PostHog | If you cannot see errors and user behavior, you cannot operate the system. |

The common thread is not "use Convex for everything." It is: choose tools that reduce accidental complexity without hiding the parts you need to reason about.
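To illustrate what "separate urgent work from background work and cap concurrency" means in practice, here is a minimal in-memory sketch. It is not the Convex Workpool API; `TinyWorkpool`, `enqueue`, and the priority convention are hypothetical names chosen for this example, and a real queue would persist its jobs rather than hold them in memory.

```typescript
// A toy in-memory work pool: NOT the Convex Workpool API, just the idea behind it.
type Job<T> = () => Promise<T>;

class TinyWorkpool {
  private running = 0;
  private queue: Array<{ priority: number; start: () => void }> = [];
  private maxParallel: number;

  constructor(maxParallel: number) {
    this.maxParallel = maxParallel;
  }

  // Lower priority number runs first; jobs wait while the pool is full.
  enqueue<T>(job: Job<T>, priority = 10): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        this.running++;
        Promise.resolve()
          .then(job)
          .then(resolve, reject)
          .then(() => {
            // Free the slot whether the job succeeded or failed.
            this.running--;
            this.drain();
          });
      };
      this.queue.push({ priority, start });
      this.queue.sort((a, b) => a.priority - b.priority);
      this.drain();
    });
  }

  // Start waiting jobs until the concurrency cap is reached.
  private drain(): void {
    while (this.running < this.maxParallel && this.queue.length > 0) {
      const next = this.queue.shift();
      if (next) next.start();
    }
  }
}

// Usage: at most two jobs run at once; the urgent job jumps the waiting line.
const demoPool = new TinyWorkpool(2);
const slowJob = (label: string) => async () => {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return label;
};
void Promise.all([
  demoPool.enqueue(slowJob("report A")),
  demoPool.enqueue(slowJob("report B")),
  demoPool.enqueue(slowJob("report C")),
  demoPool.enqueue(slowJob("urgent reply"), 1),
]).then((labels) => console.log(labels));
```

The cap protects rate-limited providers and your own bill, and the priority field keeps a user-facing reply from waiting behind a batch of background reports.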


Durable Workflows

The Convex Workflow page describes workflows as long-running code flows with retries, delays, and state persistence across interruptions. That maps directly to AI-agent apps.

If a user asks an agent to research, create, browse, call tools, write files, or generate a report, the task should not die because the user closes a tab. The UI should be a window into work, not the thing holding the work alive.

A durable workflow gives you:

  • Continuation: the work can resume from a known point.
  • Retries: transient failures can be retried deliberately.
  • State: each step can record progress and output.
  • Cancellation: users or operators can stop work cleanly.
  • Visibility: the UI can show progress without blocking.

The caution: retries are not magic. A model call can usually be retried safely, at the cost of extra tokens. A payment capture, email send, or other external side effect needs idempotency keys, logs, and often human approval.
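The properties above (continuation, retries, persisted state) can be sketched in a few lines. This is a toy runner under stated assumptions, not Convex Workflow: the `store` map stands in for a database, and `Step`, `runWorkflow`, and `maxAttempts` are hypothetical names used only for illustration.

```typescript
// Toy durable-step runner. The "store" stands in for a database; it is NOT Convex Workflow.
type StepStatus = "done" | "failed";

interface StepRecord {
  status: StepStatus;
  output?: string;
  attempts: number;
}

interface Step {
  name: string;
  run: () => Promise<string>;
  maxAttempts: number; // use 1 for side effects that must never auto-retry
}

// Hypothetical persistence layer: workflowId -> stepName -> record.
const store = new Map<string, Map<string, StepRecord>>();

// Returns true when every step is done, false when the workflow stopped on a failure.
async function runWorkflow(workflowId: string, steps: Step[]): Promise<boolean> {
  const records = store.get(workflowId) ?? new Map<string, StepRecord>();
  store.set(workflowId, records);

  for (const step of steps) {
    // Continuation: skip anything already recorded as done.
    if (records.get(step.name)?.status === "done") continue;

    let attempts = 0;
    let finished = false;
    while (attempts < step.maxAttempts && !finished) {
      attempts++;
      try {
        const output = await step.run();
        records.set(step.name, { status: "done", output, attempts });
        finished = true;
      } catch {
        // Record the failure so the UI and operators can see where work stopped.
        records.set(step.name, { status: "failed", attempts });
      }
    }
    if (!finished) return false; // stop here; a later, deliberate run can resume
  }
  return true;
}
```

A research step might get `maxAttempts: 3`, while a payment step gets `maxAttempts: 1`, so it is only attempted again when someone explicitly calls `runWorkflow` a second time. Because completed steps are skipped on that second call, the expensive research is not repeated.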


Service Boundaries

Ras Mic separates iMessage and inference/payments into their own services. The deeper lesson is service boundary discipline.

A feature might deserve its own service when:

  • it has a narrow responsibility;
  • it touches a risky external system;
  • it has a separate data model;
  • it may need a dedicated owner later;
  • it should fail without taking down the whole product;
  • it needs stronger error handling or observability than normal app code.

This is where AI builders need restraint. Do not split everything into microservices because it sounds mature. But do not dump payments, model routing, message bridges, and long-running jobs into one giant backend file either.

The healthy pattern is boring: start together, split when the boundary is real, and document the contract so agents can work across it safely.
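"Document the contract" can be as simple as a typed interface that both humans and agents read before touching the boundary. The sketch below is hypothetical: the inference service shape, the error kinds, and names like `InferenceService` and `describeResult` are illustrative assumptions, not taken from the video or from any library.

```typescript
// Hypothetical contract for an inference service; names are illustrative only.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type InferenceError =
  | { kind: "rate_limited"; retryAfterMs: number }
  | { kind: "insufficient_credits" }
  | { kind: "provider_down"; provider: string };

interface InferenceRequest {
  userId: string;
  prompt: string;
  maxCredits: number;
}

interface InferenceResponse {
  text: string;
  creditsSpent: number;
}

// The main backend depends only on this interface, never on the implementation.
interface InferenceService {
  complete(req: InferenceRequest): Promise<Result<InferenceResponse, InferenceError>>;
}

// In-process stub: start together, swap in a network client once the boundary is real.
class LocalInferenceService implements InferenceService {
  async complete(req: InferenceRequest): Promise<Result<InferenceResponse, InferenceError>> {
    if (req.maxCredits < 1) {
      return { ok: false, error: { kind: "insufficient_credits" } };
    }
    return { ok: true, value: { text: `echo: ${req.prompt}`, creditsSpent: 1 } };
  }
}

// Callers handle every failure kind explicitly; the compiler flags a missing case.
function describeResult(result: Result<InferenceResponse, InferenceError>): string {
  if (result.ok) return `ok (${result.value.creditsSpent} credit)`;
  switch (result.error.kind) {
    case "rate_limited":
      return `retry in ${result.error.retryAfterMs} ms`;
    case "insufficient_credits":
      return "ask the user to top up";
    case "provider_down":
      return `fail over from ${result.error.provider}`;
  }
}
```

Listing failures as data rather than thrown exceptions is a design choice, similar in spirit to the typed error handling the article attributes to Effect. It lets the service move out of process later without changing any caller.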


Observability and Payments

Once money, credits, organizations, or external messages enter the system, the app is no longer a toy. Ras Mic names tools like Sentry and PostHog because production systems need evidence.

In the stack Ras Mic shared, Sentry is the obvious place to start for error monitoring and exception visibility, while PostHog is the product-analytics layer for understanding user behavior, funnels, feature usage, and where people get stuck.

For an AI app, observability should answer:

  • Which user triggered this job?
  • Which model or provider ran?
  • How many tokens or credits did it spend?
  • Which external tools did it call?
  • Which step failed?
  • Did the failure retry, pause, cancel, or continue?
  • Was a human approval required before the external action?
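One way to make sure those questions are answerable is to give every agent job a single structured event whose fields map to them one by one. The shape below is a hypothetical sketch: the field names are assumptions for illustration, not a Sentry or PostHog schema.

```typescript
// Hypothetical event shape; field names are illustrative, not a Sentry or PostHog schema.
type Outcome = "succeeded" | "retried" | "paused" | "cancelled" | "failed";

interface AgentJobEvent {
  jobId: string;
  userId: string; // which user triggered this job?
  provider: string; // which model or provider ran?
  model: string;
  tokensUsed: number; // how many tokens or credits did it spend?
  creditsSpent: number;
  toolsCalled: string[]; // which external tools did it call?
  failedStep: string | null; // which step failed?
  outcome: Outcome; // did it retry, pause, cancel, or continue?
  humanApproved: boolean | null; // null when no approval was required
  at: string; // ISO 8601 timestamp
}

// Serialize one event per line so most log pipelines can ingest it.
function toLogLine(event: AgentJobEvent): string {
  return JSON.stringify(event);
}

// Guard worth running in tests: a failure must always name its step.
function validateEvent(event: AgentJobEvent): string[] {
  const problems: string[] = [];
  if (event.outcome === "failed" && event.failedStep === null) {
    problems.push("failed outcome without failedStep");
  }
  if (event.tokensUsed < 0 || event.creditsSpent < 0) {
    problems.push("negative usage");
  }
  return problems;
}
```

Whichever tools receive the event, the useful discipline is that a job cannot finish without filling in every field.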

Payment systems need even more care. If credits map to agent actions, model calls, or card charges, then your ledger matters. Every balance change should have a reason, timestamp, actor, and idempotency key.
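A minimal sketch of such a ledger, assuming an append-only list of entries and an idempotency key per logical operation. `CreditLedger` and its fields are hypothetical names; a real ledger would live in the database inside a transaction, not in memory.

```typescript
// Toy append-only credit ledger kept in memory; a real one belongs in the database.
interface LedgerEntry {
  idempotencyKey: string; // same key = same logical operation, applied once
  userId: string;
  delta: number; // positive = top-up, negative = spend
  reason: string;
  actor: string; // "user", "system", or an operator id
  at: string; // ISO 8601 timestamp
}

class CreditLedger {
  private entries: LedgerEntry[] = [];
  private seen = new Set<string>();

  // Returns true if applied, false if this key was already recorded (a safe retry).
  apply(entry: LedgerEntry): boolean {
    if (this.seen.has(entry.idempotencyKey)) return false;
    if (this.balance(entry.userId) + entry.delta < 0) {
      throw new Error(`insufficient credits for ${entry.userId}`);
    }
    this.seen.add(entry.idempotencyKey);
    this.entries.push(entry);
    return true;
  }

  // The balance is derived from the entries, never stored and edited in place.
  balance(userId: string): number {
    return this.entries
      .filter((e) => e.userId === userId)
      .reduce((sum, e) => sum + e.delta, 0);
  }

  // Full audit trail for support and refunds.
  history(userId: string): readonly LedgerEntry[] {
    return this.entries.filter((e) => e.userId === userId);
  }
}
```

If a workflow retries the step that spends credits, it reuses the same key and the second `apply` call becomes a no-op. That is what keeps a retry from turning into a double charge.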

Rule of thumb: Never let a coding agent "just wire up billing" without a human review path. Billing is not only UI. It is contracts, states, retries, refunds, taxes, abuse, and support.

Agent-Friendly Infrastructure

A subtle point in the video is that modern infrastructure choices are also agent choices. If everything is hidden behind dashboards and manual setup, the AI agent cannot reason about the system. If the backend, schema, workflows, and components are represented as code, the agent can inspect and modify them.

Agent-friendly infrastructure tends to have:

  • clear file structure;
  • one repo or a well-documented repo map;
  • typed APIs and generated types;
  • repeatable local commands;
  • tests or smoke checks the agent can run;
  • docs links inside the repo;
  • known service boundaries;
  • logs that explain failures in plain language.

That is why a monorepo can be powerful in an AI-built product. The benefit is not only human convenience. It gives the agent one coherent view of the whole product to draw context from, instead of fragments scattered across repositories.

The risk is also real. One large repo means the agent can touch more surface area. That is why the monorepo should come with task scopes, review queues, tests, and approval rules.
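A task scope can be as small as a list of path prefixes checked before an agent's change set reaches review. The sketch below is a hypothetical illustration: the directory names and the `TaskScope` shape are assumptions, and it expects normalized repo-relative paths with no `..` segments.

```typescript
// Hypothetical task scope for a coding agent; paths and names are placeholders.
interface TaskScope {
  task: string;
  allowedPrefixes: string[]; // directories the agent may edit
  approvalPrefixes: string[]; // edits here need a human reviewer
}

type Verdict = "allowed" | "needs_approval" | "blocked";

// Assumes `path` is a normalized, repo-relative path.
function checkPath(scope: TaskScope, path: string): Verdict {
  const under = (prefix: string) => path === prefix || path.startsWith(prefix + "/");
  // Approval rules win over allow rules, so nested sensitive folders stay protected.
  if (scope.approvalPrefixes.some(under)) return "needs_approval";
  if (scope.allowedPrefixes.some(under)) return "allowed";
  return "blocked";
}

// Summarize a proposed change set before it reaches the review queue.
function reviewChangeSet(scope: TaskScope, paths: string[]): Record<Verdict, string[]> {
  const out: Record<Verdict, string[]> = { allowed: [], needs_approval: [], blocked: [] };
  for (const p of paths) out[checkPath(scope, p)].push(p);
  return out;
}

const settingsScope: TaskScope = {
  task: "Restyle the settings page",
  allowedPrefixes: ["apps/web/src/routes/settings", "packages/ui"],
  approvalPrefixes: ["packages/billing", "apps/web/src/routes/settings/billing"],
};

console.log(
  reviewChangeSet(settingsScope, [
    "packages/ui/Button.svelte",
    "packages/billing/ledger.ts",
    "apps/mobile/App.tsx",
  ]),
);
```

Anything in the blocked bucket is a signal that the agent wandered outside the task, which is usually worth a look even when the change itself seems harmless.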


Prompt Pack

Use these before asking an AI agent to build a production feature.

1. Architecture Map Prompt

Act as a senior system designer.

I am building: [describe the app].

Create a production architecture map with:
- client surfaces
- server/backend layer
- database and source of truth
- long-running workflows
- queues or background jobs
- external APIs
- auth and organizations
- billing or credits
- observability
- admin/operator tools
- failure paths

For each layer, tell me:
1. What belongs there.
2. What must never happen there.
3. What can fail.
4. What logs or tests I need before launch.

2. Durable Workflow Audit

Review this feature as a durable workflow.

Feature: [describe the user task].

Identify:
- steps that can run synchronously
- steps that must run in the background
- steps that need retries
- steps that must not retry automatically
- state that must be persisted after each step
- cancellation rules
- progress updates the UI should show
- human approvals required before external actions

3. Service Boundary Audit

Look at this planned feature and decide whether it should live inside the main backend or become a separate service.

Score it across:
- risk
- data model independence
- external API dependence
- team ownership
- failure isolation
- cost sensitivity
- compliance or security needs
- expected growth

Give me a recommendation and the smallest boundary that would keep the system simple.

4. Agent-Friendly Repo Prompt

Prepare this codebase so an AI coding agent can safely work on it.

Create or update:
- repo map
- local setup instructions
- test commands
- architecture notes
- service boundaries
- environment variable examples without secrets
- common failure modes
- review checklist

Do not change product behavior yet. Only improve context and safety for future agent work.

5. Launch Risk Review

Before I launch this AI-built app, review the system for production risks.

Check:
- auth
- database access boundaries
- payments and credits
- retries and idempotency
- long-running jobs
- queue limits
- observability
- cost runaway risks
- abuse risks
- admin recovery paths

Return a table with severity, risk, evidence to collect, and the smallest fix.

Builder Checklist

Before you ask an AI agent to "build the app," answer these:

  • Users: Who uses the app, and through which surfaces?
  • State: What is the source of truth?
  • Boundaries: Does the client ever talk directly to sensitive data?
  • Workflows: Which jobs keep running after the tab closes?
  • Queues: What work needs priority, throttling, or concurrency limits?
  • Services: Which parts deserve separate boundaries?
  • Auth: Do you need personal auth, organizations, SSO, roles, or admin access?
  • Billing: Are you charging subscriptions, credits, usage, or a mix?
  • Observability: Where do errors, usage, model calls, and costs show up?
  • Agent access: What is the agent allowed to read, change, run, or deploy?
  • Review: Which actions require human approval?
  • Recovery: What do you do when a workflow gets stuck halfway?

CTA: Before you ask AI to build another feature, ask it to map the system. A better prompt will improve a screen. A better architecture will keep the product alive when real users arrive.


Common questions

What is the main lesson from Ras Mic's system design video?
The main lesson is that AI makes prototypes easier, but production apps still need architecture. Builders need to define client surfaces, server boundaries, database access, durable workflows, queues, service boundaries, observability, auth, payments, and cost tradeoffs before trusting an AI-generated app with real users.

Why does system design matter for AI-built apps?
AI can generate screens and features quickly, but it will not automatically choose safe boundaries, retry behavior, queue limits, billing logic, observability, or failure handling. Those choices determine whether the app survives real traffic and real users.

Why does Ras Mic recommend a monorepo for agent-built products?
In the video, Ras Mic argues that a monorepo gives the coding agent one codebase to inspect across web, mobile, desktop, backend, shared UI, and services. That reduces context fragmentation and makes cross-surface changes easier to reason about.

What is Convex Workflow used for?
Convex Workflow is a component for long-running code flows that need retries, delays, state persistence, and continuation across interruptions. In an AI app, that matters when agent work should continue even if the user closes the tab.

Should every AI app use Ras Mic's exact stack?
No. The useful lesson is the architecture pattern, not a mandate to copy every tool. Use tools your team can debug, monitor, pay for, and hand to an agent safely.