The video below looks like a list of separate agent updates: Codex Sites, Cursor Canvases, DeepSeek V4, Apple Messages, Hermes Desktop, Anthropic, and Microsoft. My read is that these are not separate stories. They are all signs of the same race.
The winning agent platform will not simply be the one with the smartest model. It will be the place where work actually happens: where an agent can read context, use tools, build a small app, store state, share output, run on a schedule, and stay inside review boundaries.
Do not pick an agent platform by benchmark alone. Pick the surface where the agent can read, act, store state, share output, and stay reviewable.
Source note
The supplied video is a fast-moving commentary source. I am treating it that way. The factual spine comes from OpenAI's Codex Sites documentation, Cursor's public Canvas notes, DeepSeek's pricing docs, TechCrunch reporting on Poke and Apple Messages for Business, the official Hermes Desktop docs, Anthropic's research post on AI building AI, and Microsoft's Build 2026 announcement.
That matters because several claims in the video are best understood as predictions or early interpretations. "Apple agent store" is not a confirmed Apple product. "DeepSeek catches Opus" depends on the workload. "Replit killer" and "Lovable killer" are social-media shorthand, not useful technical analysis.
The pattern: agents need a surface
Early agent tools were mostly chat boxes plus tool calls. That was enough to prove the concept, but not enough to run real work.
Real work needs more surfaces:
- A context surface where the agent can understand files, conversations, tickets, docs, emails, and decisions.
- A tool surface where the agent can act through approved plugins, connectors, skills, browsers, terminals, APIs, or local apps.
- An artifact surface where outputs become dashboards, reports, internal tools, canvases, tasks, prototypes, or reviewable files.
- A memory surface where the work can persist beyond one chat.
- A review surface where humans approve changes, deploys, sends, deletes, purchases, and production actions.
That is the agent super-app race. It is not "who has the prettiest chat UI?" It is "who becomes the place where agent work can be done end to end without becoming unsafe or chaotic?"
Codex Sites: hosted internal tools, not just vibe coding
The headline feature in the video is Codex Sites. OpenAI's docs describe Sites as a way for Codex to create, save, deploy, and inspect websites, web apps, and games hosted by OpenAI. The docs also make one point builders should not miss: every Sites deployment URL is a production deployment, so if you want review first, ask Codex to save a version without deploying it.
That is the practical difference between "vibe coding" and a work platform. Sites gives Codex somewhere to put the output. Plugins and skills give Codex ways to operate the output later. Review gates keep the output from becoming a messy live deployment before anyone checks it.
The video example is useful: ask Codex to build a dashboard from email and Slack updates, or a Kanban board backed by a database. The interesting bit is not the board. It is the loop:
- The agent reads relevant business context.
- The agent builds a small internal app.
- The app persists state.
- Another agent thread or skill can update the app later.
- The team can review or use the result.
That is why Codex Sites matters more than "make me a landing page." A landing page is a page. A Codex Site can become a small operational surface.
| Use Codex Sites For | Why It Fits |
|---|---|
| Internal dashboards | The site can live close to Codex threads, connected context, and agent-maintained records. |
| Review boards | Agents can collect items, score them, and ask humans to approve the next action. |
| Agent-operated utilities | Skills can define safe operations like add, update, move, archive, or summarize. |
| Team-specific tools | Access, storage, secrets, and review become part of the deployment habit. |
The safe play: save versions, review changes, confirm access, inspect storage choices, and deploy intentionally.
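To make "skills can define safe operations" concrete, here is a minimal sketch of an allow-list in front of a small board. Everything in it (`Board`, `SAFE_OPERATIONS`, `apply_operation`, the column names) is hypothetical and written for illustration. It is not the Codex Sites or Codex skills API, only one way the idea could look in plain Python.

```python
from dataclasses import dataclass, field

# Hypothetical column names for an illustrative Kanban board.
ALLOWED_COLUMNS = ("todo", "doing", "done")


@dataclass
class Board:
    cards: dict = field(default_factory=dict)  # card_id -> card fields
    log: list = field(default_factory=list)    # append-only record of applied operations


def _add(board, card_id, title):
    if card_id in board.cards:
        raise ValueError(f"card {card_id!r} already exists")
    board.cards[card_id] = {"title": title, "column": "todo", "archived": False}


def _move(board, card_id, column):
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column {column!r}")
    board.cards[card_id]["column"] = column


def _archive(board, card_id):
    board.cards[card_id]["archived"] = True


# Only these operations are exposed to the agent. There is deliberately no "delete".
SAFE_OPERATIONS = {"add": _add, "move": _move, "archive": _archive}


def apply_operation(board, name, **kwargs):
    """Run an allow-listed operation and log it; reject everything else."""
    if name not in SAFE_OPERATIONS:
        raise PermissionError(f"operation {name!r} is not allow-listed")
    SAFE_OPERATIONS[name](board, **kwargs)
    board.log.append((name, kwargs))
```

A later agent thread would call `apply_operation(board, "move", card_id="c1", column="doing")`, while a request such as `apply_operation(board, "delete", card_id="c1")` raises `PermissionError` and leaves the board untouched. The log gives reviewers a record of what was changed.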
Cursor Canvases: agent-created artifacts and team review
Cursor is moving in a parallel direction with Canvases. Cursor's own changelog says agents can create interactive artifacts like dashboards, reports, and internal tools that teams can share. The Canvas blog describes canvases as durable artifacts in the Agents Window beside the terminal, browser, and source control.
That framing is important. Cursor is not only trying to make code editing faster. It is trying to make agent output more legible.
A canvas can be:
- a PR review interface,
- an incident response dashboard,
- an eval analysis surface,
- an architecture diagram,
- a context usage report,
- or a custom mini-tool for a team workflow.
That is not the same as Codex Sites. Codex Sites is closer to "host this app as a real site." Cursor Canvases are closer to "make this agent output interactive, durable, and shareable inside the development workflow."
Codex Sites: best when the agent needs to create a hosted app or internal tool that can store state and be operated later.
Cursor Canvases: best when the agent needs to turn complex work into an interactive artifact for review, debugging, explanation, or team collaboration.
The shared pattern is bigger than the feature names: agents are no longer expected to answer only in text. They are expected to create surfaces.
DeepSeek V4: cheap tokens change the consumer-agent math
The video asks whether DeepSeek V4 is getting close to Opus 4.8. I would phrase the useful question differently: is DeepSeek cheap and capable enough to change where agents run?
DeepSeek's official pricing page lists DeepSeek-V4-Flash and DeepSeek-V4-Pro with 1M context, tool calls, JSON output, and much lower per-token pricing than frontier premium models. As checked during writing, DeepSeek-V4-Pro lists cache-miss input at $0.435 per 1M tokens and output at $0.87 per 1M tokens; Flash is lower. DeepSeek also says prices may vary, so check the live pricing page before building a business model around it.
A quick 1M input plus 1M output comparison
The screenshot-style comparison from the video is directionally right if we use a simple blended example: 1M input tokens plus 1M output tokens, standard API pricing, and DeepSeek V4-Pro cache-miss input pricing. Under that assumption, the math looks like this:
| Model | List price per 1M tokens | Cost for 1M input + 1M output | Multiple of DeepSeek V4-Pro |
|---|---|---|---|
| DeepSeek V4-Pro | $0.435 input (cache miss) + $0.87 output | $1.305 | 1x |
| Claude Opus 4.8 | $5 input + $25 output | $30 | About 23x |
| GPT-5.5 | $5 input + $30 output | $35 | About 26.8x |
That is a useful price-pressure signal, but it is not the whole story. Cached input, long-context multipliers, fast modes, regional processing, batch pricing, retries, tool calls, and failed runs can change the real bill. More importantly, a cheaper model can still be more expensive if it takes more attempts to finish the job.
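If you want to rerun the comparison with your own token mix, the arithmetic is small enough to script. This sketch uses the list prices quoted in the table above as of writing; the function names and the `PRICES` dictionary are my own, and the numbers should be replaced with whatever the live pricing pages say.

```python
# USD per 1M tokens as (input, output), copied from the table above.
# These are point-in-time list prices; verify against live pricing pages.
PRICES = {
    "DeepSeek V4-Pro": (0.435, 0.87),  # cache-miss input price
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def blended_cost(input_price, output_price, input_mtok=1.0, output_mtok=1.0):
    """Cost in USD for the given millions of input and output tokens."""
    return input_price * input_mtok + output_price * output_mtok


def compare(prices, baseline, input_mtok=1.0, output_mtok=1.0):
    """Return {model: (cost, multiple of the baseline model's cost)}."""
    base = blended_cost(*prices[baseline], input_mtok, output_mtok)
    result = {}
    for name, (inp, out) in prices.items():
        cost = blended_cost(inp, out, input_mtok, output_mtok)
        result[name] = (cost, cost / base)
    return result


if __name__ == "__main__":
    for name, (cost, ratio) in compare(PRICES, "DeepSeek V4-Pro").items():
        print(f"{name}: ${cost:.3f} ({ratio:.1f}x)")
```

With the default 1M input plus 1M output mix this reproduces the table. Changing `input_mtok` and `output_mtok` shows how the multiples shift for input-heavy workloads such as summarization, where output pricing matters less.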
That pricing matters because consumer agents are extremely sensitive to cost. A personal assistant that monitors messages, keeps memory, searches, drafts, summarizes, updates small apps, and runs all day can burn a lot of tokens.
But lower token cost is not the same as lower cost per completed task.
- If the model needs more retries, cheap tokens get less cheap.
- If tool calling is less reliable, the workflow may need more supervision.
- If long context degrades on your real data, 1M context is not enough on its own.
- If latency hurts the experience, users will not care that the tokens were cheaper.
- If outputs need more review, the human cost moves somewhere else.
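The points above can be folded into a rough cost-per-completed-task estimate. This is a simplified sketch under stated assumptions: attempts are independent and retried until one succeeds (so expected attempts are `1 / success_rate`), and every attempt gets the same amount of human review. The `WorkflowProfile` fields and all example numbers are hypothetical, not measurements of any model.

```python
from dataclasses import dataclass


@dataclass
class WorkflowProfile:
    token_cost_per_attempt: float      # USD of model spend per attempt
    success_rate: float                # fraction of attempts that finish the task, 0 < p <= 1
    review_minutes_per_attempt: float  # human time spent checking each attempt
    reviewer_rate_per_hour: float      # USD per hour of reviewer time


def cost_per_completed_task(profile: WorkflowProfile) -> float:
    """Expected USD cost to get one finished task, including retries and review."""
    if not 0 < profile.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1 / profile.success_rate
    review_cost = profile.review_minutes_per_attempt / 60 * profile.reviewer_rate_per_hour
    return expected_attempts * (profile.token_cost_per_attempt + review_cost)


# Made-up numbers for illustration only.
cheap_model = WorkflowProfile(0.02, 0.5, 3, 60)
premium_model = WorkflowProfile(0.40, 0.9, 1, 60)
```

In this invented example the cheap model costs 20 times less per attempt, but the lower success rate and heavier review make it the more expensive option per completed task. Real numbers can easily point the other way, which is why the measurement has to happen on your own workflow.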
DeepSeek V4 is worth testing for research, summarization, monitoring, triage, first drafts, cheap parallel scans, and consumer-agent loops. I would not switch serious coding, finance, customer communications, or production operations on pricing alone.
Apple Messages and Poke: a signal, not an agent store yet
The video frames Apple as turning iMessage into an agent store. That may become true later, but the verified version is narrower.
TechCrunch reported on June 4, 2026 that Poke became the first AI agent approved to run on Apple's Messages for Business platform. It also reported that Apple Messages for Business had not previously been open to standalone third-party AI agents. That is significant, but it is not the same as Apple announcing a general-purpose consumer agent store.
The interesting pattern is messaging as the agent interface. Most people will not manage agents from a terminal. They will text them. The agent that wins in consumer workflows may not be the one with the richest desktop app. It may be the one that feels like texting a capable assistant.
That creates three requirements:
- Identity: users must know whether they are talking to a human, an AI agent, or a business.
- Approval: the agent needs clear boundaries before scheduling, purchasing, messaging, or modifying accounts.
- Fallback: live support or human handoff matters when the agent gets stuck or the action is sensitive.
So yes, Apple Messages is a big surface to watch. Just keep the claim precise: Poke's approval is an early platform signal, not proof that Apple has launched a full agent marketplace.
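The three requirements (identity, approval, fallback) can be sketched as a small routing function. This is a hypothetical illustration, not anything from Apple Messages for Business or Poke: the action names, the `agent_confidence` score, and the 0.6 threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    HAND_OFF = "hand_off"


# Hypothetical set of actions that always require explicit user approval.
SENSITIVE_ACTIONS = {"schedule", "purchase", "send_message", "modify_account"}


@dataclass
class ActionRequest:
    action: str
    user_approved: bool = False
    agent_confidence: float = 1.0  # assumed self-reported score in [0, 1]


def label_sender(kind: str) -> str:
    """Identity: every message carries a label saying who is speaking."""
    labels = {"human": "Human agent", "ai": "AI agent", "business": "Business account"}
    return labels[kind]


def route(request: ActionRequest, min_confidence: float = 0.6) -> Decision:
    """Fallback first, then approval, then allow."""
    if request.agent_confidence < min_confidence:
        return Decision.HAND_OFF        # Fallback: a human takes over when the agent is stuck
    if request.action in SENSITIVE_ACTIONS and not request.user_approved:
        return Decision.NEEDS_APPROVAL  # Approval: ask before acting
    return Decision.ALLOW
```

Checking fallback before approval is a design choice: if the agent is unsure what the user wants, asking the user to approve its guess is the wrong question.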
Hermes Desktop: the open super-app alternative
Hermes Desktop is the part of the story that feels different. Codex and Cursor are racing to own hosted, team, and development surfaces. Hermes is trying to give power users a flexible agent home across models, profiles, skills, sessions, cron jobs, messaging, files, and local work.
The official Hermes Desktop docs say the desktop app is built around the same agent as the CLI and gateway: same config, API keys, sessions, skills, and memory. It runs on macOS, Windows, and Linux, and exposes management panes for skills, cron, profiles, messaging, agents, and command center surfaces.
That makes Hermes interesting for a different type of builder:
- someone who wants local or open model flexibility,
- someone who wants multiple profiles instead of one vendor account,
- someone who wants scheduled jobs and messaging channels visible in one desktop app,
- someone who wants to experiment with agent workflows before betting on one closed platform.
Hermes is not the same category as Codex Sites. It does not automatically solve hosted app sharing, enterprise governance, or team review. But it may be the right control surface if your priority is flexibility, profiles, cron, and model routing.
A better desktop app does not automatically make agent work safe. You still need scoped tools, credentials, logs, approvals, and clear rules for what the agent may never do without review.
Microsoft's signal: Windows and enterprise agents
Microsoft is pushing from the opposite end of the stack: operating system, enterprise context, and governance.
In its Build 2026 announcement, Microsoft described an Agent Platform powered by Microsoft IQ, with Work IQ APIs for enterprise context and Microsoft Scout as a personal work agent for Frontier customers. It also highlighted Microsoft Execution Containers, a Windows sandboxing layer for agents in preview, and a GitHub Copilot app for native desktop agentic development.
The pattern is the same again: agents need context, tools, execution surfaces, sandboxes, and review. Microsoft is just attacking it from the enterprise OS and Microsoft 365 side.
This matters because the agent super-app may not be one app. It may be a stack:
- Codex or Cursor for building and reviewing software artifacts.
- Hermes for open local profiles, scheduled jobs, and model-flexible workflows.
- Microsoft 365 and Windows for enterprise context, identity, OS-level control, and internal deployment.
- Apple Messages or similar messaging surfaces for consumer access.
The real competition is not one sidebar versus another sidebar. It is who owns the trusted lane between intent and action.
What builders should do now
If you are building agent workflows this quarter, do not chase every platform headline. Map the workflow layer you actually need.
- If the output should become a hosted internal tool, test Codex Sites. Save before deploy, define storage, and create a skill for safe updates.
- If the output should help a team review or understand work, test Cursor Canvases. Use it for dashboards, reports, PR review, eval analysis, and context usage review.
- If cost is blocking always-on agents, test DeepSeek V4 on narrow workflows. Measure completed-task cost, not only token price.
- If the user interface is texting, watch Poke and Apple Messages for Business. Build approval, identity, and fallback rules from the start.
- If you want a model-flexible desktop control center, test Hermes Desktop. Start with one profile, one scheduled task, and one review queue.
- If the workflow touches enterprise systems, watch Microsoft. Context, identity, governance, and OS sandboxing will matter more than raw model taste.
My practical checklist for any agent super-app decision:
- Context: what can the agent read, and where does that context come from?
- Actions: what can the agent change, send, deploy, delete, spend, or publish?
- State: where does the work persist after the chat ends?
- Sharing: who can see the output, and is it a draft or production surface?
- Review: where does a human approve sensitive actions?
- Logs: can you inspect what happened later?
- Cost: what is the cost per completed workflow, including retries and human review?
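If you evaluate more than one platform, it can help to keep the checklist as data so gaps are obvious. A minimal sketch, assuming you record a short written answer per item; the `CHECKLIST` tuple mirrors the list above, and the `unanswered` helper and the example answers are hypothetical.

```python
# The seven checklist items from the list above, in order.
CHECKLIST = ("context", "actions", "state", "sharing", "review", "logs", "cost")


def unanswered(answers: dict) -> list:
    """Return checklist items with no written answer yet, in checklist order."""
    return [item for item in CHECKLIST if not answers.get(item, "").strip()]


# Hypothetical, partially completed evaluation of one candidate platform.
draft_evaluation = {
    "context": "Reads Slack threads and the ticket tracker through approved connectors.",
    "state": "Board data lives in the hosted app's database.",
    "logs": "",
}
```

Calling `unanswered(draft_evaluation)` on the draft lists actions, sharing, review, logs, and cost as still open, which are the questions to settle before the agent gets write access.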
That is the less flashy version of the agent super-app race. The winner is not only the tool with the best demo. It is the tool that makes useful work repeatable without making the business blind.
Sources
This post uses the supplied transcript for Agent Native Ep. 3 as commentary inspiration, then checks claims against the sources below.
- OpenAI Codex docs: Sites
- OpenAI Codex docs: Plugins
- OpenAI Codex docs: Skills
- Cursor changelog: Canvas Design Mode and Context Usage Report
- Cursor blog: Interact with agent-created visualizations in canvases
- DeepSeek API docs: Models and Pricing
- OpenAI API docs: GPT-5.5 model pricing
- Anthropic Institute: When AI builds itself
- TechCrunch: Apple approves Poke as the first AI agent on Messages for Business
- Apple Messages for Business
- Hermes Agent docs: Desktop App
- NousResearch/hermes-agent on GitHub
- Microsoft Build 2026: Be yourself at work
The short version: Codex, Cursor, Hermes, Microsoft, Apple, and DeepSeek are all pointing toward the same thing: agents need a durable work surface. The model matters. The surface around the model is what turns it into work.