AI Agent Architecture

Hermes Mixture of Agents: The Smartest Agent Is a Routing Pattern

David Ondrej's new Hermes Agent walkthrough makes a big claim: Hermes just became the smartest agent in the world. I would phrase it more carefully. Hermes did not magically become one model that knows everything. It added a better routing pattern for hard work.

That pattern is Mixture of Agents: ask multiple models for their independent views, then let one aggregator model turn those views into the final answer and tool-using action. For builders, this matters because the next leap in agent quality may not come only from waiting for a bigger model. It may come from better orchestration.

JQ AI SYSTEMS take: Mixture of Agents is not for cheap prompts. Use it as a review board for difficult tasks: architecture, migrations, security hardening, hard debugging, and decisions where one model's blind spot is expensive.

Video credit: David Ondrej. This post uses the supplied transcript as commentary and checks the core mechanics against Hermes Agent documentation, OpenRouter, Vercel AI Gateway, and the Mixture-of-Agents research paper.

Source Note

Credit for the video walkthrough goes to David Ondrej. The transcript walks through a Hermes setup with OpenRouter, GLM-5.2, GPT-5.5, Kimi/K2-style coding models, Opus 4.8, and an aggregator model inside a Hermes Mixture of Agents preset.

The factual spine for this article is Hermes' official Mixture of Agents documentation. Hermes says MoA is a virtual model provider: reference models run first, and the aggregator is the acting model that writes the response and emits tool calls.

Resource Use it for Builder note
David Ondrej video Hands-on Hermes MoA walkthrough. Useful as a setup and workflow demo; treat performance claims as commentary.
Hermes MoA docs Official explanation of MoA presets. Start here for how Hermes actually implements the feature.
Hermes CLI commands `hermes moa` command reference. Useful when configuring presets instead of only clicking in a UI.
Configuring models Main and auxiliary model slots. MoA is stronger when auxiliary slots and model routing are not an afterthought.
OpenRouter docs Route to many models through one API. Good for trying GLM, OpenAI, Anthropic, Kimi, DeepSeek, and other providers together.
Vercel AI Gateway Unified model access, budgets, usage monitoring, load balancing, and fallbacks. Useful if your stack already runs through Vercel or AI SDK workflows.
Mixture-of-Agents paper Research background for collaborative LLM outputs. Shows why multiple model outputs can outperform one model in some evaluation settings.
Together MoA repo Reference implementation and citation trail. Useful if you want the broader MoA idea outside Hermes.

The Main Idea

The old question was: "Which single model is best?" The better question is now: "Which models should be allowed to advise this task, and which model should be trusted to act?"

A Mixture of Agents setup lets you use one model for backend reasoning, another for UI taste, another for skeptical review, another for long-context synthesis, and then let an aggregator make the final call. That is not always better. It is better when the task benefits from disagreement, cross-checking, and specialized model strengths.

What MoA Is

Hermes' official docs define MoA as a selectable provider in the model system. Each named preset appears like a model. When the preset is selected, Hermes sends the prompt to the reference models first. Those outputs are then passed to the aggregator. The aggregator writes the final response and emits tool calls.

That last sentence matters. Hermes keeps the normal agent loop: tools, follow-up iterations, interrupts, transcripts, and session context. MoA does not replace the agent shell. It changes the thinking layer inside the shell.

Part Role Practical example
Reference models Produce independent analysis. GLM-5.2, GPT-5.5, Kimi/K2-style coding model, Opus 4.8.
Aggregator Chooses, combines, writes, and acts. A stronger or more trusted model with lower temperature.
Hermes loop Runs tools, memory, sessions, and follow-up work. Same agent workflow, but with a deeper reasoning pass.

MoA vs MoE

David correctly highlights a common confusion: Mixture of Agents is not Mixture of Experts.

Mixture of Experts is a model architecture. One model has many internal expert subnetworks, and only some are active for a token. GLM-5.2 and other modern MoE-style models are discussed in that architectural category.

Mixture of Agents is an orchestration pattern. Several separate models or agents produce candidate reasoning, and one aggregator model combines the useful parts.

Short version: MoE is inside the model. MoA is around the models.

How Hermes Does It

Hermes makes MoA feel like a normal model picker option. That is the clever product decision. If MoA had to be a separate script, most people would not use it. In Hermes, a preset can show up as a selectable model under the MoA provider.

The video shows a practical pattern:

  1. Set up Hermes on a machine or VPS.
  2. Connect model providers through OpenRouter or another gateway.
  3. Create a MoA preset with multiple reference models.
  4. Choose an aggregator model that should make the final call.
  5. Use MoA for hard tasks, not everyday trivia.
  6. Monitor cost and latency because every reference model adds spend.

In the transcript, David uses OpenRouter and also mentions Vercel AI Gateway as a route for model access. Both fit the same broader pattern: a model gateway makes multi-provider experimentation much easier than juggling every provider separately.

When To Use It

MoA should be reserved for work where model disagreement is valuable.

Good MoA task Why it helps
Hard debugging Different models may spot different failure modes.
Architecture planning One model can optimize for speed, another for maintainability, another for risk.
Security hardening Multiple reviews can reduce blind spots before human review.
Large refactors or migrations Reference models can challenge assumptions before tool calls happen.
Code review The aggregator can synthesize several review perspectives into one checklist.

Bad MoA tasks are simple lookups, tiny edits, low-risk drafts, formatting, routine summaries, and anything where speed matters more than depth.

Cost And Control

More models means more tokens, more latency, and more provider surface area. That is the cost of the "smarter" pattern.

A safe MoA setup should include:

  • API spend limits per provider or gateway.
  • A cheap default model for normal work.
  • MoA presets only for named hard workflows.
  • Logs showing which models were used.
  • Human review before deployments, purchases, external messages, or destructive file operations.
  • Provider separation when sensitive data should not leave a trusted boundary.

The biggest mistake would be turning MoA on for every prompt. That is like calling a board meeting to choose lunch. Sometimes you need the board. Most of the time, you need a default operator.

Workflow Pattern

The strongest pattern in the video is not only MoA. It is agent managing agent.

David uses one agent to set up, monitor, and steer another agent. That becomes important as tasks get long. A manager agent can poll status, notice stalls, summarize progress, restart failed runs, and keep a human informed without burning attention.

This is the practical architecture:

Layer Job
Manager agent Starts the run, monitors progress, asks for short status updates, notices stalls.
Hermes MoA preset Uses reference models and an aggregator for harder decisions.
Tool layer Runs shell, reads files, writes code, deploys, or uses project tools.
Human review Approves risk, taste, external actions, spend, and production changes.

Builder Checklist

Before using Hermes MoA on real work, set it up like this:

  1. Create one cheap default model route for normal Hermes work.
  2. Create one MoA preset for code review, not five presets you will forget.
  3. Pick reference models with different strengths, not four copies of the same bias.
  4. Use a lower temperature for the aggregator than for brainstorming references.
  5. Add gateway spend limits before the first serious test.
  6. Log model usage, cost, tool calls, files changed, and final decisions.
  7. Keep MoA away from secrets until provider boundaries are clear.
  8. Require human approval for production, payment, deployment, deletion, and customer-facing work.

If MoA gives you better answers on the hard 10% of tasks, it is useful. If it becomes a fancy way to make every prompt slower and more expensive, it is just ceremony.

CTA: Do not turn MoA into your default brain. Build one Hermes MoA preset for the workflow where mistakes are expensive, then measure whether the extra models actually improve the final result.

Sources

Common questions

What is Hermes Mixture of Agents?
Hermes describes Mixture of Agents as a virtual model provider. Reference models run first and provide analysis. An aggregator model then writes the assistant response and emits tool calls inside the normal Hermes agent loop.
Is Mixture of Agents the same as Mixture of Experts?
No. Mixture of Experts is a model architecture where only selected experts inside one model are active. Mixture of Agents is an orchestration pattern where multiple separate models produce outputs and an aggregator model combines them.
Does MoA make Hermes the smartest agent in the world?
That is a video/commentary claim, not something I would state as fact. The practical point is that MoA can improve hard-task answers by letting several models contribute, but it costs more, takes longer, and still needs verification.
When should builders use MoA?
Use it for hard debugging, architecture decisions, security review, migration planning, code review, and high-stakes synthesis. Do not use it for quick lookups, small edits, or cheap routine tasks.
What should a safe MoA setup include?
Use budget limits, provider keys with narrow scope, logs, bounded prompts, a cheaper default model, and human review before production changes, external messages, deployments, purchases, or infrastructure changes.
Share
X LinkedIn Reddit
Build Yours

Want a system
like this one?

Book a free 30-minute call. We map your situation, identify the highest-impact automation, and figure out if we are a fit.

Book Free 30-min Call