David Ondrej's new Hermes Agent walkthrough makes a big claim: Hermes just became the smartest agent in the world. I would phrase it more carefully. Hermes did not magically become one model that knows everything. It added a better routing pattern for hard work.
That pattern is Mixture of Agents: ask multiple models for their independent views, then let one aggregator model turn those views into the final answer and tool-using action. For builders, this matters because the next leap in agent quality may not come only from waiting for a bigger model. It may come from better orchestration.
Source Note
Credit for the video walkthrough goes to David Ondrej. The transcript walks through a Hermes setup with OpenRouter, GLM-5.2, GPT-5.5, Kimi/K2-style coding models, Opus 4.8, and an aggregator model inside a Hermes Mixture of Agents preset.
The factual spine for this article is Hermes' official Mixture of Agents documentation. Hermes says MoA is a virtual model provider: reference models run first, and the aggregator is the acting model that writes the response and emits tool calls.
Link Map
| Resource | Use it for | Builder note |
|---|---|---|
| David Ondrej video | Hands-on Hermes MoA walkthrough. | Useful as a setup and workflow demo; treat performance claims as commentary. |
| Hermes MoA docs | Official explanation of MoA presets. | Start here for how Hermes actually implements the feature. |
| Hermes CLI commands | `hermes moa` command reference. | Useful when configuring presets instead of only clicking in a UI. |
| Configuring models | Main and auxiliary model slots. | MoA is stronger when auxiliary slots and model routing are not an afterthought. |
| OpenRouter docs | Route to many models through one API. | Good for trying GLM, OpenAI, Anthropic, Kimi, DeepSeek, and other providers together. |
| Vercel AI Gateway | Unified model access, budgets, usage monitoring, load balancing, and fallbacks. | Useful if your stack already runs through Vercel or AI SDK workflows. |
| Mixture-of-Agents paper | Research background for collaborative LLM outputs. | Shows why multiple model outputs can outperform one model in some evaluation settings. |
| Together MoA repo | Reference implementation and citation trail. | Useful if you want the broader MoA idea outside Hermes. |
The Main Idea
The old question was: "Which single model is best?" The better question now is: "Which models should advise on this task, and which model should be trusted to act?"
A Mixture of Agents setup lets you use one model for backend reasoning, another for UI taste, another for skeptical review, another for long-context synthesis, and then let an aggregator make the final call. That is not always better. It is better when the task benefits from disagreement, cross-checking, and specialized model strengths.
What MoA Is
Hermes' official docs define MoA as a selectable provider in the model system. Each named preset appears like a model. When the preset is selected, Hermes sends the prompt to the reference models first. Those outputs are then passed to the aggregator. The aggregator writes the final response and emits tool calls.
That last sentence matters. Hermes keeps the normal agent loop: tools, follow-up iterations, interrupts, transcripts, and session context. MoA does not replace the agent shell. It changes the thinking layer inside the shell.
| Part | Role | Practical example |
|---|---|---|
| Reference models | Produce independent analysis. | GLM-5.2, GPT-5.5, Kimi/K2-style coding model, Opus 4.8. |
| Aggregator | Chooses, combines, writes, and acts. | A stronger or more trusted model with lower temperature. |
| Hermes loop | Runs tools, memory, sessions, and follow-up work. | Same agent workflow, but with a deeper reasoning pass. |
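To make the two-stage flow concrete, here is a minimal sketch of the pattern in plain Python. It is not Hermes code: `run_moa` and the stub models are hypothetical names invented for illustration, each "model" is just a function from text to text, and tool calls are left out entirely.

```python
from typing import Callable, Dict

# A "model" here is any function from prompt text to response text.
# These are stand-ins for real provider calls, not Hermes APIs.
Model = Callable[[str], str]


def run_moa(prompt: str, references: Dict[str, Model], aggregator: Model) -> str:
    """Query each reference model, then let the aggregator write the answer."""
    # Step 1: every reference model sees only the original prompt.
    advice = {name: model(prompt) for name, model in references.items()}

    # Step 2: the aggregator sees the prompt plus each labeled reference output.
    sections = "\n\n".join(f"[{name}]\n{text}" for name, text in advice.items())
    aggregator_prompt = (
        f"Task:\n{prompt}\n\n"
        f"Reference analyses:\n{sections}\n\n"
        "Write the final answer."
    )
    return aggregator(aggregator_prompt)


# Stub models for illustration only.
def backend_stub(prompt: str) -> str:
    return "Check the connection pool size."


def reviewer_stub(prompt: str) -> str:
    return "The retry loop has no backoff."


def aggregator_stub(prompt: str) -> str:
    # A real aggregator would reason over the references; this stub echoes its input.
    return "FINAL ANSWER based on:\n" + prompt


answer = run_moa(
    "Why does the API time out under load?",
    {"backend": backend_stub, "reviewer": reviewer_stub},
    aggregator_stub,
)
```

The point of the shape is that the references never see each other's output, while the aggregator sees all of it. A real setup would likely call the references in parallel rather than one after another.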
MoA vs MoE
David correctly highlights a common confusion: Mixture of Agents is not Mixture of Experts.
Mixture of Experts is a model architecture. One model has many internal expert subnetworks, and only some of them are active for any given token. The video discusses GLM-5.2 and other modern MoE-style models in that architectural category.
Mixture of Agents is an orchestration pattern. Several separate models or agents produce candidate reasoning, and one aggregator model combines the useful parts.
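A toy contrast may help here. In the sketch below, `moe_layer` and `moa_round` are hypothetical names, the "experts" are tiny numeric functions rather than neural subnetworks, and the gate scores are hand-picked positive weights rather than the output of a learned router. It only illustrates where the mixing happens: inside one model for MoE, across separate models for MoA.

```python
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def moe_layer(x: float, experts: Sequence[Expert], gate_scores: Sequence[float],
              k: int = 2) -> Tuple[float, List[int]]:
    """Toy Mixture of Experts: only the k highest-scoring experts run for this input."""
    ranked = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)
    active = ranked[:k]
    total = sum(gate_scores[i] for i in active)
    # Weighted sum of the active experts' outputs; inactive experts are never called.
    output = sum(gate_scores[i] / total * experts[i](x) for i in active)
    return output, active


def moa_round(prompt: str, models: Sequence[Callable[[str], str]],
              aggregator: Callable[[List[str]], str]) -> str:
    """Toy Mixture of Agents: every separate model answers, then one aggregator combines."""
    candidates = [model(prompt) for model in models]
    return aggregator(candidates)


# MoE: three internal experts, two active for this input.
experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0]
moe_output, active_experts = moe_layer(4.0, experts, gate_scores=[0.1, 0.6, 0.3], k=2)

# MoA: two whole models answer the same prompt, one aggregator merges the answers.
moa_output = moa_round(
    "Review this function",
    [lambda p: "looks fine", lambda p: "missing null check"],
    aggregator=lambda candidates: " | ".join(candidates),
)
```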
How Hermes Does It
Hermes makes MoA feel like a normal model picker option. That is the clever product decision. If MoA had to be a separate script, most people would not use it. In Hermes, a preset can show up as a selectable model under the MoA provider.
The video shows a practical pattern:
- Set up Hermes on a machine or VPS.
- Connect model providers through OpenRouter or another gateway.
- Create a MoA preset with multiple reference models.
- Choose an aggregator model that should make the final call.
- Use MoA for hard tasks, not everyday trivia.
- Monitor cost and latency because every reference model adds spend.
In the transcript, David uses OpenRouter and also mentions Vercel AI Gateway as a route for model access. Both fit the same broader pattern: a model gateway makes multi-provider experimentation much easier than juggling every provider separately.
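The setup steps above can be summarized as a small data structure. This is a sketch under assumptions: `MoAPreset` is a hypothetical shape, not the Hermes config schema, and the model identifiers are placeholders rather than real gateway slugs. Check the Hermes MoA docs and the `hermes moa` command reference for the actual fields.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MoAPreset:
    """Hypothetical preset shape for illustration; not the Hermes config schema."""
    name: str
    reference_models: Tuple[str, ...]
    aggregator_model: str
    reference_temperature: float = 0.7   # assumed default, for illustration
    aggregator_temperature: float = 0.2  # assumed default, for illustration

    def problems(self) -> List[str]:
        """Return readable issues; an empty list means the preset looks sane."""
        issues: List[str] = []
        if len(self.reference_models) < 2:
            issues.append("use at least two reference models")
        if len(set(self.reference_models)) != len(self.reference_models):
            issues.append("reference models should be distinct")
        if not self.aggregator_model:
            issues.append("an aggregator model is required")
        if self.aggregator_temperature > self.reference_temperature:
            issues.append("aggregator temperature should not exceed reference temperature")
        return issues


# Placeholder model identifiers; substitute whatever your gateway exposes.
code_review = MoAPreset(
    name="code-review",
    reference_models=("provider-a/model-x", "provider-b/model-y", "provider-c/model-z"),
    aggregator_model="provider-d/model-w",
)

duplicated = MoAPreset(
    name="bad-preset",
    reference_models=("provider-a/model-x", "provider-a/model-x"),
    aggregator_model="",
)
```

Writing the preset down as data, even informally, makes it easier to review which models advise and which one acts before any tokens are spent.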
When To Use It
MoA should be reserved for work where model disagreement is valuable.
| Good MoA task | Why it helps |
|---|---|
| Hard debugging | Different models may spot different failure modes. |
| Architecture planning | One model can optimize for speed, another for maintainability, another for risk. |
| Security hardening | Multiple reviews can reduce blind spots before human review. |
| Large refactors or migrations | Reference models can challenge assumptions before tool calls happen. |
| Code review | The aggregator can synthesize several review perspectives into one checklist. |
Bad MoA tasks are simple lookups, tiny edits, low-risk drafts, formatting, routine summaries, and anything where speed matters more than depth.
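One way to enforce that split is a small routing rule that sends only named hard task types to the MoA preset. The sketch below is illustrative: the task labels, the `choose_route` function, and the route names are all hypothetical, and in practice a person or a classifier would assign the label.

```python
# Task kinds taken from the table above; the label strings are made up.
HARD_TASKS = frozenset({
    "hard_debugging",
    "architecture_planning",
    "security_hardening",
    "large_refactor",
    "migration",
    "code_review",
})


def choose_route(task_kind: str,
                 moa_preset: str = "moa-code-review",
                 default_model: str = "cheap-default") -> str:
    """Send named hard tasks to the MoA preset and everything else to the default."""
    return moa_preset if task_kind in HARD_TASKS else default_model


routes = {kind: choose_route(kind) for kind in ("code_review", "formatting", "migration")}
```

Unknown task kinds fall through to the cheap default on purpose, so MoA has to be opted into rather than opted out of.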
Cost And Control
More models mean more tokens, more latency, and more provider surface area. That is the cost of the "smarter" pattern.
A safe MoA setup should include:
- API spend limits per provider or gateway.
- A cheap default model for normal work.
- MoA presets only for named hard workflows.
- Logs showing which models were used.
- Human review before deployments, purchases, external messages, or destructive file operations.
- Provider separation when sensitive data should not leave a trusted boundary.
The biggest mistake would be turning MoA on for every prompt. That is like calling a board meeting to choose lunch. Sometimes you need the board. Most of the time, you need a default operator.
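A rough cost model shows why. The sketch below assumes each reference model reads the full prompt, and the aggregator reads the prompt plus every reference output. The prices and token counts are invented for illustration and are not real provider rates; `Price`, `call_cost`, and `moa_cost` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens (illustrative)
    output_per_mtok: float  # USD per million output tokens (illustrative)


def call_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one model call."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


def moa_cost(reference_prices: Sequence[Price], aggregator_price: Price,
             prompt_tokens: int, reference_output_tokens: int,
             final_output_tokens: int) -> float:
    """Cost of one MoA round: all reference calls plus one aggregator call."""
    references = sum(
        call_cost(p, prompt_tokens, reference_output_tokens) for p in reference_prices
    )
    # The aggregator's input grows with every reference output it has to read.
    aggregator_input = prompt_tokens + len(reference_prices) * reference_output_tokens
    return references + call_cost(aggregator_price, aggregator_input, final_output_tokens)


# Made-up prices: three cheaper reference models and one pricier aggregator.
cheap = Price(input_per_mtok=1.0, output_per_mtok=4.0)
strong = Price(input_per_mtok=3.0, output_per_mtok=15.0)

single = call_cost(strong, input_tokens=10_000, output_tokens=2_000)
mixture = moa_cost([cheap, cheap, cheap], strong,
                   prompt_tokens=10_000, reference_output_tokens=2_000,
                   final_output_tokens=2_000)
ratio = mixture / single
```

With these invented numbers, one MoA round costs a bit more than twice the single-model call, and that is per agent iteration. Latency behaves similarly: the aggregator cannot start until the slowest reference model has finished.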
Workflow Pattern
The strongest pattern in the video is not only MoA. It is one agent managing another.
David uses one agent to set up, monitor, and steer another agent. That becomes important as tasks get long. A manager agent can poll status, notice stalls, summarize progress, restart failed runs, and keep a human informed without burning attention.
This is the practical architecture:
| Layer | Job |
|---|---|
| Manager agent | Starts the run, monitors progress, asks for short status updates, notices stalls. |
| Hermes MoA preset | Uses reference models and an aggregator for harder decisions. |
| Tool layer | Runs shell, reads files, writes code, deploys, or uses project tools. |
| Human review | Approves risk, taste, external actions, spend, and production changes. |
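The manager layer's stall detection can be sketched without any agent framework. The `StallWatcher` class below is hypothetical: it assumes the manager can poll the worker for a short progress marker (for example, the current step name) and treats an unchanged marker for too long as a stall. Timestamps are passed in explicitly so the logic stays easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallWatcher:
    """Flags a run as stalled when its progress marker stops changing."""
    stall_after_seconds: float
    last_marker: Optional[str] = None
    last_change_at: Optional[float] = None

    def observe(self, now: float, marker: str) -> str:
        """Record one status poll and return 'progressing', 'waiting', or 'stalled'."""
        if self.last_change_at is None or marker != self.last_marker:
            # First poll, or the worker reported something new.
            self.last_marker = marker
            self.last_change_at = now
            return "progressing"
        if now - self.last_change_at >= self.stall_after_seconds:
            return "stalled"
        return "waiting"


watcher = StallWatcher(stall_after_seconds=60)
statuses = [
    watcher.observe(0, "step 1: reading files"),
    watcher.observe(30, "step 1: reading files"),
    watcher.observe(61, "step 1: reading files"),
    watcher.observe(70, "step 2: writing patch"),
]
```

What the manager does on `"stalled"` (restart, summarize, or page a human) is a policy decision and belongs with the human review layer rather than in the watcher.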
Builder Checklist
Before using Hermes MoA on real work, set it up like this:
- Create one cheap default model route for normal Hermes work.
- Create one MoA preset for code review, not five presets you will forget.
- Pick reference models with different strengths, not four copies of the same bias.
- Use a lower temperature for the aggregator than for brainstorming references.
- Add gateway spend limits before the first serious test.
- Log model usage, cost, tool calls, files changed, and final decisions.
- Keep MoA away from secrets until provider boundaries are clear.
- Require human approval for production, payment, deployment, deletion, and customer-facing work.
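The logging and approval items can be combined into one small gate. This is a sketch, not a Hermes feature: `RunLog`, `execute`, and the action labels are hypothetical, and a real setup would also record cost, tool calls, and files changed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Action kinds from the checklist that should never run unattended.
RISKY_ACTIONS = frozenset(
    {"production", "payment", "deployment", "deletion", "customer_facing"}
)


@dataclass
class RunLog:
    """In-memory record of what ran, which model route was used, and who approved it."""
    entries: List[Dict[str, Optional[str]]] = field(default_factory=list)

    def record(self, action: str, model: str, approved_by: Optional[str]) -> None:
        self.entries.append({"action": action, "model": model, "approved_by": approved_by})


def execute(action: str, model: str, log: RunLog,
            approved_by: Optional[str] = None) -> str:
    """Refuse risky actions without a named approver; log everything that runs."""
    if action in RISKY_ACTIONS and not approved_by:
        raise PermissionError(f"'{action}' requires human approval")
    log.record(action, model, approved_by)
    return f"ran {action}"


log = RunLog()
review_result = execute("code_review", "moa-code-review", log)

try:
    execute("deployment", "moa-code-review", log)
    blocked = False
except PermissionError:
    blocked = True

deploy_result = execute("deployment", "moa-code-review", log, approved_by="alice")
```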
If MoA gives you better answers on the hard 10% of tasks, it is useful. If it becomes a fancy way to make every prompt slower and more expensive, it is just ceremony.
Sources
- David Ondrej video: Hermes Agent Mixture of Agents walkthrough
- Hermes Agent product page
- Hermes Agent GitHub
- Hermes Agent documentation
- Hermes docs: Mixture of Agents
- Hermes CLI command reference
- Hermes docs: Configuring models
- Hermes docs: Profiles
- Hermes docs: Tools and toolsets
- Hermes docs: Memory providers
- OpenRouter quickstart
- OpenRouter models
- Vercel AI Gateway docs
- Vercel AI Gateway models
- Mixture-of-Agents paper
- Together MoA GitHub repo