Can open source AI really not be shut off?

Only the local part is truly resilient. If you download a model and run it through Ollama on your own computer, that workflow can keep working without a provider account. NVIDIA NIM and OpenRouter are useful fallback routes, but they are still hosted services with terms, rate limits, and availability risk.

What is the easiest way to start running AI locally?

Install Ollama, pull a model such as Gemma 4 or Qwen3-Coder, and test simple chat, document, or code tasks. Ollama also documents a Claude Code integration using the ollama launch claude command.

What is NVIDIA NIM useful for?

NVIDIA NIM is useful when you want to test stronger hosted models through NVIDIA infrastructure before renting GPUs or buying hardware. It is good for experimentation, but not the same as owning the model locally.

OpenRouter gives one API key for many models and documents a direct Claude Code integration through ANTHROPIC_BASE_URL. It is useful for model routing, cost tests, and quickly trying GLM, Qwen, DeepSeek, and closed models without rebuilding your tooling.

Should I replace Claude or ChatGPT with local models?

No. Use local and open models for privacy, cheap iteration, notes, drafts, code search, and fallback workflows. Keep frontier models for tasks where the higher quality and reliability justify the cost.

Open Source AI You Can Keep Running: Ollama, NVIDIA NIM, OpenRouter, and GLM 5.2

The useful version of "open source AI they can't shut off" is not a slogan. It is a stack decision.

Pat Simmons' video is a quick-start tour through three layers: run a model on your own machine with Ollama, test stronger hosted open models through NVIDIA NIM, and route cheap or specialized models through OpenRouter. The practical lesson is simple: do not make every workflow depend on one frontier-model account.

JQ AI SYSTEMS take: Build a resilience stack. Use local models for private, repeatable, low-risk work. Use hosted open-model routes when your laptop is too slow. Use frontier models where the task actually needs them.

Video credit: Pat Simmons. This post uses the supplied transcript as commentary and checks the setup paths against Ollama, NVIDIA, OpenCode, OpenRouter, Z.ai, and Claude Code documentation.

Source Note

Credit for the walkthrough goes to Pat Simmons. You can also find him on X and at PerSimmons Studio.

The video frames the moment around restricted frontier-model access. I am going to keep the analysis narrower and more useful: what can a builder set up this week so private notes, code search, draft work, and low-risk agent tasks do not depend entirely on one cloud vendor?

The Main Idea

There are three different things people often mix together:

Layer	What it gives you	What can still go wrong
Local Ollama	Model files and inference on your own machine.	Your hardware limits speed, context, and model size.
NVIDIA NIM	Free development access to hosted models and a path to self-hosted NVIDIA inference.	Still a service. Limits, terms, model availability, and uptime can change.
OpenRouter	One API key for many open and closed models, plus useful Claude Code routing.	Still cloud. You are buying routing flexibility, not full ownership.

That distinction matters. The "can't shut off" layer is local. The hosted layers are still useful because they reduce single-provider dependency and make model routing easier.

Link Map

Here are the useful links from the stack, grouped by what you are trying to do.

Need	Start here	Why it matters
Run local models	Ollama, Ollama download, Ollama model library	The fastest beginner path to local model files, local API calls, and private experiments.
Use a local model in Claude Code	Ollama Claude Code integration, Claude Code settings, Claude Code env vars	Lets a local model run inside an agentic coding harness for file reading, editing, and tool use.
Test local starter models	Gemma 4 on Ollama, Google Gemma + Ollama docs, Qwen3-Coder on Ollama	Gemma is a good private multimodal starting point. Qwen3-Coder is a stronger coding-agent test.
Use stronger hosted open models	NVIDIA Build, NVIDIA model catalog, NVIDIA NIM	Good for experiments where your own machine is too slow or too small.
Run an open-source coding agent	OpenCode, OpenCode providers, OpenCode models, OpenCode config	Useful when you want a coding-agent harness that can connect to many providers, including local or hosted routes.
Route many models through one key	OpenRouter, OpenRouter Claude Code integration, Z.ai on OpenRouter	Best for swapping models per task without rebuilding your agent setup.
Try GLM 5.2	Z.ai GLM 5.2 release, GLM 5.2 docs, GLM 5.2 on Hugging Face	Relevant for cheaper coding-agent and long-context experiments, usually via hosted routing.
Search private notes	Obsidian, Obsidian Local REST API with MCP	Lets local or agent workflows search notes without uploading your whole vault to a chatbot.

Method 1: Ollama

If you want actual resilience, start with Ollama. It runs models on your own computer and exposes a local API. Ollama's own site now positions it as a way to run apps and agents with open models, including Claude Code and similar tools.

The path is beginner-friendly:

Install Ollama from the download page.
Pick a model from the model library.
Start with a small model for chat, notes, and private drafting.
Move to coding-oriented models like Qwen3-Coder when you want agent work.
Use Ollama's Claude Code integration if you want to launch Claude Code against a local model.

# Basic model chat
ollama run gemma4

# Coding-oriented local model
ollama run qwen3-coder:30b

# Launch Claude Code through Ollama's Anthropic-compatible route
ollama launch claude --model qwen3-coder

This is the part of the stack that can keep working if a hosted model is paused, a pricing plan changes, or an account gets rate-limited. The tradeoff is that your laptop is now the data center. Small models are fine. Large models will be slow or impossible without serious memory.

Where local Ollama is strongest

Searching private notes and local files.
Drafting and rewriting non-sensitive text.
Classifying documents or CSVs.
Running small coding-agent tasks in a sandbox.
Keeping an offline fallback for travel, outages, or gated frontier access.

Where it is weak

Big UI builds where design taste matters.
Long-horizon refactors with many hidden dependencies.
High-stakes legal, medical, finance, security, or production changes.
Workflows that need very large context on modest hardware.

Method 2: NVIDIA NIM

NVIDIA's Build site and model catalog give developers API access to many models. NVIDIA also describes NIM as optimized inference microservices that can run on NVIDIA GPUs in workstations, data centers, or cloud.

In the video, Pat uses NVIDIA-hosted models through a coding-agent workflow. The benefit is obvious: you can test beefier models without buying a GPU. The caveat is just as important: this is not ownership. It is a generous hosted development route that can have limits, outages, model changes, or policy changes.

For OpenCode users, the relevant research links are:

OpenCode: open-source coding agent.
OpenCode providers: connect providers and API keys.
OpenCode models: configure provider and model selection.
OpenCode config: project and global config locations.
AI SDK NVIDIA NIM provider: useful because NIM exposes an OpenAI-compatible API shape.

My recommendation: use NIM as a testing lane. If a workflow proves valuable, then decide whether to keep using hosted APIs, rent GPUs, or buy hardware.

Method 3: OpenRouter

OpenRouter is the practical model-routing layer in this stack. It gives you one account and one API key for many models. The official Claude Code integration says that setting ANTHROPIC_BASE_URL to https://openrouter.ai/api lets Claude Code speak its native protocol directly to OpenRouter.

This is why the setup is useful for builders. You can test GLM 5.2, Qwen, DeepSeek, Anthropic, OpenAI, Gemini, and other models through one harness, then decide which one belongs on which task.

The important caveat: OpenRouter is still cloud. It is a flexibility layer, not a local-ownership layer. Do not send private client data or secrets through it unless your policies, contracts, and risk tolerance allow it.

Claude Code Config

The video includes a project-local .claude/settings.local.json approach for routing Claude Code through OpenRouter to GLM 5.2. Claude Code's official docs explain that settings files can configure environment variables, and the LLM gateway docs describe ANTHROPIC_BASE_URL as the common address variable for gateways.

Put this in your project-local .claude/settings.local.json. Paste your OpenRouter API key into ANTHROPIC_AUTH_TOKEN. Do not commit this file.

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api",
    "ANTHROPIC_AUTH_TOKEN": "",
    "ANTHROPIC_API_KEY": "",
    "API_TIMEOUT_MS": "3000000",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "z-ai/glm-5.2",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "z-ai/glm-5.2",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "z-ai/glm-5.2",
    "ANTHROPIC_SMALL_FAST_MODEL": "z-ai/glm-5.2",
    "CLAUDE_CODE_SUBAGENT_MODEL": "z-ai/glm-5.2"
  }
}

Security note: settings.local.json should stay local. Add .claude/settings.local.json to your ignore rules if the project is under Git, and never paste API keys into shared docs, screenshots, or agent transcripts.

What To Route Where

The stack is most useful when you stop asking "which model is best?" and start asking "which model is good enough for this part of the workflow?"

Workflow	Best first route	Why
Private notes and Obsidian search	Ollama local model	Privacy matters more than frontier quality. Keep the vault local when possible.
Small code cleanup	Ollama Qwen3-Coder or OpenRouter GLM 5.2	Cheap iteration, easy rollback, and enough intelligence for scoped edits.
Landing page experiments	NVIDIA NIM through OpenCode or OpenRouter GLM 5.2	Better models help, but you still need visual review.
Large refactors	Frontier model or GLM 5.2 with strict tests	Use cheaper models for exploration, but keep human review and tests on production code.
Agent research over public sources	OpenRouter or NIM	Hosted speed matters more than local ownership when the data is already public.
Client data or sensitive business docs	Local first	Do not optimize for model quality before privacy, consent, and data handling.

Safety And Privacy

Running a model locally does not automatically make the workflow safe. A local agent can still delete files, leak secrets through tools, or make bad edits. Treat local models like junior employees with a lot of keyboard access.

Keep local APIs private. Do not expose Ollama or Open WebUI publicly without authentication, VPN, and a clear reason.
Separate private and public tasks. Notes and client files should stay local unless you intentionally approve a hosted route.
Use sandboxes. Local code agents should work in project folders, branches, or throwaway copies.
Log what matters. Keep prompts, commands, files changed, tests run, and model/provider used.
Use review gates. Cheaper model routing is not a substitute for code review, data review, or deployment approval.

Starter Checklist

If I were setting this up for a founder or operator this week, I would do it in this order:

Install Ollama.
Run one small model and one coding model from the Ollama library.
Test a private task: summarize notes, search a folder, or clean a transcript.
Try Ollama with Claude Code on a harmless repo.
Create an OpenRouter key and test GLM 5.2 with the local settings file above.
Try NVIDIA Build or NIM when local speed is not enough.
Write down your routing rule: local for private, OpenRouter/NIM for cheap public experiments, frontier models for high-stakes review.

The end state is not "use open source for everything." The end state is choice. Your workflow should not collapse just because one provider changes access, price, model names, or rollout policy.

Open Source AI You Can Keep Running: Ollama, NVIDIA NIM, OpenRouter, and GLM 5.2

Source Note

The Main Idea

Link Map

Method 1: Ollama

Where local Ollama is strongest

Where it is weak

Method 2: NVIDIA NIM

Method 3: OpenRouter

Claude Code Config

What To Route Where

Safety And Privacy

Starter Checklist

Sources

Common questions

Want a system
like this one?

Open Source AI You Can Keep Running: Ollama, NVIDIA NIM, OpenRouter, and GLM 5.2

Source Note

The Main Idea

Link Map

Method 1: Ollama

Where local Ollama is strongest

Where it is weak

Method 2: NVIDIA NIM

Method 3: OpenRouter

Claude Code Config

What To Route Where

Safety And Privacy

Starter Checklist

Sources

Common questions

Related Articles

Local AI Starter Stack: Run Private Models at Home in 20 Minutes

The Best Hardware for Local AI Models

GLM 5.2 in Claude Code: Cheap Model Routing Gets Serious

Want a systemlike this one?

Want a system
like this one?