What the failure looks like
The setup looks correct. Ollama is running. You can see your local model with ollama list. OpenClaw's model picker shows your Llama or Qwen model selected. You run a query. A response comes back.
Then you check your OpenAI or Anthropic billing dashboard. There are charges you did not expect. Token counts that do not match what a local model would produce. Sometimes the bill shows cloud usage every time you open a session. Sometimes it is intermittent — local some days, cloud others.
The bluntest diagnostic is to temporarily remove outbound connectivity while testing a query. If OpenClaw is genuinely local, the response still returns from Ollama. If the response dies immediately, the traffic was going out. Many operators only discover this problem because their API bill arrives.
In many reported cases, OpenClaw is not obviously "broken" so much as routing according to provider priority, startup timing, or fallback behaviour the operator did not realise was active.
Why OpenClaw falls back to cloud
There does not appear to be a single cause. Based on issue reports and observed behaviour, there are at least three recurring mechanisms that can each produce the same symptom: local model selected in the UI, cloud provider receiving the requests.
The startup race between OpenClaw and Ollama
OpenClaw polls for available local providers when it starts. This poll happens early in the startup sequence. Ollama, depending on your machine and the model size loaded, may not have finished initialising by the time OpenClaw runs its provider discovery check.
In reported cases, when OpenClaw polls and Ollama does not respond in time, local providers are skipped. The API models — OpenAI, Anthropic — are available immediately because they require no local process to be ready. OpenClaw registers them as the available providers for this session. Your configured API model becomes the active default even though you never asked for it.
If you start Ollama and OpenClaw at the same time — or if Ollama is set to start at login alongside OpenClaw — this race runs every session. On fast machines loading a small model you win the race most of the time. On slower machines, or with large models that take 15–30 seconds to fully load, you lose it more often. The inconsistency is why this is hard to catch: local works sometimes, cloud appears other times, with no obvious change in what you did.
How the model picker can mis-route requests
OpenClaw identifies providers using a prefix system. A model is not just a name: it is a provider/model combination, such as ollama/llama3. When you switch models in the picker, OpenClaw should update both the model name and the provider prefix. A documented class of issue exists where the display label updates but the underlying provider ID does not.
A common failure pattern in this case: the picker shows ollama/llama3 as the selected model. The active routing configuration still references an API provider from the previous session. Every request goes to cloud. The UI gives no indication that routing is wrong.
This is more common after switching between providers in the same session, or after an OpenClaw update that changes how provider IDs are constructed. GitHub issues confirm regressions in this area across multiple OpenClaw versions.
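The mismatch described above can be expressed as a small check. This is a minimal sketch, assuming model references use the ollama/llama3 form shown earlier; `provider_of` and `route_matches` are hypothetical helper names, not part of OpenClaw, and how you obtain the actively routed provider depends on your installation.

```shell
# provider_of MODEL_REF
# Prints the provider prefix (the part before the first "/").
# Fails if the reference carries no provider prefix at all.
provider_of() {
    case "$1" in
        */*) printf '%s\n' "${1%%/*}" ;;
        *)   return 1 ;;
    esac
}

# route_matches DISPLAY_REF ACTIVE_PROVIDER
# Succeeds only when the label shown in the picker and the provider
# that requests are actually routed to agree.
route_matches() {
    [ "$(provider_of "$1")" = "$2" ]
}

# Example (hypothetical values):
#   route_matches "ollama/llama3" "openai" || echo "picker and routing disagree"
```

The point of the sketch is that the label and the routed provider are two separate values, so each has to be checked on its own.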
Which errors actually trigger fallback
OpenClaw's fallback system is more selective than most operators assume. Fallback is triggered by specific error categories: rate limit responses, request timeouts, and explicit provider unavailability signals. It is not triggered by context-length-exceeded errors, provider prefix mismatches, or JSON parsing failures from the model.
This matters because some of the conditions that push traffic to a cloud provider look to the operator like "local wasn't working so it tried the backup." In reality OpenClaw may be routing to cloud on every request because the active provider was set to cloud at startup — the fallback logic is not involved at all.
| Error condition | Triggers fallback? | Notes |
|---|---|---|
| Rate limit (429) | Yes | Rotates to next auth profile, then fallback provider |
| Request timeout | Yes | After timeoutSeconds threshold; depends on config |
| Provider unreachable | Yes | Ollama not responding at discovery time |
| Context length exceeded | No | Session errors or freezes; does not escalate to backup |
| Tool call JSON parse error | No | Fails silently or returns incomplete response |
| Provider prefix mismatch | No | Model not allowed error; no automatic reroute |
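The table can be read as a simple lookup. The sketch below encodes it as a shell function for use in your own log-triage scripts; the category labels are hypothetical names chosen for this example and are not identifiers that OpenClaw emits.

```shell
# triggers_fallback CATEGORY
# Exit status: 0 = escalates to a fallback provider,
#              1 = does not escalate,
#              2 = category not covered by the table.
triggers_fallback() {
    case "$1" in
        rate_limit|timeout|provider_unreachable)
            return 0 ;;
        context_length_exceeded|json_parse_error|provider_prefix_mismatch)
            return 1 ;;
        *)
            return 2 ;;
    esac
}

# Example:
#   triggers_fallback context_length_exceeded || echo "no fallback; look elsewhere"
```

If cloud charges appear alongside an error in the "no" group, fallback is unlikely to be the explanation, and the active provider itself is the more likely cause.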
How to prove it is happening
Before changing anything, confirm that cloud traffic is actually occurring and identify which of the three mechanisms is causing it.
Check the provider in OpenClaw settings
Open OpenClaw settings and navigate to the active provider configuration — not the model picker in the chat UI, but the underlying provider settings. Confirm that the provider registered as active matches what you expect. If it shows an API provider when you expect Ollama, the startup race or a picker mis-route is the cause.
Read the Ollama logs during startup
Start Ollama with logging enabled and watch the output. Note the timestamp when Ollama reports the model is fully loaded and ready to serve requests. Then start OpenClaw and note when its startup sequence runs provider discovery. If OpenClaw's discovery runs before Ollama's ready timestamp, you are losing the startup race consistently.
Watch your API billing in real time
OpenAI and Anthropic both have usage dashboards that update within minutes. Run a test session with OpenClaw and immediately check the usage page. A charge appearing confirms cloud traffic. No charge confirms local routing is working for that session. Test across multiple restarts to distinguish a consistent problem from an intermittent one.
The network cable test
Crude but definitive. Disable your internet connection before starting a session. Send a query. If the response returns normally, the traffic is local. If OpenClaw hangs or errors, the active provider requires an internet connection, and the exact error message OpenClaw returns will often confirm which cloud provider it was trying to reach. Re-enable your connection once you have noted it.
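A softer complement to the offline test is to ask Ollama which models it currently has loaded, using `ollama ps`. The sketch below assumes that `ollama ps` prints a header row followed by one row per loaded model with the model name in the first column; `model_is_loaded` is a hypothetical helper. A model can stay loaded from an earlier request, so treat a match as supporting evidence, not proof.

```shell
# model_is_loaded MODEL
# Reads `ollama ps` output on stdin and succeeds if MODEL appears in the
# first column, with or without a ":tag" suffix.
model_is_loaded() {
    awk -v want="$1" '
        NR > 1 {
            name = $1
            sub(/:.*/, "", name)      # strip the ":tag" suffix
            if ($1 == want || name == want) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Example: run right after sending a query through OpenClaw.
#   ollama ps | model_is_loaded llama3 && echo "llama3 is loaded in Ollama"
```

If the model never shows up in `ollama ps` after a query that returned a response, the response most likely came from somewhere else.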
What to change right now
Fix order: solve the startup race first, then pin the provider explicitly, then verify the result. Only add harder restrictions if the problem continues after that.
Give Ollama time to load before OpenClaw starts
The simplest fix for the startup race is sequencing. Start Ollama first. Wait until the model is fully loaded — you can confirm this by running a test query directly against Ollama in a terminal. Only then start OpenClaw. On systems where both start at login, add a delay to OpenClaw's startup or remove it from auto-start entirely and launch it manually after confirming Ollama is ready.
# Wait for Ollama to be fully ready
ollama run llama3 "ping" && echo "Ollama ready"
# Only then start OpenClaw
openclaw
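For login-time startup, a fixed delay is a guess. A retry loop waits only as long as it has to. The following is a minimal sketch: `wait_until` is a hypothetical helper, and the 120-second limit is an arbitrary value to tune for your machine and model size.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds.
# Returns 1 if it is still failing after TIMEOUT_SECONDS attempts.
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example: start OpenClaw only after a real query against the model succeeds.
#   wait_until 120 ollama run llama3 "ping" && openclaw
```

Using a real query as the probe, and not just a check that the Ollama process exists, means the loop ends only once the model can actually answer.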
Pin the provider explicitly in config
Do not rely on auto-discovery for your primary provider. Set it explicitly in your OpenClaw configuration so that even if the startup race runs, the configured provider takes precedence over whatever discovery returned.
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "enabled": true
    }
  },
  "agents": {
    "defaults": {
      "provider": "ollama",
      "model": "llama3"
    }
  }
}
Remove cloud API keys from the active config
If you are running a strictly local setup with no intention of using cloud providers, removing API keys from the active config is one way to eliminate the routing ambiguity entirely. OpenClaw cannot route to a provider it has no credentials for.
If you need cloud providers available for specific tasks but want local as the default, keep the keys but add them to an explicit allowlist rather than leaving them as open fallback options. The right approach depends on how your stack is actually used.
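Keys can also reach a process through the shell environment, not only through a config file. The sketch below lists any cloud key variables that are set in the current shell. It assumes the conventional variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`; whether your OpenClaw installation reads keys from the environment is an assumption to verify, and `list_cloud_keys` is a hypothetical helper.

```shell
# list_cloud_keys
# Prints the name (never the value) of each cloud API key variable that is
# set and non-empty in the current environment.
list_cloud_keys() {
    for name in OPENAI_API_KEY ANTHROPIC_API_KEY; do
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example: warn before launching a strictly local session.
#   [ -z "$(list_cloud_keys)" ] || echo "cloud keys still present in environment"
```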
Run openclaw doctor after changes
OpenClaw includes a diagnostic command that validates your configuration and connectivity. After making config changes, run it to confirm that the active provider is the one you intend and that Ollama is reachable.
openclaw doctor
# If config issues are found:
openclaw doctor --fix
Retest with the billing dashboard and the connectivity test. The startup race fix alone resolves most intermittent cases. The explicit provider pin addresses picker mis-routing. Removing unused API keys reduces routing ambiguity but involves trade-offs if you use cloud providers selectively.
What about Open WebUI and AnythingLLM?
The same class of problem appears in adjacent tools. AnythingLLM has a documented bug where workspaces route to OpenAI even when LM Studio is explicitly selected as the only configured provider — users have resorted to setting a fake OpenAI key to try to block the routing. Open WebUI users regularly report local Ollama models disappearing after updates, with the UI silently falling back to connected cloud providers.
The mechanism differs tool to tool — provider discovery timing, workspace-level routing overrides, Docker networking misconfigurations — but the symptom is identical: the UI says local, the bill says cloud.
If you are running OpenClaw alongside Open WebUI or AnythingLLM in a shared stack, check each tool's active provider configuration independently. One tool routing correctly does not guarantee the others are.
What this means for local-first operators
Silent fallback to cloud is not a bug you can report and then wait to have fixed. It is a structural property of how these systems handle provider priority. The default assumption in OpenClaw and most similar tools is that cloud providers are more reliable than local ones: faster to discover, faster to respond, always available. Local providers are bolted on top of that assumption.
Running a genuinely local-first stack means actively working against the defaults, not trusting that "local selected" means "local used." It means knowing where provider priority is set, what triggers fallback, how to verify traffic, and how to lock down routing so that cloud access requires an explicit decision — not just an Ollama startup delay.
If you are trying to make local-first behaviour predictable instead of hopeful, that is exactly the layer Foundation is built to address.
The Full Local-First Policy Framework
Foundation covers provider pinning, fallback architecture, memory continuity, tool-calling reliability, and trust boundaries — the complete set of operator decisions OpenClaw leaves unresolved after setup.