Local LLMs¶
Loom can stream chat from any LLM you run on localhost or your LAN. Two integrations are built in:
| Kind | Best for | Wire format |
|---|---|---|
| Ollama | `ollama serve` running locally or on a homelab box | `POST /api/chat` (NDJSON stream), `GET /api/tags` for models |
| OpenAI-compatible | LM Studio, llama.cpp's `llama-server`, Jan, vLLM, LocalAI, anything that speaks `/v1/chat/completions` | OpenAI SSE stream |
Both are added in Settings → Providers.
Ollama setup¶
1. Install Ollama: `brew install ollama` (or download).
2. Pull a model: `ollama pull llama3.2:3b`.
3. Make sure the daemon is running: `ollama serve` (the GUI installer auto-launches it; `brew install` does not).
4. Open Loom → Settings → Providers → Add.
    - Display name: `Ollama`
    - Kind: Ollama
    - Base URL: `http://localhost:11434`
    - Default model: leave blank — Loom auto-discovers via `/api/tags`.
    - Requires auth: off.
5. Test connection → should report `N model(s)` (the curl check after this list hits the same endpoint).
6. Save.
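If Test connection fails, you can hit the same endpoints from a terminal. Both commands below assume the default URL from step 4; nothing here is specific to Loom.

```bash
# What Loom's model discovery (and Test connection) calls.
curl http://localhost:11434/api/tags

# Local equivalent: at least one model should be listed.
ollama list
```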
The Agent pane picker now has a Local · Ollama group with one entry per pulled model. Pick one, send a prompt, watch tokens stream in.
Network Ollama¶
To use an Ollama instance on another machine, start the daemon there bound to all interfaces (`OLLAMA_HOST=0.0.0.0:11434 ollama serve`), then in Loom set Base URL to `http://<host>:11434`. `/api/tags` discovery and `/api/chat` streaming work the same over the LAN.
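A minimal sketch of the two halves, assuming the Ollama box is reachable as `192.168.1.50` (substitute your own hostname or IP):

```bash
# On the Ollama machine: bind the daemon to all interfaces instead of loopback only.
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From the machine running Loom: confirm discovery works before adding the provider.
curl http://192.168.1.50:11434/api/tags
```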
OpenAI-compatible setup (LM Studio, llama.cpp, Jan, vLLM)¶
These tools all expose an OpenAI-shaped HTTP API. Pick one, start its server, then add an endpoint in Loom.
LM Studio¶
1. In LM Studio: Developer → Local Server → start the server (default port `1234`).
2. Note the model identifier in the active session — e.g. `lmstudio-community/Llama-3.1-8B-Instruct` (or ask the server for it with the curl call below).
3. Loom → Settings → Providers → Add.
    - Display name: `LM Studio`
    - Kind: OpenAI-compatible
    - Base URL: `http://localhost:1234/v1`
    - Model: the identifier from step 2.
    - Requires auth: off.
4. Save.
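If you're not sure which identifier to paste, ask the server: `/v1/models` is part of the OpenAI-compatible surface LM Studio exposes, on the default port from step 1.

```bash
# Each entry's "id" field is what goes in Loom's Model box.
curl http://localhost:1234/v1/models
```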
llama.cpp¶
In Loom, Base URL = `http://localhost:8080/v1`, Model = whatever string you want (`llama-server` echoes it back regardless).
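A typical way to get that server up, assuming you have llama.cpp installed and a GGUF file on disk (the model path and context size below are placeholders):

```bash
# Serve a local GGUF model on llama-server's default port 8080.
llama-server -m ./models/llama-3.1-8b-instruct-q4_k_m.gguf --port 8080 -c 4096
```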
Jan¶
Jan exposes an OpenAI-compatible server at http://localhost:1337/v1 by default. Same setup — paste in the model identifier from Jan's UI.
vLLM / LocalAI¶
Same shape: configure Base URL to wherever the server listens (commonly `http://localhost:8000/v1`), set the Model id to whatever the server expects, and save.
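As one example, recent vLLM builds ship an OpenAI-compatible server behind `vllm serve` (older versions use `python -m vllm.entrypoints.openai.api_server --model ...`); the model name below is illustrative.

```bash
# Starts vLLM's OpenAI-compatible server, listening on port 8000 by default.
# The served model name doubles as the Model id you give Loom.
vllm serve meta-llama/Llama-3.1-8B-Instruct
```

Loom's Base URL for this would be `http://localhost:8000/v1`.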
Auth tokens¶
Some local servers (or the LAN proxies in front of them) want a bearer token. Toggle Requires auth in the editor and paste the token. It's stored in the macOS Keychain under account `local_endpoint_<UUID>` and sent as `Authorization: Bearer <token>` on every request.
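The request Loom sends is easy to reproduce if you want to confirm the token is accepted before saving; the URL and token variable below are placeholders.

```bash
# Same header Loom attaches when Requires auth is on.
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer $LOCAL_LLM_TOKEN"
```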
Streaming and cancel¶
All HTTP providers stream tokens live into the assistant bubble. Hit the Stop button (top right of the agent pane during a turn) to cancel — the URLSession task is canceled and the bubble shows whatever was already emitted.
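If you want to see the raw stream Loom is parsing, request it with curl. Ollama's `/api/chat` streams NDJSON by default (one JSON object per line); OpenAI-compatible servers stream SSE `data:` lines ending with `data: [DONE]` when you pass `"stream": true`. Ports and model names are the defaults used earlier on this page.

```bash
# Ollama: NDJSON stream (streaming is the default for /api/chat).
curl -N http://localhost:11434/api/chat \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "hi"}]}'

# OpenAI-compatible: SSE stream; note "stream": true and the data: framing in the output.
curl -N http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "lmstudio-community/Llama-3.1-8B-Instruct", "stream": true, "messages": [{"role": "user", "content": "hi"}]}'
```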
Troubleshooting¶
- "HTTP 404" when sending — check the Base URL. OpenAI-compatible servers expect the
/v1path; Ollama does not. - "Could not connect to the server" — daemon isn't running, or the port is wrong, or a firewall is blocking it. The Test connection button in the editor isolates the network problem from the model problem.
- No models showing in the picker after Ollama add —
/api/tagsreturned empty (or the daemon isn't on the URL you typed). Make sureollama listhas at least one model and the URL works incurl. - Empty / hung response — many local models don't honor
stoptokens correctly. Check the server's logs.
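A couple of quick checks that separate those cases, using the default ports from this page:

```bash
# 404s are usually a path problem: OpenAI-compatible servers live under /v1, Ollama does not.
curl -i http://localhost:11434/api/tags     # Ollama
curl -i http://localhost:1234/v1/models     # OpenAI-compatible (LM Studio's default port)

# A connection error here is a daemon/port/firewall problem, not a model problem.
curl -sS -o /dev/null -w "%{http_code}\n" http://localhost:11434/
```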