Local LLM inference — managed by Steve
Service: localhost:11434 (loopback only)
Managed via the Steve proxy; fleet apps call Steve, never Ollama directly.
Steve Proxy API
Use these endpoints from anywhere on the Tailscale network:
# Check status and list installed models
GET http://100.73.184.25:7777/api/ollama/status
# Generate (requires X-API-Key)
POST http://100.73.184.25:7777/api/ollama/generate
{"model": "mistral:latest", "prompt": "hello"}
# Pull a new model (returns job_id)
POST http://100.73.184.25:7777/api/ollama/pull
{"model": "codellama:latest"}
# Check pull progress
GET http://100.73.184.25:7777/api/ollama/pull/{job_id}
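The pull flow is two steps: the POST returns a `job_id`, then you poll the progress URL until the job finishes. A sketch, assuming pull is also protected by `X-API-Key` (the listing only states that for generate) and that the response field is literally named `job_id`:

```python
import json
import urllib.request

STEVE = "http://100.73.184.25:7777"  # Steve proxy on the tailnet

def pull_request(model: str, api_key: str) -> urllib.request.Request:
    # Assumption: pull is protected by the same X-API-Key header as generate.
    return urllib.request.Request(
        f"{STEVE}/api/ollama/pull",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def progress_url(job_id: str) -> str:
    # GET this URL repeatedly until the job reports completion.
    return f"{STEVE}/api/ollama/pull/{job_id}"

req = pull_request("codellama:latest", "YOUR_KEY")
# On the tailnet: job = json.load(urllib.request.urlopen(req))
# then poll progress_url(job["job_id"]) until the pull is done.
```

Model pulls can take minutes for large weights, so poll with a sleep between requests rather than in a tight loop.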
# Delete a model
POST http://100.73.184.25:7777/api/ollama/delete
{"model": "llama2:latest"}