Docker Model Runner Cheatsheet
Pull and run open-source LLMs locally with an OpenAI-compatible endpoint.
What it is
Use Model Runner to run open models (Llama, Mistral, Phi, Qwen, etc.) on your machine with GPU acceleration and a stable OpenAI-style API for your apps.
Installation
Enable in Docker Desktop → Settings → Beta features → 'Docker Model Runner'. Verify with `docker model --help`.
Quick start
- `docker model pull ai/llama3.2`: Download a model from the catalog.
- `docker model run ai/llama3.2 "Summarize Docker in one sentence."`: One-shot inference from the CLI.
- `docker model list`: Show locally available models.
- `curl http://localhost:12434/engines/v1/models`: Hit the OpenAI-compatible endpoint exposed by Docker Desktop.
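The same endpoint check can be sketched from Node.js (18 or later, for the built-in `fetch`). This is a minimal sketch, assuming the endpoint answers with the OpenAI-style list shape `{ data: [{ id }] }`; that shape and the helper names `modelIds` and `listModels` are assumptions for illustration, not taken from the Docker documentation.

```javascript
// Base URL taken from the curl example above.
const BASE_URL = "http://localhost:12434/engines/v1";

// Extract model IDs from a list response.
// Assumption: the payload follows the OpenAI "list models" shape,
// i.e. { data: [{ id: "..." }, ...] }. Anything else yields an empty array.
function modelIds(payload) {
  const items = Array.isArray(payload?.data) ? payload.data : [];
  return items.map((m) => m.id);
}

// Fetch the model list from the endpoint (Model Runner must be running).
async function listModels(baseURL = BASE_URL) {
  const res = await fetch(`${baseURL}/models`);
  if (!res.ok) throw new Error(`GET /models failed with HTTP ${res.status}`);
  return modelIds(await res.json());
}
```

Calling `listModels().then(console.log)` should print the IDs of the models the endpoint reports, provided the response shape matches the assumption above.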
Common commands
| Task | Command | Description |
|---|---|---|
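| Pull a model | `docker model pull <model>` | Fetch from the model catalog. |
| Run a model | `docker model run <model> [prompt]` | Run inference; omitting the prompt starts an interactive chat. |
| List models | `docker model list` | Locally available models. |
| Remove a model | `docker model rm <model>` | Delete local model weights. |
| View logs | `docker model logs` | Inference engine logs. |
| Inspect a model | `docker model inspect <model>` | Metadata: size, quantization, context length. |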
Useful flags
| Flag | Example | Meaning |
|---|---|---|
| -i, --interactive | | Open an interactive chat session. |
| --gpu | | Request GPU acceleration (where supported). |
| --format | | Machine-readable output. |
Real-world examples
Chat with a local model from your app (OpenAI SDK)

    import OpenAI from "openai";

    const client = new OpenAI({
      baseURL: "http://localhost:12434/engines/v1",
      apiKey: "not-needed",
    });

    const res = await client.chat.completions.create({
      model: "ai/llama3.2",
      messages: [{ role: "user", content: "Hello!" }],
    });

    console.log(res.choices[0].message.content);

Pull a small model and run a quick prompt

    docker model pull ai/phi3.5
    docker model run ai/phi3.5 "Write a haiku about containers."

Reach the endpoint from another container (Docker Desktop)

    curl http://model-runner.docker.internal/engines/v1/models

Best practices
- Start with small models (3B–7B) before pulling 70B weights; large models consume disk and RAM quickly.
- Use the OpenAI-compatible endpoint so your app code stays portable.
- Remove unused models periodically with `docker model rm` to reclaim disk.
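One way to keep app code portable, sketched under assumptions: read the endpoint, key, and model name from the environment and fall back to the local Model Runner values used in this cheatsheet. The variable names `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL` are hypothetical, chosen only for illustration. The request targets the OpenAI-style `/chat/completions` route with the built-in `fetch` (Node.js 18 or later).

```javascript
// Resolve connection settings from an environment object.
// LLM_BASE_URL, LLM_API_KEY and LLM_MODEL are hypothetical names,
// not Docker or OpenAI conventions.
function resolveConfig(env = process.env) {
  return {
    baseURL: env.LLM_BASE_URL || "http://localhost:12434/engines/v1",
    apiKey: env.LLM_API_KEY || "not-needed",
    model: env.LLM_MODEL || "ai/llama3.2",
  };
}

// Send one user message and return the reply text.
async function chat(prompt, config = resolveConfig()) {
  const res = await fetch(`${config.baseURL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${config.apiKey}`,
    },
    body: JSON.stringify({
      model: config.model,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`chat request failed with HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

With this layout, pointing the app at a different OpenAI-compatible provider should only require changing the three environment values, not the calling code.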
Troubleshooting
`docker model: command not found`
Enable Docker Model Runner in Docker Desktop settings, then restart Docker Desktop.
Inference is very slow
Ensure GPU support is enabled in Desktop and try a smaller/quantized variant.
Port 12434 unreachable from a container
Use the Docker Desktop hostname `model-runner.docker.internal` instead of localhost.
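When the same code may run either on the host or inside a container, the two addresses above can be probed in order. The following is a minimal sketch, not an official mechanism: the helper names `isReachable` and `pickBaseURL` and the 2-second timeout are assumptions, and it relies on the built-in `fetch` and `AbortSignal.timeout` in Node.js 18 or later.

```javascript
// Candidate base URLs, tried in order. Both come from this cheatsheet.
const CANDIDATES = [
  "http://localhost:12434/engines/v1", // from the host
  "http://model-runner.docker.internal/engines/v1", // from a container (Docker Desktop)
];

// Resolves to true if GET <baseURL>/models answers with a 2xx status
// within timeoutMs, and to false on any network error or timeout.
async function isReachable(baseURL, timeoutMs = 2000) {
  try {
    const res = await fetch(`${baseURL}/models`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, or timeout
  }
}

// Returns the first candidate for which probe() resolves to true, or null.
// The probe is injectable so the selection logic can be tested offline.
async function pickBaseURL(candidates = CANDIDATES, probe = isReachable) {
  for (const url of candidates) {
    if (await probe(url)) return url;
  }
  return null;
}
```

A `null` result means neither address answered, which usually points back to the checks above: Model Runner not enabled, or the wrong hostname for the environment.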