Docker Model Runner Cheatsheet

Pull and run open-source LLMs locally with an OpenAI-compatible endpoint.

What it is

Docker Model Runner pulls and runs open models (Llama, Mistral, Phi, Qwen, and others) on your machine. It uses GPU acceleration where the platform supports it and exposes an OpenAI-compatible API for your apps.

Installation

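Enable 'Docker Model Runner' in Docker Desktop → Settings → Beta features (newer Desktop releases list it under the AI tab). To call `localhost:12434` from the host, also enable host-side TCP support in the same panel. Verify with `docker model --help`.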

Quick start

docker model pull ai/llama3.2

Download a model from the catalog.

docker model run ai/llama3.2 "Summarize Docker in one sentence."

One-shot inference from the CLI.

docker model list

Show locally available models.

curl http://localhost:12434/engines/v1/models

List models through the OpenAI-compatible endpoint that Docker Desktop exposes on the host (requires host-side TCP access on port 12434).
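
The same check can be made from application code. The sketch below assumes the endpoint returns the usual OpenAI-style list body (`{ data: [{ id: ... }] }`); the names `ModelList`, `extractModelIds`, and `listModelIds` are illustrative and not part of any Docker or OpenAI API.

```typescript
// Assumed minimal shape of an OpenAI-style "list models" response.
interface ModelList {
  data: { id: string }[];
}

// Pull the model IDs out of an already-parsed response body.
function extractModelIds(body: ModelList): string[] {
  return body.data.map((m) => m.id);
}

// Fetch the model list and return the IDs.
// The default base URL is the host-side address used in this cheatsheet.
async function listModelIds(
  baseUrl: string = "http://localhost:12434/engines/v1",
): Promise<string[]> {
  const res = await fetch(`${baseUrl}/models`);
  if (!res.ok) {
    throw new Error(`Model Runner returned HTTP ${res.status}`);
  }
  return extractModelIds((await res.json()) as ModelList);
}
```

Calling `listModelIds()` from the host should return the same IDs that `docker model list` shows, provided the endpoint is reachable.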

Common commands

| Task | Command | Description |
| --- | --- | --- |
| Pull a model | `docker model pull <model>` | Fetch from the model catalog. |
| Run a model | `docker model run <model> "<prompt>"` | Run a one-shot prompt; omit the prompt to start an interactive chat. |
| List models | `docker model list` | Locally available models. |
| Remove a model | `docker model rm <model>` | Delete local model weights. |
| View logs | `docker model logs` | Inference engine logs (the command takes no model argument). |
| Inspect a model | `docker model inspect <model>` | Metadata: size, quantization, context length. |

Useful flags

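| Flag | Example | Meaning |
| --- | --- | --- |
| (no prompt) | `docker model run ai/llama3.2` | Open an interactive chat session; `docker model run` needs no separate interactive flag. |
| `--gpu` | `docker model install-runner --gpu cuda` | Request GPU acceleration when installing the runner on Docker Engine; Docker Desktop controls GPU use through its settings. |
| `--json` | `docker model list --json` | Machine-readable output. |

Flags change between releases; confirm each one with `docker model <command> --help`.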

Real-world examples

Chat with a local model from your app (OpenAI SDK)

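import OpenAI from "openai";

// Point the SDK at the local Model Runner endpoint instead of api.openai.com.
const client = new OpenAI({
  baseURL: "http://localhost:12434/engines/v1",
  // The SDK requires a non-empty key; any placeholder string will do here.
  apiKey: "not-needed",
});

// Use the same model name that you pulled with `docker model pull`.
const res = await client.chat.completions.create({
  model: "ai/llama3.2",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);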
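
If you prefer not to add the SDK, the same call can be sketched with plain `fetch`. This assumes the endpoint accepts the standard OpenAI chat completions request (a `POST` to `/chat/completions` with `model` and `messages`); `buildChatRequest` and `chat` are hypothetical helper names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the URL and fetch options for a chat completion request.
function buildChatRequest(baseUrl: string, model: string, messages: ChatMessage[]) {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send a single user prompt and return the first reply's text.
async function chat(baseUrl: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(baseUrl, model, [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Chat request failed with HTTP ${res.status}`);
  }
  const body = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return body.choices[0].message.content;
}
```

For example, `chat("http://localhost:12434/engines/v1", "ai/llama3.2", "Hello!")` resolves to the model's reply text.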

Pull a small model and run a quick prompt

docker model pull ai/phi3.5
docker model run ai/phi3.5 "Write a haiku about containers."

Reach the endpoint from another container (Docker Desktop)

curl http://model-runner.docker.internal/engines/v1/models
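
An app that runs both on the host and in a container needs to choose between the two addresses. A minimal sketch, assuming the caller already knows where it is running; `modelRunnerBaseUrl` and its `override` parameter are hypothetical names.

```typescript
// The two addresses used in this cheatsheet.
const HOST_BASE_URL = "http://localhost:12434/engines/v1";
const CONTAINER_BASE_URL = "http://model-runner.docker.internal/engines/v1";

// Return the base URL to hand to an OpenAI-compatible client.
// An explicit override (for example from configuration) wins;
// trailing slashes are trimmed so paths can be appended safely.
function modelRunnerBaseUrl(runningInContainer: boolean, override?: string): string {
  if (override) {
    return override.replace(/\/+$/, "");
  }
  return runningInContainer ? CONTAINER_BASE_URL : HOST_BASE_URL;
}
```

Pass the result as `baseURL` when constructing the OpenAI client so the rest of the app code stays identical in both environments.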

Best practices

  • Start with small models (3B–7B) before pulling 70B weights; large models consume disk and RAM quickly.
  • Use the OpenAI-compatible endpoint so your app code stays portable.
  • Remove unused models periodically with `docker model rm` to reclaim disk.
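
To get a feel for why model size matters, the weights alone take roughly (parameter count × bits per weight ÷ 8) bytes. The sketch below is a back-of-the-envelope estimate only: it ignores the KV cache, runtime overhead, and quantization metadata, and `estimateWeightsGB` is an illustrative name.

```typescript
// Rough size of the model weights in gigabytes (1 GB = 1e9 bytes).
// paramsBillions: parameter count in billions (e.g. 7 for a 7B model).
// bitsPerWeight: 16 for fp16, 8 or 4 for common quantized variants.
function estimateWeightsGB(paramsBillions: number, bitsPerWeight: number): number {
  // billions of parameters * bytes per parameter = gigabytes
  return (paramsBillions * bitsPerWeight) / 8;
}

console.log(estimateWeightsGB(3, 4));   // 1.5
console.log(estimateWeightsGB(7, 4));   // 3.5
console.log(estimateWeightsGB(70, 4));  // 35
console.log(estimateWeightsGB(70, 16)); // 140
```

Actual downloads and memory use will differ, so treat these numbers as a lower bound when deciding what fits on your machine.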

Troubleshooting

`docker: 'model' is not a docker command`

Enable Docker Model Runner in Docker Desktop settings, then restart Docker Desktop.

Inference is very slow

Confirm that GPU-backed inference is enabled in Docker Desktop (where your hardware supports it), and try a smaller or more heavily quantized variant.

Port 12434 unreachable from a container

Inside a container, `localhost` refers to the container itself, not the host. Use the Docker Desktop hostname `model-runner.docker.internal` instead.
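
When it is unclear which address is reachable, one option is to probe the candidates in order and keep the first that answers. This is a sketch under the assumption that a successful `GET <base>/models` means the endpoint is usable; `Probe`, `httpProbe`, and `firstReachable` are hypothetical names.

```typescript
// A probe answers "is this base URL usable?" for one candidate.
type Probe = (baseUrl: string) => Promise<boolean>;

// Default probe: true when GET <base>/models returns a 2xx status.
const httpProbe: Probe = async (baseUrl) => {
  try {
    const res = await fetch(`${baseUrl}/models`);
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, and similar errors
  }
};

// Try each candidate in order; return the first reachable one, or null.
async function firstReachable(
  candidates: string[],
  probe: Probe = httpProbe,
): Promise<string | null> {
  for (const baseUrl of candidates) {
    if (await probe(baseUrl)) {
      return baseUrl;
    }
  }
  return null;
}
```

A typical call would pass `["http://localhost:12434/engines/v1", "http://model-runner.docker.internal/engines/v1"]` and fail fast with a clear message when the result is `null`.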
