diff --git a/docsv2/selfhosted-production.md b/docsv2/selfhosted-production.md
index 1783a3ac..bb5385ce 100644
--- a/docsv2/selfhosted-production.md
+++ b/docsv2/selfhosted-production.md
@@ -85,6 +85,21 @@ Optionally add `--ollama-gpu` or `--ollama-cpu` for a **local Ollama instance**
 | `--llm-model MODEL` | Choose which Ollama model to download (default: `qwen2.5:14b`) | `--ollama-gpu` or `--ollama-cpu` |
 | *(omitted)* | User configures external LLM (OpenAI, Anthropic, etc.) | LLM API key |
+### macOS / Apple Silicon
+
+`--ollama-gpu` requires an NVIDIA GPU and **does not work on macOS**. Docker on macOS cannot access Apple GPU acceleration, so the containerized Ollama runs on CPU only, regardless of which flag is used.
+
+For the best performance on a Mac, we recommend running Ollama **natively outside Docker** (install it from https://ollama.com), which gives Ollama direct access to Apple Metal GPU acceleration. Then omit `--ollama-gpu`/`--ollama-cpu` from the setup script and point the backend at your local Ollama instance:
+
+```env
+# In server/.env
+LLM_URL=http://host.docker.internal:11434/v1
+LLM_MODEL=qwen2.5:14b
+LLM_API_KEY=not-needed
+```
+
+`--ollama-cpu` does work on macOS, but it will be significantly slower than a native Ollama install with Metal acceleration.
+
 ### Choosing an Ollama model
 
 The default model is `qwen2.5:14b` (a ~9GB download with good multilingual support and summary quality). Override with `--llm-model`:
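
As a quick sanity check for the setup this diff documents, one might verify both halves of the path: that the native Ollama server is up, and that a container can reach it through `host.docker.internal`. This is a sketch against a live environment; it assumes Ollama's default port `11434` and a Compose service named `server` (the service name is hypothetical, taken from the `server/.env` path, not from this repo).

```shell
# 1. Confirm native Ollama is serving on the host and the model is pulled
#    (/api/tags lists locally available models)
curl -s http://localhost:11434/api/tags

# 2. Confirm the backend container can reach the host's Ollama via the
#    OpenAI-compatible endpoint configured in server/.env
#    ("server" is an assumed Compose service name)
docker compose exec server curl -s http://host.docker.internal:11434/v1/models
```

If step 2 fails while step 1 succeeds, the usual cause is that `host.docker.internal` is only provided by Docker Desktop; on other Docker setups the container needs an explicit `extra_hosts` mapping.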