refactor: move Ollama services to docker-compose.standalone.yml

Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone
deployment; Mac devs never use them. A separate file keeps the main compose
file clean and gives future standalone services (MinIO, etc.) a natural home.

Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
Mac:   docker compose up -d   (native Ollama, no standalone file needed)
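The platform split above boils down to a small decision: Mac uses only the main compose file, and Linux layers the standalone file on top with the GPU or CPU profile. The sketch below prints the matching command rather than running it. The `compose_cmd` function is hypothetical (it is not part of this commit or the setup script), and treating a working `nvidia-smi` as "NVIDIA GPU present" is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit: print the compose command
# for a given OS name and GPU availability, following the commit message.
set -euo pipefail

compose_cmd() {
  local os="$1"      # e.g. output of `uname -s`: "Darwin" or "Linux"
  local has_gpu="$2" # "yes" if an NVIDIA GPU is usable, otherwise "no"
  if [ "$os" = "Darwin" ]; then
    # Mac: Ollama runs natively, so only the main compose file is used.
    echo "docker compose up -d"
    return
  fi
  local profile="ollama-cpu"
  if [ "$has_gpu" = "yes" ]; then
    profile="ollama-gpu"
  fi
  # Linux: layer the standalone file over the main one and select a profile.
  echo "docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile $profile up -d"
}

# Assumption: a successful `nvidia-smi` call means GPU passthrough will work.
gpu="no"
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  gpu="yes"
fi
compose_cmd "$(uname -s)" "$gpu"
```

Printing instead of executing keeps the choice inspectable; piping the output to `sh` (or replacing `echo` with a direct call) would actually start the stack.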
2026-05-06 11:15:18 +00:00 · 2026-02-10 16:02:28 -05:00
parent 663345ece6
commit 33a93db802
5 changed files with 65 additions and 78 deletions
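The PRD diff below elides the bodies of the two Ollama services in `docker-compose.standalone.yml`. The fragment below is a sketch of what that file could contain, assuming the `ollama`, `ollama-cpu`, and `ollama_data` definitions removed from the PRD were carried over unchanged. The shared `x-ollama-base` extension field and YAML merge keys are an illustrative choice to avoid repeating ports, volumes, and healthcheck; nothing in this commit confirms the real file uses them.

```yaml
# Sketch only: assumes the service definitions removed from the PRD moved here as-is.
# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d

# Hypothetical shared block; Compose ignores top-level keys prefixed with "x-".
x-ollama-base: &ollama-base
  image: ollama/ollama:latest
  ports:
    - "11434:11434"
  volumes:
    - ollama_data:/root/.ollama
  restart: unless-stopped
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
    interval: 10s
    timeout: 5s
    retries: 5

services:
  ollama:
    <<: *ollama-base
    profiles: ["ollama-gpu"]
    # NVIDIA GPU passthrough
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  ollama-cpu:
    <<: *ollama-base
    profiles: ["ollama-cpu"]   # CPU-only fallback

volumes:
  ollama_data:
```

Both services publish port 11434 and share `ollama_data`, which is safe only because their profiles are activated one at a time.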
--- a/docs/01_ollama.prd.md
+++ b/docs/01_ollama.prd.md
@@ -190,53 +190,31 @@ LLM_API_KEY=not-needed
 LLM_CONTEXT_WINDOW=16000
 ```

-### Docker Compose additions
+### Docker Compose changes

+**`docker-compose.yml`** — `extra_hosts` added to `server` and `hatchet-worker-llm` so containers can reach host Ollama on Mac:
 ```yaml
+  hatchet-worker-llm:
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+```
+
+**`docker-compose.standalone.yml`** — Ollama services for Linux (not in main compose, only used with `-f`):
+```yaml
+# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
 services:
  ollama:
    image: ollama/ollama:latest
    profiles: ["ollama-gpu"]
-    ports:
-      - "11434:11434"
-    volumes:
-      - ollama_data:/root/.ollama
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: all
-              capabilities: [gpu]
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
-      interval: 10s
-      timeout: 5s
-      retries: 5
-
+    # ... NVIDIA GPU passthrough
  ollama-cpu:
    image: ollama/ollama:latest
    profiles: ["ollama-cpu"]
-    ports:
-      - "11434:11434"
-    volumes:
-      - ollama_data:/root/.ollama
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
-      interval: 10s
-      timeout: 5s
-      retries: 5
-
-  hatchet-worker-llm:
-    extra_hosts:
-      - "host.docker.internal:host-gateway"
-
-volumes:
-  ollama_data:
+    # ... CPU-only fallback
 ```

+Mac devs never touch `docker-compose.standalone.yml` — Ollama runs natively. The standalone file is for Linux deployment and will grow to include other local-only services (e.g. MinIO for S3) as the standalone story expands.
+
 ### Known gotchas

 1. **OrbStack `host.docker.internal`**: OrbStack uses `host.internal` by default, but also supports `host.docker.internal` with `extra_hosts: host-gateway`.
--- a/docs/docs/installation/local-dev-setup.md
+++ b/docs/docs/installation/local-dev-setup.md
@@ -27,7 +27,7 @@ The script is idempotent — safe to re-run at any time. It detects what's alrea

 **Mac**: starts Ollama natively (Metal GPU acceleration). Pulls the LLM model. Docker containers reach it via `host.docker.internal:11434`.

-**Linux**: starts containerized Ollama via docker-compose profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.
+**Linux**: starts containerized Ollama via `docker-compose.standalone.yml` profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.

 Configures `server/.env`:
 ```