refactor: move Ollama services to docker-compose.standalone.yml

Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone
deployment. Mac devs never use them. Separate file keeps the main
compose clean and provides a natural home for future standalone services
(MinIO, etc.).

Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
Mac: docker compose up -d (native Ollama, no standalone file needed)
Igor Loskutov
2026-02-10 16:02:28 -05:00
parent 663345ece6
commit 33a93db802
5 changed files with 65 additions and 78 deletions


@@ -0,0 +1,45 @@
# Standalone services for fully local deployment (no external dependencies).
# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml up -d
#
# On Linux with NVIDIA GPU, also pass: --profile ollama-gpu
# On Linux without GPU: --profile ollama-cpu
# On Mac: Ollama runs natively (Metal GPU) — no profile needed, services here unused.
services:
  ollama:
    image: ollama/ollama:latest
    profiles: ["ollama-gpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
  ollama-cpu:
    image: ollama/ollama:latest
    profiles: ["ollama-cpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  ollama_data:
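
The usage comments at the top of this file can be wrapped in a small launcher that picks the profile automatically. A sketch, under the assumption that a working `nvidia-smi` on PATH is a reliable signal that the NVIDIA container toolkit is set up for the GPU profile:

```shell
#!/bin/sh
# Sketch: choose the Ollama compose profile based on GPU availability.
# Assumption: nvidia-smi present and working implies the gpu profile is usable.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  PROFILE=ollama-gpu
else
  PROFILE=ollama-cpu
fi
# Print the command instead of running it, to keep the sketch side-effect free.
echo "docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile $PROFILE up -d"
```

Pipe the output to `sh` to actually start the stack.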


@@ -132,44 +132,6 @@ services:
      retries: 5
      start_period: 30s
  ollama:
    image: ollama/ollama:latest
    profiles: ["ollama-gpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
  ollama-cpu:
    image: ollama/ollama:latest
    profiles: ["ollama-cpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  ollama_data:
networks:
  default:
    attachable: true


@@ -190,53 +190,31 @@ LLM_API_KEY=not-needed
LLM_CONTEXT_WINDOW=16000
```
### Docker Compose additions
### Docker Compose changes
**`docker-compose.yml`** — `extra_hosts` added to `server` and `hatchet-worker-llm` so containers can reach host Ollama on Mac:
```yaml
hatchet-worker-llm:
  extra_hosts:
    - "host.docker.internal:host-gateway"
```
**`docker-compose.standalone.yml`** — Ollama services for Linux (not in main compose, only used with `-f`):
```yaml
# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
services:
  ollama:
    image: ollama/ollama:latest
    profiles: ["ollama-gpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
    # ... NVIDIA GPU passthrough
  ollama-cpu:
    image: ollama/ollama:latest
    profiles: ["ollama-cpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
  hatchet-worker-llm:
    extra_hosts:
      - "host.docker.internal:host-gateway"
volumes:
  ollama_data:
  # ... CPU-only fallback
```
Mac devs never touch `docker-compose.standalone.yml` — Ollama runs natively. The standalone file is for Linux deployment and will grow to include other local-only services (e.g. MinIO for S3) as the standalone story expands.
### Known gotchas
1. **OrbStack `host.docker.internal`**: OrbStack uses `host.internal` by default, but also supports `host.docker.internal` with `extra_hosts: host-gateway`.


@@ -27,7 +27,7 @@ The script is idempotent — safe to re-run at any time. It detects what's alrea
**Mac**: starts Ollama natively (Metal GPU acceleration). Pulls the LLM model. Docker containers reach it via `host.docker.internal:11434`.
**Linux**: starts containerized Ollama via docker-compose profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.
**Linux**: starts containerized Ollama via `docker-compose.standalone.yml` profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.
Configures `server/.env`:
```


@@ -69,8 +69,10 @@ case "$OS" in
      LLM_URL="http://ollama-cpu:$OLLAMA_PORT/v1"
    fi
    COMPOSE="docker compose -f docker-compose.yml -f docker-compose.standalone.yml"
    echo "Starting Ollama container..."
    docker compose --profile "$PROFILE" up -d
    $COMPOSE --profile "$PROFILE" up -d
    # Determine container name
    if [ "$PROFILE" = "ollama-gpu" ]; then
@@ -82,7 +84,7 @@ case "$OS" in
    wait_for_ollama "http://localhost:$OLLAMA_PORT"
    echo "Pulling model $MODEL..."
    docker compose exec "$SVC" ollama pull "$MODEL"
    $COMPOSE exec "$SVC" ollama pull "$MODEL"
    echo ""
    echo "Done. Add to server/.env:"
@@ -90,7 +92,7 @@ case "$OS" in
    echo " LLM_MODEL=$MODEL"
    echo " LLM_API_KEY=not-needed"
    echo ""
    echo "Then: docker compose --profile $PROFILE up -d"
    echo "Then: $COMPOSE --profile $PROFILE up -d"
    ;;
  *)
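
The script's `wait_for_ollama` helper is not shown in this hunk. A plausible sketch (the real implementation may differ; retry count and delay here are assumptions) that polls the same `/api/tags` endpoint the compose healthchecks use:

```shell
#!/bin/sh
# Sketch of a wait_for_ollama helper: poll /api/tags until it responds
# or the retries run out. Usage: wait_for_ollama URL [RETRIES] [DELAY]
wait_for_ollama() {
  url="$1"
  retries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -fsS "$url/api/tags" >/dev/null 2>&1; then
      echo "Ollama is ready at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for Ollama at $url" >&2
  return 1
}
```

Returning nonzero on timeout lets the caller abort with `wait_for_ollama "$URL" || exit 1` instead of hanging forever.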