Mirror of https://github.com/Monadical-SAS/reflector.git — synced 2026-04-22 05:05:18 +00:00
refactor: move Ollama services to docker-compose.standalone.yml
Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone deployment. Mac devs never use them. A separate file keeps the main compose clean and provides a natural home for future standalone services (MinIO, etc.).

Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
Mac:   docker compose up -d (native Ollama, no standalone file needed)
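The platform split described in the commit message can be sketched as a small helper. This is illustrative only; the function name `pick_compose_cmd` and the `uname`/`nvidia-smi`-style detection are assumptions, not code from the repo:

```shell
#!/bin/sh
# Hypothetical helper: print the compose invocation for a given platform.
# Arguments: OS name (as from `uname -s`) and "yes"/"no" for an NVIDIA GPU.
pick_compose_cmd() {
    os="$1"; has_gpu="$2"
    case "$os" in
        Linux)
            # Linux uses the standalone file plus a profile.
            if [ "$has_gpu" = "yes" ]; then profile="ollama-gpu"; else profile="ollama-cpu"; fi
            echo "docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile $profile up -d"
            ;;
        Darwin)
            # Mac: Ollama runs natively, the standalone file is never loaded.
            echo "docker compose up -d"
            ;;
    esac
}

pick_compose_cmd Linux yes
pick_compose_cmd Darwin no
```

The helper only echoes the command, so it is safe to run anywhere to see which invocation a machine would use.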
docker-compose.standalone.yml (new file, 45 lines)
@@ -0,0 +1,45 @@
+# Standalone services for fully local deployment (no external dependencies).
+# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml up -d
+#
+# On Linux with NVIDIA GPU, also pass: --profile ollama-gpu
+# On Linux without GPU: --profile ollama-cpu
+# On Mac: Ollama runs natively (Metal GPU) — no profile needed, services here unused.
+
+services:
+  ollama:
+    image: ollama/ollama:latest
+    profiles: ["ollama-gpu"]
+    ports:
+      - "11434:11434"
+    volumes:
+      - ollama_data:/root/.ollama
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  ollama-cpu:
+    image: ollama/ollama:latest
+    profiles: ["ollama-cpu"]
+    ports:
+      - "11434:11434"
+    volumes:
+      - ollama_data:/root/.ollama
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+volumes:
+  ollama_data:
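The healthcheck above polls `/api/tags` every 10s with up to 5 retries; the same wait-until-ready pattern is useful from the host side before pulling models. A minimal sketch, where the `retry` helper is hypothetical (the repo's script has its own `wait_for_ollama`):

```shell
#!/bin/sh
# retry N CMD... — run CMD until it succeeds, at most N attempts, 1s apart.
retry() {
    attempts="$1"; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then return 0; fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example: wait for the container's Ollama API, mirroring the compose healthcheck:
# retry 5 curl -sf http://localhost:11434/api/tags >/dev/null
```

The `curl -f` in the healthcheck and the `-sf` here both treat HTTP errors as failures, so the loop keeps retrying until the API actually answers.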
@@ -132,44 +132,6 @@ services:
       retries: 5
       start_period: 30s
 
-  ollama:
-    image: ollama/ollama:latest
-    profiles: ["ollama-gpu"]
-    ports:
-      - "11434:11434"
-    volumes:
-      - ollama_data:/root/.ollama
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: all
-              capabilities: [gpu]
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
-      interval: 10s
-      timeout: 5s
-      retries: 5
-
-  ollama-cpu:
-    image: ollama/ollama:latest
-    profiles: ["ollama-cpu"]
-    ports:
-      - "11434:11434"
-    volumes:
-      - ollama_data:/root/.ollama
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
-      interval: 10s
-      timeout: 5s
-      retries: 5
-
-volumes:
-  ollama_data:
-
 networks:
   default:
     attachable: true
@@ -190,53 +190,31 @@ LLM_API_KEY=not-needed
 LLM_CONTEXT_WINDOW=16000
 ```
 
-### Docker Compose additions
+### Docker Compose changes
 
+**`docker-compose.yml`** — `extra_hosts` added to `server` and `hatchet-worker-llm` so containers can reach host Ollama on Mac:
 ```yaml
+hatchet-worker-llm:
+  extra_hosts:
+    - "host.docker.internal:host-gateway"
+```
+
+**`docker-compose.standalone.yml`** — Ollama services for Linux (not in main compose, only used with `-f`):
+```yaml
+# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
 services:
   ollama:
     image: ollama/ollama:latest
     profiles: ["ollama-gpu"]
-    ports:
-      - "11434:11434"
-    volumes:
-      - ollama_data:/root/.ollama
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: all
-              capabilities: [gpu]
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
-      interval: 10s
-      timeout: 5s
-      retries: 5
+    # ... NVIDIA GPU passthrough
 
   ollama-cpu:
     image: ollama/ollama:latest
     profiles: ["ollama-cpu"]
-    ports:
-      - "11434:11434"
-    volumes:
-      - ollama_data:/root/.ollama
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
-      interval: 10s
-      timeout: 5s
-      retries: 5
-
-  hatchet-worker-llm:
-    extra_hosts:
-      - "host.docker.internal:host-gateway"
-
-volumes:
-  ollama_data:
+    # ... CPU-only fallback
 ```
 
+Mac devs never touch `docker-compose.standalone.yml` — Ollama runs natively. The standalone file is for Linux deployment and will grow to include other local-only services (e.g. MinIO for S3) as the standalone story expands.
+
 ### Known gotchas
 
 1. **OrbStack `host.docker.internal`**: OrbStack uses `host.internal` by default, but also supports `host.docker.internal` with `extra_hosts: host-gateway`.
@@ -27,7 +27,7 @@ The script is idempotent — safe to re-run at any time. It detects what's alrea
 
 **Mac**: starts Ollama natively (Metal GPU acceleration). Pulls the LLM model. Docker containers reach it via `host.docker.internal:11434`.
 
-**Linux**: starts containerized Ollama via docker-compose profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.
+**Linux**: starts containerized Ollama via `docker-compose.standalone.yml` profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.
 
 Configures `server/.env`:
 ```
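The Mac/Linux split above implies a different LLM base URL per deployment path. As a sketch: the helper name `llm_url_for` is hypothetical, only the `ollama-cpu` URL appears verbatim in the script, and the GPU-path hostname `ollama` and the Mac `/v1` suffix are inferred from the service names and docs:

```shell
#!/bin/sh
# Hypothetical mapping from deployment path to the LLM base URL that
# containers would use. Hostnames for the gpu/mac paths are assumptions.
llm_url_for() {
    case "$1" in
        mac)       echo "http://host.docker.internal:11434/v1" ;;
        linux-gpu) echo "http://ollama:11434/v1" ;;
        linux-cpu) echo "http://ollama-cpu:11434/v1" ;;
        *)         return 1 ;;
    esac
}

llm_url_for linux-cpu
```

On Linux the hostname is a compose service name (same Docker network); on Mac it is the host-gateway alias, which is why the `extra_hosts` entry matters there.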
@@ -69,8 +69,10 @@ case "$OS" in
             LLM_URL="http://ollama-cpu:$OLLAMA_PORT/v1"
         fi
 
+        COMPOSE="docker compose -f docker-compose.yml -f docker-compose.standalone.yml"
+
         echo "Starting Ollama container..."
-        docker compose --profile "$PROFILE" up -d
+        $COMPOSE --profile "$PROFILE" up -d
 
         # Determine container name
         if [ "$PROFILE" = "ollama-gpu" ]; then
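Expanding `$COMPOSE` unquoted, as the script does, relies on shell word splitting to turn the stored string back into separate argv words. A quick sketch of what `$COMPOSE --profile ... up -d` expands to:

```shell
#!/bin/sh
COMPOSE="docker compose -f docker-compose.yml -f docker-compose.standalone.yml"
# Unquoted expansion splits on whitespace into separate argv words:
set -- $COMPOSE --profile ollama-gpu up -d
echo "$# args, starting with: $1 $2"
```

This pattern is fine here because none of the words contain spaces; if the compose file paths could, a wrapper function (or a bash array) would be the more robust choice.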
@@ -82,7 +84,7 @@ case "$OS" in
         wait_for_ollama "http://localhost:$OLLAMA_PORT"
 
         echo "Pulling model $MODEL..."
-        docker compose exec "$SVC" ollama pull "$MODEL"
+        $COMPOSE exec "$SVC" ollama pull "$MODEL"
 
         echo ""
         echo "Done. Add to server/.env:"
@@ -90,7 +92,7 @@ case "$OS" in
         echo " LLM_MODEL=$MODEL"
         echo " LLM_API_KEY=not-needed"
         echo ""
-        echo "Then: docker compose --profile $PROFILE up -d"
+        echo "Then: $COMPOSE --profile $PROFILE up -d"
         ;;
 
     *)