refactor: move Ollama services to docker-compose.standalone.yml

Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone
deployment. Mac devs never use them. Separate file keeps the main
compose clean and provides a natural home for future standalone services
(MinIO, etc.).

Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
Mac: docker compose up -d (native Ollama, no standalone file needed)
Igor Loskutov
2026-02-10 16:02:28 -05:00
parent 663345ece6
commit 33a93db802
5 changed files with 65 additions and 78 deletions


@@ -0,0 +1,45 @@
# Standalone services for fully local deployment (no external dependencies).
# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml up -d
#
# On Linux with NVIDIA GPU, also pass: --profile ollama-gpu
# On Linux without GPU: --profile ollama-cpu
# On Mac: Ollama runs natively (Metal GPU) — no profile needed, services here unused.
services:
  ollama:
    image: ollama/ollama:latest
    profiles: ["ollama-gpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
  ollama-cpu:
    image: ollama/ollama:latest
    profiles: ["ollama-cpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  ollama_data:
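
The usage comments at the top of this file can be wrapped in a small launcher that picks the profile automatically. A sketch, under the assumption that a working `nvidia-smi` on PATH is a reliable signal that the NVIDIA container toolkit is set up for the GPU profile:

```shell
#!/bin/sh
# Sketch: choose the Ollama compose profile based on GPU availability.
# Assumption: nvidia-smi present and working implies the gpu profile is usable.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  PROFILE=ollama-gpu
else
  PROFILE=ollama-cpu
fi
# Print the command instead of running it, to keep the sketch side-effect free.
echo "docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile $PROFILE up -d"
```

Pipe the output to `sh` to actually start the stack.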


@@ -132,44 +132,6 @@ services:
      retries: 5
      start_period: 30s
  ollama:
    image: ollama/ollama:latest
    profiles: ["ollama-gpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
  ollama-cpu:
    image: ollama/ollama:latest
    profiles: ["ollama-cpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  ollama_data:
networks:
  default:
    attachable: true


@@ -190,53 +190,31 @@ LLM_API_KEY=not-needed
LLM_CONTEXT_WINDOW=16000
```
### Docker Compose additions
### Docker Compose changes
**`docker-compose.yml`** — `extra_hosts` added to `server` and `hatchet-worker-llm` so containers can reach host Ollama on Mac:
```yaml
hatchet-worker-llm:
  extra_hosts:
    - "host.docker.internal:host-gateway"
```
**`docker-compose.standalone.yml`** — Ollama services for Linux (not in main compose, only used with `-f`):
```yaml
# Usage: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
services:
  ollama:
    image: ollama/ollama:latest
    profiles: ["ollama-gpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
    # ... NVIDIA GPU passthrough
  ollama-cpu:
    image: ollama/ollama:latest
    profiles: ["ollama-cpu"]
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 10s
      timeout: 5s
      retries: 5
  hatchet-worker-llm:
    extra_hosts:
      - "host.docker.internal:host-gateway"
volumes:
  ollama_data:
  # ... CPU-only fallback
```
Mac devs never touch `docker-compose.standalone.yml` — Ollama runs natively. The standalone file is for Linux deployment and will grow to include other local-only services (e.g. MinIO for S3) as the standalone story expands.
### Known gotchas
1. **OrbStack `host.docker.internal`**: OrbStack uses `host.internal` by default, but also supports `host.docker.internal` with `extra_hosts: host-gateway`.


@@ -27,7 +27,7 @@ The script is idempotent — safe to re-run at any time. It detects what's alrea
**Mac**: starts Ollama natively (Metal GPU acceleration). Pulls the LLM model. Docker containers reach it via `host.docker.internal:11434`.
**Linux**: starts containerized Ollama via docker-compose profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.
**Linux**: starts containerized Ollama via `docker-compose.standalone.yml` profile (`ollama-gpu` with NVIDIA, `ollama-cpu` without). Pulls model inside the container.
Configures `server/.env`:
```


@@ -69,8 +69,10 @@ case "$OS" in
      LLM_URL="http://ollama-cpu:$OLLAMA_PORT/v1"
    fi
    COMPOSE="docker compose -f docker-compose.yml -f docker-compose.standalone.yml"
    echo "Starting Ollama container..."
    docker compose --profile "$PROFILE" up -d
    $COMPOSE --profile "$PROFILE" up -d
    # Determine container name
    if [ "$PROFILE" = "ollama-gpu" ]; then
@@ -82,7 +84,7 @@ case "$OS" in
    wait_for_ollama "http://localhost:$OLLAMA_PORT"
    echo "Pulling model $MODEL..."
    docker compose exec "$SVC" ollama pull "$MODEL"
    $COMPOSE exec "$SVC" ollama pull "$MODEL"
    echo ""
    echo "Done. Add to server/.env:"
@@ -90,7 +92,7 @@ case "$OS" in
    echo " LLM_MODEL=$MODEL"
    echo " LLM_API_KEY=not-needed"
    echo ""
    echo "Then: docker compose --profile $PROFILE up -d"
    echo "Then: $COMPOSE --profile $PROFILE up -d"
    ;;
  *)
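
The script's `wait_for_ollama` helper is not shown in this hunk. A plausible sketch (the real implementation may differ; retry count and delay here are assumptions) that polls the same `/api/tags` endpoint the compose healthchecks use:

```shell
#!/bin/sh
# Sketch of a wait_for_ollama helper: poll /api/tags until it responds
# or the retries run out. Usage: wait_for_ollama URL [RETRIES] [DELAY]
wait_for_ollama() {
  url="$1"
  retries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -fsS "$url/api/tags" >/dev/null 2>&1; then
      echo "Ollama is ready at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for Ollama at $url" >&2
  return 1
}
```

Returning nonzero on timeout lets the caller abort with `wait_for_ollama "$URL" || exit 1` instead of hanging forever.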