
# Standalone GPU Host Setup

Deploy Reflector's GPU transcription/diarization/translation service on a dedicated machine, separate from the main Reflector instance. Useful when:

- Your GPU machine is on a different network than the Reflector server
- You want to share one GPU service across multiple Reflector instances
- The GPU machine has special hardware/drivers that can't run the full stack
- You need to scale GPU processing independently

## Architecture

```
┌──────────────────────┐         HTTPS          ┌────────────────────┐
│  Reflector Server    │ ────────────────────── │  GPU Host          │
│  (server, worker,    │  TRANSCRIPT_URL        │  (transcription,   │
│   web, postgres,     │  DIARIZATION_URL       │   diarization,     │
│   redis, hatchet)    │  TRANSLATE_URL         │   translation)     │
│                      │                        │                    │
│  setup-selfhosted.sh │                        │  setup-gpu-host.sh │
│  --hosted            │                        │                    │
└──────────────────────┘                        └────────────────────┘
```

The GPU service is a standalone FastAPI app that exposes transcription, diarization, translation, and audio padding endpoints. It has **no dependencies** on PostgreSQL, Redis, Hatchet, or any other Reflector service.

## Quick Start

### On the GPU machine

```bash
git clone <reflector-repo>
cd reflector

# Set HuggingFace token (required for diarization models)
export HF_TOKEN=your-huggingface-token

# Deploy with HTTPS (Let's Encrypt)
./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key my-secret-key

# Or deploy with custom CA
./scripts/generate-certs.sh gpu.local
./scripts/setup-gpu-host.sh --domain gpu.local --custom-ca certs/ --api-key my-secret-key
```

### On the Reflector machine

```bash
# If the GPU host uses a custom CA, trust it
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
    --extra-ca /path/to/gpu-machine-ca.crt

# Or if you already have --custom-ca for your local domain
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
    --domain reflector.local --custom-ca certs/ \
    --extra-ca /path/to/gpu-machine-ca.crt
```

Then configure `server/.env` to point to the GPU host:

```bash
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://gpu.example.com
TRANSCRIPT_MODAL_API_KEY=my-secret-key

DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://gpu.example.com
DIARIZATION_MODAL_API_KEY=my-secret-key

TRANSLATION_BACKEND=modal
TRANSLATE_URL=https://gpu.example.com
TRANSLATION_MODAL_API_KEY=my-secret-key
```
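
Since all three services point at the same host and share one key, the block above can be produced from two values. A minimal sketch, assuming a hypothetical helper named `print_gpu_env` (it is not part of the Reflector scripts):

```bash
# Hypothetical helper (not part of the Reflector repo): prints the nine
# settings shown above for a single GPU host URL and API key.
print_gpu_env() {
    local url="$1" key="$2"
    local pair prefix url_var
    # Each pair is "<variable prefix>:<name of the URL variable>".
    for pair in TRANSCRIPT:TRANSCRIPT_URL DIARIZATION:DIARIZATION_URL TRANSLATION:TRANSLATE_URL; do
        prefix="${pair%%:*}"
        url_var="${pair##*:}"
        printf '%s_BACKEND=modal\n' "$prefix"
        printf '%s=%s\n' "$url_var" "$url"
        printf '%s_MODAL_API_KEY=%s\n\n' "$prefix" "$key"
    done
}

# Review the output before copying it into server/.env:
# print_gpu_env https://gpu.example.com my-secret-key
```

Note that the translation URL variable is `TRANSLATE_URL`, while its backend and key variables use the `TRANSLATION_` prefix.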

## Script Options

```
./scripts/setup-gpu-host.sh [OPTIONS]

Options:
  --domain DOMAIN    Domain name for HTTPS (Let's Encrypt or custom cert)
  --custom-ca PATH   Custom CA (directory or single PEM file)
  --extra-ca FILE    Additional CA cert to trust (repeatable)
  --api-key KEY      API key to protect the service (strongly recommended)
  --cpu              CPU-only mode (no NVIDIA GPU required)
  --port PORT        Host port (default: 443 with Caddy, 8000 without)
```

## Deployment Scenarios

### Public internet with Let's Encrypt

GPU machine has a public IP and domain:

```bash
./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key my-secret-key
```

Requirements:
- DNS A record: `gpu.example.com` → GPU machine's public IP
- Ports 80 and 443 open
- Caddy auto-provisions Let's Encrypt certificate

### Internal network with custom CA

GPU machine on a private network:

```bash
# Generate certs on the GPU machine
./scripts/generate-certs.sh gpu.internal "IP:192.168.1.200"

# Deploy
./scripts/setup-gpu-host.sh --domain gpu.internal --custom-ca certs/ --api-key my-secret-key
```

On each machine that connects (including the Reflector server), map the name to the GPU machine's IP in `/etc/hosts`:
```bash
echo "192.168.1.200 gpu.internal" | sudo tee -a /etc/hosts
```
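
The `tee -a` command above appends a new line every time it runs. If you script this step, an idempotent variant may be preferable. A minimal sketch, assuming a hypothetical helper `add_hosts_entry` (not part of the repo) that skips the write when the name is already mapped:

```bash
# Hypothetical helper: append "<ip> <name>" to a hosts file only if the
# name is not already present, so repeated runs do not create duplicates.
add_hosts_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Look for the name as a whole field (after the IP) on non-comment lines.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        echo "$name is already mapped in $file"
    elif [ -w "$file" ]; then
        echo "$ip $name" >> "$file"
    else
        # /etc/hosts is normally root-owned, so fall back to sudo.
        echo "$ip $name" | sudo tee -a "$file" >/dev/null
    fi
}

# add_hosts_entry 192.168.1.200 gpu.internal
```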

### IP-only (no domain)

No domain needed — just use the machine's IP:

```bash
./scripts/setup-gpu-host.sh --api-key my-secret-key
```

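Caddy is not used; the GPU service runs directly on port 8000 over plain HTTP. There is no HTTPS in this mode, so the Reflector machine connects via `http://<GPU_IP>:8000`. Traffic, including the API key, is unencrypted, so use this only on a trusted network.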

### CPU-only (no NVIDIA GPU)

Works on any machine — transcription will be slower:

```bash
./scripts/setup-gpu-host.sh --cpu --domain gpu.example.com --api-key my-secret-key
```

## DNS Resolution

The Reflector server must be able to reach the GPU host by name or IP.

| Setup | DNS Method | TRANSCRIPT_URL example |
|-------|------------|----------------------|
| Public domain | DNS A record | `https://gpu.example.com` |
| Internal domain | `/etc/hosts` on both machines | `https://gpu.internal` |
| IP only | No DNS needed | `http://192.168.1.200:8000` |
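
The table boils down to a simple rule, sketched here as a hypothetical helper `gpu_base_url` (not part of the repo). It assumes the default ports; adjust the result if you passed `--port`.

```bash
# Hypothetical helper: derive the value for TRANSCRIPT_URL, DIARIZATION_URL
# and TRANSLATE_URL from how the GPU host was deployed.
gpu_base_url() {
    local host="$1" mode="${2:-domain}"
    case "$mode" in
        domain) echo "https://${host}" ;;       # deployed with --domain: HTTPS via Caddy
        ip)     echo "http://${host}:8000" ;;   # no --domain: plain HTTP on port 8000
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# gpu_base_url gpu.internal         prints https://gpu.internal
# gpu_base_url 192.168.1.200 ip     prints http://192.168.1.200:8000
```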

For internal domains, add the GPU machine's IP to `/etc/hosts` on the Reflector machine:
```bash
echo "192.168.1.200 gpu.internal" | sudo tee -a /etc/hosts
```

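If the Reflector server runs in Docker, confirm that the containers can resolve the name as well. Docker does not copy custom entries from the host's `/etc/hosts` into containers, so the host entry is only sufficient when the containers query a resolver that serves it. If resolution fails inside a container, add the mapping to the affected services with Compose `extra_hosts`.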

## Multi-CA Setup

When your Reflector instance has its own CA (for `reflector.local`) and the GPU host has a different CA:

**On the GPU machine:**
```bash
./scripts/generate-certs.sh gpu.local
./scripts/setup-gpu-host.sh --domain gpu.local --custom-ca certs/ --api-key my-key
```

**On the Reflector machine:**
```bash
# Your local CA for reflector.local + the GPU host's CA
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
    --domain reflector.local \
    --custom-ca certs/ \
    --extra-ca /path/to/gpu-machine-ca.crt
```

The `--extra-ca` flag appends the GPU host's CA to the trust bundle. Backend containers then trust both CAs, so your local domain keeps working and outbound calls to the GPU host succeed.

You can repeat `--extra-ca` for multiple remote services:
```bash
--extra-ca /path/to/gpu-ca.crt --extra-ca /path/to/llm-ca.crt
```
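
To make the idea of a trust bundle concrete: a bundle is simply several PEM certificates concatenated into one file. The sketch below illustrates that general technique with a hypothetical `build_ca_bundle` function; it is not taken from `setup-selfhosted.sh`, which may assemble its bundle differently.

```bash
# Illustration only: concatenate CA files into one PEM bundle and print
# how many certificates it contains.
build_ca_bundle() {
    local out="$1" ca
    shift
    : > "$out"                              # start from an empty bundle
    for ca in "$@"; do
        cat "$ca" >> "$out"
        # Add a newline if the file lacks one, so PEM blocks do not run together.
        if [ -n "$(tail -c 1 "$ca")" ]; then
            echo >> "$out"
        fi
    done
    grep -c 'BEGIN CERTIFICATE' "$out"
}

# build_ca_bundle bundle.pem /path/to/gpu-ca.crt /path/to/llm-ca.crt
```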

## API Key Authentication

The GPU service uses Bearer token authentication via `REFLECTOR_GPU_APIKEY`:

```bash
# Test from the Reflector machine
curl -s https://gpu.example.com/docs                              # No auth needed for docs
# The header goes last: a comment after a trailing "\" would break the line continuation.
curl -s -X POST https://gpu.example.com/v1/audio/transcriptions \
    -F "file=@audio.wav" \
    -H "Authorization: Bearer <my-secret-key>"  #gitleaks:allow
```

If `REFLECTOR_GPU_APIKEY` is not set, the service accepts all requests (open access). Always use `--api-key` for internet-facing deployments.

The same key goes in Reflector's `server/.env` as `TRANSCRIPT_MODAL_API_KEY`, `DIARIZATION_MODAL_API_KEY`, and `TRANSLATION_MODAL_API_KEY`.
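
The examples in this document use `my-secret-key` as a placeholder. For a real deployment you would want a random value. One possible way to generate it, using only standard Unix tools (the function name `generate_api_key` is made up for this sketch):

```bash
# Generate a 64-character hex key from 32 random bytes.
generate_api_key() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# API_KEY="$(generate_api_key)"
# ./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key "$API_KEY"
```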

## Files

| File | Checked in? | Purpose |
|------|-------------|---------|
| `docker-compose.gpu-host.yml` | Yes | Static compose file with profiles (`gpu`, `cpu`, `caddy`) |
| `.env.gpu-host` | No (generated) | Environment variables (HF_TOKEN, API key, ports) |
| `Caddyfile.gpu-host` | No (generated) | Caddy config (only when using HTTPS) |
| `docker-compose.gpu-ca.yml` | No (generated) | CA cert mounts override (only with --custom-ca) |
| `certs/` | No (generated) | Staged certificates (when using --custom-ca) |

The compose file is checked into the repo — you can read it to understand exactly what runs. The script only generates env vars, Caddyfile, and CA overrides. Profiles control which service starts:

```bash
# What the script does under the hood:
docker compose -f docker-compose.gpu-host.yml --profile gpu --profile caddy \
    --env-file .env.gpu-host up -d

# CPU mode:
docker compose -f docker-compose.gpu-host.yml --profile cpu --profile caddy \
    --env-file .env.gpu-host up -d
```

Both `gpu` and `cpu` services get the network alias `transcription`, so Caddy's config works with either.

## Management

```bash
# View logs
docker compose -f docker-compose.gpu-host.yml --profile gpu logs -f gpu

# Restart
docker compose -f docker-compose.gpu-host.yml --profile gpu restart gpu

# Stop
docker compose -f docker-compose.gpu-host.yml --profile gpu --profile caddy down

# Re-run setup
./scripts/setup-gpu-host.sh [same flags]

# Rebuild after code changes
docker compose -f docker-compose.gpu-host.yml --profile gpu build gpu
docker compose -f docker-compose.gpu-host.yml --profile gpu up -d gpu
```

If you deployed with `--custom-ca`, include the CA override in manual commands:
```bash
docker compose -f docker-compose.gpu-host.yml -f docker-compose.gpu-ca.yml \
    --profile gpu logs -f gpu
```
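
Forgetting the override file is easy when typing these commands by hand. A small wrapper can add it automatically. This is a sketch under assumptions: `gpu_compose` and `gpu_compose_args` are hypothetical names, the functions must be run from the directory that holds the compose files, and `mapfile` requires bash 4 or newer.

```bash
# Hypothetical helper: print the compose arguments used in this document,
# one per line, adding the CA override only when the generated file exists.
gpu_compose_args() {
    local profile="${1:-gpu}"               # "gpu" or "cpu"
    local args=(-f docker-compose.gpu-host.yml)
    if [ -f docker-compose.gpu-ca.yml ]; then
        args+=(-f docker-compose.gpu-ca.yml)
    fi
    args+=(--profile "$profile")
    printf '%s\n' "${args[@]}"
}

# Hypothetical wrapper: run any compose subcommand with those arguments.
gpu_compose() {
    local profile="$1" args
    shift
    mapfile -t args < <(gpu_compose_args "$profile")
    docker compose "${args[@]}" "$@"
}

# gpu_compose gpu logs -f gpu
```

The wrapper does not add `--profile caddy`; extend it if you want commands such as `down` to cover the Caddy container too.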

## Troubleshooting

### GPU service won't start

Check logs:
```bash
docker compose -f docker-compose.gpu-host.yml --profile gpu logs gpu
```

Common causes:
- NVIDIA driver not installed or `nvidia-container-toolkit` missing
- `HF_TOKEN` not set (diarization model download fails)
- Port already in use
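
The causes above can be checked before digging through logs. A rough pre-flight sketch follows; the function names are hypothetical, and it assumes that `.env.gpu-host` stores the token as a plain `HF_TOKEN=...` line, that the service port is 8000 unless you pass another, and that `ss` (from iproute2) is available.

```bash
# Succeeds if KEY is assigned a non-empty value in the env file.
env_has_value() {
    local key="$1" file="$2"
    grep -Eq "^${key}=.+" "$file" 2>/dev/null
}

# Hypothetical pre-flight check: prints a warning for each likely problem.
gpu_preflight() {
    local env_file="${1:-.env.gpu-host}" port="${2:-8000}"
    command -v nvidia-smi >/dev/null 2>&1 \
        || echo "WARN: nvidia-smi not found (NVIDIA driver missing?)"
    env_has_value HF_TOKEN "$env_file" \
        || echo "WARN: HF_TOKEN is empty or missing in $env_file"
    # ss -ltn lists listening TCP sockets; a match means the port is taken.
    if command -v ss >/dev/null 2>&1 && ss -ltn | grep -Eq ":${port}[[:space:]]"; then
        echo "WARN: port $port is already in use"
    fi
}

# gpu_preflight .env.gpu-host 8000
```

The port warning also fires when the GPU service itself is already running, so interpret it in context. The sketch does not verify `nvidia-container-toolkit`.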

### Reflector can't connect to GPU host

From the Reflector machine:
```bash
# Test HTTPS connectivity
curl -v https://gpu.example.com/docs

# If using custom CA, test with explicit CA
curl --cacert /path/to/gpu-ca.crt https://gpu.internal/docs
```

From inside the Reflector container:
```bash
docker compose exec server python -c "
import httpx
r = httpx.get('https://gpu.internal/docs')
print(r.status_code)
"
```

### SSL: CERTIFICATE_VERIFY_FAILED

The Reflector backend doesn't trust the GPU host's CA. Fix:
```bash
# Re-run Reflector setup with your original flags plus the GPU host's CA
./scripts/setup-selfhosted.sh --hosted --extra-ca /path/to/gpu-ca.crt
```
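
Before re-running setup, it can help to confirm which CA issued the certificate the GPU host is serving and whether the CA file you have matches it. A sketch using standard `openssl` subcommands; the function names are made up for this example and the paths are placeholders:

```bash
# Print subject, issuer and expiry of the certificate served by the host.
show_gpu_cert() {
    local host="$1" port="${2:-443}"
    openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -issuer -enddate
}

# Succeeds if the certificate file ($2) chains to the CA file ($1).
ca_signs() {
    openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# show_gpu_cert gpu.internal
# ca_signs /path/to/gpu-ca.crt /path/to/server.crt && echo "CA matches"
```

If the issuer printed by `show_gpu_cert` is not the CA you passed to `--extra-ca`, the Reflector side is trusting the wrong file.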

### Diarization returns errors

- Accept pyannote model licenses on HuggingFace:
  - https://huggingface.co/pyannote/speaker-diarization-3.1
  - https://huggingface.co/pyannote/segmentation-3.0
- Verify `HF_TOKEN` is set in `.env.gpu-host`