Files
reflector/docsv2/gpu-host-setup.md
Juan Diego García 12bf0c2d77 feat: custom ca for caddy (#931)
* fix: send email on transcript page permissions fixed

* feat: custom ca for caddy
2026-03-30 11:42:39 -05:00

295 lines
9.6 KiB
Markdown

# Standalone GPU Host Setup
Deploy Reflector's GPU transcription/diarization/translation service on a dedicated machine, separate from the main Reflector instance. Useful when:
- Your GPU machine is on a different network than the Reflector server
- You want to share one GPU service across multiple Reflector instances
- The GPU machine has special hardware/drivers that can't run the full stack
- You need to scale GPU processing independently
## Architecture
```
┌─────────────────────┐ HTTPS ┌────────────────────┐
│ Reflector Server │ ────────────────────── │ GPU Host │
│ (server, worker, │ TRANSCRIPT_URL │ (transcription, │
│ web, postgres, │ DIARIZATION_URL │ diarization, │
│ redis, hatchet) │ TRANSLATE_URL │ translation) │
│ │ │ │
│ setup-selfhosted.sh │ │ setup-gpu-host.sh │
│ --hosted │ │ │
└─────────────────────┘ └────────────────────┘
```
The GPU service is a standalone FastAPI app that exposes transcription, diarization, translation, and audio padding endpoints. It has **no dependencies** on PostgreSQL, Redis, Hatchet, or any other Reflector service.
## Quick Start
### On the GPU machine
```bash
git clone <reflector-repo>
cd reflector
# Set HuggingFace token (required for diarization models)
export HF_TOKEN=your-huggingface-token
# Deploy with HTTPS (Let's Encrypt)
./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key my-secret-key
# Or deploy with custom CA
./scripts/generate-certs.sh gpu.local
./scripts/setup-gpu-host.sh --domain gpu.local --custom-ca certs/ --api-key my-secret-key
```
### On the Reflector machine
```bash
# If the GPU host uses a custom CA, trust it
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
--extra-ca /path/to/gpu-machine-ca.crt
# Or if you already have --custom-ca for your local domain
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
--domain reflector.local --custom-ca certs/ \
--extra-ca /path/to/gpu-machine-ca.crt
```
Then configure `server/.env` to point to the GPU host:
```bash
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://gpu.example.com
TRANSCRIPT_MODAL_API_KEY=my-secret-key
DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://gpu.example.com
DIARIZATION_MODAL_API_KEY=my-secret-key
TRANSLATION_BACKEND=modal
TRANSLATE_URL=https://gpu.example.com
TRANSLATION_MODAL_API_KEY=my-secret-key
```
## Script Options
```
./scripts/setup-gpu-host.sh [OPTIONS]
Options:
--domain DOMAIN Domain name for HTTPS (Let's Encrypt or custom cert)
--custom-ca PATH Custom CA (directory or single PEM file)
--extra-ca FILE Additional CA cert to trust (repeatable)
--api-key KEY API key to protect the service (strongly recommended)
--cpu CPU-only mode (no NVIDIA GPU required)
--port PORT Host port (default: 443 with Caddy, 8000 without)
```
## Deployment Scenarios
### Public internet with Let's Encrypt
GPU machine has a public IP and domain:
```bash
./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key my-secret-key
```
Requirements:
- DNS A record: `gpu.example.com` → GPU machine's public IP
- Ports 80 and 443 open
- Caddy auto-provisions Let's Encrypt certificate
### Internal network with custom CA
GPU machine on a private network:
```bash
# Generate certs on the GPU machine
./scripts/generate-certs.sh gpu.internal "IP:192.168.1.200"
# Deploy
./scripts/setup-gpu-host.sh --domain gpu.internal --custom-ca certs/ --api-key my-secret-key
```
On each machine that connects (including the Reflector server), add DNS:
```bash
echo "192.168.1.200 gpu.internal" | sudo tee -a /etc/hosts
```
### IP-only (no domain)
No domain needed — just use the machine's IP:
```bash
./scripts/setup-gpu-host.sh --api-key my-secret-key
```
Caddy is not used; the GPU service runs directly on port 8000 (HTTP). For HTTPS without a domain, the Reflector machine connects via `http://<GPU_IP>:8000`.
### CPU-only (no NVIDIA GPU)
Works on any machine — transcription will be slower:
```bash
./scripts/setup-gpu-host.sh --cpu --domain gpu.example.com --api-key my-secret-key
```
## DNS Resolution
The Reflector server must be able to reach the GPU host by name or IP.
| Setup | DNS Method | TRANSCRIPT_URL example |
|-------|------------|----------------------|
| Public domain | DNS A record | `https://gpu.example.com` |
| Internal domain | `/etc/hosts` on both machines | `https://gpu.internal` |
| IP only | No DNS needed | `http://192.168.1.200:8000` |
For internal domains, add the GPU machine's IP to `/etc/hosts` on the Reflector machine:
```bash
echo "192.168.1.200 gpu.internal" | sudo tee -a /etc/hosts
```
If the Reflector server runs in Docker, the containers resolve DNS from the host (Docker's default DNS behavior). So adding to the host's `/etc/hosts` is sufficient.
## Multi-CA Setup
When your Reflector instance has its own CA (for `reflector.local`) and the GPU host has a different CA:
**On the GPU machine:**
```bash
./scripts/generate-certs.sh gpu.local
./scripts/setup-gpu-host.sh --domain gpu.local --custom-ca certs/ --api-key my-key
```
**On the Reflector machine:**
```bash
# Your local CA for reflector.local + the GPU host's CA
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
--domain reflector.local \
--custom-ca certs/ \
--extra-ca /path/to/gpu-machine-ca.crt
```
The `--extra-ca` flag appends the GPU host's CA to the trust bundle. Backend containers trust both CAs — your local domain works AND outbound calls to the GPU host succeed.
You can repeat `--extra-ca` for multiple remote services:
```bash
--extra-ca /path/to/gpu-ca.crt --extra-ca /path/to/llm-ca.crt
```
## API Key Authentication
The GPU service uses Bearer token authentication via `REFLECTOR_GPU_APIKEY`:
```bash
# Test from the Reflector machine
curl -s https://gpu.example.com/docs # No auth needed for docs
curl -s -X POST https://gpu.example.com/v1/audio/transcriptions \
-H "Authorization: Bearer <my-secret-key>" \ #gitleaks:allow
-F "file=@audio.wav"
```
If `REFLECTOR_GPU_APIKEY` is not set, the service accepts all requests (open access). Always use `--api-key` for internet-facing deployments.
The same key goes in Reflector's `server/.env` as `TRANSCRIPT_MODAL_API_KEY` and `DIARIZATION_MODAL_API_KEY`.
## Files
| File | Checked in? | Purpose |
|------|-------------|---------|
| `docker-compose.gpu-host.yml` | Yes | Static compose file with profiles (`gpu`, `cpu`, `caddy`) |
| `.env.gpu-host` | No (generated) | Environment variables (HF_TOKEN, API key, ports) |
| `Caddyfile.gpu-host` | No (generated) | Caddy config (only when using HTTPS) |
| `docker-compose.gpu-ca.yml` | No (generated) | CA cert mounts override (only with --custom-ca) |
| `certs/` | No (generated) | Staged certificates (when using --custom-ca) |
The compose file is checked into the repo — you can read it to understand exactly what runs. The script only generates env vars, Caddyfile, and CA overrides. Profiles control which service starts:
```bash
# What the script does under the hood:
docker compose -f docker-compose.gpu-host.yml --profile gpu --profile caddy \
--env-file .env.gpu-host up -d
# CPU mode:
docker compose -f docker-compose.gpu-host.yml --profile cpu --profile caddy \
--env-file .env.gpu-host up -d
```
Both `gpu` and `cpu` services get the network alias `transcription`, so Caddy's config works with either.
## Management
```bash
# View logs
docker compose -f docker-compose.gpu-host.yml --profile gpu logs -f gpu
# Restart
docker compose -f docker-compose.gpu-host.yml --profile gpu restart gpu
# Stop
docker compose -f docker-compose.gpu-host.yml --profile gpu --profile caddy down
# Re-run setup
./scripts/setup-gpu-host.sh [same flags]
# Rebuild after code changes
docker compose -f docker-compose.gpu-host.yml --profile gpu build gpu
docker compose -f docker-compose.gpu-host.yml --profile gpu up -d gpu
```
If you deployed with `--custom-ca`, include the CA override in manual commands:
```bash
docker compose -f docker-compose.gpu-host.yml -f docker-compose.gpu-ca.yml \
--profile gpu logs -f gpu
```
## Troubleshooting
### GPU service won't start
Check logs:
```bash
docker compose -f docker-compose.gpu-host.yml logs gpu
```
Common causes:
- NVIDIA driver not installed or `nvidia-container-toolkit` missing
- `HF_TOKEN` not set (diarization model download fails)
- Port already in use
### Reflector can't connect to GPU host
From the Reflector machine:
```bash
# Test HTTPS connectivity
curl -v https://gpu.example.com/docs
# If using custom CA, test with explicit CA
curl --cacert /path/to/gpu-ca.crt https://gpu.internal/docs
```
From inside the Reflector container:
```bash
docker compose exec server python -c "
import httpx
r = httpx.get('https://gpu.internal/docs')
print(r.status_code)
"
```
### SSL: CERTIFICATE_VERIFY_FAILED
The Reflector backend doesn't trust the GPU host's CA. Fix:
```bash
# Re-run Reflector setup with the GPU host's CA
./scripts/setup-selfhosted.sh --hosted --extra-ca /path/to/gpu-ca.crt
```
### Diarization returns errors
- Accept pyannote model licenses on HuggingFace:
- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0
- Verify `HF_TOKEN` is set in `.env.gpu-host`