# Self-Hosted Production Deployment

Deploy Reflector on a single server with everything running in Docker. Transcription, diarization, and translation use specialized ML models (Whisper/Parakeet, Pyannote); only summarization and topic detection require an LLM.

> For a detailed walkthrough of how the setup script and infrastructure work under the hood, see [How the Self-Hosted Setup Works](selfhosted-architecture.md).

## Prerequisites

### Hardware

- **With GPU**: Linux server with NVIDIA GPU (8GB+ VRAM recommended), 16GB+ RAM, 50GB+ disk
- **CPU-only**: 8+ cores, 32GB+ RAM (transcription is slower but works)
- Disk space for ML models (~2GB on first run) + audio storage

### Software

- Docker Engine 24+ with Compose V2
- NVIDIA drivers + `nvidia-container-toolkit` (GPU modes only)
- `curl`, `openssl` (usually pre-installed)

### Accounts & Credentials (depending on options)

**Always recommended:**

- **HuggingFace token** — for downloading the pyannote speaker diarization models. Get one at https://huggingface.co/settings/tokens and accept the model licenses:
  - https://huggingface.co/pyannote/speaker-diarization-3.1
  - https://huggingface.co/pyannote/segmentation-3.0
  - The setup script will prompt for this. If skipped, diarization falls back to a public model bundle (which may be less reliable).

**LLM for summarization & topic detection (pick one):**

- **With `--ollama-gpu` or `--ollama-cpu`**: nothing extra — Ollama runs locally and pulls the model automatically
- **Without `--ollama-*`**: an OpenAI-compatible LLM API key and endpoint. Examples:
  - OpenAI: `LLM_URL=https://api.openai.com/v1`, `LLM_API_KEY=sk-...`, `LLM_MODEL=gpt-4o-mini`
  - Anthropic, Together, Groq, or any OpenAI-compatible API
  - A self-managed vLLM or Ollama instance elsewhere on the network

**Object storage (pick one):**

- **With `--garage`**: nothing extra — Garage (local S3-compatible storage) is auto-configured by the script
- **Without `--garage`**: S3-compatible storage credentials. The script will prompt for these, or you can pre-fill `server/.env`. Options include:
  - **AWS S3**: Access Key ID, Secret Access Key, bucket name, region
  - **MinIO**: the same credentials plus `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://your-minio:9000`
  - **Any S3-compatible provider** (Backblaze B2, Cloudflare R2, DigitalOcean Spaces, etc.): the same fields plus a custom endpoint URL

**Optional add-ons (configure after initial setup):**

- **Daily.co** (live meeting rooms): requires a Daily.co account (https://www.daily.co/), API key, subdomain, and an AWS S3 bucket + IAM role for recording storage. See [Enabling Daily.co Live Rooms](#enabling-dailyco-live-rooms) below.
- **Authentik** (user authentication): requires an Authentik instance with an OAuth2/OIDC application configured for Reflector. See [Enabling Authentication](#enabling-authentication-authentik) below.

## Quick Start

```bash
git clone https://github.com/Monadical-SAS/reflector.git
cd reflector

# GPU + local Ollama LLM + local Garage storage + Caddy SSL (with domain):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --domain reflector.example.com

# Same but without a domain (self-signed cert, access via IP):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# CPU-only (same, but slower):
./scripts/setup-selfhosted.sh --cpu --ollama-cpu --garage --caddy

# Build from source instead of pulling prebuilt images:
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --build
```

That's it. The script generates env files and secrets, starts all containers, waits for health checks, and prints the URL.
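If you plan to use an external LLM instead of `--ollama-*`, you can pre-fill the LLM settings in `server/.env` before running the script. A minimal sketch using the OpenAI example values from the prerequisites above — any OpenAI-compatible endpoint works the same way:

```env
# Example external LLM config (OpenAI shown; substitute your provider's values)
LLM_URL=https://api.openai.com/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini
```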
## Specialized Models (Required)

Pick `--gpu` or `--cpu`. This determines how **transcription, diarization, and translation** run:

| Flag | What it does | Requires |
|------|-------------|----------|
| `--gpu` | NVIDIA GPU acceleration for ML models | NVIDIA GPU + drivers + `nvidia-container-toolkit` |
| `--cpu` | CPU-only (slower but works without GPU) | 8+ cores, 32GB+ RAM recommended |

## Local LLM (Optional)

Optionally add `--ollama-gpu` or `--ollama-cpu` for a **local Ollama instance** that handles summarization and topic detection. If omitted, configure an external OpenAI-compatible LLM in `server/.env`.

| Flag | What it does | Requires |
|------|-------------|----------|
| `--ollama-gpu` | Local Ollama with NVIDIA GPU acceleration | NVIDIA GPU |
| `--ollama-cpu` | Local Ollama on CPU only | Nothing extra |
| `--llm-model MODEL` | Choose which Ollama model to download (default: `qwen2.5:14b`) | `--ollama-gpu` or `--ollama-cpu` |
| *(omitted)* | User configures external LLM (OpenAI, Anthropic, etc.) | LLM API key |

### Choosing an Ollama model

The default model is `qwen2.5:14b` (~9GB download, good multilingual support and summary quality). Override with `--llm-model`:

```bash
# Default (qwen2.5:14b)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# Mistral — good balance of speed and quality (~4.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model mistral --garage --caddy

# Phi-4 — general-purpose alternative at a similar size to the default (~9.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model phi4 --garage --caddy

# Llama 3.3 70B — best quality, needs 48GB+ RAM or GPU VRAM (~43GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model llama3.3:70b --garage --caddy

# Gemma 2 9B (~5.4GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model gemma2 --garage --caddy

# DeepSeek R1 8B — reasoning model, verbose but thorough summaries (~4.9GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model deepseek-r1:8b --garage --caddy
```

Browse all available models at https://ollama.com/library.

### Recommended combinations

- **`--gpu --ollama-gpu`**: Best for servers with an NVIDIA GPU. Fully self-contained, no external API keys needed.
- **`--cpu --ollama-cpu`**: No GPU available but you want everything self-contained. Slower, but works.
- **`--gpu --ollama-cpu`**: GPU for transcription, CPU for the LLM. Saves GPU VRAM for the ML models.
- **`--gpu`**: You have an NVIDIA GPU but prefer a cloud LLM (faster/better summaries with GPT-4, Claude, etc.).
- **`--cpu`**: No GPU, prefer a cloud LLM. Slowest transcription but best summary quality.

## Other Optional Flags

| Flag | What it does |
|------|-------------|
| `--garage` | Starts Garage (local S3-compatible storage). Auto-configures bucket, keys, and env vars. |
| `--caddy` | Starts Caddy reverse proxy on ports 80/443 with a self-signed cert. |
| `--domain DOMAIN` | Use a real domain with Let's Encrypt auto-HTTPS (implies `--caddy`). Requires a DNS A record pointing to this server and ports 80/443 open. |
| `--build` | Build backend (server, worker, beat) and frontend (web) Docker images from source instead of pulling prebuilt images from the registry. Useful for development or when running a version with local changes. |

Without `--garage`, you **must** provide S3-compatible credentials (the script will prompt interactively, or you can pre-fill `server/.env`). Without `--caddy` or `--domain`, no ports are exposed; point your own reverse proxy at `web:3000` (frontend) and `server:1250` (API).
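For example, if you terminate TLS with your own Caddy instance attached to the same Docker network, a minimal config mirroring the bundled Caddyfile (shown in full later in this guide) might look like this — the `server` and `web` hostnames assume your proxy can resolve the compose service names:

```
# Sketch: external Caddy routing to Reflector's services
# (service names must be resolvable, e.g. by joining the compose network)
your.domain.com {
    handle /v1/* {
        reverse_proxy server:1250
    }
    handle /health {
        reverse_proxy server:1250
    }
    handle {
        reverse_proxy web:3000
    }
}
```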
**Using a domain (recommended for production):** Point a DNS A record at your server's IP, then pass `--domain your.domain.com`. Caddy will automatically obtain and renew a Let's Encrypt certificate. Ports 80 and 443 must be open.

**Without a domain:** `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.

## What the Script Does

1. **Prerequisites check** — Docker, NVIDIA GPU (if needed), compose file exists
2. **Generate secrets** — `SECRET_KEY`, `NEXTAUTH_SECRET` via `openssl rand`
3. **Generate `server/.env`** — from template; sets infrastructure defaults, configures the LLM based on mode, enables `PUBLIC_MODE`
4. **Generate `www/.env`** — auto-detects the server IP, sets URLs
5. **Storage setup** — either initializes Garage (bucket, keys, permissions) or prompts for external S3 credentials
6. **Caddyfile** — generates a domain-specific (Let's Encrypt) or IP-specific (self-signed) configuration
7. **Build & start** — always builds the GPU/CPU model image from source. With `--build`, also builds backend and frontend from source; otherwise pulls prebuilt images from the registry
8. **Health checks** — waits for each service, pulls the Ollama model if needed, warns about missing LLM config

> For a deeper dive into each step, see [How the Self-Hosted Setup Works](selfhosted-architecture.md).

## Configuration Reference

### Server Environment (`server/.env`)

| Variable | Description | Default |
|----------|-------------|---------|
| `DATABASE_URL` | PostgreSQL connection | Auto-set (Docker internal) |
| `REDIS_HOST` | Redis hostname | Auto-set (`redis`) |
| `SECRET_KEY` | App secret | Auto-generated |
| `AUTH_BACKEND` | Authentication method | `none` |
| `PUBLIC_MODE` | Allow unauthenticated access | `true` |
| `WEBRTC_HOST` | IP advertised in WebRTC ICE candidates | Auto-detected (server IP) |
| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |
| `TRANSCRIPT_STORAGE_BACKEND` | Storage backend | `aws` |
| `TRANSCRIPT_STORAGE_AWS_*` | S3 credentials | Auto-set for Garage |

### Frontend Environment (`www/.env`)

| Variable | Description | Default |
|----------|-------------|---------|
| `SITE_URL` | Public-facing URL | Auto-detected |
| `API_URL` | API URL (browser-side) | Same as `SITE_URL` |
| `SERVER_API_URL` | API URL (server-side) | `http://server:1250` |
| `NEXTAUTH_SECRET` | Auth secret | Auto-generated |
| `FEATURE_REQUIRE_LOGIN` | Require authentication | `false` |

## Storage Options

### Garage (Recommended for Self-Hosted)

Use the `--garage` flag. The script automatically:

- Generates `data/garage.toml` with a random RPC secret
- Starts the Garage container
- Creates the `reflector-media` bucket
- Creates an access key with read/write permissions
- Writes all S3 credentials to `server/.env`
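To double-check what Garage setup produced, you can inspect the credentials the script wrote (the variable names match the configuration reference above):

```bash
# Show the S3 settings the setup script generated for Garage
grep TRANSCRIPT_STORAGE server/.env
```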
### External S3 (AWS, MinIO, etc.)

Don't use `--garage`. The script will prompt for:

- Access Key ID
- Secret Access Key
- Bucket Name
- Region
- Endpoint URL (for non-AWS providers like MinIO)

Or pre-fill in `server/.env`:

```env
TRANSCRIPT_STORAGE_BACKEND=aws
TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID=your-key
TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY=your-secret
TRANSCRIPT_STORAGE_AWS_BUCKET_NAME=reflector-media
TRANSCRIPT_STORAGE_AWS_REGION=us-east-1
# For non-AWS S3 (MinIO, etc.):
TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://minio:9000
```

## Enabling Authentication (Authentik)

By default, authentication is disabled (`AUTH_BACKEND=none`, `FEATURE_REQUIRE_LOGIN=false`). To enable it:

1. Deploy an Authentik instance (see the [Authentik docs](https://goauthentik.io/docs/installation))
2. Create an OAuth2/OIDC application for Reflector
3. Update `server/.env`:

   ```env
   AUTH_BACKEND=jwt
   AUTH_JWT_AUDIENCE=your-client-id
   ```

4. Update `www/.env`:

   ```env
   FEATURE_REQUIRE_LOGIN=true
   AUTHENTIK_ISSUER=https://authentik.example.com/application/o/reflector
   AUTHENTIK_REFRESH_TOKEN_URL=https://authentik.example.com/application/o/token/
   AUTHENTIK_CLIENT_ID=your-client-id
   AUTHENTIK_CLIENT_SECRET=your-client-secret
   ```

5. Restart: `docker compose -f docker-compose.selfhosted.yml down && ./scripts/setup-selfhosted.sh`

## Enabling Daily.co Live Rooms

Daily.co enables real-time meeting rooms with automatic recording and transcription.

1. Create a [Daily.co](https://www.daily.co/) account
2. Add to `server/.env`:

   ```env
   DEFAULT_VIDEO_PLATFORM=daily
   DAILY_API_KEY=your-daily-api-key
   DAILY_SUBDOMAIN=your-subdomain
   DAILY_WEBHOOK_SECRET=your-webhook-secret
   DAILYCO_STORAGE_AWS_BUCKET_NAME=reflector-dailyco
   DAILYCO_STORAGE_AWS_REGION=us-east-1
   DAILYCO_STORAGE_AWS_ROLE_ARN=arn:aws:iam::123456789012:role/DailyCoAccess
   ```

3. Restart the server: `docker compose -f docker-compose.selfhosted.yml restart server worker`

## Enabling a Real Domain with Let's Encrypt

By default, Caddy uses self-signed certificates. For a real domain:

1. Point your domain's DNS to your server's IP
2. Ensure ports 80 and 443 are open
3. Edit `Caddyfile`:

   ```
   reflector.example.com {
       handle /v1/* {
           reverse_proxy server:1250
       }
       handle /health {
           reverse_proxy server:1250
       }
       handle {
           reverse_proxy web:3000
       }
   }
   ```

4. Update `www/.env`:

   ```env
   SITE_URL=https://reflector.example.com
   NEXTAUTH_URL=https://reflector.example.com
   API_URL=https://reflector.example.com
   ```

5. Restart Caddy and the frontend: `docker compose -f docker-compose.selfhosted.yml restart caddy web`

## Troubleshooting

### Check service status

```bash
docker compose -f docker-compose.selfhosted.yml ps
```

### View logs for a specific service

```bash
docker compose -f docker-compose.selfhosted.yml logs server --tail 50
docker compose -f docker-compose.selfhosted.yml logs gpu --tail 50
docker compose -f docker-compose.selfhosted.yml logs web --tail 50
```

### GPU service taking too long

The first start downloads ~1-2GB of ML models. Check progress:

```bash
docker compose -f docker-compose.selfhosted.yml logs gpu -f
```

### Server exits immediately

Usually a database migration issue. Check:

```bash
docker compose -f docker-compose.selfhosted.yml logs server --tail 50
```

### Caddy certificate issues

For self-signed certs, your browser will warn; click Advanced > Proceed. For Let's Encrypt, ensure ports 80/443 are open and DNS points at the server.

### Summaries/topics not generating

Check the LLM configuration:

```bash
grep LLM_ server/.env
```

If you didn't use `--ollama-gpu` or `--ollama-cpu`, you must set `LLM_URL`, `LLM_API_KEY`, and `LLM_MODEL`.
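If you used `--ollama-gpu` or `--ollama-cpu`, you can also verify that the local Ollama service is reachable and has the model downloaded. A sketch, assuming the service is named `ollama` on the compose network (port 11434 per the architecture overview below):

```bash
# List the models Ollama has pulled; the configured LLM_MODEL should appear
docker compose -f docker-compose.selfhosted.yml exec server \
  curl -s http://ollama:11434/api/tags
```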
### Health check from inside containers

```bash
docker compose -f docker-compose.selfhosted.yml exec server curl http://localhost:1250/health
docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8000/docs
```

## Updating

```bash
# Option A: Pull latest prebuilt images and restart
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh

# Option B: Build from source (after git pull) and restart
git pull
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh --build

# Rebuild only the GPU/CPU model image (picks up model updates)
docker compose -f docker-compose.selfhosted.yml build gpu  # or cpu
```

The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.

## Architecture Overview

```
                   ┌─────────┐
Internet ─────────>│  Caddy  │ :80/:443
                   └────┬────┘
                        │
           ┌────────────┼────────────┐
           │            │            │
           v            v            │
      ┌─────────┐  ┌─────────┐       │
      │   web   │  │ server  │       │
      │  :3000  │  │  :1250  │       │
      └─────────┘  └────┬────┘       │
                        │            │
                   ┌────┴────┐       │
                   │ worker  │       │
                   │  beat   │       │
                   └────┬────┘       │
                        │            │
         ┌──────────────┼────────────┤
         │              │            │
         v              v            v
 ┌───────────────┐ ┌─────────┐  ┌─────────┐
 │ transcription │ │postgres │  │  redis  │
 │   (gpu/cpu)   │ │  :5432  │  │  :6379  │
 │     :8000     │ └─────────┘  └─────────┘
 └───────┬───────┘
         │
   ┌─────┴─────┐   ┌───────────┐
   │  ollama   │   │  garage   │
   │ (optional)│   │ (optional │
   │  :11434   │   │    S3)    │
   └───────────┘   └───────────┘
```

All services communicate over Docker's internal network. Only Caddy (if enabled) exposes ports to the internet.
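If you run without Caddy and want to reach the services directly from the host for testing (not recommended for production), one option is an extra compose file that publishes the internal ports. A sketch, using a hypothetical `compose.expose.yml` with the service names from the diagram:

```yaml
# compose.expose.yml — hypothetical override publishing internal ports to the host
services:
  web:
    ports:
      - "3000:3000"   # frontend
  server:
    ports:
      - "1250:1250"   # API
```

Start with both files: `docker compose -f docker-compose.selfhosted.yml -f compose.expose.yml up -d`.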