chore: create script for selfhosted reflector (#866)

* self hosted with self gpu * add optional ollama model * garage ports * exposes ports and changes curl * custom domain * try to fix wroker * build locallly * documentation * docs format * precommit
2026-05-06 11:15:18 +00:00 · 2026-02-19 15:11:45 -05:00
parent a8ad237d85
commit cdd974b935
11 changed files with 2313 additions and 1 deletions
--- a/docsv2/selfhosted-architecture.md
+++ b/docsv2/selfhosted-architecture.md
@@ -0,0 +1,468 @@
+# How the Self-Hosted Setup Works
+
+This document explains the internals of the self-hosted deployment: how the setup script orchestrates everything, how the Docker Compose profiles work, how services communicate, and how configuration flows from flags to running containers.
+
+> For quick-start instructions and flag reference, see [Self-Hosted Production Deployment](selfhosted-production.md).
+
+## Table of Contents
+
+- [Overview](#overview)
+- [The Setup Script Step by Step](#the-setup-script-step-by-step)
+- [Docker Compose Profile System](#docker-compose-profile-system)
+- [Service Architecture](#service-architecture)
+- [Configuration Flow](#configuration-flow)
+- [Storage Architecture](#storage-architecture)
+- [SSL/TLS and Reverse Proxy](#ssltls-and-reverse-proxy)
+- [Build vs Pull Workflow](#build-vs-pull-workflow)
+- [Background Task Processing](#background-task-processing)
+- [Network and Port Layout](#network-and-port-layout)
+
+---
+
+## Overview
+
+The self-hosted deployment runs the entire Reflector platform on a single server using Docker Compose. A single bash script (`scripts/setup-selfhosted.sh`) handles all configuration and orchestration. The key design principles are:
+
+- **One command to deploy** — flags select which features to enable
+- **Idempotent** — safe to re-run without losing existing configuration
+- **Profile-based composition** — Docker Compose profiles activate optional services
+- **No external dependencies required** — with `--garage` and `--ollama-*`, everything runs locally
+
+## The Setup Script Step by Step
+
+The script (`scripts/setup-selfhosted.sh`) runs 7 sequential steps. Here's what each one does and why.
+
+### Step 0: Prerequisites
+
+Validates the environment before doing anything:
+
+- **Docker Compose V2** — checks `docker compose version` output (not the legacy `docker-compose`)
+- **Docker daemon** — verifies `docker info` succeeds
+- **NVIDIA GPU** — only checked when `--gpu` or `--ollama-gpu` is used; runs `nvidia-smi` to confirm drivers are installed
+- **Compose file** — verifies `docker-compose.selfhosted.yml` exists at the expected path
+
+If any check fails, the script exits with a clear error message and remediation steps.
+
+### Step 1: Generate Secrets
+
+Creates cryptographic secrets needed by the backend and frontend:
+
+- **`SECRET_KEY`** — used by the FastAPI server for session signing (64 hex chars via `openssl rand -hex 32`)
+- **`NEXTAUTH_SECRET`** — used by Next.js NextAuth for JWT signing
+
+Secrets are only generated if they don't already exist or are still set to the placeholder value `changeme`. This is what makes the script idempotent for secrets.
+
+### Step 2: Generate `server/.env`
+
+Creates or updates the backend environment file from `server/.env.selfhosted.example`. Sets:
+
+- **Infrastructure** — PostgreSQL URL, Redis host, Celery broker (all pointing to Docker-internal hostnames)
+- **Public URLs** — `BASE_URL` and `CORS_ORIGIN` computed from the domain (if `--domain`), IP (if detected on Linux), or `localhost`
+- **WebRTC** — `WEBRTC_HOST` set to the server's LAN IP so browsers can reach UDP ICE candidates
+- **Specialized models** — always points to `http://transcription:8000` (the Docker network alias shared by GPU and CPU containers)
+- **HuggingFace token** — prompts interactively for pyannote model access; writes to root `.env` so Docker Compose can inject it into GPU/CPU containers
+- **LLM** — if `--ollama-*` is used, configures `LLM_URL` pointing to the Ollama container. Otherwise, warns that the user needs to configure an external LLM
+- **Public mode** — sets `PUBLIC_MODE=true` so the app is accessible without authentication by default
+
+The script uses `env_set` for each variable, which either updates an existing line or appends a new one. This means re-running the script updates values in-place without duplicating keys.
+
+### Step 3: Generate `www/.env`
+
+Creates or updates the frontend environment file from `www/.env.selfhosted.example`. Sets:
+
+- **`SITE_URL` / `NEXTAUTH_URL` / `API_URL`** — all set to the same public-facing URL (with `https://` if Caddy is enabled)
+- **`WEBSOCKET_URL`** — set to `auto`, which tells the frontend to derive the WebSocket URL from the page URL automatically
+- **`SERVER_API_URL`** — always `http://server:1250` (Docker-internal, used for server-side rendering)
+- **`KV_URL`** — Redis URL for Next.js caching
+- **`FEATURE_REQUIRE_LOGIN`** — `false` by default (matches `PUBLIC_MODE=true` on the backend)
+
+### Step 4: Storage Setup
+
+Branches based on whether `--garage` was passed:
+
+**With `--garage` (local S3):**
+
+1. Generates `data/garage.toml` from a template, injecting a random RPC secret
+2. Starts only the Garage container (`docker compose --profile garage up -d garage`)
+3. Waits for the Garage admin API to respond on port 3903
+4. Assigns the node to a storage layout (1GB capacity, zone `dc1`)
+5. Creates the `reflector-media` bucket
+6. Creates an access key named `reflector` and grants it read/write on the bucket
+7. Writes all S3 credentials (`ENDPOINT_URL`, `BUCKET_NAME`, `REGION`, `ACCESS_KEY_ID`, `SECRET_ACCESS_KEY`) to `server/.env`
+
+The Garage endpoint is `http://garage:3900` (Docker-internal), and the region is set to `garage` (arbitrary, Garage ignores it). The boto3 client uses path-style addressing when an endpoint URL is configured, which is required for S3-compatible services like Garage.
+
+**Without `--garage` (external S3):**
+
+1. Checks `server/.env` for the four required S3 variables
+2. If any are missing, prompts interactively for each one
+3. Optionally prompts for an endpoint URL (for MinIO, Backblaze B2, etc.)
+
+### Step 5: Caddyfile
+
+Only runs when `--caddy` or `--domain` is used. Generates a Caddy configuration file:
+
+**With `--domain`:** Creates a named site block (`reflector.example.com { ... }`). Caddy automatically provisions a Let's Encrypt certificate for this domain. Requires DNS pointing to the server and ports 80/443 open.
+
+**Without `--domain` (IP access):** Creates a catch-all `:443 { tls internal ... }` block. Caddy generates a self-signed certificate. Browsers will show a security warning.
+
+Both configurations route:
+- `/v1/*` and `/health` to the backend (`server:1250`)
+- Everything else to the frontend (`web:3000`)
+
+### Step 6: Start Services
+
+1. **Always builds the GPU/CPU model image** — these are never prebuilt because they contain ML model download logic specific to the host's hardware
+2. **With `--build`:** Also builds backend (server, worker, beat) and frontend (web) images from source
+3. **Without `--build`:** Pulls prebuilt images from the Docker registry (`monadicalsas/reflector-backend:latest`, `monadicalsas/reflector-frontend:latest`)
+4. **Starts all services** — `docker compose up -d` with the active profiles
+5. **Quick sanity check** — after 3 seconds, checks for any containers that exited immediately
+
+### Step 7: Health Checks
+
+Waits for each service in order, with generous timeouts:
+
+| Service | Check | Timeout | Notes |
+|---------|-------|---------|-------|
+| GPU/CPU models | `curl http://localhost:8000/docs` | 10 min (120 x 5s) | First start downloads ~1GB of models |
+| Ollama | `curl http://localhost:11434/api/tags` | 3 min (60 x 3s) | Then pulls the selected model |
+| Server API | `curl http://localhost:1250/health` | 7.5 min (90 x 5s) | First start runs database migrations |
+| Frontend | `curl http://localhost:3000` | 1.5 min (30 x 3s) | Next.js build on first start |
+| Caddy | `curl -k https://localhost` | Quick check | After other services are up |
+
+If the server container exits during the health check, the script dumps diagnostics (container statuses + logs) before exiting.
+
+After the Ollama health check passes, the script checks if the selected model is already pulled. If not, it runs `ollama pull <model>` inside the container.
+
+---
+
+## Docker Compose Profile System
+
+The compose file (`docker-compose.selfhosted.yml`) uses Docker Compose profiles to make services optional. Only services whose profiles match the active `--profile` flags are started.
+
+### Always-on Services (no profile)
+
+These start regardless of which flags you pass:
+
+| Service | Role | Image |
+|---------|------|-------|
+| `server` | FastAPI backend, API endpoints, WebRTC | `monadicalsas/reflector-backend:latest` |
+| `worker` | Celery worker for background processing | Same image, `ENTRYPOINT=worker` |
+| `beat` | Celery beat scheduler for periodic tasks | Same image, `ENTRYPOINT=beat` |
+| `web` | Next.js frontend | `monadicalsas/reflector-frontend:latest` |
+| `redis` | Message broker + caching | `redis:7.2-alpine` |
+| `postgres` | Primary database | `postgres:17-alpine` |
+
+### Profile-Based Services
+
+| Profile | Service | Role |
+|---------|---------|------|
+| `gpu` | `gpu` | NVIDIA GPU-accelerated transcription/diarization/translation |
+| `cpu` | `cpu` | CPU-only transcription/diarization/translation |
+| `ollama-gpu` | `ollama` | Local Ollama LLM with GPU |
+| `ollama-cpu` | `ollama-cpu` | Local Ollama LLM on CPU |
+| `garage` | `garage` | Local S3-compatible object storage |
+| `caddy` | `caddy` | Reverse proxy with SSL |
+
+### The "transcription" Alias
+
+Both the `gpu` and `cpu` services define a Docker network alias of `transcription`. This means the backend always connects to `http://transcription:8000` regardless of which profile is active. The alias is defined in the compose file's `networks.default.aliases` section.
+
+---
+
+## Service Architecture
+
+```
+                    ┌─────────────┐
+  Internet ────────>│    Caddy     │ :80/:443   (profile: caddy)
+                    └──────┬──────┘
+                           │
+              ┌────────────┼────────────┐
+              │            │            │
+              v            v            │
+         ┌─────────┐  ┌─────────┐      │
+         │   web   │  │ server  │      │
+         │ :3000   │  │ :1250   │      │
+         └─────────┘  └────┬────┘      │
+                           │            │
+                      ┌────┴────┐       │
+                      │ worker  │       │
+                      │  beat   │       │
+                      └────┬────┘       │
+                           │            │
+            ┌──────────────┼────────────┤
+            │              │            │
+            v              v            v
+      ┌───────────┐  ┌─────────┐  ┌─────────┐
+      │transcription│ │postgres │  │  redis  │
+      │ (gpu/cpu) │  │ :5432   │  │ :6379   │
+      │ :8000     │  └─────────┘  └─────────┘
+      └───────────┘
+            │
+      ┌─────┴─────┐     ┌─────────┐
+      │  ollama   │     │ garage  │
+      │(optional) │     │(optional│
+      │ :11434    │     │  S3)    │
+      └───────────┘     └─────────┘
+```
+
+### How Services Interact
+
+1. **User request** hits Caddy (if enabled), which routes to `web` (pages) or `server` (API)
+2. **`web`** renders pages server-side using `SERVER_API_URL=http://server:1250` and client-side using the public `API_URL`
+3. **`server`** handles API requests, file uploads, WebRTC streaming. Dispatches background work to Celery via Redis
+4. **`worker`** picks up Celery tasks (transcription pipelines, audio processing). Calls `transcription:8000` for ML inference and uploads results to S3 storage
+5. **`beat`** schedules periodic tasks (cleanup, webhook retries) by pushing them onto the Celery queue
+6. **`transcription` (gpu/cpu)** runs Whisper/Parakeet (transcription), Pyannote (diarization), and translation models. Stateless HTTP API
+7. **`ollama`** provides an OpenAI-compatible API for summarization and topic detection. Called by the worker during post-processing
+8. **`garage`** provides S3-compatible storage for audio files and processed results. Accessed by the worker via boto3
+
+---
+
+## Configuration Flow
+
+Environment variables flow through multiple layers. Understanding this prevents confusion when debugging:
+
+```
+Flags (--gpu, --garage, etc.)
+  │
+  ├── setup-selfhosted.sh interprets flags
+  │     │
+  │     ├── Writes server/.env (backend config)
+  │     ├── Writes www/.env (frontend config)
+  │     ├── Writes .env (HF_TOKEN for compose interpolation)
+  │     └── Writes Caddyfile (proxy routes)
+  │
+  └── docker-compose.selfhosted.yml reads:
+        ├── env_file: ./server/.env   (loaded into server, worker, beat)
+        ├── env_file: ./www/.env      (loaded into web)
+        ├── .env                      (compose variable interpolation, e.g. ${HF_TOKEN})
+        └── environment: {...}        (hardcoded overrides, always win over env_file)
+```
+
+### Precedence Rules
+
+Docker Compose `environment:` keys **always override** `env_file:` values. This is by design — the compose file hardcodes infrastructure values that must be correct inside the Docker network (like `DATABASE_URL=postgresql+asyncpg://...@postgres:5432/...`) regardless of what's in `server/.env`.
+
+The `server/.env` file is still useful for:
+- Values not overridden in the compose file (LLM config, storage credentials, auth settings)
+- Running the server outside Docker during development
+
+### The Three `.env` Files
+
+| File | Used By | Contains |
+|------|---------|----------|
+| `server/.env` | server, worker, beat | Backend config: database, Redis, S3, LLM, auth, public URLs |
+| `www/.env` | web | Frontend config: site URL, auth, feature flags |
+| `.env` (root) | Docker Compose interpolation | Only `HF_TOKEN` — injected into GPU/CPU container env |
+
+---
+
+## Storage Architecture
+
+All audio files and processing results are stored in S3-compatible object storage. The backend uses boto3 (via aioboto3) with automatic path-style addressing when a custom endpoint URL is configured.
+
+### How Garage Works
+
+Garage is a lightweight, self-hosted S3-compatible storage engine. In this deployment:
+
+- Runs as a single-node cluster with 1GB capacity allocation
+- Listens on port 3900 (S3 API) and 3903 (admin API)
+- Data persists in Docker volumes (`garage_data`, `garage_meta`)
+- Accessed by the worker at `http://garage:3900` (Docker-internal)
+
+The setup script creates:
+- A bucket called `reflector-media`
+- An access key called `reflector` with read/write permissions on that bucket
+
+### Path-Style vs Virtual-Hosted Addressing
+
+AWS S3 uses virtual-hosted addressing by default (`bucket.s3.amazonaws.com`). S3-compatible services like Garage require path-style addressing (`endpoint/bucket`). The `AwsStorage` class detects this automatically: when `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL` is set, it configures boto3 with `addressing_style: "path"`.
+
+---
+
+## SSL/TLS and Reverse Proxy
+
+### With `--domain` (Production)
+
+Caddy automatically obtains and renews a Let's Encrypt certificate. Requirements:
+- DNS A record pointing to the server
+- Ports 80 (HTTP challenge) and 443 (HTTPS) open to the internet
+
+The generated Caddyfile uses the domain as the site address, which triggers Caddy's automatic HTTPS.
+
+### Without `--domain` (Development/LAN)
+
+Caddy generates a self-signed certificate and listens on `:443` as a catch-all. Browsers will show a security warning that must be accepted manually.
+
+### Without `--caddy` (BYO Proxy)
+
+No ports are exposed to the internet. The services listen on `127.0.0.1` only:
+- Frontend: `localhost:3000`
+- Backend API: `localhost:1250`
+
+You can point your own reverse proxy (nginx, Traefik, etc.) at these ports.
+
+### WebRTC and UDP
+
+The server exposes UDP ports 50000-50100 for WebRTC ICE candidates. The `WEBRTC_HOST` variable tells the server which IP to advertise in ICE candidates — this must be the server's actual IP address (not a domain), because WebRTC uses UDP which doesn't go through the HTTP reverse proxy.
+
+---
+
+## Build vs Pull Workflow
+
+### Default (no `--build` flag)
+
+```
+GPU/CPU model image: Always built from source (./gpu/self_hosted/)
+Backend image:       Pulled from monadicalsas/reflector-backend:latest
+Frontend image:      Pulled from monadicalsas/reflector-frontend:latest
+```
+
+The GPU/CPU image is always built because it contains hardware-specific build steps and ML model download logic.
+
+### With `--build`
+
+```
+GPU/CPU model image: Built from source (./gpu/self_hosted/)
+Backend image:       Built from source (./server/)
+Frontend image:      Built from source (./www/)
+```
+
+Use `--build` when:
+- You've made local code changes
+- The prebuilt registry images are outdated
+- You want to verify the build works on your hardware
+
+### Rebuilding Individual Services
+
+```bash
+# Rebuild just the backend
+docker compose -f docker-compose.selfhosted.yml build server worker beat
+
+# Rebuild just the frontend
+docker compose -f docker-compose.selfhosted.yml build web
+
+# Rebuild the GPU model container
+docker compose -f docker-compose.selfhosted.yml build gpu
+
+# Force a clean rebuild (no cache)
+docker compose -f docker-compose.selfhosted.yml build --no-cache server
+```
+
+---
+
+## Background Task Processing
+
+### Celery Architecture
+
+The backend uses Celery for all background work, with Redis as the message broker:
+
+- **`worker`** — picks up tasks from the Redis queue and executes them
+- **`beat`** — schedules periodic tasks (cron-like) by pushing them onto the queue
+- **`Redis`** — acts as both message broker and result backend
+
+### The Audio Processing Pipeline
+
+When a file is uploaded, the worker runs a multi-step pipeline:
+
+```
+Upload → Extract Audio → Upload to S3
+                           │
+                    ┌──────┼──────┐
+                    │      │      │
+                    v      v      v
+              Transcribe  Diarize  Waveform
+                    │      │      │
+                    └──────┼──────┘
+                           │
+                       Assemble
+                           │
+                    ┌──────┼──────┐
+                    v      v      v
+                Topics  Title  Summaries
+                           │
+                         Done
+```
+
+Transcription, diarization, and waveform generation run in parallel. After assembly, topic detection, title generation, and summarization also run in parallel. Each step calls the appropriate service (transcription container for ML, Ollama/external LLM for text generation, S3 for storage).
+
+### Event Loop Management
+
+Each Celery task runs in its own `asyncio.run()` call, which creates a fresh event loop. The `asynctask` decorator in `server/reflector/asynctask.py` handles:
+
+1. **Database connections** — resets the connection pool before each task (connections from a previous event loop would cause "Future attached to a different loop" errors)
+2. **Redis connections** — resets the WebSocket manager singleton so Redis pub/sub reconnects on the current loop
+3. **Cleanup** — disconnects the database and clears the context variable in the `finally` block
+
+---
+
+## Network and Port Layout
+
+All services communicate over Docker's default bridge network. Only specific ports are exposed to the host:
+
+| Port | Service | Binding | Purpose |
+|------|---------|---------|---------|
+| 80 | Caddy | `0.0.0.0:80` | HTTP (redirect to HTTPS / Let's Encrypt challenge) |
+| 443 | Caddy | `0.0.0.0:443` | HTTPS (main entry point) |
+| 1250 | Server | `127.0.0.1:1250` | Backend API (localhost only) |
+| 3000 | Web | `127.0.0.1:3000` | Frontend (localhost only) |
+| 3900 | Garage | `0.0.0.0:3900` | S3 API (for admin/debug access) |
+| 3903 | Garage | `0.0.0.0:3903` | Garage admin API |
+| 8000 | GPU/CPU | `127.0.0.1:8000` | ML model API (localhost only) |
+| 11434 | Ollama | `127.0.0.1:11434` | Ollama API (localhost only) |
+| 50000-50100/udp | Server | `0.0.0.0:50000-50100` | WebRTC ICE candidates |
+
+Services bound to `127.0.0.1` are only accessible from the host itself (not from the network). Caddy is the only service exposed to the internet on standard HTTP/HTTPS ports.
+
+### Docker-Internal Hostnames
+
+Inside the Docker network, services reach each other by their compose service name:
+
+| Hostname | Resolves To |
+|----------|-------------|
+| `server` | Backend API container |
+| `web` | Frontend container |
+| `postgres` | PostgreSQL container |
+| `redis` | Redis container |
+| `transcription` | GPU or CPU container (network alias) |
+| `ollama` / `ollama-cpu` | Ollama container |
+| `garage` | Garage S3 container |
+
+---
+
+## Diagnostics and Error Handling
+
+The setup script includes an `ERR` trap that automatically dumps diagnostics when any command fails:
+
+1. Lists all container statuses
+2. Shows the last 30 lines of logs for any stopped/exited containers
+3. Shows the last 40 lines of the specific failing service
+
+This means if something goes wrong during setup, you'll see the relevant logs immediately without having to run manual debug commands.
+
+### Common Debug Commands
+
+```bash
+# Overall status
+docker compose -f docker-compose.selfhosted.yml ps
+
+# Logs for a specific service
+docker compose -f docker-compose.selfhosted.yml logs server --tail 50
+docker compose -f docker-compose.selfhosted.yml logs worker --tail 50
+
+# Check environment inside a container
+docker compose -f docker-compose.selfhosted.yml exec server env | grep TRANSCRIPT
+
+# Health check from inside the network
+docker compose -f docker-compose.selfhosted.yml exec server curl http://localhost:1250/health
+
+# Check S3 storage connectivity
+docker compose -f docker-compose.selfhosted.yml exec server curl http://garage:3900
+
+# Database access
+docker compose -f docker-compose.selfhosted.yml exec postgres psql -U reflector -c "SELECT id, status FROM transcript ORDER BY created_at DESC LIMIT 5;"
+
+# List files in server data directory
+docker compose -f docker-compose.selfhosted.yml exec server ls -la /app/data/
+```
--- a/docsv2/selfhosted-production.md
+++ b/docsv2/selfhosted-production.md
@@ -0,0 +1,373 @@
+# Self-Hosted Production Deployment
+
+Deploy Reflector on a single server with everything running in Docker. Transcription, diarization, and translation use specialized ML models (Whisper/Parakeet, Pyannote); only summarization and topic detection require an LLM.
+
+> For a detailed walkthrough of how the setup script and infrastructure work under the hood, see [How the Self-Hosted Setup Works](selfhosted-architecture.md).
+
+## Prerequisites
+
+### Hardware
+- **With GPU**: Linux server with NVIDIA GPU (8GB+ VRAM recommended), 16GB+ RAM, 50GB+ disk
+- **CPU-only**: 8+ cores, 32GB+ RAM (transcription is slower but works)
+- Disk space for ML models (~2GB on first run) + audio storage
+
+### Software
+- Docker Engine 24+ with Compose V2
+- NVIDIA drivers + `nvidia-container-toolkit` (GPU modes only)
+- `curl`, `openssl` (usually pre-installed)
+
+### Accounts & Credentials (depending on options)
+
+**Always recommended:**
+- **HuggingFace token** — For downloading pyannote speaker diarization models. Get one at https://huggingface.co/settings/tokens and accept the model licenses:
+  - https://huggingface.co/pyannote/speaker-diarization-3.1
+  - https://huggingface.co/pyannote/segmentation-3.0
+  - The setup script will prompt for this. If skipped, diarization falls back to a public model bundle (may be less reliable).
+
+**LLM for summarization & topic detection (pick one):**
+- **With `--ollama-gpu` or `--ollama-cpu`**: Nothing extra — Ollama runs locally and pulls the model automatically
+- **Without `--ollama-*`**: An OpenAI-compatible LLM API key and endpoint. Examples:
+  - OpenAI: `LLM_URL=https://api.openai.com/v1`, `LLM_API_KEY=sk-...`, `LLM_MODEL=gpt-4o-mini`
+  - Anthropic, Together, Groq, or any OpenAI-compatible API
+  - A self-managed vLLM or Ollama instance elsewhere on the network
+
+**Object storage (pick one):**
+- **With `--garage`**: Nothing extra — Garage (local S3-compatible storage) is auto-configured by the script
+- **Without `--garage`**: S3-compatible storage credentials. The script will prompt for these, or you can pre-fill `server/.env`. Options include:
+  - **AWS S3**: Access Key ID, Secret Access Key, bucket name, region
+  - **MinIO**: Same credentials + `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://your-minio:9000`
+  - **Any S3-compatible provider** (Backblaze B2, Cloudflare R2, DigitalOcean Spaces, etc.): same fields + custom endpoint URL
+
+**Optional add-ons (configure after initial setup):**
+- **Daily.co** (live meeting rooms): Requires a Daily.co account (https://www.daily.co/), API key, subdomain, and an AWS S3 bucket + IAM Role for recording storage. See [Enabling Daily.co Live Rooms](#enabling-dailyco-live-rooms) below.
+- **Authentik** (user authentication): Requires an Authentik instance with an OAuth2/OIDC application configured for Reflector. See [Enabling Authentication](#enabling-authentication-authentik) below.
+
+## Quick Start
+
+```bash
+git clone https://github.com/Monadical-SAS/reflector.git
+cd reflector
+
+# GPU + local Ollama LLM + local Garage storage + Caddy SSL (with domain):
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --domain reflector.example.com
+
+# Same but without a domain (self-signed cert, access via IP):
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy
+
+# CPU-only (same, but slower):
+./scripts/setup-selfhosted.sh --cpu --ollama-cpu --garage --caddy
+
+# Build from source instead of pulling prebuilt images:
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --build
+```
+
+That's it. The script generates env files, secrets, starts all containers, waits for health checks, and prints the URL.
+
+## Specialized Models (Required)
+
+Pick `--gpu` or `--cpu`. This determines how **transcription, diarization, and translation** run:
+
+| Flag | What it does | Requires |
+|------|-------------|----------|
+| `--gpu` | NVIDIA GPU acceleration for ML models | NVIDIA GPU + drivers + `nvidia-container-toolkit` |
+| `--cpu` | CPU-only (slower but works without GPU) | 8+ cores, 32GB+ RAM recommended |
+
+## Local LLM (Optional)
+
+Optionally add `--ollama-gpu` or `--ollama-cpu` for a **local Ollama instance** that handles summarization and topic detection. If omitted, configure an external OpenAI-compatible LLM in `server/.env`.
+
+| Flag | What it does | Requires |
+|------|-------------|----------|
+| `--ollama-gpu` | Local Ollama with NVIDIA GPU acceleration | NVIDIA GPU |
+| `--ollama-cpu` | Local Ollama on CPU only | Nothing extra |
+| `--llm-model MODEL` | Choose which Ollama model to download (default: `qwen2.5:14b`) | `--ollama-gpu` or `--ollama-cpu` |
+| *(omitted)* | User configures external LLM (OpenAI, Anthropic, etc.) | LLM API key |
+
+### Choosing an Ollama model
+
+The default model is `qwen2.5:14b` (~9GB download, good multilingual support and summary quality). Override with `--llm-model`:
+
+```bash
+# Default (qwen2.5:14b)
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy
+
+# Mistral — good balance of speed and quality (~4.1GB)
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model mistral --garage --caddy
+
+# Phi-4 — smaller and faster (~9.1GB)
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model phi4 --garage --caddy
+
+# Llama 3.3 70B — best quality, needs 48GB+ RAM or GPU VRAM (~43GB)
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model llama3.3:70b --garage --caddy
+
+# Gemma 2 9B (~5.4GB)
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model gemma2 --garage --caddy
+
+# DeepSeek R1 8B — reasoning model, verbose but thorough summaries (~4.9GB)
+./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model deepseek-r1:8b --garage --caddy
+```
+
+Browse all available models at https://ollama.com/library.
+
+### Recommended combinations
+
+- **`--gpu --ollama-gpu`**: Best for servers with NVIDIA GPU. Fully self-contained, no external API keys needed.
+- **`--cpu --ollama-cpu`**: No GPU available but want everything self-contained. Slower but works.
+- **`--gpu --ollama-cpu`**: GPU for transcription, CPU for LLM. Saves GPU VRAM for ML models.
+- **`--gpu`**: Have NVIDIA GPU but prefer a cloud LLM (faster/better summaries with GPT-4, Claude, etc.).
+- **`--cpu`**: No GPU, prefer cloud LLM. Slowest transcription but best summary quality.
+
+## Other Optional Flags
+
+| Flag | What it does |
+|------|-------------|
+| `--garage` | Starts Garage (local S3-compatible storage). Auto-configures bucket, keys, and env vars. |
+| `--caddy` | Starts Caddy reverse proxy on ports 80/443 with self-signed cert. |
+| `--domain DOMAIN` | Use a real domain with Let's Encrypt auto-HTTPS (implies `--caddy`). Requires DNS A record pointing to this server and ports 80/443 open. |
+| `--build` | Build backend (server, worker, beat) and frontend (web) Docker images from source instead of pulling prebuilt images from the registry. Useful for development or when running a version with local changes. |
+
+Without `--garage`, you **must** provide S3-compatible credentials (the script will prompt interactively or you can pre-fill `server/.env`).
+
+Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse proxy at `web:3000` (frontend) and `server:1250` (API).
+
+**Using a domain (recommended for production):** Point a DNS A record at your server's IP, then pass `--domain your.domain.com`. Caddy will automatically obtain and renew a Let's Encrypt certificate. Ports 80 and 443 must be open.
+
+**Without a domain:** `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.
+
+## What the Script Does
+
+1. **Prerequisites check** — Docker, NVIDIA GPU (if needed), compose file exists
+2. **Generate secrets** — `SECRET_KEY`, `NEXTAUTH_SECRET` via `openssl rand`
+3. **Generate `server/.env`** — From template, sets infrastructure defaults, configures LLM based on mode, enables `PUBLIC_MODE`
+4. **Generate `www/.env`** — Auto-detects server IP, sets URLs
+5. **Storage setup** — Either initializes Garage (bucket, keys, permissions) or prompts for external S3 credentials
+6. **Caddyfile** — Generates domain-specific (Let's Encrypt) or IP-specific (self-signed) configuration
+7. **Build & start** — Always builds GPU/CPU model image from source. With `--build`, also builds backend and frontend from source; otherwise pulls prebuilt images from the registry
+8. **Health checks** — Waits for each service, pulls Ollama model if needed, warns about missing LLM config
+
+> For a deeper dive into each step, see [How the Self-Hosted Setup Works](selfhosted-architecture.md).
+
+## Configuration Reference
+
+### Server Environment (`server/.env`)
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `DATABASE_URL` | PostgreSQL connection | Auto-set (Docker internal) |
+| `REDIS_HOST` | Redis hostname | Auto-set (`redis`) |
+| `SECRET_KEY` | App secret | Auto-generated |
+| `AUTH_BACKEND` | Authentication method | `none` |
+| `PUBLIC_MODE` | Allow unauthenticated access | `true` |
+| `WEBRTC_HOST` | IP advertised in WebRTC ICE candidates | Auto-detected (server IP) |
+| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
+| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
+| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
+| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |
+| `TRANSCRIPT_STORAGE_BACKEND` | Storage backend | `aws` |
+| `TRANSCRIPT_STORAGE_AWS_*` | S3 credentials | Auto-set for Garage |
+
+### Frontend Environment (`www/.env`)
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `SITE_URL` | Public-facing URL | Auto-detected |
+| `API_URL` | API URL (browser-side) | Same as SITE_URL |
+| `SERVER_API_URL` | API URL (server-side) | `http://server:1250` |
+| `NEXTAUTH_SECRET` | Auth secret | Auto-generated |
+| `FEATURE_REQUIRE_LOGIN` | Require authentication | `false` |
+
+## Storage Options
+
+### Garage (Recommended for Self-Hosted)
+
+Use `--garage` flag. The script automatically:
+- Generates `data/garage.toml` with a random RPC secret
+- Starts the Garage container
+- Creates the `reflector-media` bucket
+- Creates an access key with read/write permissions
+- Writes all S3 credentials to `server/.env`
+
+### External S3 (AWS, MinIO, etc.)
+
+Don't use `--garage`. The script will prompt for:
+- Access Key ID
+- Secret Access Key
+- Bucket Name
+- Region
+- Endpoint URL (for non-AWS like MinIO)
+
+Or pre-fill in `server/.env`:
+```env
+TRANSCRIPT_STORAGE_BACKEND=aws
+TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID=your-key
+TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY=your-secret
+TRANSCRIPT_STORAGE_AWS_BUCKET_NAME=reflector-media
+TRANSCRIPT_STORAGE_AWS_REGION=us-east-1
+# For non-AWS S3 (MinIO, etc.):
+TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://minio:9000
+```
+
+## Enabling Authentication (Authentik)
+
+By default, authentication is disabled (`AUTH_BACKEND=none`, `FEATURE_REQUIRE_LOGIN=false`). To enable:
+
+1. Deploy an Authentik instance (see [Authentik docs](https://goauthentik.io/docs/installation))
+2. Create an OAuth2/OIDC application for Reflector
+3. Update `server/.env`:
+   ```env
+   AUTH_BACKEND=jwt
+   AUTH_JWT_AUDIENCE=your-client-id
+   ```
+4. Update `www/.env`:
+   ```env
+   FEATURE_REQUIRE_LOGIN=true
+   AUTHENTIK_ISSUER=https://authentik.example.com/application/o/reflector
+   AUTHENTIK_REFRESH_TOKEN_URL=https://authentik.example.com/application/o/token/
+   AUTHENTIK_CLIENT_ID=your-client-id
+   AUTHENTIK_CLIENT_SECRET=your-client-secret
+   ```
+5. Restart: `docker compose -f docker-compose.selfhosted.yml down && ./scripts/setup-selfhosted.sh <same-flags>`
+
+## Enabling Daily.co Live Rooms
+
+Daily.co enables real-time meeting rooms with automatic recording and transcription.
+
+1. Create a [Daily.co](https://www.daily.co/) account
+2. Add to `server/.env`:
+   ```env
+   DEFAULT_VIDEO_PLATFORM=daily
+   DAILY_API_KEY=your-daily-api-key
+   DAILY_SUBDOMAIN=your-subdomain
+   DAILY_WEBHOOK_SECRET=your-webhook-secret
+   DAILYCO_STORAGE_AWS_BUCKET_NAME=reflector-dailyco
+   DAILYCO_STORAGE_AWS_REGION=us-east-1
+   DAILYCO_STORAGE_AWS_ROLE_ARN=arn:aws:iam::role/DailyCoAccess
+   ```
+3. Restart the server: `docker compose -f docker-compose.selfhosted.yml restart server worker`
+
+## Enabling Real Domain with Let's Encrypt
+
+By default, Caddy uses self-signed certificates. For a real domain:
+
+1. Point your domain's DNS to your server's IP
+2. Ensure ports 80 and 443 are open
+3. Edit `Caddyfile`:
+   ```
+   reflector.example.com {
+       handle /v1/* {
+           reverse_proxy server:1250
+       }
+       handle /health {
+           reverse_proxy server:1250
+       }
+       handle {
+           reverse_proxy web:3000
+       }
+   }
+   ```
+4. Update `www/.env`:
+   ```env
+   SITE_URL=https://reflector.example.com
+   NEXTAUTH_URL=https://reflector.example.com
+   API_URL=https://reflector.example.com
+   ```
+5. Restart Caddy: `docker compose -f docker-compose.selfhosted.yml restart caddy web`
+
+## Troubleshooting
+
+### Check service status
+```bash
+docker compose -f docker-compose.selfhosted.yml ps
+```
+
+### View logs for a specific service
+```bash
+docker compose -f docker-compose.selfhosted.yml logs server --tail 50
+docker compose -f docker-compose.selfhosted.yml logs gpu --tail 50
+docker compose -f docker-compose.selfhosted.yml logs web --tail 50
+```
+
+### GPU service taking too long
+First start downloads ~1-2GB of ML models. Check progress:
+```bash
+docker compose -f docker-compose.selfhosted.yml logs gpu -f
+```
+
+### Server exits immediately
+Usually a database migration issue. Check:
+```bash
+docker compose -f docker-compose.selfhosted.yml logs server --tail 50
+```
+
+### Caddy certificate issues
+For self-signed certs, your browser will warn. Click Advanced > Proceed.
+For Let's Encrypt, ensure ports 80/443 are open and DNS is pointed correctly.
+
+### Summaries/topics not generating
+Check LLM configuration:
+```bash
+grep LLM_ server/.env
+```
+If you didn't use `--ollama-gpu` or `--ollama-cpu`, you must set `LLM_URL`, `LLM_API_KEY`, and `LLM_MODEL`.
+
+### Health check from inside containers
+```bash
+docker compose -f docker-compose.selfhosted.yml exec server curl http://localhost:1250/health
+docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8000/docs
+```
+
+## Updating
+
+```bash
+# Option A: Pull latest prebuilt images and restart
+docker compose -f docker-compose.selfhosted.yml down
+./scripts/setup-selfhosted.sh <same-flags-as-before>
+
+# Option B: Build from source (after git pull) and restart
+git pull
+docker compose -f docker-compose.selfhosted.yml down
+./scripts/setup-selfhosted.sh <same-flags-as-before> --build
+
+# Rebuild only the GPU/CPU model image (picks up model updates)
+docker compose -f docker-compose.selfhosted.yml build gpu  # or cpu
+```
+
+The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.
+
+## Architecture Overview
+
+```
+                    ┌─────────┐
+  Internet ────────>│  Caddy  │ :80/:443
+                    └────┬────┘
+                         │
+            ┌────────────┼────────────┐
+            │            │            │
+            v            v            │
+       ┌─────────┐  ┌─────────┐      │
+       │   web   │  │ server  │      │
+       │ :3000   │  │ :1250   │      │
+       └─────────┘  └────┬────┘      │
+                         │            │
+                    ┌────┴────┐       │
+                    │ worker  │       │
+                    │  beat   │       │
+                    └────┬────┘       │
+                         │            │
+          ┌──────────────┼────────────┤
+          │              │            │
+          v              v            v
+    ┌───────────┐  ┌─────────┐  ┌─────────┐
+    │transcription│  │postgres │  │  redis  │
+    │(gpu/cpu)  │  │ :5432   │  │ :6379   │
+    │ :8000     │  └─────────┘  └─────────┘
+    └───────────┘
+          │
+    ┌─────┴─────┐     ┌─────────┐
+    │  ollama   │     │ garage  │
+    │ (optional)│     │(optional│
+    │ :11434    │     │ S3)     │
+    └───────────┘     └─────────┘
+```
+
+All services communicate over Docker's internal network. Only Caddy (if enabled) exposes ports to the internet.