Self-Hosted Production Deployment
Deploy Reflector on a single server with everything running in Docker. Transcription, diarization, and translation use specialized ML models (Whisper/Parakeet, Pyannote); only summarization and topic detection require an LLM.
For a detailed walkthrough of how the setup script and infrastructure work under the hood, see How the Self-Hosted Setup Works.
Prerequisites
Hardware
- With GPU: Linux server with NVIDIA GPU (8GB+ VRAM recommended), 16GB+ RAM, 50GB+ disk
- CPU-only: 8+ cores, 32GB+ RAM (transcription is slower but works)
- Disk space for ML models (~2GB on first run) + audio storage
Software
- Docker Engine 24+ with Compose V2
- NVIDIA drivers + `nvidia-container-toolkit` (GPU modes only)
- `curl`, `openssl` (usually pre-installed)
Accounts & Credentials (depending on options)
Always recommended:
- HuggingFace token — For downloading pyannote speaker diarization models. Get one at https://huggingface.co/settings/tokens and accept the model licenses:
- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0
- The setup script will prompt for this. If skipped, diarization falls back to a public model bundle (may be less reliable).
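To confirm the token works before running setup, HuggingFace's standard whoami endpoint returns your account details for a valid token:

```bash
# Replace hf_xxx with your token; a valid token returns your account JSON.
curl -s -H "Authorization: Bearer hf_xxx" https://huggingface.co/api/whoami-v2
```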
LLM for summarization & topic detection (pick one):
- With `--ollama-gpu` or `--ollama-cpu`: Nothing extra — Ollama runs locally and pulls the model automatically
- Without `--ollama-*`: An OpenAI-compatible LLM API key and endpoint. Examples:
  - OpenAI: `LLM_URL=https://api.openai.com/v1`, `LLM_API_KEY=sk-...`, `LLM_MODEL=gpt-4o-mini`
  - Anthropic, Together, Groq, or any OpenAI-compatible API
  - A self-managed vLLM or Ollama instance elsewhere on the network
Object storage (pick one):
- With `--garage`: Nothing extra — Garage (local S3-compatible storage) is auto-configured by the script
- Without `--garage`: S3-compatible storage credentials. The script will prompt for these, or you can pre-fill `server/.env`. Options include:
  - AWS S3: Access Key ID, Secret Access Key, bucket name, region
  - MinIO: Same credentials + `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://your-minio:9000`
  - Any S3-compatible provider (Backblaze B2, Cloudflare R2, DigitalOcean Spaces, etc.): same fields + custom endpoint URL
Optional add-ons (configure after initial setup):
- Authentik (user authentication): Requires an Authentik instance with an OAuth2/OIDC application configured for Reflector. See Enabling Authentication below.
Quick Start
```bash
git clone https://github.com/Monadical-SAS/reflector.git
cd reflector

# GPU + local Ollama LLM + local Garage storage + Caddy SSL (with domain):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --domain reflector.example.com

# Same but without a domain (self-signed cert, access via IP):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# CPU-only (same, but slower):
./scripts/setup-selfhosted.sh --cpu --ollama-cpu --garage --caddy

# With password authentication (single admin user):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --password mysecretpass

# Build from source instead of pulling prebuilt images:
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --build
```
That's it. The script generates env files and secrets, starts all containers, waits for health checks, and prints the URL.
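To confirm the stack came up, check the containers and hit the health endpoint through Caddy (replace the host with your domain or server IP; `-k` accepts the self-signed certificate):

```bash
docker compose -f docker-compose.selfhosted.yml ps
curl -sk https://your-server/health
```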
Specialized Models (Required)
Pick `--gpu` or `--cpu`. This determines how transcription, diarization, and translation run:

| Flag | What it does | Requires |
|---|---|---|
| `--gpu` | NVIDIA GPU acceleration for ML models | NVIDIA GPU + drivers + `nvidia-container-toolkit` |
| `--cpu` | CPU-only (slower but works without GPU) | 8+ cores, 32GB+ RAM recommended |
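Before picking `--gpu`, it's worth confirming the driver and container toolkit are visible to Docker (standard NVIDIA tooling, not part of the setup script):

```bash
nvidia-smi                    # the driver should list your GPU
docker info | grep -i nvidia  # the nvidia container runtime should appear
```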
Local LLM (Optional)
Optionally add `--ollama-gpu` or `--ollama-cpu` for a local Ollama instance that handles summarization and topic detection. If omitted, configure an external OpenAI-compatible LLM in `server/.env`.
| Flag | What it does | Requires |
|---|---|---|
| `--ollama-gpu` | Local Ollama with NVIDIA GPU acceleration | NVIDIA GPU |
| `--ollama-cpu` | Local Ollama on CPU only | Nothing extra |
| `--llm-model MODEL` | Choose which Ollama model to download (default: `qwen2.5:14b`) | `--ollama-gpu` or `--ollama-cpu` |
| (omitted) | User configures external LLM (OpenAI, Anthropic, etc.) | LLM API key |
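If you omit the Ollama flags, point `server/.env` at your provider instead. A sketch using the OpenAI example values from the prerequisites above; any OpenAI-compatible endpoint works the same way:

```bash
# server/.env — external LLM (placeholder values)
LLM_URL=https://api.openai.com/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini
```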
Choosing an Ollama model
The default model is `qwen2.5:14b` (~9GB download, good multilingual support and summary quality). Override with `--llm-model`:

```bash
# Default (qwen2.5:14b)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# Mistral — good balance of speed and quality (~4.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model mistral --garage --caddy

# Phi-4 — smaller and faster (~9.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model phi4 --garage --caddy

# Llama 3.3 70B — best quality, needs 48GB+ RAM or GPU VRAM (~43GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model llama3.3:70b --garage --caddy

# Gemma 2 9B (~5.4GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model gemma2 --garage --caddy

# DeepSeek R1 8B — reasoning model, verbose but thorough summaries (~4.9GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model deepseek-r1:8b --garage --caddy
```
Browse all available models at https://ollama.com/library.
Recommended combinations
- `--gpu --ollama-gpu`: Best for servers with an NVIDIA GPU. Fully self-contained, no external API keys needed.
- `--cpu --ollama-cpu`: No GPU available but want everything self-contained. Slower but works.
- `--gpu --ollama-cpu`: GPU for transcription, CPU for the LLM. Saves GPU VRAM for the ML models.
- `--gpu`: Have an NVIDIA GPU but prefer a cloud LLM (faster/better summaries with GPT-4, Claude, etc.).
- `--cpu`: No GPU, prefer a cloud LLM. Slowest transcription but best summary quality.
Other Optional Flags
| Flag | What it does |
|---|---|
| `--garage` | Starts Garage (local S3-compatible storage). Auto-configures bucket, keys, and env vars. |
| `--caddy` | Starts Caddy reverse proxy on ports 80/443 with a self-signed cert. |
| `--domain DOMAIN` | Use a real domain with Let's Encrypt auto-HTTPS (implies `--caddy`). Requires a DNS A record pointing to this server and ports 80/443 open. |
| `--password PASS` | Enable password authentication with an `admin@localhost` user. Sets `AUTH_BACKEND=password`, `PUBLIC_MODE=false`. See Enabling Password Authentication. |
| `--build` | Build backend (server, worker, beat) and frontend (web) Docker images from source instead of pulling prebuilt images from the registry. Useful for development or when running a version with local changes. |
Without `--garage`, you must provide S3-compatible credentials (the script will prompt interactively, or you can pre-fill `server/.env`).

Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse proxy at `web:3000` (frontend) and `server:1250` (API).

Using a domain (recommended for production): Point a DNS A record at your server's IP, then pass `--domain your.domain.com`. Caddy will automatically obtain and renew a Let's Encrypt certificate. Ports 80 and 443 must be open.

Without a domain: `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.
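If you bring your own reverse proxy, here is a minimal routing sketch, assuming nginx attached to the same Docker network (the `/v1` and `/health` split mirrors the Caddyfile in the Let's Encrypt section below; all names and paths are placeholders):

```nginx
# Hypothetical nginx config — adjust names, ports, and cert paths to your setup.
server {
    listen 443 ssl;
    server_name reflector.example.com;
    ssl_certificate     /etc/ssl/your-cert.pem;   # your own TLS material
    ssl_certificate_key /etc/ssl/your-key.pem;

    # API routes go to the backend
    location /v1/ {
        proxy_pass http://server:1250;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # WebSocket endpoints
        proxy_set_header Connection "upgrade";
    }
    location /health {
        proxy_pass http://server:1250;
    }

    # Everything else goes to the frontend
    location / {
        proxy_pass http://web:3000;
    }
}
```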
What the Script Does
- Prerequisites check — Docker, NVIDIA GPU (if needed), compose file exists
- Generate secrets — `SECRET_KEY`, `NEXTAUTH_SECRET` via `openssl rand`
- Generate `server/.env` — From template; sets infrastructure defaults, configures the LLM based on mode, enables `PUBLIC_MODE`
- Generate `www/.env` — Auto-detects the server IP, sets URLs
- Storage setup — Either initializes Garage (bucket, keys, permissions) or prompts for external S3 credentials
- Caddyfile — Generates domain-specific (Let's Encrypt) or IP-specific (self-signed) configuration
- Build & start — Always builds the GPU/CPU model image from source. With `--build`, also builds backend and frontend from source; otherwise pulls prebuilt images from the registry
- Health checks — Waits for each service, pulls the Ollama model if needed, warns about missing LLM config
For a deeper dive into each step, see How the Self-Hosted Setup Works.
Configuration Reference
Server Environment (`server/.env`)
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection | Auto-set (Docker internal) |
| `REDIS_HOST` | Redis hostname | Auto-set (`redis`) |
| `SECRET_KEY` | App secret | Auto-generated |
| `AUTH_BACKEND` | Authentication method (`none`, `password`, `jwt`) | `none` |
| `PUBLIC_MODE` | Allow unauthenticated access | `true` |
| `ADMIN_EMAIL` | Admin email for password auth | (unset) |
| `ADMIN_PASSWORD_HASH` | PBKDF2 hash for password auth | (unset) |
| `WEBRTC_HOST` | IP advertised in WebRTC ICE candidates | Auto-detected (server IP) |
| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |
| `CELERY_BEAT_POLL_INTERVAL` | Override all worker polling intervals (seconds); `0` = use individual defaults | `300` (selfhosted), `0` (other) |
| `TRANSCRIPT_STORAGE_BACKEND` | Storage backend | `aws` |
| `TRANSCRIPT_STORAGE_AWS_*` | S3 credentials | Auto-set for Garage |
Frontend Environment (`www/.env`)
| Variable | Description | Default |
|---|---|---|
| `SITE_URL` | Public-facing URL | Auto-detected |
| `API_URL` | API URL (browser-side) | Same as `SITE_URL` |
| `SERVER_API_URL` | API URL (server-side) | `http://server:1250` |
| `NEXTAUTH_SECRET` | Auth secret | Auto-generated |
| `FEATURE_REQUIRE_LOGIN` | Require authentication | `false` |
| `AUTH_PROVIDER` | Auth provider (`authentik` or `credentials`) | (unset) |
Storage Options
Garage (Recommended for Self-Hosted)
Use the `--garage` flag. The script automatically:

- Generates `data/garage.toml` with a random RPC secret
- Starts the Garage container
- Creates the `reflector-media` bucket
- Creates an access key with read/write permissions
- Writes all S3 credentials to `server/.env`
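To inspect what it wrote (the variable names match the Configuration Reference above):

```bash
grep TRANSCRIPT_STORAGE_ server/.env
```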
External S3 (AWS, MinIO, etc.)
Don't use `--garage`. The script will prompt for:
- Access Key ID
- Secret Access Key
- Bucket Name
- Region
- Endpoint URL (for non-AWS like MinIO)
Or pre-fill in `server/.env`:

```bash
TRANSCRIPT_STORAGE_BACKEND=aws
TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID=your-key
TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY=your-secret
TRANSCRIPT_STORAGE_AWS_BUCKET_NAME=reflector-media
TRANSCRIPT_STORAGE_AWS_REGION=us-east-1

# For non-AWS S3 (MinIO, etc.):
TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://minio:9000
```
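Before running setup, you can sanity-check external credentials with the AWS CLI (standard tooling, not part of the script; add `--endpoint-url` for non-AWS providers):

```bash
# Lists the bucket if the key pair and region are valid.
AWS_ACCESS_KEY_ID=your-key AWS_SECRET_ACCESS_KEY=your-secret \
  aws s3 ls s3://reflector-media --region us-east-1
```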
What Authentication Enables
By default, Reflector runs in public mode (`AUTH_BACKEND=none`, `PUBLIC_MODE=true`) — anyone can create and view transcripts without logging in. Transcripts are anonymous (not linked to any user) and cannot be edited or deleted after creation.
Enabling authentication (either password or Authentik) unlocks:
| Feature | Public mode (no auth) | With authentication |
|---|---|---|
| Create transcripts (record/upload) | Yes (anonymous, unowned) | Yes (owned by user) |
| View transcripts | All transcripts visible | Own transcripts + shared rooms |
| Edit/delete transcripts | No | Yes (owner only) |
| Privacy controls (private/semi-private/public) | No (everything public) | Yes (owner can set share mode) |
| Speaker reassignment and merging | No | Yes (owner only) |
| Participant management (add/edit/delete) | Read-only | Full CRUD (owner only) |
| Create rooms | No | Yes |
| Edit/delete rooms | No | Yes (owner only) |
| Room calendar (ICS) sync | No | Yes (owner only) |
| API key management | No | Yes |
| Post to Zulip | No | Yes (owner only) |
| Real-time WebSocket notifications | No (connection closed) | Yes (transcript create/delete events) |
| Meeting host access (Daily.co token) | No | Yes (room owner) |
In short: public mode is "demo-friendly" — great for trying Reflector out. Authentication adds ownership, privacy, and management of your data.
Authentication Options
Reflector supports three authentication backends:
| Backend | `AUTH_BACKEND` | Use case |
|---|---|---|
| None | `none` | Public/demo mode, no login required |
| Password | `password` | Single-user self-hosted, simple email/password login |
| JWT | `jwt` | Multi-user via Authentik (OAuth2/OIDC) |
Enabling Password Authentication
The simplest way to add authentication. Creates a single admin user with email/password login — no external identity provider needed.
Quick setup (recommended)
Pass `--password` to the setup script:

```bash
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --password mysecretpass
```

This automatically:

- Sets `AUTH_BACKEND=password` and `PUBLIC_MODE=false` in `server/.env`
- Creates an `admin@localhost` user with the given password
- Sets `FEATURE_REQUIRE_LOGIN=true` and `AUTH_PROVIDER=credentials` in `www/.env`
- Provisions the admin user in the database on container startup
Manual setup
If you prefer to configure manually or want to change the admin email:
- Generate a password hash:

  ```bash
  cd server
  uv run python -m reflector.tools.create_admin --hash-only --password yourpassword
  ```

- Update `server/.env`:

  ```bash
  AUTH_BACKEND=password
  PUBLIC_MODE=false
  ADMIN_EMAIL=admin@yourdomain.com
  ADMIN_PASSWORD_HASH=pbkdf2:sha256:100000$<salt>$<hash>
  ```

- Update `www/.env`:

  ```bash
  FEATURE_REQUIRE_LOGIN=true
  AUTH_PROVIDER=credentials
  ```

- Restart:

  ```bash
  docker compose -f docker-compose.selfhosted.yml down
  ./scripts/setup-selfhosted.sh <same-flags>
  ```
How it works
- The backend issues HS256 JWTs (signed with `SECRET_KEY`) on successful login via `POST /v1/auth/login`
- Tokens expire after 24 hours; the user must log in again after expiry
- The frontend shows a login page at `/login` with email and password fields
- A rate limiter blocks IPs after 10 failed login attempts within 5 minutes
- The admin user is provisioned automatically on container startup from the `ADMIN_EMAIL` and `ADMIN_PASSWORD_HASH` environment variables
- Passwords are hashed with PBKDF2-SHA256 (100,000 iterations) — no additional dependencies required
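To exercise the flow from the command line — a sketch only, since the exact request and response field names here are assumptions, not the documented schema:

```bash
# Hypothetical payload shape; adjust to the actual API schema.
curl -s -X POST https://your-server/v1/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email": "admin@localhost", "password": "mysecretpass"}'
# On success the response carries the HS256 JWT (valid 24 hours),
# sent as an Authorization: Bearer token on subsequent API calls.
```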
Changing the admin password
```bash
cd server
uv run python -m reflector.tools.create_admin --email admin@localhost --password newpassword
```

Or update `ADMIN_PASSWORD_HASH` in `server/.env` and restart the containers.
Enabling Authentication (Authentik)
For multi-user deployments with SSO. Requires an external Authentik instance.
By default, authentication is disabled (`AUTH_BACKEND=none`, `FEATURE_REQUIRE_LOGIN=false`). To enable:

- Deploy an Authentik instance (see the Authentik docs)
- Create an OAuth2/OIDC application for Reflector
- Update `server/.env`:

  ```bash
  AUTH_BACKEND=jwt
  AUTH_JWT_AUDIENCE=your-client-id
  ```

- Update `www/.env`:

  ```bash
  FEATURE_REQUIRE_LOGIN=true
  AUTH_PROVIDER=authentik
  AUTHENTIK_ISSUER=https://authentik.example.com/application/o/reflector
  AUTHENTIK_REFRESH_TOKEN_URL=https://authentik.example.com/application/o/token/
  AUTHENTIK_CLIENT_ID=your-client-id
  AUTHENTIK_CLIENT_SECRET=your-client-secret
  ```

- Restart:

  ```bash
  docker compose -f docker-compose.selfhosted.yml down && ./scripts/setup-selfhosted.sh <same-flags>
  ```
Enabling Real Domain with Let's Encrypt
By default, Caddy uses self-signed certificates. For a real domain:
- Point your domain's DNS to your server's IP
- Ensure ports 80 and 443 are open
- Edit `Caddyfile`:

  ```
  reflector.example.com {
      handle /v1/* {
          reverse_proxy server:1250
      }
      handle /health {
          reverse_proxy server:1250
      }
      handle {
          reverse_proxy web:3000
      }
  }
  ```

- Update `www/.env`:

  ```bash
  SITE_URL=https://reflector.example.com
  NEXTAUTH_URL=https://reflector.example.com
  API_URL=https://reflector.example.com
  ```

- Restart Caddy:

  ```bash
  docker compose -f docker-compose.selfhosted.yml restart caddy web
  ```
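Before restarting, confirming DNS already points at the server avoids a failed ACME challenge (standard tooling, not part of the script):

```bash
dig +short reflector.example.com   # should print this server's public IP
```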
Worker Polling Frequency
The selfhosted setup defaults all background worker polling intervals to 300 seconds (5 minutes) to reduce CPU and memory usage. This controls how often the beat scheduler triggers tasks like recording discovery, meeting reconciliation, and calendar sync.
To change the interval, edit `server/.env`:

```bash
# Poll every 60 seconds (more responsive, uses more resources)
CELERY_BEAT_POLL_INTERVAL=60

# Poll every 5 minutes (default for selfhosted)
CELERY_BEAT_POLL_INTERVAL=300

# Use individual per-task defaults (production SaaS behavior)
CELERY_BEAT_POLL_INTERVAL=0
```
After changing, restart the beat and worker containers:

```bash
docker compose -f docker-compose.selfhosted.yml restart beat worker
```
Affected tasks when `CELERY_BEAT_POLL_INTERVAL` is set:
| Task | Default (no override) | With override |
|---|---|---|
| SQS message polling | 60s | Override value |
| Daily.co recording discovery | 15s (no webhook) / 180s (webhook) | Override value |
| Meeting reconciliation | 30s | Override value |
| ICS calendar sync | 60s | Override value |
| Upcoming meeting creation | 30s | Override value |
Note: Daily crontab tasks (failed recording reprocessing at 05:00 UTC, public data cleanup at 03:00 UTC) and healthcheck pings (10 min) are not affected by this setting.
Troubleshooting
Check service status
```bash
docker compose -f docker-compose.selfhosted.yml ps
```
View logs for a specific service
```bash
docker compose -f docker-compose.selfhosted.yml logs server --tail 50
docker compose -f docker-compose.selfhosted.yml logs gpu --tail 50
docker compose -f docker-compose.selfhosted.yml logs web --tail 50
```
GPU service taking too long
First start downloads ~1-2GB of ML models. Check progress:
```bash
docker compose -f docker-compose.selfhosted.yml logs gpu -f
```
Server exits immediately
Usually a database migration issue. Check:
```bash
docker compose -f docker-compose.selfhosted.yml logs server --tail 50
```
Caddy certificate issues
For self-signed certs, your browser will warn. Click Advanced > Proceed. For Let's Encrypt, ensure ports 80/443 are open and DNS is pointed correctly.
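Caddy's logs usually show why certificate issuance failed:

```bash
docker compose -f docker-compose.selfhosted.yml logs caddy --tail 50
```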
Summaries/topics not generating
Check LLM configuration:
```bash
grep LLM_ server/.env
```
If you didn't use `--ollama-gpu` or `--ollama-cpu`, you must set `LLM_URL`, `LLM_API_KEY`, and `LLM_MODEL`.
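If you did use the Ollama flags, you can confirm the model was pulled (assuming the compose service is named `ollama`, as in the architecture diagram below):

```bash
docker compose -f docker-compose.selfhosted.yml exec ollama ollama list
```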
Health check from inside containers
```bash
docker compose -f docker-compose.selfhosted.yml exec server curl http://localhost:1250/health
docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8000/docs
```
Updating
```bash
# Option A: Pull latest prebuilt images and restart
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before>

# Option B: Build from source (after git pull) and restart
git pull
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before> --build

# Rebuild only the GPU/CPU model image (picks up model updates)
docker compose -f docker-compose.selfhosted.yml build gpu   # or cpu
```
The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.
Architecture Overview
```
                   ┌─────────┐
Internet ────────> │  Caddy  │ :80/:443
                   └────┬────┘
                        │
           ┌────────────┼────────────┐
           │            │            │
           v            v            │
      ┌─────────┐  ┌─────────┐       │
      │   web   │  │ server  │       │
      │  :3000  │  │  :1250  │       │
      └─────────┘  └────┬────┘       │
                        │            │
                   ┌────┴────┐       │
                   │ worker  │       │
                   │  beat   │       │
                   └────┬────┘       │
                        │            │
        ┌───────────────┼────────────┤
        │               │            │
        v               v            v
 ┌─────────────┐   ┌─────────┐  ┌─────────┐
 │transcription│   │postgres │  │  redis  │
 │  (gpu/cpu)  │   │  :5432  │  │  :6379  │
 │    :8000    │   └─────────┘  └─────────┘
 └──────┬──────┘
        │
  ┌─────┴─────┐    ┌─────────┐
  │  ollama   │    │ garage  │
  │ (optional)│    │(optional│
  │  :11435   │    │   S3)   │
  └───────────┘    └─────────┘
```
All services communicate over Docker's internal network. Only Caddy (if enabled) exposes ports to the internet.
Future Plans for the Self-Hosted Script
The following features are supported by Reflector but are not yet integrated into the self-hosted setup script and require manual configuration:
- Daily.co live rooms with multitrack processing: Daily.co enables real-time meeting rooms with automatic recording and per-participant audio tracks for improved diarization. Requires a Daily.co account, API key, and an AWS S3 bucket for recording storage. Currently not automated in the script because the worker orchestration (Hatchet) is not yet supported in the selfhosted compose setup.