Self-Hosted Production Deployment

Deploy Reflector on a single server with everything running in Docker. Transcription, diarization, and translation use specialized ML models (Whisper/Parakeet, Pyannote); only summarization and topic detection require an LLM.

For a detailed walkthrough of how the setup script and infrastructure work under the hood, see How the Self-Hosted Setup Works.

Prerequisites

Hardware

  • With GPU: Linux server with NVIDIA GPU (8GB+ VRAM recommended), 16GB+ RAM, 50GB+ disk
  • CPU-only: 8+ cores, 32GB+ RAM (transcription is slower but works)
  • Disk space for ML models (~2GB on first run) + audio storage

Software

  • Docker Engine 24+ with Compose V2
  • NVIDIA drivers + nvidia-container-toolkit (GPU modes only)
  • curl, openssl (usually pre-installed)
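
A quick way to verify the software prerequisites before running the setup script:

docker --version          # Docker Engine 24+ expected
docker compose version    # Compose V2
nvidia-smi                # GPU modes only: driver and GPU should be visible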

Accounts & Credentials (depending on options)

Always recommended:

LLM for summarization & topic detection (pick one):

  • With --ollama-gpu or --ollama-cpu: Nothing extra — Ollama runs locally and pulls the model automatically
  • Without --ollama-*: An OpenAI-compatible LLM API key and endpoint. Examples:
    • OpenAI: LLM_URL=https://api.openai.com/v1, LLM_API_KEY=sk-..., LLM_MODEL=gpt-4o-mini
    • Anthropic, Together, Groq, or any OpenAI-compatible API
    • A self-managed vLLM or Ollama instance elsewhere on the network

Object storage (pick one):

  • With --garage: Nothing extra — Garage (local S3-compatible storage) is auto-configured by the script
  • Without --garage: S3-compatible storage credentials. The script will prompt for these, or you can pre-fill server/.env. Options include:
    • AWS S3: Access Key ID, Secret Access Key, bucket name, region
    • MinIO: Same credentials + TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://your-minio:9000
    • Any S3-compatible provider (Backblaze B2, Cloudflare R2, DigitalOcean Spaces, etc.): same fields + custom endpoint URL

Optional add-ons (configure after initial setup): Authentik authentication and Daily.co live rooms, each covered in its own section later in this guide.

Quick Start

git clone https://github.com/Monadical-SAS/reflector.git
cd reflector

# GPU + local Ollama LLM + local Garage storage + Caddy SSL (with domain):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --domain reflector.example.com

# Same but without a domain (self-signed cert, access via IP):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# CPU-only (same, but slower):
./scripts/setup-selfhosted.sh --cpu --ollama-cpu --garage --caddy

# Build from source instead of pulling prebuilt images:
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --build

That's it. The script generates env files and secrets, starts all containers, waits for health checks, and prints the URL.
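
Once the script finishes, you can sanity-check the stack with the same commands used later in Troubleshooting:

# All services should report running/healthy
docker compose -f docker-compose.selfhosted.yml ps

# With --caddy, the /health route is proxied to the API (see the Caddyfile example later);
# -k skips verification for the self-signed certificate
curl -k https://your-server-ip-or-domain/health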

Specialized Models (Required)

Pick --gpu or --cpu. This determines how transcription, diarization, and translation run:

| Flag | What it does | Requires |
| --- | --- | --- |
| --gpu | NVIDIA GPU acceleration for ML models | NVIDIA GPU + drivers + nvidia-container-toolkit |
| --cpu | CPU-only (slower but works without GPU) | 8+ cores, 32GB+ RAM recommended |

Local LLM (Optional)

Optionally add --ollama-gpu or --ollama-cpu for a local Ollama instance that handles summarization and topic detection. If omitted, configure an external OpenAI-compatible LLM in server/.env (see the example after the table below).

| Flag | What it does | Requires |
| --- | --- | --- |
| --ollama-gpu | Local Ollama with NVIDIA GPU acceleration | NVIDIA GPU |
| --ollama-cpu | Local Ollama on CPU only | Nothing extra |
| --llm-model MODEL | Choose which Ollama model to download (default: qwen2.5:14b) | --ollama-gpu or --ollama-cpu |
| (omitted) | User configures external LLM (OpenAI, Anthropic, etc.) | LLM API key |
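
If you skip the Ollama flags, point Reflector at any OpenAI-compatible endpoint by setting the three LLM variables in server/.env (the values below are OpenAI placeholders; substitute your own provider and key), then restart the services that call the LLM:

# server/.env
LLM_URL=https://api.openai.com/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini

docker compose -f docker-compose.selfhosted.yml restart server worker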

Choosing an Ollama model

The default model is qwen2.5:14b (~9GB download, good multilingual support and summary quality). Override with --llm-model:

# Default (qwen2.5:14b)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# Mistral — good balance of speed and quality (~4.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model mistral --garage --caddy

# Phi-4 — smaller and faster (~9.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model phi4 --garage --caddy

# Llama 3.3 70B — best quality, needs 48GB+ RAM or GPU VRAM (~43GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model llama3.3:70b --garage --caddy

# Gemma 2 9B (~5.4GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model gemma2 --garage --caddy

# DeepSeek R1 8B — reasoning model, verbose but thorough summaries (~4.9GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model deepseek-r1:8b --garage --caddy

Browse all available models at https://ollama.com/library.
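
To switch models on an existing deployment, one approach (assuming the ollama service name used by the compose file) is to pull the new model into the running Ollama container, update LLM_MODEL in server/.env, and restart the services that call the LLM:

# Pull the new model inside the Ollama container
docker compose -f docker-compose.selfhosted.yml exec ollama ollama pull mistral

# Then set LLM_MODEL=mistral in server/.env and restart
docker compose -f docker-compose.selfhosted.yml restart server worker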

Common flag combinations:

  • --gpu --ollama-gpu: Best for servers with an NVIDIA GPU. Fully self-contained, no external API keys needed.
  • --cpu --ollama-cpu: No GPU available but want everything self-contained. Slower but works.
  • --gpu --ollama-cpu: GPU for transcription, CPU for LLM. Saves GPU VRAM for ML models.
  • --gpu: Have NVIDIA GPU but prefer a cloud LLM (faster/better summaries with GPT-4, Claude, etc.).
  • --cpu: No GPU, prefer cloud LLM. Slowest transcription but best summary quality.

Other Optional Flags

| Flag | What it does |
| --- | --- |
| --garage | Starts Garage (local S3-compatible storage). Auto-configures bucket, keys, and env vars. |
| --caddy | Starts Caddy reverse proxy on ports 80/443 with self-signed cert. |
| --domain DOMAIN | Use a real domain with Let's Encrypt auto-HTTPS (implies --caddy). Requires DNS A record pointing to this server and ports 80/443 open. |
| --build | Build backend (server, worker, beat) and frontend (web) Docker images from source instead of pulling prebuilt images from the registry. Useful for development or when running a version with local changes. |

Without --garage, you must provide S3-compatible credentials (the script will prompt interactively or you can pre-fill server/.env).

Without --caddy or --domain, no ports are exposed. Point your own reverse proxy at web:3000 (frontend) and server:1250 (API).
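
If your reverse proxy runs on the host rather than inside the Docker network, one option, sketched here with an illustrative override file name, is a standard Compose override that publishes the two ports on localhost; route /v1/* and /health to the API and everything else to the frontend, mirroring the Caddyfile example later in this guide:

# docker-compose.ports.yml (example override, not part of the repo)
services:
  web:
    ports:
      - "127.0.0.1:3000:3000"
  server:
    ports:
      - "127.0.0.1:1250:1250"

# Start with both files so the override is applied
docker compose -f docker-compose.selfhosted.yml -f docker-compose.ports.yml up -d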

Using a domain (recommended for production): Point a DNS A record at your server's IP, then pass --domain your.domain.com. Caddy will automatically obtain and renew a Let's Encrypt certificate. Ports 80 and 443 must be open.
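
Before running the script with --domain, you can confirm the record already resolves to this server:

dig +short reflector.example.com   # should print this server's public IP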

Without a domain: --caddy alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.

What the Script Does

  1. Prerequisites check — Docker, NVIDIA GPU (if needed), compose file exists
  2. Generate secrets — SECRET_KEY, NEXTAUTH_SECRET via openssl rand
  3. Generate server/.env — From template, sets infrastructure defaults, configures LLM based on mode, enables PUBLIC_MODE
  4. Generate www/.env — Auto-detects server IP, sets URLs
  5. Storage setup — Either initializes Garage (bucket, keys, permissions) or prompts for external S3 credentials
  6. Caddyfile — Generates domain-specific (Let's Encrypt) or IP-specific (self-signed) configuration
  7. Build & start — Always builds GPU/CPU model image from source. With --build, also builds backend and frontend from source; otherwise pulls prebuilt images from the registry
  8. Health checks — Waits for each service, pulls Ollama model if needed, warns about missing LLM config

For a deeper dive into each step, see How the Self-Hosted Setup Works.
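
To review what the script generated without reading the whole files, a quick grep of the key variables works (variable names as listed in the Configuration Reference below):

grep -E '^(SECRET_KEY|PUBLIC_MODE|LLM_|TRANSCRIPT_STORAGE_)' server/.env
grep -E '^(SITE_URL|API_URL|NEXTAUTH_SECRET|FEATURE_REQUIRE_LOGIN)' www/.env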

Configuration Reference

Server Environment (server/.env)

| Variable | Description | Default |
| --- | --- | --- |
| DATABASE_URL | PostgreSQL connection | Auto-set (Docker internal) |
| REDIS_HOST | Redis hostname | Auto-set (redis) |
| SECRET_KEY | App secret | Auto-generated |
| AUTH_BACKEND | Authentication method | none |
| PUBLIC_MODE | Allow unauthenticated access | true |
| WEBRTC_HOST | IP advertised in WebRTC ICE candidates | Auto-detected (server IP) |
| TRANSCRIPT_URL | Specialized model endpoint | http://transcription:8000 |
| LLM_URL | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| LLM_API_KEY | LLM API key | not-needed for Ollama |
| LLM_MODEL | LLM model name | qwen2.5:14b for Ollama (override with --llm-model) |
| TRANSCRIPT_STORAGE_BACKEND | Storage backend | aws |
| TRANSCRIPT_STORAGE_AWS_* | S3 credentials | Auto-set for Garage |

Frontend Environment (www/.env)

| Variable | Description | Default |
| --- | --- | --- |
| SITE_URL | Public-facing URL | Auto-detected |
| API_URL | API URL (browser-side) | Same as SITE_URL |
| SERVER_API_URL | API URL (server-side) | http://server:1250 |
| NEXTAUTH_SECRET | Auth secret | Auto-generated |
| FEATURE_REQUIRE_LOGIN | Require authentication | false |

Storage Options

Garage (Local S3)

Use the --garage flag. The script automatically:

  • Generates data/garage.toml with a random RPC secret
  • Starts the Garage container
  • Creates the reflector-media bucket
  • Creates an access key with read/write permissions
  • Writes all S3 credentials to server/.env
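
To double-check the result from inside the container, Garage's own CLI can report cluster status and list buckets; this sketch assumes the garage service name from the compose file and the binary at /garage as in the official image (adjust if yours differs):

docker compose -f docker-compose.selfhosted.yml exec garage /garage status
docker compose -f docker-compose.selfhosted.yml exec garage /garage bucket list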

External S3 (AWS, MinIO, etc.)

Don't use --garage. The script will prompt for:

  • Access Key ID
  • Secret Access Key
  • Bucket Name
  • Region
  • Endpoint URL (for non-AWS like MinIO)

Or pre-fill in server/.env:

TRANSCRIPT_STORAGE_BACKEND=aws
TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID=your-key
TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY=your-secret
TRANSCRIPT_STORAGE_AWS_BUCKET_NAME=reflector-media
TRANSCRIPT_STORAGE_AWS_REGION=us-east-1
# For non-AWS S3 (MinIO, etc.):
TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://minio:9000
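
If you have the AWS CLI installed, a quick way to confirm the credentials and bucket are valid before running the script (uses your local AWS CLI configuration; for non-AWS providers add the endpoint):

# AWS S3
aws s3 ls s3://reflector-media --region us-east-1

# MinIO or other S3-compatible providers
aws s3 ls s3://reflector-media --endpoint-url http://your-minio:9000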

Enabling Authentication (Authentik)

By default, authentication is disabled (AUTH_BACKEND=none, FEATURE_REQUIRE_LOGIN=false). To enable:

  1. Deploy an Authentik instance (see Authentik docs)
  2. Create an OAuth2/OIDC application for Reflector
  3. Update server/.env:
    AUTH_BACKEND=jwt
    AUTH_JWT_AUDIENCE=your-client-id
    
  4. Update www/.env:
    FEATURE_REQUIRE_LOGIN=true
    AUTHENTIK_ISSUER=https://authentik.example.com/application/o/reflector
    AUTHENTIK_REFRESH_TOKEN_URL=https://authentik.example.com/application/o/token/
    AUTHENTIK_CLIENT_ID=your-client-id
    AUTHENTIK_CLIENT_SECRET=your-client-secret
    
  5. Restart: docker compose -f docker-compose.selfhosted.yml down && ./scripts/setup-selfhosted.sh <same-flags>
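
A quick way to confirm the issuer URL is reachable and correct is to fetch the standard OIDC discovery document (URL shown for the example issuer above):

curl https://authentik.example.com/application/o/reflector/.well-known/openid-configuration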

Enabling Daily.co Live Rooms

Daily.co enables real-time meeting rooms with automatic recording and transcription.

  1. Create a Daily.co account
  2. Add to server/.env:
    DEFAULT_VIDEO_PLATFORM=daily
    DAILY_API_KEY=your-daily-api-key
    DAILY_SUBDOMAIN=your-subdomain
    DAILY_WEBHOOK_SECRET=your-webhook-secret
    DAILYCO_STORAGE_AWS_BUCKET_NAME=reflector-dailyco
    DAILYCO_STORAGE_AWS_REGION=us-east-1
    DAILYCO_STORAGE_AWS_ROLE_ARN=arn:aws:iam::<account-id>:role/DailyCoAccess
    
  3. Restart the server: docker compose -f docker-compose.selfhosted.yml restart server worker

Enabling Real Domain with Let's Encrypt

By default (without --domain), Caddy uses a self-signed certificate. To switch an existing deployment to a real domain:

  1. Point your domain's DNS to your server's IP
  2. Ensure ports 80 and 443 are open
  3. Edit Caddyfile:
    reflector.example.com {
        handle /v1/* {
            reverse_proxy server:1250
        }
        handle /health {
            reverse_proxy server:1250
        }
        handle {
            reverse_proxy web:3000
        }
    }
    
  4. Update www/.env:
    SITE_URL=https://reflector.example.com
    NEXTAUTH_URL=https://reflector.example.com
    API_URL=https://reflector.example.com
    
  5. Restart Caddy: docker compose -f docker-compose.selfhosted.yml restart caddy web
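
Once Caddy has obtained the certificate, the health endpoint should respond over HTTPS without any certificate warning:

curl -I https://reflector.example.com/health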

Troubleshooting

Check service status

docker compose -f docker-compose.selfhosted.yml ps

View logs for a specific service

docker compose -f docker-compose.selfhosted.yml logs server --tail 50
docker compose -f docker-compose.selfhosted.yml logs gpu --tail 50
docker compose -f docker-compose.selfhosted.yml logs web --tail 50

GPU service taking too long

First start downloads ~1-2GB of ML models. Check progress:

docker compose -f docker-compose.selfhosted.yml logs gpu -f

Server exits immediately

Usually a database migration issue. Check:

docker compose -f docker-compose.selfhosted.yml logs server --tail 50

Caddy certificate issues

For self-signed certs, your browser will warn. Click Advanced > Proceed. For Let's Encrypt, ensure ports 80/443 are open and DNS is pointed correctly.

Summaries/topics not generating

Check LLM configuration:

grep LLM_ server/.env

If you didn't use --ollama-gpu or --ollama-cpu, you must set LLM_URL, LLM_API_KEY, and LLM_MODEL.
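
For an external endpoint, you can also verify the URL and key directly; listing models is part of the OpenAI-compatible API surface (substitute your own values):

curl -H "Authorization: Bearer your-api-key" https://api.openai.com/v1/models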

Health check from inside containers

docker compose -f docker-compose.selfhosted.yml exec server curl http://localhost:1250/health
docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8000/docs

Updating

# Option A: Pull latest prebuilt images and restart
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before>

# Option B: Build from source (after git pull) and restart
git pull
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before> --build

# Rebuild only the GPU/CPU model image (picks up model updates)
docker compose -f docker-compose.selfhosted.yml build gpu  # or cpu

The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.

Architecture Overview

                    ┌─────────┐
  Internet ────────>│  Caddy  │ :80/:443
                    └────┬────┘
                         │
            ┌────────────┼────────────┐
            │            │            │
            v            v            │
       ┌─────────┐  ┌─────────┐      │
       │   web   │  │ server  │      │
       │ :3000   │  │ :1250   │      │
       └─────────┘  └────┬────┘      │
                         │            │
                    ┌────┴────┐       │
                    │ worker  │       │
                    │  beat   │       │
                    └────┬────┘       │
                         │            │
          ┌──────────────┼────────────┤
          │              │            │
          v              v            v
    ┌─────────────┐  ┌─────────┐  ┌─────────┐
    │transcription│  │postgres │  │  redis  │
    │  (gpu/cpu)  │  │ :5432   │  │ :6379   │
    │    :8000    │  └─────────┘  └─────────┘
    └─────────────┘
          │
    ┌─────┴─────┐     ┌─────────┐
    │  ollama   │     │ garage  │
    │ (optional)│     │(optional│
    │ :11434    │     │ S3)     │
    └───────────┘     └─────────┘

All services communicate over Docker's internal network. Only Caddy (if enabled) exposes ports to the internet.