mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2026-03-21 22:56:47 +00:00
chore: create script for selfhosted reflector (#866)
* self hosted with self gpu
* add optional ollama model
* garage ports
* exposes ports and changes curl
* custom domain
* try to fix worker
* build locally
* documentation
* docs format
* precommit
This commit is contained in:
committed by
GitHub
parent
a8ad237d85
commit
cdd974b935
468
docsv2/selfhosted-architecture.md
Normal file
@@ -0,0 +1,468 @@
# How the Self-Hosted Setup Works

This document explains the internals of the self-hosted deployment: how the setup script orchestrates everything, how the Docker Compose profiles work, how services communicate, and how configuration flows from flags to running containers.

> For quick-start instructions and flag reference, see [Self-Hosted Production Deployment](selfhosted-production.md).

## Table of Contents

- [Overview](#overview)
- [The Setup Script Step by Step](#the-setup-script-step-by-step)
- [Docker Compose Profile System](#docker-compose-profile-system)
- [Service Architecture](#service-architecture)
- [Configuration Flow](#configuration-flow)
- [Storage Architecture](#storage-architecture)
- [SSL/TLS and Reverse Proxy](#ssltls-and-reverse-proxy)
- [Build vs Pull Workflow](#build-vs-pull-workflow)
- [Background Task Processing](#background-task-processing)
- [Network and Port Layout](#network-and-port-layout)
- [Diagnostics and Error Handling](#diagnostics-and-error-handling)

---

## Overview

The self-hosted deployment runs the entire Reflector platform on a single server using Docker Compose. A single bash script (`scripts/setup-selfhosted.sh`) handles all configuration and orchestration. The key design principles are:

- **One command to deploy**: flags select which features to enable
- **Idempotent**: safe to re-run without losing existing configuration
- **Profile-based composition**: Docker Compose profiles activate optional services
- **No external dependencies required**: with `--garage` and `--ollama-*`, everything runs locally

## The Setup Script Step by Step

The script (`scripts/setup-selfhosted.sh`) runs eight sequential steps, numbered 0 through 7. Here's what each one does and why.

### Step 0: Prerequisites

Validates the environment before doing anything:

- **Docker Compose V2**: checks `docker compose version` output (not the legacy `docker-compose`)
- **Docker daemon**: verifies `docker info` succeeds
- **NVIDIA GPU**: only checked when `--gpu` or `--ollama-gpu` is used; runs `nvidia-smi` to confirm drivers are installed
- **Compose file**: verifies `docker-compose.selfhosted.yml` exists at the expected path

If any check fails, the script exits with a clear error message and remediation steps.
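
The checks above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names `check` and `check_prerequisites`, their parameters, and the hint messages are all hypothetical. Only the probed commands (`docker compose version`, `docker info`, `nvidia-smi`) come from the description above.

```bash
# Hypothetical sketch of the Step 0 checks; names and messages are illustrative.

# Run a probe command quietly; print a remediation hint and fail if it errors.
check() {
  local description="$1" hint="$2"
  shift 2
  if "$@" >/dev/null 2>&1; then
    echo "ok: $description"
  else
    echo "error: $description failed. $hint" >&2
    return 1
  fi
}

# needs_gpu is "true" when --gpu or --ollama-gpu was passed.
check_prerequisites() {
  local needs_gpu="$1" compose_file="$2"
  check "Docker Compose V2" "Install the Compose plugin." docker compose version || return 1
  check "Docker daemon" "Start the Docker service." docker info || return 1
  if [ "$needs_gpu" = "true" ]; then
    check "NVIDIA drivers" "Install the NVIDIA drivers." nvidia-smi || return 1
  fi
  check "Compose file" "Run the script from the repository root." test -f "$compose_file" || return 1
}
```

A call such as `check_prerequisites true docker-compose.selfhosted.yml` would stop at the first failing probe, which matches the fail-fast behavior described above.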

### Step 1: Generate Secrets

Creates cryptographic secrets needed by the backend and frontend:

- **`SECRET_KEY`**: used by the FastAPI server for session signing (64 hex chars via `openssl rand -hex 32`)
- **`NEXTAUTH_SECRET`**: used by Next.js NextAuth for JWT signing

Secrets are only generated if they don't already exist or are still set to the placeholder value `changeme`. This is what makes the script idempotent for secrets.
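
A minimal sketch of that idempotency rule, assuming hypothetical helper names (`needs_secret`, `ensure_secret`); the real script may structure this differently. The `openssl rand -hex 32` call and the `changeme` placeholder are taken from the text above.

```bash
# Hypothetical helpers illustrating the "generate only when needed" rule.

# Succeed when a secret still has to be generated:
# the value is empty or still the placeholder.
needs_secret() {
  local value="$1"
  [ -z "$value" ] || [ "$value" = "changeme" ]
}

# Print the existing value, or a fresh 64-character hex secret.
ensure_secret() {
  local current="$1"
  if needs_secret "$current"; then
    openssl rand -hex 32
  else
    printf '%s\n' "$current"
  fi
}
```

Because an existing value is returned unchanged, re-running the setup does not rotate secrets, so existing sessions and tokens stay valid.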

### Step 2: Generate `server/.env`

Creates or updates the backend environment file from `server/.env.selfhosted.example`. Sets:

- **Infrastructure**: PostgreSQL URL, Redis host, Celery broker (all pointing to Docker-internal hostnames)
- **Public URLs**: `BASE_URL` and `CORS_ORIGIN` computed from the domain (if `--domain`), IP (if detected on Linux), or `localhost`
- **WebRTC**: `WEBRTC_HOST` set to the server's LAN IP so browsers can reach UDP ICE candidates
- **Specialized models**: always points to `http://transcription:8000` (the Docker network alias shared by GPU and CPU containers)
- **HuggingFace token**: prompts interactively for pyannote model access; writes to root `.env` so Docker Compose can inject it into GPU/CPU containers
- **LLM**: if `--ollama-*` is used, configures `LLM_URL` pointing to the Ollama container. Otherwise, warns that the user needs to configure an external LLM
- **Public mode**: sets `PUBLIC_MODE=true` so the app is accessible without authentication by default

The script uses `env_set` for each variable, which either updates an existing line or appends a new one. This means re-running the script updates values in-place without duplicating keys.
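
The update-or-append behavior can be sketched as below. This is an assumed reimplementation for illustration only: the function name `env_set` comes from the text, but its argument order and internals here are guesses, and keys are assumed to contain only letters, digits, and underscores.

```bash
# Illustrative update-or-append helper; the real env_set may differ.
# Usage: env_set FILE KEY VALUE
env_set() {
  local file="$1" key="$2" value="$3" tmp line
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Key exists: rewrite the file, replacing only the matching line.
    tmp="$(mktemp)"
    while IFS= read -r line || [ -n "$line" ]; do
      case "$line" in
        "${key}="*) printf '%s=%s\n' "$key" "$value" ;;
        *)          printf '%s\n' "$line" ;;
      esac
    done < "$file" > "$tmp"
    # Copy back instead of moving so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    # Key is new: append it.
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Calling `env_set server/.env PUBLIC_MODE true` twice leaves exactly one `PUBLIC_MODE=` line in the file, which is the property that makes re-runs safe.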

### Step 3: Generate `www/.env`

Creates or updates the frontend environment file from `www/.env.selfhosted.example`. Sets:

- **`SITE_URL` / `NEXTAUTH_URL` / `API_URL`**: all set to the same public-facing URL (with `https://` if Caddy is enabled)
- **`WEBSOCKET_URL`**: set to `auto`, which tells the frontend to derive the WebSocket URL from the page URL automatically
- **`SERVER_API_URL`**: always `http://server:1250` (Docker-internal, used for server-side rendering)
- **`KV_URL`**: Redis URL for Next.js caching
- **`FEATURE_REQUIRE_LOGIN`**: `false` by default (matches `PUBLIC_MODE=true` on the backend)

### Step 4: Storage Setup

Branches based on whether `--garage` was passed:

**With `--garage` (local S3):**

1. Generates `data/garage.toml` from a template, injecting a random RPC secret
2. Starts only the Garage container (`docker compose --profile garage up -d garage`)
3. Waits for the Garage admin API to respond on port 3903
4. Assigns the node to a storage layout (1GB capacity, zone `dc1`)
5. Creates the `reflector-media` bucket
6. Creates an access key named `reflector` and grants it read/write on the bucket
7. Writes all S3 credentials (`ENDPOINT_URL`, `BUCKET_NAME`, `REGION`, `ACCESS_KEY_ID`, `SECRET_ACCESS_KEY`) to `server/.env`

The Garage endpoint is `http://garage:3900` (Docker-internal), and the region is set to `garage` (arbitrary, Garage ignores it). The boto3 client uses path-style addressing when an endpoint URL is configured, which is required for S3-compatible services like Garage.

**Without `--garage` (external S3):**

1. Checks `server/.env` for the four required S3 variables
2. If any are missing, prompts interactively for each one
3. Optionally prompts for an endpoint URL (for MinIO, Backblaze B2, etc.)
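
The "check for missing variables" part of the external S3 path could look roughly like this. It is a sketch under assumptions: the helper name `missing_env_keys` is hypothetical, and the four full variable names are extrapolated from `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL`, which is the only full name this document spells out.

```bash
# Print each required key that is absent or empty in the env file.
# Usage: missing_env_keys FILE KEY...
missing_env_keys() {
  local file="$1" key
  shift
  for key in "$@"; do
    # A key counts as present only if it has a non-empty value.
    if ! grep -Eq "^${key}=.+" "$file" 2>/dev/null; then
      printf '%s\n' "$key"
    fi
  done
}

# ASSUMED variable names (prefix extrapolated from the endpoint URL variable).
REQUIRED_S3_KEYS="TRANSCRIPT_STORAGE_AWS_BUCKET_NAME
TRANSCRIPT_STORAGE_AWS_REGION
TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID
TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY"
```

A caller would run `missing_env_keys server/.env $REQUIRED_S3_KEYS` (left unquoted on purpose so the list splits into separate arguments) and prompt once per printed key.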

### Step 5: Caddyfile

Only runs when `--caddy` or `--domain` is used. Generates a Caddy configuration file:

**With `--domain`:** Creates a named site block (`reflector.example.com { ... }`). Caddy automatically provisions a Let's Encrypt certificate for this domain. Requires DNS pointing to the server and ports 80/443 open.

**Without `--domain` (IP access):** Creates a catch-all `:443 { tls internal ... }` block. Caddy issues the certificate from its own internal CA, which browsers treat like a self-signed certificate and flag with a security warning.

Both configurations route:

- `/v1/*` and `/health` to the backend (`server:1250`)
- Everything else to the frontend (`web:3000`)
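
To make the two variants concrete, here is a hedged sketch of a generator that prints either form. The function name `generate_caddyfile` and the exact directive layout (`handle` blocks) are assumptions; the routes, upstreams, and `tls internal` come from the description above.

```bash
# Print a Caddyfile for the given domain. An empty domain yields the
# catch-all :443 block that uses Caddy's internal CA.
generate_caddyfile() {
  local domain="$1"
  if [ -n "$domain" ]; then
    printf '%s {\n' "$domain"
  else
    printf ':443 {\n'
    printf '    tls internal\n'
  fi
  # Routing is identical in both variants.
  cat <<'EOF'
    handle /v1/* {
        reverse_proxy server:1250
    }
    handle /health {
        reverse_proxy server:1250
    }
    handle {
        reverse_proxy web:3000
    }
}
EOF
}
```

For example, `generate_caddyfile reflector.example.com > Caddyfile` would produce the named site block, while `generate_caddyfile ""` produces the IP-access variant.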

### Step 6: Start Services

1. **Always builds the GPU/CPU model image**: it is never prebuilt because it contains ML model download logic specific to the host's hardware
2. **With `--build`:** Also builds backend (server, worker, beat) and frontend (web) images from source
3. **Without `--build`:** Pulls prebuilt images from the Docker registry (`monadicalsas/reflector-backend:latest`, `monadicalsas/reflector-frontend:latest`)
4. **Starts all services**: `docker compose up -d` with the active profiles
5. **Quick sanity check**: after 3 seconds, checks for any containers that exited immediately

### Step 7: Health Checks

Waits for each service in order, with generous timeouts:

| Service | Check | Timeout | Notes |
|---------|-------|---------|-------|
| GPU/CPU models | `curl http://localhost:8000/docs` | 10 min (120 x 5s) | First start downloads ~1GB of models |
| Ollama | `curl http://localhost:11434/api/tags` | 3 min (60 x 3s) | Then pulls the selected model |
| Server API | `curl http://localhost:1250/health` | 7.5 min (90 x 5s) | First start runs database migrations |
| Frontend | `curl http://localhost:3000` | 1.5 min (30 x 3s) | Next.js build on first start |
| Caddy | `curl -k https://localhost` | Quick check | After other services are up |

If the server container exits during the health check, the script dumps diagnostics (container statuses + logs) before exiting.

After the Ollama health check passes, the script checks if the selected model is already pulled. If not, it runs `ollama pull <model>` inside the container.
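
The "attempts x interval" timeouts in the table suggest a simple polling loop. A minimal sketch, assuming a hypothetical helper named `wait_for`; the actual script may also watch for exited containers between attempts, as noted above.

```bash
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

With this helper, the Server API row would correspond to something like `wait_for 90 5 curl -sf http://localhost:1250/health`, where `-f` makes `curl` fail on HTTP error statuses so that only a healthy response ends the loop.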

---

## Docker Compose Profile System

The compose file (`docker-compose.selfhosted.yml`) uses Docker Compose profiles to make services optional. Only services whose profiles match the active `--profile` flags are started.

### Always-on Services (no profile)

These start regardless of which flags you pass:

| Service | Role | Image |
|---------|------|-------|
| `server` | FastAPI backend, API endpoints, WebRTC | `monadicalsas/reflector-backend:latest` |
| `worker` | Celery worker for background processing | Same image, `ENTRYPOINT=worker` |
| `beat` | Celery beat scheduler for periodic tasks | Same image, `ENTRYPOINT=beat` |
| `web` | Next.js frontend | `monadicalsas/reflector-frontend:latest` |
| `redis` | Message broker + caching | `redis:7.2-alpine` |
| `postgres` | Primary database | `postgres:17-alpine` |

### Profile-Based Services

| Profile | Service | Role |
|---------|---------|------|
| `gpu` | `gpu` | NVIDIA GPU-accelerated transcription/diarization/translation |
| `cpu` | `cpu` | CPU-only transcription/diarization/translation |
| `ollama-gpu` | `ollama` | Local Ollama LLM with GPU |
| `ollama-cpu` | `ollama-cpu` | Local Ollama LLM on CPU |
| `garage` | `garage` | Local S3-compatible object storage |
| `caddy` | `caddy` | Reverse proxy with SSL |
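
One way to picture how script flags become Compose profiles is a small translation function. This is a hypothetical sketch (`profile_args` and its parameters are not from the script); only the profile names are taken from the table above.

```bash
# Translate setup choices into `docker compose --profile` arguments.
# model_mode: "gpu" or "cpu"; ollama_mode: "gpu", "cpu", or empty.
profile_args() {
  local model_mode="$1" ollama_mode="$2" use_garage="$3" use_caddy="$4"
  local args="--profile $model_mode"
  if [ -n "$ollama_mode" ]; then
    args="$args --profile ollama-$ollama_mode"
  fi
  if [ "$use_garage" = "true" ]; then
    args="$args --profile garage"
  fi
  if [ "$use_caddy" = "true" ]; then
    args="$args --profile caddy"
  fi
  printf '%s\n' "$args"
}
```

The result would be placed before the subcommand, for example `docker compose -f docker-compose.selfhosted.yml $(profile_args gpu gpu true true) up -d`. Services without a profile start in every case.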

### The "transcription" Alias

Both the `gpu` and `cpu` services define a Docker network alias of `transcription`. This means the backend always connects to `http://transcription:8000` regardless of which profile is active. The alias is declared under each service's `networks.default.aliases` key in the compose file.

---

## Service Architecture

```
                         ┌─────────────┐
       Internet ────────>│    Caddy    │ :80/:443 (profile: caddy)
                         └──────┬──────┘
                                │
                   ┌────────────┼────────────┐
                   │            │            │
                   v            v            │
              ┌─────────┐  ┌─────────┐       │
              │   web   │  │ server  │       │
              │  :3000  │  │  :1250  │       │
              └─────────┘  └────┬────┘       │
                                │            │
                           ┌────┴────┐       │
                           │ worker  │       │
                           │  beat   │       │
                           └────┬────┘       │
                                │            │
                ┌───────────────┼────────────┤
                │               │            │
                v               v            v
        ┌───────────────┐  ┌─────────┐  ┌─────────┐
        │ transcription │  │postgres │  │  redis  │
        │   (gpu/cpu)   │  │  :5432  │  │  :6379  │
        │     :8000     │  └─────────┘  └─────────┘
        └───────┬───────┘
                │
          ┌─────┴─────┐    ┌─────────┐
          │  ollama   │    │ garage  │
          │(optional) │    │(optional│
          │  :11434   │    │   S3)   │
          └───────────┘    └─────────┘
```

### How Services Interact

1. **User request** hits Caddy (if enabled), which routes to `web` (pages) or `server` (API)
2. **`web`** renders pages server-side using `SERVER_API_URL=http://server:1250` and client-side using the public `API_URL`
3. **`server`** handles API requests, file uploads, WebRTC streaming. Dispatches background work to Celery via Redis
4. **`worker`** picks up Celery tasks (transcription pipelines, audio processing). Calls `transcription:8000` for ML inference and uploads results to S3 storage
5. **`beat`** schedules periodic tasks (cleanup, webhook retries) by pushing them onto the Celery queue
6. **`transcription` (gpu/cpu)** runs Whisper/Parakeet (transcription), Pyannote (diarization), and translation models. Stateless HTTP API
7. **`ollama`** provides an OpenAI-compatible API for summarization and topic detection. Called by the worker during post-processing
8. **`garage`** provides S3-compatible storage for audio files and processed results. Accessed by the worker via boto3

---

## Configuration Flow

Environment variables flow through multiple layers. Understanding this prevents confusion when debugging:

```
Flags (--gpu, --garage, etc.)
    │
    ├── setup-selfhosted.sh interprets flags
    │       │
    │       ├── Writes server/.env (backend config)
    │       ├── Writes www/.env (frontend config)
    │       ├── Writes .env (HF_TOKEN for compose interpolation)
    │       └── Writes Caddyfile (proxy routes)
    │
    └── docker-compose.selfhosted.yml reads:
            ├── env_file: ./server/.env (loaded into server, worker, beat)
            ├── env_file: ./www/.env (loaded into web)
            ├── .env (compose variable interpolation, e.g. ${HF_TOKEN})
            └── environment: {...} (hardcoded overrides, always win over env_file)
```

### Precedence Rules

Docker Compose `environment:` keys **always override** `env_file:` values. This is by design: the compose file hardcodes infrastructure values that must be correct inside the Docker network (like `DATABASE_URL=postgresql+asyncpg://...@postgres:5432/...`) regardless of what's in `server/.env`.

The `server/.env` file is still useful for:
- Values not overridden in the compose file (LLM config, storage credentials, auth settings)
- Running the server outside Docker during development

### The Three `.env` Files

| File | Used By | Contains |
|------|---------|----------|
| `server/.env` | server, worker, beat | Backend config: database, Redis, S3, LLM, auth, public URLs |
| `www/.env` | web | Frontend config: site URL, auth, feature flags |
| `.env` (root) | Docker Compose interpolation | Only `HF_TOKEN`, injected into GPU/CPU container env |

---

## Storage Architecture

All audio files and processing results are stored in S3-compatible object storage. The backend uses boto3 (via aioboto3) with automatic path-style addressing when a custom endpoint URL is configured.

### How Garage Works

Garage is a lightweight, self-hosted S3-compatible storage engine. In this deployment:

- Runs as a single-node cluster with 1GB capacity allocation
- Listens on port 3900 (S3 API) and 3903 (admin API)
- Data persists in Docker volumes (`garage_data`, `garage_meta`)
- Accessed by the worker at `http://garage:3900` (Docker-internal)

The setup script creates:
- A bucket called `reflector-media`
- An access key called `reflector` with read/write permissions on that bucket

### Path-Style vs Virtual-Hosted Addressing

AWS S3 uses virtual-hosted addressing by default (`bucket.s3.amazonaws.com`). S3-compatible services like Garage require path-style addressing (`endpoint/bucket`). The `AwsStorage` class detects this automatically: when `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL` is set, it configures boto3 with `addressing_style: "path"`.
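
The difference between the two styles is only where the bucket name appears in the URL. The sketch below builds both shapes for comparison; `s3_object_url` is a hypothetical helper for illustration, and the object key `audio.mp3` is made up. The real URL construction happens inside boto3.

```bash
# Build an object URL in path-style (endpoint/bucket/key) or
# virtual-hosted style (bucket.host/key).
# Usage: s3_object_url path|virtual ENDPOINT BUCKET KEY
s3_object_url() {
  local style="$1" endpoint="$2" bucket="$3" key="$4"
  local scheme="${endpoint%%://*}"   # text before "://"
  local host="${endpoint#*://}"      # text after "://"
  case "$style" in
    path)    printf '%s://%s/%s/%s\n' "$scheme" "$host" "$bucket" "$key" ;;
    virtual) printf '%s://%s.%s/%s\n' "$scheme" "$bucket" "$host" "$key" ;;
    *)       echo "unknown style: $style" >&2; return 1 ;;
  esac
}
```

With the Garage endpoint, `s3_object_url virtual http://garage:3900 reflector-media audio.mp3` would yield the host `reflector-media.garage:3900`, a name that nothing on the Docker network resolves. The path-style form keeps the host as `garage:3900`, which is why it is the one that works here.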

---

## SSL/TLS and Reverse Proxy

### With `--domain` (Production)

Caddy automatically obtains and renews a Let's Encrypt certificate. Requirements:
- DNS A record pointing to the server
- Ports 80 (HTTP challenge) and 443 (HTTPS) open to the internet

The generated Caddyfile uses the domain as the site address, which triggers Caddy's automatic HTTPS.

### Without `--domain` (Development/LAN)

Caddy issues a certificate from its internal CA (`tls internal`) and listens on `:443` as a catch-all. Browsers treat it like a self-signed certificate and show a security warning that must be accepted manually.

### Without `--caddy` (BYO Proxy)

No HTTP ports are exposed to the internet. The frontend and backend listen on `127.0.0.1` only:
- Frontend: `localhost:3000`
- Backend API: `localhost:1250`

You can point your own reverse proxy (nginx, Traefik, etc.) at these ports.

### WebRTC and UDP

The server exposes UDP ports 50000-50100 for WebRTC ICE candidates. The `WEBRTC_HOST` variable tells the server which IP to advertise in ICE candidates. This must be the server's actual IP address (not a domain), because WebRTC media travels over UDP and does not pass through the HTTP reverse proxy.

---

## Build vs Pull Workflow

### Default (no `--build` flag)

```
GPU/CPU model image:  Always built from source (./gpu/self_hosted/)
Backend image:        Pulled from monadicalsas/reflector-backend:latest
Frontend image:       Pulled from monadicalsas/reflector-frontend:latest
```

The GPU/CPU image is always built because it contains hardware-specific build steps and ML model download logic.

### With `--build`

```
GPU/CPU model image:  Built from source (./gpu/self_hosted/)
Backend image:        Built from source (./server/)
Frontend image:       Built from source (./www/)
```

Use `--build` when:
- You've made local code changes
- The prebuilt registry images are outdated
- You want to verify the build works on your hardware

### Rebuilding Individual Services

```bash
# Rebuild just the backend
docker compose -f docker-compose.selfhosted.yml build server worker beat

# Rebuild just the frontend
docker compose -f docker-compose.selfhosted.yml build web

# Rebuild the GPU model container
docker compose -f docker-compose.selfhosted.yml build gpu

# Force a clean rebuild (no cache)
docker compose -f docker-compose.selfhosted.yml build --no-cache server
```

---

## Background Task Processing

### Celery Architecture

The backend uses Celery for all background work, with Redis as the message broker:

- **`worker`**: picks up tasks from the Redis queue and executes them
- **`beat`**: schedules periodic tasks (cron-like) by pushing them onto the queue
- **`redis`**: acts as both message broker and result backend

### The Audio Processing Pipeline

When a file is uploaded, the worker runs a multi-step pipeline:

```
Upload → Extract Audio → Upload to S3
                              │
                   ┌──────────┼──────────┐
                   │          │          │
                   v          v          v
              Transcribe   Diarize   Waveform
                   │          │          │
                   └──────────┼──────────┘
                              │
                          Assemble
                              │
                   ┌──────────┼──────────┐
                   v          v          v
                Topics      Title    Summaries
                              │
                            Done
```

Transcription, diarization, and waveform generation run in parallel. After assembly, topic detection, title generation, and summarization also run in parallel. Each step calls the appropriate service (transcription container for ML, Ollama/external LLM for text generation, S3 for storage).

### Event Loop Management

Each Celery task runs in its own `asyncio.run()` call, which creates a fresh event loop. The `asynctask` decorator in `server/reflector/asynctask.py` handles:

1. **Database connections**: resets the connection pool before each task (connections from a previous event loop would cause "Future attached to a different loop" errors)
2. **Redis connections**: resets the WebSocket manager singleton so Redis pub/sub reconnects on the current loop
3. **Cleanup**: disconnects the database and clears the context variable in the `finally` block

---

## Network and Port Layout

All services communicate over the Compose project's default network, a user-defined bridge that provides the service-name DNS used throughout this document. Only specific ports are published to the host:

| Port | Service | Binding | Purpose |
|------|---------|---------|---------|
| 80 | Caddy | `0.0.0.0:80` | HTTP (redirect to HTTPS / Let's Encrypt challenge) |
| 443 | Caddy | `0.0.0.0:443` | HTTPS (main entry point) |
| 1250 | Server | `127.0.0.1:1250` | Backend API (localhost only) |
| 3000 | Web | `127.0.0.1:3000` | Frontend (localhost only) |
| 3900 | Garage | `0.0.0.0:3900` | S3 API (for admin/debug access) |
| 3903 | Garage | `0.0.0.0:3903` | Garage admin API |
| 8000 | GPU/CPU | `127.0.0.1:8000` | ML model API (localhost only) |
| 11434 | Ollama | `127.0.0.1:11434` | Ollama API (localhost only) |
| 50000-50100/udp | Server | `0.0.0.0:50000-50100` | WebRTC ICE candidates |

Services bound to `127.0.0.1` are only accessible from the host itself (not from the network). Caddy is the only service exposed to the internet on standard HTTP/HTTPS ports.

### Docker-Internal Hostnames

Inside the Docker network, services reach each other by their compose service name:

| Hostname | Resolves To |
|----------|-------------|
| `server` | Backend API container |
| `web` | Frontend container |
| `postgres` | PostgreSQL container |
| `redis` | Redis container |
| `transcription` | GPU or CPU container (network alias) |
| `ollama` / `ollama-cpu` | Ollama container |
| `garage` | Garage S3 container |

---

## Diagnostics and Error Handling

The setup script includes an `ERR` trap that automatically dumps diagnostics when any command fails:

1. Lists all container statuses
2. Shows the last 30 lines of logs for any stopped/exited containers
3. Shows the last 40 lines of logs for the specific failing service

This means if something goes wrong during setup, you'll see the relevant logs immediately without having to run manual debug commands.
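
The general shape of such a trap in bash is sketched below. The function names `dump_diagnostics` and `install_err_trap` are hypothetical, and the sketch collapses the three steps into a single status command passed as arguments so it can be tried without Docker. The real script's handler may look quite different.

```bash
# Print diagnostics for a failed run. The command to run is passed in,
# so the sketch can be exercised with any harmless command.
# Usage: dump_diagnostics EXIT_CODE COMMAND [ARGS...]
dump_diagnostics() {
  local exit_code="$1"
  shift
  echo "setup failed with exit code $exit_code" >&2
  # Never let the diagnostics step mask the original error.
  "$@" >&2 || true
}

# Install the handler (bash-specific). `set -E` lets functions and
# subshells inherit the ERR trap.
install_err_trap() {
  set -E
  trap 'dump_diagnostics "$?" docker compose -f docker-compose.selfhosted.yml ps -a' ERR
}
```

A script would call `install_err_trap` once near the top. After that, any failing command that is not part of a condition triggers the handler before the script exits.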

### Common Debug Commands

```bash
# Overall status
docker compose -f docker-compose.selfhosted.yml ps

# Logs for a specific service
docker compose -f docker-compose.selfhosted.yml logs server --tail 50
docker compose -f docker-compose.selfhosted.yml logs worker --tail 50

# Check environment inside a container
docker compose -f docker-compose.selfhosted.yml exec server env | grep TRANSCRIPT

# Health check from inside the network
docker compose -f docker-compose.selfhosted.yml exec server curl http://localhost:1250/health

# Check S3 storage connectivity
docker compose -f docker-compose.selfhosted.yml exec server curl http://garage:3900

# Database access
docker compose -f docker-compose.selfhosted.yml exec postgres psql -U reflector -c "SELECT id, status FROM transcript ORDER BY created_at DESC LIMIT 5;"

# List files in server data directory
docker compose -f docker-compose.selfhosted.yml exec server ls -la /app/data/
```
373
docsv2/selfhosted-production.md
Normal file
@@ -0,0 +1,373 @@
# Self-Hosted Production Deployment

Deploy Reflector on a single server with everything running in Docker. Transcription, diarization, and translation use specialized ML models (Whisper/Parakeet, Pyannote); only summarization and topic detection require an LLM.

> For a detailed walkthrough of how the setup script and infrastructure work under the hood, see [How the Self-Hosted Setup Works](selfhosted-architecture.md).

## Prerequisites

### Hardware
- **With GPU**: Linux server with NVIDIA GPU (8GB+ VRAM recommended), 16GB+ RAM, 50GB+ disk
- **CPU-only**: 8+ cores, 32GB+ RAM (transcription is slower but works)
- Disk space for ML models (~2GB on first run) + audio storage

### Software
- Docker Engine 24+ with Compose V2
- NVIDIA drivers + `nvidia-container-toolkit` (GPU modes only)
- `curl`, `openssl` (usually pre-installed)

### Accounts & Credentials (depending on options)

**Always recommended:**
- **HuggingFace token**: For downloading pyannote speaker diarization models. Get one at https://huggingface.co/settings/tokens and accept the model licenses:
  - https://huggingface.co/pyannote/speaker-diarization-3.1
  - https://huggingface.co/pyannote/segmentation-3.0
  - The setup script will prompt for this. If skipped, diarization falls back to a public model bundle (may be less reliable).

**LLM for summarization & topic detection (pick one):**
- **With `--ollama-gpu` or `--ollama-cpu`**: Nothing extra. Ollama runs locally and pulls the model automatically
- **Without `--ollama-*`**: An OpenAI-compatible LLM API key and endpoint. Examples:
  - OpenAI: `LLM_URL=https://api.openai.com/v1`, `LLM_API_KEY=sk-...`, `LLM_MODEL=gpt-4o-mini`
  - Anthropic, Together, Groq, or any OpenAI-compatible API
  - A self-managed vLLM or Ollama instance elsewhere on the network

**Object storage (pick one):**
- **With `--garage`**: Nothing extra. Garage (local S3-compatible storage) is auto-configured by the script
- **Without `--garage`**: S3-compatible storage credentials. The script will prompt for these, or you can pre-fill `server/.env`. Options include:
  - **AWS S3**: Access Key ID, Secret Access Key, bucket name, region
  - **MinIO**: Same credentials + `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://your-minio:9000`
  - **Any S3-compatible provider** (Backblaze B2, Cloudflare R2, DigitalOcean Spaces, etc.): same fields + custom endpoint URL

**Optional add-ons (configure after initial setup):**
- **Daily.co** (live meeting rooms): Requires a Daily.co account (https://www.daily.co/), API key, subdomain, and an AWS S3 bucket + IAM Role for recording storage. See [Enabling Daily.co Live Rooms](#enabling-dailyco-live-rooms) below.
- **Authentik** (user authentication): Requires an Authentik instance with an OAuth2/OIDC application configured for Reflector. See [Enabling Authentication](#enabling-authentication-authentik) below.

## Quick Start

```bash
git clone https://github.com/Monadical-SAS/reflector.git
cd reflector

# GPU + local Ollama LLM + local Garage storage + Caddy SSL (with domain):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --domain reflector.example.com

# Same but without a domain (self-signed cert, access via IP):
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# CPU-only (same, but slower):
./scripts/setup-selfhosted.sh --cpu --ollama-cpu --garage --caddy

# Build from source instead of pulling prebuilt images:
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --build
```

That's it. The script generates env files, secrets, starts all containers, waits for health checks, and prints the URL.

## Specialized Models (Required)

Pick `--gpu` or `--cpu`. This determines how **transcription, diarization, and translation** run:

| Flag | What it does | Requires |
|------|-------------|----------|
| `--gpu` | NVIDIA GPU acceleration for ML models | NVIDIA GPU + drivers + `nvidia-container-toolkit` |
| `--cpu` | CPU-only (slower but works without GPU) | 8+ cores, 32GB+ RAM recommended |

## Local LLM (Optional)

Optionally add `--ollama-gpu` or `--ollama-cpu` for a **local Ollama instance** that handles summarization and topic detection. If omitted, configure an external OpenAI-compatible LLM in `server/.env`.

| Flag | What it does | Requires |
|------|-------------|----------|
| `--ollama-gpu` | Local Ollama with NVIDIA GPU acceleration | NVIDIA GPU |
| `--ollama-cpu` | Local Ollama on CPU only | Nothing extra |
| `--llm-model MODEL` | Choose which Ollama model to download (default: `qwen2.5:14b`) | `--ollama-gpu` or `--ollama-cpu` |
| *(omitted)* | User configures external LLM (OpenAI, Anthropic, etc.) | LLM API key |

### Choosing an Ollama model

The default model is `qwen2.5:14b` (~9GB download, good multilingual support and summary quality). Override with `--llm-model`:

```bash
# Default (qwen2.5:14b)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# Mistral: good balance of speed and quality (~4.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model mistral --garage --caddy

# Phi-4: 14B model, similar download size to the default (~9.1GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model phi4 --garage --caddy

# Llama 3.3 70B: best quality, needs 48GB+ RAM or GPU VRAM (~43GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model llama3.3:70b --garage --caddy

# Gemma 2 9B (~5.4GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model gemma2 --garage --caddy

# DeepSeek R1 8B: reasoning model, verbose but thorough summaries (~4.9GB)
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model deepseek-r1:8b --garage --caddy
```

Browse all available models at https://ollama.com/library.

### Recommended combinations

- **`--gpu --ollama-gpu`**: Best for servers with NVIDIA GPU. Fully self-contained, no external API keys needed.
- **`--cpu --ollama-cpu`**: No GPU available but want everything self-contained. Slower but works.
- **`--gpu --ollama-cpu`**: GPU for transcription, CPU for LLM. Saves GPU VRAM for ML models.
- **`--gpu`**: Have NVIDIA GPU but prefer a cloud LLM (faster/better summaries with GPT-4, Claude, etc.).
- **`--cpu`**: No GPU, prefer cloud LLM. Slowest transcription but best summary quality.

## Other Optional Flags

| Flag | What it does |
|------|-------------|
| `--garage` | Starts Garage (local S3-compatible storage). Auto-configures bucket, keys, and env vars. |
| `--caddy` | Starts Caddy reverse proxy on ports 80/443 with self-signed cert. |
| `--domain DOMAIN` | Use a real domain with Let's Encrypt auto-HTTPS (implies `--caddy`). Requires DNS A record pointing to this server and ports 80/443 open. |
| `--build` | Build backend (server, worker, beat) and frontend (web) Docker images from source instead of pulling prebuilt images from the registry. Useful for development or when running a version with local changes. |

Without `--garage`, you **must** provide S3-compatible credentials (the script will prompt interactively or you can pre-fill `server/.env`).

Without `--caddy` or `--domain`, no HTTP ports are exposed to the internet. Point your own reverse proxy at `localhost:3000` (frontend) and `localhost:1250` (API) when it runs on the host, or at `web:3000` and `server:1250` when it runs as a container on the same Docker network.

**Using a domain (recommended for production):** Point a DNS A record at your server's IP, then pass `--domain your.domain.com`. Caddy will automatically obtain and renew a Let's Encrypt certificate. Ports 80 and 443 must be open.

**Without a domain:** `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.
|
||||
|
||||
## What the Script Does

1. **Prerequisites check**: Docker, NVIDIA GPU (if needed), compose file exists
2. **Generate secrets**: `SECRET_KEY`, `NEXTAUTH_SECRET` via `openssl rand`
3. **Generate `server/.env`**: From template, sets infrastructure defaults, configures LLM based on mode, enables `PUBLIC_MODE`
4. **Generate `www/.env`**: Auto-detects server IP, sets URLs
5. **Storage setup**: Either initializes Garage (bucket, keys, permissions) or prompts for external S3 credentials
6. **Caddyfile**: Generates domain-specific (Let's Encrypt) or IP-specific (self-signed) configuration
7. **Build & start**: Always builds GPU/CPU model image from source. With `--build`, also builds backend and frontend from source; otherwise pulls prebuilt images from the registry
8. **Health checks**: Waits for each service, pulls Ollama model if needed, warns about missing LLM config

> For a deeper dive into each step, see [How the Self-Hosted Setup Works](selfhosted-architecture.md).

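
Steps 2 and 3 combine secret generation with a write-once rule for env files, which is what makes re-running the script safe. The sketch below illustrates that idea only; `gen_secret` and `set_env_if_missing` are hypothetical names, the `-hex 32` arguments to `openssl rand` are an assumption, and the real script's logic may differ.

```bash
#!/usr/bin/env bash
# Generate a random hex secret. The document says the script uses
# `openssl rand`; the exact arguments (-hex 32) are an assumption.
gen_secret() { openssl rand -hex 32; }

# Write KEY=VALUE to FILE only if KEY has no non-empty value yet.
# Hypothetical helper illustrating the write-once behaviour.
set_env_if_missing() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  if grep -Eq "^${key}=.+" "$file"; then
    return 0  # already set: leave the existing value alone
  fi
  # Drop an empty `KEY=` placeholder, if any, then append the value.
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  mv "${file}.tmp" "$file"
  printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Demo against a throwaway file, not the real server/.env.
demo_env="$(mktemp)"
set_env_if_missing "$demo_env" SECRET_KEY "first-value"
set_env_if_missing "$demo_env" SECRET_KEY "second-value"  # no-op: key already set
cat "$demo_env"  # prints SECRET_KEY=first-value
rm -f "$demo_env"
```

In real use the value would come from `gen_secret`, for example `set_env_if_missing server/.env SECRET_KEY "$(gen_secret)"`.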
## Configuration Reference

### Server Environment (`server/.env`)

| Variable | Description | Default |
|----------|-------------|---------|
| `DATABASE_URL` | PostgreSQL connection | Auto-set (Docker internal) |
| `REDIS_HOST` | Redis hostname | Auto-set (`redis`) |
| `SECRET_KEY` | App secret | Auto-generated |
| `AUTH_BACKEND` | Authentication method | `none` |
| `PUBLIC_MODE` | Allow unauthenticated access | `true` |
| `WEBRTC_HOST` | IP advertised in WebRTC ICE candidates | Auto-detected (server IP) |
| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |
| `TRANSCRIPT_STORAGE_BACKEND` | Storage backend | `aws` |
| `TRANSCRIPT_STORAGE_AWS_*` | S3 credentials | Auto-set for Garage |

### Frontend Environment (`www/.env`)

| Variable | Description | Default |
|----------|-------------|---------|
| `SITE_URL` | Public-facing URL | Auto-detected |
| `API_URL` | API URL (browser-side) | Same as `SITE_URL` |
| `SERVER_API_URL` | API URL (server-side) | `http://server:1250` |
| `NEXTAUTH_SECRET` | Auth secret | Auto-generated |
| `FEATURE_REQUIRE_LOGIN` | Require authentication | `false` |

## Storage Options

### Garage (Recommended for Self-Hosted)

Use the `--garage` flag. The script automatically:
- Generates `data/garage.toml` with a random RPC secret
- Starts the Garage container
- Creates the `reflector-media` bucket
- Creates an access key with read/write permissions
- Writes all S3 credentials to `server/.env`

### External S3 (AWS, MinIO, etc.)

Don't use `--garage`. The script will prompt for:
- Access Key ID
- Secret Access Key
- Bucket Name
- Region
- Endpoint URL (for non-AWS providers such as MinIO)

Or pre-fill in `server/.env`:
```env
TRANSCRIPT_STORAGE_BACKEND=aws
TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID=your-key
TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY=your-secret
TRANSCRIPT_STORAGE_AWS_BUCKET_NAME=reflector-media
TRANSCRIPT_STORAGE_AWS_REGION=us-east-1
# For non-AWS S3 (MinIO, etc.):
TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://minio:9000
```

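
When pre-filling `server/.env` by hand, it can help to confirm that every required storage variable has a value before running the script. A minimal sketch follows; `missing_env_keys` and the `ENV_FILE` variable are hypothetical names, and the list of required keys is taken from the example above rather than from the script itself.

```bash
#!/usr/bin/env bash
# Print each KEY that is absent or empty in FILE; return 1 if any is missing.
# Hypothetical helper, not part of the setup script.
missing_env_keys() {
  local file="$1"; shift
  local key missing=0
  for key in "$@"; do
    if ! grep -Eq "^${key}=.+" "$file" 2>/dev/null; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# ENV_FILE is an illustrative override; it defaults to the path used in this guide.
ENV_FILE="${ENV_FILE:-server/.env}"
if missing="$(missing_env_keys "$ENV_FILE" \
    TRANSCRIPT_STORAGE_BACKEND \
    TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID \
    TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY \
    TRANSCRIPT_STORAGE_AWS_BUCKET_NAME \
    TRANSCRIPT_STORAGE_AWS_REGION)"; then
  echo "Storage settings look complete in $ENV_FILE"
else
  echo "Missing or empty in $ENV_FILE:"
  echo "$missing"
fi
```

`TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL` is left out of the required list because it only applies to non-AWS providers.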
## Enabling Authentication (Authentik)

By default, authentication is disabled (`AUTH_BACKEND=none`, `FEATURE_REQUIRE_LOGIN=false`). To enable:

1. Deploy an Authentik instance (see [Authentik docs](https://goauthentik.io/docs/installation))
2. Create an OAuth2/OIDC application for Reflector
3. Update `server/.env`:
   ```env
   AUTH_BACKEND=jwt
   AUTH_JWT_AUDIENCE=your-client-id
   ```
4. Update `www/.env`:
   ```env
   FEATURE_REQUIRE_LOGIN=true
   AUTHENTIK_ISSUER=https://authentik.example.com/application/o/reflector
   AUTHENTIK_REFRESH_TOKEN_URL=https://authentik.example.com/application/o/token/
   AUTHENTIK_CLIENT_ID=your-client-id
   AUTHENTIK_CLIENT_SECRET=your-client-secret
   ```
5. Restart: `docker compose -f docker-compose.selfhosted.yml down && ./scripts/setup-selfhosted.sh <same-flags>`

## Enabling Daily.co Live Rooms

Daily.co enables real-time meeting rooms with automatic recording and transcription.

1. Create a [Daily.co](https://www.daily.co/) account
2. Add to `server/.env`:
   ```env
   DEFAULT_VIDEO_PLATFORM=daily
   DAILY_API_KEY=your-daily-api-key
   DAILY_SUBDOMAIN=your-subdomain
   DAILY_WEBHOOK_SECRET=your-webhook-secret
   DAILYCO_STORAGE_AWS_BUCKET_NAME=reflector-dailyco
   DAILYCO_STORAGE_AWS_REGION=us-east-1
   DAILYCO_STORAGE_AWS_ROLE_ARN=arn:aws:iam::your-account-id:role/DailyCoAccess
   ```
3. Restart the server and worker: `docker compose -f docker-compose.selfhosted.yml restart server worker`

## Enabling Real Domain with Let's Encrypt

By default, Caddy uses self-signed certificates. To switch to a real domain by hand instead of re-running the script with `--domain`:

1. Point your domain's DNS A record to your server's IP
2. Ensure ports 80 and 443 are open
3. Edit `Caddyfile`:
   ```
   reflector.example.com {
       handle /v1/* {
           reverse_proxy server:1250
       }
       handle /health {
           reverse_proxy server:1250
       }
       handle {
           reverse_proxy web:3000
       }
   }
   ```
4. Update `www/.env`:
   ```env
   SITE_URL=https://reflector.example.com
   NEXTAUTH_URL=https://reflector.example.com
   API_URL=https://reflector.example.com
   ```
5. Restart Caddy and the frontend: `docker compose -f docker-compose.selfhosted.yml restart caddy web`

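
The Caddyfile from step 3 differs between deployments only in the domain on its first line, so it can be produced from a template. A minimal sketch, assuming the routes shown in step 3; `render_caddyfile` is a hypothetical helper and not necessarily how the setup script generates the file.

```bash
#!/usr/bin/env bash
# Print a Caddyfile for the given domain, using the same three routes as
# step 3: /v1/* and /health go to the API, everything else to the frontend.
render_caddyfile() {
  local domain="$1"
  cat <<EOF
${domain} {
    handle /v1/* {
        reverse_proxy server:1250
    }
    handle /health {
        reverse_proxy server:1250
    }
    handle {
        reverse_proxy web:3000
    }
}
EOF
}

render_caddyfile "reflector.example.com"
```

Redirecting the output (`render_caddyfile your.domain.com > Caddyfile`) overwrites the existing file, so keep a copy of any manual changes first.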
## Troubleshooting

### Check service status
```bash
docker compose -f docker-compose.selfhosted.yml ps
```

### View logs for a specific service
```bash
docker compose -f docker-compose.selfhosted.yml logs server --tail 50
docker compose -f docker-compose.selfhosted.yml logs gpu --tail 50
docker compose -f docker-compose.selfhosted.yml logs web --tail 50
```

### GPU service taking too long
First start downloads ~1-2GB of ML models. Check progress:
```bash
docker compose -f docker-compose.selfhosted.yml logs gpu -f
```

### Server exits immediately
Usually a database migration issue. Check:
```bash
docker compose -f docker-compose.selfhosted.yml logs server --tail 50
```

### Caddy certificate issues
For self-signed certs, your browser will warn. Click Advanced > Proceed.
For Let's Encrypt, ensure ports 80/443 are open and DNS is pointed correctly.

### Summaries/topics not generating
Check LLM configuration:
```bash
grep LLM_ server/.env
```
If you didn't use `--ollama-gpu` or `--ollama-cpu`, you must set `LLM_URL`, `LLM_API_KEY`, and `LLM_MODEL`.

### Health check from inside containers
```bash
docker compose -f docker-compose.selfhosted.yml exec server curl http://localhost:1250/health
docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8000/docs
```

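
Services can take a while to come up, so a one-off health check may fail even though nothing is wrong. Polling in a loop gives a clearer answer. The sketch below is a generic retry helper; `wait_for` is a hypothetical name and the attempt counts are arbitrary.

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempt(s)" >&2
  return 1
}

# Example (commented out because it needs the stack running): poll the API
# health endpoint inside the server container every 2 seconds, up to 30 times.
# wait_for 30 2 docker compose -f docker-compose.selfhosted.yml exec -T server curl -fsS http://localhost:1250/health

# Trivial self-check with a command that always succeeds.
wait_for 3 0 true
```

The `-T` flag disables pseudo-TTY allocation for `docker compose exec`, which is useful when the command runs from a script rather than an interactive terminal.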
## Updating

```bash
# Option A: Pull latest prebuilt images and restart
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before>

# Option B: Build from source (after git pull) and restart
git pull
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before> --build

# Rebuild only the GPU/CPU model image (picks up model updates)
docker compose -f docker-compose.selfhosted.yml build gpu  # or cpu
```

The setup script is idempotent: it won't overwrite existing secrets or env vars that are already set.

## Architecture Overview

```
                    ┌─────────┐
  Internet ────────>│  Caddy  │ :80/:443
                    └────┬────┘
                         │
            ┌────────────┼────────────┐
            │            │            │
            v            v            │
       ┌─────────┐  ┌─────────┐       │
       │   web   │  │ server  │       │
       │  :3000  │  │  :1250  │       │
       └─────────┘  └────┬────┘       │
                         │            │
                    ┌────┴────┐       │
                    │ worker  │       │
                    │  beat   │       │
                    └────┬────┘       │
                         │            │
          ┌──────────────┼────────────┤
          │              │            │
          v              v            v
   ┌─────────────┐  ┌─────────┐  ┌─────────┐
   │transcription│  │postgres │  │  redis  │
   │  (gpu/cpu)  │  │  :5432  │  │  :6379  │
   │    :8000    │  └─────────┘  └─────────┘
   └─────────────┘
          │
    ┌─────┴─────┐   ┌─────────┐
    │  ollama   │   │ garage  │
    │ (optional)│   │(optional│
    │  :11434   │   │   S3)   │
    └───────────┘   └─────────┘
```

All services communicate over Docker's internal network. Only Caddy (if enabled) exposes ports to the internet.