mirror of https://github.com/Monadical-SAS/reflector.git (synced 2026-04-14 17:26:55 +00:00)

feat: mixdown modal services + processor pattern (#936)

* allow memory flags and per service config
* feat: mixdown modal services + processor pattern

commit d164e486cc (parent 12bf0c2d77)
@@ -24,6 +24,8 @@ This document explains the internals of the self-hosted deployment: how the setu

The self-hosted deployment runs the entire Reflector platform on a single server using Docker Compose. A single bash script (`scripts/setup-selfhosted.sh`) handles all configuration and orchestration. The key design principles are:

- **One command to deploy** — flags select which features to enable
- **Config memory** — CLI args are saved to `data/.selfhosted-last-args`; re-run with no flags to replay
- **Per-service overrides** — individual ML backends (transcript, diarization, translation, padding, mixdown) can be overridden independently of the base mode
- **Idempotent** — safe to re-run without losing existing configuration
- **Profile-based composition** — Docker Compose profiles activate optional services
- **No external dependencies required** — with `--garage` and `--ollama-*`, everything runs locally
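The idempotency principle amounts to "set only if absent". A minimal sketch of that behavior (illustrative only — `set_env_if_absent` and `demo.env` are made-up names, not part of the actual script):

```shell
# Illustrative only: write a key to an env file only when it is not already
# present, so re-running never clobbers existing configuration.
set_env_if_absent() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  grep -q "^${key}=" "$file" || echo "${key}=${value}" >> "$file"
}

set_env_if_absent demo.env REDIS_HOST redis
set_env_if_absent demo.env REDIS_HOST something-else   # no-op: key already set
cat demo.env
```

The second call leaves `REDIS_HOST=redis` untouched, which is the property that makes re-runs safe.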
@@ -61,8 +63,9 @@ Creates or updates the backend environment file from `server/.env.selfhosted.exa

- **Infrastructure** — PostgreSQL URL, Redis host, Celery broker (all pointing to Docker-internal hostnames)
- **Public URLs** — `BASE_URL` and `CORS_ORIGIN` computed from the domain (if `--domain`), the IP (if detected on Linux), or `localhost`
- **WebRTC** — `WEBRTC_HOST` set to the server's LAN IP so browsers can reach UDP ICE candidates
- **ML backends (per-service)** — each ML service (transcript, diarization, translation, padding, mixdown) is configured independently using "effective backends" (`EFF_TRANSCRIPT`, `EFF_DIARIZATION`, `EFF_TRANSLATION`, `EFF_PADDING`, `EFF_MIXDOWN`). These are resolved from the base mode default plus any `--transcript`/`--diarization`/`--translation`/`--padding`/`--mixdown` override. For `modal` backends, the URL is `http://transcription:8000` (GPU mode), user-provided (hosted mode), or read from the existing env (CPU mode with an override). CPU backends need no URL (they run in-process). If a service is overridden to `modal` in CPU mode without a URL configured, the script warns the user to set `TRANSCRIPT_URL` in `server/.env`
- **CPU timeouts** — `TRANSCRIPT_FILE_TIMEOUT` and `DIARIZATION_FILE_TIMEOUT` are raised to 3600s only for services actually using CPU backends (whisper/pyannote), not blanket for the whole mode
- **HuggingFace token** — prompted when diarization uses `pyannote` (in-process) or when GPU mode is active (the GPU container needs it). Written to the root `.env` so Docker Compose can inject it into the GPU/CPU containers
- **LLM** — if `--ollama-*` is used, configures `LLM_URL` to point at the Ollama container; otherwise warns that the user needs to configure an external LLM
- **Public mode** — sets `PUBLIC_MODE=true` so the app is accessible without authentication by default
- **Password auth** — if `--password` is passed, sets `AUTH_BACKEND=password`, `PUBLIC_MODE=false`, `ADMIN_EMAIL=admin@localhost`, and `ADMIN_PASSWORD_HASH` (the hash generated in Step 1). The admin user is provisioned in the database on container startup via `runserver.sh`
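The "effective backend" resolution described above can be sketched in a few lines of bash. The variable names mirror the doc; the control flow is an assumption about the script's internals, not its actual code:

```shell
# Base mode picks a default; a per-service flag, if given, wins.
MODE=cpu
TRANSCRIPT_OVERRIDE=modal          # as if --transcript modal had been passed

case "$MODE" in
  gpu|hosted) DEFAULT_TRANSCRIPT=modal ;;
  cpu)        DEFAULT_TRANSCRIPT=whisper ;;
esac

EFF_TRANSCRIPT="${TRANSCRIPT_OVERRIDE:-$DEFAULT_TRANSCRIPT}"
echo "EFF_TRANSCRIPT=$EFF_TRANSCRIPT"

# A modal backend in CPU mode with no URL configured triggers the warning:
if [ "$EFF_TRANSCRIPT" = modal ] && [ "$MODE" = cpu ] && [ -z "${TRANSCRIPT_URL:-}" ]; then
  echo "warning: set TRANSCRIPT_URL in server/.env" >&2
fi
```

With no override, `${TRANSCRIPT_OVERRIDE:-$DEFAULT_TRANSCRIPT}` falls through to the base-mode default, which is exactly the "override or base mode default" rule.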
@@ -228,11 +231,19 @@ Both the `gpu` and `cpu` services define a Docker network alias of `transcriptio

Environment variables flow through multiple layers. Understanding this prevents confusion when debugging:

```
CLI args (--gpu, --garage, --padding modal, --mixdown modal, etc.)
│
├── setup-selfhosted.sh interprets flags
├── Config memory: saved to data/.selfhosted-last-args
│     (replayed on next run if no args provided)
│
├── setup-selfhosted.sh resolves effective backends:
│     EFF_TRANSCRIPT  = override or base mode default
│     EFF_DIARIZATION = override or base mode default
│     EFF_TRANSLATION = override or base mode default
│     EFF_PADDING     = override or base mode default
│     EFF_MIXDOWN     = override or base mode default
│   │
│   ├── Writes server/.env (backend config, per-service backends)
│   ├── Writes www/.env (frontend config)
│   ├── Writes .env (HF_TOKEN for compose interpolation)
│   └── Writes Caddyfile (proxy routes)
```
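The shared `transcription` alias referenced in the hunk header above would be declared roughly like this in the compose file (a sketch — only the alias detail comes from the doc; the rest of each service definition is elided):

```yaml
# Sketch: both ML containers share one network alias, so TRANSCRIPT_URL can
# always point at http://transcription:8000 regardless of which one runs.
services:
  gpu:
    networks:
      default:
        aliases: [transcription]
  cpu:
    networks:
      default:
        aliases: [transcription]
```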
@@ -70,7 +70,7 @@ That's it. The script generates env files, secrets, starts all containers, waits

## ML Processing Modes (Required)

Pick `--gpu`, `--cpu`, or `--hosted`. This determines how **transcription, diarization, translation, audio padding, and audio mixdown** run:

| Flag | What it does | Requires |
|------|-------------|----------|
@@ -158,6 +158,56 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr

**Without a domain:** `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.

## Per-Service Backend Overrides

Override individual ML services without changing the base mode. Useful when you want most services on one backend but need specific services on another.

| Flag | Valid backends | Default (`--gpu`/`--hosted`) | Default (`--cpu`) |
|------|---------------|------------------------------|-------------------|
| `--transcript BACKEND` | `whisper`, `modal` | `modal` | `whisper` |
| `--diarization BACKEND` | `pyannote`, `modal` | `modal` | `pyannote` |
| `--translation BACKEND` | `marian`, `modal`, `passthrough` | `modal` | `marian` |
| `--padding BACKEND` | `pyav`, `modal` | `modal` | `pyav` |
| `--mixdown BACKEND` | `pyav`, `modal` | `modal` | `pyav` |
**Examples:**

```bash
# CPU base, but use a remote modal service for padding only
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy

# GPU base, but skip translation entirely (passthrough)
./scripts/setup-selfhosted.sh --gpu --translation passthrough --garage --caddy

# CPU base with remote modal diarization and translation
./scripts/setup-selfhosted.sh --cpu --diarization modal --translation modal --garage
```

When overriding a service to `modal` in `--cpu` mode, the script warns you to configure the service URL (`TRANSCRIPT_URL` etc.) in `server/.env` to point at your GPU service, then re-run.

When overriding a service to a CPU backend (e.g., `--transcript whisper`) in `--gpu` mode, that service runs in-process on the server/worker containers while the GPU container still serves the remaining `modal` services.
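Each service accepts only its own backend names, per the table above. That pairing could be validated with a small case table like this (a hypothetical sketch — `valid_backend` is not a real function in the script):

```shell
# Accept only the backend names listed for each service in the table above.
valid_backend() {
  case "$1:$2" in
    transcript:whisper|transcript:modal)                          return 0 ;;
    diarization:pyannote|diarization:modal)                       return 0 ;;
    translation:marian|translation:modal|translation:passthrough) return 0 ;;
    padding:pyav|padding:modal)                                   return 0 ;;
    mixdown:pyav|mixdown:modal)                                   return 0 ;;
    *) echo "invalid backend '$2' for service '$1'" >&2;          return 1 ;;
  esac
}

valid_backend translation passthrough && echo "translation/passthrough ok"
```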
## Config Memory (No-Flag Re-run)

After a successful run, the script saves your CLI arguments to `data/.selfhosted-last-args`. On subsequent runs with no arguments, the saved configuration is automatically replayed:

```bash
# First run — saves the config
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# Later re-runs — same config, no flags needed
./scripts/setup-selfhosted.sh
# => "No flags provided — replaying saved configuration:"
# => " --gpu --ollama-gpu --garage --caddy"
```

To change the configuration, pass new flags — they override and replace the saved config:

```bash
# Switch to CPU mode with overrides — this becomes the new saved config
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy
```
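The save/replay behavior can be sketched like this (assumed mechanics based on the description above — `run_setup` and the file handling are illustrative, not the script's actual code):

```shell
ARGS_FILE=.selfhosted-last-args

run_setup() {
  if [ "$#" -eq 0 ] && [ -f "$ARGS_FILE" ]; then
    set -- $(cat "$ARGS_FILE")       # replay the saved flags
    echo "No flags provided — replaying saved configuration: $*"
  fi
  echo "$*" > "$ARGS_FILE"           # always save the complete flag set
  echo "running with: $*"
}

run_setup --gpu --garage             # first run: saves the config
run_setup                            # no flags: replays --gpu --garage
```

Because the complete flag set is rewritten on every run, passing any flags replaces the saved config rather than merging with it.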
## What the Script Does

1. **Prerequisites check** — Docker, NVIDIA GPU (if needed), compose file exists
@@ -189,6 +239,8 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr

| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
| `PADDING_BACKEND` | Audio padding backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `PADDING_URL` | Audio padding endpoint (when `PADDING_BACKEND=modal`) | `http://transcription:8000` |
| `MIXDOWN_BACKEND` | Audio mixdown backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `MIXDOWN_URL` | Audio mixdown endpoint (when `MIXDOWN_BACKEND=modal`) | `http://transcription:8000` |
| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |
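How a consumer might dispatch on these variables can be sketched as follows (illustrative only — the messages and fallback logic are assumptions; just the variable names and defaults come from the table):

```shell
# MIXDOWN_BACKEND selects in-process PyAV vs. a remote modal-style service.
MIXDOWN_BACKEND="${MIXDOWN_BACKEND:-pyav}"   # table default when unset

case "$MIXDOWN_BACKEND" in
  pyav)  echo "mixdown: in-process via PyAV" ;;
  modal) echo "mixdown: remote call to ${MIXDOWN_URL:-http://transcription:8000}" ;;
  *)     echo "unknown MIXDOWN_BACKEND: $MIXDOWN_BACKEND" >&2; exit 1 ;;
esac
```

The `PADDING_BACKEND`/`PADDING_URL` pair would follow the same pattern.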
@@ -576,9 +628,9 @@ docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8

## Updating

```bash
# Option A: Pull latest prebuilt images and restart (replays saved config automatically)
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh

# Option B: Build from source (after git pull) and restart
git pull
```

@@ -589,6 +641,8 @@ docker compose -f docker-compose.selfhosted.yml down

```bash
docker compose -f docker-compose.selfhosted.yml build gpu  # or cpu
```

> **Note on config memory:** Running with no flags replays the saved config from your last run. Running with *any* flags replaces the saved config entirely — the script always saves the complete set of flags you provide. See [Config Memory](#config-memory-no-flag-re-run).

The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.

## Architecture Overview