mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2026-04-24 06:05:19 +00:00
feat: mixdown modal services + processor pattern (#936)
* allow memory flags and per service config
* feat: mixdown modal services + processor pattern
committed by GitHub
parent 12bf0c2d77
commit d164e486cc
@@ -70,7 +70,7 @@ That's it. The script generates env files, secrets, starts all containers, waits

## ML Processing Modes (Required)

-Pick `--gpu`, `--cpu`, or `--hosted`. This determines how **transcription, diarization, translation, and audio padding** run:
+Pick `--gpu`, `--cpu`, or `--hosted`. This determines how **transcription, diarization, translation, audio padding, and audio mixdown** run:

| Flag | What it does | Requires |
|------|-------------|----------|

@@ -158,6 +158,56 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr

**Without a domain:** `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.

## Per-Service Backend Overrides

Override individual ML services without changing the base mode. Useful when you want most services on one backend but need specific services on another.

| Flag | Valid backends | Default (`--gpu`/`--hosted`) | Default (`--cpu`) |
|------|---------------|------------------------------|-------------------|
| `--transcript BACKEND` | `whisper`, `modal` | `modal` | `whisper` |
| `--diarization BACKEND` | `pyannote`, `modal` | `modal` | `pyannote` |
| `--translation BACKEND` | `marian`, `modal`, `passthrough` | `modal` | `marian` |
| `--padding BACKEND` | `pyav`, `modal` | `modal` | `pyav` |
| `--mixdown BACKEND` | `pyav`, `modal` | `modal` | `pyav` |

**Examples:**

```bash
# CPU base, but use a remote modal service for padding only
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy

# GPU base, but skip translation entirely (passthrough)
./scripts/setup-selfhosted.sh --gpu --translation passthrough --garage --caddy

# CPU base with remote modal diarization and translation
./scripts/setup-selfhosted.sh --cpu --diarization modal --translation modal --garage
```

When overriding a service to `modal` in `--cpu` mode, the script will warn you to configure the service URL (`TRANSCRIPT_URL` etc.) in `server/.env` to point to your GPU service, then re-run.

When overriding a service to a CPU backend (e.g., `--transcript whisper`) in `--gpu` mode, that service runs in-process on the server/worker containers while the GPU container still serves the remaining `modal` services.

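The defaults table above amounts to a small lookup: the base mode picks a default per service, and a per-service flag replaces it. The following is a minimal sketch of that resolution, not the script's actual code. The function name `resolve_backend` and its argument order are assumptions made for illustration, and the sketch does not validate an override against the "Valid backends" column.

```bash
# Hypothetical helper: resolve the effective backend for one service.
# Usage: resolve_backend MODE SERVICE [OVERRIDE]
#   MODE      gpu | cpu | hosted
#   SERVICE   transcript | diarization | translation | padding | mixdown
#   OVERRIDE  optional value of the matching per-service flag
resolve_backend() {
  local mode="$1" service="$2" override="${3:-}"

  # An explicit per-service flag always wins over the base mode.
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
    return 0
  fi

  # --gpu and --hosted default every service to "modal".
  if [ "$mode" = "gpu" ] || [ "$mode" = "hosted" ]; then
    printf 'modal\n'
    return 0
  fi

  if [ "$mode" != "cpu" ]; then
    echo "unknown mode: $mode" >&2
    return 1
  fi

  # --cpu defaults each service to its in-process backend.
  case "$service" in
    transcript)      printf 'whisper\n' ;;
    diarization)     printf 'pyannote\n' ;;
    translation)     printf 'marian\n' ;;
    padding|mixdown) printf 'pyav\n' ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
}
```

With this sketch, `resolve_backend cpu padding modal` prints `modal` (the first example above), while `resolve_backend cpu mixdown` prints `pyav`.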
## Config Memory (No-Flag Re-run)

After a successful run, the script saves your CLI arguments to `data/.selfhosted-last-args`. On subsequent runs with no arguments, the saved configuration is automatically replayed:

```bash
# First run — saves the config
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy

# Later re-runs — same config, no flags needed
./scripts/setup-selfhosted.sh
# => "No flags provided — replaying saved configuration:"
# => " --gpu --ollama-gpu --garage --caddy"
```

To change the configuration, pass new flags — they override and replace the saved config:

```bash
# Switch to CPU mode with overrides — this becomes the new saved config
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy
```

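The replay behaviour described in this section can be sketched in a few lines of shell. This is an illustration under stated assumptions rather than the script's real implementation: only the file path `data/.selfhosted-last-args` comes from the text above, while the function names `save_args` and `effective_args`, the `ARGS_FILE` variable, and the one-argument-per-line storage format are hypothetical.

```bash
# Sketch only: function names and file format are assumptions.
ARGS_FILE="${ARGS_FILE:-data/.selfhosted-last-args}"

# Store the complete flag set, one argument per line, replacing older content.
save_args() {
  mkdir -p "$(dirname "$ARGS_FILE")"
  : > "$ARGS_FILE"
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@" >> "$ARGS_FILE"
  fi
}

# Print the arguments to use, one per line: the given flags when any were
# passed, otherwise the saved ones (nothing if no saved file exists yet).
effective_args() {
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@"
  elif [ -s "$ARGS_FILE" ]; then
    echo "replaying saved configuration from $ARGS_FILE" >&2
    cat "$ARGS_FILE"
  fi
}
```

A caller would read `effective_args "$@"` first, run the setup with the result, and call `save_args` with that same flag set only after the run succeeds. Because `save_args` truncates the file before writing, new flags replace the saved configuration instead of merging with it, which matches the behaviour described above.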
## What the Script Does

1. **Prerequisites check** — Docker, NVIDIA GPU (if needed), compose file exists

@@ -189,6 +239,8 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr
| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
| `PADDING_BACKEND` | Audio padding backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `PADDING_URL` | Audio padding endpoint (when `PADDING_BACKEND=modal`) | `http://transcription:8000` |
| `MIXDOWN_BACKEND` | Audio mixdown backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `MIXDOWN_URL` | Audio mixdown endpoint (when `MIXDOWN_BACKEND=modal`) | `http://transcription:8000` |
| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |

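The two `MIXDOWN_*` rows follow the same pairing as `PADDING_*`: the URL only matters when the backend is `modal`. Below is a minimal sketch of a pre-flight check for that pairing, assuming plain unquoted `KEY=value` lines in `server/.env`. The helper names `env_get` and `check_backend_url` are hypothetical and are not part of the setup script.

```bash
# Print the value of KEY from a dotenv-style file (last assignment wins).
env_get() {
  local file="$1" key="$2"
  sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Succeed unless PREFIX_BACKEND=modal is set without a PREFIX_URL.
check_backend_url() {
  local file="$1" prefix="$2"
  local backend url
  backend="$(env_get "$file" "${prefix}_BACKEND")"
  url="$(env_get "$file" "${prefix}_URL")"
  if [ "$backend" = "modal" ] && [ -z "$url" ]; then
    echo "${prefix}_BACKEND=modal but ${prefix}_URL is empty in $file" >&2
    return 1
  fi
}
```

For example, `check_backend_url server/.env MIXDOWN` returns non-zero when `MIXDOWN_BACKEND=modal` is present but `MIXDOWN_URL` is missing or empty.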
@@ -576,9 +628,9 @@ docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8
## Updating

```bash
-# Option A: Pull latest prebuilt images and restart
+# Option A: Pull latest prebuilt images and restart (replays saved config automatically)
docker compose -f docker-compose.selfhosted.yml down
-./scripts/setup-selfhosted.sh <same-flags-as-before>
+./scripts/setup-selfhosted.sh

# Option B: Build from source (after git pull) and restart
git pull
@@ -589,6 +641,8 @@ docker compose -f docker-compose.selfhosted.yml down
docker compose -f docker-compose.selfhosted.yml build gpu # or cpu
```

> **Note on config memory:** Running with no flags replays the saved config from your last run. Running with *any* flags replaces the saved config entirely — the script always saves the complete set of flags you provide. See [Config Memory](#config-memory-no-flag-re-run).

The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.

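One common way to get this kind of idempotency is to append a variable only when its key is absent. The sketch below illustrates that pattern. `env_set_if_missing` is a hypothetical name, and the real script may handle this differently (for example, it may also treat an empty value as unset).

```bash
# Append KEY=VALUE to FILE unless FILE already has a line starting with KEY=.
env_set_if_missing() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    return 0   # already set: leave the existing value untouched
  fi
  printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

Calling `env_set_if_missing server/.env MIXDOWN_BACKEND modal` twice, even with a different value the second time, leaves a single `MIXDOWN_BACKEND=modal` line in the file.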
## Architecture Overview