feat: mixdown modal services + processor pattern (#936)

* allow memory flags and per service config

* feat: mixdown modal services + processor pattern
Author: Juan Diego García
Date: 2026-03-30 17:38:23 -05:00 (committed by GitHub)
Parent: 12bf0c2d77
Commit: d164e486cc
15 changed files with 1353 additions and 104 deletions


@@ -24,6 +24,8 @@ This document explains the internals of the self-hosted deployment: how the setu
The self-hosted deployment runs the entire Reflector platform on a single server using Docker Compose. A single bash script (`scripts/setup-selfhosted.sh`) handles all configuration and orchestration. The key design principles are:
- **One command to deploy** — flags select which features to enable
- **Config memory** — CLI args are saved to `data/.selfhosted-last-args`; re-run with no flags to replay
- **Per-service overrides** — individual ML backends (transcript, diarization, translation, padding, mixdown) can be overridden independently from the base mode
- **Idempotent** — safe to re-run without losing existing configuration
- **Profile-based composition** — Docker Compose profiles activate optional services
- **No external dependencies required** — with `--garage` and `--ollama-*`, everything runs locally
@@ -61,8 +63,9 @@ Creates or updates the backend environment file from `server/.env.selfhosted.exa
- **Infrastructure** — PostgreSQL URL, Redis host, Celery broker (all pointing to Docker-internal hostnames)
- **Public URLs** — `BASE_URL` and `CORS_ORIGIN` computed from the domain (if `--domain`), IP (if detected on Linux), or `localhost`
- **WebRTC** — `WEBRTC_HOST` set to the server's LAN IP so browsers can reach UDP ICE candidates
- **Specialized models** — always points to `http://transcription:8000` (the Docker network alias shared by GPU and CPU containers)
- **HuggingFace token** — prompts interactively for pyannote model access; writes to root `.env` so Docker Compose can inject it into GPU/CPU containers
- **ML backends (per-service)** — Each ML service (transcript, diarization, translation, padding, mixdown) is configured independently using "effective backends" (`EFF_TRANSCRIPT`, `EFF_DIARIZATION`, `EFF_TRANSLATION`, `EFF_PADDING`, `EFF_MIXDOWN`). These are resolved from the base mode default + any `--transcript`/`--diarization`/`--translation`/`--padding`/`--mixdown` overrides. For `modal` backends, the URL is `http://transcription:8000` (GPU mode), user-provided (hosted mode), or read from existing env (CPU mode with override). For CPU backends, no URL is needed (in-process). If a service is overridden to `modal` in CPU mode without a URL configured, the script warns the user to set `TRANSCRIPT_URL` in `server/.env`
- **CPU timeouts** — `TRANSCRIPT_FILE_TIMEOUT` and `DIARIZATION_FILE_TIMEOUT` are increased to 3600s only for the services actually using CPU backends (whisper/pyannote), rather than applied across the whole mode
- **HuggingFace token** — prompted when diarization uses `pyannote` (in-process) or when GPU mode is active (GPU container needs it). Writes to root `.env` so Docker Compose can inject it into GPU/CPU containers
- **LLM** — if `--ollama-*` is used, configures `LLM_URL` pointing to the Ollama container. Otherwise, warns that the user needs to configure an external LLM
- **Public mode** — sets `PUBLIC_MODE=true` so the app is accessible without authentication by default
- **Password auth** — if `--password` is passed, sets `AUTH_BACKEND=password`, `PUBLIC_MODE=false`, `ADMIN_EMAIL=admin@localhost`, and `ADMIN_PASSWORD_HASH` (the hash generated in Step 1). The admin user is provisioned in the database on container startup via `runserver.sh`
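The per-service resolution described above can be sketched as a base-mode default overridden by an optional flag. This is an illustrative sketch, not the actual script; the function and variable names are assumptions:

```bash
# Sketch: resolve an "effective backend" from the base mode's default
# plus an optional per-service override (empty string = no override).
resolve_backend() {
  local mode="$1" cpu_default="$2" override="$3"
  local default="modal"
  [ "$mode" = "cpu" ] && default="$cpu_default"
  echo "${override:-$default}"
}

EFF_TRANSCRIPT=$(resolve_backend cpu whisper "")   # base-mode default
EFF_PADDING=$(resolve_backend cpu pyav modal)      # per-service override wins
```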
@@ -228,11 +231,19 @@ Both the `gpu` and `cpu` services define a Docker network alias of `transcriptio
Environment variables flow through multiple layers. Understanding this prevents confusion when debugging:
```
Flags (--gpu, --garage, etc.)
CLI args (--gpu, --garage, --padding modal, --mixdown modal, etc.)
├── setup-selfhosted.sh interprets flags
├── Config memory: saved to data/.selfhosted-last-args
│ (replayed on next run if no args provided)
├── setup-selfhosted.sh resolves effective backends:
│ EFF_TRANSCRIPT = override or base mode default
│ EFF_DIARIZATION = override or base mode default
│ EFF_TRANSLATION = override or base mode default
│ EFF_PADDING = override or base mode default
│ EFF_MIXDOWN = override or base mode default
│ │
│ ├── Writes server/.env (backend config)
│ ├── Writes server/.env (backend config, per-service backends)
│ ├── Writes www/.env (frontend config)
│ ├── Writes .env (HF_TOKEN for compose interpolation)
│ └── Writes Caddyfile (proxy routes)
```


@@ -70,7 +70,7 @@ That's it. The script generates env files, secrets, starts all containers, waits
## ML Processing Modes (Required)
Pick `--gpu`, `--cpu`, or `--hosted`. This determines how **transcription, diarization, translation, and audio padding** run:
Pick `--gpu`, `--cpu`, or `--hosted`. This determines how **transcription, diarization, translation, audio padding, and audio mixdown** run:
| Flag | What it does | Requires |
|------|-------------|----------|
@@ -158,6 +158,56 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr
**Without a domain:** `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.
## Per-Service Backend Overrides
Override individual ML services without changing the base mode. Useful when you want most services on one backend but need specific services on another.
| Flag | Valid backends | Default (`--gpu`/`--hosted`) | Default (`--cpu`) |
|------|---------------|------------------------------|-------------------|
| `--transcript BACKEND` | `whisper`, `modal` | `modal` | `whisper` |
| `--diarization BACKEND` | `pyannote`, `modal` | `modal` | `pyannote` |
| `--translation BACKEND` | `marian`, `modal`, `passthrough` | `modal` | `marian` |
| `--padding BACKEND` | `pyav`, `modal` | `modal` | `pyav` |
| `--mixdown BACKEND` | `pyav`, `modal` | `modal` | `pyav` |
**Examples:**
```bash
# CPU base, but use a remote modal service for padding only
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy
# GPU base, but skip translation entirely (passthrough)
./scripts/setup-selfhosted.sh --gpu --translation passthrough --garage --caddy
# CPU base with remote modal diarization and translation
./scripts/setup-selfhosted.sh --cpu --diarization modal --translation modal --garage
```
When overriding a service to `modal` in `--cpu` mode, the script will warn you to configure the service URL (`TRANSCRIPT_URL` etc.) in `server/.env` to point to your GPU service, then re-run.
When overriding a service to a CPU backend (e.g., `--transcript whisper`) in `--gpu` mode, that service runs in-process on the server/worker containers while the GPU container still serves the remaining `modal` services.
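The warning path for a `modal` override without a configured URL can be sketched as a simple check against `server/.env`. This is a hypothetical sketch (the function name and exact message are assumptions):

```bash
# Sketch: warn when a service is overridden to "modal" in --cpu mode
# but its URL variable is not yet present in the env file.
warn_if_url_missing() {
  local backend="$1" env_file="$2" url_var="$3"
  if [ "$backend" = "modal" ] && ! grep -q "^${url_var}=" "$env_file"; then
    echo "WARNING: set ${url_var} in ${env_file} to your GPU service URL, then re-run"
  fi
}
```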
## Config Memory (No-Flag Re-run)
After a successful run, the script saves your CLI arguments to `data/.selfhosted-last-args`. On subsequent runs with no arguments, the saved configuration is automatically replayed:
```bash
# First run — saves the config
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy
# Later re-runs — same config, no flags needed
./scripts/setup-selfhosted.sh
# => "No flags provided — replaying saved configuration:"
# => " --gpu --ollama-gpu --garage --caddy"
```
To change the configuration, pass new flags — they override and replace the saved config:
```bash
# Switch to CPU mode with overrides — this becomes the new saved config
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy
```
## What the Script Does
1. **Prerequisites check** — Docker, NVIDIA GPU (if needed), compose file exists
@@ -189,6 +239,8 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr
| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
| `PADDING_BACKEND` | Audio padding backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `PADDING_URL` | Audio padding endpoint (when `PADDING_BACKEND=modal`) | `http://transcription:8000` |
| `MIXDOWN_BACKEND` | Audio mixdown backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `MIXDOWN_URL` | Audio mixdown endpoint (when `MIXDOWN_BACKEND=modal`) | `http://transcription:8000` |
| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |
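As a concrete illustration, a `server/.env` fragment enabling the `modal` backend for both padding and mixdown might look like the following (the API key value is a placeholder):

```bash
MIXDOWN_BACKEND=modal
MIXDOWN_URL=http://transcription:8000
MIXDOWN_MODAL_API_KEY=changeme  # placeholder
PADDING_BACKEND=modal
PADDING_URL=http://transcription:8000
```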
@@ -576,9 +628,9 @@ docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8
## Updating
```bash
# Option A: Pull latest prebuilt images and restart
# Option A: Pull latest prebuilt images and restart (replays saved config automatically)
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before>
./scripts/setup-selfhosted.sh
# Option B: Build from source (after git pull) and restart
git pull
@@ -589,6 +641,8 @@ docker compose -f docker-compose.selfhosted.yml down
docker compose -f docker-compose.selfhosted.yml build gpu # or cpu
```
> **Note on config memory:** Running with no flags replays the saved config from your last run. Running with *any* flags replaces the saved config entirely — the script always saves the complete set of flags you provide. See [Config Memory](#config-memory-no-flag-re-run).
The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.
## Architecture Overview


@@ -132,13 +132,22 @@ fi
echo " -> $DIARIZER_URL"
echo ""
echo "Deploying padding (CPU audio processing via Modal SDK)..."
modal deploy reflector_padding.py
if [ $? -ne 0 ]; then
echo "Deploying padding (CPU audio processing)..."
PADDING_URL=$(modal deploy reflector_padding.py 2>&1 | grep -o 'https://[^ ]*web.modal.run' | head -1)
if [ -z "$PADDING_URL" ]; then
echo "Error: Failed to deploy padding. Check Modal dashboard for details."
exit 1
fi
echo " -> reflector-padding.pad_track (Modal SDK function)"
echo " -> $PADDING_URL"
echo ""
echo "Deploying mixdown (CPU multi-track audio mixing)..."
MIXDOWN_URL=$(modal deploy reflector_mixdown.py 2>&1 | grep -o 'https://[^ ]*web.modal.run' | head -1)
if [ -z "$MIXDOWN_URL" ]; then
echo "Error: Failed to deploy mixdown. Check Modal dashboard for details."
exit 1
fi
echo " -> $MIXDOWN_URL"
# --- Output Configuration ---
echo ""
@@ -157,5 +166,11 @@ echo "DIARIZATION_BACKEND=modal"
echo "DIARIZATION_URL=$DIARIZER_URL"
echo "DIARIZATION_MODAL_API_KEY=$API_KEY"
echo ""
echo "# Padding uses Modal SDK (requires MODAL_TOKEN_ID/SECRET in worker containers)"
echo "PADDING_BACKEND=modal"
echo "PADDING_URL=$PADDING_URL"
echo "PADDING_MODAL_API_KEY=$API_KEY"
echo ""
echo "MIXDOWN_BACKEND=modal"
echo "MIXDOWN_URL=$MIXDOWN_URL"
echo "MIXDOWN_MODAL_API_KEY=$API_KEY"
echo "# --- End Modal Configuration ---"
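The URL extraction above relies on `modal deploy` printing the deployed web endpoint; the grep keeps only the first `*.web.modal.run` URL. A quick check of that pattern against an illustrative output line:

```bash
# Sample output line is illustrative; the pattern matches the one used
# for PADDING_URL and MIXDOWN_URL above.
sample_output="Created web function web => https://acme--reflector-mixdown-web.modal.run"
url=$(printf '%s\n' "$sample_output" | grep -o 'https://[^ ]*web.modal.run' | head -1)
echo "$url"
```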


@@ -0,0 +1,385 @@
"""
Reflector GPU backend - audio mixdown
=====================================
CPU-intensive multi-track audio mixdown service.
Mixes N audio tracks into a single MP3 using PyAV amix filter graph.
IMPORTANT: This mixdown logic is duplicated from server/reflector/utils/audio_mixdown.py
for Modal deployment isolation (Modal can't import from server/reflector/). If you modify
the PyAV filter graph or mixdown algorithm, you MUST update both:
- gpu/modal_deployments/reflector_mixdown.py (this file)
- server/reflector/utils/audio_mixdown.py
Constants duplicated from server/reflector/utils/audio_constants.py for same reason.
"""
import os
import tempfile
from fractions import Fraction
import asyncio
import modal
S3_TIMEOUT = 120 # Higher than padding (60s) — multiple track downloads
MIXDOWN_TIMEOUT = 1200 + (S3_TIMEOUT * 2) # 1440s total
SCALEDOWN_WINDOW = 60
DISCONNECT_CHECK_INTERVAL = 2
app = modal.App("reflector-mixdown")
# CPU-based image (mixdown is CPU-bound, no GPU needed)
image = (
modal.Image.debian_slim(python_version="3.12")
.apt_install("ffmpeg") # Required by PyAV
.pip_install(
"av==13.1.0", # PyAV for audio processing
"requests==2.32.3", # HTTP for presigned URL downloads/uploads
"fastapi==0.115.12", # API framework
)
)
@app.function(
cpu=4.0, # Higher than padding (2.0) for multi-track mixing
timeout=MIXDOWN_TIMEOUT,
scaledown_window=SCALEDOWN_WINDOW,
image=image,
secrets=[modal.Secret.from_name("reflector-gpu")],
)
@modal.asgi_app()
def web():
from fastapi import Depends, FastAPI, HTTPException, Request, status
from fastapi.security import OAuth2PasswordBearer
from pydantic import BaseModel
class MixdownRequest(BaseModel):
track_urls: list[str]
output_url: str
target_sample_rate: int | None = None
offsets_seconds: list[float] | None = None
class MixdownResponse(BaseModel):
size: int
duration_ms: float = 0.0
cancelled: bool = False
web_app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
def apikey_auth(apikey: str = Depends(oauth2_scheme)):
if apikey == os.environ["REFLECTOR_GPU_APIKEY"]:
return
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid API key",
headers={"WWW-Authenticate": "Bearer"},
)
@web_app.post("/mixdown", dependencies=[Depends(apikey_auth)])
async def mixdown_endpoint(request: Request, req: MixdownRequest) -> MixdownResponse:
"""Modal web endpoint for mixing audio tracks with disconnect detection."""
import logging
import threading
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)
valid_urls = [u for u in req.track_urls if u]
if not valid_urls:
raise HTTPException(status_code=400, detail="No valid track URLs provided")
if req.offsets_seconds is not None:
if len(req.offsets_seconds) != len(req.track_urls):
raise HTTPException(
status_code=400,
detail=f"offsets_seconds length ({len(req.offsets_seconds)}) "
f"must match track_urls ({len(req.track_urls)})",
)
if any(o > 18000 for o in req.offsets_seconds):
raise HTTPException(status_code=400, detail="offsets_seconds exceeds maximum 18000s (5 hours)")
if not req.output_url:
raise HTTPException(status_code=400, detail="output_url cannot be empty")
logger.info(f"Mixdown request: {len(valid_urls)} tracks")
# Thread-safe cancellation flag
cancelled = threading.Event()
async def check_disconnect():
"""Background task to check for client disconnect."""
while not cancelled.is_set():
await asyncio.sleep(DISCONNECT_CHECK_INTERVAL)
if await request.is_disconnected():
logger.warning("Client disconnected, setting cancellation flag")
cancelled.set()
break
disconnect_task = asyncio.create_task(check_disconnect())
try:
result = await asyncio.get_event_loop().run_in_executor(
None, _mixdown_tracks_blocking, req, cancelled, logger
)
return MixdownResponse(**result)
finally:
cancelled.set()
disconnect_task.cancel()
try:
await disconnect_task
except asyncio.CancelledError:
pass
def _mixdown_tracks_blocking(req, cancelled, logger) -> dict:
"""Blocking CPU-bound mixdown work with periodic cancellation checks.
Downloads all tracks, builds PyAV amix filter graph, encodes to MP3,
and uploads the result to the presigned output URL.
"""
import av
import requests
from av.audio.resampler import AudioResampler
import time
temp_dir = tempfile.mkdtemp()
track_paths = []
output_path = None
last_check = time.time()
try:
# --- Download all tracks ---
valid_urls = [u for u in req.track_urls if u]
for i, url in enumerate(valid_urls):
if cancelled.is_set():
logger.info("Cancelled during download phase")
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
logger.info(f"Downloading track {i}")
response = requests.get(url, stream=True, timeout=S3_TIMEOUT)
response.raise_for_status()
track_path = os.path.join(temp_dir, f"track_{i}.webm")
total_bytes = 0
chunk_count = 0
with open(track_path, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
total_bytes += len(chunk)
chunk_count += 1
if chunk_count % 12 == 0:
now = time.time()
if now - last_check >= DISCONNECT_CHECK_INTERVAL:
if cancelled.is_set():
logger.info(f"Cancelled during track {i} download")
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
last_check = now
track_paths.append(track_path)
logger.info(f"Track {i} downloaded: {total_bytes} bytes")
if not track_paths:
raise ValueError("No tracks downloaded")
# --- Detect sample rate ---
target_sample_rate = req.target_sample_rate
if target_sample_rate is None:
for path in track_paths:
try:
container = av.open(path)
for frame in container.decode(audio=0):
target_sample_rate = frame.sample_rate
container.close()
break
else:
container.close()
continue
break
except Exception:
continue
if target_sample_rate is None:
raise ValueError("Could not detect sample rate from any track")
logger.info(f"Target sample rate: {target_sample_rate}")
# --- Calculate per-input delays ---
input_offsets_seconds = None
if req.offsets_seconds is not None:
input_offsets_seconds = [
req.offsets_seconds[i] for i, url in enumerate(req.track_urls) if url
]
delays_ms = []
if input_offsets_seconds is not None:
base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
delays_ms = [max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds]
else:
delays_ms = [0 for _ in track_paths]
# --- Build filter graph ---
# N abuffer -> optional adelay -> amix -> aformat -> abuffersink
graph = av.filter.Graph()
inputs = []
for idx in range(len(track_paths)):
args = (
f"time_base=1/{target_sample_rate}:"
f"sample_rate={target_sample_rate}:"
f"sample_fmt=s32:"
f"channel_layout=stereo"
)
in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
inputs.append(in_ctx)
mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")
fmt = graph.add(
"aformat",
args=f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}",
name="fmt",
)
sink = graph.add("abuffersink", name="out")
for idx, in_ctx in enumerate(inputs):
delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
if delay_ms > 0:
adelay = graph.add(
"adelay",
args=f"delays={delay_ms}|{delay_ms}:all=1",
name=f"delay{idx}",
)
in_ctx.link_to(adelay)
adelay.link_to(mixer, 0, idx)
else:
in_ctx.link_to(mixer, 0, idx)
mixer.link_to(fmt)
fmt.link_to(sink)
graph.configure()
# --- Open all containers and decode ---
containers = []
output_path = os.path.join(temp_dir, "mixed.mp3")
try:
for path in track_paths:
containers.append(av.open(path))
decoders = [c.decode(audio=0) for c in containers]
active = [True] * len(decoders)
resamplers = [
AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
for _ in decoders
]
# Open output MP3
out_container = av.open(output_path, "w", format="mp3")
out_stream = out_container.add_stream("libmp3lame", rate=target_sample_rate)
total_duration = 0
while any(active):
# Check cancellation periodically
now = time.time()
if now - last_check >= DISCONNECT_CHECK_INTERVAL:
if cancelled.is_set():
logger.info("Cancelled during mixing")
out_container.close()
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
last_check = now
for i, (dec, is_active) in enumerate(zip(decoders, active)):
if not is_active:
continue
try:
frame = next(dec)
except StopIteration:
active[i] = False
inputs[i].push(None)
continue
if frame.sample_rate != target_sample_rate:
continue
out_frames = resamplers[i].resample(frame) or []
for rf in out_frames:
rf.sample_rate = target_sample_rate
rf.time_base = Fraction(1, target_sample_rate)
inputs[i].push(rf)
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush filter graph
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush encoder
for packet in out_stream.encode(None):
out_container.mux(packet)
total_duration += packet.duration
# Calculate duration in ms
last_tb = out_stream.time_base
duration_ms = 0.0
if last_tb and total_duration > 0:
duration_ms = round(float(total_duration * last_tb * 1000), 2)
out_container.close()
finally:
for c in containers:
try:
c.close()
except Exception:
pass
file_size = os.path.getsize(output_path)
logger.info(f"Mixdown complete: {file_size} bytes, {duration_ms}ms")
if cancelled.is_set():
logger.info("Cancelled after mixing, before upload")
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
# --- Upload result ---
logger.info("Uploading mixed audio to S3")
with open(output_path, "rb") as f:
upload_response = requests.put(req.output_url, data=f, timeout=S3_TIMEOUT)
upload_response.raise_for_status()
logger.info(f"Upload complete: {file_size} bytes")
return {"size": file_size, "duration_ms": duration_ms}
finally:
# Cleanup all temp files
for path in track_paths:
if os.path.exists(path):
try:
os.unlink(path)
except Exception as e:
logger.warning(f"Failed to cleanup track file: {e}")
if output_path and os.path.exists(output_path):
try:
os.unlink(output_path)
except Exception as e:
logger.warning(f"Failed to cleanup output file: {e}")
try:
os.rmdir(temp_dir)
except Exception as e:
logger.warning(f"Failed to cleanup temp directory: {e}")
return web_app
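An illustrative client sketch (an assumption, not shipped code): building an authenticated request to the `/mixdown` endpoint defined above. The base URL and API key are placeholders; the payload fields mirror `MixdownRequest`.

```python
import json
import urllib.request


def build_mixdown_request(base_url, api_key, track_urls, output_url,
                          offsets_seconds=None):
    # Payload fields mirror the MixdownRequest model above.
    payload = {
        "track_urls": track_urls,
        "output_url": output_url,
        "offsets_seconds": offsets_seconds,
    }
    return urllib.request.Request(
        f"{base_url}/mixdown",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # checked by apikey_auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request (e.g. via `urllib.request.urlopen`) would block for up to `MIXDOWN_TIMEOUT` while the service mixes and uploads; dropping the connection early triggers the disconnect-based cancellation above.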


@@ -52,10 +52,12 @@ OPUS_DEFAULT_BIT_RATE = 128000
timeout=PADDING_TIMEOUT,
scaledown_window=SCALEDOWN_WINDOW,
image=image,
secrets=[modal.Secret.from_name("reflector-gpu")],
)
@modal.asgi_app()
def web():
from fastapi import FastAPI, Request, HTTPException
from fastapi import Depends, FastAPI, HTTPException, Request, status
from fastapi.security import OAuth2PasswordBearer
from pydantic import BaseModel
class PaddingRequest(BaseModel):
@@ -70,7 +72,18 @@ def web():
web_app = FastAPI()
@web_app.post("/pad")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
def apikey_auth(apikey: str = Depends(oauth2_scheme)):
if apikey == os.environ["REFLECTOR_GPU_APIKEY"]:
return
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid API key",
headers={"WWW-Authenticate": "Bearer"},
)
@web_app.post("/pad", dependencies=[Depends(apikey_auth)])
async def pad_track_endpoint(request: Request, req: PaddingRequest) -> PaddingResponse:
"""Modal web endpoint for padding audio tracks with disconnect detection.
"""


@@ -3,6 +3,7 @@ from contextlib import asynccontextmanager
from fastapi import FastAPI
from .routers.diarization import router as diarization_router
from .routers.mixdown import router as mixdown_router
from .routers.padding import router as padding_router
from .routers.transcription import router as transcription_router
from .routers.translation import router as translation_router
@@ -29,4 +30,5 @@ def create_app() -> FastAPI:
app.include_router(translation_router)
app.include_router(diarization_router)
app.include_router(padding_router)
app.include_router(mixdown_router)
return app


@@ -0,0 +1,288 @@
"""
Audio mixdown endpoint for selfhosted GPU service.
CPU-intensive multi-track audio mixing service for combining N audio tracks
into a single MP3 using PyAV amix filter graph.
IMPORTANT: This mixdown logic is duplicated from server/reflector/utils/audio_mixdown.py
for deployment isolation (self_hosted can't import from server/reflector/). If you modify
the PyAV filter graph or mixdown algorithm, you MUST update both:
- gpu/self_hosted/app/routers/mixdown.py (this file)
- server/reflector/utils/audio_mixdown.py
Constants duplicated from server/reflector/utils/audio_constants.py for same reason.
"""
import logging
import os
import tempfile
from fractions import Fraction
import av
import requests
from av.audio.resampler import AudioResampler
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from ..auth import apikey_auth
logger = logging.getLogger(__name__)
router = APIRouter(tags=["mixdown"])
S3_TIMEOUT = 120
class MixdownRequest(BaseModel):
track_urls: list[str]
output_url: str
target_sample_rate: int | None = None
offsets_seconds: list[float] | None = None
class MixdownResponse(BaseModel):
size: int
duration_ms: float = 0.0
cancelled: bool = False
@router.post("/mixdown", dependencies=[Depends(apikey_auth)], response_model=MixdownResponse)
def mixdown_tracks(req: MixdownRequest):
"""Mix multiple audio tracks into single MP3 using PyAV amix filter graph."""
valid_urls = [u for u in req.track_urls if u]
if not valid_urls:
raise HTTPException(status_code=400, detail="No valid track URLs provided")
if req.offsets_seconds is not None:
if len(req.offsets_seconds) != len(req.track_urls):
raise HTTPException(
status_code=400,
detail=f"offsets_seconds length ({len(req.offsets_seconds)}) "
f"must match track_urls ({len(req.track_urls)})",
)
if any(o > 18000 for o in req.offsets_seconds):
raise HTTPException(
status_code=400, detail="offsets_seconds exceeds maximum 18000s (5 hours)"
)
if not req.output_url:
raise HTTPException(status_code=400, detail="output_url cannot be empty")
logger.info("Mixdown request: %d tracks", len(valid_urls))
temp_dir = tempfile.mkdtemp()
track_paths = []
output_path = None
try:
# --- Download all tracks ---
for i, url in enumerate(valid_urls):
logger.info("Downloading track %d", i)
response = requests.get(url, stream=True, timeout=S3_TIMEOUT)
response.raise_for_status()
track_path = os.path.join(temp_dir, f"track_{i}.webm")
total_bytes = 0
with open(track_path, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
total_bytes += len(chunk)
track_paths.append(track_path)
logger.info("Track %d downloaded: %d bytes", i, total_bytes)
if not track_paths:
raise HTTPException(status_code=400, detail="No tracks could be downloaded")
# --- Detect sample rate ---
target_sample_rate = req.target_sample_rate
if target_sample_rate is None:
for path in track_paths:
try:
container = av.open(path)
for frame in container.decode(audio=0):
target_sample_rate = frame.sample_rate
container.close()
break
else:
container.close()
continue
break
except Exception:
continue
if target_sample_rate is None:
raise HTTPException(
status_code=400, detail="Could not detect sample rate from any track"
)
logger.info("Target sample rate: %d", target_sample_rate)
# --- Calculate per-input delays ---
input_offsets_seconds = None
if req.offsets_seconds is not None:
input_offsets_seconds = [
req.offsets_seconds[i] for i, url in enumerate(req.track_urls) if url
]
delays_ms = []
if input_offsets_seconds is not None:
base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
delays_ms = [max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds]
else:
delays_ms = [0 for _ in track_paths]
# --- Build filter graph ---
# N abuffer -> optional adelay -> amix -> aformat -> abuffersink
graph = av.filter.Graph()
inputs = []
for idx in range(len(track_paths)):
args = (
f"time_base=1/{target_sample_rate}:"
f"sample_rate={target_sample_rate}:"
f"sample_fmt=s32:"
f"channel_layout=stereo"
)
in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
inputs.append(in_ctx)
mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")
fmt = graph.add(
"aformat",
args=f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}",
name="fmt",
)
sink = graph.add("abuffersink", name="out")
for idx, in_ctx in enumerate(inputs):
delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
if delay_ms > 0:
adelay = graph.add(
"adelay",
args=f"delays={delay_ms}|{delay_ms}:all=1",
name=f"delay{idx}",
)
in_ctx.link_to(adelay)
adelay.link_to(mixer, 0, idx)
else:
in_ctx.link_to(mixer, 0, idx)
mixer.link_to(fmt)
fmt.link_to(sink)
graph.configure()
# --- Open all containers and decode ---
containers = []
output_path = os.path.join(temp_dir, "mixed.mp3")
try:
for path in track_paths:
containers.append(av.open(path))
decoders = [c.decode(audio=0) for c in containers]
active = [True] * len(decoders)
resamplers = [
AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
for _ in decoders
]
# Open output MP3
out_container = av.open(output_path, "w", format="mp3")
out_stream = out_container.add_stream("libmp3lame", rate=target_sample_rate)
total_duration = 0
while any(active):
for i, (dec, is_active) in enumerate(zip(decoders, active)):
if not is_active:
continue
try:
frame = next(dec)
except StopIteration:
active[i] = False
inputs[i].push(None)
continue
if frame.sample_rate != target_sample_rate:
continue
out_frames = resamplers[i].resample(frame) or []
for rf in out_frames:
rf.sample_rate = target_sample_rate
rf.time_base = Fraction(1, target_sample_rate)
inputs[i].push(rf)
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush filter graph
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush encoder
for packet in out_stream.encode(None):
out_container.mux(packet)
total_duration += packet.duration
# Calculate duration in ms
last_tb = out_stream.time_base
duration_ms = 0.0
if last_tb and total_duration > 0:
duration_ms = round(float(total_duration * last_tb * 1000), 2)
out_container.close()
finally:
for c in containers:
try:
c.close()
except Exception:
pass
file_size = os.path.getsize(output_path)
logger.info("Mixdown complete: %d bytes, %.2fms", file_size, duration_ms)
# --- Upload result ---
logger.info("Uploading mixed audio to S3")
with open(output_path, "rb") as f:
upload_response = requests.put(req.output_url, data=f, timeout=S3_TIMEOUT)
upload_response.raise_for_status()
logger.info("Upload complete: %d bytes", file_size)
return MixdownResponse(size=file_size, duration_ms=duration_ms)
except HTTPException:
raise
except Exception as e:
logger.error("Mixdown failed: %s", e, exc_info=True)
raise HTTPException(status_code=500, detail=f"Mixdown failed: {e}") from e
finally:
for path in track_paths:
if os.path.exists(path):
try:
os.unlink(path)
except Exception as e:
logger.warning("Failed to cleanup track file: %s", e)
if output_path and os.path.exists(output_path):
try:
os.unlink(output_path)
except Exception as e:
logger.warning("Failed to cleanup output file: %s", e)
try:
os.rmdir(temp_dir)
except Exception as e:
logger.warning("Failed to cleanup temp directory: %s", e)
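The validation rules at the top of the endpoint can be summarized as a small pure function (names here are illustrative; the real code raises `HTTPException` with status 400 on the first failure):

```python
MAX_OFFSET_SECONDS = 18000  # 5 hours, as enforced by the endpoint


def validate_mixdown_request(track_urls, output_url, offsets_seconds=None):
    """Return a list of validation errors; empty list means valid."""
    errors = []
    if not [u for u in track_urls if u]:
        errors.append("No valid track URLs provided")
    if offsets_seconds is not None:
        if len(offsets_seconds) != len(track_urls):
            errors.append("offsets_seconds length must match track_urls")
        elif any(o > MAX_OFFSET_SECONDS for o in offsets_seconds):
            errors.append("offsets_seconds exceeds maximum 18000s (5 hours)")
    if not output_url:
        errors.append("output_url cannot be empty")
    return errors
```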


@@ -4,13 +4,21 @@
# Single script to configure and launch everything on one server.
#
# Usage:
# ./scripts/setup-selfhosted.sh <--gpu|--cpu|--hosted> [--ollama-gpu|--ollama-cpu] [--llm-model MODEL] [--garage] [--caddy] [--domain DOMAIN] [--custom-ca PATH] [--password PASSWORD] [--build]
# ./scripts/setup-selfhosted.sh <--gpu|--cpu|--hosted> [options] [--transcript BACKEND] [--diarization BACKEND] [--translation BACKEND] [--padding BACKEND] [--mixdown BACKEND]
# ./scripts/setup-selfhosted.sh (re-run with saved config from last run)
#
# ML processing modes (pick ONE — required):
# ML processing modes (pick ONE — required on first run):
# --gpu NVIDIA GPU container for transcription/diarization/translation
# --cpu In-process CPU processing (no ML container, slower)
# --hosted Remote GPU service URL (no ML container)
#
# Per-service backend overrides (optional — override individual services from the base mode):
# --transcript BACKEND whisper | modal (default: whisper for --cpu, modal for --gpu/--hosted)
# --diarization BACKEND pyannote | modal (default: pyannote for --cpu, modal for --gpu/--hosted)
# --translation BACKEND marian | modal | passthrough (default: marian for --cpu, modal for --gpu/--hosted)
# --padding BACKEND pyav | modal (default: pyav for --cpu, modal for --gpu/--hosted)
# --mixdown BACKEND pyav | modal (default: pyav for --cpu, modal for --gpu/--hosted)
#
# Local LLM (optional — for summarization & topic detection):
# --ollama-gpu Local Ollama with NVIDIA GPU acceleration
# --ollama-cpu Local Ollama on CPU only
@@ -38,12 +46,17 @@
# ./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy --domain reflector.example.com
# ./scripts/setup-selfhosted.sh --cpu --ollama-cpu --garage --caddy
# ./scripts/setup-selfhosted.sh --hosted --garage --caddy
# ./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy
# ./scripts/setup-selfhosted.sh --gpu --translation passthrough --garage --caddy
# ./scripts/setup-selfhosted.sh --cpu --diarization modal --translation modal --garage
# ./scripts/setup-selfhosted.sh --gpu --ollama-gpu --llm-model mistral --garage --caddy
# ./scripts/setup-selfhosted.sh --gpu --garage --caddy --password mysecretpass
# ./scripts/setup-selfhosted.sh --gpu --garage --caddy
# ./scripts/setup-selfhosted.sh --cpu
# ./scripts/setup-selfhosted.sh --gpu --caddy --domain reflector.local --custom-ca certs/
# ./scripts/setup-selfhosted.sh --hosted --custom-ca /path/to/corporate-ca.crt
# ./scripts/setup-selfhosted.sh # re-run with saved config
#
# Config memory: after a successful run, flags are saved to data/.selfhosted-last-args.
# Re-running with no arguments replays the saved configuration automatically.
#
# The script auto-detects Daily.co (DAILY_API_KEY) and Whereby (WHEREBY_API_KEY)
# from server/.env. If Daily.co is configured, Hatchet workflow services are
@@ -59,6 +72,7 @@ ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
COMPOSE_FILE="$ROOT_DIR/docker-compose.selfhosted.yml"
SERVER_ENV="$ROOT_DIR/server/.env"
WWW_ENV="$ROOT_DIR/www/.env"
LAST_ARGS_FILE="$ROOT_DIR/data/.selfhosted-last-args"
OLLAMA_MODEL="qwen2.5:14b"
OS="$(uname -s)"
@@ -178,6 +192,17 @@ compose_garage_cmd() {
docker compose $files --profile garage "$@"
}
# --- Config memory: replay last args if none provided ---
if [[ $# -eq 0 ]] && [[ -f "$LAST_ARGS_FILE" ]]; then
SAVED_ARGS="$(cat "$LAST_ARGS_FILE")"
if [[ -n "$SAVED_ARGS" ]]; then
info "No flags provided — replaying saved configuration:"
info " $SAVED_ARGS"
echo ""
eval "set -- $SAVED_ARGS"
fi
fi
# --- Parse arguments ---
MODEL_MODE="" # gpu or cpu (required, mutually exclusive)
OLLAMA_MODE="" # ollama-gpu or ollama-cpu (optional)
@@ -189,6 +214,19 @@ ADMIN_PASSWORD="" # optional admin password for password auth
CUSTOM_CA="" # --custom-ca: path to dir or CA cert file
USE_CUSTOM_CA=false # derived flag: true when --custom-ca is provided
EXTRA_CA_FILES=() # --extra-ca: additional CA certs to trust (can be repeated)
OVERRIDE_TRANSCRIPT="" # per-service override: whisper | modal
OVERRIDE_DIARIZATION="" # per-service override: pyannote | modal
OVERRIDE_TRANSLATION="" # per-service override: marian | modal | passthrough
OVERRIDE_PADDING="" # per-service override: pyav | modal
OVERRIDE_MIXDOWN="" # per-service override: pyav | modal
# Validate per-service backend override values
validate_backend() {
local service="$1" value="$2"; shift 2; local valid=("$@")
for v in "${valid[@]}"; do [[ "$value" == "$v" ]] && return 0; done
err "--$service value '$value' is not valid. Choose one of: ${valid[*]}"
exit 1
}
SKIP_NEXT=false
ARGS=("$@")
@@ -265,14 +303,65 @@ for i in "${!ARGS[@]}"; do
EXTRA_CA_FILES+=("$extra_ca_file")
USE_CUSTOM_CA=true
SKIP_NEXT=true ;;
--transcript)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--transcript requires a backend (whisper | modal)"
exit 1
fi
validate_backend "transcript" "${ARGS[$next_i]}" whisper modal
OVERRIDE_TRANSCRIPT="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--diarization)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--diarization requires a backend (pyannote | modal)"
exit 1
fi
validate_backend "diarization" "${ARGS[$next_i]}" pyannote modal
OVERRIDE_DIARIZATION="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--translation)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--translation requires a backend (marian | modal | passthrough)"
exit 1
fi
validate_backend "translation" "${ARGS[$next_i]}" marian modal passthrough
OVERRIDE_TRANSLATION="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--padding)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--padding requires a backend (pyav | modal)"
exit 1
fi
validate_backend "padding" "${ARGS[$next_i]}" pyav modal
OVERRIDE_PADDING="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--mixdown)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--mixdown requires a backend (pyav | modal)"
exit 1
fi
validate_backend "mixdown" "${ARGS[$next_i]}" pyav modal
OVERRIDE_MIXDOWN="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
*)
err "Unknown argument: $arg"
err "Usage: $0 <--gpu|--cpu|--hosted> [--ollama-gpu|--ollama-cpu] [--llm-model MODEL] [--garage] [--caddy] [--domain DOMAIN] [--custom-ca PATH] [--password PASS] [--build]"
err "Usage: $0 <--gpu|--cpu|--hosted> [options] [--transcript BACKEND] [--diarization BACKEND] [--translation BACKEND] [--padding BACKEND] [--mixdown BACKEND]"
exit 1
;;
esac
done
# --- Save CLI args for config memory (re-run without flags) ---
if [[ $# -gt 0 ]]; then
mkdir -p "$ROOT_DIR/data"
printf '%q ' "$@" > "$LAST_ARGS_FILE"
fi
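The save/replay pair relies on shell quoting: `printf '%q '` serializes the argv to `data/.selfhosted-last-args`, and `eval "set -- $SAVED_ARGS"` restores it on the next run. The same round-trip can be sketched in Python with `shlex` (illustrative only, not part of the script):

```python
import shlex

# Serialize an argv safely (analogue of `printf '%q ' "$@"`), then
# parse it back (analogue of `eval "set -- $SAVED_ARGS"`).
args = ["--cpu", "--translation", "passthrough", "--password", "p@ss w/ spaces"]
saved = " ".join(shlex.quote(a) for a in args)  # safe to write to a file
replayed = shlex.split(saved)                   # recovers the original argv

assert replayed == args
```

The quoting step is what makes values with spaces or shell metacharacters (like a password) survive the file round-trip intact.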
# --- Resolve --custom-ca flag ---
CA_CERT_PATH="" # resolved path to CA certificate
TLS_CERT_PATH="" # resolved path to server cert (optional, for Caddy TLS)
@@ -330,13 +419,20 @@ fi
if [[ -z "$MODEL_MODE" ]]; then
err "No model mode specified. You must choose --gpu, --cpu, or --hosted."
err ""
err "Usage: $0 <--gpu|--cpu|--hosted> [--ollama-gpu|--ollama-cpu] [--llm-model MODEL] [--garage] [--caddy] [--domain DOMAIN] [--custom-ca PATH] [--password PASS] [--build]"
err "Usage: $0 <--gpu|--cpu|--hosted> [options] [--transcript BACKEND] [--diarization BACKEND] [--translation BACKEND] [--padding BACKEND] [--mixdown BACKEND]"
err ""
err "ML processing modes (required):"
err " --gpu NVIDIA GPU container for transcription/diarization/translation"
err " --cpu In-process CPU processing (no ML container, slower)"
err " --hosted Remote GPU service URL (no ML container)"
err ""
err "Per-service backend overrides (optional — override individual services):"
err " --transcript BACKEND whisper | modal (default: whisper for --cpu, modal for --gpu/--hosted)"
err " --diarization BACKEND pyannote | modal (default: pyannote for --cpu, modal for --gpu/--hosted)"
err " --translation BACKEND marian | modal | passthrough (default: marian for --cpu, modal for --gpu/--hosted)"
err " --padding BACKEND pyav | modal (default: pyav for --cpu, modal for --gpu/--hosted)"
err " --mixdown BACKEND pyav | modal (default: pyav for --cpu, modal for --gpu/--hosted)"
err ""
err "Local LLM (optional):"
err " --ollama-gpu Local Ollama with GPU (for summarization/topics)"
err " --ollama-cpu Local Ollama on CPU (for summarization/topics)"
@@ -351,6 +447,8 @@ if [[ -z "$MODEL_MODE" ]]; then
err " --extra-ca FILE Additional CA cert to trust (repeatable for multiple CAs)"
err " --password PASS Enable password auth (admin@localhost) instead of public mode"
err " --build Build backend/frontend images from source instead of pulling"
err ""
err "Tip: After your first run, re-run with no flags to reuse the same configuration."
exit 1
fi
@@ -374,9 +472,38 @@ OLLAMA_SVC=""
[[ "$OLLAMA_MODE" == "ollama-gpu" ]] && USES_OLLAMA=true && OLLAMA_SVC="ollama"
[[ "$OLLAMA_MODE" == "ollama-cpu" ]] && USES_OLLAMA=true && OLLAMA_SVC="ollama-cpu"
# Resolve effective backend per service (override wins over base mode default)
case "$MODEL_MODE" in
gpu|hosted)
EFF_TRANSCRIPT="${OVERRIDE_TRANSCRIPT:-modal}"
EFF_DIARIZATION="${OVERRIDE_DIARIZATION:-modal}"
EFF_TRANSLATION="${OVERRIDE_TRANSLATION:-modal}"
EFF_PADDING="${OVERRIDE_PADDING:-modal}"
EFF_MIXDOWN="${OVERRIDE_MIXDOWN:-modal}"
;;
cpu)
EFF_TRANSCRIPT="${OVERRIDE_TRANSCRIPT:-whisper}"
EFF_DIARIZATION="${OVERRIDE_DIARIZATION:-pyannote}"
EFF_TRANSLATION="${OVERRIDE_TRANSLATION:-marian}"
EFF_PADDING="${OVERRIDE_PADDING:-pyav}"
EFF_MIXDOWN="${OVERRIDE_MIXDOWN:-pyav}"
;;
esac
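The resolution rule above (an explicit per-service flag wins, otherwise the base mode's default applies) is equivalent to this small sketch (Python used for illustration; the helper name is not part of the script):

```python
# Per-service backend resolution: an explicit override beats the mode default.
MODE_DEFAULTS = {
    "gpu": {"transcript": "modal", "diarization": "modal", "translation": "modal",
            "padding": "modal", "mixdown": "modal"},
    "hosted": {"transcript": "modal", "diarization": "modal", "translation": "modal",
               "padding": "modal", "mixdown": "modal"},
    "cpu": {"transcript": "whisper", "diarization": "pyannote", "translation": "marian",
            "padding": "pyav", "mixdown": "pyav"},
}

def resolve_backends(mode: str, overrides: dict[str, str]) -> dict[str, str]:
    # `or default` treats an empty string as unset, matching bash `${VAR:-default}`
    return {svc: overrides.get(svc) or default
            for svc, default in MODE_DEFAULTS[mode].items()}

# `--cpu --diarization modal` keeps whisper/marian/pyav, routes only diarization out:
eff = resolve_backends("cpu", {"diarization": "modal"})
assert eff["diarization"] == "modal" and eff["transcript"] == "whisper"
```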
# Check if any per-service overrides were provided
HAS_OVERRIDES=false
[[ -n "$OVERRIDE_TRANSCRIPT" ]] && HAS_OVERRIDES=true
[[ -n "$OVERRIDE_DIARIZATION" ]] && HAS_OVERRIDES=true
[[ -n "$OVERRIDE_TRANSLATION" ]] && HAS_OVERRIDES=true
[[ -n "$OVERRIDE_PADDING" ]] && HAS_OVERRIDES=true
[[ -n "$OVERRIDE_MIXDOWN" ]] && HAS_OVERRIDES=true
# Human-readable mode string for display
MODE_DISPLAY="$MODEL_MODE"
[[ -n "$OLLAMA_MODE" ]] && MODE_DISPLAY="$MODEL_MODE + $OLLAMA_MODE"
if [[ "$HAS_OVERRIDES" == "true" ]]; then
MODE_DISPLAY="$MODE_DISPLAY (overrides: transcript=$EFF_TRANSCRIPT, diarization=$EFF_DIARIZATION, translation=$EFF_TRANSLATION, padding=$EFF_PADDING, mixdown=$EFF_MIXDOWN)"
fi
# =========================================================
# Step 0: Prerequisites
@@ -623,54 +750,30 @@ step_server_env() {
env_set "$SERVER_ENV" "WEBRTC_HOST" "$PRIMARY_IP"
fi
# Specialized models — backend configuration per mode
# Specialized models — backend configuration per service
env_set "$SERVER_ENV" "DIARIZATION_ENABLED" "true"
# Resolve the URL for modal backends
local modal_url=""
case "$MODEL_MODE" in
gpu)
# GPU container aliased as "transcription" on docker network
env_set "$SERVER_ENV" "TRANSCRIPT_BACKEND" "modal"
env_set "$SERVER_ENV" "TRANSCRIPT_URL" "http://transcription:8000"
env_set "$SERVER_ENV" "TRANSCRIPT_MODAL_API_KEY" "selfhosted"
env_set "$SERVER_ENV" "DIARIZATION_BACKEND" "modal"
env_set "$SERVER_ENV" "DIARIZATION_URL" "http://transcription:8000"
env_set "$SERVER_ENV" "TRANSLATION_BACKEND" "modal"
env_set "$SERVER_ENV" "TRANSLATE_URL" "http://transcription:8000"
env_set "$SERVER_ENV" "PADDING_BACKEND" "modal"
env_set "$SERVER_ENV" "PADDING_URL" "http://transcription:8000"
ok "ML backends: GPU container (modal)"
;;
cpu)
# In-process backends — no ML service container needed
env_set "$SERVER_ENV" "TRANSCRIPT_BACKEND" "whisper"
env_set "$SERVER_ENV" "DIARIZATION_BACKEND" "pyannote"
env_set "$SERVER_ENV" "TRANSLATION_BACKEND" "marian"
env_set "$SERVER_ENV" "PADDING_BACKEND" "pyav"
ok "ML backends: in-process CPU (whisper/pyannote/marian/pyav)"
modal_url="http://transcription:8000"
;;
hosted)
# Remote GPU service — user provides URL
local gpu_url=""
if env_has_key "$SERVER_ENV" "TRANSCRIPT_URL"; then
gpu_url=$(env_get "$SERVER_ENV" "TRANSCRIPT_URL")
modal_url=$(env_get "$SERVER_ENV" "TRANSCRIPT_URL")
fi
if [[ -z "$gpu_url" ]] && [[ -t 0 ]]; then
if [[ -z "$modal_url" ]] && [[ -t 0 ]]; then
echo ""
info "Enter the URL of your remote GPU service (e.g. https://gpu.example.com)"
read -rp " GPU service URL: " gpu_url
read -rp " GPU service URL: " modal_url
fi
if [[ -z "$gpu_url" ]]; then
if [[ -z "$modal_url" ]]; then
err "GPU service URL required for --hosted mode."
err "Set TRANSCRIPT_URL in server/.env or provide it interactively."
exit 1
fi
env_set "$SERVER_ENV" "TRANSCRIPT_BACKEND" "modal"
env_set "$SERVER_ENV" "TRANSCRIPT_URL" "$gpu_url"
env_set "$SERVER_ENV" "DIARIZATION_BACKEND" "modal"
env_set "$SERVER_ENV" "DIARIZATION_URL" "$gpu_url"
env_set "$SERVER_ENV" "TRANSLATION_BACKEND" "modal"
env_set "$SERVER_ENV" "TRANSLATE_URL" "$gpu_url"
env_set "$SERVER_ENV" "PADDING_BACKEND" "modal"
env_set "$SERVER_ENV" "PADDING_URL" "$gpu_url"
# API key for remote service
local gpu_api_key=""
if env_has_key "$SERVER_ENV" "TRANSCRIPT_MODAL_API_KEY"; then
@@ -682,15 +785,106 @@ step_server_env() {
if [[ -n "$gpu_api_key" ]]; then
env_set "$SERVER_ENV" "TRANSCRIPT_MODAL_API_KEY" "$gpu_api_key"
fi
ok "ML backends: remote hosted ($gpu_url)"
;;
cpu)
# CPU mode: modal_url stays empty. If services are overridden to modal,
# the user must configure the URL (TRANSCRIPT_URL etc.) in server/.env manually.
# We intentionally do NOT read from existing env here to avoid overwriting
# per-service URLs with a stale TRANSCRIPT_URL from a previous --gpu run.
;;
esac
# Set each service backend independently using effective backends
# Transcript
case "$EFF_TRANSCRIPT" in
modal)
env_set "$SERVER_ENV" "TRANSCRIPT_BACKEND" "modal"
if [[ -n "$modal_url" ]]; then
env_set "$SERVER_ENV" "TRANSCRIPT_URL" "$modal_url"
fi
[[ "$MODEL_MODE" == "gpu" ]] && env_set "$SERVER_ENV" "TRANSCRIPT_MODAL_API_KEY" "selfhosted"
;;
whisper)
env_set "$SERVER_ENV" "TRANSCRIPT_BACKEND" "whisper"
;;
esac
# Diarization
case "$EFF_DIARIZATION" in
modal)
env_set "$SERVER_ENV" "DIARIZATION_BACKEND" "modal"
if [[ -n "$modal_url" ]]; then
env_set "$SERVER_ENV" "DIARIZATION_URL" "$modal_url"
fi
;;
pyannote)
env_set "$SERVER_ENV" "DIARIZATION_BACKEND" "pyannote"
;;
esac
# Translation
case "$EFF_TRANSLATION" in
modal)
env_set "$SERVER_ENV" "TRANSLATION_BACKEND" "modal"
if [[ -n "$modal_url" ]]; then
env_set "$SERVER_ENV" "TRANSLATE_URL" "$modal_url"
fi
;;
marian)
env_set "$SERVER_ENV" "TRANSLATION_BACKEND" "marian"
;;
passthrough)
env_set "$SERVER_ENV" "TRANSLATION_BACKEND" "passthrough"
;;
esac
# Padding
case "$EFF_PADDING" in
modal)
env_set "$SERVER_ENV" "PADDING_BACKEND" "modal"
if [[ -n "$modal_url" ]]; then
env_set "$SERVER_ENV" "PADDING_URL" "$modal_url"
fi
;;
pyav)
env_set "$SERVER_ENV" "PADDING_BACKEND" "pyav"
;;
esac
# Mixdown
case "$EFF_MIXDOWN" in
modal)
env_set "$SERVER_ENV" "MIXDOWN_BACKEND" "modal"
if [[ -n "$modal_url" ]]; then
env_set "$SERVER_ENV" "MIXDOWN_URL" "$modal_url"
fi
;;
pyav)
env_set "$SERVER_ENV" "MIXDOWN_BACKEND" "pyav"
;;
esac
# Warn about modal overrides in CPU mode that need URL configuration
if [[ "$MODEL_MODE" == "cpu" ]] && [[ -z "$modal_url" ]]; then
local needs_url=false
[[ "$EFF_TRANSCRIPT" == "modal" ]] && needs_url=true
[[ "$EFF_DIARIZATION" == "modal" ]] && needs_url=true
[[ "$EFF_TRANSLATION" == "modal" ]] && needs_url=true
[[ "$EFF_PADDING" == "modal" ]] && needs_url=true
[[ "$EFF_MIXDOWN" == "modal" ]] && needs_url=true
if [[ "$needs_url" == "true" ]]; then
warn "One or more services are set to 'modal' but no service URL is configured."
warn "Set the matching URL(s) (TRANSCRIPT_URL, DIARIZATION_URL, TRANSLATE_URL,"
warn "PADDING_URL, MIXDOWN_URL) and optionally TRANSCRIPT_MODAL_API_KEY in server/.env"
warn "to point to your GPU service, then re-run this script."
fi
fi
ok "ML backends: transcript=$EFF_TRANSCRIPT, diarization=$EFF_DIARIZATION, translation=$EFF_TRANSLATION, padding=$EFF_PADDING, mixdown=$EFF_MIXDOWN"
# HuggingFace token for gated models (pyannote diarization)
# --gpu: written to root .env (docker compose passes to GPU container)
# --cpu: written to both root .env and server/.env (in-process pyannote needs it)
# --hosted: not needed (remote service handles its own auth)
if [[ "$MODEL_MODE" != "hosted" ]]; then
# Needed when: GPU container is running (MODEL_MODE=gpu), or diarization uses pyannote in-process
# Not needed when: all modal services point to a remote hosted URL with its own auth
if [[ "$MODEL_MODE" == "gpu" ]] || [[ "$EFF_DIARIZATION" == "pyannote" ]]; then
local root_env="$ROOT_DIR/.env"
local current_hf_token="${HF_TOKEN:-}"
if [[ -f "$root_env" ]] && env_has_key "$root_env" "HF_TOKEN"; then
@@ -709,8 +903,8 @@ step_server_env() {
touch "$root_env"
env_set "$root_env" "HF_TOKEN" "$current_hf_token"
export HF_TOKEN="$current_hf_token"
# In CPU mode, server process needs HF_TOKEN directly
if [[ "$MODEL_MODE" == "cpu" ]]; then
# When diarization runs in-process (pyannote), server process needs HF_TOKEN directly
if [[ "$EFF_DIARIZATION" == "pyannote" ]]; then
env_set "$SERVER_ENV" "HF_TOKEN" "$current_hf_token"
fi
ok "HF_TOKEN configured"
@@ -743,11 +937,15 @@ step_server_env() {
fi
fi
# CPU mode: increase file processing timeouts (default 600s is too short for long audio on CPU)
if [[ "$MODEL_MODE" == "cpu" ]]; then
# Increase file processing timeouts for CPU backends (default 600s is too short for long audio on CPU)
if [[ "$EFF_TRANSCRIPT" == "whisper" ]]; then
env_set "$SERVER_ENV" "TRANSCRIPT_FILE_TIMEOUT" "3600"
fi
if [[ "$EFF_DIARIZATION" == "pyannote" ]]; then
env_set "$SERVER_ENV" "DIARIZATION_FILE_TIMEOUT" "3600"
ok "CPU mode — file processing timeouts set to 3600s (1 hour)"
fi
if [[ "$EFF_TRANSCRIPT" == "whisper" ]] || [[ "$EFF_DIARIZATION" == "pyannote" ]]; then
ok "CPU backend(s) detected — file processing timeouts set to 3600s (1 hour)"
fi
# Hatchet is always required (file, live, and multitrack pipelines all use it)
@@ -1175,9 +1373,9 @@ step_health() {
warn "Check with: docker compose -f docker-compose.selfhosted.yml logs gpu"
fi
elif [[ "$MODEL_MODE" == "cpu" ]]; then
ok "CPU mode — ML processing runs in-process on server/worker (no separate service)"
ok "CPU mode — in-process backends run on server/worker (transcript=$EFF_TRANSCRIPT, diarization=$EFF_DIARIZATION, translation=$EFF_TRANSLATION, padding=$EFF_PADDING, mixdown=$EFF_MIXDOWN)"
elif [[ "$MODEL_MODE" == "hosted" ]]; then
ok "Hosted mode — ML processing via remote GPU service (no local health check)"
ok "Hosted mode — ML processing via remote GPU service (transcript=$EFF_TRANSCRIPT, diarization=$EFF_DIARIZATION, translation=$EFF_TRANSLATION, padding=$EFF_PADDING, mixdown=$EFF_MIXDOWN)"
fi
# Ollama (if applicable)
@@ -1375,6 +1573,10 @@ main() {
echo "=========================================="
echo ""
echo " Models: $MODEL_MODE"
if [[ "$HAS_OVERRIDES" == "true" ]]; then
echo " transcript=$EFF_TRANSCRIPT, diarization=$EFF_DIARIZATION"
echo " translation=$EFF_TRANSLATION, padding=$EFF_PADDING, mixdown=$EFF_MIXDOWN"
fi
echo " LLM: ${OLLAMA_MODE:-external}"
echo " Garage: $USE_GARAGE"
echo " Caddy: $USE_CADDY"
@@ -1487,7 +1689,13 @@ EOF
echo " API: server:1250 (or localhost:1250 from host)"
fi
echo ""
echo " Models: $MODEL_MODE (transcription/diarization/translation)"
if [[ "$HAS_OVERRIDES" == "true" ]]; then
echo " Models: $MODEL_MODE base + overrides"
echo " transcript=$EFF_TRANSCRIPT, diarization=$EFF_DIARIZATION"
echo " translation=$EFF_TRANSLATION, padding=$EFF_PADDING, mixdown=$EFF_MIXDOWN"
else
echo " Models: $MODEL_MODE (transcription/diarization/translation/padding)"
fi
[[ "$USE_GARAGE" == "true" ]] && echo " Storage: Garage (local S3)"
[[ "$USE_GARAGE" != "true" ]] && echo " Storage: External S3"
[[ "$USES_OLLAMA" == "true" ]] && echo " LLM: Ollama ($OLLAMA_MODEL) for summarization/topics"
@@ -1507,7 +1715,8 @@ EOF
echo ""
fi
echo " To stop: docker compose -f docker-compose.selfhosted.yml down"
echo " To re-run: ./scripts/setup-selfhosted.sh $*"
echo " To re-run: ./scripts/setup-selfhosted.sh (replays saved config)"
echo " Last args: $*"
echo ""
}

View File

@@ -84,7 +84,7 @@ from reflector.hatchet.workflows.topic_chunk_processing import (
from reflector.hatchet.workflows.track_processing import TrackInput, track_workflow
from reflector.logger import logger
from reflector.pipelines import topic_processing
from reflector.processors import AudioFileWriterProcessor
from reflector.processors.audio_mixdown_auto import AudioMixdownAutoProcessor
from reflector.processors.summary.models import ActionItemsResponse
from reflector.processors.summary.prompts import (
RECAP_PROMPT,
@@ -99,10 +99,6 @@ from reflector.utils.audio_constants import (
PRESIGNED_URL_EXPIRATION_SECONDS,
WAVEFORM_SEGMENTS,
)
from reflector.utils.audio_mixdown import (
detect_sample_rate_from_tracks,
mixdown_tracks_pyav,
)
from reflector.utils.audio_waveform import get_audio_waveform
from reflector.utils.daily import (
filter_cam_audio_tracks,
@@ -539,7 +535,7 @@ async def process_tracks(input: PipelineInput, ctx: Context) -> ProcessTracksRes
)
@with_error_handling(TaskName.MIXDOWN_TRACKS)
async def mixdown_tracks(input: PipelineInput, ctx: Context) -> MixdownResult:
"""Mix all padded tracks into single audio file using PyAV (same as Celery)."""
"""Mix all padded tracks into single audio file via configured backend."""
ctx.log("mixdown_tracks: mixing padded tracks into single audio file")
track_result = ctx.task_output(process_tracks)
@@ -579,37 +575,33 @@ async def mixdown_tracks(input: PipelineInput, ctx: Context) -> MixdownResult:
if not valid_urls:
raise ValueError("No valid padded tracks to mixdown")
target_sample_rate = detect_sample_rate_from_tracks(valid_urls, logger=logger)
if not target_sample_rate:
logger.error("Mixdown failed - no decodable audio frames found")
raise ValueError("No decodable audio frames in any track")
output_path = tempfile.mktemp(suffix=".mp3")
duration_ms_callback_capture_container = [0.0]
async def capture_duration(d):
duration_ms_callback_capture_container[0] = d
writer = AudioFileWriterProcessor(path=output_path, on_duration=capture_duration)
await mixdown_tracks_pyav(
valid_urls,
writer,
target_sample_rate,
offsets_seconds=None,
logger=logger,
progress_callback=make_audio_progress_logger(ctx, TaskName.MIXDOWN_TRACKS),
expected_duration_sec=recording_duration if recording_duration > 0 else None,
)
await writer.flush()
file_size = Path(output_path).stat().st_size
storage_path = f"{input.transcript_id}/audio.mp3"
with open(output_path, "rb") as mixed_file:
await storage.put_file(storage_path, mixed_file)
# Generate presigned PUT URL for the output (used by modal backend;
# pyav backend ignores it and writes locally instead)
output_url = await storage.get_file_url(
storage_path,
operation="put_object",
expires_in=PRESIGNED_URL_EXPIRATION_SECONDS,
)
Path(output_path).unlink(missing_ok=True)
processor = AudioMixdownAutoProcessor()
result = await processor.mixdown_tracks(
valid_urls, output_url, offsets_seconds=None
)
if result.output_path:
# Pyav backend wrote locally — upload to storage ourselves
output_file = Path(result.output_path)
with open(output_file, "rb") as mixed_file:
await storage.put_file(storage_path, mixed_file)
output_file.unlink(missing_ok=True)
# Clean up the temp directory the pyav processor created
try:
output_file.parent.rmdir()
except OSError:
pass
# else: modal backend already uploaded to output_url
async with fresh_db_connection():
from reflector.db.transcripts import transcripts_controller # noqa: PLC0415
@@ -620,11 +612,11 @@ async def mixdown_tracks(input: PipelineInput, ctx: Context) -> MixdownResult:
transcript, {"audio_location": "storage"}
)
ctx.log(f"mixdown_tracks complete: uploaded {file_size} bytes to {storage_path}")
ctx.log(f"mixdown_tracks complete: {result.size} bytes to {storage_path}")
return MixdownResult(
audio_key=storage_path,
duration=duration_ms_callback_capture_container[0],
duration=result.duration_ms,
tracks_mixed=len(valid_urls),
)

View File

@@ -4,6 +4,8 @@ from .audio_diarization_auto import AudioDiarizationAutoProcessor # noqa: F401
from .audio_downscale import AudioDownscaleProcessor # noqa: F401
from .audio_file_writer import AudioFileWriterProcessor # noqa: F401
from .audio_merge import AudioMergeProcessor # noqa: F401
from .audio_mixdown import AudioMixdownProcessor # noqa: F401
from .audio_mixdown_auto import AudioMixdownAutoProcessor # noqa: F401
from .audio_padding import AudioPaddingProcessor # noqa: F401
from .audio_padding_auto import AudioPaddingAutoProcessor # noqa: F401
from .audio_transcript import AudioTranscriptProcessor # noqa: F401

View File

@@ -0,0 +1,27 @@
"""
Base class for audio mixdown processors.
"""

from pydantic import BaseModel


class MixdownResponse(BaseModel):
    size: int
    duration_ms: float = 0.0
    cancelled: bool = False
    output_path: str | None = None  # Local file path (pyav sets this; modal leaves None)


class AudioMixdownProcessor:
    """Base class for audio mixdown processors."""

    async def mixdown_tracks(
        self,
        track_urls: list[str],
        output_url: str,
        target_sample_rate: int | None = None,
        offsets_seconds: list[float] | None = None,
    ) -> MixdownResponse:
        raise NotImplementedError

View File

@@ -0,0 +1,32 @@
import importlib

from reflector.processors.audio_mixdown import AudioMixdownProcessor
from reflector.settings import settings


class AudioMixdownAutoProcessor(AudioMixdownProcessor):
    _registry = {}

    @classmethod
    def register(cls, name, kclass):
        cls._registry[name] = kclass

    def __new__(cls, name: str | None = None, **kwargs):
        if name is None:
            name = settings.MIXDOWN_BACKEND
        if name not in cls._registry:
            module_name = f"reflector.processors.audio_mixdown_{name}"
            importlib.import_module(module_name)

        # gather specific configuration for the processor
        # search `MIXDOWN_XXX_YYY`, push to constructor as `xxx_yyy`
        config = {}
        name_upper = name.upper()
        settings_prefix = "MIXDOWN_"
        config_prefix = f"{settings_prefix}{name_upper}_"
        for key, value in settings:
            if key.startswith(config_prefix):
                config_name = key[len(settings_prefix) :].lower()
                config[config_name] = value

        return cls._registry[name](**config | kwargs)
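The config-gathering loop strips only the `MIXDOWN_` service prefix, so `MIXDOWN_MODAL_API_KEY` reaches the constructor as `modal_api_key`, while `MIXDOWN_URL` (which lacks a backend prefix) is not passed and is instead read by the processor from the environment. A standalone sketch of that mapping (the helper name and dict-based settings are illustrative):

```python
# Sketch of the MIXDOWN_<BACKEND>_* -> constructor-kwarg mapping used by
# AudioMixdownAutoProcessor, over a plain dict instead of pydantic settings.
def gather_backend_config(settings: dict[str, str], backend: str) -> dict[str, str]:
    prefix = "MIXDOWN_"
    backend_prefix = f"{prefix}{backend.upper()}_"
    config = {}
    for key, value in settings.items():
        if key.startswith(backend_prefix):
            # Strip only the service prefix, keep the backend name:
            # MIXDOWN_MODAL_API_KEY -> modal_api_key
            config[key[len(prefix):].lower()] = value
    return config

example = {
    "MIXDOWN_BACKEND": "modal",
    "MIXDOWN_URL": "http://transcription:8000",
    "MIXDOWN_MODAL_API_KEY": "selfhosted",
}
assert gather_backend_config(example, "modal") == {"modal_api_key": "selfhosted"}
```

This mirrors the convention the other `*AutoProcessor` classes use for per-service configuration, per the commit description.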

View File

@@ -0,0 +1,110 @@
"""
Modal.com backend for audio mixdown.
"""

import asyncio
import os

import httpx

from reflector.hatchet.constants import TIMEOUT_HEAVY_HTTP
from reflector.logger import logger
from reflector.processors.audio_mixdown import AudioMixdownProcessor, MixdownResponse
from reflector.processors.audio_mixdown_auto import AudioMixdownAutoProcessor


class AudioMixdownModalProcessor(AudioMixdownProcessor):
    """Audio mixdown processor using Modal.com/self-hosted backend via HTTP."""

    def __init__(
        self, mixdown_url: str | None = None, modal_api_key: str | None = None
    ):
        self.mixdown_url = mixdown_url or os.getenv("MIXDOWN_URL")
        if not self.mixdown_url:
            raise ValueError(
                "MIXDOWN_URL required to use AudioMixdownModalProcessor. "
                "Set MIXDOWN_URL environment variable or pass mixdown_url parameter."
            )
        self.modal_api_key = modal_api_key or os.getenv("MODAL_API_KEY")

    async def mixdown_tracks(
        self,
        track_urls: list[str],
        output_url: str,
        target_sample_rate: int | None = None,
        offsets_seconds: list[float] | None = None,
    ) -> MixdownResponse:
        """Mix audio tracks via remote Modal/self-hosted backend.

        Args:
            track_urls: Presigned GET URLs for source audio tracks
            output_url: Presigned PUT URL for output MP3
            target_sample_rate: Sample rate for output (Hz), auto-detected if None
            offsets_seconds: Optional per-track delays in seconds for alignment
        """
        valid_count = len([u for u in track_urls if u])
        log = logger.bind(track_count=valid_count)
        log.info("Sending Modal mixdown HTTP request")

        url = f"{self.mixdown_url}/mixdown"
        headers = {}
        if self.modal_api_key:
            headers["Authorization"] = f"Bearer {self.modal_api_key}"

        # Scale timeout with track count: base TIMEOUT_HEAVY_HTTP + 60s per track beyond 2
        extra_timeout = max(0, (valid_count - 2)) * 60
        timeout = TIMEOUT_HEAVY_HTTP + extra_timeout

        try:
            async with httpx.AsyncClient(timeout=timeout) as client:
                response = await client.post(
                    url,
                    headers=headers,
                    json={
                        "track_urls": track_urls,
                        "output_url": output_url,
                        "target_sample_rate": target_sample_rate,
                        "offsets_seconds": offsets_seconds,
                    },
                    follow_redirects=True,
                )
                if response.status_code != 200:
                    error_body = response.text
                    log.error(
                        "Modal mixdown API error",
                        status_code=response.status_code,
                        error_body=error_body,
                    )
                response.raise_for_status()
                result = response.json()

                # Check if work was cancelled
                if result.get("cancelled"):
                    log.warning("Modal mixdown was cancelled by disconnect detection")
                    raise asyncio.CancelledError(
                        "Mixdown cancelled due to client disconnect"
                    )

                log.info("Modal mixdown complete", size=result["size"])
                return MixdownResponse(**result)
        except asyncio.CancelledError:
            log.warning(
                "Modal mixdown cancelled (Hatchet timeout, disconnect detected on Modal side)"
            )
            raise
        except httpx.TimeoutException as e:
            log.error("Modal mixdown timeout", error=str(e), exc_info=True)
            raise Exception(f"Modal mixdown timeout: {e}") from e
        except httpx.HTTPStatusError as e:
            log.error("Modal mixdown HTTP error", error=str(e), exc_info=True)
            raise Exception(f"Modal mixdown HTTP error: {e}") from e
        except Exception as e:
            log.error("Modal mixdown unexpected error", error=str(e), exc_info=True)
            raise


AudioMixdownAutoProcessor.register("modal", AudioMixdownModalProcessor)
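The timeout scaling in `mixdown_tracks` can be checked numerically. The 600 s base below is an assumed value for illustration; the real constant is `TIMEOUT_HEAVY_HTTP` from `reflector.hatchet.constants`:

```python
TIMEOUT_HEAVY_HTTP = 600  # assumed for illustration; real value lives in reflector.hatchet.constants

def mixdown_timeout(track_count: int, base: int = TIMEOUT_HEAVY_HTTP) -> int:
    # Base timeout plus 60 s for every valid track beyond the first two.
    return base + max(0, track_count - 2) * 60

assert mixdown_timeout(2) == 600   # two tracks: base only
assert mixdown_timeout(5) == 780   # three extra tracks: +180 s
```

The `max(0, ...)` guard means one- and two-track mixdowns never drop below the base timeout.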

View File

@@ -0,0 +1,101 @@
"""
PyAV audio mixdown processor.

Mixes N tracks in-process using the existing utility from reflector.utils.audio_mixdown.
Writes to a local temp file (does NOT upload to S3 — the pipeline handles upload).
"""

import os
import tempfile

from reflector.logger import logger
from reflector.processors.audio_file_writer import AudioFileWriterProcessor
from reflector.processors.audio_mixdown import AudioMixdownProcessor, MixdownResponse
from reflector.processors.audio_mixdown_auto import AudioMixdownAutoProcessor
from reflector.utils.audio_mixdown import (
    detect_sample_rate_from_tracks,
    mixdown_tracks_pyav,
)


class AudioMixdownPyavProcessor(AudioMixdownProcessor):
    """Audio mixdown processor using PyAV (no HTTP backend).

    Writes the mixed output to a local temp file and returns its path
    in MixdownResponse.output_path. The caller is responsible for
    uploading the file and cleaning it up.
    """

    async def mixdown_tracks(
        self,
        track_urls: list[str],
        output_url: str,
        target_sample_rate: int | None = None,
        offsets_seconds: list[float] | None = None,
    ) -> MixdownResponse:
        log = logger.bind(track_count=len(track_urls))
        log.info("Starting local PyAV mixdown")

        valid_urls = [url for url in track_urls if url]
        if not valid_urls:
            raise ValueError("No valid track URLs provided")

        # Auto-detect sample rate if not provided
        if target_sample_rate is None:
            target_sample_rate = detect_sample_rate_from_tracks(
                valid_urls, logger=logger
            )
            if not target_sample_rate:
                raise ValueError("No decodable audio frames in any track")

        # Write to temp MP3 file
        temp_dir = tempfile.mkdtemp()
        output_path = os.path.join(temp_dir, "mixed.mp3")

        duration_ms_container = [0.0]

        async def capture_duration(d):
            duration_ms_container[0] = d

        writer = AudioFileWriterProcessor(
            path=output_path, on_duration=capture_duration
        )

        try:
            await mixdown_tracks_pyav(
                valid_urls,
                writer,
                target_sample_rate,
                offsets_seconds=offsets_seconds,
                logger=logger,
            )
            await writer.flush()

            file_size = os.path.getsize(output_path)
            log.info(
                "Local mixdown complete",
                size=file_size,
                duration_ms=duration_ms_container[0],
            )
            return MixdownResponse(
                size=file_size,
                duration_ms=duration_ms_container[0],
                output_path=output_path,
            )
        except Exception as e:
            # Cleanup on failure
            if os.path.exists(output_path):
                try:
                    os.unlink(output_path)
                except Exception:
                    pass
            try:
                os.rmdir(temp_dir)
            except Exception:
                pass
            log.error("Local mixdown failed", error=str(e), exc_info=True)
            raise


AudioMixdownAutoProcessor.register("pyav", AudioMixdownPyavProcessor)

View File

@@ -127,6 +127,14 @@ class Settings(BaseSettings):
    PADDING_URL: str | None = None
    PADDING_MODAL_API_KEY: str | None = None

    # Audio Mixdown
    # backends:
    # - pyav: in-process PyAV mixdown (no HTTP, runs in same process)
    # - modal: HTTP API client (works with Modal.com OR self-hosted gpu/self_hosted/)
    MIXDOWN_BACKEND: str = "pyav"
    MIXDOWN_URL: str | None = None
    MIXDOWN_MODAL_API_KEY: str | None = None

    # Sentry
    SENTRY_DSN: str | None = None
# Sentry
SENTRY_DSN: str | None = None