Compare commits

31 Commits

Author SHA1 Message Date
dependabot[bot]
2f14df4b44 build(deps): bump the uv group across 2 directories with 1 update
Bumps the uv group with 1 update in the /gpu/self_hosted directory: [transformers](https://github.com/huggingface/transformers).
Bumps the uv group with 1 update in the /server directory: [transformers](https://github.com/huggingface/transformers).


Updates `transformers` from 4.56.1 to 5.0.0rc3
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.56.1...v5.0.0rc3)

Updates `transformers` from 4.53.2 to 5.0.0rc3
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.56.1...v5.0.0rc3)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 5.0.0rc3
  dependency-type: direct:production
  dependency-group: uv
- dependency-name: transformers
  dependency-version: 5.0.0rc3
  dependency-type: direct:production
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-08 07:36:05 +00:00
Juan Diego García
7ed3b781ee chore(main): release 0.44.0 (#947) 2026-04-07 18:40:08 -05:00
Juan Diego García
26239f05a3 fix: deactivate meeting button and better deactivation heuristics (#950) 2026-04-07 17:18:11 -05:00
Juan Diego García
08c276e4f4 fix: allow anonymous recording start and add light theme email icon (#949)
* fix: anonymous users can start recording

* fix: give more schedule timeout to heavy processes

* fix: add light mode support for daily icons
2026-04-07 17:15:50 -05:00
Juan Diego García
f4f94a0d99 feat: allow livekit to work with tunnels udp and tcp with quickguide (#948) 2026-04-07 17:14:55 -05:00
dependabot[bot]
df782107d7 build(deps): bump the uv group across 2 directories with 1 update (#945)
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: indirect
  dependency-group: uv
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: direct:production
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-07 12:14:52 -05:00
Juan Diego García
bc8338fa4f feat: Livekit - Selfhost video room solution (#946)
* feat: Livekit bare no recording nor pipeline

* feat: full livekit pipeline

* fix: caddy hatchet with livekit

* fix: caddy livekit

* fix: hatchet tls

* fix: agg to webm for no padding

* fix: reflector user id on participants and duration fix

* fix: better docs and internal review fixes

* fix: remove video files livekit
2026-04-07 11:55:16 -05:00
Juan Diego García
b570d202dc chore(main): release 0.43.0 (#940) 2026-03-31 19:27:00 -05:00
Juan Diego García
8c4f5e9c0f fix: cpu usage + email improvements (#944)
* fix: cpu usage on server ws manager, 100% to 0% on idle

* fix: change email icon to white and prefill email in daily room for authenticated users

* fix: improve email sending with full ts transcript
2026-03-31 16:34:10 -05:00
Juan Diego García
ec8b49738e feat: show trash for soft deleted transcripts and hard delete option (#942)
* feat: show trash for soft deleted transcripts and hard delete option

* fix: test fixtures

* docs: aws new permissions
2026-03-31 13:15:52 -05:00
Juan Diego García
cc9c5cd4a5 fix: add parakeet as default transcriber and fix diarizer image (#939) 2026-03-31 10:22:57 -05:00
Juan Diego García
61d6fbd344 chore(main): release 0.42.0 (#935) 2026-03-30 18:48:27 -05:00
Juan Diego García
7b3b5b9858 fix: remove share public from integration tests (#938) 2026-03-30 18:02:56 -05:00
Juan Diego García
a22789d548 fix: grpc tls for local hatchet (#937) 2026-03-30 17:46:23 -05:00
dependabot[bot]
e3cc646cf5 build(deps): bump the npm_and_yarn group across 1 directory with 2 updates (#934)
Bumps the npm_and_yarn group with 2 updates in the /docs directory: [brace-expansion](https://github.com/juliangruber/brace-expansion) and [path-to-regexp](https://github.com/pillarjs/path-to-regexp).


Updates `brace-expansion` from 1.1.12 to 1.1.13
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](https://github.com/juliangruber/brace-expansion/compare/v1.1.12...v1.1.13)

Updates `path-to-regexp` from 0.1.12 to 0.1.13
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/v.0.1.13/History.md)
- [Commits](https://github.com/pillarjs/path-to-regexp/compare/v0.1.12...v.0.1.13)

---
updated-dependencies:
- dependency-name: brace-expansion
  dependency-version: 1.1.13
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: path-to-regexp
  dependency-version: 0.1.13
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 17:38:52 -05:00
dependabot[bot]
778ff6268c build(deps): bump cryptography (#932)
Bumps the uv group with 1 update in the /server directory: [cryptography](https://github.com/pyca/cryptography).


Updates `cryptography` from 46.0.5 to 46.0.6
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.5...46.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.6
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 17:38:37 -05:00
Juan Diego García
d164e486cc feat: mixdown modal services + processor pattern (#936)
* allow memory flags and per service config

* feat: mixdown modal services + processor pattern
2026-03-30 17:38:23 -05:00
Juan Diego García
12bf0c2d77 feat: custom ca for caddy (#931)
* fix: send email on transcript page permissions fixed

* feat: custom ca for caddy
2026-03-30 11:42:39 -05:00
dependabot[bot]
bfaf4f403b build(deps): bump the uv group across 2 directories with 1 update (#930)
Bumps the uv group with 1 update in the /gpu/self_hosted directory: [requests](https://github.com/psf/requests).
Bumps the uv group with 1 update in the /server directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.32.5 to 2.33.0
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.5...v2.33.0)

Updates `requests` from 2.32.4 to 2.33.0
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.5...v2.33.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: indirect
  dependency-group: uv
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: direct:production
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-26 10:27:11 -05:00
dependabot[bot]
0258754a4c build(deps): bump picomatch (#929)
Bumps the npm_and_yarn group with 1 update in the /docs directory: [picomatch](https://github.com/micromatch/picomatch).


Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

---
updated-dependencies:
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-26 10:26:16 -05:00
Juan Diego García
ea89fa5261 chore(main): release 0.41.0 (#925) 2026-03-25 17:33:44 -05:00
Juan Diego García
1f98790e7b feat: zulip dag monitor for failed runs (#928)
* feat: zulip dag monitor for failed runs

* fix: add collapsible tags to big information
2026-03-25 17:26:41 -05:00
dependabot[bot]
7b8d190c52 build(deps): bump the uv group across 1 directory with 2 updates (#927)
Bumps the uv group with 2 updates in the /server directory: [nltk](https://github.com/nltk/nltk) and [pypdf](https://github.com/py-pdf/pypdf).


Updates `nltk` from 3.9.3 to 3.9.4
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](https://github.com/nltk/nltk/compare/3.9.3...3.9.4)

Updates `pypdf` from 6.9.1 to 6.9.2
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.9.1...6.9.2)

---
updated-dependencies:
- dependency-name: nltk
  dependency-version: 3.9.4
  dependency-type: indirect
  dependency-group: uv
- dependency-name: pypdf
  dependency-version: 6.9.2
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-25 17:13:43 -05:00
Juan Diego García
f19113a3cf feat: add auto-generated captions, speaker-colored progress bar with sync controls, and speaker tooltip to cloud video player (#926)
* feat: webvtt captions inside video with sync controls

* feat: highlight speaker timestamp progress bar
2026-03-25 12:07:08 -05:00
Juan Diego García
e2ba502697 feat: send email in share transcript and add email sending in room (#924)
* fix: add source language for file pipeline

* feat: send email in share transcript and add email sending in room

* fix: hide audio and video streaming for unauthenticated users

* fix: security order
2026-03-24 17:17:52 -05:00
Juan Diego García
74b9b97453 chore(main): release 0.40.0 (#921) 2026-03-20 15:57:59 -05:00
dependabot[bot]
9e37d60b3f build(deps): bump flatted (#922)
Bumps the npm_and_yarn group with 1 update in the /www directory: [flatted](https://github.com/WebReflection/flatted).


Updates `flatted` from 3.4.1 to 3.4.2
- [Commits](https://github.com/WebReflection/flatted/compare/v3.4.1...v3.4.2)

---
updated-dependencies:
- dependency-name: flatted
  dependency-version: 3.4.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-20 15:44:14 -05:00
Juan Diego García
55222ecc47 feat: allow participants to ask for email transcript (#923)
* feat: allow participants to ask for email transcript

* fix: set email update in a transaction
2026-03-20 15:43:58 -05:00
dependabot[bot]
41e7b3e84f build(deps): bump socket.io-parser (#918)
Bumps the npm_and_yarn group with 1 update in the /www directory: [socket.io-parser](https://github.com/socketio/socket.io).


Updates `socket.io-parser` from 4.2.5 to 4.2.6
- [Release notes](https://github.com/socketio/socket.io/releases)
- [Changelog](https://github.com/socketio/socket.io/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/socket.io/compare/socket.io-parser@4.2.5...socket.io-parser@4.2.6)

---
updated-dependencies:
- dependency-name: socket.io-parser
  dependency-version: 4.2.6
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-20 11:33:14 -05:00
dependabot[bot]
e5712a4168 build(deps): bump pypdf in /server in the uv group across 1 directory (#917)
Bumps the uv group with 1 update in the /server directory: [pypdf](https://github.com/py-pdf/pypdf).


Updates `pypdf` from 6.8.0 to 6.9.1
- [Release notes](https://github.com/py-pdf/pypdf/releases)
- [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/py-pdf/pypdf/compare/6.8.0...6.9.1)

---
updated-dependencies:
- dependency-name: pypdf
  dependency-version: 6.9.1
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-20 11:33:00 -05:00
Juan Diego García
a76f114378 feat: download files, show cloud video, soft deletion with no reprocessing (#920)
* fix: move UDP ports out of macOS internal range

* feat: download files, show cloud video, soft deletion with no reprocessing
2026-03-20 11:04:53 -05:00
137 changed files with 13502 additions and 1083 deletions

7
.gitignore vendored

@@ -3,6 +3,8 @@ server/.env
server/.env.production
.env
Caddyfile
livekit.yaml
egress.yaml
.env.hatchet
server/exportdanswer
.vercel
@@ -24,5 +26,10 @@ www/.env.production
.secrets
opencode.json
certs/
docker-compose.ca.yml
docker-compose.gpu-ca.yml
Caddyfile.gpu-host
.env.gpu-host
vibedocs/
server/tests/integration/logs/


@@ -1,5 +1,63 @@
# Changelog
## [0.44.0](https://github.com/GreyhavenHQ/reflector/compare/v0.43.0...v0.44.0) (2026-04-07)
### Features
* allow livekit to work with tunnels udp and tcp with quickguide ([#948](https://github.com/GreyhavenHQ/reflector/issues/948)) ([f4f94a0](https://github.com/GreyhavenHQ/reflector/commit/f4f94a0d9998030e5ef7f01935d99722045165ac))
* Livekit - Selfhost video room solution ([#946](https://github.com/GreyhavenHQ/reflector/issues/946)) ([bc8338f](https://github.com/GreyhavenHQ/reflector/commit/bc8338fa4f136534f5f27784f5dd10d47cecf412))
### Bug Fixes
* allow anonymous recording start and add light theme email icon ([#949](https://github.com/GreyhavenHQ/reflector/issues/949)) ([08c276e](https://github.com/GreyhavenHQ/reflector/commit/08c276e4f493c7960158a44e77062af1b30c3352))
* deactivate meeting button and better deactivation heuristics ([#950](https://github.com/GreyhavenHQ/reflector/issues/950)) ([26239f0](https://github.com/GreyhavenHQ/reflector/commit/26239f05a34af07ebba764d669343c32e40e63bf))
## [0.43.0](https://github.com/GreyhavenHQ/reflector/compare/v0.42.0...v0.43.0) (2026-03-31)
### Features
* show trash for soft deleted transcripts and hard delete option ([#942](https://github.com/GreyhavenHQ/reflector/issues/942)) ([ec8b497](https://github.com/GreyhavenHQ/reflector/commit/ec8b49738e8e76f6e5d2496a42cb454ef6c2d7c7))
### Bug Fixes
* add parakeet as default transcriber and fix diarizer image ([#939](https://github.com/GreyhavenHQ/reflector/issues/939)) ([cc9c5cd](https://github.com/GreyhavenHQ/reflector/commit/cc9c5cd4a5f4123ef957ad82461ca37a727d1ba6))
* cpu usage + email improvements ([#944](https://github.com/GreyhavenHQ/reflector/issues/944)) ([8c4f5e9](https://github.com/GreyhavenHQ/reflector/commit/8c4f5e9c0f893f4cb029595505b53136f04760f4))
## [0.42.0](https://github.com/GreyhavenHQ/reflector/compare/v0.41.0...v0.42.0) (2026-03-30)
### Features
* custom ca for caddy ([#931](https://github.com/GreyhavenHQ/reflector/issues/931)) ([12bf0c2](https://github.com/GreyhavenHQ/reflector/commit/12bf0c2d77f9915b79b1eb1decd77ed2dadbb31d))
* mixdown modal services + processor pattern ([#936](https://github.com/GreyhavenHQ/reflector/issues/936)) ([d164e48](https://github.com/GreyhavenHQ/reflector/commit/d164e486cc33ff8babf6cff6c163893cfc56fd76))
### Bug Fixes
* grpc tls for local hatchet ([#937](https://github.com/GreyhavenHQ/reflector/issues/937)) ([a22789d](https://github.com/GreyhavenHQ/reflector/commit/a22789d5486bf8b83e33ab2fb5eb3ee9799c6d47))
* remove share public from integration tests ([#938](https://github.com/GreyhavenHQ/reflector/issues/938)) ([7b3b5b9](https://github.com/GreyhavenHQ/reflector/commit/7b3b5b98586449afd0b6996ba9fd7aec8308bbc6))
## [0.41.0](https://github.com/GreyhavenHQ/reflector/compare/v0.40.0...v0.41.0) (2026-03-25)
### Features
* add auto-generated captions, speaker-colored progress bar with sync controls, and speaker tooltip to cloud video player ([#926](https://github.com/GreyhavenHQ/reflector/issues/926)) ([f19113a](https://github.com/GreyhavenHQ/reflector/commit/f19113a3cfa27797a70b9496bfcf1baff9d89f0d))
* send email in share transcript and add email sending in room ([#924](https://github.com/GreyhavenHQ/reflector/issues/924)) ([e2ba502](https://github.com/GreyhavenHQ/reflector/commit/e2ba502697ce331c4d87fb019648fcbe4e7cca73))
* zulip dag monitor for failed runs ([#928](https://github.com/GreyhavenHQ/reflector/issues/928)) ([1f98790](https://github.com/GreyhavenHQ/reflector/commit/1f98790e7bc58013690ec81aefa051da5e36e93e))
## [0.40.0](https://github.com/GreyhavenHQ/reflector/compare/v0.39.0...v0.40.0) (2026-03-20)
### Features
* allow participants to ask for email transcript ([#923](https://github.com/GreyhavenHQ/reflector/issues/923)) ([55222ec](https://github.com/GreyhavenHQ/reflector/commit/55222ecc4736f99ad461f03a006c8d97b5876142))
* download files, show cloud video, soft deletion with no reprocessing ([#920](https://github.com/GreyhavenHQ/reflector/issues/920)) ([a76f114](https://github.com/GreyhavenHQ/reflector/commit/a76f1143783d3cf137a8847a851b72302e04445b))
## [0.39.0](https://github.com/GreyhavenHQ/reflector/compare/v0.38.2...v0.39.0) (2026-03-18)


@@ -41,14 +41,14 @@ uv run celery -A reflector.worker.app beat
**Testing:**
```bash
# Run all tests with coverage
uv run pytest
# Run all tests with coverage (requires Redis on localhost)
REDIS_HOST=localhost REDIS_PORT=6379 uv run pytest
# Run specific test file
uv run pytest tests/test_transcripts.py
REDIS_HOST=localhost REDIS_PORT=6379 uv run pytest tests/test_transcripts.py
# Run tests with verbose output
uv run pytest -v
REDIS_HOST=localhost REDIS_PORT=6379 uv run pytest -v
```
**Process Audio Files:**
@@ -192,3 +192,13 @@ Modal.com integration for scalable ML processing:
## Pipeline/worker related info
If you need to do any worker/pipeline related work, search for "Pipeline" classes and their "create" or "build" methods to find the main processor sequence. Look for task orchestration patterns (like "chord", "group", or "chain") to identify the post-processing flow with parallel execution chains. This will give you an abstract view of how the processing pipeline is organized.
## Documentation
- New documentation files go in `docsv2/`, not in `docs/docs/`.
- Existing `docs/` directory contains legacy Docusaurus docs.
## Code Style
- Always put imports at the top of the file. Let ruff/pre-commit handle sorting and formatting of imports.
- Exception: In Hatchet pipeline task functions, DB controller imports (e.g., `transcripts_controller`, `meetings_controller`) stay as deferred/inline imports inside `fresh_db_connection()` blocks — this is intentional to avoid sharing DB connections across forked processes. Non-DB imports (utilities, services) should still go at the top of the file.
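
The deferred-import exception above can be sketched as follows. This is an illustration only: the real `fresh_db_connection()` signature is not shown in this diff, so the context manager below is a hypothetical stand-in, and `sqlite3` stands in for a DB controller such as `transcripts_controller`.

```python
import contextlib  # non-DB imports stay at the top of the file


@contextlib.contextmanager
def fresh_db_connection():
    """Hypothetical stand-in; the real helper may be async or take arguments."""
    yield


def pipeline_task(transcript_id: str) -> str:
    # Only the DB-bound import is deferred, so it is resolved inside the
    # worker process instead of being shared across forked processes.
    with fresh_db_connection():
        import sqlite3  # stand-in for a DB controller import

        conn = sqlite3.connect(":memory:")
        try:
            conn.execute("CREATE TABLE transcript (id TEXT)")
            conn.execute("INSERT INTO transcript VALUES (?)", (transcript_id,))
            row = conn.execute("SELECT id FROM transcript").fetchone()
        finally:
            conn.close()
    return row[0]
```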

106
docker-compose.gpu-host.yml Normal file

@@ -0,0 +1,106 @@
# Standalone GPU host for Reflector — transcription, diarization, translation.
#
# Usage: ./scripts/setup-gpu-host.sh [--domain DOMAIN] [--custom-ca PATH] [--api-key KEY] [--cpu]
# or: docker compose -f docker-compose.gpu-host.yml --profile gpu [--profile caddy] up -d
#
# Processing mode (pick ONE — mutually exclusive, both bind port 8000):
# --profile gpu NVIDIA GPU container (requires nvidia-container-toolkit)
# --profile cpu CPU-only container (no GPU required, slower)
#
# Optional:
# --profile caddy Caddy reverse proxy with HTTPS
#
# This file is checked into the repo. The setup script generates:
# - .env.gpu-host (HF_TOKEN, API key, port config)
# - Caddyfile.gpu-host (Caddy config, only with --domain)
# - docker-compose.gpu-ca.yml (CA cert mounts, only with --custom-ca)
services:
# ===========================================================
# GPU service — NVIDIA GPU accelerated
# Activated with: --profile gpu
# ===========================================================
gpu:
build:
context: ./gpu/self_hosted
dockerfile: Dockerfile
profiles: [gpu]
restart: unless-stopped
ports:
- "${GPU_HOST_PORT:-8000}:8000"
environment:
HF_TOKEN: ${HF_TOKEN:-}
REFLECTOR_GPU_APIKEY: ${REFLECTOR_GPU_APIKEY:-}
volumes:
- gpu_cache:/root/.cache
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/docs"]
interval: 15s
timeout: 5s
retries: 10
start_period: 120s
networks:
default:
aliases:
- transcription
# ===========================================================
# CPU service — no GPU required, uses Dockerfile.cpu
# Activated with: --profile cpu
# Mutually exclusive with gpu (both bind port 8000)
# ===========================================================
cpu:
build:
context: ./gpu/self_hosted
dockerfile: Dockerfile.cpu
profiles: [cpu]
restart: unless-stopped
ports:
- "${GPU_HOST_PORT:-8000}:8000"
environment:
HF_TOKEN: ${HF_TOKEN:-}
REFLECTOR_GPU_APIKEY: ${REFLECTOR_GPU_APIKEY:-}
volumes:
- gpu_cache:/root/.cache
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/docs"]
interval: 15s
timeout: 5s
retries: 10
start_period: 120s
networks:
default:
aliases:
- transcription
# ===========================================================
# Caddy — reverse proxy with HTTPS (optional)
# Activated with: --profile caddy
# Proxies to "transcription" network alias (works for both gpu and cpu)
# ===========================================================
caddy:
image: caddy:2-alpine
profiles: [caddy]
restart: unless-stopped
ports:
- "80:80"
- "${CADDY_HTTPS_PORT:-443}:443"
volumes:
- ./Caddyfile.gpu-host:/etc/caddy/Caddyfile:ro
- caddy_data:/data
- caddy_config:/config
volumes:
gpu_cache:
caddy_data:
caddy_config:
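
The profile rules in the header comment of this file (exactly one of `gpu` or `cpu`, optionally `caddy`) can be sketched as a small command builder. The helper name `compose_command` is hypothetical and not part of the repository; it only assembles the `docker compose` invocation documented in the comment above.

```python
def compose_command(mode: str, with_caddy: bool = False) -> list[str]:
    """Build the documented docker compose invocation for the GPU host file."""
    # gpu and cpu both bind port 8000, so exactly one of them may be selected.
    if mode not in ("gpu", "cpu"):
        raise ValueError("mode must be 'gpu' or 'cpu' (mutually exclusive)")
    cmd = [
        "docker", "compose",
        "-f", "docker-compose.gpu-host.yml",
        "--profile", mode,
    ]
    if with_caddy:
        # Optional reverse proxy with HTTPS.
        cmd += ["--profile", "caddy"]
    return cmd + ["up", "-d"]
```

The returned list can be passed to `subprocess.run(...)`; keeping it as a list avoids shell quoting issues.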


@@ -35,8 +35,8 @@ services:
image: monadicalsas/reflector-backend:latest
restart: unless-stopped
ports:
- "127.0.0.1:1250:1250"
- "51000-51100:51000-51100/udp"
- "${BIND_HOST:-127.0.0.1}:1250:1250"
- "40000-40100:40000-40100/udp"
env_file:
- ./server/.env
environment:
@@ -50,7 +50,7 @@ services:
# HF_TOKEN needed for in-process pyannote diarization (--cpu mode)
HF_TOKEN: ${HF_TOKEN:-}
# WebRTC: fixed UDP port range for ICE candidates (mapped above)
WEBRTC_PORT_RANGE: "51000-51100"
WEBRTC_PORT_RANGE: "40000-40100"
# Hatchet workflow engine (always-on for processing pipelines)
HATCHET_CLIENT_SERVER_URL: ${HATCHET_CLIENT_SERVER_URL:-http://hatchet:8888}
HATCHET_CLIENT_HOST_PORT: ${HATCHET_CLIENT_HOST_PORT:-hatchet:7077}
@@ -116,7 +116,7 @@ services:
image: monadicalsas/reflector-frontend:latest
restart: unless-stopped
ports:
- "127.0.0.1:3000:3000"
- "${BIND_HOST:-127.0.0.1}:3000:3000"
env_file:
- ./www/.env
environment:
@@ -308,6 +308,24 @@ services:
- web
- server
# ===========================================================
# Mailpit — local SMTP sink for testing email transcript notifications
# Start with: --profile mailpit
# Web UI at http://localhost:8025
# ===========================================================
mailpit:
image: axllent/mailpit:latest
profiles: [mailpit]
restart: unless-stopped
ports:
- "127.0.0.1:8025:8025" # Web UI
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8025/api/v1/messages"]
interval: 10s
timeout: 3s
retries: 5
# ===========================================================
# Hatchet workflow engine + workers
# Required for all processing pipelines (file, live, Daily.co multitrack).
@@ -321,7 +339,7 @@ services:
postgres:
condition: service_healthy
ports:
- "127.0.0.1:8888:8888"
- "0.0.0.0:8888:8888" # Hatchet dashboard (plain HTTP — no TLS)
- "127.0.0.1:7078:7077"
env_file:
- ./.env.hatchet
@@ -348,7 +366,7 @@ services:
context: ./server
dockerfile: Dockerfile
image: monadicalsas/reflector-backend:latest
profiles: [dailyco]
profiles: [dailyco, livekit]
restart: unless-stopped
env_file:
- ./server/.env
@@ -388,6 +406,40 @@ services:
volumes:
- server_data:/app/data
# ===========================================================
# LiveKit — self-hosted open-source video platform
# Activated via --profile livekit (auto-detected from LIVEKIT_API_KEY in server/.env)
# ===========================================================
livekit-server:
image: livekit/livekit-server:v1.10.1
profiles: [livekit]
restart: unless-stopped
ports:
- "7880:7880" # Signaling (HTTP/WS)
- "7881:7881" # WebRTC over TCP
- "${LIVEKIT_UDP_PORTS:-44200-44300:44200-44300}/udp" # WebRTC ICE (range or single port for tunnels)
volumes:
- ./livekit.yaml:/etc/livekit.yaml:ro
command: ["--config", "/etc/livekit.yaml"]
depends_on:
redis:
condition: service_started
livekit-egress:
image: livekit/egress:v1.12.0
profiles: [livekit]
restart: unless-stopped
environment:
EGRESS_CONFIG_FILE: /etc/egress.yaml
volumes:
- ./egress.yaml:/etc/egress.yaml:ro
depends_on:
redis:
condition: service_started
livekit-server:
condition: service_started
volumes:
postgres_data:
redis_data:
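
The UDP port changes in this file (`WEBRTC_PORT_RANGE` moving from `51000-51100` to `40000-40100`, and the LiveKit default `44200-44300`) can be checked with a short sketch. It assumes the "macOS internal range" mentioned in the commit message is the dynamic/ephemeral range 49152-65535; the helper names are hypothetical. A `host:container` mapping such as the `LIVEKIT_UDP_PORTS` default would need to be split on `:` first.

```python
# Assumption: the range to avoid is the IANA dynamic/ephemeral range.
EPHEMERAL = range(49152, 65536)


def parse_port_range(value: str) -> range:
    """Parse "40000-40100" or a single port "7882" into an inclusive range."""
    start_text, _, end_text = value.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    if not 0 < start <= end <= 65535:
        raise ValueError(f"invalid port range: {value!r}")
    return range(start, end + 1)


def overlaps(a: range, b: range) -> bool:
    """True when two contiguous port ranges share at least one port."""
    return a.start < b.stop and b.start < a.stop
```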


@@ -95,6 +95,12 @@ DAILYCO_STORAGE_AWS_BUCKET_NAME=<your-bucket-from-daily-setup>
DAILYCO_STORAGE_AWS_REGION=us-east-1
DAILYCO_STORAGE_AWS_ROLE_ARN=<your-role-arn-from-daily-setup>
# Worker credentials for reading/deleting recordings from Daily's S3 bucket.
# Required when transcript storage uses a different bucket or credentials
# (e.g., selfhosted with Garage or a separate S3 account).
DAILYCO_STORAGE_AWS_ACCESS_KEY_ID=<your-aws-access-key>
DAILYCO_STORAGE_AWS_SECRET_ACCESS_KEY=<your-aws-secret-key>
# Transcript storage (should already be configured from main setup)
# TRANSCRIPT_STORAGE_BACKEND=aws
# TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID=<your-key>
@@ -103,6 +109,19 @@ DAILYCO_STORAGE_AWS_ROLE_ARN=<your-role-arn-from-daily-setup>
# TRANSCRIPT_STORAGE_AWS_REGION=<your-bucket-region>
```
:::info Two separate credential sets for Daily.co
- **`ROLE_ARN`** — Used by Daily's API to *write* recordings into your S3 bucket (configured via Daily dashboard).
- **`ACCESS_KEY_ID` / `SECRET_ACCESS_KEY`** — Used by Reflector workers to *read* recordings for transcription and *delete* them on consent denial or permanent transcript deletion.
Required IAM permissions for the worker key on the Daily recordings bucket:
- `s3:GetObject` — Download recording files for processing
- `s3:DeleteObject` — Remove files on consent denial, trash destroy, or data retention cleanup
- `s3:ListBucket` — Scan for recordings needing reprocessing
If the worker keys are not set, Reflector falls back to the transcript storage master key, which then needs cross-bucket access to the Daily bucket.
:::
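
As a rough sketch of the worker permissions listed above, the following builds an IAM policy document for the Daily recordings bucket. The bucket name and the helper `worker_policy` are hypothetical; the one detail worth noting is that `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` and `s3:DeleteObject` apply to object ARNs.

```python
import json


def worker_policy(bucket: str) -> dict:
    """Minimal IAM policy for the worker key on the recordings bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions: download and delete recordings.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Bucket-level action: list recordings for reprocessing scans.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # "my-daily-recordings" is a placeholder bucket name.
    print(json.dumps(worker_policy("my-daily-recordings"), indent=2))
```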
---
## Restart Services

76
docs/pnpm-lock.yaml generated

@@ -701,6 +701,10 @@ packages:
resolution: {integrity: sha512-05WQkdpL9COIMz4LjTxGpPNCdlpyimKppYNoJ5Di5EUObifl8t4tuLuUBBZEpoLYOmfvIWrsp9fCl0HoPRVTdA==}
engines: {node: '>=6.9.0'}
'@babel/runtime@7.29.2':
resolution: {integrity: sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g==}
engines: {node: '>=6.9.0'}
'@babel/template@7.28.6':
resolution: {integrity: sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==}
engines: {node: '>=6.9.0'}
@@ -1490,42 +1494,36 @@ packages:
engines: {node: '>= 10.0.0'}
cpu: [arm]
os: [linux]
libc: [glibc]
'@parcel/watcher-linux-arm-musl@2.5.6':
resolution: {integrity: sha512-Ve3gUCG57nuUUSyjBq/MAM0CzArtuIOxsBdQ+ftz6ho8n7s1i9E1Nmk/xmP323r2YL0SONs1EuwqBp2u1k5fxg==}
engines: {node: '>= 10.0.0'}
cpu: [arm]
os: [linux]
libc: [musl]
'@parcel/watcher-linux-arm64-glibc@2.5.6':
resolution: {integrity: sha512-f2g/DT3NhGPdBmMWYoxixqYr3v/UXcmLOYy16Bx0TM20Tchduwr4EaCbmxh1321TABqPGDpS8D/ggOTaljijOA==}
engines: {node: '>= 10.0.0'}
cpu: [arm64]
os: [linux]
libc: [glibc]
'@parcel/watcher-linux-arm64-musl@2.5.6':
resolution: {integrity: sha512-qb6naMDGlbCwdhLj6hgoVKJl2odL34z2sqkC7Z6kzir8b5W65WYDpLB6R06KabvZdgoHI/zxke4b3zR0wAbDTA==}
engines: {node: '>= 10.0.0'}
cpu: [arm64]
os: [linux]
libc: [musl]
'@parcel/watcher-linux-x64-glibc@2.5.6':
resolution: {integrity: sha512-kbT5wvNQlx7NaGjzPFu8nVIW1rWqV780O7ZtkjuWaPUgpv2NMFpjYERVi0UYj1msZNyCzGlaCWEtzc+exjMGbQ==}
engines: {node: '>= 10.0.0'}
cpu: [x64]
os: [linux]
libc: [glibc]
'@parcel/watcher-linux-x64-musl@2.5.6':
resolution: {integrity: sha512-1JRFeC+h7RdXwldHzTsmdtYR/Ku8SylLgTU/reMuqdVD7CtLwf0VR1FqeprZ0eHQkO0vqsbvFLXUmYm/uNKJBg==}
engines: {node: '>= 10.0.0'}
cpu: [x64]
os: [linux]
libc: [musl]
'@parcel/watcher-win32-arm64@2.5.6':
resolution: {integrity: sha512-3ukyebjc6eGlw9yRt678DxVF7rjXatWiHvTXqphZLvo7aC5NdEgFufVwjFfY51ijYEWpXbqF5jtrK275z52D4Q==}
@@ -2254,11 +2252,11 @@ packages:
resolution: {integrity: sha512-2hCgjEmP8YLWQ130n2FerGv7rYpfBmnmp9Uy2Le1vge6X3gZIfSmEzP5QTDElFxcvVcXlEn8Aq6MU/PZygIOog==}
engines: {node: '>=14.16'}
brace-expansion@1.1.12:
resolution: {integrity: sha512-9T9UjW3r0UW5c1Q7GTwllptXwhvYmEzFhzMfZ9H7FQWt+uZePjZPjBP/W1ZEyZ1twGWom5/56TF4lPcqjnDHcg==}
brace-expansion@1.1.13:
resolution: {integrity: sha512-9ZLprWS6EENmhEOpjCYW2c8VkmOvckIJZfkr7rBW6dObmfgJ/L1GpSYW5Hpo9lDz4D1+n0Ckz8rU7FwHDQiG/w==}
brace-expansion@2.0.2:
resolution: {integrity: sha512-Jt0vHyM+jmUBqojB7E1NIYadt0vI0Qxjxd2TErW94wDz+E2LAm5vKMXXwg6ZZBTHPuUlDgQHKXvjGBdfcF1ZDQ==}
brace-expansion@2.0.3:
resolution: {integrity: sha512-MCV/fYJEbqx68aE58kv2cA/kiky1G8vux3OR6/jbS+jIMe/6fJWa0DTzJU7dqijOWYwHi1t29FlfYI9uytqlpA==}
braces@3.0.3:
resolution: {integrity: sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==}
@@ -3410,8 +3408,8 @@ packages:
graphlib@2.1.8:
resolution: {integrity: sha512-jcLLfkpoVGmH7/InMC/1hIvOPSUh38oJtGhvrOFGzioE1DZ+0YW16RgmOJhHiuWTvGiJQ9Z1Ik43JvkRPRvE+A==}
gray-matter@https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e317c87fe031e9368ffabde9c9149ce3ec:
resolution: {tarball: https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e317c87fe031e9368ffabde9c9149ce3ec}
gray-matter@https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e:
resolution: {tarball: https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e}
version: 4.0.3
engines: {node: '>=6.0'}
@@ -4533,8 +4531,8 @@ packages:
path-parse@1.0.7:
resolution: {integrity: sha512-LDJzPVEEEPR+y48z93A0Ed0yXb8pAByGWo/k5YYdYgpY2/2EsOsksJrq7lOHxryrVOn1ejG6oAp8ahvOIQD8sw==}
path-to-regexp@0.1.12:
resolution: {integrity: sha512-RA1GjUVMnvYFxuqovrEqZoxxW5NUZqbwKtYz/Tt7nXerk0LbLblQmrsgdeOxV5SFHf0UDggjS/bSeOZwt1pmEQ==}
path-to-regexp@0.1.13:
resolution: {integrity: sha512-A/AGNMFN3c8bOlvV9RreMdrv7jsmF9XIfDeCd87+I8RNg6s78BhJxMu69NEMHBSJFxKidViTEdruRwEk/WIKqA==}
path-to-regexp@1.9.0:
resolution: {integrity: sha512-xIp7/apCFJuUHdDLWe8O1HIkb0kQrOMb/0u6FXQjemHn/ii5LrIzU6bdECnsiTF/GjZkMEKg1xdiZwNqDYlZ6g==}
@@ -4555,12 +4553,12 @@ packages:
picocolors@1.1.1:
resolution: {integrity: sha512-xceH2snhtb5M9liqDsmEw56le376mTZkEX/jEb/RxNFyegNul7eNslCXP9FDj/Lcu0X8KEyMceP2ntpaHrDEVA==}
picomatch@2.3.1:
resolution: {integrity: sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA==}
picomatch@2.3.2:
resolution: {integrity: sha512-V7+vQEJ06Z+c5tSye8S+nHUfI51xoXIXjHQ99cQtKUkQqqO1kO/KCJUfZXuB47h/YBlDhah2H3hdUGXn8ie0oA==}
engines: {node: '>=8.6'}
picomatch@4.0.3:
resolution: {integrity: sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q==}
picomatch@4.0.4:
resolution: {integrity: sha512-QP88BAKvMam/3NxH6vj2o21R6MjxZUAd6nlwAS/pnGvN9IVLocLHxGYIzFhg6fUQ+5th6P4dv4eW9jX3DSIj7A==}
engines: {node: '>=12'}
pirates@4.0.7:
@@ -7024,6 +7022,8 @@ snapshots:
'@babel/runtime@7.28.6': {}
'@babel/runtime@7.29.2': {}
'@babel/template@7.28.6':
dependencies:
'@babel/code-frame': 7.29.0
@@ -8162,7 +8162,7 @@ snapshots:
fs-extra: 11.3.3
github-slugger: 1.5.0
globby: 11.1.0
gray-matter: https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e317c87fe031e9368ffabde9c9149ce3ec
gray-matter: https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e
jiti: 1.21.7
js-yaml: 4.1.1
lodash: 4.17.23
@@ -8473,7 +8473,7 @@ snapshots:
detect-libc: 2.1.2
is-glob: 4.0.3
node-addon-api: 7.1.1
picomatch: 4.0.3
picomatch: 4.0.4
optionalDependencies:
'@parcel/watcher-android-arm64': 2.5.6
'@parcel/watcher-darwin-arm64': 2.5.6
@@ -8645,7 +8645,7 @@ snapshots:
'@slorber/react-helmet-async@1.3.0(react-dom@19.2.4(react@19.2.4))(react@19.2.4)':
dependencies:
'@babel/runtime': 7.28.6
'@babel/runtime': 7.29.2
invariant: 2.2.4
prop-types: 15.8.1
react: 19.2.4
@@ -9244,7 +9244,7 @@ snapshots:
anymatch@3.1.3:
dependencies:
normalize-path: 3.0.0
picomatch: 2.3.1
picomatch: 2.3.2
arg@5.0.2: {}
@@ -9378,12 +9378,12 @@ snapshots:
widest-line: 4.0.1
wrap-ansi: 8.1.0
brace-expansion@1.1.12:
brace-expansion@1.1.13:
dependencies:
balanced-match: 1.0.2
concat-map: 0.0.1
brace-expansion@2.0.2:
brace-expansion@2.0.3:
dependencies:
balanced-match: 1.0.2
@@ -10436,7 +10436,7 @@ snapshots:
methods: 1.1.2
on-finished: 2.4.1
parseurl: 1.3.3
path-to-regexp: 0.1.12
path-to-regexp: 0.1.13
proxy-addr: 2.0.7
qs: 6.14.2
range-parser: 1.2.1
@@ -10485,9 +10485,9 @@ snapshots:
dependencies:
websocket-driver: 0.7.4
fdir@6.5.0(picomatch@4.0.3):
fdir@6.5.0(picomatch@4.0.4):
optionalDependencies:
picomatch: 4.0.3
picomatch: 4.0.4
feed@4.2.2:
dependencies:
@@ -10658,7 +10658,7 @@ snapshots:
dependencies:
lodash: 4.17.23
gray-matter@https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e317c87fe031e9368ffabde9c9149ce3ec:
gray-matter@https://codeload.github.com/jonschlinkert/gray-matter/tar.gz/234163e:
dependencies:
js-yaml: 4.1.1
kind-of: 6.0.3
@@ -11080,7 +11080,7 @@ snapshots:
chalk: 4.1.2
ci-info: 3.9.0
graceful-fs: 4.2.11
picomatch: 2.3.1
picomatch: 2.3.2
jest-worker@27.5.1:
dependencies:
@@ -11780,7 +11780,7 @@ snapshots:
micromatch@4.0.8:
dependencies:
braces: 3.0.3
picomatch: 2.3.1
picomatch: 2.3.2
mime-db@1.33.0: {}
@@ -11824,11 +11824,11 @@ snapshots:
minimatch@3.1.5:
dependencies:
brace-expansion: 1.1.12
brace-expansion: 1.1.13
minimatch@5.1.8:
dependencies:
brace-expansion: 2.0.2
brace-expansion: 2.0.3
minimist@1.2.8: {}
@@ -12127,7 +12127,7 @@ snapshots:
path-parse@1.0.7: {}
path-to-regexp@0.1.12: {}
path-to-regexp@0.1.13: {}
path-to-regexp@1.9.0:
dependencies:
@@ -12146,9 +12146,9 @@ snapshots:
picocolors@1.1.1: {}
picomatch@2.3.1: {}
picomatch@2.3.2: {}
picomatch@4.0.3: {}
picomatch@4.0.4: {}
pirates@4.0.7: {}
@@ -12852,7 +12852,7 @@ snapshots:
readdirp@3.6.0:
dependencies:
picomatch: 2.3.1
picomatch: 2.3.2
readdirp@4.1.2: {}
@@ -13510,8 +13510,8 @@ snapshots:
tinyglobby@0.2.15:
dependencies:
fdir: 6.5.0(picomatch@4.0.3)
picomatch: 4.0.3
fdir: 6.5.0(picomatch@4.0.4)
picomatch: 4.0.4
tinypool@1.1.1: {}

338
docsv2/custom-ca-setup.md Normal file

@@ -0,0 +1,338 @@
# Custom CA Certificate Setup
Use a private Certificate Authority (CA) with Reflector self-hosted deployments. This covers two scenarios:
1. **Custom local domain** — Serve Reflector over HTTPS on an internal domain (e.g., `reflector.local`) using certs signed by your own CA
2. **Backend CA trust** — Let Reflector's backend services (server, workers, GPU) make HTTPS calls to GPU, LLM, or other internal services behind your private CA
Both can be used independently or together.
## Quick Start
### Generate test certificates
```bash
./scripts/generate-certs.sh reflector.local
```
This creates `certs/` with:
- `ca.key` + `ca.crt` — Root CA (10-year validity)
- `server-key.pem` + `server.pem` — Server certificate (1-year, SAN: domain + localhost + 127.0.0.1)
### Deploy with custom CA + domain
```bash
# Add domain to /etc/hosts on the server (use 127.0.0.1 for local, or server LAN IP for network access)
echo "127.0.0.1 reflector.local" | sudo tee -a /etc/hosts
# Run setup — pass the certs directory
./scripts/setup-selfhosted.sh --gpu --caddy --domain reflector.local --custom-ca certs/
# Trust the CA on your machine (see "Trust the CA" section below)
```
### Deploy with CA trust only (GPU/LLM behind private CA)
```bash
# Only need the CA cert file — no Caddy TLS certs needed
./scripts/setup-selfhosted.sh --hosted --custom-ca /path/to/corporate-ca.crt
```
## How `--custom-ca` Works
The flag accepts a **directory** or a **single file**:
### Directory mode
```bash
--custom-ca certs/
```
Looks for these files by convention:
- `ca.crt` (required) — CA certificate to trust
- `server.pem` + `server-key.pem` (optional) — TLS certificate/key for Caddy
If `server.pem` + `server-key.pem` are found AND `--domain` is provided:
- Caddy serves HTTPS using those certs
- Backend containers trust the CA for outbound calls
If only `ca.crt` is found:
- Backend containers trust the CA for outbound calls
- Caddy is unaffected (uses Let's Encrypt, self-signed, or no Caddy)
### Single file mode
```bash
--custom-ca /path/to/corporate-ca.crt
```
Only injects CA trust into backend containers. No Caddy TLS changes.
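The two modes can be summarized as a small resolution routine. The following is a minimal Python sketch, not the setup script's actual code; the function name `resolve_custom_ca` and the returned dictionary keys are hypothetical, and only the file names come from the conventions described above.

```python
from pathlib import Path


def resolve_custom_ca(path: str) -> dict:
    """Classify a --custom-ca argument (hypothetical helper, for illustration only)."""
    p = Path(path)
    if p.is_file():
        # Single file mode: CA trust only, no Caddy TLS changes.
        return {"ca": p, "tls_cert": None, "tls_key": None}
    if p.is_dir():
        ca = p / "ca.crt"
        if not ca.is_file():
            raise FileNotFoundError(f"{ca} is required in directory mode")
        cert, key = p / "server.pem", p / "server-key.pem"
        # The TLS pair is optional and only used when both files are present.
        if cert.is_file() and key.is_file():
            return {"ca": ca, "tls_cert": cert, "tls_key": key}
        return {"ca": ca, "tls_cert": None, "tls_key": None}
    raise FileNotFoundError(f"{p} is neither a file nor a directory")
```

In this sketch, Caddy would only be reconfigured when `tls_cert` and `tls_key` are set and `--domain` is also provided.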
## Scenarios
### Scenario 1: Custom local domain
Your Reflector instance runs on an internal network. You want `https://reflector.local` with proper TLS (no browser warnings).
```bash
# 1. Generate certs
./scripts/generate-certs.sh reflector.local
# 2. Add to /etc/hosts on the server
echo "127.0.0.1 reflector.local" | sudo tee -a /etc/hosts
# 3. Deploy
./scripts/setup-selfhosted.sh --gpu --garage --caddy --domain reflector.local --custom-ca certs/
# 4. Trust the CA on your machine (see "Trust the CA" section below)
```
If other machines on the network need to access it, add the server's LAN IP to `/etc/hosts` on those machines instead:
```bash
echo "192.168.1.100 reflector.local" | sudo tee -a /etc/hosts
```
And include that IP as an extra SAN when generating certs:
```bash
./scripts/generate-certs.sh reflector.local "IP:192.168.1.100"
```
### Scenario 2: GPU/LLM behind corporate CA
Your GPU or LLM server (e.g., `https://gpu.internal.corp`) uses certificates signed by your corporate CA. Reflector's backend needs to trust that CA for outbound HTTPS calls.
```bash
# Get the CA certificate from your IT team (PEM format)
# Then deploy — Caddy can still use Let's Encrypt or self-signed
./scripts/setup-selfhosted.sh --hosted --garage --caddy --custom-ca /path/to/corporate-ca.crt
```
This works because:
- **TLS cert/key** = "this is my identity" — for Caddy to serve HTTPS to browsers
- **CA cert** = "I trust this authority" — for backend containers to verify outbound connections
Your Reflector frontend can use Let's Encrypt (public domain) or self-signed certs, while the backend trusts a completely different CA for GPU/LLM calls.
### Scenario 3: Both combined (same CA)
Custom domain + GPU/LLM all behind the same CA:
```bash
./scripts/generate-certs.sh reflector.local "DNS:gpu.local"
./scripts/setup-selfhosted.sh --gpu --garage --caddy --domain reflector.local --custom-ca certs/
```
### Scenario 4: Multiple CAs (local domain + remote GPU on different CA)
Your Reflector uses one CA for `reflector.local`, but the GPU host uses a different CA:
```bash
# Your local domain setup
./scripts/generate-certs.sh reflector.local
# Deploy with your CA + trust the GPU host's CA too
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
--domain reflector.local \
--custom-ca certs/ \
--extra-ca /path/to/gpu-machine-ca.crt
```
`--extra-ca` appends additional CA certs to the trust bundle. Backend containers trust ALL CAs — your local domain AND the GPU host's certs both work.
You can repeat `--extra-ca` for multiple remote services:
```bash
--extra-ca /path/to/gpu-ca.crt --extra-ca /path/to/llm-ca.crt
```
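Conceptually, appending CAs to a trust bundle is plain PEM concatenation. A minimal Python sketch of that idea follows; `build_trust_bundle` is a hypothetical helper, not part of the setup script, and the PEM marker check is only a basic sanity test.

```python
from pathlib import Path


def build_trust_bundle(ca_files: list[str]) -> str:
    """Concatenate PEM CA files into one bundle, one trailing newline per file."""
    parts = []
    for name in ca_files:
        pem = Path(name).read_text().strip()
        if "-----BEGIN CERTIFICATE-----" not in pem:
            raise ValueError(f"{name} does not look like a PEM certificate")
        # Normalize the ending so adjacent certificates never share a line.
        parts.append(pem + "\n")
    return "".join(parts)
```

The order of the files does not matter for trust purposes; every certificate in the bundle is treated as a trusted root.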
For setting up a dedicated GPU host, see [Standalone GPU Host Setup](gpu-host-setup.md).
## Trust the CA on Client Machines
After deploying, clients need to trust the CA to avoid browser warnings.
### macOS
```bash
sudo security add-trusted-cert -d -r trustRoot \
-k /Library/Keychains/System.keychain certs/ca.crt
```
### Linux (Ubuntu/Debian)
```bash
sudo cp certs/ca.crt /usr/local/share/ca-certificates/reflector-ca.crt
sudo update-ca-certificates
```
### Linux (RHEL/Fedora)
```bash
sudo cp certs/ca.crt /etc/pki/ca-trust/source/anchors/reflector-ca.crt
sudo update-ca-trust
```
### Windows (PowerShell as admin)
```powershell
Import-Certificate -FilePath .\certs\ca.crt -CertStoreLocation Cert:\LocalMachine\Root
```
### Firefox (all platforms)
Firefox uses its own certificate store:
1. Settings > Privacy & Security > View Certificates
2. Authorities tab > Import
3. Select `ca.crt` and check "Trust this CA to identify websites"
## How It Works Internally
### Docker entrypoint CA injection
Each backend container (server, worker, beat, hatchet workers, GPU) has an entrypoint script (`docker-entrypoint.sh`) that:
1. Checks if a CA cert is mounted at `/usr/local/share/ca-certificates/custom-ca.crt`
2. If present, runs `update-ca-certificates` to create a **combined bundle** (system CAs + custom CA)
3. Sets environment variables so Python HTTP libraries and curl use the combined bundle:
| Env var | Covers |
|---------|--------|
| `SSL_CERT_FILE` | httpx, OpenAI SDK, llama-index, Python ssl module |
| `REQUESTS_CA_BUNDLE` | requests library (transitive dependencies) |
| `CURL_CA_BUNDLE` | curl CLI (container healthchecks) |
Note: `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` is intentionally NOT set. Setting it causes grpcio to attempt TLS on internal Hatchet gRPC connections that run without TLS, resulting in handshake failures. The internal Hatchet connection uses `HATCHET_CLIENT_TLS_STRATEGY=none` (plaintext).
When no CA cert is mounted, the entrypoint is a no-op — containers behave exactly as before.
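The decision the entrypoint makes can be sketched in Python as follows. This is an illustration only, not the actual `docker-entrypoint.sh`; the function name is hypothetical, and the combined bundle path is an assumption based on the Debian/Ubuntu default output of `update-ca-certificates`.

```python
import os

CUSTOM_CA = "/usr/local/share/ca-certificates/custom-ca.crt"
# Assumption: Debian/Ubuntu default output path of update-ca-certificates.
COMBINED_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def ca_environment(ca_mounted: bool, bundle: str = COMBINED_BUNDLE) -> dict:
    """Return the extra env vars the entrypoint would export (illustrative only)."""
    if not ca_mounted:
        # No CA mounted: no-op, containers behave as before.
        return {}
    # GRPC_DEFAULT_SSL_ROOTS_FILE_PATH is deliberately left out (see the note above).
    return {
        "SSL_CERT_FILE": bundle,
        "REQUESTS_CA_BUNDLE": bundle,
        "CURL_CA_BUNDLE": bundle,
    }


if __name__ == "__main__":
    print(ca_environment(os.path.isfile(CUSTOM_CA)))
```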
### Why this replaces manual certifi patching
Previously, the workaround for trusting a private CA in Python was to patch certifi's bundle directly:
```bash
# OLD approach — fragile, do NOT use
cat custom-ca.crt >> $(python -c "import certifi; print(certifi.where())")
```
This breaks whenever certifi is updated (any `pip install`/`uv sync` overwrites the bundle and the CA is lost).
Our entrypoint approach is permanent because:
1. `SSL_CERT_FILE` is checked by Python's `ssl.create_default_context()` **before** falling back to `certifi.where()`. When set, certifi's bundle is never read.
2. `REQUESTS_CA_BUNDLE` similarly overrides certifi for the `requests` library.
3. The CA is injected at container startup (runtime), not baked into the Python environment. It survives image rebuilds, dependency updates, and `uv sync`.
```
Python SSL lookup chain:
ssl.create_default_context()
→ SSL_CERT_FILE env var? → YES → use combined bundle (system + custom CA) ✓
→ (certifi.where() is never reached)
```
This covers all outbound HTTPS calls: httpx (transcription, diarization, translation, webhooks), OpenAI SDK (transcription), llama-index (LLM/summarization), and requests (transitive dependencies).
### Compose override
The setup script generates `docker-compose.ca.yml` which mounts the CA cert into every backend container as a read-only bind mount. This file is:
- Only generated when `--custom-ca` is passed
- Deleted on re-runs without `--custom-ca` (prevents stale overrides)
- Added to `.gitignore`
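To make the shape of such an override concrete, here is a minimal Python sketch that renders one. It is not the setup script's generator: the function name, the default source path `./certs/ca.crt`, and the service names passed in are assumptions; only the mount target and the read-only flag come from the description above.

```python
CA_TARGET = "/usr/local/share/ca-certificates/custom-ca.crt"


def ca_override_yaml(services: list[str], ca_source: str = "./certs/ca.crt") -> str:
    """Render a Compose override that bind-mounts the CA read-only into each service."""
    lines = ["services:"]
    for name in services:
        lines += [
            f"  {name}:",
            "    volumes:",
            f"      - {ca_source}:{CA_TARGET}:ro",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical service names, for illustration.
    print(ca_override_yaml(["server", "worker"]))
```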
### Node.js (frontend)
The web container uses `NODE_EXTRA_CA_CERTS` which **adds** to Node's trust store (unlike Python's `SSL_CERT_FILE` which replaces it). This is set via the compose override.
## Generate Your Own CA (Manual)
If you prefer not to use `generate-certs.sh`:
```bash
# 1. Create CA
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 \
-out ca.crt -subj "/CN=My CA/O=My Organization"
# 2. Create server key
openssl genrsa -out server-key.pem 2048
# 3. Create CSR with SANs
openssl req -new -key server-key.pem -out server.csr \
-subj "/CN=reflector.local" \
-addext "subjectAltName=DNS:reflector.local,DNS:localhost,IP:127.0.0.1"
# 4. Sign with CA
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
-CAcreateserial -out server.pem -days 365 -sha256 \
-copy_extensions copyall
# 5. Clean up
rm server.csr ca.srl
```
## Using Existing Corporate Certificates
If your organization already has a CA:
1. Get the CA certificate in PEM format from your IT team
2. If you have a PKCS#12 (.p12/.pfx) bundle, extract the CA cert:
```bash
openssl pkcs12 -in bundle.p12 -cacerts -nokeys -out ca.crt
```
3. If you have multiple intermediate CAs, concatenate them into one PEM file:
```bash
cat intermediate-ca.crt root-ca.crt > ca.crt
```
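After concatenating, it can help to confirm that the resulting file contains the number of certificates you expect. A minimal Python sketch, assuming standard PEM markers (the helper name `split_pem_bundle` is hypothetical):

```python
import re

PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_bundle(text: str) -> list[str]:
    """Return each certificate block found in a concatenated PEM string."""
    return PEM_CERT.findall(text)
```

For example, `len(split_pem_bundle(open("ca.crt").read()))` should equal the number of CA files you concatenated.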
## Troubleshooting
### Browser: "Your connection is not private"
The CA is not trusted on the client machine. See "Trust the CA on Client Machines" above.
Check certificate expiry:
```bash
openssl x509 -noout -dates -in certs/server.pem
```
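The output of that command can also be turned into a "days remaining" number, which is handy for a monitoring check. The sketch below assumes the usual `notBefore=` / `notAfter=` output format and uses the standard library's `ssl.cert_time_to_seconds`; the function name `days_until_expiry` is hypothetical.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(dates_output: str, now: Optional[float] = None) -> int:
    """Parse `openssl x509 -noout -dates` output; return whole days until notAfter."""
    for line in dates_output.splitlines():
        if line.startswith("notAfter="):
            expires = ssl.cert_time_to_seconds(line.split("=", 1)[1].strip())
            current = time.time() if now is None else now
            # Floor division: an already expired certificate yields a negative value.
            return int((expires - current) // 86400)
    raise ValueError("no notAfter line found")
```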
### Backend: `SSL: CERTIFICATE_VERIFY_FAILED`
CA cert not mounted or not loaded. Check inside the container:
```bash
docker compose exec server env | grep SSL_CERT_FILE
docker compose exec server python -c "
import ssl, os
print('SSL_CERT_FILE:', os.environ.get('SSL_CERT_FILE', 'not set'))
ctx = ssl.create_default_context()
print('CA certs loaded:', ctx.cert_store_stats())
"
```
### Caddy: "certificate is not valid for any names"
Domain in Caddyfile doesn't match the certificate's SAN/CN. Check:
```bash
openssl x509 -noout -text -in certs/server.pem | grep -A1 "Subject Alternative Name"
```
### Certificate chain issues
If you have intermediate CAs, append them to the server certificate in `server.pem` (server certificate first):
```bash
cat server-cert.pem intermediate-ca.pem > certs/server.pem
```
Verify the chain:
```bash
openssl verify -CAfile certs/ca.crt certs/server.pem
```
### Certificate renewal
Custom CA certs are NOT auto-renewed (unlike Let's Encrypt). Replace cert files and restart:
```bash
# Replace certs
cp new-server.pem certs/server.pem
cp new-server-key.pem certs/server-key.pem
# Restart Caddy to pick up new certs
docker compose restart caddy
```

97
docsv2/firewall-ports.md Normal file

@@ -0,0 +1,97 @@
# Firewall & Port Requirements
Ports that need to be open on your server firewall, organized by deployment mode.
## With Caddy (--caddy or --ip or --domain)
Caddy acts as the reverse proxy. Most services are only accessible through Caddy on port 443.
| Port | Protocol | Direction | Service | Required? |
|------|----------|-----------|---------|-----------|
| 443 | TCP | Inbound | Caddy HTTPS — web app, API, LiveKit signaling (`/lk-ws`) | Yes |
| 80 | TCP | Inbound | Caddy HTTP — redirects to HTTPS | Yes |
| 44200-44300 | UDP | Inbound | LiveKit WebRTC media (audio/video) | Yes (if LiveKit enabled) |
| 7881 | TCP | Inbound | LiveKit TCP media fallback (when UDP is blocked by client network) | Recommended |
| 8888 | TCP | Inbound | Hatchet dashboard (plain HTTP, no TLS) | Optional (admin only) |
Ports that do NOT need to be open (proxied through Caddy):
- 1250 (backend API)
- 3000 (frontend)
- 7880 (LiveKit signaling — proxied via `/lk-ws`)
- 3900 (Garage S3)
## Without Caddy (direct access)
All services need direct port access. Use this only for local development or trusted networks.
| Port | Protocol | Direction | Service | Required? |
|------|----------|-----------|---------|-----------|
| 3000 | TCP | Inbound | Frontend (Next.js) | Yes |
| 1250 | TCP | Inbound | Backend API (FastAPI) | Yes |
| 7880 | TCP | Inbound | LiveKit signaling (WebSocket) | Yes (if LiveKit enabled) |
| 7881 | TCP | Inbound | LiveKit TCP media fallback | Recommended |
| 44200-44300 | UDP | Inbound | LiveKit WebRTC media | Yes (if LiveKit enabled) |
| 40000-40100 | UDP | Inbound | Reflector WebRTC (browser recording) | Yes (if using browser WebRTC) |
| 3900 | TCP | Inbound | Garage S3 (for presigned URLs in browser) | Yes (if using Garage) |
| 8888 | TCP | Inbound | Hatchet dashboard | Optional |
> **Important:** Without Caddy, all traffic is plain HTTP. Browsers block microphone/camera access on non-HTTPS pages (except `localhost`). Use `--ip` (which implies Caddy) for any non-localhost deployment.
## Internal-Only Ports (never expose)
These ports are used between Docker containers and should NOT be open on the firewall:
| Port | Service | Purpose |
|------|---------|---------|
| 5432 | PostgreSQL | Database |
| 6379 | Redis | Cache + message broker |
| 7077 | Hatchet gRPC | Worker communication |
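One way to confirm the firewall matches these tables is to probe TCP ports from a machine outside the server's network. The following is a minimal Python sketch (the helper name `tcp_port_open` is hypothetical); it only covers TCP, so the UDP media ranges cannot be verified this way.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from outside, 5432, 6379, and 7077 should all report `False`, while 443 should report `True` on a Caddy deployment.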
## Cloud Provider Firewall Examples
### DigitalOcean (with Caddy + LiveKit)
```bash
# Create firewall
doctl compute firewall create \
--name reflector \
--inbound-rules "protocol:tcp,ports:443,address:0.0.0.0/0 protocol:tcp,ports:80,address:0.0.0.0/0 protocol:udp,ports:44200-44300,address:0.0.0.0/0 protocol:tcp,ports:7881,address:0.0.0.0/0 protocol:tcp,ports:22,address:0.0.0.0/0" \
--outbound-rules "protocol:tcp,ports:all,address:0.0.0.0/0 protocol:udp,ports:all,address:0.0.0.0/0" \
--droplet-ids <DROPLET_ID>
```
### AWS Security Group (with Caddy + LiveKit)
| Type | Port Range | Source | Description |
|------|-----------|--------|-------------|
| HTTPS | 443 | 0.0.0.0/0 | Web app + API + LiveKit signaling |
| HTTP | 80 | 0.0.0.0/0 | Redirect to HTTPS |
| Custom UDP | 44200-44300 | 0.0.0.0/0 | LiveKit WebRTC media |
| Custom TCP | 7881 | 0.0.0.0/0 | LiveKit TCP fallback |
| SSH | 22 | Your IP | Admin access |
### Ubuntu UFW (with Caddy + LiveKit)
```bash
sudo ufw allow 443/tcp # Caddy HTTPS
sudo ufw allow 80/tcp # HTTP redirect
sudo ufw allow 7881/tcp # LiveKit TCP fallback
sudo ufw allow 44200:44300/udp # LiveKit WebRTC media
sudo ufw allow 22/tcp # SSH
sudo ufw enable
```
## Port Ranges Explained
### Why 44200-44300 for LiveKit?
LiveKit's WebRTC ICE candidates use UDP. The port range was chosen to avoid collisions:
- **40000-40100** — Reflector's own WebRTC (browser recording)
- **44200-44300** — LiveKit WebRTC
- **49152-65535** — macOS ephemeral ports (reserved by OS)
The range is configurable in `livekit.yaml` under `rtc.port_range_start` / `rtc.port_range_end`. If changed, update `docker-compose.selfhosted.yml` port mapping to match.
### Why 101 ports?
The range 44200-44300 is inclusive, so it covers 101 UDP ports, enough for ~100 concurrent WebRTC connections (roughly 50 participants with audio + video). For larger deployments, increase the range in both `livekit.yaml` and `docker-compose.selfhosted.yml`.
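Both ends of a port range are inclusive, which is where the count of 101 comes from. The arithmetic, plus a collision check against the other ranges listed above, can be sketched in Python (helper names are hypothetical):

```python
def port_count(start: int, end: int) -> int:
    """Number of ports in an inclusive range."""
    return end - start + 1


def ranges_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two inclusive port ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


LIVEKIT = (44200, 44300)
REFLECTOR_WEBRTC = (40000, 40100)
EPHEMERAL = (49152, 65535)

if __name__ == "__main__":
    print(port_count(*LIVEKIT))
    print(ranges_overlap(LIVEKIT, REFLECTOR_WEBRTC), ranges_overlap(LIVEKIT, EPHEMERAL))
```

If you widen the LiveKit range, re-run the overlap check before updating the port mapping.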

294
docsv2/gpu-host-setup.md Normal file

@@ -0,0 +1,294 @@
# Standalone GPU Host Setup
Deploy Reflector's GPU transcription/diarization/translation service on a dedicated machine, separate from the main Reflector instance. Useful when:
- Your GPU machine is on a different network than the Reflector server
- You want to share one GPU service across multiple Reflector instances
- The GPU machine has special hardware/drivers that can't run the full stack
- You need to scale GPU processing independently
## Architecture
```
┌─────────────────────┐ HTTPS ┌────────────────────┐
│ Reflector Server │ ────────────────────── │ GPU Host │
│ (server, worker, │ TRANSCRIPT_URL │ (transcription, │
│ web, postgres, │ DIARIZATION_URL │ diarization, │
│ redis, hatchet) │ TRANSLATE_URL │ translation) │
│ │ │ │
│ setup-selfhosted.sh │ │ setup-gpu-host.sh │
│ --hosted │ │ │
└─────────────────────┘ └────────────────────┘
```
The GPU service is a standalone FastAPI app that exposes transcription, diarization, translation, and audio padding endpoints. It has **no dependencies** on PostgreSQL, Redis, Hatchet, or any other Reflector service.
## Quick Start
### On the GPU machine
```bash
git clone <reflector-repo>
cd reflector
# Set HuggingFace token (required for diarization models)
export HF_TOKEN=your-huggingface-token
# Deploy with HTTPS (Let's Encrypt)
./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key my-secret-key
# Or deploy with custom CA
./scripts/generate-certs.sh gpu.local
./scripts/setup-gpu-host.sh --domain gpu.local --custom-ca certs/ --api-key my-secret-key
```
### On the Reflector machine
```bash
# If the GPU host uses a custom CA, trust it
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
--extra-ca /path/to/gpu-machine-ca.crt
# Or if you already have --custom-ca for your local domain
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
--domain reflector.local --custom-ca certs/ \
--extra-ca /path/to/gpu-machine-ca.crt
```
Then configure `server/.env` to point to the GPU host:
```bash
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://gpu.example.com
TRANSCRIPT_MODAL_API_KEY=my-secret-key
DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://gpu.example.com
DIARIZATION_MODAL_API_KEY=my-secret-key
TRANSLATION_BACKEND=modal
TRANSLATE_URL=https://gpu.example.com
TRANSLATION_MODAL_API_KEY=my-secret-key
```
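Since all three services point at the same host with the same key, the block can be rendered from two values to avoid copy-paste mistakes. A minimal Python sketch (the helper `gpu_env_lines` is hypothetical; the variable names are the ones shown above):

```python
def gpu_env_lines(url: str, api_key: str) -> list[str]:
    """Render the server/.env lines that point Reflector at one GPU host."""
    # (variable prefix, name of the URL variable); note TRANSLATE_URL differs.
    services = [
        ("TRANSCRIPT", "TRANSCRIPT_URL"),
        ("DIARIZATION", "DIARIZATION_URL"),
        ("TRANSLATION", "TRANSLATE_URL"),
    ]
    lines = []
    for prefix, url_var in services:
        lines += [
            f"{prefix}_BACKEND=modal",
            f"{url_var}={url}",
            f"{prefix}_MODAL_API_KEY={api_key}",
        ]
    return lines


if __name__ == "__main__":
    print("\n".join(gpu_env_lines("https://gpu.example.com", "my-secret-key")))
```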
## Script Options
```
./scripts/setup-gpu-host.sh [OPTIONS]
Options:
--domain DOMAIN Domain name for HTTPS (Let's Encrypt or custom cert)
--custom-ca PATH Custom CA (directory or single PEM file)
--extra-ca FILE Additional CA cert to trust (repeatable)
--api-key KEY API key to protect the service (strongly recommended)
--cpu CPU-only mode (no NVIDIA GPU required)
--port PORT Host port (default: 443 with Caddy, 8000 without)
```
## Deployment Scenarios
### Public internet with Let's Encrypt
GPU machine has a public IP and domain:
```bash
./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key my-secret-key
```
Requirements:
- DNS A record: `gpu.example.com` → GPU machine's public IP
- Ports 80 and 443 open
- Caddy auto-provisions Let's Encrypt certificate
### Internal network with custom CA
GPU machine on a private network:
```bash
# Generate certs on the GPU machine
./scripts/generate-certs.sh gpu.internal "IP:192.168.1.200"
# Deploy
./scripts/setup-gpu-host.sh --domain gpu.internal --custom-ca certs/ --api-key my-secret-key
```
On each machine that connects (including the Reflector server), add a hosts entry:
```bash
echo "192.168.1.200 gpu.internal" | sudo tee -a /etc/hosts
```
### IP-only (no domain)
No domain needed — just use the machine's IP:
```bash
./scripts/setup-gpu-host.sh --api-key my-secret-key
```
Caddy is not used; the GPU service runs directly on port 8000 over plain HTTP. In this mode there is no HTTPS, so the Reflector machine connects via `http://<GPU_IP>:8000`.
### CPU-only (no NVIDIA GPU)
Works on any machine — transcription will be slower:
```bash
./scripts/setup-gpu-host.sh --cpu --domain gpu.example.com --api-key my-secret-key
```
## DNS Resolution
The Reflector server must be able to reach the GPU host by name or IP.
| Setup | DNS Method | TRANSCRIPT_URL example |
|-------|------------|----------------------|
| Public domain | DNS A record | `https://gpu.example.com` |
| Internal domain | `/etc/hosts` on both machines | `https://gpu.internal` |
| IP only | No DNS needed | `http://192.168.1.200:8000` |
For internal domains, add the GPU machine's IP to `/etc/hosts` on the Reflector machine:
```bash
echo "192.168.1.200 gpu.internal" | sudo tee -a /etc/hosts
```
If the Reflector server runs in Docker, the containers resolve DNS from the host (Docker's default DNS behavior). So adding to the host's `/etc/hosts` is sufficient.
## Multi-CA Setup
When your Reflector instance has its own CA (for `reflector.local`) and the GPU host has a different CA:
**On the GPU machine:**
```bash
./scripts/generate-certs.sh gpu.local
./scripts/setup-gpu-host.sh --domain gpu.local --custom-ca certs/ --api-key my-key
```
**On the Reflector machine:**
```bash
# Your local CA for reflector.local + the GPU host's CA
./scripts/setup-selfhosted.sh --hosted --garage --caddy \
--domain reflector.local \
--custom-ca certs/ \
--extra-ca /path/to/gpu-machine-ca.crt
```
The `--extra-ca` flag appends the GPU host's CA to the trust bundle. Backend containers trust both CAs — your local domain works AND outbound calls to the GPU host succeed.
You can repeat `--extra-ca` for multiple remote services:
```bash
--extra-ca /path/to/gpu-ca.crt --extra-ca /path/to/llm-ca.crt
```
## API Key Authentication
The GPU service uses Bearer token authentication via `REFLECTOR_GPU_APIKEY`:
```bash
# Test from the Reflector machine
curl -s https://gpu.example.com/docs # No auth needed for docs
curl -s -X POST https://gpu.example.com/v1/audio/transcriptions \
-F "file=@audio.wav" \
-H "Authorization: Bearer <my-secret-key>" # gitleaks:allow
```
If `REFLECTOR_GPU_APIKEY` is not set, the service accepts all requests (open access). Always use `--api-key` for internet-facing deployments.
The same key goes in Reflector's `server/.env` as `TRANSCRIPT_MODAL_API_KEY`, `DIARIZATION_MODAL_API_KEY`, and `TRANSLATION_MODAL_API_KEY`.
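The described behavior (Bearer token required when a key is configured, open access otherwise) can be sketched as a small check. This is an illustration under those assumptions, not the GPU service's actual code; the function name is hypothetical and the constant-time comparison is a choice made here.

```python
import hmac
from typing import Optional


def is_authorized(auth_header: Optional[str], api_key: Optional[str]) -> bool:
    """Check an Authorization header against the configured API key."""
    if not api_key:
        # No key configured: open access, as described above.
        return True
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header[len("Bearer "):]
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.encode(), api_key.encode())
```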
## Files
| File | Checked in? | Purpose |
|------|-------------|---------|
| `docker-compose.gpu-host.yml` | Yes | Static compose file with profiles (`gpu`, `cpu`, `caddy`) |
| `.env.gpu-host` | No (generated) | Environment variables (HF_TOKEN, API key, ports) |
| `Caddyfile.gpu-host` | No (generated) | Caddy config (only when using HTTPS) |
| `docker-compose.gpu-ca.yml` | No (generated) | CA cert mounts override (only with --custom-ca) |
| `certs/` | No (generated) | Staged certificates (when using --custom-ca) |
The compose file is checked into the repo — you can read it to understand exactly what runs. The script only generates env vars, Caddyfile, and CA overrides. Profiles control which service starts:
```bash
# What the script does under the hood:
docker compose -f docker-compose.gpu-host.yml --profile gpu --profile caddy \
--env-file .env.gpu-host up -d
# CPU mode:
docker compose -f docker-compose.gpu-host.yml --profile cpu --profile caddy \
--env-file .env.gpu-host up -d
```
Both `gpu` and `cpu` services get the network alias `transcription`, so Caddy's config works with either.
## Management
```bash
# View logs
docker compose -f docker-compose.gpu-host.yml --profile gpu logs -f gpu
# Restart
docker compose -f docker-compose.gpu-host.yml --profile gpu restart gpu
# Stop
docker compose -f docker-compose.gpu-host.yml --profile gpu --profile caddy down
# Re-run setup
./scripts/setup-gpu-host.sh [same flags]
# Rebuild after code changes
docker compose -f docker-compose.gpu-host.yml --profile gpu build gpu
docker compose -f docker-compose.gpu-host.yml --profile gpu up -d gpu
```
If you deployed with `--custom-ca`, include the CA override in manual commands:
```bash
docker compose -f docker-compose.gpu-host.yml -f docker-compose.gpu-ca.yml \
--profile gpu logs -f gpu
```
## Troubleshooting
### GPU service won't start
Check logs:
```bash
docker compose -f docker-compose.gpu-host.yml --profile gpu logs gpu
```
Common causes:
- NVIDIA driver not installed or `nvidia-container-toolkit` missing
- `HF_TOKEN` not set (diarization model download fails)
- Port already in use
### Reflector can't connect to GPU host
From the Reflector machine:
```bash
# Test HTTPS connectivity
curl -v https://gpu.example.com/docs
# If using custom CA, test with explicit CA
curl --cacert /path/to/gpu-ca.crt https://gpu.internal/docs
```
From inside the Reflector container:
```bash
docker compose exec server python -c "
import httpx
r = httpx.get('https://gpu.internal/docs')
print(r.status_code)
"
```
### SSL: CERTIFICATE_VERIFY_FAILED
The Reflector backend doesn't trust the GPU host's CA. Fix:
```bash
# Re-run Reflector setup with the GPU host's CA
./scripts/setup-selfhosted.sh --hosted --extra-ca /path/to/gpu-ca.crt
```
### Diarization returns errors
- Accept pyannote model licenses on HuggingFace:
- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0
- Verify `HF_TOKEN` is set in `.env.gpu-host`

297
docsv2/livekit-setup.md Normal file

@@ -0,0 +1,297 @@
# LiveKit Setup (Self-Hosted Video Platform)
LiveKit is the recommended open-source, self-hosted video platform for Reflector. It replaces Daily.co for deployments that need free, fully self-hosted video rooms with per-participant audio recording.
> LiveKit runs alongside Daily.co and Whereby — you choose the platform per room. Existing Daily/Whereby setups are not affected.
## What LiveKit Provides
- **Video/audio rooms** — WebRTC-based conferencing via `livekit-server` (Go SFU)
- **Per-participant audio recording** — Track Egress writes each participant's audio to S3 as a separate OGG/Opus file (no composite video, no Chrome dependency)
- **S3-compatible storage** — works with Garage, MinIO, AWS S3, or any S3-compatible provider via `force_path_style`
- **Webhook events** — participant join/leave, egress start/end, room lifecycle
- **JWT access tokens** — per-participant tokens with granular permissions
## Architecture
```
┌─────────────────┐
Participants ────>│ livekit-server │ :7880 (WS signaling)
(browser) │ (Go SFU) │ :7881 (TCP RTC)
│ │ :44200-44300/udp (ICE)
└────────┬────────┘
│ media forwarding
┌────────┴────────┐
│ livekit-egress │ Track Egress
│ (per-track OGG) │ writes to S3
└────────┬────────┘
┌────────┴────────┐
│ S3 Storage │ Garage / MinIO / AWS
│ (audio tracks) │
└─────────────────┘
```
Both services share Redis with the existing Reflector stack (same instance, same db).
## Quick Start
### Option 1: Via Setup Script (Recommended)
Pass `--livekit` to the setup script. It generates all credentials and config automatically:
```bash
# First run — --livekit generates credentials and config files
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --livekit --garage --caddy
# Re-runs — LiveKit is auto-detected from existing LIVEKIT_API_KEY in server/.env
./scripts/setup-selfhosted.sh
```
The `--livekit` flag will:
1. Generate `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` (random credentials)
2. Set `LIVEKIT_URL`, `LIVEKIT_PUBLIC_URL`, and storage credentials in `server/.env`
3. Generate `livekit.yaml` and `egress.yaml` config files
4. Set `DEFAULT_VIDEO_PLATFORM=livekit`
5. Enable the `livekit` Docker Compose profile
6. Start `livekit-server` and `livekit-egress` containers
On subsequent re-runs (without flags), the script detects the existing `LIVEKIT_API_KEY` in `server/.env` and re-enables the profile automatically.
### Option 2: Manual Setup
If you prefer manual configuration:
1. **Generate credentials:**
```bash
export LK_KEY="reflector_$(openssl rand -hex 8)"
export LK_SECRET="$(openssl rand -hex 32)"
```
2. **Add to `server/.env`:**
```env
# LiveKit connection
LIVEKIT_URL=ws://livekit-server:7880
LIVEKIT_API_KEY=$LK_KEY
LIVEKIT_API_SECRET=$LK_SECRET
LIVEKIT_PUBLIC_URL=wss://your-domain:7880 # or ws://your-ip:7880
# LiveKit egress S3 storage (reuse transcript storage or configure separately)
LIVEKIT_STORAGE_AWS_BUCKET_NAME=reflector-bucket
LIVEKIT_STORAGE_AWS_REGION=us-east-1
LIVEKIT_STORAGE_AWS_ACCESS_KEY_ID=your-key
LIVEKIT_STORAGE_AWS_SECRET_ACCESS_KEY=your-secret
LIVEKIT_STORAGE_AWS_ENDPOINT_URL=http://garage:3900 # for Garage/MinIO
# Set LiveKit as default platform for new rooms
DEFAULT_VIDEO_PLATFORM=livekit
```
3. **Create `livekit.yaml`:**
```yaml
port: 7880
rtc:
tcp_port: 7881
port_range_start: 44200
port_range_end: 44300
redis:
address: redis:6379
keys:
your_api_key: your_api_secret
webhook:
urls:
- http://server:1250/v1/livekit/webhook
api_key: your_api_key
logging:
level: info
room:
empty_timeout: 300
max_participants: 0
```
4. **Create `egress.yaml`:**
```yaml
api_key: your_api_key
api_secret: your_api_secret
ws_url: ws://livekit-server:7880
health_port: 7082
log_level: info
session_limits:
file_output_max_duration: 4h
```
5. **Start with the livekit profile:**
```bash
docker compose -f docker-compose.selfhosted.yml --profile livekit up -d livekit-server livekit-egress
```
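The credential generation in step 1 and the key-related parts of `livekit.yaml` in step 3 can also be produced programmatically. The sketch below mirrors the `openssl rand -hex` lengths used above; the helper names are hypothetical and only the `keys` and `webhook` sections are rendered.

```python
import secrets


def generate_credentials() -> tuple[str, str]:
    """Return (api_key, api_secret) with the same shape as the openssl commands."""
    key = f"reflector_{secrets.token_hex(8)}"   # 8 random bytes -> 16 hex chars
    secret = secrets.token_hex(32)              # 32 random bytes -> 64 hex chars
    return key, secret


def render_livekit_keys(key: str, secret: str) -> str:
    """Render the keys and webhook sections of livekit.yaml for one credential pair."""
    return (
        "keys:\n"
        f"  {key}: {secret}\n"
        "webhook:\n"
        "  urls:\n"
        "    - http://server:1250/v1/livekit/webhook\n"
        f"  api_key: {key}\n"
    )


if __name__ == "__main__":
    k, s = generate_credentials()
    print(render_livekit_keys(k, s))
```

The same pair must be used in `server/.env`, `livekit.yaml`, and `egress.yaml`, otherwise token signing and webhook verification will not match.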
## Environment Variables Reference
### Required
| Variable | Description | Example |
|----------|-------------|---------|
| `LIVEKIT_URL` | Internal WebSocket URL (server -> LiveKit) | `ws://livekit-server:7880` |
| `LIVEKIT_API_KEY` | API key for authentication | `reflector_a1b2c3d4e5f6a7b8` |
| `LIVEKIT_API_SECRET` | API secret for token signing and webhooks | `64-char hex string` |
### Recommended
| Variable | Description | Example |
|----------|-------------|---------|
| `LIVEKIT_PUBLIC_URL` | Public WebSocket URL (browser -> LiveKit). **Must be reachable from participants' browsers**, not a Docker-internal address. Without `--domain`, set to `ws://<server-ip>:7880`. With `--domain`, set to `wss://<domain>:7880`. | `wss://reflector.example.com:7880` |
| `LIVEKIT_WEBHOOK_SECRET` | Webhook verification secret. Defaults to `LIVEKIT_API_SECRET` if not set. Only needed if you want a separate secret for webhooks. | (same as API secret) |
| `DEFAULT_VIDEO_PLATFORM` | Default platform for new rooms | `livekit` |
### Storage (for Track Egress)
Track Egress writes per-participant audio files to S3. If these variables are not set, Reflector falls back to the transcript storage credentials.
| Variable | Description | Example |
|----------|-------------|---------|
| `LIVEKIT_STORAGE_AWS_BUCKET_NAME` | S3 bucket for egress output | `reflector-bucket` |
| `LIVEKIT_STORAGE_AWS_REGION` | S3 region | `us-east-1` |
| `LIVEKIT_STORAGE_AWS_ACCESS_KEY_ID` | S3 access key | `GK...` |
| `LIVEKIT_STORAGE_AWS_SECRET_ACCESS_KEY` | S3 secret key | `...` |
| `LIVEKIT_STORAGE_AWS_ENDPOINT_URL` | S3 endpoint (for Garage/MinIO) | `http://garage:3900` |
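The fallback described above can be pictured as a per-field lookup. The sketch below is an illustration only: it assumes the fallback variables follow the `TRANSCRIPT_STORAGE_AWS_*` naming with the same suffixes, and that the fallback is applied field by field. The real server may resolve the whole credential set at once. `resolve_egress_storage` is a hypothetical name.

```python
import os
from typing import Dict, Mapping, Optional

# Suffixes shared by both variable families (assumed to line up one-to-one).
FIELDS = ("BUCKET_NAME", "REGION", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "ENDPOINT_URL")


def resolve_egress_storage(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Prefer LIVEKIT_STORAGE_AWS_*; fall back to TRANSCRIPT_STORAGE_AWS_* when unset or empty."""
    resolved: Dict[str, Optional[str]] = {}
    for field in FIELDS:
        value = env.get(f"LIVEKIT_STORAGE_AWS_{field}") or env.get(f"TRANSCRIPT_STORAGE_AWS_{field}")
        resolved[field.lower()] = value  # None when neither variable is set
    return resolved
```

A `None` for `endpoint_url` would correspond to plain AWS S3, where no custom endpoint is needed.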
## Docker Compose Services
Two services are added under the `livekit` profile in `docker-compose.selfhosted.yml`:
### livekit-server
| Setting | Value |
|---------|-------|
| Image | `livekit/livekit-server:v1.10.1` |
| Ports | 7880 (signaling), 7881 (TCP RTC), 44200-44300/udp (ICE) |
| Config | `./livekit.yaml` mounted at `/etc/livekit.yaml` |
| Depends on | Redis |
### livekit-egress
| Setting | Value |
|---------|-------|
| Image | `livekit/egress:v1.10.1` |
| Config | `./egress.yaml` mounted at `/etc/egress.yaml` |
| Depends on | Redis, livekit-server |
No `--cap-add=SYS_ADMIN` is needed because Track Egress does not use Chrome (that's only for Room Composite video recording, which we don't use).
## Port Ranges
| Range | Protocol | Service | Notes |
|-------|----------|---------|-------|
| 7880 | TCP | LiveKit signaling | WebSocket connections from browsers (direct, no Caddy) |
| 7881 | TCP | LiveKit RTC over TCP | Fallback when UDP is blocked |
| 44200-44300 | UDP | LiveKit ICE | WebRTC media. Avoids collision with Reflector WebRTC (40000-40100) and macOS ephemeral ports (49152-65535) |
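If you change any of these ranges, a quick overlap check helps confirm they still avoid each other. The following is a small sketch using only the ranges named in the table; the helper names are made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

PortRange = Tuple[int, int]  # inclusive (start, end)

PORT_RANGES: Dict[str, PortRange] = {
    "LiveKit ICE": (44200, 44300),
    "Reflector WebRTC": (40000, 40100),
    "macOS ephemeral": (49152, 65535),
}


def overlaps(a: PortRange, b: PortRange) -> bool:
    """Two inclusive ranges overlap when each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_collisions(ranges: Dict[str, PortRange]) -> List[Tuple[str, str]]:
    """Return every pair of named ranges that share at least one port."""
    return [
        (name_a, name_b)
        for (name_a, range_a), (name_b, range_b) in combinations(ranges.items(), 2)
        if overlaps(range_a, range_b)
    ]


if __name__ == "__main__":
    print(find_collisions(PORT_RANGES))
```

With the values above the list comes back empty; a range such as 50000-50100 would be reported as colliding with the macOS ephemeral range.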
### TLS / Caddy Integration
When `--caddy` is enabled (HTTPS), the setup script automatically:
1. Adds a `/lk-ws` reverse proxy route to the Caddyfile that proxies `wss://domain/lk-ws` → `ws://livekit-server:7880`
2. Sets `LIVEKIT_PUBLIC_URL` to `wss://<domain>/lk-ws` (or `wss://<ip>/lk-ws`)
This avoids mixed-content blocking (browsers reject `ws://` connections on `https://` pages). Caddy handles TLS termination; LiveKit server itself runs plain WebSocket internally.
Without `--caddy`, browsers connect directly to LiveKit on port 7880 via `ws://`.
### Security Note: on_demand TLS
When using `--ip` (Caddy with self-signed certs), the Caddyfile uses `tls internal { on_demand }`. This generates certificates dynamically for any hostname/IP on first TLS request.
**Risk:** An attacker can trigger certificate generation for arbitrary hostnames by sending TLS requests with spoofed SNI values, causing disk and CPU usage. This is a low-severity resource exhaustion risk, not a data theft risk.
**Mitigations:**
- For LAN/development use: not a concern (not internet-exposed)
- For cloud VMs: restrict port 443 access via firewall to trusted IPs
- For production: use `--domain` with a real domain name instead of `--ip` — Caddy uses Let's Encrypt (no `on_demand` needed)
| Deployment | `LIVEKIT_PUBLIC_URL` | How it works |
|---|---|---|
| localhost, no Caddy | `ws://localhost:7880` | Direct connection |
| LAN IP, no Caddy | `ws://192.168.1.x:7880` | Direct connection |
| IP + Caddy | `wss://192.168.1.x/lk-ws` | Caddy terminates TLS, proxies to LiveKit |
| Domain + Caddy | `wss://example.com/lk-ws` | Caddy terminates TLS, proxies to LiveKit |
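The table reduces to one rule: with Caddy the browser uses `wss://` on the `/lk-ws` path, without it the browser talks to port 7880 directly over `ws://`. A minimal sketch of that rule follows; `livekit_public_url` is a hypothetical helper, not a function from the setup script.

```python
def livekit_public_url(host: str, caddy: bool) -> str:
    """Build LIVEKIT_PUBLIC_URL for a host (domain, IP, or localhost)."""
    if caddy:
        # Caddy terminates TLS on 443 and proxies /lk-ws to livekit-server:7880.
        return f"wss://{host}/lk-ws"
    # No TLS termination: direct plain WebSocket connection to LiveKit.
    return f"ws://{host}:7880"
```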
## Webhook Endpoint
LiveKit sends webhook events to `POST /v1/livekit/webhook`. Events handled:
| Event | Action |
|-------|--------|
| `participant_joined` | Logs participant join, updates meeting state |
| `participant_left` | Logs participant leave |
| `egress_started` | Logs recording start |
| `egress_ended` | Logs recording completion with output file info |
| `room_started` / `room_finished` | Logs room lifecycle |
Webhooks are authenticated via JWT in the `Authorization` header, verified using the API secret.
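To make the verification step concrete, here is a standard-library sketch of checking such a token. It assumes the token is an HS256 JWT signed with the API secret and that it carries a `sha256` claim holding the base64-encoded SHA-256 of the request body, which is how LiveKit's webhook receivers are generally described. Treat both as assumptions and prefer the official LiveKit SDK in real code. `verify_webhook` is a hypothetical name.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_webhook(auth_header: str, body: bytes, api_secret: str) -> dict:
    """Return the token claims if signature, expiry and body hash check out; raise ValueError otherwise."""
    token = auth_header.removeprefix("Bearer ").strip()
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts

    # Recompute the HMAC-SHA256 signature over "header.payload".
    expected = hmac.new(
        api_secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")

    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")

    # Assumed claim: base64 SHA-256 of the raw body, binding the token to this payload.
    body_hash = base64.b64encode(hashlib.sha256(body).digest()).decode()
    if claims.get("sha256") != body_hash:
        raise ValueError("body hash mismatch")
    return claims
```

The body must be the raw request bytes; re-serializing parsed JSON would change the hash.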
## Frontend
The LiveKit room component uses `@livekit/components-react` with the prebuilt `<VideoConference>` UI. It includes:
- Recording consent dialog (same as Daily/Whereby)
- Email transcript button (feature-gated)
- Extensible overlay buttons for custom actions
When a user joins a LiveKit room, the backend generates a JWT access token and returns it in the `room_url` query parameter. The frontend parses this and passes it to the LiveKit React SDK.
## Separate Server Deployment
For larger deployments (15+ participants, multiple simultaneous rooms), LiveKit can run on a dedicated server:
1. Run `livekit-server` and `livekit-egress` on a separate machine
2. Point `LIVEKIT_URL` to the remote LiveKit server (e.g., `ws://livekit-host:7880`)
3. Set `LIVEKIT_PUBLIC_URL` to the public-facing URL (e.g., `wss://livekit.example.com`)
4. Configure the remote LiveKit's `webhook.urls` to point back to the Reflector server
5. Both need access to the same Redis (or configure LiveKit's own Redis)
6. Both need access to the same S3 storage
## Troubleshooting
### LiveKit server not starting
```bash
# Check logs
docker compose -f docker-compose.selfhosted.yml logs livekit-server --tail 30
# Verify config
cat livekit.yaml
# Common issues:
# - Redis not reachable (check redis service is running)
# - Port 7880 already in use
# - Invalid API key format in livekit.yaml
```
### Participants can't connect
```bash
# Check that LIVEKIT_PUBLIC_URL is accessible from the browser
# It must be the URL the browser can reach, not the Docker-internal URL
# Check firewall allows ports 7880, 7881, and 44200-44300/udp
sudo ufw status # or iptables -L
# Verify the access token is being generated
docker compose -f docker-compose.selfhosted.yml logs server | grep livekit
```
### Track Egress not writing files
```bash
# Check egress logs
docker compose -f docker-compose.selfhosted.yml logs livekit-egress --tail 30
# Verify S3 credentials
# Egress receives S3 config per-request from the server, so check server/.env:
grep LIVEKIT_STORAGE server/.env
```


@@ -0,0 +1,73 @@
# Migrating from Daily.co to LiveKit
This guide covers running LiveKit alongside Daily.co or fully replacing it.
## Both Platforms Run Simultaneously
LiveKit and Daily.co coexist — the platform is selected **per room**. You don't need to migrate all rooms at once.
- Existing Daily rooms continue to work as-is
- New rooms can use LiveKit
- Each room's `platform` field determines which video service is used
- Transcripts, topics, summaries work identically regardless of platform
## Step 1: Enable LiveKit
Add `--livekit` to your setup command:
```bash
# If currently running:
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy
# Add --livekit:
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --livekit --garage --caddy
```
This starts `livekit-server` + `livekit-egress` containers alongside your existing stack.
## Step 2: Set Default Platform
The setup script automatically sets `DEFAULT_VIDEO_PLATFORM=livekit` in `server/.env`. This means **new rooms** default to LiveKit. Existing rooms keep their current platform.
To keep Daily as the default for new rooms:
```bash
# In server/.env, change:
DEFAULT_VIDEO_PLATFORM=daily
```
## Step 3: Switch Individual Rooms
In the Rooms admin page, edit any room and change the **Platform** dropdown from "Daily" to "LiveKit". The next meeting in that room will use LiveKit.
Previously recorded Daily transcripts for that room are unaffected.
## Step 4: (Optional) Remove Daily.co
Once all rooms use LiveKit and you no longer need Daily.co:
1. Remove `DAILY_API_KEY` and related Daily settings from `server/.env`
2. Re-run the setup script — it won't activate the `dailyco` profile
3. Hatchet workers are shared between Daily and LiveKit, so they continue running
Daily-specific services that stop:
- `hatchet-worker-cpu` with `dailyco` profile (but continues if `livekit` profile is active)
- Daily webhook polling tasks (`poll_daily_recordings`, etc.)
## What Changes for Users
| Feature | Daily.co | LiveKit |
|---------|---------|---------|
| Video/audio quality | Daily.co SFU | LiveKit SFU (comparable) |
| Pre-join screen | Daily's built-in iframe | LiveKit PreJoin component (name + device selection) |
| Recording | Starts via REST API from frontend | Auto Track Egress (automatic, no user action) |
| Multitrack audio | Per-participant WebM tracks | Per-participant OGG tracks |
| Transcript quality | Same pipeline | Same pipeline |
| Self-hosted | No (SaaS only) | Yes (fully self-hosted) |
## Database Changes
None required. The `platform` field on rooms and meetings already supports `"livekit"`. LiveKit recordings use recording IDs prefixed with `lk-` to distinguish them from Daily recordings.
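Code that needs to tell the two kinds of recordings apart can rely on that prefix. A tiny sketch, with a hypothetical helper name:

```python
LIVEKIT_RECORDING_PREFIX = "lk-"


def is_livekit_recording(recording_id: str) -> bool:
    """True when the recording ID carries the LiveKit prefix described above."""
    return recording_id.startswith(LIVEKIT_RECORDING_PREFIX)
```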
## Rollback
To revert a room back to Daily, just change the Platform dropdown back to "Daily" in the Rooms admin page. No data migration needed.


@@ -24,6 +24,8 @@ This document explains the internals of the self-hosted deployment: how the setu
The self-hosted deployment runs the entire Reflector platform on a single server using Docker Compose. A single bash script (`scripts/setup-selfhosted.sh`) handles all configuration and orchestration. The key design principles are:
- **One command to deploy** — flags select which features to enable
- **Config memory** — CLI args are saved to `data/.selfhosted-last-args`; re-run with no flags to replay
- **Per-service overrides** — individual ML backends (transcript, diarization, translation, padding, mixdown) can be overridden independently from the base mode
- **Idempotent** — safe to re-run without losing existing configuration
- **Profile-based composition** — Docker Compose profiles activate optional services
- **No external dependencies required** — with `--garage` and `--ollama-*`, everything runs locally
@@ -61,8 +63,9 @@ Creates or updates the backend environment file from `server/.env.selfhosted.exa
- **Infrastructure** — PostgreSQL URL, Redis host, Celery broker (all pointing to Docker-internal hostnames)
- **Public URLs** — `BASE_URL` and `CORS_ORIGIN` computed from the domain (if `--domain`), IP (if detected on Linux), or `localhost`
- **WebRTC** — `WEBRTC_HOST` set to the server's LAN IP so browsers can reach UDP ICE candidates
- **Specialized models** — always points to `http://transcription:8000` (the Docker network alias shared by GPU and CPU containers)
- **HuggingFace token** — prompts interactively for pyannote model access; writes to root `.env` so Docker Compose can inject it into GPU/CPU containers
- **ML backends (per-service)** — Each ML service (transcript, diarization, translation, padding, mixdown) is configured independently using "effective backends" (`EFF_TRANSCRIPT`, `EFF_DIARIZATION`, `EFF_TRANSLATION`, `EFF_PADDING`, `EFF_MIXDOWN`). These are resolved from the base mode default + any `--transcript`/`--diarization`/`--translation`/`--padding`/`--mixdown` overrides. For `modal` backends, the URL is `http://transcription:8000` (GPU mode), user-provided (hosted mode), or read from existing env (CPU mode with override). For CPU backends, no URL is needed (in-process). If a service is overridden to `modal` in CPU mode without a URL configured, the script warns the user to set `TRANSCRIPT_URL` in `server/.env`
- **CPU timeouts** — `TRANSCRIPT_FILE_TIMEOUT` and `DIARIZATION_FILE_TIMEOUT` are increased to 3600s only for services actually using CPU backends (whisper/pyannote), not blanket for the whole mode
- **HuggingFace token** — prompted when diarization uses `pyannote` (in-process) or when GPU mode is active (GPU container needs it). Writes to root `.env` so Docker Compose can inject it into GPU/CPU containers
- **LLM** — if `--ollama-*` is used, configures `LLM_URL` pointing to the Ollama container. Otherwise, warns that the user needs to configure an external LLM
- **Public mode** — sets `PUBLIC_MODE=true` so the app is accessible without authentication by default
- **Password auth** — if `--password` is passed, sets `AUTH_BACKEND=password`, `PUBLIC_MODE=false`, `ADMIN_EMAIL=admin@localhost`, and `ADMIN_PASSWORD_HASH` (the hash generated in Step 1). The admin user is provisioned in the database on container startup via `runserver.sh`
@@ -167,6 +170,8 @@ These start regardless of which flags you pass:
| `ollama-cpu` | `ollama-cpu` | Local Ollama LLM on CPU |
| `garage` | `garage` | Local S3-compatible object storage |
| `caddy` | `caddy` | Reverse proxy with SSL |
| `dailyco` | `hatchet-worker-cpu` | Hatchet workflow workers for Daily.co multitrack processing |
| `livekit` | `livekit-server`, `livekit-egress` | Self-hosted video platform + per-participant audio recording |
### The "transcription" Alias
@@ -203,11 +208,17 @@ Both the `gpu` and `cpu` services define a Docker network alias of `transcriptio
│ :8000 │ └─────────┘ └─────────┘
└───────────┘
┌─────┴─────┐ ┌─────────┐
│ ollama │ │ garage │
│(optional) │ │(optional│
│ :11435 │ │ S3) │
└───────────┘ └─────────┘
┌─────┴─────┐ ┌─────────┐ ┌──────────────┐
│ ollama │ │ garage │ │livekit-server│
│(optional) │ │(optional│ │ (optional) │
│ :11435 │ │ S3) │ │ :7880 │
└───────────┘ └─────────┘ └──────┬───────┘
┌──────┴───────┐
│livekit-egress│
│ (Track Egress│
│ to S3) │
└──────────────┘
```
### How Services Interact
@@ -228,11 +239,19 @@ Both the `gpu` and `cpu` services define a Docker network alias of `transcriptio
Environment variables flow through multiple layers. Understanding this prevents confusion when debugging:
```
Flags (--gpu, --garage, etc.)
CLI args (--gpu, --garage, --padding modal, --mixdown modal, etc.)
├── setup-selfhosted.sh interprets flags
├── Config memory: saved to data/.selfhosted-last-args
│ (replayed on next run if no args provided)
├── setup-selfhosted.sh resolves effective backends:
│ EFF_TRANSCRIPT = override or base mode default
│ EFF_DIARIZATION = override or base mode default
│ EFF_TRANSLATION = override or base mode default
│ EFF_PADDING = override or base mode default
│ EFF_MIXDOWN = override or base mode default
│ │
│ ├── Writes server/.env (backend config)
│ ├── Writes server/.env (backend config, per-service backends)
│ ├── Writes www/.env (frontend config)
│ ├── Writes .env (HF_TOKEN for compose interpolation)
│ └── Writes Caddyfile (proxy routes)
@@ -309,7 +328,9 @@ You can point your own reverse proxy (nginx, Traefik, etc.) at these ports.
### WebRTC and UDP
The server exposes UDP ports 50000-50100 for WebRTC ICE candidates. The `WEBRTC_HOST` variable tells the server which IP to advertise in ICE candidates — this must be the server's actual IP address (not a domain), because WebRTC uses UDP which doesn't go through the HTTP reverse proxy.
The server exposes UDP ports 40000-40100 for Reflector's own WebRTC ICE candidates. When LiveKit is enabled, it additionally uses ports 44200-44300/udp for its WebRTC ICE candidates. The `WEBRTC_HOST` variable tells the server which IP to advertise in ICE candidates — this must be the server's actual IP address (not a domain), because WebRTC uses UDP which doesn't go through the HTTP reverse proxy.
Port ranges are chosen to avoid collision with macOS ephemeral ports (49152-65535).
---
@@ -415,7 +436,10 @@ All services communicate over Docker's default bridge network. Only specific por
| 3903 | Garage | `0.0.0.0:3903` | Garage admin API |
| 8000 | GPU/CPU | `127.0.0.1:8000` | ML model API (localhost only) |
| 11435 | Ollama | `127.0.0.1:11435` | Ollama API (localhost only) |
| 50000-50100/udp | Server | `0.0.0.0:50000-50100` | WebRTC ICE candidates |
| 40000-40100/udp | Server | `0.0.0.0:40000-40100` | Reflector WebRTC ICE candidates |
| 7880 | LiveKit | `0.0.0.0:7880` | LiveKit signaling (WS) |
| 7881 | LiveKit | `0.0.0.0:7881` | LiveKit RTC over TCP |
| 44200-44300/udp | LiveKit | `0.0.0.0:44200-44300` | LiveKit WebRTC ICE candidates |
Services bound to `127.0.0.1` are only accessible from the host itself (not from the network). Caddy is the only service exposed to the internet on standard HTTP/HTTPS ports.
@@ -432,6 +456,8 @@ Inside the Docker network, services reach each other by their compose service na
| `transcription` | GPU or CPU container (network alias) |
| `ollama` / `ollama-cpu` | Ollama container |
| `garage` | Garage S3 container |
| `livekit-server` | LiveKit SFU server |
| `livekit-egress` | LiveKit Track Egress service |
---


@@ -70,7 +70,7 @@ That's it. The script generates env files, secrets, starts all containers, waits
## ML Processing Modes (Required)
Pick `--gpu`, `--cpu`, or `--hosted`. This determines how **transcription, diarization, translation, and audio padding** run:
Pick `--gpu`, `--cpu`, or `--hosted`. This determines how **transcription, diarization, translation, audio padding, and audio mixdown** run:
| Flag | What it does | Requires |
|------|-------------|----------|
@@ -144,6 +144,7 @@ Browse all available models at https://ollama.com/library.
| Flag | What it does |
|------|-------------|
| `--livekit` | Enables LiveKit self-hosted video platform. Generates API credentials, starts `livekit-server` + `livekit-egress`. See [LiveKit Setup](livekit-setup.md). |
| `--garage` | Starts Garage (local S3-compatible storage). Auto-configures bucket, keys, and env vars. |
| `--caddy` | Starts Caddy reverse proxy on ports 80/443 with self-signed cert. |
| `--domain DOMAIN` | Use a real domain with Let's Encrypt auto-HTTPS (implies `--caddy`). Requires DNS A record pointing to this server and ports 80/443 open. |
@@ -154,10 +155,74 @@ Without `--garage`, you **must** provide S3-compatible credentials (the script w
Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse proxy at `web:3000` (frontend) and `server:1250` (API).
## Video Platform (LiveKit)
For self-hosted video rooms with per-participant audio recording, add `--livekit` to your setup command:
```bash
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --livekit --garage --caddy
```
This generates LiveKit API credentials, creates config files (`livekit.yaml`, `egress.yaml`), and starts `livekit-server` (WebRTC SFU) + `livekit-egress` (per-participant audio recording to S3). LiveKit reuses the same Redis and S3 storage as the rest of the stack.
New rooms default to LiveKit when `DEFAULT_VIDEO_PLATFORM=livekit` is set (done automatically by the setup script). Existing Daily.co and Whereby rooms continue to work. On re-runs, the script detects the existing `LIVEKIT_API_KEY` in `server/.env` automatically.
> For detailed configuration, environment variables, ports, and troubleshooting, see [LiveKit Setup](livekit-setup.md).
**Using a domain (recommended for production):** Point a DNS A record at your server's IP, then pass `--domain your.domain.com`. Caddy will automatically obtain and renew a Let's Encrypt certificate. Ports 80 and 443 must be open.
**Without a domain:** `--caddy` alone uses a self-signed certificate. Browsers will show a security warning that must be accepted.
## Per-Service Backend Overrides
Override individual ML services without changing the base mode. Useful when you want most services on one backend but need specific services on another.
| Flag | Valid backends | Default (`--gpu`/`--hosted`) | Default (`--cpu`) |
|------|---------------|------------------------------|-------------------|
| `--transcript BACKEND` | `whisper`, `modal` | `modal` | `whisper` |
| `--diarization BACKEND` | `pyannote`, `modal` | `modal` | `pyannote` |
| `--translation BACKEND` | `marian`, `modal`, `passthrough` | `modal` | `marian` |
| `--padding BACKEND` | `pyav`, `modal` | `modal` | `pyav` |
| `--mixdown BACKEND` | `pyav`, `modal` | `modal` | `pyav` |
**Examples:**
```bash
# CPU base, but use a remote modal service for padding only
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy
# GPU base, but skip translation entirely (passthrough)
./scripts/setup-selfhosted.sh --gpu --translation passthrough --garage --caddy
# CPU base with remote modal diarization and translation
./scripts/setup-selfhosted.sh --cpu --diarization modal --translation modal --garage
```
When overriding a service to `modal` in `--cpu` mode, the script will warn you to configure the service URL (`TRANSCRIPT_URL` etc.) in `server/.env` to point to your GPU service, then re-run.
When overriding a service to a CPU backend (e.g., `--transcript whisper`) in `--gpu` mode, that service runs in-process on the server/worker containers while the GPU container still serves the remaining `modal` services.
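The resolution rule (override if given, otherwise the base mode default) can be sketched in Python using the values from the table above. The actual logic lives in the bash setup script; the names below are illustrative only.

```python
from typing import Dict, Optional

CPU_DEFAULTS = {
    "transcript": "whisper",
    "diarization": "pyannote",
    "translation": "marian",
    "padding": "pyav",
    "mixdown": "pyav",
}
MODAL_DEFAULTS = {service: "modal" for service in CPU_DEFAULTS}

BASE_DEFAULTS = {"gpu": MODAL_DEFAULTS, "hosted": MODAL_DEFAULTS, "cpu": CPU_DEFAULTS}

VALID_BACKENDS = {
    "transcript": {"whisper", "modal"},
    "diarization": {"pyannote", "modal"},
    "translation": {"marian", "modal", "passthrough"},
    "padding": {"pyav", "modal"},
    "mixdown": {"pyav", "modal"},
}


def resolve_effective_backends(mode: str, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Start from the base mode defaults, then apply validated per-service overrides."""
    effective = dict(BASE_DEFAULTS[mode])
    for service, backend in (overrides or {}).items():
        if service not in VALID_BACKENDS:
            raise ValueError(f"unknown service: {service}")
        if backend not in VALID_BACKENDS[service]:
            raise ValueError(f"invalid backend {backend!r} for {service}")
        effective[service] = backend
    return effective
```

For instance, `resolve_effective_backends("cpu", {"padding": "modal"})` corresponds to the first example command above: every service stays on its CPU backend except padding.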
## Config Memory (No-Flag Re-run)
After a successful run, the script saves your CLI arguments to `data/.selfhosted-last-args`. On subsequent runs with no arguments, the saved configuration is automatically replayed:
```bash
# First run — saves the config
./scripts/setup-selfhosted.sh --gpu --ollama-gpu --garage --caddy
# Later re-runs — same config, no flags needed
./scripts/setup-selfhosted.sh
# => "No flags provided — replaying saved configuration:"
# => " --gpu --ollama-gpu --garage --caddy"
```
To change the configuration, pass new flags — they override and replace the saved config:
```bash
# Switch to CPU mode with overrides — this becomes the new saved config
./scripts/setup-selfhosted.sh --cpu --padding modal --garage --caddy
```
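The replay behaviour amounts to "use the given flags and remember them, or fall back to the remembered ones". Below is a rough Python sketch of that idea. The on-disk format (one shell-quoted line) is an assumption, and unlike the real script this sketch saves before the run rather than after it succeeds.

```python
import shlex
from pathlib import Path
from typing import List, Sequence, Union


def resolve_args(cli_args: Sequence[str], state_file: Union[str, Path]) -> List[str]:
    """Return the flags to use for this run, updating the saved config when flags are given."""
    state_path = Path(state_file)
    if cli_args:
        # Any flags replace the saved config entirely.
        state_path.parent.mkdir(parents=True, exist_ok=True)
        state_path.write_text(shlex.join(cli_args) + "\n")
        return list(cli_args)
    if state_path.exists():
        # No flags: replay what was saved last time.
        return shlex.split(state_path.read_text())
    return []
```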
## What the Script Does
1. **Prerequisites check** — Docker, NVIDIA GPU (if needed), compose file exists
@@ -189,6 +254,8 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr
| `TRANSCRIPT_URL` | Specialized model endpoint | `http://transcription:8000` |
| `PADDING_BACKEND` | Audio padding backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `PADDING_URL` | Audio padding endpoint (when `PADDING_BACKEND=modal`) | `http://transcription:8000` |
| `MIXDOWN_BACKEND` | Audio mixdown backend (`pyav` or `modal`) | `modal` (selfhosted), `pyav` (default) |
| `MIXDOWN_URL` | Audio mixdown endpoint (when `MIXDOWN_BACKEND=modal`) | `http://transcription:8000` |
| `LLM_URL` | OpenAI-compatible LLM endpoint | Auto-set for Ollama modes |
| `LLM_API_KEY` | LLM API key | `not-needed` for Ollama |
| `LLM_MODEL` | LLM model name | `qwen2.5:14b` for Ollama (override with `--llm-model`) |
@@ -199,6 +266,11 @@ Without `--caddy` or `--domain`, no ports are exposed. Point your own reverse pr
| `DAILY_SUBDOMAIN` | Daily.co subdomain | *(unset)* |
| `DAILYCO_STORAGE_AWS_ACCESS_KEY_ID` | AWS access key for reading Daily's recording bucket | *(unset)* |
| `DAILYCO_STORAGE_AWS_SECRET_ACCESS_KEY` | AWS secret key for reading Daily's recording bucket | *(unset)* |
| `ZULIP_REALM` | Zulip server hostname (e.g. `zulip.example.com`) | *(unset)* |
| `ZULIP_API_KEY` | Zulip bot API key | *(unset)* |
| `ZULIP_BOT_EMAIL` | Zulip bot email address | *(unset)* |
| `ZULIP_DAG_STREAM` | Zulip stream for pipeline failure alerts | *(unset)* |
| `ZULIP_DAG_TOPIC` | Zulip topic for pipeline failure alerts | *(unset)* |
| `HATCHET_CLIENT_TOKEN` | Hatchet API token (auto-generated) | *(unset)* |
| `HATCHET_CLIENT_SERVER_URL` | Hatchet server URL | Auto-set when Daily.co configured |
| `HATCHET_CLIENT_HOST_PORT` | Hatchet gRPC address | Auto-set when Daily.co configured |
@@ -248,6 +320,48 @@ TRANSCRIPT_STORAGE_AWS_REGION=us-east-1
TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://minio:9000
```
### S3 IAM Permissions Reference
Reflector uses up to 3 separate S3 credential sets, each scoped to a specific bucket. When using AWS IAM in production, each key should have only the permissions it needs.
**Transcript storage key** (`TRANSCRIPT_STORAGE_AWS_*`) — the main bucket for processed files:
```json
{
  "Effect": "Allow",
  "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::reflector-media/*", "arn:aws:s3:::reflector-media"]
}
```
Used for: processed MP3 audio, waveform JSON, temporary pipeline files. Deletions happen during trash "Destroy", consent-denied cleanup, and public mode data retention.
**Daily.co worker key** (`DAILYCO_STORAGE_AWS_ACCESS_KEY_ID/SECRET_ACCESS_KEY`) — for reading and cleaning up Daily recordings:
```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::your-daily-bucket/*", "arn:aws:s3:::your-daily-bucket"]
}
```
Used for: downloading multitrack recording files for processing, deleting track files and composed video on consent denial or trash destroy. No `s3:PutObject` needed — Daily's own API writes via the Role ARN.
**Whereby worker key** (`WHEREBY_STORAGE_AWS_ACCESS_KEY_ID/SECRET_ACCESS_KEY`) — same pattern as Daily:
```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::your-whereby-bucket/*", "arn:aws:s3:::your-whereby-bucket"]
}
```
> **Fallback behavior:** If platform-specific worker keys are not set, Reflector falls back to the transcript storage master key with a bucket override. This means the master key would need cross-bucket access to the Daily/Whereby buckets. For least-privilege, configure platform-specific keys so each only accesses its own bucket.
> **Garage / single-bucket setups:** When using Garage or a single S3 bucket for everything, one master key with full permissions on that bucket is sufficient. The IAM scoping above only matters when using separate buckets per platform (typical in AWS production).
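The statements above are fragments; IAM expects them wrapped in a policy document with a `Version` and a `Statement` list. A short sketch that builds such documents for the bucket names used in this section (the helper names are illustrative, and the bucket names are the placeholders from the examples):

```python
import json
from typing import Dict, List

READ_AND_CLEANUP = ["s3:GetObject", "s3:DeleteObject", "s3:ListBucket"]
READ_WRITE = ["s3:PutObject", *READ_AND_CLEANUP]


def bucket_statement(bucket: str, actions: List[str]) -> Dict[str, object]:
    """One Allow statement scoped to a single bucket and its objects."""
    return {
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
    }


def policy_document(*statements: Dict[str, object]) -> Dict[str, object]:
    """Wrap statements in the standard IAM policy envelope."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


if __name__ == "__main__":
    daily_policy = policy_document(bucket_statement("your-daily-bucket", READ_AND_CLEANUP))
    print(json.dumps(daily_policy, indent=2))
```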
## What Authentication Enables
By default, Reflector runs in **public mode** (`AUTH_BACKEND=none`, `PUBLIC_MODE=true`) — anyone can create and view transcripts without logging in. Transcripts are anonymous (not linked to any user) and cannot be edited or deleted after creation.
@@ -571,9 +685,9 @@ docker compose -f docker-compose.selfhosted.yml exec gpu curl http://localhost:8
## Updating
```bash
# Option A: Pull latest prebuilt images and restart
# Option A: Pull latest prebuilt images and restart (replays saved config automatically)
docker compose -f docker-compose.selfhosted.yml down
./scripts/setup-selfhosted.sh <same-flags-as-before>
./scripts/setup-selfhosted.sh
# Option B: Build from source (after git pull) and restart
git pull
@@ -584,6 +698,8 @@ docker compose -f docker-compose.selfhosted.yml down
docker compose -f docker-compose.selfhosted.yml build gpu # or cpu
```
> **Note on config memory:** Running with no flags replays the saved config from your last run. Running with *any* flags replaces the saved config entirely — the script always saves the complete set of flags you provide. See [Config Memory](#config-memory-no-flag-re-run).
The setup script is idempotent — it won't overwrite existing secrets or env vars that are already set.
## Architecture Overview

144
docsv2/tunnel-setup.md Normal file

@@ -0,0 +1,144 @@
# Tunnel Setup (Self-Hosting Behind NAT)
Expose your self-hosted Reflector + LiveKit stack to the internet without port forwarding, static IPs, or cloud VMs using tunneling services.
## Requirements
You need **two tunnels**:
| Tunnel | Protocol | What it carries | Local port | Examples |
|--------|----------|----------------|------------|----------|
| **TCP tunnel** | TCP | Web app, API, LiveKit signaling (WebSocket) | 443 (Caddy) | playit.gg, ngrok, Cloudflare Tunnel, bore, frp |
| **UDP tunnel** | UDP | WebRTC audio/video media | Assigned by tunnel service | playit.gg, frp |
> **Important:** Most tunneling services only support TCP. WebRTC media requires UDP. Make sure your chosen service supports UDP tunnels. As of writing, [playit.gg](https://playit.gg) is one of the few that supports both TCP and UDP (premium $3/mo).
## Architecture
```
Internet participants
├── TCP tunnel (HTTPS)
│ └── tunnel service → your machine port 443 (Caddy)
│ ├── /v1/* → server:1250 (API)
│ ├── /lk-ws/* → livekit-server:7880 (signaling)
│ └── /* → web:3000 (frontend)
└── UDP tunnel
└── tunnel service → your machine port N (LiveKit ICE)
```
## Setup
### Step 1: Create tunnels with your chosen service
Create two tunnels and note the public addresses:
- **TCP tunnel**: Points to your local port `443`
  - You'll get an address like `your-tunnel.example.com:PORT`
- **UDP tunnel**: Points to a local port (e.g., `14139`)
  - You'll get an address like `udp-host.example.com:PORT`
  - **The local port must match the public port** (or LiveKit ICE candidates won't match). Set the local port to the same number as the public port assigned by the tunnel service.
### Step 2: Run the setup script
```bash
./scripts/setup-selfhosted.sh <mode> --livekit --garage \
--tunnels <TCP_ADDRESS>,<UDP_ADDRESS>
```
Example:
```bash
./scripts/setup-selfhosted.sh --cpu --livekit --garage \
--tunnels my-tunnel.example.com:9055,udp-host.example.com:14139
```
Or use separate flags:
```bash
./scripts/setup-selfhosted.sh --cpu --livekit --garage \
--tunnel-tcp my-tunnel.example.com:9055 \
--tunnel-udp udp-host.example.com:14139
```
The script automatically:
- Sets all URLs (API, frontend, LiveKit signaling) to the TCP tunnel address
- Configures LiveKit with the UDP tunnel port and resolved IP for ICE candidates
- Enables Caddy with self-signed TLS (catch-all on port 443)
- Saves tunnel config for re-runs
### Step 3: Start the tunnel agent
Run your tunneling service's agent/client on the same machine. It must be running whenever you want external access.
### Step 4: Access
Share `https://<TCP_TUNNEL_ADDRESS>` with participants. They'll need to accept the self-signed certificate warning in their browser.
## Flag Reference
| Flag | Description |
|------|-------------|
| `--tunnels TCP,UDP` | Both tunnel addresses comma-separated (e.g., `host:9055,host:14139`) |
| `--tunnel-tcp ADDR` | TCP tunnel address only (e.g., `host.example.com:9055`) |
| `--tunnel-udp ADDR` | UDP tunnel address only (e.g., `host.example.com:14139`) |
Tunnel flags:
- Imply `--caddy` (HTTPS required for browser mic/camera access)
- Are mutually exclusive with `--ip` and `--domain`
- Are saved to config memory (re-run without flags replays saved config)
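As an illustration of the `--tunnels TCP,UDP` format, the value can be split into two `HOST:PORT` pairs as sketched below. This is not the script's own parser (which is bash); the names are hypothetical and IPv6 literals are not handled.

```python
from typing import NamedTuple, Tuple


class TunnelAddress(NamedTuple):
    host: str
    port: int


def parse_address(value: str) -> TunnelAddress:
    """Parse HOST:PORT, splitting on the last colon."""
    host, sep, port = value.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected HOST:PORT, got {value!r}")
    return TunnelAddress(host, int(port))


def parse_tunnels(value: str) -> Tuple[TunnelAddress, TunnelAddress]:
    """Parse 'TCP_ADDRESS,UDP_ADDRESS' into (tcp, udp)."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("expected exactly two comma-separated addresses: TCP,UDP")
    return parse_address(parts[0]), parse_address(parts[1])
```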
## UDP Port Matching
LiveKit advertises ICE candidates with a specific IP and port. The browser connects to that exact address. If the tunnel's public port differs from the local port, ICE will fail.
**Correct setup:** Set the tunnel's local port to match its public port.
```
Tunnel assigns public port 14139
→ Set local port to 14139
→ LiveKit listens on 14139 (udp_port in livekit.yaml)
→ Docker maps 14139:14139/udp
→ ICE candidates advertise tunnel_ip:14139
→ Browser connects to tunnel_ip:14139 → tunnel → local:14139 → LiveKit
```
If your tunneling service doesn't let you choose the local port, you'll need to update `livekit.yaml` manually with the assigned ports.
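The port-matching rule is easy to check before starting the stack. A minimal sketch, assuming you know both the public port the tunnel assigned and the local port it forwards to (`udp_port_mapping` is a made-up helper):

```python
def udp_port_mapping(public_port: int, local_port: int) -> str:
    """Return the Docker UDP port mapping, or raise if the tunnel ports differ."""
    if public_port != local_port:
        raise ValueError(
            f"tunnel public port {public_port} != local port {local_port}: "
            "browsers would try the advertised port and ICE would fail"
        )
    # Same number on both sides, so the advertised candidate is reachable end to end.
    return f"{local_port}:{local_port}/udp"
```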
## TLS Certificate Warning
With tunnel services on non-standard ports (e.g., `:9055`), Let's Encrypt can't auto-provision certificates (it requires ports 80/443). Caddy uses `tls internal` which generates a self-signed certificate. Participants will see a browser warning they must accept.
**To avoid the warning:**
- Use a tunnel service that provides port 443 for TCP
- Or use a real domain with `--domain` on a server with a public IP
## Compatible Tunnel Services
| Service | TCP | UDP | Free tier | Notes |
|---------|-----|-----|-----------|-------|
| [playit.gg](https://playit.gg) | Yes (premium) | Yes (premium) | Limited | $3/mo premium. Supports both TCP + UDP. |
| [ngrok](https://ngrok.com) | Yes | No | Limited | TCP only — needs a separate UDP tunnel for media |
| [Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/) | Yes | No | Yes | TCP only — needs a separate UDP tunnel for media |
| [bore](https://github.com/ekzhang/bore) | Yes | No | Self-hosted | TCP only |
| [frp](https://github.com/fatedier/frp) | Yes | Yes | Self-hosted | Requires your own VPS to run the frp server |
| [Tailscale Funnel](https://tailscale.com/kb/1223/funnel) | Yes | No | Free (3 nodes) | TCP only, requires Tailscale account |
For a full self-contained setup without a VPS, playit.gg (TCP + UDP) is currently the simplest option.
## Limitations
- **Latency**: Adds a hop through the tunnel service's relay servers
- **Bandwidth**: Tunnel services may have bandwidth limits on free/cheap tiers
- **Reliability**: Depends on the tunnel service's uptime
- **Certificate warning**: Unavoidable with non-standard ports (see above)
- **Single UDP port**: Tunnel mode uses a single UDP port instead of a range, which limits concurrent WebRTC connections (~50 participants max)
- **Not production-grade**: Suitable for demos, small teams, development, and privacy-first setups. For production, use a server with a public IP.
## Comparison
| Approach | Cost | Setup | Data location | Port forwarding needed |
|----------|------|-------|---------------|----------------------|
| **Tunnel (this guide)** | $0-3/mo | Low | Your machine | No |
| **Cloud VM** | $5-20/mo | Low | Cloud provider | No |
| **Port forwarding** | $0 | Medium | Your machine | Yes (router config) |
| **VPN mesh (Tailscale)** | $0 | Low | Your machine | No (VPN peers only) |

egress.yaml.example

@@ -0,0 +1,26 @@
# LiveKit Egress configuration
# Generated by setup-selfhosted.sh — do not edit manually.
# See: https://docs.livekit.io/self-hosting/egress/
api_key: __LIVEKIT_API_KEY__
api_secret: __LIVEKIT_API_SECRET__
ws_url: ws://livekit-server:7880
redis:
address: redis:6379
# Health check
health_port: 7082
# Logging
log_level: info
# CPU cost limits (Track Egress only — no composite video)
# Track Egress costs 1.0 CPU unit per track; hundreds can run on one instance.
# Default max_cpu_utilization is 0.8 (80% of available cores).
# Session limits
session_limits:
file_output_max_duration: 4h # Max 4 hours per recording
# S3 storage is configured per-request via the API (not here).
# The server passes S3 credentials when starting each Track Egress.


@@ -114,8 +114,8 @@ modal secret create reflector-gpu REFLECTOR_GPU_APIKEY="$API_KEY"
# --- Deploy Functions ---
echo ""
echo "Deploying transcriber (Whisper)..."
TRANSCRIBER_URL=$(modal deploy reflector_transcriber.py 2>&1 | grep -o 'https://[^ ]*web.modal.run' | head -1)
echo "Deploying transcriber (Parakeet)..."
TRANSCRIBER_URL=$(modal deploy reflector_transcriber_parakeet.py 2>&1 | grep -o 'https://[^ ]*web.modal.run' | head -1)
if [ -z "$TRANSCRIBER_URL" ]; then
echo "Error: Failed to deploy transcriber. Check Modal dashboard for details."
exit 1
@@ -132,13 +132,22 @@ fi
echo " -> $DIARIZER_URL"
echo ""
echo "Deploying padding (CPU audio processing via Modal SDK)..."
modal deploy reflector_padding.py
if [ $? -ne 0 ]; then
echo "Deploying padding (CPU audio processing)..."
PADDING_URL=$(modal deploy reflector_padding.py 2>&1 | grep -o 'https://[^ ]*web.modal.run' | head -1)
if [ -z "$PADDING_URL" ]; then
echo "Error: Failed to deploy padding. Check Modal dashboard for details."
exit 1
fi
echo " -> reflector-padding.pad_track (Modal SDK function)"
echo " -> $PADDING_URL"
echo ""
echo "Deploying mixdown (CPU multi-track audio mixing)..."
MIXDOWN_URL=$(modal deploy reflector_mixdown.py 2>&1 | grep -o 'https://[^ ]*web.modal.run' | head -1)
if [ -z "$MIXDOWN_URL" ]; then
echo "Error: Failed to deploy mixdown. Check Modal dashboard for details."
exit 1
fi
echo " -> $MIXDOWN_URL"
# --- Output Configuration ---
echo ""
@@ -157,5 +166,11 @@ echo "DIARIZATION_BACKEND=modal"
echo "DIARIZATION_URL=$DIARIZER_URL"
echo "DIARIZATION_MODAL_API_KEY=$API_KEY"
echo ""
echo "# Padding uses Modal SDK (requires MODAL_TOKEN_ID/SECRET in worker containers)"
echo "PADDING_BACKEND=modal"
echo "PADDING_URL=$PADDING_URL"
echo "PADDING_MODAL_API_KEY=$API_KEY"
echo ""
echo "MIXDOWN_BACKEND=modal"
echo "MIXDOWN_URL=$MIXDOWN_URL"
echo "MIXDOWN_MODAL_API_KEY=$API_KEY"
echo "# --- End Modal Configuration ---"
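The deploy steps above all capture the endpoint URL the same way: `grep -o` for an `https://…web.modal.run` token, then `head -1`. A rough Python equivalent of that extraction is sketched below for readers who want to test it in isolation. The function name `first_modal_url` and the sample deploy output are assumptions made for illustration; the real `modal deploy` output may look different.

```python
import re
from typing import Optional

# Same idea as: grep -o 'https://[^ ]*web.modal.run' | head -1
# The dots are escaped here and newlines are excluded, because grep
# matches line by line while re.search scans the whole string.
MODAL_URL = re.compile(r"https://[^ \n]*web\.modal\.run")


def first_modal_url(deploy_output: str) -> Optional[str]:
    """Return the first Modal web URL found in the output, or None."""
    match = MODAL_URL.search(deploy_output)
    return match.group(0) if match else None


# Hypothetical output, for illustration only.
sample_output = (
    "Created objects.\n"
    "Created web function web => https://example--reflector-mixdown-web.modal.run\n"
)
print(first_modal_url(sample_output))
```

A `None` result corresponds to the empty-string check (`[ -z "$MIXDOWN_URL" ]`) that makes the script exit with an error.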


@@ -113,12 +113,14 @@ def download_pyannote_audio():
diarizer_image = (
modal.Image.debian_slim(python_version="3.10")
modal.Image.from_registry(
"nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04", add_python="3.10"
)
.pip_install(
"pyannote.audio==3.1.0",
"requests",
"onnx",
"torchaudio",
"torchaudio==2.0.1",
"onnxruntime-gpu",
"torch==2.0.0",
"transformers==4.34.0",
@@ -133,14 +135,6 @@ diarizer_image = (
secrets=[modal.Secret.from_name("hf_token")],
)
.run_function(migrate_cache_llm)
.env(
{
"LD_LIBRARY_PATH": (
"/usr/local/lib/python3.10/site-packages/nvidia/cudnn/lib/:"
"/opt/conda/lib/python3.10/site-packages/nvidia/cublas/lib/"
)
}
)
)


@@ -0,0 +1,385 @@
"""
Reflector GPU backend - audio mixdown
=====================================
CPU-intensive multi-track audio mixdown service.
Mixes N audio tracks into a single MP3 using PyAV amix filter graph.
IMPORTANT: This mixdown logic is duplicated from server/reflector/utils/audio_mixdown.py
for Modal deployment isolation (Modal can't import from server/reflector/). If you modify
the PyAV filter graph or mixdown algorithm, you MUST update both:
- gpu/modal_deployments/reflector_mixdown.py (this file)
- server/reflector/utils/audio_mixdown.py
Constants duplicated from server/reflector/utils/audio_constants.py for same reason.
"""
import os
import tempfile
from fractions import Fraction
import asyncio
import modal
S3_TIMEOUT = 120 # Higher than padding (60s) — multiple track downloads
MIXDOWN_TIMEOUT = 1200 + (S3_TIMEOUT * 2) # 1440s total
SCALEDOWN_WINDOW = 60
DISCONNECT_CHECK_INTERVAL = 2
app = modal.App("reflector-mixdown")
# CPU-based image (mixdown is CPU-bound, no GPU needed)
image = (
modal.Image.debian_slim(python_version="3.12")
.apt_install("ffmpeg") # Required by PyAV
.pip_install(
"av==13.1.0", # PyAV for audio processing
"requests==2.32.3", # HTTP for presigned URL downloads/uploads
"fastapi==0.115.12", # API framework
)
)
@app.function(
cpu=4.0, # Higher than padding (2.0) for multi-track mixing
timeout=MIXDOWN_TIMEOUT,
scaledown_window=SCALEDOWN_WINDOW,
image=image,
secrets=[modal.Secret.from_name("reflector-gpu")],
)
@modal.asgi_app()
def web():
from fastapi import Depends, FastAPI, HTTPException, Request, status
from fastapi.security import OAuth2PasswordBearer
from pydantic import BaseModel
class MixdownRequest(BaseModel):
track_urls: list[str]
output_url: str
target_sample_rate: int | None = None
offsets_seconds: list[float] | None = None
class MixdownResponse(BaseModel):
size: int
duration_ms: float = 0.0
cancelled: bool = False
web_app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
def apikey_auth(apikey: str = Depends(oauth2_scheme)):
if apikey == os.environ["REFLECTOR_GPU_APIKEY"]:
return
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid API key",
headers={"WWW-Authenticate": "Bearer"},
)
@web_app.post("/mixdown", dependencies=[Depends(apikey_auth)])
async def mixdown_endpoint(request: Request, req: MixdownRequest) -> MixdownResponse:
"""Modal web endpoint for mixing audio tracks with disconnect detection."""
import logging
import threading
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)
valid_urls = [u for u in req.track_urls if u]
if not valid_urls:
raise HTTPException(status_code=400, detail="No valid track URLs provided")
if req.offsets_seconds is not None:
if len(req.offsets_seconds) != len(req.track_urls):
raise HTTPException(
status_code=400,
detail=f"offsets_seconds length ({len(req.offsets_seconds)}) "
f"must match track_urls ({len(req.track_urls)})",
)
if any(o > 18000 for o in req.offsets_seconds):
raise HTTPException(status_code=400, detail="offsets_seconds exceeds maximum 18000s (5 hours)")
if not req.output_url:
raise HTTPException(status_code=400, detail="output_url cannot be empty")
logger.info(f"Mixdown request: {len(valid_urls)} tracks")
# Thread-safe cancellation flag
cancelled = threading.Event()
async def check_disconnect():
"""Background task to check for client disconnect."""
while not cancelled.is_set():
await asyncio.sleep(DISCONNECT_CHECK_INTERVAL)
if await request.is_disconnected():
logger.warning("Client disconnected, setting cancellation flag")
cancelled.set()
break
disconnect_task = asyncio.create_task(check_disconnect())
try:
result = await asyncio.get_event_loop().run_in_executor(
None, _mixdown_tracks_blocking, req, cancelled, logger
)
return MixdownResponse(**result)
finally:
cancelled.set()
disconnect_task.cancel()
try:
await disconnect_task
except asyncio.CancelledError:
pass
def _mixdown_tracks_blocking(req, cancelled, logger) -> dict:
"""Blocking CPU-bound mixdown work with periodic cancellation checks.
Downloads all tracks, builds PyAV amix filter graph, encodes to MP3,
and uploads the result to the presigned output URL.
"""
import av
import requests
from av.audio.resampler import AudioResampler
import time
temp_dir = tempfile.mkdtemp()
track_paths = []
output_path = None
last_check = time.time()
try:
# --- Download all tracks ---
valid_urls = [u for u in req.track_urls if u]
for i, url in enumerate(valid_urls):
if cancelled.is_set():
logger.info("Cancelled during download phase")
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
logger.info(f"Downloading track {i}")
response = requests.get(url, stream=True, timeout=S3_TIMEOUT)
response.raise_for_status()
track_path = os.path.join(temp_dir, f"track_{i}.webm")
total_bytes = 0
chunk_count = 0
with open(track_path, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
total_bytes += len(chunk)
chunk_count += 1
if chunk_count % 12 == 0:
now = time.time()
if now - last_check >= DISCONNECT_CHECK_INTERVAL:
if cancelled.is_set():
logger.info(f"Cancelled during track {i} download")
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
last_check = now
track_paths.append(track_path)
logger.info(f"Track {i} downloaded: {total_bytes} bytes")
if not track_paths:
raise ValueError("No tracks downloaded")
# --- Detect sample rate ---
target_sample_rate = req.target_sample_rate
if target_sample_rate is None:
for path in track_paths:
try:
container = av.open(path)
for frame in container.decode(audio=0):
target_sample_rate = frame.sample_rate
container.close()
break
else:
container.close()
continue
break
except Exception:
continue
if target_sample_rate is None:
raise ValueError("Could not detect sample rate from any track")
logger.info(f"Target sample rate: {target_sample_rate}")
# --- Calculate per-input delays ---
input_offsets_seconds = None
if req.offsets_seconds is not None:
input_offsets_seconds = [
req.offsets_seconds[i] for i, url in enumerate(req.track_urls) if url
]
delays_ms = []
if input_offsets_seconds is not None:
base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
delays_ms = [max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds]
else:
delays_ms = [0 for _ in track_paths]
# --- Build filter graph ---
# N abuffer -> optional adelay -> amix -> aformat -> abuffersink
graph = av.filter.Graph()
inputs = []
for idx in range(len(track_paths)):
args = (
f"time_base=1/{target_sample_rate}:"
f"sample_rate={target_sample_rate}:"
f"sample_fmt=s32:"
f"channel_layout=stereo"
)
in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
inputs.append(in_ctx)
mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")
fmt = graph.add(
"aformat",
args=f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}",
name="fmt",
)
sink = graph.add("abuffersink", name="out")
for idx, in_ctx in enumerate(inputs):
delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
if delay_ms > 0:
adelay = graph.add(
"adelay",
args=f"delays={delay_ms}|{delay_ms}:all=1",
name=f"delay{idx}",
)
in_ctx.link_to(adelay)
adelay.link_to(mixer, 0, idx)
else:
in_ctx.link_to(mixer, 0, idx)
mixer.link_to(fmt)
fmt.link_to(sink)
graph.configure()
# --- Open all containers and decode ---
containers = []
output_path = os.path.join(temp_dir, "mixed.mp3")
try:
for path in track_paths:
containers.append(av.open(path))
decoders = [c.decode(audio=0) for c in containers]
active = [True] * len(decoders)
resamplers = [
AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
for _ in decoders
]
# Open output MP3
out_container = av.open(output_path, "w", format="mp3")
out_stream = out_container.add_stream("libmp3lame", rate=target_sample_rate)
total_duration = 0
while any(active):
# Check cancellation periodically
now = time.time()
if now - last_check >= DISCONNECT_CHECK_INTERVAL:
if cancelled.is_set():
logger.info("Cancelled during mixing")
out_container.close()
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
last_check = now
for i, (dec, is_active) in enumerate(zip(decoders, active)):
if not is_active:
continue
try:
frame = next(dec)
except StopIteration:
active[i] = False
inputs[i].push(None)
continue
if frame.sample_rate != target_sample_rate:
continue
out_frames = resamplers[i].resample(frame) or []
for rf in out_frames:
rf.sample_rate = target_sample_rate
rf.time_base = Fraction(1, target_sample_rate)
inputs[i].push(rf)
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush filter graph
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush encoder
for packet in out_stream.encode(None):
out_container.mux(packet)
total_duration += packet.duration
# Calculate duration in ms
last_tb = out_stream.time_base
duration_ms = 0.0
if last_tb and total_duration > 0:
duration_ms = round(float(total_duration * last_tb * 1000), 2)
out_container.close()
finally:
for c in containers:
try:
c.close()
except Exception:
pass
file_size = os.path.getsize(output_path)
logger.info(f"Mixdown complete: {file_size} bytes, {duration_ms}ms")
if cancelled.is_set():
logger.info("Cancelled after mixing, before upload")
return {"size": 0, "duration_ms": 0.0, "cancelled": True}
# --- Upload result ---
logger.info("Uploading mixed audio to S3")
with open(output_path, "rb") as f:
upload_response = requests.put(req.output_url, data=f, timeout=S3_TIMEOUT)
upload_response.raise_for_status()
logger.info(f"Upload complete: {file_size} bytes")
return {"size": file_size, "duration_ms": duration_ms}
finally:
# Cleanup all temp files
for path in track_paths:
if os.path.exists(path):
try:
os.unlink(path)
except Exception as e:
logger.warning(f"Failed to cleanup track file: {e}")
if output_path and os.path.exists(output_path):
try:
os.unlink(output_path)
except Exception as e:
logger.warning(f"Failed to cleanup output file: {e}")
try:
os.rmdir(temp_dir)
except Exception as e:
logger.warning(f"Failed to cleanup temp directory: {e}")
return web_app
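The per-input delay step in the file above can be read on its own as follows. This is a standalone sketch, not the module's API: the function name `compute_delays_ms` is an assumption, but the arithmetic mirrors the diff. Offsets belonging to empty URLs are dropped, the smallest remaining offset becomes the base, and each track is delayed by its distance from that base in whole milliseconds.

```python
from typing import List, Optional


def compute_delays_ms(
    track_urls: List[str], offsets_seconds: Optional[List[float]]
) -> List[int]:
    """Return one adelay value (ms) per non-empty track URL."""
    valid_count = sum(1 for url in track_urls if url)
    if offsets_seconds is None:
        # No offsets supplied: every track starts at 0 ms.
        return [0] * valid_count
    # Keep only the offsets whose track URL is non-empty.
    kept = [offsets_seconds[i] for i, url in enumerate(track_urls) if url]
    base = min(kept) if kept else 0.0
    # Delay each track relative to the earliest one, never negative.
    return [max(0, int(round((offset - base) * 1000))) for offset in kept]


print(compute_delays_ms(["a", "", "c"], [2.5, 9.0, 4.0]))  # [0, 1500]
```

Because delays are relative to the earliest track, the mixed file starts at the first participant's audio instead of carrying leading silence for the shared offset.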


@@ -52,10 +52,12 @@ OPUS_DEFAULT_BIT_RATE = 128000
timeout=PADDING_TIMEOUT,
scaledown_window=SCALEDOWN_WINDOW,
image=image,
secrets=[modal.Secret.from_name("reflector-gpu")],
)
@modal.asgi_app()
def web():
from fastapi import FastAPI, Request, HTTPException
from fastapi import Depends, FastAPI, HTTPException, Request, status
from fastapi.security import OAuth2PasswordBearer
from pydantic import BaseModel
class PaddingRequest(BaseModel):
@@ -70,7 +72,18 @@ def web():
web_app = FastAPI()
@web_app.post("/pad")
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
def apikey_auth(apikey: str = Depends(oauth2_scheme)):
if apikey == os.environ["REFLECTOR_GPU_APIKEY"]:
return
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid API key",
headers={"WWW-Authenticate": "Bearer"},
)
@web_app.post("/pad", dependencies=[Depends(apikey_auth)])
async def pad_track_endpoint(request: Request, req: PaddingRequest) -> PaddingResponse:
"""Modal web endpoint for padding audio tracks with disconnect detection.
"""


@@ -42,6 +42,7 @@ COPY pyproject.toml uv.lock /app/
COPY ./app /app/app
COPY ./main.py /app/
COPY ./runserver.sh /app/
COPY ./docker-entrypoint.sh /app/
# prevent uv failing with too many open files on big cpus
ENV UV_CONCURRENT_INSTALLS=16
@@ -52,6 +53,8 @@ RUN --mount=type=cache,target=/root/.cache/uv \
EXPOSE 8000
CMD ["sh", "/app/runserver.sh"]
RUN chmod +x /app/docker-entrypoint.sh
CMD ["sh", "/app/docker-entrypoint.sh"]


@@ -26,6 +26,7 @@ COPY pyproject.toml uv.lock /app/
COPY ./app /app/app
COPY ./main.py /app/
COPY ./runserver.sh /app/
COPY ./docker-entrypoint.sh /app/
# prevent uv failing with too many open files on big cpus
ENV UV_CONCURRENT_INSTALLS=16
@@ -36,4 +37,6 @@ RUN --mount=type=cache,target=/root/.cache/uv \
EXPOSE 8000
CMD ["sh", "/app/runserver.sh"]
RUN chmod +x /app/docker-entrypoint.sh
CMD ["sh", "/app/docker-entrypoint.sh"]


@@ -3,6 +3,7 @@ from contextlib import asynccontextmanager
from fastapi import FastAPI
from .routers.diarization import router as diarization_router
from .routers.mixdown import router as mixdown_router
from .routers.padding import router as padding_router
from .routers.transcription import router as transcription_router
from .routers.translation import router as translation_router
@@ -29,4 +30,5 @@ def create_app() -> FastAPI:
app.include_router(translation_router)
app.include_router(diarization_router)
app.include_router(padding_router)
app.include_router(mixdown_router)
return app


@@ -0,0 +1,288 @@
"""
Audio mixdown endpoint for selfhosted GPU service.
CPU-intensive multi-track audio mixing service for combining N audio tracks
into a single MP3 using PyAV amix filter graph.
IMPORTANT: This mixdown logic is duplicated from server/reflector/utils/audio_mixdown.py
for deployment isolation (self_hosted can't import from server/reflector/). If you modify
the PyAV filter graph or mixdown algorithm, you MUST update both:
- gpu/self_hosted/app/routers/mixdown.py (this file)
- server/reflector/utils/audio_mixdown.py
Constants duplicated from server/reflector/utils/audio_constants.py for same reason.
"""
import logging
import os
import tempfile
from fractions import Fraction
import av
import requests
from av.audio.resampler import AudioResampler
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from ..auth import apikey_auth
logger = logging.getLogger(__name__)
router = APIRouter(tags=["mixdown"])
S3_TIMEOUT = 120
class MixdownRequest(BaseModel):
track_urls: list[str]
output_url: str
target_sample_rate: int | None = None
offsets_seconds: list[float] | None = None
class MixdownResponse(BaseModel):
size: int
duration_ms: float = 0.0
cancelled: bool = False
@router.post("/mixdown", dependencies=[Depends(apikey_auth)], response_model=MixdownResponse)
def mixdown_tracks(req: MixdownRequest):
"""Mix multiple audio tracks into single MP3 using PyAV amix filter graph."""
valid_urls = [u for u in req.track_urls if u]
if not valid_urls:
raise HTTPException(status_code=400, detail="No valid track URLs provided")
if req.offsets_seconds is not None:
if len(req.offsets_seconds) != len(req.track_urls):
raise HTTPException(
status_code=400,
detail=f"offsets_seconds length ({len(req.offsets_seconds)}) "
f"must match track_urls ({len(req.track_urls)})",
)
if any(o > 18000 for o in req.offsets_seconds):
raise HTTPException(
status_code=400, detail="offsets_seconds exceeds maximum 18000s (5 hours)"
)
if not req.output_url:
raise HTTPException(status_code=400, detail="output_url cannot be empty")
logger.info("Mixdown request: %d tracks", len(valid_urls))
temp_dir = tempfile.mkdtemp()
track_paths = []
output_path = None
try:
# --- Download all tracks ---
for i, url in enumerate(valid_urls):
logger.info("Downloading track %d", i)
response = requests.get(url, stream=True, timeout=S3_TIMEOUT)
response.raise_for_status()
track_path = os.path.join(temp_dir, f"track_{i}.webm")
total_bytes = 0
with open(track_path, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
total_bytes += len(chunk)
track_paths.append(track_path)
logger.info("Track %d downloaded: %d bytes", i, total_bytes)
if not track_paths:
raise HTTPException(status_code=400, detail="No tracks could be downloaded")
# --- Detect sample rate ---
target_sample_rate = req.target_sample_rate
if target_sample_rate is None:
for path in track_paths:
try:
container = av.open(path)
for frame in container.decode(audio=0):
target_sample_rate = frame.sample_rate
container.close()
break
else:
container.close()
continue
break
except Exception:
continue
if target_sample_rate is None:
raise HTTPException(
status_code=400, detail="Could not detect sample rate from any track"
)
logger.info("Target sample rate: %d", target_sample_rate)
# --- Calculate per-input delays ---
input_offsets_seconds = None
if req.offsets_seconds is not None:
input_offsets_seconds = [
req.offsets_seconds[i] for i, url in enumerate(req.track_urls) if url
]
delays_ms = []
if input_offsets_seconds is not None:
base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
delays_ms = [max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds]
else:
delays_ms = [0 for _ in track_paths]
# --- Build filter graph ---
# N abuffer -> optional adelay -> amix -> aformat -> abuffersink
graph = av.filter.Graph()
inputs = []
for idx in range(len(track_paths)):
args = (
f"time_base=1/{target_sample_rate}:"
f"sample_rate={target_sample_rate}:"
f"sample_fmt=s32:"
f"channel_layout=stereo"
)
in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
inputs.append(in_ctx)
mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")
fmt = graph.add(
"aformat",
args=f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}",
name="fmt",
)
sink = graph.add("abuffersink", name="out")
for idx, in_ctx in enumerate(inputs):
delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
if delay_ms > 0:
adelay = graph.add(
"adelay",
args=f"delays={delay_ms}|{delay_ms}:all=1",
name=f"delay{idx}",
)
in_ctx.link_to(adelay)
adelay.link_to(mixer, 0, idx)
else:
in_ctx.link_to(mixer, 0, idx)
mixer.link_to(fmt)
fmt.link_to(sink)
graph.configure()
# --- Open all containers and decode ---
containers = []
output_path = os.path.join(temp_dir, "mixed.mp3")
try:
for path in track_paths:
containers.append(av.open(path))
decoders = [c.decode(audio=0) for c in containers]
active = [True] * len(decoders)
resamplers = [
AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
for _ in decoders
]
# Open output MP3
out_container = av.open(output_path, "w", format="mp3")
out_stream = out_container.add_stream("libmp3lame", rate=target_sample_rate)
total_duration = 0
while any(active):
for i, (dec, is_active) in enumerate(zip(decoders, active)):
if not is_active:
continue
try:
frame = next(dec)
except StopIteration:
active[i] = False
inputs[i].push(None)
continue
if frame.sample_rate != target_sample_rate:
continue
out_frames = resamplers[i].resample(frame) or []
for rf in out_frames:
rf.sample_rate = target_sample_rate
rf.time_base = Fraction(1, target_sample_rate)
inputs[i].push(rf)
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush filter graph
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
for packet in out_stream.encode(mixed):
out_container.mux(packet)
total_duration += packet.duration
# Flush encoder
for packet in out_stream.encode(None):
out_container.mux(packet)
total_duration += packet.duration
# Calculate duration in ms
last_tb = out_stream.time_base
duration_ms = 0.0
if last_tb and total_duration > 0:
duration_ms = round(float(total_duration * last_tb * 1000), 2)
out_container.close()
finally:
for c in containers:
try:
c.close()
except Exception:
pass
file_size = os.path.getsize(output_path)
logger.info("Mixdown complete: %d bytes, %.2fms", file_size, duration_ms)
# --- Upload result ---
logger.info("Uploading mixed audio to S3")
with open(output_path, "rb") as f:
upload_response = requests.put(req.output_url, data=f, timeout=S3_TIMEOUT)
upload_response.raise_for_status()
logger.info("Upload complete: %d bytes", file_size)
return MixdownResponse(size=file_size, duration_ms=duration_ms)
except HTTPException:
raise
except Exception as e:
logger.error("Mixdown failed: %s", e, exc_info=True)
raise HTTPException(status_code=500, detail=f"Mixdown failed: {e}") from e
finally:
for path in track_paths:
if os.path.exists(path):
try:
os.unlink(path)
except Exception as e:
logger.warning("Failed to cleanup track file: %s", e)
if output_path and os.path.exists(output_path):
try:
os.unlink(output_path)
except Exception as e:
logger.warning("Failed to cleanup output file: %s", e)
try:
os.rmdir(temp_dir)
except Exception as e:
logger.warning("Failed to cleanup temp directory: %s", e)
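For orientation, a caller of the `/mixdown` route might build its request roughly as sketched below. This assumes the self-hosted `apikey_auth` dependency accepts an `Authorization: Bearer <key>` header like the Modal deployment does; that module is not shown in this diff. The function name `build_mixdown_request`, the base URL, and the presigned URLs are hypothetical. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request
from typing import List, Optional


def build_mixdown_request(
    base_url: str,
    api_key: str,
    track_urls: List[str],
    output_url: str,
    offsets_seconds: Optional[List[float]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /mixdown request."""
    payload = {"track_urls": track_urls, "output_url": output_url}
    if offsets_seconds is not None:
        # The endpoint rejects mismatched lengths with HTTP 400,
        # so fail early on the client side as well.
        if len(offsets_seconds) != len(track_urls):
            raise ValueError("offsets_seconds must match track_urls in length")
        payload["offsets_seconds"] = offsets_seconds
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mixdown",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All URLs and the key below are placeholders.
request = build_mixdown_request(
    "https://gpu.example.invalid/",
    "secret",
    ["https://s3.example.invalid/a.webm", "https://s3.example.invalid/b.webm"],
    "https://s3.example.invalid/mixed.mp3",
    offsets_seconds=[0.0, 1.25],
)
print(request.get_method(), request.full_url)
```

Sending it with `urllib.request.urlopen(request, timeout=...)` would need a timeout long enough to cover download, mixing, and upload, since the handler runs synchronously.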


@@ -0,0 +1,23 @@
#!/bin/sh
set -e
# Custom CA certificate injection
# If a CA cert is mounted at this path (via docker-compose.ca.yml),
# add it to the system trust store and configure all Python SSL libraries.
CUSTOM_CA_PATH="/usr/local/share/ca-certificates/custom-ca.crt"
if [ -s "$CUSTOM_CA_PATH" ]; then
echo "[entrypoint] Custom CA certificate detected, updating trust store..."
update-ca-certificates 2>/dev/null
# update-ca-certificates creates a combined bundle (system + custom CAs)
COMBINED_BUNDLE="/etc/ssl/certs/ca-certificates.crt"
export SSL_CERT_FILE="$COMBINED_BUNDLE"
export REQUESTS_CA_BUNDLE="$COMBINED_BUNDLE"
export CURL_CA_BUNDLE="$COMBINED_BUNDLE"
# Note: GRPC_DEFAULT_SSL_ROOTS_FILE_PATH is intentionally NOT set here.
# Setting it causes grpcio to attempt TLS on connections that may be plaintext.
echo "[entrypoint] CA trust store updated (SSL_CERT_FILE=$COMBINED_BUNDLE)"
fi
exec sh /app/runserver.sh
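The decision the entrypoint makes can be sketched in Python as below, mainly to show which variables end up pointing at the combined bundle. The function name `ca_env_overrides` is an assumption, and the sketch leaves out the `update-ca-certificates` call. It approximates the shell test `[ -s "$CUSTOM_CA_PATH" ]` as "is a regular file with size greater than zero".

```python
import os
import tempfile

COMBINED_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"
CA_ENV_VARS = ("SSL_CERT_FILE", "REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE")


def ca_env_overrides(custom_ca_path: str, bundle: str = COMBINED_BUNDLE) -> dict:
    """Return the env vars to export when a custom CA file is mounted."""
    has_custom_ca = (
        os.path.isfile(custom_ca_path) and os.path.getsize(custom_ca_path) > 0
    )
    if not has_custom_ca:
        # Nothing mounted (or an empty placeholder): leave the env untouched.
        return {}
    return {name: bundle for name in CA_ENV_VARS}


# Demonstrate both branches with temporary files.
with tempfile.TemporaryDirectory() as tmp:
    mounted = os.path.join(tmp, "custom-ca.crt")
    with open(mounted, "w") as f:
        f.write("-----BEGIN CERTIFICATE-----\n")
    empty = os.path.join(tmp, "empty.crt")
    open(empty, "w").close()

    with_ca = ca_env_overrides(mounted)
    without_ca = ca_env_overrides(empty)

print(sorted(with_ca), without_ca)
```

Setting all three variables covers Python's `ssl` defaults (`SSL_CERT_FILE`) as well as `requests`, which consults `REQUESTS_CA_BUNDLE` and then `CURL_CA_BUNDLE`.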

gpu/self_hosted/uv.lock (generated)

@@ -13,7 +13,7 @@ wheels = [
[[package]]
name = "aiohttp"
version = "3.13.3"
version = "3.13.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiohappyeyeballs" },
@@ -24,76 +24,76 @@ dependencies = [
{ name = "propcache" },
{ name = "yarl" },
]
sdist = { url = "https://files.pythonhosted.org/packages/50/42/32cf8e7704ceb4481406eb87161349abb46a57fee3f008ba9cb610968646/aiohttp-3.13.3.tar.gz", hash = "sha256:a949eee43d3782f2daae4f4a2819b2cb9b0c5d3b7f7a927067cc84dafdbb9f88", size = 7844556, upload-time = "2026-01-03T17:33:05.204Z" }
sdist = { url = "https://files.pythonhosted.org/packages/45/4a/064321452809dae953c1ed6e017504e72551a26b6f5708a5a80e4bf556ff/aiohttp-3.13.4.tar.gz", hash = "sha256:d97a6d09c66087890c2ab5d49069e1e570583f7ac0314ecf98294c1b6aaebd38", size = 7859748, upload-time = "2026-03-28T17:19:40.6Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a0/be/4fc11f202955a69e0db803a12a062b8379c970c7c84f4882b6da17337cc1/aiohttp-3.13.3-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:b903a4dfee7d347e2d87697d0713be59e0b87925be030c9178c5faa58ea58d5c", size = 739732, upload-time = "2026-01-03T17:30:14.23Z" },
{ url = "https://files.pythonhosted.org/packages/97/2c/621d5b851f94fa0bb7430d6089b3aa970a9d9b75196bc93bb624b0db237a/aiohttp-3.13.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:a45530014d7a1e09f4a55f4f43097ba0fd155089372e105e4bff4ca76cb1b168", size = 494293, upload-time = "2026-01-03T17:30:15.96Z" },
{ url = "https://files.pythonhosted.org/packages/5d/43/4be01406b78e1be8320bb8316dc9c42dbab553d281c40364e0f862d5661c/aiohttp-3.13.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:27234ef6d85c914f9efeb77ff616dbf4ad2380be0cda40b4db086ffc7ddd1b7d", size = 493533, upload-time = "2026-01-03T17:30:17.431Z" },
{ url = "https://files.pythonhosted.org/packages/8d/a8/5a35dc56a06a2c90d4742cbf35294396907027f80eea696637945a106f25/aiohttp-3.13.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d32764c6c9aafb7fb55366a224756387cd50bfa720f32b88e0e6fa45b27dcf29", size = 1737839, upload-time = "2026-01-03T17:30:19.422Z" },
{ url = "https://files.pythonhosted.org/packages/bf/62/4b9eeb331da56530bf2e198a297e5303e1c1ebdceeb00fe9b568a65c5a0c/aiohttp-3.13.3-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:b1a6102b4d3ebc07dad44fbf07b45bb600300f15b552ddf1851b5390202ea2e3", size = 1703932, upload-time = "2026-01-03T17:30:21.756Z" },
{ url = "https://files.pythonhosted.org/packages/7c/f6/af16887b5d419e6a367095994c0b1332d154f647e7dc2bd50e61876e8e3d/aiohttp-3.13.3-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c014c7ea7fb775dd015b2d3137378b7be0249a448a1612268b5a90c2d81de04d", size = 1771906, upload-time = "2026-01-03T17:30:23.932Z" },
{ url = "https://files.pythonhosted.org/packages/ce/83/397c634b1bcc24292fa1e0c7822800f9f6569e32934bdeef09dae7992dfb/aiohttp-3.13.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2b8d8ddba8f95ba17582226f80e2de99c7a7948e66490ef8d947e272a93e9463", size = 1871020, upload-time = "2026-01-03T17:30:26Z" },
{ url = "https://files.pythonhosted.org/packages/86/f6/a62cbbf13f0ac80a70f71b1672feba90fdb21fd7abd8dbf25c0105fb6fa3/aiohttp-3.13.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9ae8dd55c8e6c4257eae3a20fd2c8f41edaea5992ed67156642493b8daf3cecc", size = 1755181, upload-time = "2026-01-03T17:30:27.554Z" },
{ url = "https://files.pythonhosted.org/packages/0a/87/20a35ad487efdd3fba93d5843efdfaa62d2f1479eaafa7453398a44faf13/aiohttp-3.13.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:01ad2529d4b5035578f5081606a465f3b814c542882804e2e8cda61adf5c71bf", size = 1561794, upload-time = "2026-01-03T17:30:29.254Z" },
{ url = "https://files.pythonhosted.org/packages/de/95/8fd69a66682012f6716e1bc09ef8a1a2a91922c5725cb904689f112309c4/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bb4f7475e359992b580559e008c598091c45b5088f28614e855e42d39c2f1033", size = 1697900, upload-time = "2026-01-03T17:30:31.033Z" },
{ url = "https://files.pythonhosted.org/packages/e5/66/7b94b3b5ba70e955ff597672dad1691333080e37f50280178967aff68657/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:c19b90316ad3b24c69cd78d5c9b4f3aa4497643685901185b65166293d36a00f", size = 1728239, upload-time = "2026-01-03T17:30:32.703Z" },
{ url = "https://files.pythonhosted.org/packages/47/71/6f72f77f9f7d74719692ab65a2a0252584bf8d5f301e2ecb4c0da734530a/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:96d604498a7c782cb15a51c406acaea70d8c027ee6b90c569baa6e7b93073679", size = 1740527, upload-time = "2026-01-03T17:30:34.695Z" },
{ url = "https://files.pythonhosted.org/packages/fa/b4/75ec16cbbd5c01bdaf4a05b19e103e78d7ce1ef7c80867eb0ace42ff4488/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:084911a532763e9d3dd95adf78a78f4096cd5f58cdc18e6fdbc1b58417a45423", size = 1554489, upload-time = "2026-01-03T17:30:36.864Z" },
{ url = "https://files.pythonhosted.org/packages/52/8f/bc518c0eea29f8406dcf7ed1f96c9b48e3bc3995a96159b3fc11f9e08321/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:7a4a94eb787e606d0a09404b9c38c113d3b099d508021faa615d70a0131907ce", size = 1767852, upload-time = "2026-01-03T17:30:39.433Z" },
{ url = "https://files.pythonhosted.org/packages/9d/f2/a07a75173124f31f11ea6f863dc44e6f09afe2bca45dd4e64979490deab1/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:87797e645d9d8e222e04160ee32aa06bc5c163e8499f24db719e7852ec23093a", size = 1722379, upload-time = "2026-01-03T17:30:41.081Z" },
{ url = "https://files.pythonhosted.org/packages/3c/4a/1a3fee7c21350cac78e5c5cef711bac1b94feca07399f3d406972e2d8fcd/aiohttp-3.13.3-cp312-cp312-win32.whl", hash = "sha256:b04be762396457bef43f3597c991e192ee7da460a4953d7e647ee4b1c28e7046", size = 428253, upload-time = "2026-01-03T17:30:42.644Z" },
{ url = "https://files.pythonhosted.org/packages/d9/b7/76175c7cb4eb73d91ad63c34e29fc4f77c9386bba4a65b53ba8e05ee3c39/aiohttp-3.13.3-cp312-cp312-win_amd64.whl", hash = "sha256:e3531d63d3bdfa7e3ac5e9b27b2dd7ec9df3206a98e0b3445fa906f233264c57", size = 455407, upload-time = "2026-01-03T17:30:44.195Z" },
{ url = "https://files.pythonhosted.org/packages/97/8a/12ca489246ca1faaf5432844adbfce7ff2cc4997733e0af120869345643a/aiohttp-3.13.3-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:5dff64413671b0d3e7d5918ea490bdccb97a4ad29b3f311ed423200b2203e01c", size = 734190, upload-time = "2026-01-03T17:30:45.832Z" },
{ url = "https://files.pythonhosted.org/packages/32/08/de43984c74ed1fca5c014808963cc83cb00d7bb06af228f132d33862ca76/aiohttp-3.13.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:87b9aab6d6ed88235aa2970294f496ff1a1f9adcd724d800e9b952395a80ffd9", size = 491783, upload-time = "2026-01-03T17:30:47.466Z" },
{ url = "https://files.pythonhosted.org/packages/17/f8/8dd2cf6112a5a76f81f81a5130c57ca829d101ad583ce57f889179accdda/aiohttp-3.13.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:425c126c0dc43861e22cb1c14ba4c8e45d09516d0a3ae0a3f7494b79f5f233a3", size = 490704, upload-time = "2026-01-03T17:30:49.373Z" },
{ url = "https://files.pythonhosted.org/packages/6d/40/a46b03ca03936f832bc7eaa47cfbb1ad012ba1be4790122ee4f4f8cba074/aiohttp-3.13.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7f9120f7093c2a32d9647abcaf21e6ad275b4fbec5b55969f978b1a97c7c86bf", size = 1720652, upload-time = "2026-01-03T17:30:50.974Z" },
{ url = "https://files.pythonhosted.org/packages/f7/7e/917fe18e3607af92657e4285498f500dca797ff8c918bd7d90b05abf6c2a/aiohttp-3.13.3-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:697753042d57f4bf7122cab985bf15d0cef23c770864580f5af4f52023a56bd6", size = 1692014, upload-time = "2026-01-03T17:30:52.729Z" },
{ url = "https://files.pythonhosted.org/packages/71/b6/cefa4cbc00d315d68973b671cf105b21a609c12b82d52e5d0c9ae61d2a09/aiohttp-3.13.3-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:6de499a1a44e7de70735d0b39f67c8f25eb3d91eb3103be99ca0fa882cdd987d", size = 1759777, upload-time = "2026-01-03T17:30:54.537Z" },
{ url = "https://files.pythonhosted.org/packages/fb/e3/e06ee07b45e59e6d81498b591fc589629be1553abb2a82ce33efe2a7b068/aiohttp-3.13.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:37239e9f9a7ea9ac5bf6b92b0260b01f8a22281996da609206a84df860bc1261", size = 1861276, upload-time = "2026-01-03T17:30:56.512Z" },
{ url = "https://files.pythonhosted.org/packages/7c/24/75d274228acf35ceeb2850b8ce04de9dd7355ff7a0b49d607ee60c29c518/aiohttp-3.13.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f76c1e3fe7d7c8afad7ed193f89a292e1999608170dcc9751a7462a87dfd5bc0", size = 1743131, upload-time = "2026-01-03T17:30:58.256Z" },
{ url = "https://files.pythonhosted.org/packages/04/98/3d21dde21889b17ca2eea54fdcff21b27b93f45b7bb94ca029c31ab59dc3/aiohttp-3.13.3-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:fc290605db2a917f6e81b0e1e0796469871f5af381ce15c604a3c5c7e51cb730", size = 1556863, upload-time = "2026-01-03T17:31:00.445Z" },
{ url = "https://files.pythonhosted.org/packages/9e/84/da0c3ab1192eaf64782b03971ab4055b475d0db07b17eff925e8c93b3aa5/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:4021b51936308aeea0367b8f006dc999ca02bc118a0cc78c303f50a2ff6afb91", size = 1682793, upload-time = "2026-01-03T17:31:03.024Z" },
{ url = "https://files.pythonhosted.org/packages/ff/0f/5802ada182f575afa02cbd0ec5180d7e13a402afb7c2c03a9aa5e5d49060/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:49a03727c1bba9a97d3e93c9f93ca03a57300f484b6e935463099841261195d3", size = 1716676, upload-time = "2026-01-03T17:31:04.842Z" },
{ url = "https://files.pythonhosted.org/packages/3f/8c/714d53bd8b5a4560667f7bbbb06b20c2382f9c7847d198370ec6526af39c/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:3d9908a48eb7416dc1f4524e69f1d32e5d90e3981e4e37eb0aa1cd18f9cfa2a4", size = 1733217, upload-time = "2026-01-03T17:31:06.868Z" },
{ url = "https://files.pythonhosted.org/packages/7d/79/e2176f46d2e963facea939f5be2d26368ce543622be6f00a12844d3c991f/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:2712039939ec963c237286113c68dbad80a82a4281543f3abf766d9d73228998", size = 1552303, upload-time = "2026-01-03T17:31:08.958Z" },
{ url = "https://files.pythonhosted.org/packages/ab/6a/28ed4dea1759916090587d1fe57087b03e6c784a642b85ef48217b0277ae/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:7bfdc049127717581866fa4708791220970ce291c23e28ccf3922c700740fdc0", size = 1763673, upload-time = "2026-01-03T17:31:10.676Z" },
{ url = "https://files.pythonhosted.org/packages/e8/35/4a3daeb8b9fab49240d21c04d50732313295e4bd813a465d840236dd0ce1/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8057c98e0c8472d8846b9c79f56766bcc57e3e8ac7bfd510482332366c56c591", size = 1721120, upload-time = "2026-01-03T17:31:12.575Z" },
{ url = "https://files.pythonhosted.org/packages/bc/9f/d643bb3c5fb99547323e635e251c609fbbc660d983144cfebec529e09264/aiohttp-3.13.3-cp313-cp313-win32.whl", hash = "sha256:1449ceddcdbcf2e0446957863af03ebaaa03f94c090f945411b61269e2cb5daf", size = 427383, upload-time = "2026-01-03T17:31:14.382Z" },
{ url = "https://files.pythonhosted.org/packages/4e/f1/ab0395f8a79933577cdd996dd2f9aa6014af9535f65dddcf88204682fe62/aiohttp-3.13.3-cp313-cp313-win_amd64.whl", hash = "sha256:693781c45a4033d31d4187d2436f5ac701e7bbfe5df40d917736108c1cc7436e", size = 453899, upload-time = "2026-01-03T17:31:15.958Z" },
{ url = "https://files.pythonhosted.org/packages/99/36/5b6514a9f5d66f4e2597e40dea2e3db271e023eb7a5d22defe96ba560996/aiohttp-3.13.3-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:ea37047c6b367fd4bd632bff8077449b8fa034b69e812a18e0132a00fae6e808", size = 737238, upload-time = "2026-01-03T17:31:17.909Z" },
{ url = "https://files.pythonhosted.org/packages/f7/49/459327f0d5bcd8c6c9ca69e60fdeebc3622861e696490d8674a6d0cb90a6/aiohttp-3.13.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:6fc0e2337d1a4c3e6acafda6a78a39d4c14caea625124817420abceed36e2415", size = 492292, upload-time = "2026-01-03T17:31:19.919Z" },
{ url = "https://files.pythonhosted.org/packages/e8/0b/b97660c5fd05d3495b4eb27f2d0ef18dc1dc4eff7511a9bf371397ff0264/aiohttp-3.13.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c685f2d80bb67ca8c3837823ad76196b3694b0159d232206d1e461d3d434666f", size = 493021, upload-time = "2026-01-03T17:31:21.636Z" },
{ url = "https://files.pythonhosted.org/packages/54/d4/438efabdf74e30aeceb890c3290bbaa449780583b1270b00661126b8aae4/aiohttp-3.13.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:48e377758516d262bde50c2584fc6c578af272559c409eecbdd2bae1601184d6", size = 1717263, upload-time = "2026-01-03T17:31:23.296Z" },
{ url = "https://files.pythonhosted.org/packages/71/f2/7bddc7fd612367d1459c5bcf598a9e8f7092d6580d98de0e057eb42697ad/aiohttp-3.13.3-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:34749271508078b261c4abb1767d42b8d0c0cc9449c73a4df494777dc55f0687", size = 1669107, upload-time = "2026-01-03T17:31:25.334Z" },
{ url = "https://files.pythonhosted.org/packages/00/5a/1aeaecca40e22560f97610a329e0e5efef5e0b5afdf9f857f0d93839ab2e/aiohttp-3.13.3-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:82611aeec80eb144416956ec85b6ca45a64d76429c1ed46ae1b5f86c6e0c9a26", size = 1760196, upload-time = "2026-01-03T17:31:27.394Z" },
{ url = "https://files.pythonhosted.org/packages/f8/f8/0ff6992bea7bd560fc510ea1c815f87eedd745fe035589c71ce05612a19a/aiohttp-3.13.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2fff83cfc93f18f215896e3a190e8e5cb413ce01553901aca925176e7568963a", size = 1843591, upload-time = "2026-01-03T17:31:29.238Z" },
{ url = "https://files.pythonhosted.org/packages/e3/d1/e30e537a15f53485b61f5be525f2157da719819e8377298502aebac45536/aiohttp-3.13.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bbe7d4cecacb439e2e2a8a1a7b935c25b812af7a5fd26503a66dadf428e79ec1", size = 1720277, upload-time = "2026-01-03T17:31:31.053Z" },
{ url = "https://files.pythonhosted.org/packages/84/45/23f4c451d8192f553d38d838831ebbc156907ea6e05557f39563101b7717/aiohttp-3.13.3-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b928f30fe49574253644b1ca44b1b8adbd903aa0da4b9054a6c20fc7f4092a25", size = 1548575, upload-time = "2026-01-03T17:31:32.87Z" },
{ url = "https://files.pythonhosted.org/packages/6a/ed/0a42b127a43712eda7807e7892c083eadfaf8429ca8fb619662a530a3aab/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:7b5e8fe4de30df199155baaf64f2fcd604f4c678ed20910db8e2c66dc4b11603", size = 1679455, upload-time = "2026-01-03T17:31:34.76Z" },
{ url = "https://files.pythonhosted.org/packages/2e/b5/c05f0c2b4b4fe2c9d55e73b6d3ed4fd6c9dc2684b1d81cbdf77e7fad9adb/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:8542f41a62bcc58fc7f11cf7c90e0ec324ce44950003feb70640fc2a9092c32a", size = 1687417, upload-time = "2026-01-03T17:31:36.699Z" },
{ url = "https://files.pythonhosted.org/packages/c9/6b/915bc5dad66aef602b9e459b5a973529304d4e89ca86999d9d75d80cbd0b/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:5e1d8c8b8f1d91cd08d8f4a3c2b067bfca6ec043d3ff36de0f3a715feeedf926", size = 1729968, upload-time = "2026-01-03T17:31:38.622Z" },
{ url = "https://files.pythonhosted.org/packages/11/3b/e84581290a9520024a08640b63d07673057aec5ca548177a82026187ba73/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:90455115e5da1c3c51ab619ac57f877da8fd6d73c05aacd125c5ae9819582aba", size = 1545690, upload-time = "2026-01-03T17:31:40.57Z" },
{ url = "https://files.pythonhosted.org/packages/f5/04/0c3655a566c43fd647c81b895dfe361b9f9ad6d58c19309d45cff52d6c3b/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:042e9e0bcb5fba81886c8b4fbb9a09d6b8a00245fd8d88e4d989c1f96c74164c", size = 1746390, upload-time = "2026-01-03T17:31:42.857Z" },
{ url = "https://files.pythonhosted.org/packages/1f/53/71165b26978f719c3419381514c9690bd5980e764a09440a10bb816ea4ab/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:2eb752b102b12a76ca02dff751a801f028b4ffbbc478840b473597fc91a9ed43", size = 1702188, upload-time = "2026-01-03T17:31:44.984Z" },
{ url = "https://files.pythonhosted.org/packages/29/a7/cbe6c9e8e136314fa1980da388a59d2f35f35395948a08b6747baebb6aa6/aiohttp-3.13.3-cp314-cp314-win32.whl", hash = "sha256:b556c85915d8efaed322bf1bdae9486aa0f3f764195a0fb6ee962e5c71ef5ce1", size = 433126, upload-time = "2026-01-03T17:31:47.463Z" },
{ url = "https://files.pythonhosted.org/packages/de/56/982704adea7d3b16614fc5936014e9af85c0e34b58f9046655817f04306e/aiohttp-3.13.3-cp314-cp314-win_amd64.whl", hash = "sha256:9bf9f7a65e7aa20dd764151fb3d616c81088f91f8df39c3893a536e279b4b984", size = 459128, upload-time = "2026-01-03T17:31:49.2Z" },
{ url = "https://files.pythonhosted.org/packages/6c/2a/3c79b638a9c3d4658d345339d22070241ea341ed4e07b5ac60fb0f418003/aiohttp-3.13.3-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:05861afbbec40650d8a07ea324367cb93e9e8cc7762e04dd4405df99fa65159c", size = 769512, upload-time = "2026-01-03T17:31:51.134Z" },
{ url = "https://files.pythonhosted.org/packages/29/b9/3e5014d46c0ab0db8707e0ac2711ed28c4da0218c358a4e7c17bae0d8722/aiohttp-3.13.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:2fc82186fadc4a8316768d61f3722c230e2c1dcab4200d52d2ebdf2482e47592", size = 506444, upload-time = "2026-01-03T17:31:52.85Z" },
{ url = "https://files.pythonhosted.org/packages/90/03/c1d4ef9a054e151cd7839cdc497f2638f00b93cbe8043983986630d7a80c/aiohttp-3.13.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:0add0900ff220d1d5c5ebbf99ed88b0c1bbf87aa7e4262300ed1376a6b13414f", size = 510798, upload-time = "2026-01-03T17:31:54.91Z" },
{ url = "https://files.pythonhosted.org/packages/ea/76/8c1e5abbfe8e127c893fe7ead569148a4d5a799f7cf958d8c09f3eedf097/aiohttp-3.13.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:568f416a4072fbfae453dcf9a99194bbb8bdeab718e08ee13dfa2ba0e4bebf29", size = 1868835, upload-time = "2026-01-03T17:31:56.733Z" },
{ url = "https://files.pythonhosted.org/packages/8e/ac/984c5a6f74c363b01ff97adc96a3976d9c98940b8969a1881575b279ac5d/aiohttp-3.13.3-cp314-cp314t-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:add1da70de90a2569c5e15249ff76a631ccacfe198375eead4aadf3b8dc849dc", size = 1720486, upload-time = "2026-01-03T17:31:58.65Z" },
{ url = "https://files.pythonhosted.org/packages/b2/9a/b7039c5f099c4eb632138728828b33428585031a1e658d693d41d07d89d1/aiohttp-3.13.3-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:10b47b7ba335d2e9b1239fa571131a87e2d8ec96b333e68b2a305e7a98b0bae2", size = 1847951, upload-time = "2026-01-03T17:32:00.989Z" },
{ url = "https://files.pythonhosted.org/packages/3c/02/3bec2b9a1ba3c19ff89a43a19324202b8eb187ca1e928d8bdac9bbdddebd/aiohttp-3.13.3-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:3dd4dce1c718e38081c8f35f323209d4c1df7d4db4bab1b5c88a6b4d12b74587", size = 1941001, upload-time = "2026-01-03T17:32:03.122Z" },
{ url = "https://files.pythonhosted.org/packages/37/df/d879401cedeef27ac4717f6426c8c36c3091c6e9f08a9178cc87549c537f/aiohttp-3.13.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:34bac00a67a812570d4a460447e1e9e06fae622946955f939051e7cc895cfab8", size = 1797246, upload-time = "2026-01-03T17:32:05.255Z" },
{ url = "https://files.pythonhosted.org/packages/8d/15/be122de1f67e6953add23335c8ece6d314ab67c8bebb3f181063010795a7/aiohttp-3.13.3-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:a19884d2ee70b06d9204b2727a7b9f983d0c684c650254679e716b0b77920632", size = 1627131, upload-time = "2026-01-03T17:32:07.607Z" },
{ url = "https://files.pythonhosted.org/packages/12/12/70eedcac9134cfa3219ab7af31ea56bc877395b1ac30d65b1bc4b27d0438/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:5f8ca7f2bb6ba8348a3614c7918cc4bb73268c5ac2a207576b7afea19d3d9f64", size = 1795196, upload-time = "2026-01-03T17:32:09.59Z" },
{ url = "https://files.pythonhosted.org/packages/32/11/b30e1b1cd1f3054af86ebe60df96989c6a414dd87e27ad16950eee420bea/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:b0d95340658b9d2f11d9697f59b3814a9d3bb4b7a7c20b131df4bcef464037c0", size = 1782841, upload-time = "2026-01-03T17:32:11.445Z" },
{ url = "https://files.pythonhosted.org/packages/88/0d/d98a9367b38912384a17e287850f5695c528cff0f14f791ce8ee2e4f7796/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:a1e53262fd202e4b40b70c3aff944a8155059beedc8a89bba9dc1f9ef06a1b56", size = 1795193, upload-time = "2026-01-03T17:32:13.705Z" },
{ url = "https://files.pythonhosted.org/packages/43/a5/a2dfd1f5ff5581632c7f6a30e1744deda03808974f94f6534241ef60c751/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:d60ac9663f44168038586cab2157e122e46bdef09e9368b37f2d82d354c23f72", size = 1621979, upload-time = "2026-01-03T17:32:15.965Z" },
{ url = "https://files.pythonhosted.org/packages/fa/f0/12973c382ae7c1cccbc4417e129c5bf54c374dfb85af70893646e1f0e749/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:90751b8eed69435bac9ff4e3d2f6b3af1f57e37ecb0fbeee59c0174c9e2d41df", size = 1822193, upload-time = "2026-01-03T17:32:18.219Z" },
{ url = "https://files.pythonhosted.org/packages/3c/5f/24155e30ba7f8c96918af1350eb0663e2430aad9e001c0489d89cd708ab1/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:fc353029f176fd2b3ec6cfc71be166aba1936fe5d73dd1992ce289ca6647a9aa", size = 1769801, upload-time = "2026-01-03T17:32:20.25Z" },
{ url = "https://files.pythonhosted.org/packages/eb/f8/7314031ff5c10e6ece114da79b338ec17eeff3a079e53151f7e9f43c4723/aiohttp-3.13.3-cp314-cp314t-win32.whl", hash = "sha256:2e41b18a58da1e474a057b3d35248d8320029f61d70a37629535b16a0c8f3767", size = 466523, upload-time = "2026-01-03T17:32:22.215Z" },
{ url = "https://files.pythonhosted.org/packages/b4/63/278a98c715ae467624eafe375542d8ba9b4383a016df8fdefe0ae28382a7/aiohttp-3.13.3-cp314-cp314t-win_amd64.whl", hash = "sha256:44531a36aa2264a1860089ffd4dce7baf875ee5a6079d5fb42e261c704ef7344", size = 499694, upload-time = "2026-01-03T17:32:24.546Z" },
{ url = "https://files.pythonhosted.org/packages/1e/bd/ede278648914cabbabfdf95e436679b5d4156e417896a9b9f4587169e376/aiohttp-3.13.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:ee62d4471ce86b108b19c3364db4b91180d13fe3510144872d6bad5401957360", size = 752158, upload-time = "2026-03-28T17:16:06.901Z" },
{ url = "https://files.pythonhosted.org/packages/90/de/581c053253c07b480b03785196ca5335e3c606a37dc73e95f6527f1591fe/aiohttp-3.13.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:c0fd8f41b54b58636402eb493afd512c23580456f022c1ba2db0f810c959ed0d", size = 501037, upload-time = "2026-03-28T17:16:08.82Z" },
{ url = "https://files.pythonhosted.org/packages/fa/f9/a5ede193c08f13cc42c0a5b50d1e246ecee9115e4cf6e900d8dbd8fd6acb/aiohttp-3.13.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4baa48ce49efd82d6b1a0be12d6a36b35e5594d1dd42f8bfba96ea9f8678b88c", size = 501556, upload-time = "2026-03-28T17:16:10.63Z" },
{ url = "https://files.pythonhosted.org/packages/d6/10/88ff67cd48a6ec36335b63a640abe86135791544863e0cfe1f065d6cef7a/aiohttp-3.13.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d738ebab9f71ee652d9dbd0211057690022201b11197f9a7324fd4dba128aa97", size = 1757314, upload-time = "2026-03-28T17:16:12.498Z" },
{ url = "https://files.pythonhosted.org/packages/8b/15/fdb90a5cf5a1f52845c276e76298c75fbbcc0ac2b4a86551906d54529965/aiohttp-3.13.4-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:0ce692c3468fa831af7dceed52edf51ac348cebfc8d3feb935927b63bd3e8576", size = 1731819, upload-time = "2026-03-28T17:16:14.558Z" },
{ url = "https://files.pythonhosted.org/packages/ec/df/28146785a007f7820416be05d4f28cc207493efd1e8c6c1068e9bdc29198/aiohttp-3.13.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8e08abcfe752a454d2cb89ff0c08f2d1ecd057ae3e8cc6d84638de853530ebab", size = 1793279, upload-time = "2026-03-28T17:16:16.594Z" },
{ url = "https://files.pythonhosted.org/packages/10/47/689c743abf62ea7a77774d5722f220e2c912a77d65d368b884d9779ef41b/aiohttp-3.13.4-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5977f701b3fff36367a11087f30ea73c212e686d41cd363c50c022d48b011d8d", size = 1891082, upload-time = "2026-03-28T17:16:18.71Z" },
{ url = "https://files.pythonhosted.org/packages/b0/b6/f7f4f318c7e58c23b761c9b13b9a3c9b394e0f9d5d76fbc6622fa98509f6/aiohttp-3.13.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:54203e10405c06f8b6020bd1e076ae0fe6c194adcee12a5a78af3ffa3c57025e", size = 1773938, upload-time = "2026-03-28T17:16:21.125Z" },
{ url = "https://files.pythonhosted.org/packages/aa/06/f207cb3121852c989586a6fc16ff854c4fcc8651b86c5d3bd1fc83057650/aiohttp-3.13.4-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:358a6af0145bc4dda037f13167bef3cce54b132087acc4c295c739d05d16b1c3", size = 1579548, upload-time = "2026-03-28T17:16:23.588Z" },
{ url = "https://files.pythonhosted.org/packages/6c/58/e1289661a32161e24c1fe479711d783067210d266842523752869cc1d9c2/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:898ea1850656d7d61832ef06aa9846ab3ddb1621b74f46de78fbc5e1a586ba83", size = 1714669, upload-time = "2026-03-28T17:16:25.713Z" },
{ url = "https://files.pythonhosted.org/packages/96/0a/3e86d039438a74a86e6a948a9119b22540bae037d6ba317a042ae3c22711/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:7bc30cceb710cf6a44e9617e43eebb6e3e43ad855a34da7b4b6a73537d8a6763", size = 1754175, upload-time = "2026-03-28T17:16:28.18Z" },
{ url = "https://files.pythonhosted.org/packages/f4/30/e717fc5df83133ba467a560b6d8ef20197037b4bb5d7075b90037de1018e/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:4a31c0c587a8a038f19a4c7e60654a6c899c9de9174593a13e7cc6e15ff271f9", size = 1762049, upload-time = "2026-03-28T17:16:30.941Z" },
{ url = "https://files.pythonhosted.org/packages/e4/28/8f7a2d4492e336e40005151bdd94baf344880a4707573378579f833a64c1/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:2062f675f3fe6e06d6113eb74a157fb9df58953ffed0cdb4182554b116545758", size = 1570861, upload-time = "2026-03-28T17:16:32.953Z" },
{ url = "https://files.pythonhosted.org/packages/78/45/12e1a3d0645968b1c38de4b23fdf270b8637735ea057d4f84482ff918ad9/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:3d1ba8afb847ff80626d5e408c1fdc99f942acc877d0702fe137015903a220a9", size = 1790003, upload-time = "2026-03-28T17:16:35.468Z" },
{ url = "https://files.pythonhosted.org/packages/eb/0f/60374e18d590de16dcb39d6ff62f39c096c1b958e6f37727b5870026ea30/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:b08149419994cdd4d5eecf7fd4bc5986b5a9380285bcd01ab4c0d6bfca47b79d", size = 1737289, upload-time = "2026-03-28T17:16:38.187Z" },
{ url = "https://files.pythonhosted.org/packages/02/bf/535e58d886cfbc40a8b0013c974afad24ef7632d645bca0b678b70033a60/aiohttp-3.13.4-cp312-cp312-win32.whl", hash = "sha256:fc432f6a2c4f720180959bc19aa37259651c1a4ed8af8afc84dd41c60f15f791", size = 434185, upload-time = "2026-03-28T17:16:40.735Z" },
{ url = "https://files.pythonhosted.org/packages/1e/1a/d92e3325134ebfff6f4069f270d3aac770d63320bd1fcd0eca023e74d9a8/aiohttp-3.13.4-cp312-cp312-win_amd64.whl", hash = "sha256:6148c9ae97a3e8bff9a1fc9c757fa164116f86c100468339730e717590a3fb77", size = 461285, upload-time = "2026-03-28T17:16:42.713Z" },
{ url = "https://files.pythonhosted.org/packages/e3/ac/892f4162df9b115b4758d615f32ec63d00f3084c705ff5526630887b9b42/aiohttp-3.13.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:63dd5e5b1e43b8fb1e91b79b7ceba1feba588b317d1edff385084fcc7a0a4538", size = 745744, upload-time = "2026-03-28T17:16:44.67Z" },
{ url = "https://files.pythonhosted.org/packages/97/a9/c5b87e4443a2f0ea88cb3000c93a8fdad1ee63bffc9ded8d8c8e0d66efc6/aiohttp-3.13.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:746ac3cc00b5baea424dacddea3ec2c2702f9590de27d837aa67004db1eebc6e", size = 498178, upload-time = "2026-03-28T17:16:46.766Z" },
{ url = "https://files.pythonhosted.org/packages/94/42/07e1b543a61250783650df13da8ddcdc0d0a5538b2bd15cef6e042aefc61/aiohttp-3.13.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:bda8f16ea99d6a6705e5946732e48487a448be874e54a4f73d514660ff7c05d3", size = 498331, upload-time = "2026-03-28T17:16:48.9Z" },
{ url = "https://files.pythonhosted.org/packages/20/d6/492f46bf0328534124772d0cf58570acae5b286ea25006900650f69dae0e/aiohttp-3.13.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4b061e7b5f840391e3f64d0ddf672973e45c4cfff7a0feea425ea24e51530fc2", size = 1744414, upload-time = "2026-03-28T17:16:50.968Z" },
{ url = "https://files.pythonhosted.org/packages/e2/4d/e02627b2683f68051246215d2d62b2d2f249ff7a285e7a858dc47d6b6a14/aiohttp-3.13.4-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:b252e8d5cd66184b570d0d010de742736e8a4fab22c58299772b0c5a466d4b21", size = 1719226, upload-time = "2026-03-28T17:16:53.173Z" },
{ url = "https://files.pythonhosted.org/packages/7b/6c/5d0a3394dd2b9f9aeba6e1b6065d0439e4b75d41f1fb09a3ec010b43552b/aiohttp-3.13.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:20af8aad61d1803ff11152a26146d8d81c266aa8c5aa9b4504432abb965c36a0", size = 1782110, upload-time = "2026-03-28T17:16:55.362Z" },
{ url = "https://files.pythonhosted.org/packages/0d/2d/c20791e3437700a7441a7edfb59731150322424f5aadf635602d1d326101/aiohttp-3.13.4-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:13a5cc924b59859ad2adb1478e31f410a7ed46e92a2a619d6d1dd1a63c1a855e", size = 1884809, upload-time = "2026-03-28T17:16:57.734Z" },
{ url = "https://files.pythonhosted.org/packages/c8/94/d99dbfbd1924a87ef643833932eb2a3d9e5eee87656efea7d78058539eff/aiohttp-3.13.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:534913dfb0a644d537aebb4123e7d466d94e3be5549205e6a31f72368980a81a", size = 1764938, upload-time = "2026-03-28T17:17:00.221Z" },
{ url = "https://files.pythonhosted.org/packages/49/61/3ce326a1538781deb89f6cf5e094e2029cd308ed1e21b2ba2278b08426f6/aiohttp-3.13.4-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:320e40192a2dcc1cf4b5576936e9652981ab596bf81eb309535db7e2f5b5672f", size = 1570697, upload-time = "2026-03-28T17:17:02.985Z" },
{ url = "https://files.pythonhosted.org/packages/b6/77/4ab5a546857bb3028fbaf34d6eea180267bdab022ee8b1168b1fcde4bfdd/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:9e587fcfce2bcf06526a43cb705bdee21ac089096f2e271d75de9c339db3100c", size = 1702258, upload-time = "2026-03-28T17:17:05.28Z" },
{ url = "https://files.pythonhosted.org/packages/79/63/d8f29021e39bc5af8e5d5e9da1b07976fb9846487a784e11e4f4eeda4666/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:9eb9c2eea7278206b5c6c1441fdd9dc420c278ead3f3b2cc87f9b693698cc500", size = 1740287, upload-time = "2026-03-28T17:17:07.712Z" },
{ url = "https://files.pythonhosted.org/packages/55/3a/cbc6b3b124859a11bc8055d3682c26999b393531ef926754a3445b99dfef/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:29be00c51972b04bf9d5c8f2d7f7314f48f96070ca40a873a53056e652e805f7", size = 1753011, upload-time = "2026-03-28T17:17:10.053Z" },
{ url = "https://files.pythonhosted.org/packages/e0/30/836278675205d58c1368b21520eab9572457cf19afd23759216c04483048/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:90c06228a6c3a7c9f776fe4fc0b7ff647fffd3bed93779a6913c804ae00c1073", size = 1566359, upload-time = "2026-03-28T17:17:12.433Z" },
{ url = "https://files.pythonhosted.org/packages/50/b4/8032cc9b82d17e4277704ba30509eaccb39329dc18d6a35f05e424439e32/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:a533ec132f05fd9a1d959e7f34184cd7d5e8511584848dab85faefbaac573069", size = 1785537, upload-time = "2026-03-28T17:17:14.721Z" },
{ url = "https://files.pythonhosted.org/packages/17/7d/5873e98230bde59f493bf1f7c3e327486a4b5653fa401144704df5d00211/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1c946f10f413836f82ea4cfb90200d2a59578c549f00857e03111cf45ad01ca5", size = 1740752, upload-time = "2026-03-28T17:17:17.387Z" },
{ url = "https://files.pythonhosted.org/packages/7b/f2/13e46e0df051494d7d3c68b7f72d071f48c384c12716fc294f75d5b1a064/aiohttp-3.13.4-cp313-cp313-win32.whl", hash = "sha256:48708e2706106da6967eff5908c78ca3943f005ed6bcb75da2a7e4da94ef8c70", size = 433187, upload-time = "2026-03-28T17:17:19.523Z" },
{ url = "https://files.pythonhosted.org/packages/ea/c0/649856ee655a843c8f8664592cfccb73ac80ede6a8c8db33a25d810c12db/aiohttp-3.13.4-cp313-cp313-win_amd64.whl", hash = "sha256:74a2eb058da44fa3a877a49e2095b591d4913308bb424c418b77beb160c55ce3", size = 459778, upload-time = "2026-03-28T17:17:21.964Z" },
{ url = "https://files.pythonhosted.org/packages/6d/29/6657cc37ae04cacc2dbf53fb730a06b6091cc4cbe745028e047c53e6d840/aiohttp-3.13.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:e0a2c961fc92abeff61d6444f2ce6ad35bb982db9fc8ff8a47455beacf454a57", size = 749363, upload-time = "2026-03-28T17:17:24.044Z" },
{ url = "https://files.pythonhosted.org/packages/90/7f/30ccdf67ca3d24b610067dc63d64dcb91e5d88e27667811640644aa4a85d/aiohttp-3.13.4-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:153274535985a0ff2bff1fb6c104ed547cec898a09213d21b0f791a44b14d933", size = 499317, upload-time = "2026-03-28T17:17:26.199Z" },
{ url = "https://files.pythonhosted.org/packages/93/13/e372dd4e68ad04ee25dafb050c7f98b0d91ea643f7352757e87231102555/aiohttp-3.13.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:351f3171e2458da3d731ce83f9e6b9619e325c45cbd534c7759750cabf453ad7", size = 500477, upload-time = "2026-03-28T17:17:28.279Z" },
{ url = "https://files.pythonhosted.org/packages/e5/fe/ee6298e8e586096fb6f5eddd31393d8544f33ae0792c71ecbb4c2bef98ac/aiohttp-3.13.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f989ac8bc5595ff761a5ccd32bdb0768a117f36dd1504b1c2c074ed5d3f4df9c", size = 1737227, upload-time = "2026-03-28T17:17:30.587Z" },
{ url = "https://files.pythonhosted.org/packages/b0/b9/a7a0463a09e1a3fe35100f74324f23644bfc3383ac5fd5effe0722a5f0b7/aiohttp-3.13.4-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:d36fc1709110ec1e87a229b201dd3ddc32aa01e98e7868083a794609b081c349", size = 1694036, upload-time = "2026-03-28T17:17:33.29Z" },
{ url = "https://files.pythonhosted.org/packages/57/7c/8972ae3fb7be00a91aee6b644b2a6a909aedb2c425269a3bfd90115e6f8f/aiohttp-3.13.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:42adaeea83cbdf069ab94f5103ce0787c21fb1a0153270da76b59d5578302329", size = 1786814, upload-time = "2026-03-28T17:17:36.035Z" },
{ url = "https://files.pythonhosted.org/packages/93/01/c81e97e85c774decbaf0d577de7d848934e8166a3a14ad9f8aa5be329d28/aiohttp-3.13.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:92deb95469928cc41fd4b42a95d8012fa6df93f6b1c0a83af0ffbc4a5e218cde", size = 1866676, upload-time = "2026-03-28T17:17:38.441Z" },
{ url = "https://files.pythonhosted.org/packages/5a/5f/5b46fe8694a639ddea2cd035bf5729e4677ea882cb251396637e2ef1590d/aiohttp-3.13.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0c0c7c07c4257ef3a1df355f840bc62d133bcdef5c1c5ba75add3c08553e2eed", size = 1740842, upload-time = "2026-03-28T17:17:40.783Z" },
{ url = "https://files.pythonhosted.org/packages/20/a2/0d4b03d011cca6b6b0acba8433193c1e484efa8d705ea58295590fe24203/aiohttp-3.13.4-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f062c45de8a1098cb137a1898819796a2491aec4e637a06b03f149315dff4d8f", size = 1566508, upload-time = "2026-03-28T17:17:43.235Z" },
{ url = "https://files.pythonhosted.org/packages/98/17/e689fd500da52488ec5f889effd6404dece6a59de301e380f3c64f167beb/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:76093107c531517001114f0ebdb4f46858ce818590363e3e99a4a2280334454a", size = 1700569, upload-time = "2026-03-28T17:17:46.165Z" },
{ url = "https://files.pythonhosted.org/packages/d8/0d/66402894dbcf470ef7db99449e436105ea862c24f7ea4c95c683e635af35/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:6f6ec32162d293b82f8b63a16edc80769662fbd5ae6fbd4936d3206a2c2cc63b", size = 1707407, upload-time = "2026-03-28T17:17:48.825Z" },
{ url = "https://files.pythonhosted.org/packages/2f/eb/af0ab1a3650092cbd8e14ef29e4ab0209e1460e1c299996c3f8288b3f1ff/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:5903e2db3d202a00ad9f0ec35a122c005e85d90c9836ab4cda628f01edf425e2", size = 1752214, upload-time = "2026-03-28T17:17:51.206Z" },
{ url = "https://files.pythonhosted.org/packages/5a/bf/72326f8a98e4c666f292f03c385545963cc65e358835d2a7375037a97b57/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:2d5bea57be7aca98dbbac8da046d99b5557c5cf4e28538c4c786313078aca09e", size = 1562162, upload-time = "2026-03-28T17:17:53.634Z" },
{ url = "https://files.pythonhosted.org/packages/67/9f/13b72435f99151dd9a5469c96b3b5f86aa29b7e785ca7f35cf5e538f74c0/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:bcf0c9902085976edc0232b75006ef38f89686901249ce14226b6877f88464fb", size = 1768904, upload-time = "2026-03-28T17:17:55.991Z" },
{ url = "https://files.pythonhosted.org/packages/18/bc/28d4970e7d5452ac7776cdb5431a1164a0d9cf8bd2fffd67b4fb463aa56d/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:c3295f98bfeed2e867cab588f2a146a9db37a85e3ae9062abf46ba062bd29165", size = 1723378, upload-time = "2026-03-28T17:17:58.348Z" },
{ url = "https://files.pythonhosted.org/packages/53/74/b32458ca1a7f34d65bdee7aef2036adbe0438123d3d53e2b083c453c24dd/aiohttp-3.13.4-cp314-cp314-win32.whl", hash = "sha256:a598a5c5767e1369d8f5b08695cab1d8160040f796c4416af76fd773d229b3c9", size = 438711, upload-time = "2026-03-28T17:18:00.728Z" },
{ url = "https://files.pythonhosted.org/packages/40/b2/54b487316c2df3e03a8f3435e9636f8a81a42a69d942164830d193beb56a/aiohttp-3.13.4-cp314-cp314-win_amd64.whl", hash = "sha256:c555db4bc7a264bead5a7d63d92d41a1122fcd39cc62a4db815f45ad46f9c2c8", size = 464977, upload-time = "2026-03-28T17:18:03.367Z" },
{ url = "https://files.pythonhosted.org/packages/47/fb/e41b63c6ce71b07a59243bb8f3b457ee0c3402a619acb9d2c0d21ef0e647/aiohttp-3.13.4-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:45abbbf09a129825d13c18c7d3182fecd46d9da3cfc383756145394013604ac1", size = 781549, upload-time = "2026-03-28T17:18:05.779Z" },
{ url = "https://files.pythonhosted.org/packages/97/53/532b8d28df1e17e44c4d9a9368b78dcb6bf0b51037522136eced13afa9e8/aiohttp-3.13.4-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:74c80b2bc2c2adb7b3d1941b2b60701ee2af8296fc8aad8b8bc48bc25767266c", size = 514383, upload-time = "2026-03-28T17:18:08.096Z" },
{ url = "https://files.pythonhosted.org/packages/1b/1f/62e5d400603e8468cd635812d99cb81cfdc08127a3dc474c647615f31339/aiohttp-3.13.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c97989ae40a9746650fa196894f317dafc12227c808c774929dda0ff873a5954", size = 518304, upload-time = "2026-03-28T17:18:10.642Z" },
{ url = "https://files.pythonhosted.org/packages/90/57/2326b37b10896447e3c6e0cbef4fe2486d30913639a5cfd1332b5d870f82/aiohttp-3.13.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:dae86be9811493f9990ef44fff1685f5c1a3192e9061a71a109d527944eed551", size = 1893433, upload-time = "2026-03-28T17:18:13.121Z" },
{ url = "https://files.pythonhosted.org/packages/d2/b4/a24d82112c304afdb650167ef2fe190957d81cbddac7460bedd245f765aa/aiohttp-3.13.4-cp314-cp314t-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:1db491abe852ca2fa6cc48a3341985b0174b3741838e1341b82ac82c8bd9e871", size = 1755901, upload-time = "2026-03-28T17:18:16.21Z" },
{ url = "https://files.pythonhosted.org/packages/9e/2d/0883ef9d878d7846287f036c162a951968f22aabeef3ac97b0bea6f76d5d/aiohttp-3.13.4-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:0e5d701c0aad02a7dce72eef6b93226cf3734330f1a31d69ebbf69f33b86666e", size = 1876093, upload-time = "2026-03-28T17:18:18.703Z" },
{ url = "https://files.pythonhosted.org/packages/ad/52/9204bb59c014869b71971addad6778f005daa72a96eed652c496789d7468/aiohttp-3.13.4-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8ac32a189081ae0a10ba18993f10f338ec94341f0d5df8fff348043962f3c6f8", size = 1970815, upload-time = "2026-03-28T17:18:21.858Z" },
{ url = "https://files.pythonhosted.org/packages/d6/b5/e4eb20275a866dde0f570f411b36c6b48f7b53edfe4f4071aa1b0728098a/aiohttp-3.13.4-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:98e968cdaba43e45c73c3f306fca418c8009a957733bac85937c9f9cf3f4de27", size = 1816223, upload-time = "2026-03-28T17:18:24.729Z" },
{ url = "https://files.pythonhosted.org/packages/d8/23/e98075c5bb146aa61a1239ee1ac7714c85e814838d6cebbe37d3fe19214a/aiohttp-3.13.4-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ca114790c9144c335d538852612d3e43ea0f075288f4849cf4b05d6cd2238ce7", size = 1649145, upload-time = "2026-03-28T17:18:27.269Z" },
{ url = "https://files.pythonhosted.org/packages/d6/c1/7bad8be33bb06c2bb224b6468874346026092762cbec388c3bdb65a368ee/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:ea2e071661ba9cfe11eabbc81ac5376eaeb3061f6e72ec4cc86d7cdd1ffbdbbb", size = 1816562, upload-time = "2026-03-28T17:18:29.847Z" },
{ url = "https://files.pythonhosted.org/packages/5c/10/c00323348695e9a5e316825969c88463dcc24c7e9d443244b8a2c9cf2eae/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:34e89912b6c20e0fd80e07fa401fd218a410aa1ce9f1c2f1dad6db1bd0ce0927", size = 1800333, upload-time = "2026-03-28T17:18:32.269Z" },
{ url = "https://files.pythonhosted.org/packages/84/43/9b2147a1df3559f49bd723e22905b46a46c068a53adb54abdca32c4de180/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:0e217cf9f6a42908c52b46e42c568bd57adc39c9286ced31aaace614b6087965", size = 1820617, upload-time = "2026-03-28T17:18:35.238Z" },
{ url = "https://files.pythonhosted.org/packages/a9/7f/b3481a81e7a586d02e99387b18c6dafff41285f6efd3daa2124c01f87eae/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:0c296f1221e21ba979f5ac1964c3b78cfde15c5c5f855ffd2caab337e9cd9182", size = 1643417, upload-time = "2026-03-28T17:18:37.949Z" },
{ url = "https://files.pythonhosted.org/packages/8f/72/07181226bc99ce1124e0f89280f5221a82d3ae6a6d9d1973ce429d48e52b/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:d99a9d168ebaffb74f36d011750e490085ac418f4db926cce3989c8fe6cb6b1b", size = 1849286, upload-time = "2026-03-28T17:18:40.534Z" },
{ url = "https://files.pythonhosted.org/packages/1a/e6/1b3566e103eca6da5be4ae6713e112a053725c584e96574caf117568ffef/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:cb19177205d93b881f3f89e6081593676043a6828f59c78c17a0fd6c1fbed2ba", size = 1782635, upload-time = "2026-03-28T17:18:43.073Z" },
{ url = "https://files.pythonhosted.org/packages/37/58/1b11c71904b8d079eb0c39fe664180dd1e14bebe5608e235d8bfbadc8929/aiohttp-3.13.4-cp314-cp314t-win32.whl", hash = "sha256:c606aa5656dab6552e52ca368e43869c916338346bfaf6304e15c58fb113ea30", size = 472537, upload-time = "2026-03-28T17:18:46.286Z" },
{ url = "https://files.pythonhosted.org/packages/bc/8f/87c56a1a1977d7dddea5b31e12189665a140fdb48a71e9038ff90bb564ec/aiohttp-3.13.4-cp314-cp314t-win_amd64.whl", hash = "sha256:014dcc10ec8ab8db681f0d68e939d1e9286a5aa2b993cbbdb0db130853e02144", size = 506381, upload-time = "2026-03-28T17:18:48.74Z" },
]
[[package]]
@@ -764,17 +764,34 @@ wheels = [
[[package]]
name = "hf-xet"
version = "1.1.9"
version = "1.4.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/23/0f/5b60fc28ee7f8cc17a5114a584fd6b86e11c3e0a6e142a7f97a161e9640a/hf_xet-1.1.9.tar.gz", hash = "sha256:c99073ce404462e909f1d5839b2d14a3827b8fe75ed8aed551ba6609c026c803", size = 484242, upload-time = "2025-08-27T23:05:19.441Z" }
sdist = { url = "https://files.pythonhosted.org/packages/53/92/ec9ad04d0b5728dca387a45af7bc98fbb0d73b2118759f5f6038b61a57e8/hf_xet-1.4.3.tar.gz", hash = "sha256:8ddedb73c8c08928c793df2f3401ec26f95be7f7e516a7bee2fbb546f6676113", size = 670477, upload-time = "2026-03-31T22:40:07.874Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/de/12/56e1abb9a44cdef59a411fe8a8673313195711b5ecce27880eb9c8fa90bd/hf_xet-1.1.9-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:a3b6215f88638dd7a6ff82cb4e738dcbf3d863bf667997c093a3c990337d1160", size = 2762553, upload-time = "2025-08-27T23:05:15.153Z" },
{ url = "https://files.pythonhosted.org/packages/3a/e6/2d0d16890c5f21b862f5df3146519c182e7f0ae49b4b4bf2bd8a40d0b05e/hf_xet-1.1.9-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:9b486de7a64a66f9a172f4b3e0dfe79c9f0a93257c501296a2521a13495a698a", size = 2623216, upload-time = "2025-08-27T23:05:13.778Z" },
{ url = "https://files.pythonhosted.org/packages/81/42/7e6955cf0621e87491a1fb8cad755d5c2517803cea174229b0ec00ff0166/hf_xet-1.1.9-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a4c5a840c2c4e6ec875ed13703a60e3523bc7f48031dfd750923b2a4d1a5fc3c", size = 3186789, upload-time = "2025-08-27T23:05:12.368Z" },
{ url = "https://files.pythonhosted.org/packages/df/8b/759233bce05457f5f7ec062d63bbfd2d0c740b816279eaaa54be92aa452a/hf_xet-1.1.9-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:96a6139c9e44dad1c52c52520db0fffe948f6bce487cfb9d69c125f254bb3790", size = 3088747, upload-time = "2025-08-27T23:05:10.439Z" },
{ url = "https://files.pythonhosted.org/packages/6c/3c/28cc4db153a7601a996985bcb564f7b8f5b9e1a706c7537aad4b4809f358/hf_xet-1.1.9-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:ad1022e9a998e784c97b2173965d07fe33ee26e4594770b7785a8cc8f922cd95", size = 3251429, upload-time = "2025-08-27T23:05:16.471Z" },
{ url = "https://files.pythonhosted.org/packages/84/17/7caf27a1d101bfcb05be85850d4aa0a265b2e1acc2d4d52a48026ef1d299/hf_xet-1.1.9-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:86754c2d6d5afb11b0a435e6e18911a4199262fe77553f8c50d75e21242193ea", size = 3354643, upload-time = "2025-08-27T23:05:17.828Z" },
{ url = "https://files.pythonhosted.org/packages/cd/50/0c39c9eed3411deadcc98749a6699d871b822473f55fe472fad7c01ec588/hf_xet-1.1.9-cp37-abi3-win_amd64.whl", hash = "sha256:5aad3933de6b725d61d51034e04174ed1dce7a57c63d530df0014dea15a40127", size = 2804797, upload-time = "2025-08-27T23:05:20.77Z" },
{ url = "https://files.pythonhosted.org/packages/72/43/724d307b34e353da0abd476e02f72f735cdd2bc86082dee1b32ea0bfee1d/hf_xet-1.4.3-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:7551659ba4f1e1074e9623996f28c3873682530aee0a846b7f2f066239228144", size = 3800935, upload-time = "2026-03-31T22:39:49.618Z" },
{ url = "https://files.pythonhosted.org/packages/2b/d2/8bee5996b699262edb87dbb54118d287c0e1b2fc78af7cdc41857ba5e3c4/hf_xet-1.4.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:bee693ada985e7045997f05f081d0e12c4c08bd7626dc397f8a7c487e6c04f7f", size = 3558942, upload-time = "2026-03-31T22:39:47.938Z" },
{ url = "https://files.pythonhosted.org/packages/c3/a1/e993d09cbe251196fb60812b09a58901c468127b7259d2bf0f68bf6088eb/hf_xet-1.4.3-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:21644b404bb0100fe3857892f752c4d09642586fd988e61501c95bbf44b393a3", size = 4207657, upload-time = "2026-03-31T22:39:39.69Z" },
{ url = "https://files.pythonhosted.org/packages/64/44/9eb6d21e5c34c63e5e399803a6932fa983cabdf47c0ecbcfe7ea97684b8c/hf_xet-1.4.3-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:987f09cfe418237812896a6736b81b1af02a3a6dcb4b4944425c4c4fca7a7cf8", size = 3986765, upload-time = "2026-03-31T22:39:37.936Z" },
{ url = "https://files.pythonhosted.org/packages/ea/7b/8ad6f16fdb82f5f7284a34b5ec48645bd575bdcd2f6f0d1644775909c486/hf_xet-1.4.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:60cf7fc43a99da0a853345cf86d23738c03983ee5249613a6305d3e57a5dca74", size = 4188162, upload-time = "2026-03-31T22:39:58.382Z" },
{ url = "https://files.pythonhosted.org/packages/1b/c4/39d6e136cbeea9ca5a23aad4b33024319222adbdc059ebcda5fc7d9d5ff4/hf_xet-1.4.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:2815a49a7a59f3e2edf0cf113ae88e8cb2ca2a221bf353fb60c609584f4884d4", size = 4424525, upload-time = "2026-03-31T22:40:00.225Z" },
{ url = "https://files.pythonhosted.org/packages/46/f2/adc32dae6bdbc367853118b9878139ac869419a4ae7ba07185dc31251b76/hf_xet-1.4.3-cp313-cp313t-win_amd64.whl", hash = "sha256:42ee323265f1e6a81b0e11094564fb7f7e0ec75b5105ffd91ae63f403a11931b", size = 3671610, upload-time = "2026-03-31T22:40:10.42Z" },
{ url = "https://files.pythonhosted.org/packages/e2/19/25d897dcc3f81953e0c2cde9ec186c7a0fee413eb0c9a7a9130d87d94d3a/hf_xet-1.4.3-cp313-cp313t-win_arm64.whl", hash = "sha256:27c976ba60079fb8217f485b9c5c7fcd21c90b0367753805f87cb9f3cdc4418a", size = 3528529, upload-time = "2026-03-31T22:40:09.106Z" },
{ url = "https://files.pythonhosted.org/packages/ec/36/3e8f85ca9fe09b8de2b2e10c63b3b3353d7dda88a0b3d426dffbe7b8313b/hf_xet-1.4.3-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:5251d5ece3a81815bae9abab41cf7ddb7bcb8f56411bce0827f4a3071c92fdc6", size = 3801019, upload-time = "2026-03-31T22:39:56.651Z" },
{ url = "https://files.pythonhosted.org/packages/b5/9c/defb6cb1de28bccb7bd8d95f6e60f72a3d3fa4cb3d0329c26fb9a488bfe7/hf_xet-1.4.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1feb0f3abeacee143367c326a128a2e2b60868ec12a36c225afb1d6c5a05e6d2", size = 3558746, upload-time = "2026-03-31T22:39:54.766Z" },
{ url = "https://files.pythonhosted.org/packages/c1/bd/8d001191893178ff8e826e46ad5299446e62b93cd164e17b0ffea08832ec/hf_xet-1.4.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8b301fc150290ca90b4fccd079829b84bb4786747584ae08b94b4577d82fb791", size = 4207692, upload-time = "2026-03-31T22:39:46.246Z" },
{ url = "https://files.pythonhosted.org/packages/ce/48/6790b402803250e9936435613d3a78b9aaeee7973439f0918848dde58309/hf_xet-1.4.3-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:d972fbe95ddc0d3c0fc49b31a8a69f47db35c1e3699bf316421705741aab6653", size = 3986281, upload-time = "2026-03-31T22:39:44.648Z" },
{ url = "https://files.pythonhosted.org/packages/51/56/ea62552fe53db652a9099eda600b032d75554d0e86c12a73824bfedef88b/hf_xet-1.4.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:c5b48db1ee344a805a1b9bd2cda9b6b65fe77ed3787bd6e87ad5521141d317cd", size = 4187414, upload-time = "2026-03-31T22:40:04.951Z" },
{ url = "https://files.pythonhosted.org/packages/7d/f5/bc1456d4638061bea997e6d2db60a1a613d7b200e0755965ec312dc1ef79/hf_xet-1.4.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:22bdc1f5fb8b15bf2831440b91d1c9bbceeb7e10c81a12e8d75889996a5c9da8", size = 4424368, upload-time = "2026-03-31T22:40:06.347Z" },
{ url = "https://files.pythonhosted.org/packages/e4/76/ab597bae87e1f06d18d3ecb8ed7f0d3c9a37037fc32ce76233d369273c64/hf_xet-1.4.3-cp314-cp314t-win_amd64.whl", hash = "sha256:0392c79b7cf48418cd61478c1a925246cf10639f4cd9d94368d8ca1e8df9ea07", size = 3672280, upload-time = "2026-03-31T22:40:16.401Z" },
{ url = "https://files.pythonhosted.org/packages/62/05/2e462d34e23a09a74d73785dbed71cc5dbad82a72eee2ad60a72a554155d/hf_xet-1.4.3-cp314-cp314t-win_arm64.whl", hash = "sha256:681c92a07796325778a79d76c67011764ecc9042a8c3579332b61b63ae512075", size = 3528945, upload-time = "2026-03-31T22:40:14.995Z" },
{ url = "https://files.pythonhosted.org/packages/ac/9f/9c23e4a447b8f83120798f9279d0297a4d1360bdbf59ef49ebec78fe2545/hf_xet-1.4.3-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:d0da85329eaf196e03e90b84c2d0aca53bd4573d097a75f99609e80775f98025", size = 3805048, upload-time = "2026-03-31T22:39:53.105Z" },
{ url = "https://files.pythonhosted.org/packages/0b/f8/7aacb8e5f4a7899d39c787b5984e912e6c18b11be136ef13947d7a66d265/hf_xet-1.4.3-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:e23717ce4186b265f69afa66e6f0069fe7efbf331546f5c313d00e123dc84583", size = 3562178, upload-time = "2026-03-31T22:39:51.295Z" },
{ url = "https://files.pythonhosted.org/packages/df/9a/a24b26dc8a65f0ecc0fe5be981a19e61e7ca963b85e062c083f3a9100529/hf_xet-1.4.3-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fc360b70c815bf340ed56c7b8c63aacf11762a4b099b2fe2c9bd6d6068668c08", size = 4212320, upload-time = "2026-03-31T22:39:42.922Z" },
{ url = "https://files.pythonhosted.org/packages/53/60/46d493db155d2ee2801b71fb1b0fd67696359047fdd8caee2c914cc50c79/hf_xet-1.4.3-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:39f2d2e9654cd9b4319885733993807aab6de9dfbd34c42f0b78338d6617421f", size = 3991546, upload-time = "2026-03-31T22:39:41.335Z" },
{ url = "https://files.pythonhosted.org/packages/bc/f5/067363e1c96c6b17256910830d1b54099d06287e10f4ec6ec4e7e08371fc/hf_xet-1.4.3-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:49ad8a8cead2b56051aa84d7fce3e1335efe68df3cf6c058f22a65513885baac", size = 4193200, upload-time = "2026-03-31T22:40:01.936Z" },
{ url = "https://files.pythonhosted.org/packages/42/4b/53951592882d9c23080c7644542fda34a3813104e9e11fa1a7d82d419cb8/hf_xet-1.4.3-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:7716d62015477a70ea272d2d68cd7cad140f61c52ee452e133e139abfe2c17ba", size = 4429392, upload-time = "2026-03-31T22:40:03.492Z" },
{ url = "https://files.pythonhosted.org/packages/8a/21/75a6c175b4e79662ad8e62f46a40ce341d8d6b206b06b4320d07d55b188c/hf_xet-1.4.3-cp37-abi3-win_amd64.whl", hash = "sha256:6b591fcad34e272a5b02607485e4f2a1334aebf1bc6d16ce8eb1eb8978ac2021", size = 3677359, upload-time = "2026-03-31T22:40:13.619Z" },
{ url = "https://files.pythonhosted.org/packages/8a/7c/44314ecd0e89f8b2b51c9d9e5e7a60a9c1c82024ac471d415860557d3cd8/hf_xet-1.4.3-cp37-abi3-win_arm64.whl", hash = "sha256:7c2c7e20bcfcc946dc67187c203463f5e932e395845d098cc2a93f5b67ca0b47", size = 3533664, upload-time = "2026-03-31T22:40:12.152Z" },
]
[[package]]
@@ -829,21 +846,22 @@ wheels = [
[[package]]
name = "huggingface-hub"
version = "0.34.4"
version = "1.9.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "filelock" },
{ name = "fsspec" },
{ name = "hf-xet", marker = "platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'" },
{ name = "hf-xet", marker = "platform_machine == 'AMD64' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'" },
{ name = "httpx" },
{ name = "packaging" },
{ name = "pyyaml" },
{ name = "requests" },
{ name = "tqdm" },
{ name = "typer" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/45/c9/bdbe19339f76d12985bc03572f330a01a93c04dffecaaea3061bdd7fb892/huggingface_hub-0.34.4.tar.gz", hash = "sha256:a4228daa6fb001be3f4f4bdaf9a0db00e1739235702848df00885c9b5742c85c", size = 459768, upload-time = "2025-08-08T09:14:52.365Z" }
sdist = { url = "https://files.pythonhosted.org/packages/44/40/68d9b286b125d9318ae95c8f8b206e8672e7244b0eea61ebb4a88037638c/huggingface_hub-1.9.1.tar.gz", hash = "sha256:442af372207cc24dcb089caf507fcd7dbc1217c11d6059a06f6b90afe64e8bd2", size = 750355, upload-time = "2026-04-07T13:47:59.167Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/39/7b/bb06b061991107cd8783f300adff3e7b7f284e330fd82f507f2a1417b11d/huggingface_hub-0.34.4-py3-none-any.whl", hash = "sha256:9b365d781739c93ff90c359844221beef048403f1bc1f1c123c191257c3c890a", size = 561452, upload-time = "2025-08-08T09:14:50.159Z" },
{ url = "https://files.pythonhosted.org/packages/3d/af/10a89c54937dccf6c10792770f362d96dd67aedfde108e6e1fd7a0836789/huggingface_hub-1.9.1-py3-none-any.whl", hash = "sha256:8dae771b969b318203727a6c6c5209d25e661f6f0dd010fc09cc4a12cf81c657", size = 637356, upload-time = "2026-04-07T13:47:57.239Z" },
]
[[package]]
@@ -2153,7 +2171,7 @@ wheels = [
[[package]]
name = "requests"
version = "2.32.5"
version = "2.33.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "certifi" },
@@ -2161,9 +2179,9 @@ dependencies = [
{ name = "idna" },
{ name = "urllib3" },
]
sdist = { url = "https://files.pythonhosted.org/packages/c9/74/b3ff8e6c8446842c3f5c837e9c3dfcfe2018ea6ecef224c710c85ef728f4/requests-2.32.5.tar.gz", hash = "sha256:dbba0bac56e100853db0ea71b82b4dfd5fe2bf6d3754a8893c3af500cec7d7cf", size = 134517, upload-time = "2025-08-18T20:46:02.573Z" }
sdist = { url = "https://files.pythonhosted.org/packages/34/64/8860370b167a9721e8956ae116825caff829224fbca0ca6e7bf8ddef8430/requests-2.33.0.tar.gz", hash = "sha256:c7ebc5e8b0f21837386ad0e1c8fe8b829fa5f544d8df3b2253bff14ef29d7652", size = 134232, upload-time = "2026-03-25T15:10:41.586Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
{ url = "https://files.pythonhosted.org/packages/56/5d/c814546c2333ceea4ba42262d8c4d55763003e767fa169adc693bd524478/requests-2.33.0-py3-none-any.whl", hash = "sha256:3324635456fa185245e24865e810cecec7b4caf933d7eb133dcde67d48cee69b", size = 65017, upload-time = "2026-03-25T15:10:40.382Z" },
]
[[package]]
@@ -2657,27 +2675,28 @@ wheels = [
[[package]]
name = "tokenizers"
version = "0.22.0"
version = "0.22.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "huggingface-hub" },
]
sdist = { url = "https://files.pythonhosted.org/packages/5e/b4/c1ce3699e81977da2ace8b16d2badfd42b060e7d33d75c4ccdbf9dc920fa/tokenizers-0.22.0.tar.gz", hash = "sha256:2e33b98525be8453f355927f3cab312c36cd3e44f4d7e9e97da2fa94d0a49dcb", size = 362771, upload-time = "2025-08-29T10:25:33.914Z" }
sdist = { url = "https://files.pythonhosted.org/packages/73/6f/f80cfef4a312e1fb34baf7d85c72d4411afde10978d4657f8cdd811d3ccc/tokenizers-0.22.2.tar.gz", hash = "sha256:473b83b915e547aa366d1eee11806deaf419e17be16310ac0a14077f1e28f917", size = 372115, upload-time = "2026-01-05T10:45:15.988Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/6d/b1/18c13648edabbe66baa85fe266a478a7931ddc0cd1ba618802eb7b8d9865/tokenizers-0.22.0-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:eaa9620122a3fb99b943f864af95ed14c8dfc0f47afa3b404ac8c16b3f2bb484", size = 3081954, upload-time = "2025-08-29T10:25:24.993Z" },
{ url = "https://files.pythonhosted.org/packages/c2/02/c3c454b641bd7c4f79e4464accfae9e7dfc913a777d2e561e168ae060362/tokenizers-0.22.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:71784b9ab5bf0ff3075bceeb198149d2c5e068549c0d18fe32d06ba0deb63f79", size = 2945644, upload-time = "2025-08-29T10:25:23.405Z" },
{ url = "https://files.pythonhosted.org/packages/55/02/d10185ba2fd8c2d111e124c9d92de398aee0264b35ce433f79fb8472f5d0/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ec5b71f668a8076802b0241a42387d48289f25435b86b769ae1837cad4172a17", size = 3254764, upload-time = "2025-08-29T10:25:12.445Z" },
{ url = "https://files.pythonhosted.org/packages/13/89/17514bd7ef4bf5bfff58e2b131cec0f8d5cea2b1c8ffe1050a2c8de88dbb/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ea8562fa7498850d02a16178105b58803ea825b50dc9094d60549a7ed63654bb", size = 3161654, upload-time = "2025-08-29T10:25:15.493Z" },
{ url = "https://files.pythonhosted.org/packages/5a/d8/bac9f3a7ef6dcceec206e3857c3b61bb16c6b702ed7ae49585f5bd85c0ef/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:4136e1558a9ef2e2f1de1555dcd573e1cbc4a320c1a06c4107a3d46dc8ac6e4b", size = 3511484, upload-time = "2025-08-29T10:25:20.477Z" },
{ url = "https://files.pythonhosted.org/packages/aa/27/9c9800eb6763683010a4851db4d1802d8cab9cec114c17056eccb4d4a6e0/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cdf5954de3962a5fd9781dc12048d24a1a6f1f5df038c6e95db328cd22964206", size = 3712829, upload-time = "2025-08-29T10:25:17.154Z" },
{ url = "https://files.pythonhosted.org/packages/10/e3/b1726dbc1f03f757260fa21752e1921445b5bc350389a8314dd3338836db/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8337ca75d0731fc4860e6204cc24bb36a67d9736142aa06ed320943b50b1e7ed", size = 3408934, upload-time = "2025-08-29T10:25:18.76Z" },
{ url = "https://files.pythonhosted.org/packages/d4/61/aeab3402c26874b74bb67a7f2c4b569dde29b51032c5384db592e7b216f4/tokenizers-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a89264e26f63c449d8cded9061adea7b5de53ba2346fc7e87311f7e4117c1cc8", size = 3345585, upload-time = "2025-08-29T10:25:22.08Z" },
{ url = "https://files.pythonhosted.org/packages/bc/d3/498b4a8a8764cce0900af1add0f176ff24f475d4413d55b760b8cdf00893/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:790bad50a1b59d4c21592f9c3cf5e5cf9c3c7ce7e1a23a739f13e01fb1be377a", size = 9322986, upload-time = "2025-08-29T10:25:26.607Z" },
{ url = "https://files.pythonhosted.org/packages/a2/62/92378eb1c2c565837ca3cb5f9569860d132ab9d195d7950c1ea2681dffd0/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:76cf6757c73a10ef10bf06fa937c0ec7393d90432f543f49adc8cab3fb6f26cb", size = 9276630, upload-time = "2025-08-29T10:25:28.349Z" },
{ url = "https://files.pythonhosted.org/packages/eb/f0/342d80457aa1cda7654327460f69db0d69405af1e4c453f4dc6ca7c4a76e/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:1626cb186e143720c62c6c6b5371e62bbc10af60481388c0da89bc903f37ea0c", size = 9547175, upload-time = "2025-08-29T10:25:29.989Z" },
{ url = "https://files.pythonhosted.org/packages/14/84/8aa9b4adfc4fbd09381e20a5bc6aa27040c9c09caa89988c01544e008d18/tokenizers-0.22.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:da589a61cbfea18ae267723d6b029b84598dc8ca78db9951d8f5beff72d8507c", size = 9692735, upload-time = "2025-08-29T10:25:32.089Z" },
{ url = "https://files.pythonhosted.org/packages/bf/24/83ee2b1dc76bfe05c3142e7d0ccdfe69f0ad2f1ebf6c726cea7f0874c0d0/tokenizers-0.22.0-cp39-abi3-win32.whl", hash = "sha256:dbf9d6851bddae3e046fedfb166f47743c1c7bd11c640f0691dd35ef0bcad3be", size = 2471915, upload-time = "2025-08-29T10:25:36.411Z" },
{ url = "https://files.pythonhosted.org/packages/d1/9b/0e0bf82214ee20231845b127aa4a8015936ad5a46779f30865d10e404167/tokenizers-0.22.0-cp39-abi3-win_amd64.whl", hash = "sha256:c78174859eeaee96021f248a56c801e36bfb6bd5b067f2e95aa82445ca324f00", size = 2680494, upload-time = "2025-08-29T10:25:35.14Z" },
{ url = "https://files.pythonhosted.org/packages/92/97/5dbfabf04c7e348e655e907ed27913e03db0923abb5dfdd120d7b25630e1/tokenizers-0.22.2-cp39-abi3-macosx_10_12_x86_64.whl", hash = "sha256:544dd704ae7238755d790de45ba8da072e9af3eea688f698b137915ae959281c", size = 3100275, upload-time = "2026-01-05T10:41:02.158Z" },
{ url = "https://files.pythonhosted.org/packages/2e/47/174dca0502ef88b28f1c9e06b73ce33500eedfac7a7692108aec220464e7/tokenizers-0.22.2-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:1e418a55456beedca4621dbab65a318981467a2b188e982a23e117f115ce5001", size = 2981472, upload-time = "2026-01-05T10:41:00.276Z" },
{ url = "https://files.pythonhosted.org/packages/d6/84/7990e799f1309a8b87af6b948f31edaa12a3ed22d11b352eaf4f4b2e5753/tokenizers-0.22.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2249487018adec45d6e3554c71d46eb39fa8ea67156c640f7513eb26f318cec7", size = 3290736, upload-time = "2026-01-05T10:40:32.165Z" },
{ url = "https://files.pythonhosted.org/packages/78/59/09d0d9ba94dcd5f4f1368d4858d24546b4bdc0231c2354aa31d6199f0399/tokenizers-0.22.2-cp39-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:25b85325d0815e86e0bac263506dd114578953b7b53d7de09a6485e4a160a7dd", size = 3168835, upload-time = "2026-01-05T10:40:38.847Z" },
{ url = "https://files.pythonhosted.org/packages/47/50/b3ebb4243e7160bda8d34b731e54dd8ab8b133e50775872e7a434e524c28/tokenizers-0.22.2-cp39-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:bfb88f22a209ff7b40a576d5324bf8286b519d7358663db21d6246fb17eea2d5", size = 3521673, upload-time = "2026-01-05T10:40:56.614Z" },
{ url = "https://files.pythonhosted.org/packages/e0/fa/89f4cb9e08df770b57adb96f8cbb7e22695a4cb6c2bd5f0c4f0ebcf33b66/tokenizers-0.22.2-cp39-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1c774b1276f71e1ef716e5486f21e76333464f47bece56bbd554485982a9e03e", size = 3724818, upload-time = "2026-01-05T10:40:44.507Z" },
{ url = "https://files.pythonhosted.org/packages/64/04/ca2363f0bfbe3b3d36e95bf67e56a4c88c8e3362b658e616d1ac185d47f2/tokenizers-0.22.2-cp39-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:df6c4265b289083bf710dff49bc51ef252f9d5be33a45ee2bed151114a56207b", size = 3379195, upload-time = "2026-01-05T10:40:51.139Z" },
{ url = "https://files.pythonhosted.org/packages/2e/76/932be4b50ef6ccedf9d3c6639b056a967a86258c6d9200643f01269211ca/tokenizers-0.22.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:369cc9fc8cc10cb24143873a0d95438bb8ee257bb80c71989e3ee290e8d72c67", size = 3274982, upload-time = "2026-01-05T10:40:58.331Z" },
{ url = "https://files.pythonhosted.org/packages/1d/28/5f9f5a4cc211b69e89420980e483831bcc29dade307955cc9dc858a40f01/tokenizers-0.22.2-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:29c30b83d8dcd061078b05ae0cb94d3c710555fbb44861139f9f83dcca3dc3e4", size = 9478245, upload-time = "2026-01-05T10:41:04.053Z" },
{ url = "https://files.pythonhosted.org/packages/6c/fb/66e2da4704d6aadebf8cb39f1d6d1957df667ab24cff2326b77cda0dcb85/tokenizers-0.22.2-cp39-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:37ae80a28c1d3265bb1f22464c856bd23c02a05bb211e56d0c5301a435be6c1a", size = 9560069, upload-time = "2026-01-05T10:45:10.673Z" },
{ url = "https://files.pythonhosted.org/packages/16/04/fed398b05caa87ce9b1a1bb5166645e38196081b225059a6edaff6440fac/tokenizers-0.22.2-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:791135ee325f2336f498590eb2f11dc5c295232f288e75c99a36c5dbce63088a", size = 9899263, upload-time = "2026-01-05T10:45:12.559Z" },
{ url = "https://files.pythonhosted.org/packages/05/a1/d62dfe7376beaaf1394917e0f8e93ee5f67fea8fcf4107501db35996586b/tokenizers-0.22.2-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:38337540fbbddff8e999d59970f3c6f35a82de10053206a7562f1ea02d046fa5", size = 10033429, upload-time = "2026-01-05T10:45:14.333Z" },
{ url = "https://files.pythonhosted.org/packages/fd/18/a545c4ea42af3df6effd7d13d250ba77a0a86fb20393143bbb9a92e434d4/tokenizers-0.22.2-cp39-abi3-win32.whl", hash = "sha256:a6bf3f88c554a2b653af81f3204491c818ae2ac6fbc09e76ef4773351292bc92", size = 2502363, upload-time = "2026-01-05T10:45:20.593Z" },
{ url = "https://files.pythonhosted.org/packages/65/71/0670843133a43d43070abeb1949abfdef12a86d490bea9cd9e18e37c5ff7/tokenizers-0.22.2-cp39-abi3-win_amd64.whl", hash = "sha256:c9ea31edff2968b44a88f97d784c2f16dc0729b8b143ed004699ebca91f05c48", size = 2747786, upload-time = "2026-01-05T10:45:18.411Z" },
{ url = "https://files.pythonhosted.org/packages/72/f4/0de46cfa12cdcbcd464cc59fde36912af405696f687e53a091fb432f694c/tokenizers-0.22.2-cp39-abi3-win_arm64.whl", hash = "sha256:9ce725d22864a1e965217204946f830c37876eee3b2ba6fc6255e8e903d5fcbc", size = 2612133, upload-time = "2026-01-05T10:45:17.232Z" },
]
[[package]]
@@ -2804,7 +2823,7 @@ wheels = [
[[package]]
name = "transformers"
version = "4.56.1"
version = "5.0.0rc3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "filelock" },
@@ -2817,10 +2836,11 @@ dependencies = [
{ name = "safetensors" },
{ name = "tokenizers" },
{ name = "tqdm" },
{ name = "typer-slim" },
]
sdist = { url = "https://files.pythonhosted.org/packages/89/21/dc88ef3da1e49af07ed69386a11047a31dcf1aaf4ded3bc4b173fbf94116/transformers-4.56.1.tar.gz", hash = "sha256:0d88b1089a563996fc5f2c34502f10516cad3ea1aa89f179f522b54c8311fe74", size = 9855473, upload-time = "2025-09-04T20:47:13.14Z" }
sdist = { url = "https://files.pythonhosted.org/packages/3f/a3/7c116a8d85f69ea7749cf4c2df79e64c35d028e5fc7ea0168f299d03b8c7/transformers-5.0.0rc3.tar.gz", hash = "sha256:a0315b92b7e087617ade42ec9e6e92ee7620541cc5d6a3331886c52cbe306f5c", size = 8388520, upload-time = "2026-01-14T16:49:02.952Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/71/7c/283c3dd35e00e22a7803a0b2a65251347b745474a82399be058bde1c9f15/transformers-4.56.1-py3-none-any.whl", hash = "sha256:1697af6addfb6ddbce9618b763f4b52d5a756f6da4899ffd1b4febf58b779248", size = 11608197, upload-time = "2025-09-04T20:47:04.895Z" },
{ url = "https://files.pythonhosted.org/packages/1e/f2/ae2b8968764253bdf38a48dee3c299b8d0bedf7c8ffbe3449fca9bd95338/transformers-5.0.0rc3-py3-none-any.whl", hash = "sha256:383fad27f4f73092d330e45fae384681e5c8521e1dc1cf6cb1a297780e68bf2d", size = 10107087, upload-time = "2026-01-14T16:48:59.393Z" },
]
[[package]]
@@ -2838,17 +2858,29 @@ wheels = [
[[package]]
name = "typer"
version = "0.17.3"
version = "0.24.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "annotated-doc" },
{ name = "click" },
{ name = "rich" },
{ name = "shellingham" },
{ name = "typing-extensions" },
]
-sdist = { url = "https://files.pythonhosted.org/packages/dd/82/f4bfed3bc18c6ebd6f828320811bbe4098f92a31adf4040bee59c4ae02ea/typer-0.17.3.tar.gz", hash = "sha256:0c600503d472bcf98d29914d4dcd67f80c24cc245395e2e00ba3603c9332e8ba", size = 103517, upload-time = "2025-08-30T12:35:24.05Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/f5/24/cb09efec5cc954f7f9b930bf8279447d24618bb6758d4f6adf2574c41780/typer-0.24.1.tar.gz", hash = "sha256:e39b4732d65fbdcde189ae76cf7cd48aeae72919dea1fdfc16593be016256b45", size = 118613, upload-time = "2026-02-21T16:54:40.609Z" }
wheels = [
-{ url = "https://files.pythonhosted.org/packages/ca/e8/b3d537470e8404659a6335e7af868e90657efb73916ef31ddf3d8b9cb237/typer-0.17.3-py3-none-any.whl", hash = "sha256:643919a79182ab7ac7581056d93c6a2b865b026adf2872c4d02c72758e6f095b", size = 46494, upload-time = "2025-08-30T12:35:22.391Z" },
+{ url = "https://files.pythonhosted.org/packages/4a/91/48db081e7a63bb37284f9fbcefda7c44c277b18b0e13fbc36ea2335b71e6/typer-0.24.1-py3-none-any.whl", hash = "sha256:112c1f0ce578bfb4cab9ffdabc68f031416ebcc216536611ba21f04e9aa84c9e", size = 56085, upload-time = "2026-02-21T16:54:41.616Z" },
]
[[package]]
name = "typer-slim"
version = "0.24.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typer" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a7/a7/e6aecc4b4eb59598829a3b5076a93aff291b4fdaa2ded25efc4e1f4d219c/typer_slim-0.24.0.tar.gz", hash = "sha256:f0ed36127183f52ae6ced2ecb2521789995992c521a46083bfcdbb652d22ad34", size = 4776, upload-time = "2026-02-16T22:08:51.2Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a7/24/5480c20380dfd18cf33d14784096dca45a24eae6102e91d49a718d3b6855/typer_slim-0.24.0-py3-none-any.whl", hash = "sha256:d5d7ee1ee2834d5020c7c616ed5e0d0f29b9a4b1dd283bdebae198ec09778d0e", size = 3394, upload-time = "2026-02-16T22:08:49.92Z" },
]
[[package]]

34
livekit.yaml.example Normal file

@@ -0,0 +1,34 @@
# LiveKit server configuration
# Generated by setup-selfhosted.sh — do not edit manually.
# See: https://docs.livekit.io/self-hosting/deployment/
port: 7880
rtc:
  tcp_port: 7881
  port_range_start: 44200
  port_range_end: 44300
  # use_external_ip: true # Uncomment for production with public IP
redis:
  address: redis:6379
keys:
  # API key : API secret (generated by setup script)
  # devkey: secret
  __LIVEKIT_API_KEY__: __LIVEKIT_API_SECRET__
webhook:
  urls:
    - http://server:1250/v1/livekit/webhook
  api_key: __LIVEKIT_API_KEY__
logging:
  level: info
# Room settings
room:
  empty_timeout: 300 # 5 minutes after last participant leaves
  max_participants: 0 # 0 = unlimited
# Track Egress only (no composite video)
# Egress is configured via egress.yaml on the egress service
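
Reviewer sketch: the `__LIVEKIT_API_KEY__` and `__LIVEKIT_API_SECRET__` placeholders above are meant to be filled in by the setup script. How `setup-selfhosted.sh` does this is not visible in this part of the diff, so the Python below is only a minimal illustration of the substitution. The function names and the credential formats are assumptions, not taken from the repository.

```python
import secrets


def render_livekit_config(template: str, api_key: str, api_secret: str) -> str:
    """Substitute the two placeholders used in livekit.yaml.example."""
    return (
        template
        .replace("__LIVEKIT_API_KEY__", api_key)
        .replace("__LIVEKIT_API_SECRET__", api_secret)
    )


def generate_credentials() -> tuple[str, str]:
    # Hypothetical formats: any sufficiently random strings would do.
    return "API" + secrets.token_hex(6), secrets.token_urlsafe(32)


# A trimmed copy of the template, keeping only the lines with placeholders.
template = (
    "keys:\n"
    "  __LIVEKIT_API_KEY__: __LIVEKIT_API_SECRET__\n"
    "webhook:\n"
    "  api_key: __LIVEKIT_API_KEY__\n"
)

key, secret = generate_credentials()
print(render_livekit_config(template, key, secret))
```

The key appears twice in the rendered file (under `keys` and under `webhook.api_key`), so both occurrences have to be replaced with the same value.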

130
scripts/generate-certs.sh Executable file

@@ -0,0 +1,130 @@
#!/usr/bin/env bash
#
# Generate a local CA and server certificate for Reflector self-hosted deployments.
#
# Usage:
# ./scripts/generate-certs.sh DOMAIN [EXTRA_SANS...]
#
# Examples:
# ./scripts/generate-certs.sh reflector.local
# ./scripts/generate-certs.sh reflector.local "DNS:gpu.local,IP:192.168.1.100"
#
# Generates in certs/:
# ca.key — CA private key (keep secret)
# ca.crt — CA certificate (distribute to clients)
# server-key.pem — Server private key
# server.pem — Server certificate (signed by CA)
#
# Then use with setup-selfhosted.sh:
# ./scripts/setup-selfhosted.sh --gpu --caddy --domain DOMAIN --custom-ca certs/
#
set -euo pipefail
DOMAIN="${1:?Usage: $0 DOMAIN [EXTRA_SANS...]}"
EXTRA_SANS="${2:-}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CERTS_DIR="$(cd "$SCRIPT_DIR/.." && pwd)/certs"
# Colors
GREEN='\033[0;32m'
CYAN='\033[0;36m'
NC='\033[0m'
info() { echo -e "${CYAN}==>${NC} $*"; }
ok() { echo -e "${GREEN}  ✓${NC} $*"; }
# Check for openssl
if ! command -v openssl &>/dev/null; then
echo "Error: openssl is required but not found. Install it first." >&2
exit 1
fi
mkdir -p "$CERTS_DIR"
# Build SAN list
SAN_LIST="DNS:$DOMAIN,DNS:localhost,IP:127.0.0.1"
if [[ -n "$EXTRA_SANS" ]]; then
SAN_LIST="$SAN_LIST,$EXTRA_SANS"
fi
info "Generating CA and server certificate for: $DOMAIN"
echo " SANs: $SAN_LIST"
echo ""
# --- Step 1: Generate CA ---
if [[ -f "$CERTS_DIR/ca.key" ]] && [[ -f "$CERTS_DIR/ca.crt" ]]; then
ok "CA already exists at certs/ca.key + certs/ca.crt — reusing"
else
info "Generating CA key and certificate..."
openssl genrsa -out "$CERTS_DIR/ca.key" 4096 2>/dev/null
openssl req -x509 -new -nodes \
-key "$CERTS_DIR/ca.key" \
-sha256 -days 3650 \
-out "$CERTS_DIR/ca.crt" \
-subj "/CN=Reflector Local CA/O=Reflector Self-Hosted"
ok "CA certificate generated (valid for 10 years)"
fi
# --- Step 2: Generate server key ---
info "Generating server key..."
openssl genrsa -out "$CERTS_DIR/server-key.pem" 2048 2>/dev/null
ok "Server key generated"
# --- Step 3: Create CSR with SANs ---
info "Creating certificate signing request..."
openssl req -new \
-key "$CERTS_DIR/server-key.pem" \
-out "$CERTS_DIR/server.csr" \
-subj "/CN=$DOMAIN" \
-addext "subjectAltName=$SAN_LIST"
ok "CSR created"
# --- Step 4: Sign with CA ---
info "Signing server certificate with CA..."
openssl x509 -req \
-in "$CERTS_DIR/server.csr" \
-CA "$CERTS_DIR/ca.crt" \
-CAkey "$CERTS_DIR/ca.key" \
-CAcreateserial \
-out "$CERTS_DIR/server.pem" \
-days 365 -sha256 \
-copy_extensions copyall \
2>/dev/null
ok "Server certificate signed (valid for 1 year)"
# --- Cleanup ---
rm -f "$CERTS_DIR/server.csr" "$CERTS_DIR/ca.srl"
# --- Set permissions ---
chmod 644 "$CERTS_DIR/ca.crt" "$CERTS_DIR/server.pem"
chmod 600 "$CERTS_DIR/ca.key" "$CERTS_DIR/server-key.pem"
echo ""
echo "=========================================="
echo -e " ${GREEN}Certificates generated in certs/${NC}"
echo "=========================================="
echo ""
echo " certs/ca.key CA private key (keep secret)"
echo " certs/ca.crt CA certificate (distribute to clients)"
echo " certs/server-key.pem Server private key"
echo " certs/server.pem Server certificate for $DOMAIN"
echo ""
echo " SANs: $SAN_LIST"
echo ""
echo "Use with setup-selfhosted.sh:"
echo " ./scripts/setup-selfhosted.sh --gpu --caddy --domain $DOMAIN --custom-ca certs/"
echo ""
echo "Trust the CA on your machine:"
case "$(uname -s)" in
Darwin)
echo " sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain certs/ca.crt"
;;
Linux)
echo " sudo cp certs/ca.crt /usr/local/share/ca-certificates/reflector-ca.crt"
echo " sudo update-ca-certificates"
;;
*)
echo " See docsv2/custom-ca-setup.md for your platform"
;;
esac
echo ""
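
Reviewer sketch: the SAN handling in `generate-certs.sh` is plain string concatenation, so a malformed `EXTRA_SANS` value only surfaces when `openssl` rejects it. The Python below mirrors the script's list-building step and adds a hypothetical validation pass (`validate_san_list` is not part of the script) to show what an early check could look like.

```python
import ipaddress


def build_san_list(domain: str, extra_sans: str = "") -> str:
    """Mirror the script: domain, localhost and 127.0.0.1, plus optional extras."""
    san_list = f"DNS:{domain},DNS:localhost,IP:127.0.0.1"
    if extra_sans:
        san_list = f"{san_list},{extra_sans}"
    return san_list


def validate_san_list(san_list: str) -> list[tuple[str, str]]:
    """Hypothetical extra check (not in the script): reject malformed entries."""
    entries = []
    for item in san_list.split(","):
        kind, _, value = item.partition(":")
        if kind == "IP":
            ipaddress.ip_address(value)  # raises ValueError on a bad address
        elif kind != "DNS" or not value:
            raise ValueError(f"unsupported SAN entry: {item!r}")
        entries.append((kind, value))
    return entries


sans = build_san_list("reflector.local", "DNS:gpu.local,IP:192.168.1.100")
print(sans)
print(validate_san_list(sans))
```

Only `DNS:` and `IP:` entries are accepted here because those are the two forms used in the script's examples; other SAN types would need extra cases.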


@@ -13,14 +13,25 @@
# Optional:
# LLM_MODEL — Model name (default: qwen2.5:14b)
#
# Flags:
# --build — Rebuild backend Docker images (server, workers, test-runner)
#
# Usage:
# export LLM_URL="https://api.openai.com/v1"
# export LLM_API_KEY="sk-..."
# export HF_TOKEN="hf_..."
# ./scripts/run-integration-tests.sh
# ./scripts/run-integration-tests.sh --build # rebuild backend images
#
set -euo pipefail
BUILD_FLAG=""
for arg in "$@"; do
case "$arg" in
--build) BUILD_FLAG="--build" ;;
esac
done
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
COMPOSE_DIR="$REPO_ROOT/server/tests"
@@ -66,7 +77,7 @@ trap cleanup EXIT
# ── Step 1: Build and start infrastructure ──────────────────────────────────
info "Building and starting infrastructure services..."
-$COMPOSE up -d --build postgres redis garage hatchet mock-daily
+$COMPOSE up -d --build postgres redis garage hatchet mock-daily mailpit
# ── Step 2: Set up Garage (S3 bucket + keys) ───────────────────────────────
wait_for "Garage" "$COMPOSE exec -T garage /garage stats" 60
@@ -116,7 +127,7 @@ ok "Hatchet token generated"
# ── Step 4: Start backend services ──────────────────────────────────────────
info "Starting backend services..."
-$COMPOSE up -d server worker hatchet-worker-cpu hatchet-worker-llm test-runner
+$COMPOSE up -d $BUILD_FLAG server worker hatchet-worker-cpu hatchet-worker-llm test-runner
# ── Step 5: Wait for server + run migrations ────────────────────────────────
wait_for "Server" "$COMPOSE exec -T test-runner curl -sf http://server:1250/health" 60

496
scripts/setup-gpu-host.sh Executable file

@@ -0,0 +1,496 @@
#!/usr/bin/env bash
#
# Standalone GPU service setup for Reflector.
# Deploys ONLY the GPU transcription/diarization/translation service on a dedicated machine.
# The main Reflector instance connects to this machine over HTTPS.
#
# Usage:
# ./scripts/setup-gpu-host.sh [--domain DOMAIN] [--custom-ca PATH] [--extra-ca FILE] [--api-key KEY] [--cpu] [--build] [--port PORT]
#
# Options:
# --domain DOMAIN Domain name for this GPU host (e.g., gpu.example.com)
# With --custom-ca: uses custom TLS cert. Without: uses Let's Encrypt.
# --custom-ca PATH Custom CA certificate (dir with ca.crt + server.pem + server-key.pem, or single PEM file)
# --extra-ca FILE Additional CA cert to trust (repeatable)
# --api-key KEY API key to protect the GPU service (recommended for internet-facing deployments)
# --cpu Use CPU-only Dockerfile (no NVIDIA GPU required)
# --build Accepted for compatibility; the image is always built from source, since no pre-built GPU image is published
# --port PORT Host port to expose (default: 443 with Caddy, 8000 without)
#
# Examples:
# # GPU on LAN with custom CA
# ./scripts/generate-certs.sh gpu.local
# ./scripts/setup-gpu-host.sh --domain gpu.local --custom-ca certs/ --api-key my-secret-key
#
# # GPU on public internet with Let's Encrypt
# ./scripts/setup-gpu-host.sh --domain gpu.example.com --api-key my-secret-key
#
# # GPU on LAN, IP access only (no domain or certs: Caddy is skipped, plain HTTP on port 8000)
# ./scripts/setup-gpu-host.sh --api-key my-secret-key
#
# # CPU-only mode (no NVIDIA GPU)
# ./scripts/setup-gpu-host.sh --cpu --api-key my-secret-key
#
# After setup, configure the main Reflector instance to use this GPU:
# In server/.env on the Reflector machine:
# TRANSCRIPT_BACKEND=modal
# TRANSCRIPT_URL=https://gpu.example.com
# TRANSCRIPT_MODAL_API_KEY=my-secret-key
# DIARIZATION_BACKEND=modal
# DIARIZATION_URL=https://gpu.example.com
# DIARIZATION_MODAL_API_KEY=my-secret-key
# TRANSLATION_BACKEND=modal
# TRANSLATE_URL=https://gpu.example.com
# TRANSLATION_MODAL_API_KEY=my-secret-key
#
# DNS Resolution:
# - Public domain: Create a DNS A record pointing to this machine's public IP.
# - Internal domain (e.g., gpu.local): Add to /etc/hosts on both machines:
# <GPU_MACHINE_IP> gpu.local
# - IP-only: Use the machine's IP directly in TRANSCRIPT_URL/DIARIZATION_URL.
# The Reflector backend must trust the CA or accept self-signed certs.
#
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
GPU_DIR="$ROOT_DIR/gpu/self_hosted"
OS="$(uname -s)"
# --- Colors ---
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
NC='\033[0m'
info() { echo -e "${CYAN}==>${NC} $*"; }
ok() { echo -e "${GREEN}  ✓${NC} $*"; }
warn() { echo -e "${YELLOW} !${NC} $*"; }
err() { echo -e "${RED}  ✗${NC} $*" >&2; }
# --- Parse arguments ---
CUSTOM_DOMAIN=""
CUSTOM_CA=""
EXTRA_CA_FILES=()
API_KEY=""
USE_CPU=false
HOST_PORT=""
SKIP_NEXT=false
ARGS=("$@")
for i in "${!ARGS[@]}"; do
if [[ "$SKIP_NEXT" == "true" ]]; then
SKIP_NEXT=false
continue
fi
arg="${ARGS[$i]}"
case "$arg" in
--domain)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--domain requires a domain name"
exit 1
fi
CUSTOM_DOMAIN="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--custom-ca)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--custom-ca requires a path to a directory or PEM certificate file"
exit 1
fi
CUSTOM_CA="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--extra-ca)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--extra-ca requires a path to a PEM certificate file"
exit 1
fi
if [[ ! -f "${ARGS[$next_i]}" ]]; then
err "--extra-ca file not found: ${ARGS[$next_i]}"
exit 1
fi
EXTRA_CA_FILES+=("${ARGS[$next_i]}")
SKIP_NEXT=true ;;
--api-key)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--api-key requires a key value"
exit 1
fi
API_KEY="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--cpu)
USE_CPU=true ;;
--port)
next_i=$((i + 1))
if [[ $next_i -ge ${#ARGS[@]} ]] || [[ "${ARGS[$next_i]}" == --* ]]; then
err "--port requires a port number"
exit 1
fi
HOST_PORT="${ARGS[$next_i]}"
SKIP_NEXT=true ;;
--build)
;; # Always build from source for GPU, flag accepted for compatibility
*)
err "Unknown argument: $arg"
err "Usage: $0 [--domain DOMAIN] [--custom-ca PATH] [--extra-ca FILE] [--api-key KEY] [--cpu] [--build] [--port PORT]"
exit 1
;;
esac
done
# --- Resolve CA paths ---
CA_CERT_PATH=""
TLS_CERT_PATH=""
TLS_KEY_PATH=""
USE_CUSTOM_CA=false
USE_CADDY=false
if [[ -n "$CUSTOM_CA" ]] || [[ -n "${EXTRA_CA_FILES[0]+x}" ]]; then
USE_CUSTOM_CA=true
fi
if [[ -n "$CUSTOM_CA" ]]; then
CUSTOM_CA="${CUSTOM_CA%/}"
if [[ -d "$CUSTOM_CA" ]]; then
[[ -f "$CUSTOM_CA/ca.crt" ]] || { err "$CUSTOM_CA/ca.crt not found"; exit 1; }
CA_CERT_PATH="$CUSTOM_CA/ca.crt"
if [[ -f "$CUSTOM_CA/server.pem" ]] && [[ -f "$CUSTOM_CA/server-key.pem" ]]; then
TLS_CERT_PATH="$CUSTOM_CA/server.pem"
TLS_KEY_PATH="$CUSTOM_CA/server-key.pem"
elif [[ -f "$CUSTOM_CA/server.pem" ]] || [[ -f "$CUSTOM_CA/server-key.pem" ]]; then
warn "Found only one of server.pem/server-key.pem — both needed for TLS. Skipping."
fi
elif [[ -f "$CUSTOM_CA" ]]; then
CA_CERT_PATH="$CUSTOM_CA"
else
err "--custom-ca path not found: $CUSTOM_CA"
exit 1
fi
elif [[ -n "${EXTRA_CA_FILES[0]+x}" ]]; then
CA_CERT_PATH="${EXTRA_CA_FILES[0]}"
unset 'EXTRA_CA_FILES[0]'
EXTRA_CA_FILES=("${EXTRA_CA_FILES[@]+"${EXTRA_CA_FILES[@]}"}")
fi
# Caddy if we have a domain or TLS certs
if [[ -n "$CUSTOM_DOMAIN" ]] || [[ -n "$TLS_CERT_PATH" ]]; then
USE_CADDY=true
fi
# Default port
if [[ -z "$HOST_PORT" ]]; then
if [[ "$USE_CADDY" == "true" ]]; then
HOST_PORT="443"
else
HOST_PORT="8000"
fi
fi
# Detect primary IP
PRIMARY_IP=""
if [[ "$OS" == "Linux" ]]; then
PRIMARY_IP=$(hostname -I 2>/dev/null | awk '{print $1}' || true)
if [[ "$PRIMARY_IP" == "127."* ]] || [[ -z "$PRIMARY_IP" ]]; then
PRIMARY_IP=$(ip -4 route get 1 2>/dev/null | sed -n 's/.*src \([0-9.]*\).*/\1/p' || true)
fi
fi
# --- Display config ---
echo ""
echo "=========================================="
echo " Reflector — Standalone GPU Host Setup"
echo "=========================================="
echo ""
echo " Mode: $(if [[ "$USE_CPU" == "true" ]]; then echo "CPU-only"; else echo "NVIDIA GPU"; fi)"
echo " Caddy: $USE_CADDY"
[[ -n "$CUSTOM_DOMAIN" ]] && echo " Domain: $CUSTOM_DOMAIN"
[[ "$USE_CUSTOM_CA" == "true" ]] && echo " CA: Custom"
[[ -n "$TLS_CERT_PATH" ]] && echo " TLS: Custom cert"
[[ -n "$API_KEY" ]] && echo " Auth: API key protected"
[[ -z "$API_KEY" ]] && echo " Auth: NONE (open access — use --api-key for production!)"
echo " Port: $HOST_PORT"
echo ""
# --- Prerequisites ---
info "Checking prerequisites"
if ! command -v docker &>/dev/null; then
err "Docker not found. Install Docker first."
exit 1
fi
ok "Docker available"
if ! docker compose version &>/dev/null; then
err "Docker Compose V2 not found."
exit 1
fi
ok "Docker Compose V2 available"
if [[ "$USE_CPU" != "true" ]]; then
if ! docker info 2>/dev/null | grep -qi nvidia; then
warn "NVIDIA runtime not detected in Docker. GPU mode may fail."
warn "Install nvidia-container-toolkit if you have an NVIDIA GPU."
else
ok "NVIDIA Docker runtime available"
fi
fi
# --- Stage certificates ---
CERTS_DIR="$ROOT_DIR/certs"
if [[ "$USE_CUSTOM_CA" == "true" ]]; then
info "Staging certificates"
mkdir -p "$CERTS_DIR"
if [[ -n "$CA_CERT_PATH" ]]; then
local_ca_dest="$CERTS_DIR/ca.crt"
src_id=$(ls -i "$CA_CERT_PATH" 2>/dev/null | awk '{print $1}')
dst_id=$(ls -i "$local_ca_dest" 2>/dev/null | awk '{print $1}')
if [[ "$src_id" != "$dst_id" ]] || [[ -z "$dst_id" ]]; then
cp "$CA_CERT_PATH" "$local_ca_dest"
fi
chmod 644 "$local_ca_dest"
ok "CA certificate staged"
# Append extra CAs
for extra_ca in "${EXTRA_CA_FILES[@]+"${EXTRA_CA_FILES[@]}"}"; do
echo "" >> "$local_ca_dest"
cat "$extra_ca" >> "$local_ca_dest"
ok "Appended extra CA: $extra_ca"
done
fi
if [[ -n "$TLS_CERT_PATH" ]]; then
cert_dest="$CERTS_DIR/server.pem"
key_dest="$CERTS_DIR/server-key.pem"
src_id=$(ls -i "$TLS_CERT_PATH" 2>/dev/null | awk '{print $1}')
dst_id=$(ls -i "$cert_dest" 2>/dev/null | awk '{print $1}')
if [[ "$src_id" != "$dst_id" ]] || [[ -z "$dst_id" ]]; then
cp "$TLS_CERT_PATH" "$cert_dest"
cp "$TLS_KEY_PATH" "$key_dest"
fi
chmod 644 "$cert_dest"
chmod 600 "$key_dest"
ok "TLS cert/key staged"
fi
fi
# --- Build profiles and compose command ---
COMPOSE_FILE="$ROOT_DIR/docker-compose.gpu-host.yml"
COMPOSE_PROFILES=()
GPU_SERVICE="gpu"
if [[ "$USE_CPU" == "true" ]]; then
COMPOSE_PROFILES+=("cpu")
GPU_SERVICE="cpu"
else
COMPOSE_PROFILES+=("gpu")
fi
if [[ "$USE_CADDY" == "true" ]]; then
COMPOSE_PROFILES+=("caddy")
fi
# Compose command helper
compose_cmd() {
local profiles="" files="-f $COMPOSE_FILE"
if [[ "$USE_CUSTOM_CA" == "true" ]] && [[ -f "$ROOT_DIR/docker-compose.gpu-ca.yml" ]]; then
files="$files -f $ROOT_DIR/docker-compose.gpu-ca.yml"
fi
for p in "${COMPOSE_PROFILES[@]}"; do
profiles="$profiles --profile $p"
done
docker compose $files $profiles "$@"
}
# Generate CA compose override if needed (mounts certs into containers)
if [[ "$USE_CUSTOM_CA" == "true" ]]; then
info "Generating docker-compose.gpu-ca.yml override"
ca_override="$ROOT_DIR/docker-compose.gpu-ca.yml"
cat > "$ca_override" << 'CAEOF'
# Generated by setup-gpu-host.sh — custom CA trust.
# Do not edit manually; re-run setup-gpu-host.sh with --custom-ca to regenerate.
services:
  gpu:
    volumes:
      - ./certs/ca.crt:/usr/local/share/ca-certificates/custom-ca.crt:ro
  cpu:
    volumes:
      - ./certs/ca.crt:/usr/local/share/ca-certificates/custom-ca.crt:ro
CAEOF
if [[ -n "$TLS_CERT_PATH" ]]; then
cat >> "$ca_override" << 'CADDYCAEOF'
  caddy:
    volumes:
      - ./certs:/etc/caddy/certs:ro
CADDYCAEOF
fi
ok "Generated docker-compose.gpu-ca.yml"
else
rm -f "$ROOT_DIR/docker-compose.gpu-ca.yml"
fi
# --- Generate Caddyfile ---
if [[ "$USE_CADDY" == "true" ]]; then
info "Generating Caddyfile.gpu-host"
CADDYFILE="$ROOT_DIR/Caddyfile.gpu-host"
if [[ -n "$TLS_CERT_PATH" ]] && [[ -n "$CUSTOM_DOMAIN" ]]; then
cat > "$CADDYFILE" << CADDYEOF
# Generated by setup-gpu-host.sh — Custom TLS cert for $CUSTOM_DOMAIN
$CUSTOM_DOMAIN {
tls /etc/caddy/certs/server.pem /etc/caddy/certs/server-key.pem
reverse_proxy transcription:8000
}
CADDYEOF
ok "Caddyfile: custom TLS for $CUSTOM_DOMAIN"
elif [[ -n "$CUSTOM_DOMAIN" ]]; then
cat > "$CADDYFILE" << CADDYEOF
# Generated by setup-gpu-host.sh — Let's Encrypt for $CUSTOM_DOMAIN
$CUSTOM_DOMAIN {
reverse_proxy transcription:8000
}
CADDYEOF
ok "Caddyfile: Let's Encrypt for $CUSTOM_DOMAIN"
else
cat > "$CADDYFILE" << 'CADDYEOF'
# Generated by setup-gpu-host.sh — self-signed cert for IP access
:443 {
tls internal
reverse_proxy transcription:8000
}
CADDYEOF
ok "Caddyfile: self-signed cert for IP access"
fi
fi
# --- Generate .env ---
info "Generating GPU service .env"
GPU_ENV="$ROOT_DIR/.env.gpu-host"
cat > "$GPU_ENV" << EOF
# Generated by setup-gpu-host.sh
# HuggingFace token for pyannote diarization models
HF_TOKEN=${HF_TOKEN:-}
# API key to protect the GPU service (set via --api-key)
REFLECTOR_GPU_APIKEY=${API_KEY:-}
# Port configuration
GPU_HOST_PORT=${HOST_PORT}
CADDY_HTTPS_PORT=${HOST_PORT}
EOF
if [[ -z "${HF_TOKEN:-}" ]]; then
warn "HF_TOKEN not set. Diarization requires a HuggingFace token."
warn "Set it: export HF_TOKEN=your-token-here and re-run, or edit .env.gpu-host"
fi
ok "Generated .env.gpu-host"
# --- Build and start ---
info "Building $GPU_SERVICE image (first build downloads ML models — may take a while)..."
compose_cmd --env-file "$GPU_ENV" build "$GPU_SERVICE"
ok "$GPU_SERVICE image built"
info "Starting services..."
compose_cmd --env-file "$GPU_ENV" up -d
ok "Services started"
# --- Wait for health ---
info "Waiting for GPU service to be healthy (model loading takes 1-2 minutes)..."
local_url="http://localhost:8000"
for i in $(seq 1 40); do
if curl -sf "$local_url/docs" >/dev/null 2>&1; then
ok "GPU service is healthy!"
break
fi
if [[ $i -eq 40 ]]; then
err "GPU service did not become healthy after 5 minutes."
err "Check logs: docker compose -f docker-compose.gpu-host.yml logs gpu"
exit 1
fi
sleep 8
done
# --- Summary ---
echo ""
echo "=========================================="
echo -e " ${GREEN}GPU service is running!${NC}"
echo "=========================================="
echo ""
if [[ "$USE_CADDY" == "true" ]]; then
if [[ -n "$CUSTOM_DOMAIN" ]]; then
echo " URL: https://$CUSTOM_DOMAIN"
elif [[ -n "$PRIMARY_IP" ]]; then
echo " URL: https://$PRIMARY_IP"
else
echo " URL: https://localhost"
fi
else
if [[ -n "$PRIMARY_IP" ]]; then
echo " URL: http://$PRIMARY_IP:$HOST_PORT"
else
echo " URL: http://localhost:$HOST_PORT"
fi
fi
echo " Health: curl <URL>/docs"
[[ -n "$API_KEY" ]] && echo " API key: $API_KEY"
echo ""
echo " Configure the main Reflector instance (in server/.env):"
echo ""
local_gpu_url=""
if [[ "$USE_CADDY" == "true" ]]; then
if [[ -n "$CUSTOM_DOMAIN" ]]; then
local_gpu_url="https://$CUSTOM_DOMAIN"
elif [[ -n "$PRIMARY_IP" ]]; then
local_gpu_url="https://$PRIMARY_IP"
else
local_gpu_url="https://localhost"
fi
else
if [[ -n "$PRIMARY_IP" ]]; then
local_gpu_url="http://$PRIMARY_IP:$HOST_PORT"
else
local_gpu_url="http://localhost:$HOST_PORT"
fi
fi
echo " TRANSCRIPT_BACKEND=modal"
echo " TRANSCRIPT_URL=$local_gpu_url"
[[ -n "$API_KEY" ]] && echo " TRANSCRIPT_MODAL_API_KEY=$API_KEY"
echo " DIARIZATION_BACKEND=modal"
echo " DIARIZATION_URL=$local_gpu_url"
[[ -n "$API_KEY" ]] && echo " DIARIZATION_MODAL_API_KEY=$API_KEY"
echo " TRANSLATION_BACKEND=modal"
echo " TRANSLATE_URL=$local_gpu_url"
[[ -n "$API_KEY" ]] && echo " TRANSLATION_MODAL_API_KEY=$API_KEY"
echo ""
if [[ "$USE_CUSTOM_CA" == "true" ]]; then
echo " The Reflector instance must also trust this CA."
echo " On the Reflector machine, run setup-selfhosted.sh with:"
echo " --extra-ca /path/to/this-machines-ca.crt"
echo ""
fi
echo " DNS Resolution:"
if [[ -n "$CUSTOM_DOMAIN" ]]; then
echo " Ensure '$CUSTOM_DOMAIN' resolves to this machine's IP."
echo " Public: Create a DNS A record."
echo " Internal: Add to /etc/hosts on the Reflector machine:"
echo " ${PRIMARY_IP:-<GPU_IP>} $CUSTOM_DOMAIN"
else
echo " Use this machine's IP directly in TRANSCRIPT_URL/DIARIZATION_URL."
fi
echo ""
echo " To stop: docker compose -f docker-compose.gpu-host.yml down"
echo " To re-run: ./scripts/setup-gpu-host.sh $*"
echo " Logs: docker compose -f docker-compose.gpu-host.yml logs -f gpu"
echo ""
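
Reviewer sketch: the Caddy, port and URL decisions in `setup-gpu-host.sh` are spread over several `if` blocks. The Python below restates that logic as one pure function so the combinations are easier to check. `Endpoint` and `resolve_endpoint` are illustrative names, not part of the repository, and the sketch assumes the behaviour is exactly what the shell code above shows.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    use_caddy: bool
    host_port: str
    url: str


def resolve_endpoint(domain: str = "", tls_cert_path: str = "",
                     host_port: str = "", primary_ip: str = "") -> Endpoint:
    # Caddy is enabled when a domain or a TLS cert/key pair is available.
    use_caddy = bool(domain) or bool(tls_cert_path)
    # Default port: 443 behind Caddy, 8000 otherwise.
    if not host_port:
        host_port = "443" if use_caddy else "8000"
    if use_caddy:
        # As in the script, the HTTPS URL is printed without a port.
        url = f"https://{domain or primary_ip or 'localhost'}"
    else:
        url = f"http://{primary_ip or 'localhost'}:{host_port}"
    return Endpoint(use_caddy, host_port, url)


print(resolve_endpoint(primary_ip="192.168.1.50"))
print(resolve_endpoint(domain="gpu.local", tls_cert_path="certs/server.pem"))
```

One consequence visible in this form: passing only `--api-key` (no domain, no certs) yields plain HTTP on port 8000, and a custom `--port` behind Caddy is not reflected in the printed HTTPS URL.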

File diff suppressed because it is too large


@@ -6,7 +6,7 @@ ENV PYTHONUNBUFFERED=1 \
# builder install base dependencies
WORKDIR /tmp
-RUN apt-get update && apt-get install -y curl ffmpeg && apt-get clean
+RUN apt-get update && apt-get install -y curl ffmpeg ca-certificates && apt-get clean
ADD https://astral.sh/uv/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh
ENV PATH="/root/.local/bin/:$PATH"
@@ -18,7 +18,7 @@ COPY pyproject.toml uv.lock README.md /app/
RUN uv sync --compile-bytecode --locked
# bootstrap
-COPY alembic.ini runserver.sh /app/
+COPY alembic.ini docker-entrypoint.sh runserver.sh /app/
COPY images /app/images
COPY migrations /app/migrations
COPY reflector /app/reflector
@@ -35,4 +35,6 @@ RUN if [ "$(uname -m)" = "aarch64" ] && [ ! -f /usr/lib/libgomp.so.1 ]; then \
# Pre-check just to make sure the image will not fail
RUN uv run python -c "import silero_vad.model"
-CMD ["./runserver.sh"]
+RUN chmod +x /app/docker-entrypoint.sh
+CMD ["./docker-entrypoint.sh"]


@@ -0,0 +1,25 @@
#!/bin/bash
set -e
# Custom CA certificate injection
# If a CA cert is mounted at this path (via docker-compose.ca.yml),
# add it to the system trust store and configure all Python SSL libraries.
CUSTOM_CA_PATH="/usr/local/share/ca-certificates/custom-ca.crt"
if [ -s "$CUSTOM_CA_PATH" ]; then
echo "[entrypoint] Custom CA certificate detected, updating trust store..."
update-ca-certificates 2>/dev/null
# update-ca-certificates creates a combined bundle (system + custom CAs)
COMBINED_BUNDLE="/etc/ssl/certs/ca-certificates.crt"
export SSL_CERT_FILE="$COMBINED_BUNDLE"
export REQUESTS_CA_BUNDLE="$COMBINED_BUNDLE"
export CURL_CA_BUNDLE="$COMBINED_BUNDLE"
# Note: GRPC_DEFAULT_SSL_ROOTS_FILE_PATH is intentionally NOT set here.
# Setting it causes grpcio to attempt TLS on internal Hatchet connections
# that run without TLS (SERVER_GRPC_INSECURE=t), resulting in handshake failures.
# If you need gRPC with custom CA, set GRPC_DEFAULT_SSL_ROOTS_FILE_PATH explicitly.
echo "[entrypoint] CA trust store updated (SSL_CERT_FILE=$COMBINED_BUNDLE)"
fi
exec ./runserver.sh
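
Reviewer sketch: the entrypoint's decision can be summarised as "if a non-empty CA file is mounted, point three environment variables at the combined bundle". The Python below restates that rule for illustration. `ca_environment` is a hypothetical helper, and it deliberately leaves out `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` for the reason given in the script's comment.

```python
import os

CUSTOM_CA_PATH = "/usr/local/share/ca-certificates/custom-ca.crt"
COMBINED_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def ca_environment(ca_path: str = CUSTOM_CA_PATH,
                   bundle: str = COMBINED_BUNDLE) -> dict[str, str]:
    """Return the variables the entrypoint exports, or {} if no CA is mounted."""
    # Rough equivalent of `[ -s "$CUSTOM_CA_PATH" ]`: file exists and is non-empty.
    if not (os.path.isfile(ca_path) and os.path.getsize(ca_path) > 0):
        return {}
    return {
        "SSL_CERT_FILE": bundle,
        "REQUESTS_CA_BUNDLE": bundle,
        "CURL_CA_BUNDLE": bundle,
        # GRPC_DEFAULT_SSL_ROOTS_FILE_PATH is left out on purpose, as in the script.
    }


print(ca_environment())
```

Outside a container with the CA mounted, the call prints an empty dict, which corresponds to the script falling straight through to `exec ./runserver.sh`.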


@@ -419,3 +419,18 @@ User-room broadcasts to `user:{user_id}`:
- `TRANSCRIPT_STATUS`
- `TRANSCRIPT_FINAL_TITLE`
- `TRANSCRIPT_DURATION`
## Failed Runs Monitor (Hatchet Cron)
A `FailedRunsMonitor` Hatchet cron workflow runs hourly (`0 * * * *`) and checks for failed pipeline runs
(DiarizationPipeline, FilePipeline, LivePostProcessingPipeline) in the last hour. For each failed run,
it renders a DAG status overview and posts it to Zulip.
**Required env vars** (all must be set to enable):
- `ZULIP_REALM` — Zulip server hostname
- `ZULIP_API_KEY` — Zulip bot API key
- `ZULIP_BOT_EMAIL` — Zulip bot email
- `ZULIP_DAG_STREAM` — Zulip stream for alerts
- `ZULIP_DAG_TOPIC` — Zulip topic for alerts
If any of these are unset, the monitor workflow is not registered with the Hatchet worker.
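
The "all must be set" rule above can be sketched as a small gate. This is an illustration only: the helper names are hypothetical, the real registration code is not shown in this diff, and treating an empty string as unset is an assumption.

```python
import os
from collections.abc import Mapping

REQUIRED_ZULIP_VARS = (
    "ZULIP_REALM",
    "ZULIP_API_KEY",
    "ZULIP_BOT_EMAIL",
    "ZULIP_DAG_STREAM",
    "ZULIP_DAG_TOPIC",
)


def missing_zulip_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """List required variables that are absent or empty (assumption: empty = unset)."""
    return [name for name in REQUIRED_ZULIP_VARS if not env.get(name)]


def monitor_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """The monitor is enabled only when every required variable is set."""
    return not missing_zulip_vars(env)


print(monitor_enabled(), missing_zulip_vars())
```

Logging the output of `missing_zulip_vars` at worker start-up would make it clear why the cron workflow was skipped.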


@@ -0,0 +1,47 @@
"""add soft delete fields to transcript and recording

Revision ID: 501c73a6b0d5
Revises: e1f093f7f124
Create Date: 2026-03-19 00:00:00.000000

"""

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op

revision: str = "501c73a6b0d5"
down_revision: Union[str, None] = "e1f093f7f124"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    op.add_column(
        "transcript",
        sa.Column("deleted_at", sa.DateTime(timezone=True), nullable=True),
    )
    op.add_column(
        "recording",
        sa.Column("deleted_at", sa.DateTime(timezone=True), nullable=True),
    )
    op.create_index(
        "idx_transcript_not_deleted",
        "transcript",
        ["id"],
        postgresql_where=sa.text("deleted_at IS NULL"),
    )
    op.create_index(
        "idx_recording_not_deleted",
        "recording",
        ["id"],
        postgresql_where=sa.text("deleted_at IS NULL"),
    )


def downgrade() -> None:
    op.drop_index("idx_recording_not_deleted", table_name="recording")
    op.drop_index("idx_transcript_not_deleted", table_name="transcript")
    op.drop_column("recording", "deleted_at")
    op.drop_column("transcript", "deleted_at")
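
Reviewer sketch: the migration adds a nullable `deleted_at` column plus a partial index over rows that are still live. The Python below demonstrates the same soft-delete pattern end to end. It uses the standard-library `sqlite3` module instead of PostgreSQL purely so that it runs without a database server (SQLite also supports partial indexes); the helper names and the text timestamp format are assumptions for the demo.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcript (id TEXT PRIMARY KEY, deleted_at TEXT)")
# Partial index: only rows that have not been soft-deleted are indexed.
conn.execute(
    "CREATE INDEX idx_transcript_not_deleted ON transcript (id) "
    "WHERE deleted_at IS NULL"
)
conn.executemany("INSERT INTO transcript (id) VALUES (?)", [("a",), ("b",)])


def soft_delete(transcript_id: str) -> None:
    """Mark a row as deleted instead of removing it."""
    conn.execute(
        "UPDATE transcript SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), transcript_id),
    )


def live_ids() -> list[str]:
    """Every read path has to filter on deleted_at IS NULL."""
    rows = conn.execute(
        "SELECT id FROM transcript WHERE deleted_at IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]


soft_delete("a")
print(live_ids())  # ['b']
```

The row for `a` is still in the table after `soft_delete`, so any query that omits the `deleted_at IS NULL` filter will still see it.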


@@ -0,0 +1,29 @@
"""add email_recipients to meeting

Revision ID: a2b3c4d5e6f7
Revises: 501c73a6b0d5
Create Date: 2026-03-20 00:00:00.000000

"""

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import JSONB

revision: str = "a2b3c4d5e6f7"
down_revision: Union[str, None] = "501c73a6b0d5"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    op.add_column(
        "meeting",
        sa.Column("email_recipients", JSONB, nullable=True),
    )


def downgrade() -> None:
    op.drop_column("meeting", "email_recipients")


@@ -0,0 +1,28 @@
"""add email_transcript_to to room

Revision ID: b4c7e8f9a012
Revises: a2b3c4d5e6f7
Create Date: 2026-03-24 00:00:00.000000

"""

from typing import Sequence, Union

import sqlalchemy as sa
from alembic import op

revision: str = "b4c7e8f9a012"
down_revision: Union[str, None] = "a2b3c4d5e6f7"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    op.add_column(
        "room",
        sa.Column("email_transcript_to", sa.String(), nullable=True),
    )


def downgrade() -> None:
    op.drop_column("room", "email_transcript_to")


@@ -40,6 +40,9 @@ dependencies = [
"icalendar>=6.0.0",
"hatchet-sdk==1.22.16",
"pydantic>=2.12.5",
"aiosmtplib>=3.0.0",
"email-validator>=2.0.0",
"livekit-api>=1.1.0",
]
[dependency-groups]


@@ -13,18 +13,22 @@ from reflector.events import subscribers_shutdown, subscribers_startup
from reflector.logger import logger
from reflector.metrics import metrics_init
from reflector.settings import settings
from reflector.views.config import router as config_router
from reflector.views.daily import router as daily_router
from reflector.views.livekit import router as livekit_router
from reflector.views.meetings import router as meetings_router
from reflector.views.rooms import router as rooms_router
from reflector.views.rtc_offer import router as rtc_offer_router
from reflector.views.transcripts import router as transcripts_router
from reflector.views.transcripts_audio import router as transcripts_audio_router
from reflector.views.transcripts_download import router as transcripts_download_router
from reflector.views.transcripts_participants import (
router as transcripts_participants_router,
)
from reflector.views.transcripts_process import router as transcripts_process_router
from reflector.views.transcripts_speaker import router as transcripts_speaker_router
from reflector.views.transcripts_upload import router as transcripts_upload_router
from reflector.views.transcripts_video import router as transcripts_video_router
from reflector.views.transcripts_webrtc import router as transcripts_webrtc_router
from reflector.views.transcripts_websocket import router as transcripts_websocket_router
from reflector.views.user import router as user_router
@@ -97,15 +101,19 @@ app.include_router(transcripts_audio_router, prefix="/v1")
app.include_router(transcripts_participants_router, prefix="/v1")
app.include_router(transcripts_speaker_router, prefix="/v1")
app.include_router(transcripts_upload_router, prefix="/v1")
app.include_router(transcripts_download_router, prefix="/v1")
app.include_router(transcripts_video_router, prefix="/v1")
app.include_router(transcripts_websocket_router, prefix="/v1")
app.include_router(transcripts_webrtc_router, prefix="/v1")
app.include_router(transcripts_process_router, prefix="/v1")
app.include_router(user_router, prefix="/v1")
app.include_router(user_api_keys_router, prefix="/v1")
app.include_router(user_ws_router, prefix="/v1")
app.include_router(config_router, prefix="/v1")
app.include_router(zulip_router, prefix="/v1")
app.include_router(whereby_router, prefix="/v1")
app.include_router(daily_router, prefix="/v1/daily")
app.include_router(livekit_router, prefix="/v1/livekit")
if auth_router:
app.include_router(auth_router, prefix="/v1")
add_pagination(app)


@@ -1,3 +1,4 @@
from contextlib import asynccontextmanager
from datetime import datetime, timedelta
from typing import Any, Literal
@@ -66,6 +67,8 @@ meetings = sa.Table(
    # Daily.co composed video (Brady Bunch grid layout) - Daily.co only, not Whereby
    sa.Column("daily_composed_video_s3_key", sa.String, nullable=True),
    sa.Column("daily_composed_video_duration", sa.Integer, nullable=True),
    # Email recipients for transcript notification
    sa.Column("email_recipients", JSONB, nullable=True),
    sa.Index("idx_meeting_room_id", "room_id"),
    sa.Index("idx_meeting_calendar_event", "calendar_event_id"),
)
@@ -116,6 +119,9 @@ class Meeting(BaseModel):
    # Daily.co composed video (Brady Bunch grid) - Daily.co only
    daily_composed_video_s3_key: str | None = None
    daily_composed_video_duration: int | None = None
    # Email recipients for transcript notification
    # Each entry is {"email": str, "include_link": bool} or a legacy plain str
    email_recipients: list[dict | str] | None = None
class MeetingController:
@@ -159,6 +165,17 @@ class MeetingController:
        results = await get_database().fetch_all(query)
        return [Meeting(**result) for result in results]

    async def get_all_inactive_livekit(self) -> list[Meeting]:
        """Get inactive LiveKit meetings (for multitrack processing discovery)."""
        query = meetings.select().where(
            sa.and_(
                meetings.c.is_active == sa.false(),
                meetings.c.platform == "livekit",
            )
        )
        results = await get_database().fetch_all(query)
        return [Meeting(**result) for result in results]

    async def get_by_room_name(
        self,
        room_name: str,
@@ -388,6 +405,36 @@ class MeetingController:
# If was_null=False, the WHERE clause prevented the update
return was_null
@asynccontextmanager
async def transaction(self):
"""A context manager for database transaction."""
async with get_database().transaction(isolation="serializable"):
yield
async def add_email_recipient(
self, meeting_id: str, email: str, *, include_link: bool = True
) -> list[dict]:
"""Add an email to the meeting's email_recipients list (no duplicates).
Each entry is stored as {"email": str, "include_link": bool}.
Legacy plain-string entries are normalised on read.
"""
async with self.transaction():
meeting = await self.get_by_id(meeting_id)
if not meeting:
raise ValueError(f"Meeting {meeting_id} not found")
# Normalise legacy string entries
current: list[dict] = [
entry
if isinstance(entry, dict)
else {"email": entry, "include_link": True}
for entry in (meeting.email_recipients or [])
]
if not any(r["email"] == email for r in current):
current.append({"email": email, "include_link": include_link})
await self.update_meeting(meeting_id, email_recipients=current)
return current
async def increment_num_clients(self, meeting_id: str) -> None:
"""Atomically increment participant count."""
query = (
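
The recipient handling in `add_email_recipient` can be tried without a database. The following is a minimal sketch, not the project's code: `normalise_recipients` and `add_recipient` are hypothetical helper names, and the duplicate check is an exact, case-sensitive string match as in the diff above.

```python
def normalise_recipients(entries):
    """Convert legacy plain-string entries to {"email", "include_link"} dicts."""
    return [
        entry if isinstance(entry, dict) else {"email": entry, "include_link": True}
        for entry in (entries or [])
    ]


def add_recipient(entries, email, *, include_link=True):
    """Return a new normalised list with `email` appended unless already present."""
    current = normalise_recipients(entries)
    if not any(r["email"] == email for r in current):
        current.append({"email": email, "include_link": include_link})
    return current


# A legacy string entry plus one new recipient that should not get a link
recipients = add_recipient(["a@example.com"], "b@example.com", include_link=False)
```

Adding an address that is already present leaves the list unchanged, so the call is idempotent for a given email.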


@@ -1,4 +1,4 @@
from datetime import datetime
from datetime import datetime, timezone
from typing import Literal
import sqlalchemy as sa
@@ -24,6 +24,7 @@ recordings = sa.Table(
),
sa.Column("meeting_id", sa.String),
sa.Column("track_keys", sa.JSON, nullable=True),
sa.Column("deleted_at", sa.DateTime(timezone=True), nullable=True),
sa.Index("idx_recording_meeting_id", "meeting_id"),
)
@@ -40,6 +41,7 @@ class Recording(BaseModel):
# track_keys can be empty list [] if recording finished but no audio was captured (silence/muted)
# None means not a multitrack recording, [] means multitrack with no tracks
track_keys: list[str] | None = None
deleted_at: datetime | None = None
@property
def is_multitrack(self) -> bool:
@@ -69,6 +71,18 @@ class RecordingController:
return Recording(**result) if result else None
async def remove_by_id(self, id: str) -> None:
query = (
recordings.update()
.where(recordings.c.id == id)
.values(deleted_at=datetime.now(timezone.utc))
)
await get_database().execute(query)
async def restore_by_id(self, id: str) -> None:
query = recordings.update().where(recordings.c.id == id).values(deleted_at=None)
await get_database().execute(query)
async def hard_delete_by_id(self, id: str) -> None:
query = recordings.delete().where(recordings.c.id == id)
await get_database().execute(query)
@@ -114,6 +128,7 @@ class RecordingController:
.where(
recordings.c.bucket_name == bucket_name,
recordings.c.track_keys.isnot(None),
recordings.c.deleted_at.is_(None),
or_(
transcripts.c.id.is_(None),
transcripts.c.status == "error",
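
The soft-delete pattern introduced for recordings (set `deleted_at`, clear it to restore, filter on `deleted_at IS NULL` when listing) can be illustrated in isolation. This sketch assumes a simplified one-table schema in an in-memory SQLite database; the project itself issues the equivalent statements through SQLAlchemy, and the function names here are illustrative.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recording (id TEXT PRIMARY KEY, deleted_at TEXT)")
conn.execute("INSERT INTO recording (id) VALUES ('rec-1'), ('rec-2')")


def soft_delete(recording_id):
    # Mark the row as deleted; the row and any files it points to are kept
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE recording SET deleted_at = ? WHERE id = ?", (now, recording_id))


def restore(recording_id):
    # Clearing deleted_at makes the row visible again
    conn.execute("UPDATE recording SET deleted_at = NULL WHERE id = ?", (recording_id,))


def active_ids():
    rows = conn.execute("SELECT id FROM recording WHERE deleted_at IS NULL ORDER BY id")
    return [row[0] for row in rows]


soft_delete("rec-1")
after_delete = active_ids()
restore("rec-1")
after_restore = active_ids()
```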


@@ -63,6 +63,7 @@ rooms = sqlalchemy.Table(
nullable=False,
server_default=sqlalchemy.sql.false(),
),
sqlalchemy.Column("email_transcript_to", sqlalchemy.String, nullable=True),
sqlalchemy.Index("idx_room_is_shared", "is_shared"),
sqlalchemy.Index("idx_room_ics_enabled", "ics_enabled"),
)
@@ -92,6 +93,7 @@ class Room(BaseModel):
ics_last_etag: str | None = None
platform: Platform = Field(default_factory=lambda: settings.DEFAULT_VIDEO_PLATFORM)
skip_consent: bool = False
email_transcript_to: str | None = None
class RoomController:
@@ -147,6 +149,7 @@ class RoomController:
ics_enabled: bool = False,
platform: Platform = settings.DEFAULT_VIDEO_PLATFORM,
skip_consent: bool = False,
email_transcript_to: str | None = None,
):
"""
Add a new room
@@ -172,6 +175,7 @@ class RoomController:
"ics_enabled": ics_enabled,
"platform": platform,
"skip_consent": skip_consent,
"email_transcript_to": email_transcript_to,
}
room = Room(**room_data)


@@ -138,6 +138,7 @@ class SearchParameters(BaseModel):
source_kind: SourceKind | None = None
from_datetime: datetime | None = None
to_datetime: datetime | None = None
include_deleted: bool = False
class SearchResultDB(BaseModel):
@@ -387,6 +388,11 @@ class SearchController:
transcripts.join(rooms, transcripts.c.room_id == rooms.c.id, isouter=True)
)
# Despite the name, include_deleted=True selects only soft-deleted rows (trash view)
if params.include_deleted:
base_query = base_query.where(transcripts.c.deleted_at.isnot(None))
else:
base_query = base_query.where(transcripts.c.deleted_at.is_(None))
if params.query_text is not None:
# because already initialized based on params.query_text presence above
assert search_query is not None
@@ -394,7 +400,13 @@ class SearchController:
transcripts.c.search_vector_en.op("@@")(search_query)
)
if params.user_id:
if params.include_deleted:
# Trash view: only show user's own deleted transcripts.
# Defense-in-depth: require user_id to prevent leaking all users' trash.
if not params.user_id:
return [], 0
base_query = base_query.where(transcripts.c.user_id == params.user_id)
elif params.user_id:
base_query = base_query.where(
sqlalchemy.or_(
transcripts.c.user_id == params.user_id, rooms.c.is_shared
@@ -419,6 +431,8 @@ class SearchController:
if params.query_text is not None:
order_by = sqlalchemy.desc(sqlalchemy.text("rank"))
elif params.include_deleted:
order_by = sqlalchemy.desc(transcripts.c.deleted_at)
else:
order_by = sqlalchemy.desc(transcripts.c.created_at)
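
As a reading aid, the filtering added to `SearchController` can be modelled over plain dictionaries. This is a sketch under assumptions, not the query itself: `visible_transcripts` is a hypothetical name, rows are dicts rather than SQL rows, and the case with no `user_id` and `include_deleted=False` is left unfiltered here because the diff does not show that branch.

```python
def visible_transcripts(rows, *, user_id=None, include_deleted=False):
    if include_deleted:
        # Trash view: only deleted rows, and only the caller's own
        if not user_id:
            return []
        return [
            r for r in rows
            if r["deleted_at"] is not None and r["user_id"] == user_id
        ]
    live = [r for r in rows if r["deleted_at"] is None]
    if user_id:
        # Normal view: own transcripts plus those in shared rooms
        return [r for r in live if r["user_id"] == user_id or r["is_shared"]]
    return live  # assumption: anonymous branch is not shown in the diff


rows = [
    {"id": "t1", "user_id": "u1", "deleted_at": None, "is_shared": False},
    {"id": "t2", "user_id": "u1", "deleted_at": "2026-04-01", "is_shared": False},
    {"id": "t3", "user_id": "u2", "deleted_at": "2026-04-02", "is_shared": False},
    {"id": "t4", "user_id": "u2", "deleted_at": None, "is_shared": True},
]

normal_ids = [r["id"] for r in visible_transcripts(rows, user_id="u1")]
trash_ids = [r["id"] for r in visible_transcripts(rows, user_id="u1", include_deleted=True)]
anonymous_trash = visible_transcripts(rows, include_deleted=True)
```

Note that the shared-room rule does not apply to the trash view: another user's deleted transcript never appears, even if its room is shared.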


@@ -24,7 +24,7 @@ from reflector.db.utils import is_postgresql
from reflector.logger import logger
from reflector.processors.types import Word as ProcessorWord
from reflector.settings import settings
from reflector.storage import get_transcripts_storage
from reflector.storage import get_source_storage, get_transcripts_storage
from reflector.utils import generate_uuid4
from reflector.utils.webvtt import topics_to_webvtt
@@ -91,6 +91,7 @@ transcripts = sqlalchemy.Table(
sqlalchemy.Column("webvtt", sqlalchemy.Text),
# Hatchet workflow run ID for resumption of failed workflows
sqlalchemy.Column("workflow_run_id", sqlalchemy.String),
sqlalchemy.Column("deleted_at", sqlalchemy.DateTime(timezone=True), nullable=True),
sqlalchemy.Column(
"change_seq",
sqlalchemy.BigInteger,
@@ -238,6 +239,7 @@ class Transcript(BaseModel):
webvtt: str | None = None
workflow_run_id: str | None = None # Hatchet workflow run ID for resumption
change_seq: int | None = None
deleted_at: datetime | None = None
@field_serializer("created_at", when_used="json")
def serialize_datetime(self, dt: datetime) -> str:
@@ -418,6 +420,8 @@ class TranscriptController:
rooms, transcripts.c.room_id == rooms.c.id, isouter=True
)
query = query.where(transcripts.c.deleted_at.is_(None))
if user_id:
query = query.where(
or_(transcripts.c.user_id == user_id, rooms.c.is_shared)
@@ -482,6 +486,14 @@ class TranscriptController:
return None
return Transcript(**result)
async def get_by_meeting_id(self, meeting_id: str) -> Transcript | None:
"""Get a transcript by meeting_id (first match)."""
query = transcripts.select().where(transcripts.c.meeting_id == meeting_id)
result = await get_database().fetch_one(query)
if not result:
return None
return Transcript(**result)
async def get_by_recording_id(
self, recording_id: str, **kwargs
) -> Transcript | None:
@@ -500,7 +512,10 @@ class TranscriptController:
"""
Get transcripts by room_id (direct access without joins)
"""
query = transcripts.select().where(transcripts.c.room_id == room_id)
query = transcripts.select().where(
transcripts.c.room_id == room_id,
transcripts.c.deleted_at.is_(None),
)
if "user_id" in kwargs:
query = query.where(transcripts.c.user_id == kwargs["user_id"])
if "order_by" in kwargs:
@@ -531,8 +546,11 @@ class TranscriptController:
if not result:
raise HTTPException(status_code=404, detail="Transcript not found")
# if the transcript is anonymous, share mode is not checked
transcript = Transcript(**result)
if transcript.deleted_at is not None:
raise HTTPException(status_code=404, detail="Transcript not found")
# if the transcript is anonymous, share mode is not checked
if transcript.user_id is None:
return transcript
@@ -632,56 +650,169 @@ class TranscriptController:
user_id: str | None = None,
) -> None:
"""
Remove a transcript by id
Soft-delete a transcript by id.
Sets deleted_at on the transcript and its associated recording.
All files (S3 and local) are preserved for later retrieval.
"""
transcript = await self.get_by_id(transcript_id)
if not transcript:
return
if user_id is not None and transcript.user_id != user_id:
return
if transcript.deleted_at is not None:
return
now = datetime.now(timezone.utc)
# Soft-delete the associated recording (keeps S3 files intact)
if transcript.recording_id:
try:
await recordings_controller.remove_by_id(transcript.recording_id)
except Exception as e:
logger.warning(
"Failed to soft-delete recording",
exc_info=e,
recording_id=transcript.recording_id,
)
# Soft-delete the transcript (keeps all files intact)
query = (
transcripts.update()
.where(transcripts.c.id == transcript_id)
.values(deleted_at=now)
)
await get_database().execute(query)
async def restore_by_id(
self,
transcript_id: str,
user_id: str | None = None,
) -> bool:
"""
Restore a soft-deleted transcript by clearing deleted_at.
Also restores the associated recording if present.
Returns True if the transcript was restored, False otherwise.
"""
transcript = await self.get_by_id(transcript_id)
if not transcript:
return False
if transcript.deleted_at is None:
return False
if user_id is not None and transcript.user_id != user_id:
return False
query = (
transcripts.update()
.where(transcripts.c.id == transcript_id)
.values(deleted_at=None)
)
await get_database().execute(query)
if transcript.recording_id:
try:
await recordings_controller.restore_by_id(transcript.recording_id)
except Exception as e:
logger.warning(
"Failed to restore recording",
exc_info=e,
recording_id=transcript.recording_id,
)
return True
async def hard_delete(self, transcript_id: str) -> None:
"""
Permanently delete a transcript, its recording, and all associated files.
Only deletes transcript-owned resources:
- Transcript row and recording row from DB (first, to make data inaccessible)
- Transcript audio in S3 storage
- Recording files in S3 (both object_key and track_keys, since a recording can have both)
- Local files (data_path directory)
Does NOT delete: meetings, consent records, rooms, or any shared entity.
Requires the transcript to be soft-deleted first (deleted_at must be set).
"""
transcript = await self.get_by_id(transcript_id)
if not transcript:
return
if transcript.deleted_at is None:
return
# Collect file references before deleting DB rows
recording = None
recording_storage = None
if transcript.recording_id:
recording = await recordings_controller.get_by_id(transcript.recording_id)
# Determine the correct storage backend for recording files.
# Recordings from different platforms (daily, whereby) live in
# platform-specific buckets with separate credentials.
if recording and recording.meeting_id:
from reflector.db.meetings import meetings_controller # noqa: PLC0415
meeting = await meetings_controller.get_by_id(recording.meeting_id)
if meeting:
recording_storage = get_source_storage(meeting.platform)
if recording_storage is None:
recording_storage = get_transcripts_storage()
# 1. Hard-delete DB rows first (makes data inaccessible immediately)
if recording:
await recordings_controller.hard_delete_by_id(recording.id)
await get_database().execute(
transcripts.delete().where(transcripts.c.id == transcript_id)
)
# 2. Delete transcript audio from S3 (always uses transcript storage)
transcript_storage = get_transcripts_storage()
if transcript.audio_location == "storage" and not transcript.audio_deleted:
try:
await get_transcripts_storage().delete_file(
transcript.storage_audio_path
)
await transcript_storage.delete_file(transcript.storage_audio_path)
except Exception as e:
logger.warning(
"Failed to delete transcript audio from storage",
exc_info=e,
transcript_id=transcript.id,
transcript_id=transcript_id,
path=transcript.storage_audio_path,
)
# 3. Delete recording files from S3 (both object_key and track_keys —
# a recording can have both, unlike consent cleanup which uses elif).
# Uses platform-specific storage resolved above.
if recording and recording.bucket_name and recording_storage:
keys_to_delete = []
if recording.track_keys:
# Copy the list so the append below does not mutate recording.track_keys
keys_to_delete = list(recording.track_keys)
if recording.object_key:
keys_to_delete.append(recording.object_key)
for key in keys_to_delete:
try:
await recording_storage.delete_file(
key, bucket=recording.bucket_name
)
except Exception as e:
logger.warning(
"Failed to delete recording file",
exc_info=e,
key=key,
bucket=recording.bucket_name,
)
# 4. Delete local files
transcript.unlink()
if transcript.recording_id:
try:
recording = await recordings_controller.get_by_id(
transcript.recording_id
)
if recording:
try:
await get_transcripts_storage().delete_file(
recording.object_key, bucket=recording.bucket_name
)
except Exception as e:
logger.warning(
"Failed to delete recording object from S3",
exc_info=e,
recording_id=transcript.recording_id,
)
await recordings_controller.remove_by_id(transcript.recording_id)
except Exception as e:
logger.warning(
"Failed to delete recording row",
exc_info=e,
recording_id=transcript.recording_id,
)
query = transcripts.delete().where(transcripts.c.id == transcript_id)
await get_database().execute(query)
async def remove_by_recording_id(self, recording_id: str):
"""
Remove a transcript by recording_id
Soft-delete a transcript by recording_id
"""
query = transcripts.delete().where(transcripts.c.recording_id == recording_id)
query = (
transcripts.update()
.where(transcripts.c.recording_id == recording_id)
.values(deleted_at=datetime.now(timezone.utc))
)
await get_database().execute(query)
@staticmethod
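
`hard_delete` gathers both `track_keys` and `object_key` before deleting files. The collection step can be written as a small pure function that copies the list, so the recording's own `track_keys` is never modified. This is a sketch; `collect_recording_keys` is a hypothetical name, not a function in the codebase.

```python
def collect_recording_keys(track_keys, object_key):
    """Return every storage key owned by a recording, without mutating inputs."""
    keys = list(track_keys or [])  # copy: appending must not alter the caller's list
    if object_key:
        keys.append(object_key)
    return keys


tracks = ["room/track-0.webm", "room/track-1.webm"]
keys = collect_recording_keys(tracks, "room/composed.mp4")
```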

server/reflector/email.py (new file, 162 lines)

@@ -0,0 +1,162 @@
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from html import escape
import aiosmtplib
import structlog
from reflector.db.transcripts import SourceKind, Transcript
from reflector.settings import settings
from reflector.utils.transcript_formats import transcript_to_text_timestamped
logger = structlog.get_logger(__name__)
def is_email_configured() -> bool:
return bool(settings.SMTP_HOST and settings.SMTP_FROM_EMAIL)
def get_transcript_url(transcript: Transcript) -> str:
return f"{settings.UI_BASE_URL}/transcripts/{transcript.id}"
def _get_timestamped_text(transcript: Transcript) -> str:
"""Build the full timestamped transcript text using existing utility."""
if not transcript.topics:
return ""
is_multitrack = transcript.source_kind == SourceKind.ROOM
return transcript_to_text_timestamped(
transcript.topics, transcript.participants, is_multitrack=is_multitrack
)
def _build_plain_text(transcript: Transcript, url: str, include_link: bool) -> str:
title = transcript.title or "Unnamed recording"
lines = [f"Reflector: {title}", ""]
if transcript.short_summary:
lines.extend(["Summary:", transcript.short_summary, ""])
timestamped = _get_timestamped_text(transcript)
if timestamped:
lines.extend(["Transcript:", timestamped, ""])
if include_link:
lines.append(f"View transcript: {url}")
lines.append("")
lines.append(
"This email was sent because you requested to receive "
"the transcript from a meeting."
)
return "\n".join(lines)
def _build_html(transcript: Transcript, url: str, include_link: bool) -> str:
title = escape(transcript.title or "Unnamed recording")
summary_html = ""
if transcript.short_summary:
summary_html = (
f'<p style="color:#555;margin-bottom:16px;">'
f"{escape(transcript.short_summary)}</p>"
)
transcript_html = ""
timestamped = _get_timestamped_text(transcript)
if timestamped:
# Build styled transcript lines
styled_lines = []
for line in timestamped.split("\n"):
if not line.strip():
continue
# Lines are formatted as "[MM:SS] Speaker: text"
if line.startswith("[") and "] " in line:
bracket_end = line.index("] ")
timestamp = escape(line[: bracket_end + 1])
rest = line[bracket_end + 2 :]
if ": " in rest:
colon_pos = rest.index(": ")
speaker = escape(rest[:colon_pos])
text = escape(rest[colon_pos + 2 :])
styled_lines.append(
f'<div style="margin-bottom:4px;">'
f'<span style="color:#888;font-size:12px;">{timestamp}</span> '
f"<strong>{speaker}:</strong> {text}</div>"
)
else:
styled_lines.append(
f'<div style="margin-bottom:4px;">{escape(line)}</div>'
)
else:
styled_lines.append(
f'<div style="margin-bottom:4px;">{escape(line)}</div>'
)
transcript_html = (
'<h3 style="margin-top:20px;margin-bottom:8px;">Transcript</h3>'
'<div style="background:#f7f7f7;padding:16px;border-radius:6px;'
'font-size:13px;line-height:1.6;max-height:600px;overflow-y:auto;">'
f"{''.join(styled_lines)}</div>"
)
link_html = ""
if include_link:
link_html = (
'<p style="margin-top:20px;">'
f'<a href="{escape(url)}" style="display:inline-block;padding:10px 20px;'
"background:#4A90D9;color:#fff;text-decoration:none;"
'border-radius:4px;">View Transcript</a></p>'
)
return f"""\
<div style="font-family:sans-serif;max-width:600px;margin:0 auto;">
<h2 style="margin-bottom:4px;">{title}</h2>
{summary_html}
{transcript_html}
{link_html}
<p style="color:#999;font-size:12px;margin-top:20px;">This email was sent because you requested to receive the transcript from a meeting.</p>
</div>"""
async def send_transcript_email(
to_emails: list[str],
transcript: Transcript,
*,
include_link: bool = True,
) -> int:
"""Send transcript notification to all emails. Returns count sent."""
if not is_email_configured() or not to_emails:
return 0
url = get_transcript_url(transcript)
title = transcript.title or "Unnamed recording"
sent = 0
for email_addr in to_emails:
msg = MIMEMultipart("alternative")
msg["Subject"] = f"Reflector: {title}"
msg["From"] = settings.SMTP_FROM_EMAIL
msg["To"] = email_addr
msg.attach(MIMEText(_build_plain_text(transcript, url, include_link), "plain"))
msg.attach(MIMEText(_build_html(transcript, url, include_link), "html"))
try:
await aiosmtplib.send(
msg,
hostname=settings.SMTP_HOST,
port=settings.SMTP_PORT,
username=settings.SMTP_USERNAME,
password=settings.SMTP_PASSWORD,
start_tls=settings.SMTP_USE_TLS,
)
sent += 1
except Exception:
logger.exception(
"Failed to send transcript email",
to=email_addr,
transcript_id=transcript.id,
)
return sent
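
The line handling inside `_build_html` splits each `[MM:SS] Speaker: text` line into three parts and escapes each part before embedding it in markup. A reduced sketch of that step follows; `split_transcript_line` and `render_line` are hypothetical names, and the inline styles from the real template are omitted.

```python
from html import escape


def split_transcript_line(line):
    """Return (timestamp, speaker, text), or None if the line does not match."""
    if not (line.startswith("[") and "] " in line):
        return None
    bracket_end = line.index("] ")
    timestamp = line[: bracket_end + 1]
    rest = line[bracket_end + 2 :]
    if ": " not in rest:
        return None
    speaker, text = rest.split(": ", 1)
    return timestamp, speaker, text


def render_line(line):
    parts = split_transcript_line(line)
    if parts is None:
        # Fallback: escape the whole line unchanged
        return f"<div>{escape(line)}</div>"
    timestamp, speaker, text = parts
    return (
        f"<div><span>{escape(timestamp)}</span> "
        f"<strong>{escape(speaker)}:</strong> {escape(text)}</div>"
    )


html_line = render_line("[00:05] Alice: 1 < 2 & 3")
fallback_line = render_line("no timestamp <b>")
```

Escaping each part separately, rather than the assembled string, keeps the surrounding tags intact while neutralising any markup in speaker names or transcript text.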


@@ -21,6 +21,7 @@ class TaskName(StrEnum):
CLEANUP_CONSENT = "cleanup_consent"
POST_ZULIP = "post_zulip"
SEND_WEBHOOK = "send_webhook"
SEND_EMAIL = "send_email"
PAD_TRACK = "pad_track"
TRANSCRIBE_TRACK = "transcribe_track"
DETECT_CHUNK_TOPIC = "detect_chunk_topic"
@@ -59,7 +60,13 @@ TIMEOUT_AUDIO = 720 # Audio processing: padding, mixdown (Hatchet execution_tim
TIMEOUT_AUDIO_HTTP = (
660 # httpx timeout for pad_track — below 720 so Hatchet doesn't race
)
TIMEOUT_HEAVY = 600 # Transcription, fan-out LLM tasks (Hatchet execution_timeout)
TIMEOUT_HEAVY = 1200 # Transcription, fan-out LLM tasks (Hatchet execution_timeout)
TIMEOUT_HEAVY_HTTP = (
540 # httpx timeout for transcribe_track — below 600 so Hatchet doesn't race
1150 # httpx timeout for transcribe_track — below 1200 so Hatchet doesn't race
)
TIMEOUT_EXTRA_HEAVY = (
3600 # Detect Topics, fan-out LLM tasks (Hatchet execution_timeout)
)
TIMEOUT_EXTRA_HEAVY_HTTP = (
3400 # httpx timeout for detect_topics — below 3600 so Hatchet doesn't race
)


@@ -16,6 +16,7 @@ from reflector.hatchet.workflows.subject_processing import subject_workflow
from reflector.hatchet.workflows.topic_chunk_processing import topic_chunk_workflow
from reflector.hatchet.workflows.track_processing import track_workflow
from reflector.logger import logger
from reflector.settings import settings
SLOTS = 10
WORKER_NAME = "llm-worker-pool"
@@ -34,6 +35,38 @@ def main():
error=str(e),
)
workflows = [
daily_multitrack_pipeline,
file_pipeline,
live_post_pipeline,
topic_chunk_workflow,
subject_workflow,
track_workflow,
]
_zulip_dag_enabled = all(
[
settings.ZULIP_REALM,
settings.ZULIP_API_KEY,
settings.ZULIP_BOT_EMAIL,
settings.ZULIP_DAG_STREAM,
settings.ZULIP_DAG_TOPIC,
]
)
if _zulip_dag_enabled:
from reflector.hatchet.workflows.failed_runs_monitor import ( # noqa: PLC0415
failed_runs_monitor,
)
workflows.append(failed_runs_monitor)
logger.info(
"FailedRunsMonitor cron enabled",
stream=settings.ZULIP_DAG_STREAM,
topic=settings.ZULIP_DAG_TOPIC,
)
else:
logger.info("FailedRunsMonitor cron disabled (Zulip DAG not configured)")
logger.info(
"Starting Hatchet LLM worker pool (all tasks except mixdown)",
worker_name=WORKER_NAME,
@@ -47,14 +80,7 @@ def main():
labels={
"pool": POOL,
},
workflows=[
daily_multitrack_pipeline,
file_pipeline,
live_post_pipeline,
topic_chunk_workflow,
subject_workflow,
track_workflow,
],
workflows=workflows,
)
try:


@@ -33,6 +33,7 @@ from hatchet_sdk.labels import DesiredWorkerLabel
from pydantic import BaseModel
from reflector.dailyco_api.client import DailyApiClient
from reflector.email import is_email_configured, send_transcript_email
from reflector.hatchet.broadcast import (
append_event_and_broadcast,
set_status_and_broadcast,
@@ -40,6 +41,7 @@ from reflector.hatchet.broadcast import (
from reflector.hatchet.client import HatchetClientManager
from reflector.hatchet.constants import (
TIMEOUT_AUDIO,
TIMEOUT_EXTRA_HEAVY,
TIMEOUT_HEAVY,
TIMEOUT_LONG,
TIMEOUT_MEDIUM,
@@ -51,6 +53,7 @@ from reflector.hatchet.error_classification import is_non_retryable
from reflector.hatchet.workflows.models import (
ActionItemsResult,
ConsentResult,
EmailResult,
FinalizeResult,
MixdownResult,
PaddedTrackInfo,
@@ -82,7 +85,7 @@ from reflector.hatchet.workflows.topic_chunk_processing import (
from reflector.hatchet.workflows.track_processing import TrackInput, track_workflow
from reflector.logger import logger
from reflector.pipelines import topic_processing
from reflector.processors import AudioFileWriterProcessor
from reflector.processors.audio_mixdown_auto import AudioMixdownAutoProcessor
from reflector.processors.summary.models import ActionItemsResponse
from reflector.processors.summary.prompts import (
RECAP_PROMPT,
@@ -97,10 +100,6 @@ from reflector.utils.audio_constants import (
PRESIGNED_URL_EXPIRATION_SECONDS,
WAVEFORM_SEGMENTS,
)
from reflector.utils.audio_mixdown import (
detect_sample_rate_from_tracks,
mixdown_tracks_pyav,
)
from reflector.utils.audio_waveform import get_audio_waveform
from reflector.utils.daily import (
filter_cam_audio_tracks,
@@ -274,8 +273,10 @@ def with_error_handling(
)
@with_error_handling(TaskName.GET_RECORDING)
async def get_recording(input: PipelineInput, ctx: Context) -> RecordingResult:
"""Fetch recording metadata from Daily.co API."""
ctx.log(f"get_recording: starting for recording_id={input.recording_id}")
"""Fetch recording metadata. Platform-aware: Daily calls API, LiveKit skips."""
ctx.log(
f"get_recording: starting for recording_id={input.recording_id}, platform={input.source_platform}"
)
ctx.log(
f"get_recording: transcript_id={input.transcript_id}, room_id={input.room_id}"
)
@@ -300,6 +301,18 @@ async def get_recording(input: PipelineInput, ctx: Context) -> RecordingResult:
)
ctx.log(f"get_recording: status set to 'processing' and broadcasted")
# LiveKit: no external API call needed — metadata comes from S3 track listing
if input.source_platform == "livekit":
ctx.log(
"get_recording: LiveKit platform — skipping API call (metadata from S3)"
)
return RecordingResult(
id=input.recording_id,
mtg_session_id=None,
duration=0, # Duration calculated from tracks later
)
# Daily.co: fetch recording metadata from API
if not settings.DAILY_API_KEY:
ctx.log("get_recording: ERROR - DAILY_API_KEY not configured")
raise ValueError("DAILY_API_KEY not configured")
@@ -333,11 +346,12 @@ async def get_recording(input: PipelineInput, ctx: Context) -> RecordingResult:
)
@with_error_handling(TaskName.GET_PARTICIPANTS)
async def get_participants(input: PipelineInput, ctx: Context) -> ParticipantsResult:
"""Fetch participant list from Daily.co API and update transcript in database."""
ctx.log(f"get_participants: transcript_id={input.transcript_id}")
"""Fetch participant list and update transcript. Platform-aware."""
ctx.log(
f"get_participants: transcript_id={input.transcript_id}, platform={input.source_platform}"
)
recording = ctx.task_output(get_recording)
mtg_session_id = recording.mtg_session_id
async with fresh_db_connection():
from reflector.db.transcripts import ( # noqa: PLC0415
TranscriptDuration,
@@ -348,8 +362,8 @@ async def get_participants(input: PipelineInput, ctx: Context) -> ParticipantsRe
transcript = await transcripts_controller.get_by_id(input.transcript_id)
if not transcript:
raise ValueError(f"Transcript {input.transcript_id} not found")
# Note: title NOT cleared - preserves existing titles
# Duration from Daily API (seconds -> milliseconds) - master source
# Duration from recording metadata (seconds -> milliseconds)
duration_ms = recording.duration * 1000 if recording.duration else 0
await transcripts_controller.update(
transcript,
@@ -361,65 +375,141 @@ async def get_participants(input: PipelineInput, ctx: Context) -> ParticipantsRe
},
)
await append_event_and_broadcast(
input.transcript_id,
transcript,
"DURATION",
TranscriptDuration(duration=duration_ms),
logger=logger,
)
mtg_session_id = assert_non_none_and_non_empty(
mtg_session_id, "mtg_session_id is required"
)
daily_api_key = assert_non_none_and_non_empty(
settings.DAILY_API_KEY, "DAILY_API_KEY is required"
)
async with DailyApiClient(
api_key=daily_api_key, base_url=settings.DAILY_API_URL
) as client:
participants = await client.get_meeting_participants(mtg_session_id)
id_to_name = {}
id_to_user_id = {}
for p in participants.data:
if p.user_name:
id_to_name[p.participant_id] = p.user_name
if p.user_id:
id_to_user_id[p.participant_id] = p.user_id
track_keys = [t["s3_key"] for t in input.tracks]
cam_audio_keys = filter_cam_audio_tracks(track_keys)
if duration_ms:
await append_event_and_broadcast(
input.transcript_id,
transcript,
"DURATION",
TranscriptDuration(duration=duration_ms),
logger=logger,
)
participants_list: list[ParticipantInfo] = []
for idx, key in enumerate(cam_audio_keys):
if input.source_platform == "livekit":
# LiveKit: participant identity is in the track dict or can be parsed from filepath
from reflector.utils.livekit import (
parse_livekit_track_filepath, # noqa: PLC0415
)
# Look up identity → Reflector user_id mapping from Redis
# (stored at join time in rooms.py)
identity_to_user_id: dict[str, str] = {}
try:
parsed = parse_daily_recording_filename(key)
participant_id = parsed.participant_id
except ValueError as e:
logger.error(
"Failed to parse Daily recording filename",
error=str(e),
key=key,
from reflector.db.meetings import (
meetings_controller as mc, # noqa: PLC0415
)
from reflector.redis_cache import (
get_async_redis_client, # noqa: PLC0415
)
continue
default_name = f"Speaker {idx}"
name = id_to_name.get(participant_id, default_name)
user_id = id_to_user_id.get(participant_id)
meeting = (
await mc.get_by_id(transcript.meeting_id)
if transcript.meeting_id
else None
)
if meeting:
redis_client = await get_async_redis_client()
mapping_key = f"livekit:participant_map:{meeting.room_name}"
raw_map = await redis_client.hgetall(mapping_key)
identity_to_user_id = {
k.decode() if isinstance(k, bytes) else k: v.decode()
if isinstance(v, bytes)
else v
for k, v in raw_map.items()
}
ctx.log(
f"get_participants: loaded {len(identity_to_user_id)} identity→user_id mappings from Redis"
)
except Exception as e:
ctx.log(
f"get_participants: could not load identity map from Redis: {e}"
)
participant = TranscriptParticipant(
id=participant_id, speaker=idx, name=name, user_id=user_id
)
await transcripts_controller.upsert_participant(transcript, participant)
participants_list.append(
ParticipantInfo(
participant_id=participant_id,
user_name=name,
for idx, track in enumerate(input.tracks):
identity = track.get("participant_identity")
if not identity:
# Reprocess path: parse from S3 key
try:
parsed = parse_livekit_track_filepath(track["s3_key"])
identity = parsed.participant_identity
except (ValueError, KeyError):
identity = f"speaker-{idx}"
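# Strip the trailing "-<suffix>" from identity for display name,
# e.g., "Juan-2bcea0" → "Juan". Assumes every identity carries the suffix;
# an unsuffixed hyphenated name would lose its last segment.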
display_name = (
identity.rsplit("-", 1)[0] if "-" in identity else identity
)
reflector_user_id = identity_to_user_id.get(identity)
participant = TranscriptParticipant(
id=identity,
speaker=idx,
name=display_name,
user_id=reflector_user_id,
)
await transcripts_controller.upsert_participant(transcript, participant)
participants_list.append(
ParticipantInfo(
participant_id=identity,
user_name=display_name,
speaker=idx,
)
)
else:
# Daily.co: fetch participant names from API
mtg_session_id = recording.mtg_session_id
mtg_session_id = assert_non_none_and_non_empty(
mtg_session_id, "mtg_session_id is required"
)
daily_api_key = assert_non_none_and_non_empty(
settings.DAILY_API_KEY, "DAILY_API_KEY is required"
)
async with DailyApiClient(
api_key=daily_api_key, base_url=settings.DAILY_API_URL
) as client:
participants = await client.get_meeting_participants(mtg_session_id)
id_to_name = {}
id_to_user_id = {}
for p in participants.data:
if p.user_name:
id_to_name[p.participant_id] = p.user_name
if p.user_id:
id_to_user_id[p.participant_id] = p.user_id
track_keys = [t["s3_key"] for t in input.tracks]
cam_audio_keys = filter_cam_audio_tracks(track_keys)
for idx, key in enumerate(cam_audio_keys):
try:
parsed = parse_daily_recording_filename(key)
participant_id = parsed.participant_id
except ValueError as e:
logger.error(
"Failed to parse Daily recording filename",
error=str(e),
key=key,
)
continue
default_name = f"Speaker {idx}"
name = id_to_name.get(participant_id, default_name)
user_id = id_to_user_id.get(participant_id)
participant = TranscriptParticipant(
id=participant_id, speaker=idx, name=name, user_id=user_id
)
await transcripts_controller.upsert_participant(transcript, participant)
participants_list.append(
ParticipantInfo(
participant_id=participant_id,
user_name=name,
speaker=idx,
)
)
ctx.log(f"get_participants complete: {len(participants_list)} participants")
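
The display-name rule used in the LiveKit branch (drop everything after the last hyphen) is easy to check on its own. The sketch below reuses the same expression under the hypothetical name `display_name`; the sample identities other than "Juan-2bcea0" are invented for illustration.

```python
def display_name(identity):
    # Same expression as in the diff: cut at the last hyphen, if there is one
    return identity.rsplit("-", 1)[0] if "-" in identity else identity


names = [
    display_name(identity)
    for identity in ("Juan-2bcea0", "Anna", "Jean-Luc-9f3a1c", "Jean-Luc")
]
```

The last sample shows the edge case: a hyphenated name without a suffix is shortened to "Jean". Whether identities can reach this code without a suffix is not visible in the diff.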
@@ -434,6 +524,7 @@ async def get_participants(input: PipelineInput, ctx: Context) -> ParticipantsRe
@daily_multitrack_pipeline.task(
parents=[get_participants],
execution_timeout=timedelta(seconds=TIMEOUT_HEAVY),
schedule_timeout=timedelta(seconds=TIMEOUT_HEAVY),
retries=3,
backoff_factor=2.0,
backoff_max_seconds=30,
@@ -441,11 +532,66 @@ async def get_participants(input: PipelineInput, ctx: Context) -> ParticipantsRe
@with_error_handling(TaskName.PROCESS_TRACKS)
async def process_tracks(input: PipelineInput, ctx: Context) -> ProcessTracksResult:
"""Spawn child workflows for each track (dynamic fan-out)."""
ctx.log(f"process_tracks: spawning {len(input.tracks)} track workflows")
ctx.log(
f"process_tracks: spawning {len(input.tracks)} track workflows, platform={input.source_platform}"
)
participants_result = ctx.task_output(get_participants)
source_language = participants_result.source_language
# For LiveKit: calculate padding offsets from filename timestamps.
# OGG files don't have embedded start_time metadata, so we pre-calculate.
track_padding: dict[int, float] = {}
if input.source_platform == "livekit":
from datetime import datetime # noqa: PLC0415
from reflector.utils.livekit import (
parse_livekit_track_filepath, # noqa: PLC0415
)
timestamps = []
for i, track in enumerate(input.tracks):
ts_str = track.get("timestamp")
if ts_str:
try:
ts = datetime.fromisoformat(ts_str)
timestamps.append((i, ts))
except (ValueError, TypeError):
ctx.log(
f"process_tracks: could not parse timestamp for track {i}: {ts_str}"
)
timestamps.append((i, None))
else:
# Reprocess path: parse timestamp from S3 key
try:
parsed = parse_livekit_track_filepath(track["s3_key"])
timestamps.append((i, parsed.timestamp))
ctx.log(
f"process_tracks: parsed timestamp from S3 key for track {i}: {parsed.timestamp}"
)
except (ValueError, KeyError):
timestamps.append((i, None))
valid_timestamps = [(i, ts) for i, ts in timestamps if ts is not None]
if valid_timestamps:
earliest = min(ts for _, ts in valid_timestamps)
# LiveKit Track Egress outputs OGG/Opus files, but the transcription
# service only accepts WebM. The padding step converts OGG→WebM as a
# side effect of applying the adelay filter. For the earliest track
# (offset=0), we use a minimal padding to force this conversion.
LIVEKIT_MIN_PADDING_SECONDS = (
0.001 # 1ms — inaudible, forces OGG→WebM conversion
)
for i, ts in valid_timestamps:
offset = (ts - earliest).total_seconds()
if offset == 0.0:
offset = LIVEKIT_MIN_PADDING_SECONDS
track_padding[i] = offset
ctx.log(
f"process_tracks: track {i} padding={offset}s (from filename timestamp)"
)
bulk_runs = [
track_workflow.create_bulk_run_item(
input=TrackInput(
@@ -455,6 +601,7 @@ async def process_tracks(input: PipelineInput, ctx: Context) -> ProcessTracksRes
transcript_id=input.transcript_id,
language=source_language,
source_platform=input.source_platform,
padding_seconds=track_padding.get(i),
)
)
for i, track in enumerate(input.tracks)
@@ -517,6 +664,7 @@ async def process_tracks(input: PipelineInput, ctx: Context) -> ProcessTracksRes
@daily_multitrack_pipeline.task(
parents=[process_tracks],
execution_timeout=timedelta(seconds=TIMEOUT_AUDIO),
schedule_timeout=timedelta(seconds=TIMEOUT_HEAVY),
retries=2,
backoff_factor=2.0,
backoff_max_seconds=15,
@@ -537,7 +685,7 @@ async def process_tracks(input: PipelineInput, ctx: Context) -> ProcessTracksRes
)
@with_error_handling(TaskName.MIXDOWN_TRACKS)
async def mixdown_tracks(input: PipelineInput, ctx: Context) -> MixdownResult:
"""Mix all padded tracks into single audio file using PyAV (same as Celery)."""
"""Mix all padded tracks into single audio file via configured backend."""
ctx.log("mixdown_tracks: mixing padded tracks into single audio file")
track_result = ctx.task_output(process_tracks)
@@ -577,52 +725,66 @@ async def mixdown_tracks(input: PipelineInput, ctx: Context) -> MixdownResult:
if not valid_urls:
raise ValueError("No valid padded tracks to mixdown")
target_sample_rate = detect_sample_rate_from_tracks(valid_urls, logger=logger)
if not target_sample_rate:
logger.error("Mixdown failed - no decodable audio frames found")
raise ValueError("No decodable audio frames in any track")
output_path = tempfile.mktemp(suffix=".mp3")
duration_ms_callback_capture_container = [0.0]
async def capture_duration(d):
duration_ms_callback_capture_container[0] = d
writer = AudioFileWriterProcessor(path=output_path, on_duration=capture_duration)
await mixdown_tracks_pyav(
valid_urls,
writer,
target_sample_rate,
offsets_seconds=None,
logger=logger,
progress_callback=make_audio_progress_logger(ctx, TaskName.MIXDOWN_TRACKS),
expected_duration_sec=recording_duration if recording_duration > 0 else None,
)
await writer.flush()
file_size = Path(output_path).stat().st_size
storage_path = f"{input.transcript_id}/audio.mp3"
with open(output_path, "rb") as mixed_file:
await storage.put_file(storage_path, mixed_file)
# Generate presigned PUT URL for the output (used by modal backend;
# pyav backend ignores it and writes locally instead)
output_url = await storage.get_file_url(
storage_path,
operation="put_object",
expires_in=PRESIGNED_URL_EXPIRATION_SECONDS,
)
Path(output_path).unlink(missing_ok=True)
processor = AudioMixdownAutoProcessor()
result = await processor.mixdown_tracks(
valid_urls, output_url, offsets_seconds=None
)
if result.output_path:
# Pyav backend wrote locally — upload to storage ourselves
output_file = Path(result.output_path)
with open(output_file, "rb") as mixed_file:
await storage.put_file(storage_path, mixed_file)
output_file.unlink(missing_ok=True)
# Clean up the temp directory the pyav processor created
try:
output_file.parent.rmdir()
except OSError:
pass
# else: modal backend already uploaded to output_url
async with fresh_db_connection():
from reflector.db.transcripts import transcripts_controller # noqa: PLC0415
from reflector.db.transcripts import ( # noqa: PLC0415
TranscriptDuration,
transcripts_controller,
)
transcript = await transcripts_controller.get_by_id(input.transcript_id)
if transcript:
await transcripts_controller.update(
transcript, {"audio_location": "storage"}
)
update_data = {"audio_location": "storage"}
# Set duration from mixdown if not already set (LiveKit: duration starts at 0)
if not transcript.duration or transcript.duration == 0:
update_data["duration"] = result.duration_ms
await transcripts_controller.update(transcript, update_data)
ctx.log(f"mixdown_tracks complete: uploaded {file_size} bytes to {storage_path}")
# Broadcast duration update if it was missing
if not transcript.duration or transcript.duration == 0:
await append_event_and_broadcast(
input.transcript_id,
transcript,
"DURATION",
TranscriptDuration(duration=result.duration_ms),
logger=logger,
)
ctx.log(
f"mixdown_tracks: set duration={result.duration_ms}ms from mixdown"
)
ctx.log(f"mixdown_tracks complete: {result.size} bytes to {storage_path}")
return MixdownResult(
audio_key=storage_path,
duration=duration_ms_callback_capture_container[0],
duration=result.duration_ms,
tracks_mixed=len(valid_urls),
)
@@ -699,7 +861,8 @@ async def generate_waveform(input: PipelineInput, ctx: Context) -> WaveformResul
@daily_multitrack_pipeline.task(
parents=[process_tracks],
execution_timeout=timedelta(seconds=TIMEOUT_HEAVY),
execution_timeout=timedelta(seconds=TIMEOUT_EXTRA_HEAVY),
schedule_timeout=timedelta(seconds=TIMEOUT_HEAVY),
retries=3,
backoff_factor=2.0,
backoff_max_seconds=30,
@@ -1283,6 +1446,7 @@ async def cleanup_consent(input: PipelineInput, ctx: Context) -> ConsentResult:
return ConsentResult()
consent_denied = False
meeting = None
if transcript.meeting_id:
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
if meeting:
@@ -1345,6 +1509,22 @@ async def cleanup_consent(input: PipelineInput, ctx: Context) -> ConsentResult:
logger.error(error_msg, exc_info=True)
deletion_errors.append(error_msg)
# Delete cloud video if present
if meeting and meeting.daily_composed_video_s3_key:
try:
source_storage = get_source_storage("daily")
await source_storage.delete_file(meeting.daily_composed_video_s3_key)
await meetings_controller.update_meeting(
meeting.id,
daily_composed_video_s3_key=None,
daily_composed_video_duration=None,
)
ctx.log(f"Deleted cloud video: {meeting.daily_composed_video_s3_key}")
except Exception as e:
error_msg = f"Failed to delete cloud video: {e}"
logger.error(error_msg, exc_info=True)
deletion_errors.append(error_msg)
if deletion_errors:
logger.warning(
"[Hatchet] cleanup_consent completed with errors",
@@ -1355,7 +1535,7 @@ async def cleanup_consent(input: PipelineInput, ctx: Context) -> ConsentResult:
ctx.log(f"cleanup_consent completed with {len(deletion_errors)} errors")
else:
await transcripts_controller.update(transcript, {"audio_deleted": True})
ctx.log("cleanup_consent: all audio deleted successfully")
ctx.log("cleanup_consent: all audio and video deleted successfully")
return ConsentResult()
@@ -1465,6 +1645,96 @@ async def send_webhook(input: PipelineInput, ctx: Context) -> WebhookResult:
return WebhookResult(webhook_sent=False)
@daily_multitrack_pipeline.task(
parents=[cleanup_consent],
execution_timeout=timedelta(seconds=TIMEOUT_SHORT),
retries=5,
backoff_factor=2.0,
backoff_max_seconds=15,
)
@with_error_handling(TaskName.SEND_EMAIL, set_error_status=False)
async def send_email(input: PipelineInput, ctx: Context) -> EmailResult:
"""Send transcript email to collected recipients."""
ctx.log(f"send_email: transcript_id={input.transcript_id}")
if not is_email_configured():
ctx.log("send_email skipped (SMTP not configured)")
return EmailResult(skipped=True)
async with fresh_db_connection():
from reflector.db.meetings import meetings_controller # noqa: PLC0415
from reflector.db.recordings import recordings_controller # noqa: PLC0415
from reflector.db.transcripts import transcripts_controller # noqa: PLC0415
transcript = await transcripts_controller.get_by_id(input.transcript_id)
if not transcript:
ctx.log("send_email skipped (transcript not found)")
return EmailResult(skipped=True)
meeting = None
if transcript.meeting_id:
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
if not meeting and transcript.recording_id:
recording = await recordings_controller.get_by_id(transcript.recording_id)
if recording and recording.meeting_id:
meeting = await meetings_controller.get_by_id(recording.meeting_id)
# Normalise meeting recipients (legacy strings → dicts)
meeting_recipients: list[dict] = (
[
entry
if isinstance(entry, dict)
else {"email": entry, "include_link": True}
for entry in (meeting.email_recipients or [])
]
if meeting and meeting.email_recipients
else []
)
# Room-level email always gets a link (room owner)
from reflector.db.rooms import rooms_controller # noqa: PLC0415
room_email = None
if transcript.room_id:
room = await rooms_controller.get_by_id(transcript.room_id)
if room and room.email_transcript_to:
room_email = room.email_transcript_to
# Build two groups: with link and without link
with_link = [
r["email"] for r in meeting_recipients if r.get("include_link", True)
]
without_link = [
r["email"] for r in meeting_recipients if not r.get("include_link", True)
]
if room_email:
if room_email not in with_link:
with_link.append(room_email)
without_link = [e for e in without_link if e != room_email]
if not with_link and not without_link:
ctx.log("send_email skipped (no email recipients)")
return EmailResult(skipped=True)
# For room-level emails, do NOT change share_mode (only set public if meeting had recipients)
if meeting and meeting.email_recipients:
await transcripts_controller.update(transcript, {"share_mode": "public"})
count = 0
if with_link:
count += await send_transcript_email(
with_link, transcript, include_link=True
)
if without_link:
count += await send_transcript_email(
without_link, transcript, include_link=False
)
ctx.log(f"send_email complete: sent {count} emails")
return EmailResult(emails_sent=count)
async def on_workflow_failure(input: PipelineInput, ctx: Context) -> None:
"""Run when the workflow is truly dead (all retries exhausted).


@@ -0,0 +1,109 @@
"""
Hatchet cron workflow: FailedRunsMonitor
Runs hourly, queries Hatchet for failed pipeline runs in the last hour,
and posts details to Zulip for visibility.
Only registered with the worker when Zulip DAG settings are configured.
"""
from datetime import datetime, timedelta, timezone
from hatchet_sdk import Context
from hatchet_sdk.clients.rest.models import V1TaskStatus
from reflector.hatchet.client import HatchetClientManager
from reflector.logger import logger
from reflector.settings import settings
from reflector.tools.render_hatchet_run import render_run_detail
from reflector.zulip import send_message_to_zulip
MONITORED_PIPELINES = {
"DiarizationPipeline",
"FilePipeline",
"LivePostProcessingPipeline",
}
LOOKBACK_HOURS = 1
hatchet = HatchetClientManager.get_client()
failed_runs_monitor = hatchet.workflow(
name="FailedRunsMonitor",
on_crons=["0 * * * *"],
)
async def _check_failed_runs() -> dict:
"""Core logic: query for failed pipeline runs and post each to Zulip.
Extracted from the Hatchet task for testability.
"""
now = datetime.now(tz=timezone.utc)
since = now - timedelta(hours=LOOKBACK_HOURS)
client = HatchetClientManager.get_client()
try:
result = await client.runs.aio_list(
statuses=[V1TaskStatus.FAILED],
since=since,
until=now,
limit=200,
)
except Exception:
logger.exception("[FailedRunsMonitor] Failed to list runs from Hatchet")
return {"checked": 0, "reported": 0, "error": "failed to list runs"}
rows = result.rows or []
# Filter to main pipelines only (skip child workflows like TrackProcessing, etc.)
failed_main_runs = [run for run in rows if run.workflow_name in MONITORED_PIPELINES]
if not failed_main_runs:
logger.info(
"[FailedRunsMonitor] No failed pipeline runs in the last hour",
total_failed=len(rows),
since=since.isoformat(),
)
return {"checked": len(rows), "reported": 0}
logger.info(
"[FailedRunsMonitor] Found failed pipeline runs",
count=len(failed_main_runs),
since=since.isoformat(),
)
reported = 0
for run in failed_main_runs:
try:
details = await client.runs.aio_get(run.workflow_run_external_id)
content = render_run_detail(details)
await send_message_to_zulip(
settings.ZULIP_DAG_STREAM,
settings.ZULIP_DAG_TOPIC,
content,
)
reported += 1
except Exception:
logger.exception(
"[FailedRunsMonitor] Failed to report run",
workflow_run_id=run.workflow_run_external_id,
workflow_name=run.workflow_name,
)
logger.info(
"[FailedRunsMonitor] Finished reporting",
reported=reported,
total_failed_main=len(failed_main_runs),
)
return {"checked": len(rows), "reported": reported}
@failed_runs_monitor.task(
execution_timeout=timedelta(seconds=120),
retries=1,
)
async def check_failed_runs(input, ctx: Context) -> dict:
"""Hatchet task entry point — delegates to _check_failed_runs."""
return await _check_failed_runs()
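
The filtering step inside `_check_failed_runs` can be tried in isolation. This is a minimal sketch, assuming rows only need a `workflow_name` attribute. `summarize_failed_runs` and the `to_report` key are hypothetical names for illustration, and the `SimpleNamespace` rows stand in for whatever the Hatchet runs listing actually returns.

```python
from types import SimpleNamespace

# Same pipeline names as the MONITORED_PIPELINES set in the diff above.
MONITORED_PIPELINES = {
    "DiarizationPipeline",
    "FilePipeline",
    "LivePostProcessingPipeline",
}


def summarize_failed_runs(rows):
    """Hypothetical helper: keep only main-pipeline failures and count them."""
    rows = rows or []
    # Child workflows (e.g. TrackProcessing) are skipped, as in the monitor.
    main_runs = [run for run in rows if run.workflow_name in MONITORED_PIPELINES]
    return main_runs, {"checked": len(rows), "to_report": len(main_runs)}


# Illustrative rows only; real rows come from the Hatchet API.
example_rows = [
    SimpleNamespace(workflow_name="FilePipeline"),
    SimpleNamespace(workflow_name="TrackProcessing"),
]
```

Keeping this step free of network calls is what makes the core logic easy to unit test, which matches the stated reason for extracting `_check_failed_runs` from the task.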


@@ -18,6 +18,7 @@ from pathlib import Path
from hatchet_sdk import Context
from pydantic import BaseModel
from reflector.email import is_email_configured, send_transcript_email
from reflector.hatchet.broadcast import (
append_event_and_broadcast,
set_status_and_broadcast,
@@ -37,6 +38,7 @@ from reflector.hatchet.workflows.daily_multitrack_pipeline import (
)
from reflector.hatchet.workflows.models import (
ConsentResult,
EmailResult,
TitleResult,
TopicsResult,
WaveformResult,
@@ -686,7 +688,10 @@ async def cleanup_consent(input: FilePipelineInput, ctx: Context) -> ConsentResu
)
from reflector.db.recordings import recordings_controller # noqa: PLC0415
from reflector.db.transcripts import transcripts_controller # noqa: PLC0415
from reflector.storage import get_transcripts_storage # noqa: PLC0415
from reflector.storage import ( # noqa: PLC0415
get_source_storage,
get_transcripts_storage,
)
transcript = await transcripts_controller.get_by_id(input.transcript_id)
if not transcript:
@@ -695,6 +700,7 @@ async def cleanup_consent(input: FilePipelineInput, ctx: Context) -> ConsentResu
consent_denied = False
recording = None
meeting = None
if transcript.recording_id:
recording = await recordings_controller.get_by_id(transcript.recording_id)
if recording and recording.meeting_id:
@@ -754,6 +760,22 @@ async def cleanup_consent(input: FilePipelineInput, ctx: Context) -> ConsentResu
logger.error(error_msg, exc_info=True)
deletion_errors.append(error_msg)
# Delete cloud video if present
if meeting and meeting.daily_composed_video_s3_key:
try:
source_storage = get_source_storage("daily")
await source_storage.delete_file(meeting.daily_composed_video_s3_key)
await meetings_controller.update_meeting(
meeting.id,
daily_composed_video_s3_key=None,
daily_composed_video_duration=None,
)
ctx.log(f"Deleted cloud video: {meeting.daily_composed_video_s3_key}")
except Exception as e:
error_msg = f"Failed to delete cloud video: {e}"
logger.error(error_msg, exc_info=True)
deletion_errors.append(error_msg)
if deletion_errors:
logger.warning(
"[Hatchet] cleanup_consent completed with errors",
@@ -762,7 +784,7 @@ async def cleanup_consent(input: FilePipelineInput, ctx: Context) -> ConsentResu
)
else:
await transcripts_controller.update(transcript, {"audio_deleted": True})
ctx.log("cleanup_consent: all audio deleted successfully")
ctx.log("cleanup_consent: all audio and video deleted successfully")
return ConsentResult()
@@ -859,6 +881,97 @@ async def send_webhook(input: FilePipelineInput, ctx: Context) -> WebhookResult:
return WebhookResult(webhook_sent=False)
@file_pipeline.task(
parents=[cleanup_consent],
execution_timeout=timedelta(seconds=TIMEOUT_SHORT),
retries=5,
backoff_factor=2.0,
backoff_max_seconds=15,
)
@with_error_handling(TaskName.SEND_EMAIL, set_error_status=False)
async def send_email(input: FilePipelineInput, ctx: Context) -> EmailResult:
"""Send transcript email to collected recipients."""
ctx.log(f"send_email: transcript_id={input.transcript_id}")
if not is_email_configured():
ctx.log("send_email skipped (SMTP not configured)")
return EmailResult(skipped=True)
async with fresh_db_connection():
from reflector.db.meetings import meetings_controller # noqa: PLC0415
from reflector.db.recordings import recordings_controller # noqa: PLC0415
from reflector.db.transcripts import transcripts_controller # noqa: PLC0415
transcript = await transcripts_controller.get_by_id(input.transcript_id)
if not transcript:
ctx.log("send_email skipped (transcript not found)")
return EmailResult(skipped=True)
# Try transcript.meeting_id first, then fall back to recording.meeting_id
meeting = None
if transcript.meeting_id:
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
if not meeting and transcript.recording_id:
recording = await recordings_controller.get_by_id(transcript.recording_id)
if recording and recording.meeting_id:
meeting = await meetings_controller.get_by_id(recording.meeting_id)
# Normalise meeting recipients (legacy strings → dicts)
meeting_recipients: list[dict] = (
[
entry
if isinstance(entry, dict)
else {"email": entry, "include_link": True}
for entry in (meeting.email_recipients or [])
]
if meeting and meeting.email_recipients
else []
)
# Room-level email always gets a link (room owner)
from reflector.db.rooms import rooms_controller # noqa: PLC0415
room_email = None
if transcript.room_id:
room = await rooms_controller.get_by_id(transcript.room_id)
if room and room.email_transcript_to:
room_email = room.email_transcript_to
# Build two groups: with link and without link
with_link = [
r["email"] for r in meeting_recipients if r.get("include_link", True)
]
without_link = [
r["email"] for r in meeting_recipients if not r.get("include_link", True)
]
if room_email:
if room_email not in with_link:
with_link.append(room_email)
without_link = [e for e in without_link if e != room_email]
if not with_link and not without_link:
ctx.log("send_email skipped (no email recipients)")
return EmailResult(skipped=True)
# For room-level emails, do NOT change share_mode (only set public if meeting had recipients)
if meeting and meeting.email_recipients:
await transcripts_controller.update(transcript, {"share_mode": "public"})
count = 0
if with_link:
count += await send_transcript_email(
with_link, transcript, include_link=True
)
if without_link:
count += await send_transcript_email(
without_link, transcript, include_link=False
)
ctx.log(f"send_email complete: sent {count} emails")
return EmailResult(emails_sent=count)
# --- On failure handler ---


@@ -17,6 +17,7 @@ from datetime import timedelta
from hatchet_sdk import Context
from pydantic import BaseModel
from reflector.email import is_email_configured, send_transcript_email
from reflector.hatchet.client import HatchetClientManager
from reflector.hatchet.constants import (
TIMEOUT_HEAVY,
@@ -32,6 +33,7 @@ from reflector.hatchet.workflows.daily_multitrack_pipeline import (
)
from reflector.hatchet.workflows.models import (
ConsentResult,
EmailResult,
TitleResult,
WaveformResult,
WebhookResult,
@@ -361,6 +363,96 @@ async def send_webhook(input: LivePostPipelineInput, ctx: Context) -> WebhookRes
return WebhookResult(webhook_sent=False)
@live_post_pipeline.task(
parents=[final_summaries],
execution_timeout=timedelta(seconds=TIMEOUT_SHORT),
retries=5,
backoff_factor=2.0,
backoff_max_seconds=15,
)
@with_error_handling(TaskName.SEND_EMAIL, set_error_status=False)
async def send_email(input: LivePostPipelineInput, ctx: Context) -> EmailResult:
"""Send transcript email to collected recipients."""
ctx.log(f"send_email: transcript_id={input.transcript_id}")
if not is_email_configured():
ctx.log("send_email skipped (SMTP not configured)")
return EmailResult(skipped=True)
async with fresh_db_connection():
from reflector.db.meetings import meetings_controller # noqa: PLC0415
from reflector.db.recordings import recordings_controller # noqa: PLC0415
from reflector.db.transcripts import transcripts_controller # noqa: PLC0415
transcript = await transcripts_controller.get_by_id(input.transcript_id)
if not transcript:
ctx.log("send_email skipped (transcript not found)")
return EmailResult(skipped=True)
meeting = None
if transcript.meeting_id:
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
if not meeting and transcript.recording_id:
recording = await recordings_controller.get_by_id(transcript.recording_id)
if recording and recording.meeting_id:
meeting = await meetings_controller.get_by_id(recording.meeting_id)
# Normalise meeting recipients (legacy strings → dicts)
meeting_recipients: list[dict] = (
[
entry
if isinstance(entry, dict)
else {"email": entry, "include_link": True}
for entry in (meeting.email_recipients or [])
]
if meeting and meeting.email_recipients
else []
)
# Room-level email always gets a link (room owner)
from reflector.db.rooms import rooms_controller # noqa: PLC0415
room_email = None
if transcript.room_id:
room = await rooms_controller.get_by_id(transcript.room_id)
if room and room.email_transcript_to:
room_email = room.email_transcript_to
# Build two groups: with link and without link
with_link = [
r["email"] for r in meeting_recipients if r.get("include_link", True)
]
without_link = [
r["email"] for r in meeting_recipients if not r.get("include_link", True)
]
if room_email:
if room_email not in with_link:
with_link.append(room_email)
without_link = [e for e in without_link if e != room_email]
if not with_link and not without_link:
ctx.log("send_email skipped (no email recipients)")
return EmailResult(skipped=True)
# For room-level emails, do NOT change share_mode (only set public if meeting had recipients)
if meeting and meeting.email_recipients:
await transcripts_controller.update(transcript, {"share_mode": "public"})
count = 0
if with_link:
count += await send_transcript_email(
with_link, transcript, include_link=True
)
if without_link:
count += await send_transcript_email(
without_link, transcript, include_link=False
)
ctx.log(f"send_email complete: sent {count} emails")
return EmailResult(emails_sent=count)
# --- On failure handler ---
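
The recipient grouping that each `send_email` task performs can be sketched as a standalone function. This is an illustrative sketch only: `split_recipients` is a hypothetical helper name, not something the diff defines, and it assumes recipients are either bare strings (legacy) or dicts with `email` and an optional `include_link` key, as the tasks above do.

```python
def split_recipients(meeting_recipients, room_email=None):
    """Hypothetical helper: return (with_link, without_link) address lists."""
    # Legacy entries are bare strings; treat them as include_link=True.
    normalised = [
        entry if isinstance(entry, dict) else {"email": entry, "include_link": True}
        for entry in (meeting_recipients or [])
    ]
    with_link = [r["email"] for r in normalised if r.get("include_link", True)]
    without_link = [
        r["email"] for r in normalised if not r.get("include_link", True)
    ]
    # The room-level address always gets the link and is never sent twice.
    if room_email:
        if room_email not in with_link:
            with_link.append(room_email)
        without_link = [e for e in without_link if e != room_email]
    return with_link, without_link
```

Since the same block appears in all three pipelines, a shared helper along these lines might reduce the duplication, though whether that fits the project's layout is a judgment call for the authors.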


@@ -170,3 +170,10 @@ class WebhookResult(BaseModel):
webhook_sent: bool
skipped: bool = False
response_code: int | None = None
class EmailResult(BaseModel):
"""Result from send_email task."""
emails_sent: int = 0
skipped: bool = False


@@ -37,6 +37,9 @@ class TrackInput(BaseModel):
transcript_id: str
language: str = "en"
source_platform: str = "daily"
# Pre-calculated padding in seconds (from filename timestamps for LiveKit).
# When set, overrides container metadata extraction for start_time.
padding_seconds: float | None = None
hatchet = HatchetClientManager.get_client()
@@ -53,15 +56,19 @@ track_workflow = hatchet.workflow(name="TrackProcessing", input_validator=TrackI
async def pad_track(input: TrackInput, ctx: Context) -> PadTrackResult:
"""Pad single audio track with silence for alignment.
Extracts stream.start_time from WebM container metadata and applies
silence padding using PyAV filter graph (adelay).
For Daily: extracts stream.start_time from WebM container metadata.
For LiveKit: uses pre-calculated padding_seconds from filename timestamps
(OGG files don't have embedded start_time metadata).
"""
ctx.log(f"pad_track: track {input.track_index}, s3_key={input.s3_key}")
ctx.log(
f"pad_track: track {input.track_index}, s3_key={input.s3_key}, padding_seconds={input.padding_seconds}"
)
logger.info(
"[Hatchet] pad_track",
track_index=input.track_index,
s3_key=input.s3_key,
transcript_id=input.transcript_id,
padding_seconds=input.padding_seconds,
)
try:
@@ -79,10 +86,16 @@ async def pad_track(input: TrackInput, ctx: Context) -> PadTrackResult:
bucket=input.bucket_name,
)
with av.open(source_url) as in_container:
start_time_seconds = extract_stream_start_time_from_container(
in_container, input.track_index, logger=logger
)
if input.padding_seconds is not None:
# Pre-calculated offset (LiveKit: from filename timestamps)
start_time_seconds = input.padding_seconds
ctx.log(f"pad_track: using pre-calculated padding={start_time_seconds}s")
else:
# Extract from container metadata (Daily: WebM start_time)
with av.open(source_url) as in_container:
start_time_seconds = extract_stream_start_time_from_container(
in_container, input.track_index, logger=logger
)
# If no padding needed, return original S3 key
if start_time_seconds <= 0:
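
The `padding_seconds` value consumed by `pad_track` is computed in `process_tracks` from per-track timestamps. The following is a minimal sketch of that calculation, assuming the timestamps have already been parsed into `datetime` objects (or `None` when parsing failed). `compute_track_padding` is a hypothetical name, and the constant mirrors `LIVEKIT_MIN_PADDING_SECONDS` from the diff.

```python
from datetime import datetime

# Mirrors LIVEKIT_MIN_PADDING_SECONDS in the diff: 1 ms, so the earliest
# track still goes through the padding step (and its OGG to WebM conversion).
MIN_PADDING_SECONDS = 0.001


def compute_track_padding(timestamps):
    """Hypothetical helper: map track index to padding offset in seconds.

    Tracks whose timestamp is None are omitted, so a caller using
    padding.get(i) receives None and can fall back to container metadata.
    """
    valid = [(i, ts) for i, ts in enumerate(timestamps) if ts is not None]
    if not valid:
        return {}
    earliest = min(ts for _, ts in valid)
    padding = {}
    for i, ts in valid:
        offset = (ts - earliest).total_seconds()
        if offset == 0.0:
            offset = MIN_PADDING_SECONDS
        padding[i] = offset
    return padding
```

Note that a non-zero minimum matters because `pad_track` returns the original S3 key when `start_time_seconds <= 0`, which would skip the conversion for the earliest track.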


@@ -0,0 +1,12 @@
"""
LiveKit API Module — thin wrapper around the livekit-api SDK.
"""
from .client import LiveKitApiClient
from .webhooks import create_webhook_receiver, verify_webhook
__all__ = [
"LiveKitApiClient",
"create_webhook_receiver",
"verify_webhook",
]


@@ -0,0 +1,195 @@
"""
LiveKit API client wrapping the official livekit-api Python SDK.
Handles room management, access tokens, and Track Egress for
per-participant audio recording to S3-compatible storage.
"""
from datetime import timedelta
from livekit.api import (
AccessToken,
AutoTrackEgress,
CreateRoomRequest,
DeleteRoomRequest,
DirectFileOutput,
EgressInfo,
ListEgressRequest,
ListParticipantsRequest,
LiveKitAPI,
Room,
RoomEgress,
S3Upload,
StopEgressRequest,
TrackEgressRequest,
VideoGrants,
)
class LiveKitApiClient:
"""Thin wrapper around LiveKitAPI for Reflector's needs."""
def __init__(
self,
url: str,
api_key: str,
api_secret: str,
s3_bucket: str | None = None,
s3_region: str | None = None,
s3_access_key: str | None = None,
s3_secret_key: str | None = None,
s3_endpoint: str | None = None,
):
self._url = url
self._api_key = api_key
self._api_secret = api_secret
self._s3_bucket = s3_bucket
self._s3_region = s3_region or "us-east-1"
self._s3_access_key = s3_access_key
self._s3_secret_key = s3_secret_key
self._s3_endpoint = s3_endpoint
self._api = LiveKitAPI(url=url, api_key=api_key, api_secret=api_secret)
# ── Room management ──────────────────────────────────────────
async def create_room(
self,
name: str,
empty_timeout: int = 300,
max_participants: int = 0,
enable_auto_track_egress: bool = False,
track_egress_filepath: str = "livekit/{room_name}/{publisher_identity}-{time}",
) -> Room:
"""Create a LiveKit room.
Args:
name: Room name (unique identifier).
empty_timeout: Seconds to keep room alive after last participant leaves.
max_participants: 0 = unlimited.
enable_auto_track_egress: If True, automatically record each participant's
audio track to S3 as a separate file (OGG/Opus).
track_egress_filepath: S3 filepath template for auto track egress.
Supports {room_name}, {publisher_identity}, {time}.
"""
egress = None
if enable_auto_track_egress:
egress = RoomEgress(
tracks=AutoTrackEgress(
filepath=track_egress_filepath,
s3=self._build_s3_upload(),
),
)
req = CreateRoomRequest(
name=name,
empty_timeout=empty_timeout,
max_participants=max_participants,
egress=egress,
)
return await self._api.room.create_room(req)
async def delete_room(self, room_name: str) -> None:
await self._api.room.delete_room(DeleteRoomRequest(room=room_name))
async def list_participants(self, room_name: str):
resp = await self._api.room.list_participants(
ListParticipantsRequest(room=room_name)
)
return resp.participants
# ── Access tokens ────────────────────────────────────────────
def create_access_token(
self,
room_name: str,
participant_identity: str,
participant_name: str | None = None,
can_publish: bool = True,
can_subscribe: bool = True,
room_admin: bool = False,
ttl_seconds: int = 86400,
) -> str:
"""Generate a JWT access token for a participant."""
token = AccessToken(
api_key=self._api_key,
api_secret=self._api_secret,
)
token.identity = participant_identity
token.name = participant_name or participant_identity
token.ttl = timedelta(seconds=ttl_seconds)
token.with_grants(
VideoGrants(
room_join=True,
room=room_name,
can_publish=can_publish,
can_subscribe=can_subscribe,
room_admin=room_admin,
)
)
return token.to_jwt()
# ── Track Egress (per-participant audio recording) ───────────
def _build_s3_upload(self) -> S3Upload:
"""Build S3Upload config for egress output."""
if not all([self._s3_bucket, self._s3_access_key, self._s3_secret_key]):
raise ValueError(
"S3 storage not configured for LiveKit egress. "
"Set LIVEKIT_STORAGE_AWS_* environment variables."
)
kwargs = {
"access_key": self._s3_access_key,
"secret": self._s3_secret_key,
"bucket": self._s3_bucket,
"region": self._s3_region,
"force_path_style": True, # Required for Garage/MinIO
}
if self._s3_endpoint:
kwargs["endpoint"] = self._s3_endpoint
return S3Upload(**kwargs)
async def start_track_egress(
self,
room_name: str,
track_sid: str,
s3_filepath: str,
) -> EgressInfo:
"""Start Track Egress for a single audio track (writes OGG/Opus to S3).
Args:
room_name: LiveKit room name.
track_sid: Track SID to record.
s3_filepath: S3 key path for the output file.
"""
req = TrackEgressRequest(
room_name=room_name,
track_id=track_sid,
file=DirectFileOutput(
filepath=s3_filepath,
s3=self._build_s3_upload(),
),
)
return await self._api.egress.start_track_egress(req)
async def list_egress(self, room_name: str | None = None) -> list[EgressInfo]:
req = ListEgressRequest()
if room_name:
req.room_name = room_name
resp = await self._api.egress.list_egress(req)
return list(resp.items)
async def stop_egress(self, egress_id: str) -> EgressInfo:
return await self._api.egress.stop_egress(
StopEgressRequest(egress_id=egress_id)
)
# ── Cleanup ──────────────────────────────────────────────────
async def close(self):
await self._api.aclose()
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()


@@ -0,0 +1,52 @@
"""
LiveKit webhook verification and event parsing.
LiveKit authenticates webhooks with a JWT signed using the API secret.
The WebhookReceiver from the SDK handles verification.
"""
from livekit.api import TokenVerifier, WebhookEvent, WebhookReceiver
from reflector.logger import logger
def create_webhook_receiver(api_key: str, api_secret: str) -> WebhookReceiver:
"""Create a WebhookReceiver for verifying LiveKit webhook signatures."""
return WebhookReceiver(
token_verifier=TokenVerifier(api_key=api_key, api_secret=api_secret)
)
def verify_webhook(
receiver: WebhookReceiver,
body: str | bytes,
auth_header: str,
) -> WebhookEvent | None:
"""Verify and parse a LiveKit webhook event.
Returns the parsed WebhookEvent if valid, None if verification fails.
Logs at different levels depending on failure type:
- WARNING: invalid signature, expired token, malformed JWT (expected rejections)
- ERROR: unexpected exceptions (potential bugs or attacks)
"""
if isinstance(body, bytes):
body = body.decode("utf-8")
try:
return receiver.receive(body, auth_header)
except (ValueError, KeyError) as e:
# Expected verification failures (bad JWT, wrong key, expired, malformed)
logger.warning(
"LiveKit webhook verification failed",
error=str(e),
error_type=type(e).__name__,
)
return None
except Exception as e:
# Unexpected errors — log at ERROR for visibility (potential attack or SDK bug)
logger.error(
"Unexpected error during LiveKit webhook verification",
error=str(e),
error_type=type(e).__name__,
exc_info=True,
)
return None
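
The two-tier error handling in `verify_webhook` can be exercised without the LiveKit SDK. This is a sketch under stated assumptions: `FakeReceiver` is a hypothetical stand-in that only imitates the `receive(body, auth_header)` call used above, and it does no real JWT verification. Unlike the diff, the sketch decodes the body inside the `try` block, so an undecodable payload is treated as an expected rejection too.

```python
import logging

logger = logging.getLogger("livekit_webhook_sketch")


class FakeReceiver:
    """Hypothetical stand-in for the SDK receiver; no real JWT checks."""

    def __init__(self, valid_token):
        self.valid_token = valid_token

    def receive(self, body, auth_header):
        if auth_header != self.valid_token:
            raise ValueError("invalid token")
        return {"body": body}


def verify(receiver, body, auth_header):
    """Return the parsed event, or None if verification fails."""
    try:
        if isinstance(body, bytes):
            # UnicodeDecodeError is a ValueError subclass, so it is
            # handled by the first except clause below.
            body = body.decode("utf-8")
        return receiver.receive(body, auth_header)
    except (ValueError, KeyError) as e:
        # Expected rejections: log quietly and refuse the event.
        logger.warning("webhook verification failed: %s", e)
        return None
    except Exception:
        # Anything else is unexpected: log with traceback.
        logger.exception("unexpected error during webhook verification")
        return None
```

A caller would typically map a `None` result to an HTTP 401 or 400 response, though the exact status code is a choice for the endpoint rather than something this sketch decides.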


@@ -61,7 +61,7 @@ from reflector.processors.types import (
)
from reflector.processors.types import Transcript as TranscriptProcessorType
from reflector.settings import settings
from reflector.storage import get_transcripts_storage
from reflector.storage import get_source_storage, get_transcripts_storage
from reflector.views.transcripts import GetTranscriptTopic
from reflector.ws_events import TranscriptEventName
from reflector.ws_manager import WebsocketManager, get_ws_manager
@@ -671,6 +671,22 @@ async def cleanup_consent(transcript: Transcript, logger: Logger):
logger.error(error_msg, exc_info=e)
deletion_errors.append(error_msg)
# Delete cloud video if present
if meeting and meeting.daily_composed_video_s3_key:
try:
source_storage = get_source_storage("daily")
await source_storage.delete_file(meeting.daily_composed_video_s3_key)
await meetings_controller.update_meeting(
meeting.id,
daily_composed_video_s3_key=None,
daily_composed_video_duration=None,
)
logger.info(f"Deleted cloud video: {meeting.daily_composed_video_s3_key}")
except Exception as e:
error_msg = f"Failed to delete cloud video: {e}"
logger.error(error_msg, exc_info=e)
deletion_errors.append(error_msg)
if deletion_errors:
logger.warning(
f"Consent cleanup completed with {len(deletion_errors)} errors",
@@ -678,7 +694,7 @@ async def cleanup_consent(transcript: Transcript, logger: Logger):
)
else:
await transcripts_controller.update(transcript, {"audio_deleted": True})
-logger.info("Consent cleanup done - all audio deleted")
+logger.info("Consent cleanup done - all audio and video deleted")
@get_transcript

View File

@@ -4,6 +4,8 @@ from .audio_diarization_auto import AudioDiarizationAutoProcessor # noqa: F401
from .audio_downscale import AudioDownscaleProcessor # noqa: F401
from .audio_file_writer import AudioFileWriterProcessor # noqa: F401
from .audio_merge import AudioMergeProcessor # noqa: F401
from .audio_mixdown import AudioMixdownProcessor # noqa: F401
from .audio_mixdown_auto import AudioMixdownAutoProcessor # noqa: F401
from .audio_padding import AudioPaddingProcessor # noqa: F401
from .audio_padding_auto import AudioPaddingAutoProcessor # noqa: F401
from .audio_transcript import AudioTranscriptProcessor # noqa: F401

View File

@@ -0,0 +1,27 @@
"""
Base class for audio mixdown processors.
"""

from pydantic import BaseModel


class MixdownResponse(BaseModel):
    size: int
    duration_ms: float = 0.0
    cancelled: bool = False
    output_path: str | None = (
        None  # Local file path (pyav sets this; modal leaves None)
    )


class AudioMixdownProcessor:
    """Base class for audio mixdown processors."""

    async def mixdown_tracks(
        self,
        track_urls: list[str],
        output_url: str,
        target_sample_rate: int | None = None,
        offsets_seconds: list[float] | None = None,
    ) -> MixdownResponse:
        raise NotImplementedError

View File

@@ -0,0 +1,32 @@
import importlib

from reflector.processors.audio_mixdown import AudioMixdownProcessor
from reflector.settings import settings


class AudioMixdownAutoProcessor(AudioMixdownProcessor):
    _registry = {}

    @classmethod
    def register(cls, name, kclass):
        cls._registry[name] = kclass

    def __new__(cls, name: str | None = None, **kwargs):
        if name is None:
            name = settings.MIXDOWN_BACKEND
        if name not in cls._registry:
            module_name = f"reflector.processors.audio_mixdown_{name}"
            importlib.import_module(module_name)

        # gather specific configuration for the processor
        # search `MIXDOWN_XXX_YYY`, push to constructor as `xxx_yyy`
        config = {}
        name_upper = name.upper()
        settings_prefix = "MIXDOWN_"
        config_prefix = f"{settings_prefix}{name_upper}_"
        for key, value in settings:
            if key.startswith(config_prefix):
                config_name = key[len(settings_prefix) :].lower()
                config[config_name] = value

        return cls._registry[name](**config | kwargs)
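
To make the prefix matching concrete, here is a small self-contained sketch of the same registry idea. It assumes a plain dict in place of the pydantic `settings` object, and `FakeModalBackend` and `AutoProcessorSketch` are hypothetical names used only for illustration.

```python
class FakeModalBackend:
    """Hypothetical backend; only records what the factory passed in."""

    def __init__(self, modal_api_key=None, **extra):
        self.modal_api_key = modal_api_key
        self.extra = extra


class AutoProcessorSketch:
    _registry = {}

    @classmethod
    def register(cls, name, kclass):
        cls._registry[name] = kclass

    def __new__(cls, name, settings, **kwargs):
        settings_prefix = "MIXDOWN_"
        config_prefix = f"{settings_prefix}{name.upper()}_"
        # Keep only MIXDOWN_<NAME>_* keys; strip "MIXDOWN_" and lowercase the rest.
        config = {
            key[len(settings_prefix):].lower(): value
            for key, value in settings.items()
            if key.startswith(config_prefix)
        }
        # Explicit keyword arguments win over values gathered from settings.
        return cls._registry[name](**config | kwargs)


AutoProcessorSketch.register("modal", FakeModalBackend)

fake_settings = {
    "MIXDOWN_BACKEND": "modal",
    "MIXDOWN_URL": "http://mixdown.internal",  # made-up value
    "MIXDOWN_MODAL_API_KEY": "secret",         # made-up value
}
backend = AutoProcessorSketch("modal", fake_settings)
```

Note that `MIXDOWN_URL` does not start with `MIXDOWN_MODAL_`, so under this scheme it is not forwarded to the constructor; only `modal_api_key` is.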

View File

@@ -0,0 +1,110 @@
"""
Modal.com backend for audio mixdown.
"""
import asyncio
import os
import httpx
from reflector.hatchet.constants import TIMEOUT_HEAVY_HTTP
from reflector.logger import logger
from reflector.processors.audio_mixdown import AudioMixdownProcessor, MixdownResponse
from reflector.processors.audio_mixdown_auto import AudioMixdownAutoProcessor
class AudioMixdownModalProcessor(AudioMixdownProcessor):
"""Audio mixdown processor using Modal.com/self-hosted backend via HTTP."""
def __init__(
self, mixdown_url: str | None = None, modal_api_key: str | None = None
):
self.mixdown_url = mixdown_url or os.getenv("MIXDOWN_URL")
if not self.mixdown_url:
raise ValueError(
"MIXDOWN_URL required to use AudioMixdownModalProcessor. "
"Set MIXDOWN_URL environment variable or pass mixdown_url parameter."
)
self.modal_api_key = modal_api_key or os.getenv("MODAL_API_KEY")
async def mixdown_tracks(
self,
track_urls: list[str],
output_url: str,
target_sample_rate: int | None = None,
offsets_seconds: list[float] | None = None,
) -> MixdownResponse:
"""Mix audio tracks via remote Modal/self-hosted backend.
Args:
track_urls: Presigned GET URLs for source audio tracks
output_url: Presigned PUT URL for output MP3
target_sample_rate: Sample rate for output (Hz), auto-detected if None
offsets_seconds: Optional per-track delays in seconds for alignment
"""
valid_count = len([u for u in track_urls if u])
log = logger.bind(track_count=valid_count)
log.info("Sending Modal mixdown HTTP request")
url = f"{self.mixdown_url}/mixdown"
headers = {}
if self.modal_api_key:
headers["Authorization"] = f"Bearer {self.modal_api_key}"
# Scale timeout with track count: base TIMEOUT_HEAVY_HTTP + 60s per track beyond 2
extra_timeout = max(0, (valid_count - 2)) * 60
timeout = TIMEOUT_HEAVY_HTTP + extra_timeout
try:
async with httpx.AsyncClient(timeout=timeout) as client:
response = await client.post(
url,
headers=headers,
json={
"track_urls": track_urls,
"output_url": output_url,
"target_sample_rate": target_sample_rate,
"offsets_seconds": offsets_seconds,
},
follow_redirects=True,
)
if response.status_code != 200:
error_body = response.text
log.error(
"Modal mixdown API error",
status_code=response.status_code,
error_body=error_body,
)
response.raise_for_status()
result = response.json()
# Check if work was cancelled
if result.get("cancelled"):
log.warning("Modal mixdown was cancelled by disconnect detection")
raise asyncio.CancelledError(
"Mixdown cancelled due to client disconnect"
)
log.info("Modal mixdown complete", size=result["size"])
return MixdownResponse(**result)
except asyncio.CancelledError:
log.warning(
"Modal mixdown cancelled (Hatchet timeout, disconnect detected on Modal side)"
)
raise
except httpx.TimeoutException as e:
log.error("Modal mixdown timeout", error=str(e), exc_info=True)
raise Exception(f"Modal mixdown timeout: {e}") from e
except httpx.HTTPStatusError as e:
log.error("Modal mixdown HTTP error", error=str(e), exc_info=True)
raise Exception(f"Modal mixdown HTTP error: {e}") from e
except Exception as e:
log.error("Modal mixdown unexpected error", error=str(e), exc_info=True)
raise
AudioMixdownAutoProcessor.register("modal", AudioMixdownModalProcessor)
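
The timeout scaling used by the Modal backend is simple arithmetic: the base timeout covers up to two tracks, and each additional non-empty track URL adds 60 seconds. A minimal sketch follows; the value of `BASE_TIMEOUT_S` is an assumed placeholder, since the real `TIMEOUT_HEAVY_HTTP` constant is not shown in this diff.

```python
BASE_TIMEOUT_S = 600  # assumed placeholder, not the real TIMEOUT_HEAVY_HTTP value


def mixdown_timeout(track_urls, base=BASE_TIMEOUT_S):
    # Empty or None entries are ignored, mirroring the valid_count computation.
    valid_count = len([u for u in track_urls if u])
    # No extra time for 0, 1, or 2 tracks; 60 s per track beyond the second.
    return base + max(0, valid_count - 2) * 60


two_tracks = mixdown_timeout(["a", "b"])
four_tracks = mixdown_timeout(["a", "b", "c", "d"])
with_gaps = mixdown_timeout(["a", "", None])
```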

View File

@@ -0,0 +1,101 @@
"""
PyAV audio mixdown processor.
Mixes N tracks in-process using the existing utility from reflector.utils.audio_mixdown.
Writes to a local temp file (does NOT upload to S3 — the pipeline handles upload).
"""
import os
import tempfile
from reflector.logger import logger
from reflector.processors.audio_file_writer import AudioFileWriterProcessor
from reflector.processors.audio_mixdown import AudioMixdownProcessor, MixdownResponse
from reflector.processors.audio_mixdown_auto import AudioMixdownAutoProcessor
from reflector.utils.audio_mixdown import (
detect_sample_rate_from_tracks,
mixdown_tracks_pyav,
)
class AudioMixdownPyavProcessor(AudioMixdownProcessor):
"""Audio mixdown processor using PyAV (no HTTP backend).
Writes the mixed output to a local temp file and returns its path
in MixdownResponse.output_path. The caller is responsible for
uploading the file and cleaning it up.
"""
async def mixdown_tracks(
self,
track_urls: list[str],
output_url: str,
target_sample_rate: int | None = None,
offsets_seconds: list[float] | None = None,
) -> MixdownResponse:
log = logger.bind(track_count=len(track_urls))
log.info("Starting local PyAV mixdown")
valid_urls = [url for url in track_urls if url]
if not valid_urls:
raise ValueError("No valid track URLs provided")
# Auto-detect sample rate if not provided
if target_sample_rate is None:
target_sample_rate = detect_sample_rate_from_tracks(
valid_urls, logger=logger
)
if not target_sample_rate:
raise ValueError("No decodable audio frames in any track")
# Write to temp MP3 file
temp_dir = tempfile.mkdtemp()
output_path = os.path.join(temp_dir, "mixed.mp3")
duration_ms_container = [0.0]
async def capture_duration(d):
duration_ms_container[0] = d
writer = AudioFileWriterProcessor(
path=output_path, on_duration=capture_duration
)
try:
await mixdown_tracks_pyav(
valid_urls,
writer,
target_sample_rate,
offsets_seconds=offsets_seconds,
logger=logger,
)
await writer.flush()
file_size = os.path.getsize(output_path)
log.info(
"Local mixdown complete",
size=file_size,
duration_ms=duration_ms_container[0],
)
return MixdownResponse(
size=file_size,
duration_ms=duration_ms_container[0],
output_path=output_path,
)
except Exception as e:
# Cleanup on failure
if os.path.exists(output_path):
try:
os.unlink(output_path)
except Exception:
pass
try:
os.rmdir(temp_dir)
except Exception:
pass
log.error("Local mixdown failed", error=str(e), exc_info=True)
raise
AudioMixdownAutoProcessor.register("pyav", AudioMixdownPyavProcessor)
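
Because the PyAV backend leaves upload and cleanup to the caller, the calling side needs a try/finally around whatever consumes `output_path`. The sketch below shows one possible shape, assuming the file sits alone in its own `tempfile.mkdtemp()` directory as in the processor above; `consume_local_mixdown` and the `upload` callback are hypothetical.

```python
import os
import shutil
import tempfile


def consume_local_mixdown(output_path, upload):
    """Pass the mixed file's bytes to `upload`, then remove its temp directory."""
    try:
        with open(output_path, "rb") as f:
            upload(f.read())
    finally:
        # Assumes the file lives alone in a dedicated mkdtemp() directory.
        shutil.rmtree(os.path.dirname(output_path), ignore_errors=True)


# Simulate a finished mixdown without running PyAV.
temp_dir = tempfile.mkdtemp()
mixed_path = os.path.join(temp_dir, "mixed.mp3")
with open(mixed_path, "wb") as f:
    f.write(b"fake-mp3-bytes")

uploaded = []
consume_local_mixdown(mixed_path, uploaded.append)
```

Cleaning up in `finally` means the temp directory is removed even when the upload raises.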

View File

@@ -1,5 +1,6 @@
from typing import Literal
-Platform = Literal["whereby", "daily"]
+Platform = Literal["whereby", "daily", "livekit"]
WHEREBY_PLATFORM: Platform = "whereby"
DAILY_PLATFORM: Platform = "daily"
LIVEKIT_PLATFORM: Platform = "livekit"

View File

@@ -155,12 +155,17 @@ async def prepare_transcript_processing(validation: ValidationOk) -> PrepareResu
)
if track_keys:
# Detect platform from recording ID prefix
source_platform = (
"livekit" if recording_id and recording_id.startswith("lk-") else "daily"
)
return MultitrackProcessingConfig(
bucket_name=bucket_name, # type: ignore (validated above)
track_keys=track_keys,
transcript_id=validation.transcript_id,
recording_id=recording_id,
room_id=validation.room_id,
source_platform=source_platform,
)
return FileProcessingConfig(
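
The platform detection added in this hunk depends only on the recording ID prefix. Pulled out as a standalone function (the name `detect_source_platform` is hypothetical), the rule and its edge cases look like this:

```python
from __future__ import annotations


def detect_source_platform(recording_id: str | None) -> str:
    # IDs starting with "lk-" are treated as LiveKit; everything else,
    # including a missing or empty ID, falls back to Daily.
    return "livekit" if recording_id and recording_id.startswith("lk-") else "daily"


livekit_case = detect_source_platform("lk-abc123")
daily_case = detect_source_platform("abc123")
missing_case = detect_source_platform(None)
```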

View File

@@ -127,6 +127,14 @@ class Settings(BaseSettings):
PADDING_URL: str | None = None
PADDING_MODAL_API_KEY: str | None = None
# Audio Mixdown
# backends:
# - pyav: in-process PyAV mixdown (no HTTP, runs in same process)
# - modal: HTTP API client (works with Modal.com OR self-hosted gpu/self_hosted/)
MIXDOWN_BACKEND: str = "pyav"
MIXDOWN_URL: str | None = None
MIXDOWN_MODAL_API_KEY: str | None = None
# Sentry
SENTRY_DSN: str | None = None
@@ -187,6 +195,23 @@ class Settings(BaseSettings):
DAILY_WEBHOOK_UUID: str | None = (
None # Webhook UUID for this environment. Not used by production code
)
# LiveKit integration (self-hosted open-source video platform)
LIVEKIT_URL: str | None = (
None # e.g. ws://livekit:7880 (internal) or wss://livekit.example.com
)
LIVEKIT_API_KEY: str | None = None
LIVEKIT_API_SECRET: str | None = None
LIVEKIT_WEBHOOK_SECRET: str | None = None # Defaults to API_SECRET if not set
# LiveKit egress S3 storage (Track Egress writes per-participant audio here)
LIVEKIT_STORAGE_AWS_BUCKET_NAME: str | None = None
LIVEKIT_STORAGE_AWS_REGION: str | None = None
LIVEKIT_STORAGE_AWS_ACCESS_KEY_ID: str | None = None
LIVEKIT_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None
LIVEKIT_STORAGE_AWS_ENDPOINT_URL: str | None = None # For Garage/MinIO
# Public URL for LiveKit (used in frontend room_url, e.g. wss://livekit.example.com)
LIVEKIT_PUBLIC_URL: str | None = None
# Platform Configuration
DEFAULT_VIDEO_PLATFORM: Platform = DAILY_PLATFORM
@@ -194,6 +219,16 @@ class Settings(BaseSettings):
ZULIP_REALM: str | None = None
ZULIP_API_KEY: str | None = None
ZULIP_BOT_EMAIL: str | None = None
ZULIP_DAG_STREAM: str | None = None
ZULIP_DAG_TOPIC: str | None = None
# Email / SMTP integration (for transcript email notifications)
SMTP_HOST: str | None = None
SMTP_PORT: int = 587
SMTP_USERNAME: str | None = None
SMTP_PASSWORD: str | None = None
SMTP_FROM_EMAIL: str | None = None
SMTP_USE_TLS: bool = True
# Hatchet workflow orchestration (always enabled for multitrack processing)
HATCHET_CLIENT_TOKEN: str | None = None

View File

@@ -57,6 +57,22 @@ def get_source_storage(platform: str) -> Storage:
aws_secret_access_key=settings.WHEREBY_STORAGE_AWS_SECRET_ACCESS_KEY,
)
elif platform == "livekit":
if (
settings.LIVEKIT_STORAGE_AWS_ACCESS_KEY_ID
and settings.LIVEKIT_STORAGE_AWS_SECRET_ACCESS_KEY
and settings.LIVEKIT_STORAGE_AWS_BUCKET_NAME
):
from reflector.storage.storage_aws import AwsStorage
return AwsStorage(
aws_bucket_name=settings.LIVEKIT_STORAGE_AWS_BUCKET_NAME,
aws_region=settings.LIVEKIT_STORAGE_AWS_REGION or "us-east-1",
aws_access_key_id=settings.LIVEKIT_STORAGE_AWS_ACCESS_KEY_ID,
aws_secret_access_key=settings.LIVEKIT_STORAGE_AWS_SECRET_ACCESS_KEY,
aws_endpoint_url=settings.LIVEKIT_STORAGE_AWS_ENDPOINT_URL,
)
return get_transcripts_storage()

View File

@@ -116,9 +116,12 @@ class Storage:
expires_in: int = 3600,
*,
bucket: str | None = None,
extra_params: dict | None = None,
) -> str:
"""Generate presigned URL. bucket: override instance default if provided."""
-return await self._get_file_url(filename, operation, expires_in, bucket=bucket)
+return await self._get_file_url(
+filename, operation, expires_in, bucket=bucket, extra_params=extra_params
+)
async def _get_file_url(
self,
@@ -127,6 +130,7 @@ class Storage:
expires_in: int = 3600,
*,
bucket: str | None = None,
extra_params: dict | None = None,
) -> str:
raise NotImplementedError

View File

@@ -170,16 +170,23 @@ class AwsStorage(Storage):
expires_in: int = 3600,
*,
bucket: str | None = None,
extra_params: dict | None = None,
) -> str:
actual_bucket = bucket or self._bucket_name
folder = self.aws_folder
s3filename = f"{folder}/{filename}" if folder else filename
params = {}
if extra_params:
params.update(extra_params)
# Always set Bucket/Key after extra_params to prevent overrides
params["Bucket"] = actual_bucket
params["Key"] = s3filename
async with self.session.client(
"s3", config=self.boto_config, endpoint_url=self._endpoint_url
) as client:
presigned_url = await client.generate_presigned_url(
operation,
-Params={"Bucket": actual_bucket, "Key": s3filename},
+Params=params,
ExpiresIn=expires_in,
)
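
The ordering in this hunk matters: `extra_params` is merged first and `Bucket`/`Key` are written afterwards, so a caller cannot redirect the presigned URL to another object. A minimal sketch of just that merge step, with a hypothetical helper name and made-up bucket and key values:

```python
def build_presign_params(bucket, key, extra_params=None):
    params = {}
    if extra_params:
        params.update(extra_params)
    # Set Bucket/Key last so extra_params can never override them.
    params["Bucket"] = bucket
    params["Key"] = key
    return params


normal = build_presign_params(
    "transcripts", "audio/a.mp3", {"ResponseContentDisposition": "attachment"}
)
attempted_override = build_presign_params(
    "transcripts", "audio/a.mp3", {"Bucket": "other-bucket", "Key": "secret.txt"}
)
```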

View File

@@ -0,0 +1,257 @@
#!/usr/bin/env python
"""
CLI tool for managing soft-deleted transcripts.
Usage:
uv run python -m reflector.tools.deleted_transcripts list
uv run python -m reflector.tools.deleted_transcripts files <transcript_id>
uv run python -m reflector.tools.deleted_transcripts download <transcript_id> [--output-dir ./]
"""
import argparse
import asyncio
import json
import os
import structlog
from reflector.db import get_database
from reflector.db.meetings import meetings_controller
from reflector.db.recordings import recordings_controller
from reflector.db.transcripts import Transcript, transcripts
from reflector.storage import get_source_storage, get_transcripts_storage
logger = structlog.get_logger(__name__)
async def list_deleted():
"""List all soft-deleted transcripts."""
database = get_database()
await database.connect()
try:
query = (
transcripts.select()
.where(transcripts.c.deleted_at.isnot(None))
.order_by(transcripts.c.deleted_at.desc())
)
results = await database.fetch_all(query)
if not results:
print("No deleted transcripts found.")
return
print(
f"{'ID':<40} {'Title':<40} {'Deleted At':<28} {'Recording ID':<40} {'Meeting ID'}"
)
print("-" * 180)
for row in results:
t = Transcript(**row)
title = (t.title or "")[:38]
deleted = t.deleted_at.isoformat() if t.deleted_at else ""
print(
f"{t.id:<40} {title:<40} {deleted:<28} {t.recording_id or '':<40} {t.meeting_id or ''}"
)
print(f"\nTotal: {len(results)} deleted transcript(s)")
finally:
await database.disconnect()
async def list_files(transcript_id: str):
"""List all S3 keys associated with a deleted transcript."""
database = get_database()
await database.connect()
try:
query = transcripts.select().where(transcripts.c.id == transcript_id)
result = await database.fetch_one(query)
if not result:
print(f"Transcript {transcript_id} not found.")
return
t = Transcript(**result)
if t.deleted_at is None:
print(f"Transcript {transcript_id} is not deleted.")
return
print(f"Transcript: {t.id}")
print(f"Title: {t.title}")
print(f"Deleted at: {t.deleted_at}")
print()
files = []
# Transcript audio
if t.audio_location == "storage" and not t.audio_deleted:
files.append(("Transcript audio", t.storage_audio_path, None))
# Recording files
if t.recording_id:
recording = await recordings_controller.get_by_id(t.recording_id)
if recording:
if recording.object_key:
files.append(
(
"Recording object_key",
recording.object_key,
recording.bucket_name,
)
)
if recording.track_keys:
for i, key in enumerate(recording.track_keys):
files.append((f"Track {i}", key, recording.bucket_name))
# Cloud video
if t.meeting_id:
meeting = await meetings_controller.get_by_id(t.meeting_id)
if meeting and meeting.daily_composed_video_s3_key:
files.append(("Cloud video", meeting.daily_composed_video_s3_key, None))
if not files:
print("No associated files found.")
return
print(f"{'Type':<25} {'Bucket':<30} {'S3 Key'}")
print("-" * 120)
for label, key, bucket in files:
print(f"{label:<25} {bucket or '(default)':<30} {key}")
# Generate presigned URLs
print("\nPresigned URLs (valid for 1 hour):")
print("-" * 120)
storage = get_transcripts_storage()
for label, key, bucket in files:
try:
url = await storage.get_file_url(key, bucket=bucket, expires_in=3600)
print(f"{label}: {url}")
except Exception as e:
print(f"{label}: ERROR - {e}")
finally:
await database.disconnect()
async def download_files(transcript_id: str, output_dir: str):
"""Download all files associated with a deleted transcript."""
database = get_database()
await database.connect()
try:
query = transcripts.select().where(transcripts.c.id == transcript_id)
result = await database.fetch_one(query)
if not result:
print(f"Transcript {transcript_id} not found.")
return
t = Transcript(**result)
if t.deleted_at is None:
print(f"Transcript {transcript_id} is not deleted.")
return
dest = os.path.join(output_dir, t.id)
os.makedirs(dest, exist_ok=True)
storage = get_transcripts_storage()
# Download transcript audio
if t.audio_location == "storage" and not t.audio_deleted:
try:
data = await storage.get_file(t.storage_audio_path)
path = os.path.join(dest, "audio.mp3")
with open(path, "wb") as f:
f.write(data)
print(f"Downloaded: {path}")
except Exception as e:
print(f"Failed to download audio: {e}")
# Download recording files
if t.recording_id:
recording = await recordings_controller.get_by_id(t.recording_id)
if recording and recording.track_keys:
tracks_dir = os.path.join(dest, "tracks")
os.makedirs(tracks_dir, exist_ok=True)
for i, key in enumerate(recording.track_keys):
try:
data = await storage.get_file(key, bucket=recording.bucket_name)
filename = os.path.basename(key) or f"track_{i}"
path = os.path.join(tracks_dir, filename)
with open(path, "wb") as f:
f.write(data)
print(f"Downloaded: {path}")
except Exception as e:
print(f"Failed to download track {i}: {e}")
# Download cloud video
if t.meeting_id:
meeting = await meetings_controller.get_by_id(t.meeting_id)
if meeting and meeting.daily_composed_video_s3_key:
try:
source_storage = get_source_storage("daily")
data = await source_storage.get_file(
meeting.daily_composed_video_s3_key
)
path = os.path.join(dest, "cloud_video.mp4")
with open(path, "wb") as f:
f.write(data)
print(f"Downloaded: {path}")
except Exception as e:
print(f"Failed to download cloud video: {e}")
# Write metadata
metadata = {
"id": t.id,
"title": t.title,
"created_at": t.created_at.isoformat() if t.created_at else None,
"deleted_at": t.deleted_at.isoformat() if t.deleted_at else None,
"duration": t.duration,
"source_language": t.source_language,
"target_language": t.target_language,
"short_summary": t.short_summary,
"long_summary": t.long_summary,
"topics": [topic.model_dump() for topic in t.topics] if t.topics else [],
"participants": [p.model_dump() for p in t.participants]
if t.participants
else [],
"action_items": t.action_items,
"webvtt": t.webvtt,
"recording_id": t.recording_id,
"meeting_id": t.meeting_id,
}
path = os.path.join(dest, "metadata.json")
with open(path, "w") as f:
json.dump(metadata, f, indent=2, default=str)
print(f"Downloaded: {path}")
print(f"\nAll files saved to: {dest}")
finally:
await database.disconnect()
def main():
parser = argparse.ArgumentParser(description="Manage soft-deleted transcripts")
subparsers = parser.add_subparsers(dest="command", required=True)
subparsers.add_parser("list", help="List all deleted transcripts")
files_parser = subparsers.add_parser(
"files", help="List S3 keys for a deleted transcript"
)
files_parser.add_argument("transcript_id", help="Transcript ID")
download_parser = subparsers.add_parser(
"download", help="Download files for a deleted transcript"
)
download_parser.add_argument("transcript_id", help="Transcript ID")
download_parser.add_argument(
"--output-dir", default=".", help="Output directory (default: .)"
)
args = parser.parse_args()
if args.command == "list":
asyncio.run(list_deleted())
elif args.command == "files":
asyncio.run(list_files(args.transcript_id))
elif args.command == "download":
asyncio.run(download_files(args.transcript_id, args.output_dir))
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,412 @@
"""
Render Hatchet workflow runs as text DAG.
Usage:
# Show latest 5 runs (summary table)
uv run -m reflector.tools.render_hatchet_run
# Show specific run with full DAG + task details
uv run -m reflector.tools.render_hatchet_run <workflow_run_id>
# Drill into Nth run from the list (1-indexed)
uv run -m reflector.tools.render_hatchet_run --show 1
# Show latest N runs
uv run -m reflector.tools.render_hatchet_run --last 10
# Filter by status
uv run -m reflector.tools.render_hatchet_run --status FAILED
uv run -m reflector.tools.render_hatchet_run --status RUNNING
"""
import argparse
import asyncio
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from hatchet_sdk.clients.rest.models import (
V1TaskEvent,
V1TaskStatus,
V1TaskSummary,
V1WorkflowRunDetails,
WorkflowRunShapeItemForWorkflowRunDetails,
)
from reflector.hatchet.client import HatchetClientManager
STATUS_ICON = {
V1TaskStatus.COMPLETED: "\u2705",
V1TaskStatus.RUNNING: "\u23f3",
V1TaskStatus.FAILED: "\u274c",
V1TaskStatus.QUEUED: "\u23f8\ufe0f",
V1TaskStatus.CANCELLED: "\u26a0\ufe0f",
}
STATUS_LABEL = {
V1TaskStatus.COMPLETED: "Complete",
V1TaskStatus.RUNNING: "Running",
V1TaskStatus.FAILED: "FAILED",
V1TaskStatus.QUEUED: "Queued",
V1TaskStatus.CANCELLED: "Cancelled",
}
def _fmt_time(dt: datetime | None) -> str:
if dt is None:
return "-"
return dt.strftime("%H:%M:%S")
def _fmt_duration(ms: int | None) -> str:
if ms is None:
return "-"
secs = ms / 1000
if secs < 60:
return f"{secs:.1f}s"
mins = secs / 60
return f"{mins:.1f}m"
def _fmt_status_line(task: V1TaskSummary) -> str:
"""Format a status line like: Complete (finished 20:31:44)"""
label = STATUS_LABEL.get(task.status, task.status.value)
icon = STATUS_ICON.get(task.status, "?")
if task.status == V1TaskStatus.COMPLETED and task.finished_at:
return f"{icon} {label} (finished {_fmt_time(task.finished_at)})"
elif task.status == V1TaskStatus.RUNNING and task.started_at:
parts = [f"started {_fmt_time(task.started_at)}"]
if task.duration:
parts.append(f"{_fmt_duration(task.duration)} elapsed")
return f"{icon} {label} ({', '.join(parts)})"
elif task.status == V1TaskStatus.FAILED and task.finished_at:
return f"{icon} {label} (failed {_fmt_time(task.finished_at)})"
elif task.status == V1TaskStatus.CANCELLED:
return f"{icon} {label}"
elif task.status == V1TaskStatus.QUEUED:
return f"{icon} {label}"
return f"{icon} {label}"
def _topo_sort(
shape: list[WorkflowRunShapeItemForWorkflowRunDetails],
) -> list[str]:
"""Topological sort of step_ids from shape DAG."""
step_ids = {s.step_id for s in shape}
children_map: dict[str, list[str]] = {}
in_degree: dict[str, int] = {sid: 0 for sid in step_ids}
for s in shape:
children = [c for c in (s.children_step_ids or []) if c in step_ids]
children_map[s.step_id] = children
for c in children:
in_degree[c] += 1
queue = sorted(sid for sid, deg in in_degree.items() if deg == 0)
result: list[str] = []
while queue:
node = queue.pop(0)
result.append(node)
for c in children_map.get(node, []):
in_degree[c] -= 1
if in_degree[c] == 0:
queue.append(c)
queue.sort()
return result
def render_run_detail(details: V1WorkflowRunDetails) -> str:
"""Render a single workflow run as markdown DAG with task details."""
shape = details.shape or []
tasks = details.tasks or []
events = details.task_events or []
run = details.run
if not shape:
return f"Run {run.metadata.id}: {run.status.value} (no shape data)"
# Build lookups
step_to_shape: dict[str, WorkflowRunShapeItemForWorkflowRunDetails] = {
s.step_id: s for s in shape
}
step_to_name: dict[str, str] = {s.step_id: s.task_name for s in shape}
# Reverse edges (parents)
parents: dict[str, list[str]] = {s.step_id: [] for s in shape}
for s in shape:
for child_id in s.children_step_ids or []:
if child_id in parents:
parents[child_id].append(s.step_id)
# Join tasks by step_id
task_by_step: dict[str, V1TaskSummary] = {}
for t in tasks:
if t.step_id and t.step_id in step_to_name:
task_by_step[t.step_id] = t
# Events indexed by task_external_id
events_by_task: dict[str, list[V1TaskEvent]] = defaultdict(list)
for ev in events:
events_by_task[ev.task_id].append(ev)
ordered = _topo_sort(shape)
lines: list[str] = []
# Run header
run_icon = STATUS_ICON.get(run.status, "?")
run_name = run.display_name or run.workflow_id
dur = _fmt_duration(run.duration)
lines.append(f"**{run_name}** {run_icon} {dur}")
lines.append(f"ID: `{run.metadata.id}`")
if run.additional_metadata:
meta_parts = [f"{k}=`{v}`" for k, v in run.additional_metadata.items()]
lines.append(f"Meta: {', '.join(meta_parts)}")
if run.error_message:
# Take first line of error only for header
first_line = run.error_message.split("\n")[0]
lines.append(f"Error: {first_line}")
lines.append("")
# DAG Status Overview table (collapsible)
lines.append("```spoiler DAG Status Overview")
lines.append("| Node | Status | Duration | Dependencies |")
lines.append("|------|--------|----------|--------------|")
for step_id in ordered:
s = step_to_shape[step_id]
t = task_by_step.get(step_id)
name = step_to_name[step_id]
icon = STATUS_ICON.get(t.status, "?") if t else "?"
dur = _fmt_duration(t.duration) if t else "-"
parent_names = [step_to_name[p] for p in parents[step_id]]
child_names = [
step_to_name[c] for c in (s.children_step_ids or []) if c in step_to_name
]
deps_left = ", ".join(parent_names) if parent_names else ""
deps_right = ", ".join(child_names) if child_names else ""
if deps_left and deps_right:
deps = f"{deps_left} \u2192 {deps_right}"
elif deps_right:
deps = f"\u2192 {deps_right}"
elif deps_left:
deps = f"{deps_left} \u2192"
else:
deps = "-"
lines.append(f"| {name} | {icon} | {dur} | {deps} |")
lines.append("```")
lines.append("")
# Node details (collapsible)
lines.append("```spoiler Node Details")
for step_id in ordered:
t = task_by_step.get(step_id)
name = step_to_name[step_id]
if not t:
lines.append(f"**\U0001f4e6 {name}**")
lines.append("Status: no task data")
lines.append("")
continue
lines.append(f"**\U0001f4e6 {name}**")
lines.append(f"Status: {_fmt_status_line(t)}")
if t.duration:
lines.append(f"Duration: {_fmt_duration(t.duration)}")
if t.retry_count and t.retry_count > 0:
lines.append(f"Retries: {t.retry_count}")
# Fan-out children
if t.num_spawned_children and t.num_spawned_children > 0:
children = t.children or []
completed = sum(1 for c in children if c.status == V1TaskStatus.COMPLETED)
failed = sum(1 for c in children if c.status == V1TaskStatus.FAILED)
running = sum(1 for c in children if c.status == V1TaskStatus.RUNNING)
lines.append(
f"Spawned children: {completed}/{t.num_spawned_children} done"
f"{f', {running} running' if running else ''}"
f"{f', {failed} failed' if failed else ''}"
)
# Error message (first meaningful line only, full trace in events)
if t.error_message:
err_lines = t.error_message.strip().split("\n")
# Find first non-empty, non-traceback line
err_summary = err_lines[0]
for line in err_lines:
stripped = line.strip()
if stripped and not stripped.startswith(
("Traceback", "File ", "{", ")")
):
err_summary = stripped
break
lines.append(f"Error: `{err_summary}`")
# Events log
task_events = sorted(
events_by_task.get(t.task_external_id, []),
key=lambda e: e.timestamp,
)
if task_events:
lines.append("Events:")
for ev in task_events:
ts = ev.timestamp.strftime("%H:%M:%S")
ev_icon = ""
if ev.event_type.value == "FINISHED":
ev_icon = "\u2705 "
elif ev.event_type.value in ("FAILED", "TIMED_OUT"):
ev_icon = "\u274c "
elif ev.event_type.value == "STARTED":
ev_icon = "\u25b6\ufe0f "
elif ev.event_type.value == "RETRYING":
ev_icon = "\U0001f504 "
elif ev.event_type.value == "CANCELLED":
ev_icon = "\u26a0\ufe0f "
msg = ev.message.strip()
if ev.error_message:
# Just first line of error in event log
err_first = ev.error_message.strip().split("\n")[0]
if msg:
msg += f" | {err_first}"
else:
msg = err_first
if msg:
lines.append(f" `{ts}` {ev_icon}{ev.event_type.value}: {msg}")
else:
lines.append(f" `{ts}` {ev_icon}{ev.event_type.value}")
lines.append("")
lines.append("```")
return "\n".join(lines)
def render_run_summary(idx: int, run: V1TaskSummary) -> str:
"""One-line summary for a run in the list view."""
icon = STATUS_ICON.get(run.status, "?")
name = run.display_name or run.workflow_name or "?"
run_id = run.workflow_run_external_id or "?"
dur = _fmt_duration(run.duration)
started = _fmt_time(run.started_at)
meta = ""
if run.additional_metadata:
meta_parts = [f"{k}=`{v}`" for k, v in run.additional_metadata.items()]
meta = f" ({', '.join(meta_parts)})"
return (
f" {idx}. {icon} **{name}** started={started} dur={dur}{meta}\n"
f" `{run_id}`"
)
async def _fetch_run_list(
count: int = 5,
statuses: list[V1TaskStatus] | None = None,
) -> list[V1TaskSummary]:
client = HatchetClientManager.get_client()
since = datetime.now(timezone.utc) - timedelta(days=7)
runs = await client.runs.aio_list(
since=since,
statuses=statuses,
limit=count,
)
return runs.rows or []
async def list_recent_runs(
count: int = 5,
statuses: list[V1TaskStatus] | None = None,
) -> str:
"""List recent workflow runs as text."""
rows = await _fetch_run_list(count, statuses)
if not rows:
return "No runs found in the last 7 days."
lines = [f"Recent runs ({len(rows)}):", ""]
for i, run in enumerate(rows, 1):
lines.append(render_run_summary(i, run))
lines.append("")
lines.append("Use `--show N` to see full DAG for run N")
return "\n".join(lines)
async def show_run(workflow_run_id: str) -> str:
"""Fetch and render a single run."""
client = HatchetClientManager.get_client()
details = await client.runs.aio_get(workflow_run_id)
return render_run_detail(details)
async def show_nth_run(
n: int,
count: int = 5,
statuses: list[V1TaskStatus] | None = None,
) -> str:
"""Fetch list, then drill into Nth run."""
rows = await _fetch_run_list(count, statuses)
if not rows:
return "No runs found in the last 7 days."
if n < 1 or n > len(rows):
return f"Invalid index {n}. Have {len(rows)} runs (1-{len(rows)})."
run = rows[n - 1]
return await show_run(run.workflow_run_external_id)
async def main_async(args: argparse.Namespace) -> None:
statuses = [V1TaskStatus(args.status)] if args.status else None
if args.run_id:
output = await show_run(args.run_id)
elif args.show is not None:
output = await show_nth_run(args.show, count=args.last, statuses=statuses)
else:
output = await list_recent_runs(count=args.last, statuses=statuses)
print(output)
def main() -> None:
parser = argparse.ArgumentParser(
description="Render Hatchet workflow runs as text DAG"
)
parser.add_argument(
"run_id",
nargs="?",
default=None,
help="Workflow run ID to show in detail. If omitted, lists recent runs.",
)
parser.add_argument(
"--show",
type=int,
default=None,
metavar="N",
help="Show full DAG for the Nth run in the list (1-indexed)",
)
parser.add_argument(
"--last",
type=int,
default=5,
help="Number of recent runs to list (default: 5)",
)
parser.add_argument(
"--status",
choices=["QUEUED", "RUNNING", "COMPLETED", "FAILED", "CANCELLED"],
help="Filter by status",
)
args = parser.parse_args()
asyncio.run(main_async(args))
if __name__ == "__main__":
main()
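
The ordering produced by `_topo_sort` in this file is Kahn's algorithm with a sorted ready queue, which keeps the output deterministic. The sketch below runs the same idea on a plain dict of children instead of Hatchet shape objects; the task names are invented for illustration.

```python
def topo_sort(children):
    """Kahn's algorithm; ties between ready nodes are broken alphabetically."""
    in_degree = {node: 0 for node in children}
    for kids in children.values():
        for child in kids:
            if child in in_degree:
                in_degree[child] += 1

    queue = sorted(node for node, deg in in_degree.items() if deg == 0)
    result = []
    while queue:
        node = queue.pop(0)
        result.append(node)
        for child in children[node]:
            if child not in in_degree:
                continue  # ignore edges to nodes outside the graph
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
        queue.sort()
    return result


# Hypothetical DAG: two roots fan in, two branches fan out and rejoin.
dag = {
    "get_recording": ["process_tracks"],
    "get_participants": ["process_tracks"],
    "process_tracks": ["mixdown", "detect_topics"],
    "mixdown": ["finalize"],
    "detect_topics": ["finalize"],
    "finalize": [],
}
order = topo_sort(dag)
```

Every node appears after all of its parents, which is what lets the status table list dependencies before the steps that consume them.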

View File

@@ -0,0 +1,112 @@
"""
LiveKit track file utilities.
Parse participant identity and timing from Auto Track Egress S3 filepaths.
Actual filepath format from LiveKit Auto Track Egress:
livekit/{room_name}/{publisher_identity}-{ISO_timestamp}-{track_id}.{ext}
Examples:
livekit/myroom-20260401172036/juan-4b82ed-2026-04-01T195758-TR_AMR3SWs74Divho.ogg
livekit/myroom-20260401172036/juan2-63abcf-2026-04-01T195847-TR_AMyoSbM7tAQbYj.ogg
livekit/myroom-20260401172036/EG_K5sipvfB5fTM.json (manifest, skip)
livekit/myroom-20260401172036/juan-4b82ed-2026-04-01T195727-TR_VC679dgMQBdfhT.webm (video, skip)
"""
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from reflector.utils.string import NonEmptyString
@dataclass
class LiveKitTrackFile:
"""Parsed info from a LiveKit track egress filepath."""
s3_key: str
room_name: str
participant_identity: str
timestamp: datetime # Parsed from ISO timestamp in filename
track_id: str # LiveKit track ID (e.g., TR_AMR3SWs74Divho)
# Pattern: livekit/{room_name}/{identity}-{ISO_date}T{time}-{track_id}.{ext}
# The identity can contain alphanumerics, hyphens, and underscores
# The timestamp is ISO-like (extended date, compact time), e.g. 2026-04-01T195758
# The track ID starts with TR_
_TRACK_FILENAME_PATTERN = re.compile(
r"^livekit/(?P<room_name>[^/]+)/(?P<identity>.+?)-(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{6})-(?P<track_id>TR_\w+)\.(?P<ext>\w+)$"
)
def parse_livekit_track_filepath(s3_key: str) -> LiveKitTrackFile:
"""Parse a LiveKit track egress filepath into components.
Args:
s3_key: S3 key like 'livekit/myroom-20260401/juan-4b82ed-2026-04-01T195758-TR_AMR3SWs74Divho.ogg'
Returns:
LiveKitTrackFile with parsed components.
Raises:
ValueError: If the filepath doesn't match the expected format.
"""
match = _TRACK_FILENAME_PATTERN.match(s3_key)
if not match:
raise ValueError(
f"LiveKit track filepath doesn't match expected format: {s3_key}"
)
# Parse ISO-ish timestamp (e.g., 2026-04-01T195758 → datetime)
ts_str = match.group("timestamp")
try:
ts = datetime.strptime(ts_str, "%Y-%m-%dT%H%M%S").replace(tzinfo=timezone.utc)
except ValueError as e:
raise ValueError(f"Cannot parse timestamp '{ts_str}' from: {s3_key}") from e
return LiveKitTrackFile(
s3_key=s3_key,
room_name=match.group("room_name"),
participant_identity=match.group("identity"),
timestamp=ts,
track_id=match.group("track_id"),
)
def filter_audio_tracks(s3_keys: list[str]) -> list[str]:
"""Filter S3 keys to only audio tracks (.ogg), excluding manifests and video."""
return [k for k in s3_keys if k.endswith(".ogg")]
def calculate_track_offsets(
tracks: list[LiveKitTrackFile],
) -> list[tuple[LiveKitTrackFile, float]]:
"""Calculate silence padding offset for each track.
The earliest track starts at time zero. Each subsequent track
gets (track_timestamp - earliest_timestamp) seconds of silence prepended.
Returns:
List of (track, offset_seconds) tuples.
"""
if not tracks:
return []
earliest = min(t.timestamp for t in tracks)
return [(t, (t.timestamp - earliest).total_seconds()) for t in tracks]
def extract_livekit_base_room_name(livekit_room_name: str) -> NonEmptyString:
"""Extract base room name from LiveKit timestamped room name.
LiveKit rooms use the same naming as Daily: {base_name}-YYYYMMDDHHMMSS
"""
base_name = livekit_room_name.rsplit("-", 1)[0]
assert base_name, f"Extracted base name is empty from: {livekit_room_name}"
return NonEmptyString(base_name)
def recording_lock_key(room_name: str) -> str:
"""Redis lock key for preventing duplicate processing."""
return f"livekit:processing:{room_name}"


@@ -1,7 +1,7 @@
from reflector.settings import settings
from reflector.storage import get_dailyco_storage, get_whereby_storage
from ..schemas.platform import WHEREBY_PLATFORM, Platform
from ..schemas.platform import LIVEKIT_PLATFORM, WHEREBY_PLATFORM, Platform
from .base import VideoPlatformClient, VideoPlatformConfig
from .registry import get_platform_client
@@ -44,6 +44,27 @@ def get_platform_config(platform: Platform) -> VideoPlatformConfig:
s3_region=daily_storage.region,
aws_role_arn=daily_storage.role_credential,
)
elif platform == LIVEKIT_PLATFORM:
if not settings.LIVEKIT_URL:
raise ValueError(
"LIVEKIT_URL is required when platform='livekit'. "
"Set LIVEKIT_URL environment variable."
)
if not settings.LIVEKIT_API_KEY or not settings.LIVEKIT_API_SECRET:
raise ValueError(
"LIVEKIT_API_KEY and LIVEKIT_API_SECRET are required when platform='livekit'. "
"Set LIVEKIT_API_KEY and LIVEKIT_API_SECRET environment variables."
)
return VideoPlatformConfig(
api_key=settings.LIVEKIT_API_KEY,
webhook_secret=settings.LIVEKIT_WEBHOOK_SECRET
or settings.LIVEKIT_API_SECRET,
api_url=settings.LIVEKIT_URL,
s3_bucket=settings.LIVEKIT_STORAGE_AWS_BUCKET_NAME,
s3_region=settings.LIVEKIT_STORAGE_AWS_REGION,
aws_access_key_id=settings.LIVEKIT_STORAGE_AWS_ACCESS_KEY_ID,
aws_access_key_secret=settings.LIVEKIT_STORAGE_AWS_SECRET_ACCESS_KEY,
)
else:
raise ValueError(f"Unknown platform: {platform}")


@@ -0,0 +1,192 @@
"""
LiveKit video platform client for Reflector.
Self-hosted, open-source alternative to Daily.co.
Uses Track Egress for per-participant audio recording (no composite video).
"""
from datetime import datetime, timezone
from urllib.parse import urlencode
from uuid import uuid4
from reflector.db.rooms import Room
from reflector.livekit_api.client import LiveKitApiClient
from reflector.livekit_api.webhooks import create_webhook_receiver, verify_webhook
from reflector.logger import logger
from reflector.settings import settings
from ..schemas.platform import Platform
from ..utils.string import NonEmptyString
from .base import ROOM_PREFIX_SEPARATOR, VideoPlatformClient
from .models import MeetingData, SessionData, VideoPlatformConfig
class LiveKitClient(VideoPlatformClient):
PLATFORM_NAME: Platform = "livekit"
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"
def __init__(self, config: VideoPlatformConfig):
super().__init__(config)
self._api_client = LiveKitApiClient(
url=config.api_url or "",
api_key=config.api_key,
api_secret=config.webhook_secret, # webhook_secret falls back to LIVEKIT_API_SECRET; a distinct LIVEKIT_WEBHOOK_SECRET would also be used for API auth here
s3_bucket=config.s3_bucket,
s3_region=config.s3_region,
s3_access_key=config.aws_access_key_id,
s3_secret_key=config.aws_access_key_secret,
s3_endpoint=settings.LIVEKIT_STORAGE_AWS_ENDPOINT_URL,
)
self._webhook_receiver = create_webhook_receiver(
api_key=config.api_key,
api_secret=config.webhook_secret,
)
async def create_meeting(
self, room_name_prefix: NonEmptyString, end_date: datetime, room: Room
) -> MeetingData:
"""Create a LiveKit room for this meeting.
LiveKit rooms are created explicitly via API. A new room is created
for each Reflector meeting (same pattern as Daily.co).
"""
now = datetime.now(timezone.utc)
timestamp = now.strftime(self.TIMESTAMP_FORMAT)
room_name = f"{room_name_prefix}{ROOM_PREFIX_SEPARATOR}{timestamp}"
# Calculate empty_timeout from end_date (seconds until expiry)
# Ensure end_date is timezone-aware for subtraction
end_date_aware = (
end_date if end_date.tzinfo else end_date.replace(tzinfo=timezone.utc)
)
remaining = int((end_date_aware - now).total_seconds())
empty_timeout = max(300, min(remaining, 86400)) # 5 min to 24 hours
# Enable auto track egress for cloud recording (per-participant audio to S3).
# Gracefully degrade if S3 credentials are missing — room still works, just no recording.
enable_recording = room.recording_type == "cloud"
egress_enabled = False
if enable_recording:
try:
self._api_client._build_s3_upload() # Validate credentials exist
egress_enabled = True
except ValueError:
logger.warning(
"S3 credentials not configured — room created without auto track egress. "
"Set LIVEKIT_STORAGE_AWS_* to enable recording.",
room_name=room_name,
)
lk_room = await self._api_client.create_room(
name=room_name,
empty_timeout=empty_timeout,
enable_auto_track_egress=egress_enabled,
)
logger.info(
"LiveKit room created",
room_name=lk_room.name,
room_sid=lk_room.sid,
empty_timeout=empty_timeout,
auto_track_egress=egress_enabled,
)
# room_url includes the server URL + room name as query param.
# The join endpoint in rooms.py appends the token as another query param.
# Frontend parses: ws://host:7880?room=<name>&token=<jwt>
public_url = settings.LIVEKIT_PUBLIC_URL or settings.LIVEKIT_URL or ""
room_url = f"{public_url}?{urlencode({'room': lk_room.name})}"
return MeetingData(
meeting_id=lk_room.sid or str(uuid4()),
room_name=lk_room.name,
room_url=room_url,
host_room_url=room_url,
platform=self.PLATFORM_NAME,
extra_data={"livekit_room_sid": lk_room.sid},
)
async def get_room_sessions(self, room_name: str) -> list[SessionData]:
"""Get current participants in a LiveKit room.
For historical sessions, we rely on webhook-stored data (same as Daily).
This returns currently-connected participants.
"""
try:
participants = await self._api_client.list_participants(room_name)
return [
SessionData(
session_id=p.sid,
started_at=datetime.fromtimestamp(
p.joined_at if p.joined_at else 0, tz=timezone.utc
),
ended_at=None, # Still active
)
for p in participants
if p.sid # Skip empty entries
]
except Exception as e:
logger.debug(
"Could not list LiveKit participants (room may not exist)",
room_name=room_name,
error=str(e),
)
return []
async def upload_logo(self, room_name: str, logo_path: str) -> bool:
# LiveKit doesn't have a logo upload concept; handled in frontend theming
return True
def verify_webhook_signature(
self, body: bytes, signature: str, timestamp: str | None = None
) -> bool:
"""Verify LiveKit webhook signature.
LiveKit sends the JWT in the Authorization header. The `signature`
param here receives the Authorization header value.
"""
event = verify_webhook(self._webhook_receiver, body, signature)
return event is not None
def create_access_token(
self,
room_name: str,
participant_identity: str,
participant_name: str | None = None,
is_admin: bool = False,
) -> str:
"""Generate a LiveKit access token for a participant."""
return self._api_client.create_access_token(
room_name=room_name,
participant_identity=participant_identity,
participant_name=participant_name,
room_admin=is_admin,
)
async def start_track_egress(
self,
room_name: str,
track_sid: str,
s3_filepath: str,
):
"""Start Track Egress for a single audio track."""
return await self._api_client.start_track_egress(
room_name=room_name,
track_sid=track_sid,
s3_filepath=s3_filepath,
)
async def list_egress(self, room_name: str | None = None):
return await self._api_client.list_egress(room_name=room_name)
async def stop_egress(self, egress_id: str):
return await self._api_client.stop_egress(egress_id=egress_id)
async def close(self):
await self._api_client.close()
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()
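Review note: the `empty_timeout` clamp and the room URL construction in `create_meeting` are easy to get subtly wrong, so here is a standalone sketch of both. The function names `empty_timeout_seconds` and `build_room_url` are hypothetical and introduced only for illustration; the arithmetic and the `urlencode` call follow the diff.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def empty_timeout_seconds(end_date: datetime, now: datetime) -> int:
    """Seconds until end_date, clamped to the range 5 minutes to 24 hours."""
    # Naive datetimes are treated as UTC, as in the diff.
    end_aware = end_date if end_date.tzinfo else end_date.replace(tzinfo=timezone.utc)
    remaining = int((end_aware - now).total_seconds())
    return max(300, min(remaining, 86400))


def build_room_url(public_url: str, room_name: str) -> str:
    """Append the room name as a query parameter to the server URL."""
    return f"{public_url}?{urlencode({'room': room_name})}"


now = datetime(2026, 4, 7, 12, 0, tzinfo=timezone.utc)
short = empty_timeout_seconds(now + timedelta(minutes=1), now)
normal = empty_timeout_seconds(now + timedelta(hours=1), now)
long = empty_timeout_seconds(now + timedelta(days=3), now)
naive = empty_timeout_seconds(datetime(2026, 4, 7, 13, 0), now)
url = build_room_url("ws://localhost:7880", "my room-20260407120000")
```

A meeting whose end date is already in the past yields a negative `remaining`, which the clamp raises to 300 seconds, so the room is still created with the minimum timeout.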


@@ -1,6 +1,11 @@
from typing import Dict, Type
from ..schemas.platform import DAILY_PLATFORM, WHEREBY_PLATFORM, Platform
from ..schemas.platform import (
DAILY_PLATFORM,
LIVEKIT_PLATFORM,
WHEREBY_PLATFORM,
Platform,
)
from .base import VideoPlatformClient, VideoPlatformConfig
_PLATFORMS: Dict[Platform, Type[VideoPlatformClient]] = {}
@@ -26,10 +31,12 @@ def get_available_platforms() -> list[Platform]:
def _register_builtin_platforms():
from .daily import DailyClient # noqa: PLC0415
from .livekit import LiveKitClient # noqa: PLC0415
from .whereby import WherebyClient # noqa: PLC0415
register_platform(WHEREBY_PLATFORM, WherebyClient)
register_platform(DAILY_PLATFORM, DailyClient)
register_platform(LIVEKIT_PLATFORM, LiveKitClient)
_register_builtin_platforms()


@@ -0,0 +1,20 @@
from fastapi import APIRouter
from pydantic import BaseModel
from reflector.email import is_email_configured
from reflector.settings import settings
router = APIRouter()
class ConfigResponse(BaseModel):
zulip_enabled: bool
email_enabled: bool
@router.get("/config", response_model=ConfigResponse)
async def get_config():
return ConfigResponse(
zulip_enabled=bool(settings.ZULIP_REALM),
email_enabled=is_email_configured(),
)


@@ -0,0 +1,246 @@
"""LiveKit webhook handler.
Processes LiveKit webhook events for participant tracking and
Track Egress recording completion.
LiveKit sends webhooks as POST requests with JWT authentication
in the Authorization header.
Webhooks serve as fast-path triggers and as a logging source. Track discovery
for the multitrack pipeline uses S3 listing (the source of truth), not
webhook data.
"""
from fastapi import APIRouter, HTTPException, Request
from reflector.db.meetings import meetings_controller
from reflector.livekit_api.webhooks import create_webhook_receiver, verify_webhook
from reflector.logger import logger as _logger
from reflector.settings import settings
router = APIRouter()
logger = _logger.bind(platform="livekit")
# Module-level receiver, lazily initialized on first webhook
_webhook_receiver = None
def _get_webhook_receiver():
global _webhook_receiver
if _webhook_receiver is None:
if not settings.LIVEKIT_API_KEY or not settings.LIVEKIT_API_SECRET:
raise ValueError("LiveKit not configured")
_webhook_receiver = create_webhook_receiver(
api_key=settings.LIVEKIT_API_KEY,
api_secret=settings.LIVEKIT_WEBHOOK_SECRET or settings.LIVEKIT_API_SECRET,
)
return _webhook_receiver
@router.post("/webhook")
async def livekit_webhook(request: Request):
"""Handle LiveKit webhook events.
LiveKit webhook events include:
- participant_joined / participant_left
- egress_started / egress_updated / egress_ended
- room_started / room_finished
- track_published / track_unpublished
"""
if not settings.LIVEKIT_API_KEY or not settings.LIVEKIT_API_SECRET:
raise HTTPException(status_code=500, detail="LiveKit not configured")
body = await request.body()
auth_header = request.headers.get("Authorization", "")
receiver = _get_webhook_receiver()
event = verify_webhook(receiver, body, auth_header)
if event is None:
logger.warning(
"Invalid LiveKit webhook signature",
has_auth=bool(auth_header),
has_body=bool(body),
)
raise HTTPException(status_code=401, detail="Invalid webhook signature")
event_type = event.event
match event_type:
case "participant_joined":
await _handle_participant_joined(event)
case "participant_left":
await _handle_participant_left(event)
case "egress_started":
await _handle_egress_started(event)
case "egress_ended":
await _handle_egress_ended(event)
case "room_started":
logger.info(
"Room started",
room_name=event.room.name if event.room else None,
)
case "room_finished":
await _handle_room_finished(event)
case "track_published" | "track_unpublished":
logger.debug(
f"Track event: {event_type}",
room_name=event.room.name if event.room else None,
participant=event.participant.identity if event.participant else None,
)
case _:
logger.debug(
"Unhandled LiveKit webhook event",
event_type=event_type,
)
return {"status": "ok"}
async def _handle_participant_joined(event):
room_name = event.room.name if event.room else None
participant = event.participant
if not room_name or not participant:
logger.warning("participant_joined: missing room or participant data")
return
meeting = await meetings_controller.get_by_room_name(room_name)
if not meeting:
logger.warning("participant_joined: meeting not found", room_name=room_name)
return
logger.info(
"Participant joined",
meeting_id=meeting.id,
room_name=room_name,
participant_identity=participant.identity,
participant_sid=participant.sid,
)
async def _handle_participant_left(event):
room_name = event.room.name if event.room else None
participant = event.participant
if not room_name or not participant:
logger.warning("participant_left: missing room or participant data")
return
meeting = await meetings_controller.get_by_room_name(room_name)
if not meeting:
logger.warning("participant_left: meeting not found", room_name=room_name)
return
logger.info(
"Participant left",
meeting_id=meeting.id,
room_name=room_name,
participant_identity=participant.identity,
participant_sid=participant.sid,
)
async def _handle_egress_started(event):
egress = event.egress_info
logger.info(
"Egress started",
room_name=egress.room_name if egress else None,
egress_id=egress.egress_id if egress else None,
)
async def _handle_egress_ended(event):
"""Handle Track Egress completion. Delete video files immediately to save storage.
AutoTrackEgress records ALL tracks (audio + video). Audio is kept for the
transcription pipeline. Video files are unused and deleted on completion.
This saves ~50x storage (video is 98% of egress output for HD cameras).
"""
egress = event.egress_info
if not egress:
logger.warning("egress_ended: no egress info in payload")
return
# EGRESS_FAILED = 4
if egress.status == 4:
logger.error(
"Egress failed",
room_name=egress.room_name,
egress_id=egress.egress_id,
error=egress.error,
)
return
file_results = list(egress.file_results)
logger.info(
"Egress ended",
room_name=egress.room_name,
egress_id=egress.egress_id,
status=egress.status,
num_files=len(file_results),
filenames=[f.filename for f in file_results] if file_results else [],
)
# Delete video files (.webm) immediately — only audio (.ogg) is needed for transcription.
# Video tracks are 50-90x larger than audio and unused by the pipeline.
# JSON manifests are kept (lightweight metadata, ~430 bytes each).
for file_result in file_results:
filename = file_result.filename
if filename and filename.endswith(".webm"):
try:
from reflector.storage import get_source_storage # noqa: PLC0415
storage = get_source_storage("livekit")
await storage.delete_file(filename)
logger.info(
"Deleted video egress file",
filename=filename,
room_name=egress.room_name,
)
except Exception as e:
# Non-critical — pipeline filters these out anyway
logger.warning(
"Failed to delete video egress file",
filename=filename,
error=str(e),
)
async def _handle_room_finished(event):
"""Fast-path: trigger multitrack processing when room closes.
This is an optimization — if missed, the process_livekit_ended_meetings
beat task catches it within ~2 minutes.
"""
room_name = event.room.name if event.room else None
if not room_name:
logger.warning("room_finished: no room name in payload")
return
logger.info("Room finished", room_name=room_name)
meeting = await meetings_controller.get_by_room_name(room_name)
if not meeting:
logger.warning("room_finished: meeting not found", room_name=room_name)
return
# Deactivate the meeting — LiveKit room is destroyed, so process_meetings
# can't detect this via API (list_participants returns empty for deleted rooms).
if meeting.is_active:
await meetings_controller.update_meeting(meeting.id, is_active=False)
logger.info("room_finished: meeting deactivated", meeting_id=meeting.id)
# Import here to avoid circular imports (worker imports views)
from reflector.worker.process import process_livekit_multitrack # noqa: PLC0415
process_livekit_multitrack.delay(
room_name=room_name,
meeting_id=meeting.id,
)
logger.info(
"room_finished: queued multitrack processing",
meeting_id=meeting.id,
room_name=room_name,
)
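Review note: `_handle_egress_ended` keeps audio and manifests and deletes only `.webm` video files. The selection rule can be sketched in isolation as below; `split_egress_files` is a hypothetical helper, and the filenames are the examples from the track utilities docstring in this changeset.

```python
def split_egress_files(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Partition egress output into (kept, deleted) by file extension.

    Only .webm (video) is deleted; .ogg audio and .json manifests are kept.
    Empty filenames are ignored, matching the `if filename and ...` guard.
    """
    kept = [f for f in filenames if f and not f.endswith(".webm")]
    deleted = [f for f in filenames if f and f.endswith(".webm")]
    return kept, deleted


kept, deleted = split_egress_files(
    [
        "livekit/myroom-20260401172036/juan-4b82ed-2026-04-01T195758-TR_AMR3SWs74Divho.ogg",
        "livekit/myroom-20260401172036/EG_K5sipvfB5fTM.json",
        "livekit/myroom-20260401172036/juan-4b82ed-2026-04-01T195727-TR_VC679dgMQBdfhT.webm",
        "",
    ]
)
```

Because the pipeline filters to `.ogg` keys anyway, a failed deletion only costs storage, which is why the handler logs a warning and continues.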


@@ -4,7 +4,7 @@ from typing import Annotated, Any, Optional
from uuid import UUID
from fastapi import APIRouter, Depends, HTTPException, Request
from pydantic import BaseModel
from pydantic import BaseModel, EmailStr
import reflector.auth as auth
from reflector.dailyco_api import RecordingType
@@ -91,9 +91,7 @@ class StartRecordingRequest(BaseModel):
async def start_recording(
meeting_id: NonEmptyString,
body: StartRecordingRequest,
user: Annotated[
Optional[auth.UserInfo], Depends(auth.current_user_optional_if_public_mode)
],
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
) -> dict[str, Any]:
"""Start cloud or raw-tracks recording via Daily.co REST API.
@@ -151,3 +149,26 @@ async def start_recording(
raise HTTPException(
status_code=500, detail=f"Failed to start recording: {str(e)}"
)
class AddEmailRecipientRequest(BaseModel):
email: EmailStr
@router.post("/meetings/{meeting_id}/email-recipient")
async def add_email_recipient(
meeting_id: str,
request: AddEmailRecipientRequest,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
):
"""Add an email address to receive the transcript link when processing completes."""
meeting = await meetings_controller.get_by_id(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
include_link = user is not None
recipients = await meetings_controller.add_email_recipient(
meeting_id, request.email, include_link=include_link
)
return {"status": "success", "email_recipients": recipients}


@@ -44,6 +44,7 @@ class Room(BaseModel):
ics_last_etag: Optional[str] = None
platform: Platform
skip_consent: bool = False
email_transcript_to: str | None = None
class RoomDetails(Room):
@@ -93,6 +94,7 @@ class CreateRoom(BaseModel):
ics_enabled: bool = False
platform: Platform
skip_consent: bool = False
email_transcript_to: str | None = None
class UpdateRoom(BaseModel):
@@ -112,6 +114,7 @@ class UpdateRoom(BaseModel):
ics_enabled: Optional[bool] = None
platform: Optional[Platform] = None
skip_consent: Optional[bool] = None
email_transcript_to: Optional[str] = None
class CreateRoomMeeting(BaseModel):
@@ -253,6 +256,7 @@ async def rooms_create(
ics_enabled=room.ics_enabled,
platform=room.platform,
skip_consent=room.skip_consent,
email_transcript_to=room.email_transcript_to,
)
@@ -550,6 +554,7 @@ async def rooms_join_meeting(
room_name: str,
meeting_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
display_name: str | None = None,
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(room_name)
@@ -594,4 +599,51 @@ async def rooms_join_meeting(
meeting = meeting.model_copy()
meeting.room_url = add_query_param(meeting.room_url, "t", token)
elif meeting.platform == "livekit":
import re
import uuid
client = create_platform_client(meeting.platform)
# Identity must be unique per participant to avoid S3 key collisions.
# Format: {readable_name}-{short_uuid} ensures uniqueness even for same names.
uid_suffix = uuid.uuid4().hex[:6]
if display_name:
safe_name = re.sub(r"[^a-zA-Z0-9_-]", "_", display_name.strip())[:40]
participant_identity = (
f"{safe_name}-{uid_suffix}" if safe_name else f"anon-{uid_suffix}"
)
elif user_id:
email = getattr(user, "email", None)
if email and "@" in email:
participant_identity = f"{email.split('@')[0]}-{uid_suffix}"
else:
participant_identity = f"{user_id[:12]}-{uid_suffix}"
else:
participant_identity = f"anon-{uid_suffix}"
participant_name = display_name or participant_identity
# Store identity → Reflector user_id mapping for the pipeline
# (so TranscriptParticipant.user_id can be set correctly)
if user_id:
from reflector.redis_cache import get_async_redis_client # noqa: PLC0415
redis_client = await get_async_redis_client()
mapping_key = f"livekit:participant_map:{meeting.room_name}"
await redis_client.hset(mapping_key, participant_identity, user_id)
await redis_client.expire(mapping_key, 7 * 86400) # 7 day TTL
token = client.create_access_token(
room_name=meeting.room_name,
participant_identity=participant_identity,
participant_name=participant_name,
is_admin=(user_id == room.user_id) if user_id else False,
)
# Close the platform client to release aiohttp session
if hasattr(client, "close"):
await client.close()
meeting = meeting.model_copy()
# For LiveKit, room_url is the WS URL; token goes as a query param
meeting.room_url = add_query_param(meeting.room_url, "token", token)
return meeting
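Review note: the participant identity rules in the LiveKit branch above can be exercised on their own. The sketch below follows the same branches as the diff; `build_identity` is a hypothetical function name, and the `uid_suffix` parameter is added only so the result is deterministic in tests (the endpoint always generates it with `uuid.uuid4().hex[:6]`).

```python
import re
import uuid
from typing import Optional


def build_identity(
    display_name: Optional[str],
    user_id: Optional[str],
    email: Optional[str] = None,
    uid_suffix: Optional[str] = None,
) -> str:
    """Build a unique LiveKit participant identity: {readable_name}-{short_uuid}."""
    suffix = uid_suffix or uuid.uuid4().hex[:6]
    if display_name:
        # Replace anything outside [a-zA-Z0-9_-] and cap the length at 40.
        safe_name = re.sub(r"[^a-zA-Z0-9_-]", "_", display_name.strip())[:40]
        return f"{safe_name}-{suffix}" if safe_name else f"anon-{suffix}"
    if user_id:
        if email and "@" in email:
            return f"{email.split('@')[0]}-{suffix}"
        return f"{user_id[:12]}-{suffix}"
    return f"anon-{suffix}"
```

One thing this makes visible: only the `display_name` path is sanitized. An email local part such as `first.last+tag` passes through unchanged, so identities are not always limited to alphanumerics, hyphens, and underscores.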


@@ -16,6 +16,7 @@ from pydantic import (
import reflector.auth as auth
from reflector.db import get_database
from reflector.db.meetings import meetings_controller
from reflector.db.recordings import recordings_controller
from reflector.db.rooms import rooms_controller
from reflector.db.search import (
@@ -39,6 +40,7 @@ from reflector.db.transcripts import (
transcripts_controller,
)
from reflector.db.users import user_controller
from reflector.email import is_email_configured, send_transcript_email
from reflector.processors.types import Transcript as ProcessorTranscript
from reflector.processors.types import Word
from reflector.schemas.transcript_formats import TranscriptFormat, TranscriptSegment
@@ -112,6 +114,8 @@ class GetTranscriptMinimal(BaseModel):
room_name: str | None = None
audio_deleted: bool | None = None
change_seq: int | None = None
has_cloud_video: bool = False
cloud_video_duration: int | None = None
class TranscriptParticipantWithEmail(TranscriptParticipant):
@@ -305,6 +309,7 @@ async def transcripts_search(
source_kind: Optional[SourceKind] = None,
from_datetime: SearchFromDatetimeParam = None,
to_datetime: SearchToDatetimeParam = None,
include_deleted: bool = False,
user: Annotated[
Optional[auth.UserInfo], Depends(auth.current_user_optional_if_public_mode)
] = None,
@@ -312,6 +317,12 @@ async def transcripts_search(
"""Full-text search across transcript titles and content."""
user_id = user["sub"] if user else None
if include_deleted and not user_id:
raise HTTPException(
status_code=401,
detail="Authentication required to view deleted transcripts",
)
if from_datetime and to_datetime and from_datetime > to_datetime:
raise HTTPException(
status_code=400, detail="'from' must be less than or equal to 'to'"
@@ -326,6 +337,7 @@ async def transcripts_search(
source_kind=source_kind,
from_datetime=from_datetime,
to_datetime=to_datetime,
include_deleted=include_deleted,
)
results, total = await search_controller.search_transcripts(search_params)
@@ -501,6 +513,14 @@ async def transcript_get(
)
)
has_cloud_video = False
cloud_video_duration = None
if transcript.meeting_id:
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
if meeting and meeting.daily_composed_video_s3_key:
has_cloud_video = True
cloud_video_duration = meeting.daily_composed_video_duration
base_data = {
"id": transcript.id,
"user_id": transcript.user_id,
@@ -524,6 +544,8 @@ async def transcript_get(
"audio_deleted": transcript.audio_deleted,
"change_seq": transcript.change_seq,
"participants": participants,
"has_cloud_video": has_cloud_video,
"cloud_video_duration": cloud_video_duration,
}
if transcript_format == "text":
@@ -601,6 +623,54 @@ async def transcript_delete(
return DeletionStatus(status="ok")
@router.post("/transcripts/{transcript_id}/restore", response_model=DeletionStatus)
async def transcript_restore(
transcript_id: str,
user: Annotated[auth.UserInfo, Depends(auth.current_user)],
):
"""Restore a soft-deleted transcript."""
user_id = user["sub"]
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript:
raise HTTPException(status_code=404, detail="Transcript not found")
if transcript.deleted_at is None:
raise HTTPException(status_code=400, detail="Transcript is not deleted")
if not transcripts_controller.user_can_mutate(transcript, user_id):
raise HTTPException(status_code=403, detail="Not authorized")
await transcripts_controller.restore_by_id(transcript.id, user_id=user_id)
await get_ws_manager().send_json(
room_id=f"user:{user_id}",
message={"event": "TRANSCRIPT_RESTORED", "data": {"id": transcript.id}},
)
return DeletionStatus(status="ok")
@router.delete("/transcripts/{transcript_id}/destroy", response_model=DeletionStatus)
async def transcript_destroy(
transcript_id: str,
user: Annotated[auth.UserInfo, Depends(auth.current_user)],
):
"""Permanently delete a transcript and all associated files."""
user_id = user["sub"]
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript:
raise HTTPException(status_code=404, detail="Transcript not found")
if transcript.deleted_at is None:
raise HTTPException(
status_code=400, detail="Transcript must be soft-deleted first"
)
if not transcripts_controller.user_can_mutate(transcript, user_id):
raise HTTPException(status_code=403, detail="Not authorized")
await transcripts_controller.hard_delete(transcript.id)
await get_ws_manager().send_json(
room_id=f"user:{user_id}",
message={"event": "TRANSCRIPT_DELETED", "data": {"id": transcript.id}},
)
return DeletionStatus(status="ok")
@router.get(
"/transcripts/{transcript_id}/topics",
response_model=list[GetTranscriptTopic],
@@ -686,8 +756,6 @@ async def transcript_post_to_zulip(
)
if not transcript:
raise HTTPException(status_code=404, detail="Transcript not found")
if not transcripts_controller.user_can_mutate(transcript, user_id):
raise HTTPException(status_code=403, detail="Not authorized")
content = get_zulip_message(transcript, include_topics)
message_updated = False
@@ -705,3 +773,31 @@ async def transcript_post_to_zulip(
await transcripts_controller.update(
transcript, {"zulip_message_id": response["id"]}
)
class SendEmailRequest(BaseModel):
email: str
class SendEmailResponse(BaseModel):
sent: int
@router.post("/transcripts/{transcript_id}/email", response_model=SendEmailResponse)
async def transcript_send_email(
transcript_id: str,
request: SendEmailRequest,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
):
if not is_email_configured():
raise HTTPException(status_code=400, detail="Email not configured")
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
transcript_id, user_id=user_id
)
if not transcript:
raise HTTPException(status_code=404, detail="Transcript not found")
sent = await send_transcript_email(
[request.email], transcript, include_link=(transcript.share_mode == "public")
)
return SendEmailResponse(sent=sent)
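Review note: the new `restore` and `destroy` endpoints apply the same guard order (not found, then not soft-deleted, then not authorized). A minimal sketch of that ordering follows. `FakeTranscript` and `deleted_transcript_guard` are hypothetical, and ownership is reduced to a plain `user_id` comparison, which is an assumption; the real `user_can_mutate` may apply different rules.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FakeTranscript:
    user_id: Optional[str]
    deleted_at: Optional[datetime] = None


def deleted_transcript_guard(
    transcript: Optional[FakeTranscript], user_id: str
) -> int:
    """Return the HTTP status the restore/destroy endpoints would produce."""
    if transcript is None:
        return 404  # Transcript not found
    if transcript.deleted_at is None:
        return 400  # Must be soft-deleted before restore or destroy
    if transcript.user_id != user_id:  # Assumed stand-in for user_can_mutate
        return 403  # Not authorized
    return 200


deleted = FakeTranscript(user_id="alice", deleted_at=datetime.now(timezone.utc))
active = FakeTranscript(user_id="alice")
```

With this ordering, a non-owner probing an active transcript gets 400 rather than 403, so the response reveals that the transcript exists and is not deleted. Checking authorization before the deletion state would avoid that.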


@@ -53,9 +53,22 @@ async def transcript_get_audio_mp3(
else:
user_id = token_user["sub"]
transcript = await transcripts_controller.get_by_id_for_http(
transcript_id, user_id=user_id
)
if not user_id and not token:
# No authentication provided at all. Only anonymous transcripts
# (user_id=None) are accessible without auth, to preserve
# pipeline access via _generate_local_audio_link().
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript or transcript.deleted_at is not None:
raise HTTPException(status_code=404, detail="Transcript not found")
if transcript.user_id is not None:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Authentication required",
)
else:
transcript = await transcripts_controller.get_by_id_for_http(
transcript_id, user_id=user_id
)
if transcript.audio_location == "storage":
# proxy S3 file, to prevent issue with CORS
@@ -94,16 +107,16 @@ async def transcript_get_audio_mp3(
request,
transcript.audio_mp3_filename,
content_type="audio/mpeg",
content_disposition=f"attachment; filename={filename}",
content_disposition=f"inline; filename={filename}",
)
@router.get("/transcripts/{transcript_id}/audio/waveform")
async def transcript_get_audio_waveform(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
user: Annotated[auth.UserInfo, Depends(auth.current_user)],
) -> AudioWaveform:
user_id = user["sub"] if user else None
user_id = user["sub"]
transcript = await transcripts_controller.get_by_id_for_http(
transcript_id, user_id=user_id
)


@@ -0,0 +1,169 @@
"""
Transcript download endpoint — generates a zip archive with all transcript files.
"""
import json
import os
import tempfile
import zipfile
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import StreamingResponse
import reflector.auth as auth
from reflector.db.meetings import meetings_controller
from reflector.db.recordings import recordings_controller
from reflector.db.transcripts import transcripts_controller
from reflector.logger import logger
from reflector.storage import get_source_storage, get_transcripts_storage
router = APIRouter()
@router.get(
"/transcripts/{transcript_id}/download/zip",
operation_id="transcript_download_zip",
)
async def transcript_download_zip(
transcript_id: str,
user: Annotated[auth.UserInfo, Depends(auth.current_user)],
):
user_id = user["sub"]
transcript = await transcripts_controller.get_by_id_for_http(
transcript_id, user_id=user_id
)
if not transcripts_controller.user_can_mutate(transcript, user_id):
raise HTTPException(status_code=403, detail="Not authorized")
recording = None
if transcript.recording_id:
recording = await recordings_controller.get_by_id(transcript.recording_id)
meeting = None
if transcript.meeting_id:
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
truncated_id = str(transcript.id).split("-")[0]
with tempfile.TemporaryDirectory() as tmpdir:
zip_path = os.path.join(tmpdir, f"transcript_{truncated_id}.zip")
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
# Transcript audio
if transcript.audio_location == "storage" and not transcript.audio_deleted:
try:
storage = get_transcripts_storage()
data = await storage.get_file(transcript.storage_audio_path)
audio_path = os.path.join(tmpdir, "audio.mp3")
with open(audio_path, "wb") as f:
f.write(data)
zf.write(audio_path, "audio.mp3")
except Exception as e:
logger.warning(
"Failed to download transcript audio for zip",
exc_info=e,
transcript_id=transcript.id,
)
elif (
not transcript.audio_deleted
and hasattr(transcript, "audio_mp3_filename")
and transcript.audio_mp3_filename
and transcript.audio_mp3_filename.exists()
):
zf.write(str(transcript.audio_mp3_filename), "audio.mp3")
# Recording tracks (multitrack)
if recording and recording.track_keys:
try:
source_storage = get_source_storage(
"daily" if recording.track_keys else None
)
except Exception:
source_storage = get_transcripts_storage()
for i, key in enumerate(recording.track_keys):
try:
data = await source_storage.get_file(
key, bucket=recording.bucket_name
)
filename = os.path.basename(key) or f"track_{i}"
track_path = os.path.join(tmpdir, f"track_{i}")
with open(track_path, "wb") as f:
f.write(data)
zf.write(track_path, f"tracks/{filename}")
except Exception as e:
logger.warning(
"Failed to download track for zip",
exc_info=e,
track_key=key,
)
# Cloud video
if meeting and meeting.daily_composed_video_s3_key:
try:
source_storage = get_source_storage("daily")
data = await source_storage.get_file(
meeting.daily_composed_video_s3_key
)
video_path = os.path.join(tmpdir, "cloud_video.mp4")
with open(video_path, "wb") as f:
f.write(data)
zf.write(video_path, "cloud_video.mp4")
except Exception as e:
logger.warning(
"Failed to download cloud video for zip",
exc_info=e,
s3_key=meeting.daily_composed_video_s3_key,
)
# Metadata JSON
metadata = {
"id": transcript.id,
"title": transcript.title,
"created_at": (
transcript.created_at.isoformat() if transcript.created_at else None
),
"duration": transcript.duration,
"source_language": transcript.source_language,
"target_language": transcript.target_language,
"short_summary": transcript.short_summary,
"long_summary": transcript.long_summary,
"topics": (
[t.model_dump() for t in transcript.topics]
if transcript.topics
else []
),
"participants": (
[p.model_dump() for p in transcript.participants]
if transcript.participants
else []
),
"action_items": transcript.action_items,
"webvtt": transcript.webvtt,
"recording_id": transcript.recording_id,
"meeting_id": transcript.meeting_id,
}
meta_path = os.path.join(tmpdir, "metadata.json")
with open(meta_path, "w") as f:
json.dump(metadata, f, indent=2, default=str)
zf.write(meta_path, "metadata.json")
# Read zip into memory before tmpdir is cleaned up
with open(zip_path, "rb") as f:
zip_bytes = f.read()
def iter_zip():
offset = 0
chunk_size = 64 * 1024
while offset < len(zip_bytes):
yield zip_bytes[offset : offset + chunk_size]
offset += chunk_size
return StreamingResponse(
iter_zip(),
media_type="application/zip",
headers={
"Content-Disposition": f"attachment; filename=transcript_{truncated_id}.zip"
},
)
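The archive-then-stream pattern used by `transcript_download_zip` above can be sketched with the standard library alone. This is a minimal illustration, not the endpoint's code: `build_zip_bytes` and `iter_chunks` are hypothetical helper names, and the archive is built in an in-memory buffer instead of the temporary directory the endpoint uses.

```python
import io
import zipfile
from typing import Iterator


def build_zip_bytes(files: dict[str, bytes]) -> bytes:
    """Write each (archive name -> payload) pair into an in-memory zip.

    Hypothetical helper: the endpoint above writes files to a temp dir first.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buffer.getvalue()


def iter_chunks(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield fixed-size slices of data, like iter_zip() in the endpoint."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset : offset + chunk_size]


# Example payloads are made up for illustration.
archive = build_zip_bytes({"metadata.json": b"{}", "tracks/track_0": b"\x00" * 10})
```

As in the endpoint, the whole archive is held in memory before streaming, so the chunked generator bounds the size of each write but not peak memory use.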


@@ -0,0 +1,60 @@
"""
Transcript cloud video endpoint — returns a presigned URL for streaming playback.
"""
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
import reflector.auth as auth
from reflector.db.meetings import meetings_controller
from reflector.db.transcripts import transcripts_controller
from reflector.storage import get_source_storage
router = APIRouter()
class VideoUrlResponse(BaseModel):
url: str
duration: int | None = None
content_type: str = "video/mp4"
@router.get(
"/transcripts/{transcript_id}/video/url",
operation_id="transcript_get_video_url",
response_model=VideoUrlResponse,
)
async def transcript_get_video_url(
transcript_id: str,
user: Annotated[auth.UserInfo, Depends(auth.current_user)],
):
user_id = user["sub"]
transcript = await transcripts_controller.get_by_id_for_http(
transcript_id, user_id=user_id
)
if not transcript.meeting_id:
raise HTTPException(status_code=404, detail="No video available")
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
if not meeting or not meeting.daily_composed_video_s3_key:
raise HTTPException(status_code=404, detail="No video available")
source_storage = get_source_storage("daily")
url = await source_storage.get_file_url(
meeting.daily_composed_video_s3_key,
operation="get_object",
expires_in=900,
extra_params={
"ResponseContentDisposition": "inline",
"ResponseContentType": "video/mp4",
},
)
return VideoUrlResponse(
url=url,
duration=meeting.daily_composed_video_duration,
)


@@ -83,7 +83,25 @@ def build_beat_schedule(
else:
logger.info("Daily.co beat tasks disabled (no DAILY_API_KEY)")
_any_platform = _whereby_enabled or _daily_enabled
_livekit_enabled = bool(settings.LIVEKIT_API_KEY and settings.LIVEKIT_URL)
if _livekit_enabled:
beat_schedule["process_livekit_ended_meetings"] = {
"task": "reflector.worker.process.process_livekit_ended_meetings",
"schedule": 120, # Every 2 minutes
}
beat_schedule["reprocess_failed_livekit_recordings"] = {
"task": "reflector.worker.process.reprocess_failed_livekit_recordings",
"schedule": crontab(hour=5, minute=0),
}
logger.info(
"LiveKit beat tasks enabled",
tasks=[
"process_livekit_ended_meetings",
"reprocess_failed_livekit_recordings",
],
)
_any_platform = _whereby_enabled or _daily_enabled or _livekit_enabled
if _any_platform:
beat_schedule["process_meetings"] = {
"task": "reflector.worker.process.process_meetings",
@@ -146,7 +164,6 @@ else:
app.conf.broker_connection_retry_on_startup = True
app.autodiscover_tasks(
[
"reflector.pipelines.main_live_pipeline",
"reflector.worker.healthcheck",
"reflector.worker.process",
"reflector.worker.cleanup",

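The conditional registration in `build_beat_schedule` above can be sketched with plain dictionaries. This is a simplified sketch under stated assumptions: `build_platform_schedule` is a hypothetical name, the `crontab(hour=5, minute=0)` entry is replaced by a string placeholder so the example runs without Celery, and the `process_meetings` interval is left out because the diff does not show it.

```python
def build_platform_schedule(
    *, whereby: bool, daily: bool, livekit: bool
) -> dict[str, dict]:
    """Return beat entries for the enabled platforms (illustrative only)."""
    schedule: dict[str, dict] = {}
    if livekit:
        schedule["process_livekit_ended_meetings"] = {
            "task": "reflector.worker.process.process_livekit_ended_meetings",
            "schedule": 120,  # seconds, every 2 minutes as in the diff
        }
        schedule["reprocess_failed_livekit_recordings"] = {
            "task": "reflector.worker.process.reprocess_failed_livekit_recordings",
            # Placeholder standing in for crontab(hour=5, minute=0).
            "schedule": "daily@05:00",
        }
    # LiveKit now counts as a platform, so it alone enables process_meetings.
    if whereby or daily or livekit:
        schedule["process_meetings"] = {
            "task": "reflector.worker.process.process_meetings",
            # Interval omitted: it is not visible in this hunk.
        }
    return schedule
```

The point of the change is the last condition: before it, a LiveKit-only deployment would not have scheduled `process_meetings` at all.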

@@ -90,7 +90,9 @@ async def cleanup_old_transcripts(
):
"""Delete old anonymous transcripts and their associated recordings/meetings."""
query = transcripts.select().where(
(transcripts.c.created_at < cutoff_date) & (transcripts.c.user_id.is_(None))
(transcripts.c.created_at < cutoff_date)
& (transcripts.c.user_id.is_(None))
& (transcripts.c.deleted_at.is_(None))
)
old_transcripts = await db.fetch_all(query)
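The effect of the extra `deleted_at IS NULL` condition can be reproduced against an in-memory SQLite table. This is a sketch under assumptions: the table layout, the row values, and the `select_cleanup_candidates` helper are invented for illustration and only mirror the three filters in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transcript ("
    "id TEXT PRIMARY KEY, created_at TEXT, user_id TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?, ?, ?)",
    [
        ("old-anon", "2024-01-01", None, None),
        ("old-anon-deleted", "2024-01-01", None, "2024-02-01"),
        ("old-owned", "2024-01-01", "user-1", None),
        ("new-anon", "2025-01-01", None, None),
    ],
)


def select_cleanup_candidates(db: sqlite3.Connection, cutoff: str) -> list[str]:
    """Old, anonymous, and not already soft-deleted transcripts."""
    cursor = db.execute(
        "SELECT id FROM transcript "
        "WHERE created_at < ? AND user_id IS NULL AND deleted_at IS NULL "
        "ORDER BY id",
        (cutoff,),
    )
    return [row[0] for row in cursor.fetchall()]
```

Without the third condition, a row that an earlier cleanup run had already soft-deleted would be selected again on every run.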


@@ -12,6 +12,7 @@ from celery import shared_task
from celery.utils.log import get_task_logger
from pydantic import ValidationError
from reflector.asynctask import asynctask
from reflector.dailyco_api import FinishedRecordingResponse, RecordingResponse
from reflector.db.daily_participant_sessions import (
DailyParticipantSession,
@@ -25,9 +26,6 @@ from reflector.db.transcripts import (
transcripts_controller,
)
from reflector.hatchet.client import HatchetClientManager
from reflector.pipelines.main_live_pipeline import asynctask
from reflector.pipelines.topic_processing import EmptyPipeline
from reflector.processors import AudioFileWriterProcessor
from reflector.processors.audio_waveform_processor import AudioWaveformProcessor
from reflector.redis_cache import RedisAsyncLock
from reflector.settings import settings
@@ -104,6 +102,12 @@ async def process_recording(bucket_name: str, object_key: str):
room = await rooms_controller.get_by_id(meeting.room_id)
recording = await recordings_controller.get_by_object_key(bucket_name, object_key)
if recording and recording.deleted_at is not None:
logger.info(
"Skipping soft-deleted recording",
recording_id=recording.id,
)
return
if not recording:
recording = await recordings_controller.create(
Recording(
@@ -115,6 +119,13 @@ async def process_recording(bucket_name: str, object_key: str):
)
transcript = await transcripts_controller.get_by_recording_id(recording.id)
if transcript and transcript.deleted_at is not None:
logger.info(
"Skipping soft-deleted transcript for recording",
recording_id=recording.id,
transcript_id=transcript.id,
)
return
if transcript:
await transcripts_controller.update(
transcript,
@@ -262,6 +273,13 @@ async def _process_multitrack_recording_inner(
# Check if recording already exists (reprocessing path)
recording = await recordings_controller.get_by_id(recording_id)
if recording and recording.deleted_at is not None:
logger.info(
"Skipping soft-deleted recording",
recording_id=recording_id,
)
return
if recording and recording.meeting_id:
# Reprocessing: recording exists with meeting already linked
meeting = await meetings_controller.get_by_id(recording.meeting_id)
@@ -341,6 +359,13 @@ async def _process_multitrack_recording_inner(
)
transcript = await transcripts_controller.get_by_recording_id(recording.id)
if transcript and transcript.deleted_at is not None:
logger.info(
"Skipping soft-deleted transcript for recording",
recording_id=recording.id,
transcript_id=transcript.id,
)
return
if not transcript:
transcript = await transcripts_controller.add(
"",
@@ -844,13 +869,30 @@ async def process_meetings():
elif has_had_sessions:
should_deactivate = True
logger_.info("Meeting ended - all participants left")
elif current_time > end_date:
should_deactivate = True
logger_.info(
"Meeting deactivated - scheduled time ended with no participants",
)
else:
logger_.debug("Meeting not yet started, keep it")
elif not has_had_sessions:
# No sessions recorded — either no one joined, or webhooks
# didn't arrive (e.g. local dev without tunnel).
meeting_start = meeting.start_date
if meeting_start.tzinfo is None:
meeting_start = meeting_start.replace(tzinfo=timezone.utc)
age_minutes = (current_time - meeting_start).total_seconds() / 60
is_scheduled = bool(meeting.calendar_event_id)
if is_scheduled and current_time > end_date:
# Scheduled meeting past its end time with no participants
should_deactivate = True
logger_.info(
"Meeting deactivated - scheduled time ended with no participants",
)
elif not is_scheduled and age_minutes > 30:
# On-the-fly meeting with no sessions after 30 min
should_deactivate = True
logger_.info(
"Meeting deactivated - no sessions after 30 min",
age_minutes=round(age_minutes, 1),
)
else:
logger_.debug("Meeting not yet started, keep it")
if should_deactivate:
await meetings_controller.update_meeting(
@@ -881,6 +923,11 @@ async def convert_audio_and_waveform(transcript) -> None:
transcript_id=transcript.id,
)
from reflector.pipelines.topic_processing import EmptyPipeline # noqa: PLC0415
from reflector.processors.audio_file_writer import (
AudioFileWriterProcessor, # noqa: PLC0415
)
upload_path = transcript.data_path / "upload.webm"
mp3_path = transcript.audio_mp3_filename
@@ -1140,3 +1187,311 @@ async def trigger_daily_reconciliation() -> None:
except Exception as e:
logger.error("Reconciliation trigger failed", error=str(e), exc_info=True)
# ============================================================
# LiveKit multitrack recording tasks
# ============================================================
@shared_task
@asynctask
async def process_livekit_multitrack(
room_name: str,
meeting_id: str,
):
"""
Process LiveKit multitrack recording by discovering tracks on S3.
Tracks are discovered via S3 listing (source of truth), not webhooks.
Called from room_finished webhook (fast-path) or beat task (fallback).
"""
from reflector.utils.livekit import ( # noqa: PLC0415
recording_lock_key,
)
logger.info(
"Processing LiveKit multitrack recording",
room_name=room_name,
meeting_id=meeting_id,
)
lock_key = recording_lock_key(room_name)
async with RedisAsyncLock(
key=lock_key,
timeout=600,
extend_interval=60,
skip_if_locked=True,
blocking=False,
) as lock:
if not lock.acquired:
logger.warning(
"LiveKit processing skipped - lock already held",
room_name=room_name,
lock_key=lock_key,
)
return
await _process_livekit_multitrack_inner(room_name, meeting_id)
async def _process_livekit_multitrack_inner(
room_name: str,
meeting_id: str,
):
"""Inner processing logic for LiveKit multitrack recording."""
# 1. Discover tracks by listing S3 prefix.
# Wait briefly for egress files to finish flushing to S3 — the room_finished
# webhook fires after empty_timeout, but egress finalization may still be in progress.
import asyncio as _asyncio # noqa: PLC0415
from reflector.storage import get_source_storage # noqa: PLC0415
from reflector.utils.livekit import ( # noqa: PLC0415
extract_livekit_base_room_name,
filter_audio_tracks,
parse_livekit_track_filepath,
)
EGRESS_FLUSH_DELAY = 10 # seconds — egress typically flushes within a few seconds
EGRESS_RETRY_DELAY = 30 # seconds — retry if first listing finds nothing
await _asyncio.sleep(EGRESS_FLUSH_DELAY)
storage = get_source_storage("livekit")
s3_prefix = f"livekit/{room_name}/"
all_keys = await storage.list_objects(prefix=s3_prefix)
# Filter to audio tracks only (.ogg) — skip .json manifests and .webm video
audio_keys = filter_audio_tracks(all_keys) if all_keys else []
if not audio_keys:
# Retry once after a longer delay — egress may still be flushing
logger.info(
"No audio tracks found yet, retrying after delay",
room_name=room_name,
retry_delay=EGRESS_RETRY_DELAY,
)
await _asyncio.sleep(EGRESS_RETRY_DELAY)
all_keys = await storage.list_objects(prefix=s3_prefix)
audio_keys = filter_audio_tracks(all_keys) if all_keys else []
# Sanity check: compare audio tracks against egress manifests.
# Each Track Egress (audio or video) produces a .json manifest.
# Video tracks produce .webm files. So expected audio count ≈ manifests - video files.
if all_keys:
manifest_count = sum(1 for k in all_keys if k.endswith(".json"))
video_count = sum(1 for k in all_keys if k.endswith(".webm"))
expected_audio = manifest_count - video_count
if expected_audio > len(audio_keys) and expected_audio > 0:
# Some audio tracks may still be flushing — wait and retry
logger.info(
"Expected more audio tracks based on manifests, waiting for late flushes",
room_name=room_name,
expected=expected_audio,
found=len(audio_keys),
)
await _asyncio.sleep(EGRESS_RETRY_DELAY)
all_keys = await storage.list_objects(prefix=s3_prefix)
audio_keys = filter_audio_tracks(all_keys) if all_keys else []
logger.info(
"S3 track discovery complete",
room_name=room_name,
total_files=len(all_keys) if all_keys else 0,
audio_files=len(audio_keys),
)
if not audio_keys:
logger.warning(
"No audio track files found on S3 after retries",
room_name=room_name,
s3_prefix=s3_prefix,
)
return
# 2. Parse track info from filenames
parsed_tracks = []
for key in audio_keys:
try:
parsed = parse_livekit_track_filepath(key)
parsed_tracks.append(parsed)
except ValueError as e:
logger.warning("Skipping unparseable track file", s3_key=key, error=str(e))
if not parsed_tracks:
logger.warning(
"No valid track files found after parsing",
room_name=room_name,
raw_keys=all_keys,
)
return
track_keys = [t.s3_key for t in parsed_tracks]
# 3. Find meeting and room
meeting = await meetings_controller.get_by_id(meeting_id)
if not meeting:
logger.error(
"Meeting not found for LiveKit recording",
meeting_id=meeting_id,
room_name=room_name,
)
return
base_room_name = extract_livekit_base_room_name(room_name)
room = await rooms_controller.get_by_name(base_room_name)
if not room:
logger.error("Room not found", room_name=base_room_name)
return
# 4. Create recording
recording_id = f"lk-{room_name}"
bucket_name = settings.LIVEKIT_STORAGE_AWS_BUCKET_NAME or ""
existing_recording = await recordings_controller.get_by_id(recording_id)
if existing_recording and existing_recording.deleted_at is not None:
logger.info("Skipping soft-deleted recording", recording_id=recording_id)
return
if not existing_recording:
recording = await recordings_controller.create(
Recording(
id=recording_id,
bucket_name=bucket_name,
object_key=s3_prefix,
recorded_at=datetime.now(timezone.utc),
meeting_id=meeting.id,
track_keys=track_keys,
)
)
else:
recording = existing_recording
# 5. Create or get transcript
transcript = await transcripts_controller.get_by_recording_id(recording.id)
if transcript and transcript.deleted_at is not None:
logger.info("Skipping soft-deleted transcript", recording_id=recording.id)
return
if not transcript:
transcript = await transcripts_controller.add(
"",
source_kind=SourceKind.ROOM,
source_language="en",
target_language="en",
user_id=room.user_id,
recording_id=recording.id,
share_mode="semi-private",
meeting_id=meeting.id,
room_id=room.id,
)
# 6. Start Hatchet pipeline (reuses DiarizationPipeline with source_platform="livekit")
workflow_id = await HatchetClientManager.start_workflow(
workflow_name="DiarizationPipeline",
input_data={
"recording_id": recording_id,
"tracks": [
{
"s3_key": t.s3_key,
"participant_identity": t.participant_identity,
"timestamp": t.timestamp.isoformat(),
}
for t in parsed_tracks
],
"bucket_name": bucket_name,
"transcript_id": transcript.id,
"room_id": room.id,
"source_platform": "livekit",
},
additional_metadata={
"transcript_id": transcript.id,
"recording_id": recording_id,
},
)
logger.info(
"Started LiveKit Hatchet workflow",
workflow_id=workflow_id,
transcript_id=transcript.id,
room_name=room_name,
num_tracks=len(parsed_tracks),
)
await transcripts_controller.update(transcript, {"workflow_run_id": workflow_id})
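The manifest sanity check in step 1 of `_process_livekit_multitrack_inner` reduces to a small counting rule. The sketch below assumes that `filter_audio_tracks` keeps keys ending in `.ogg`, as the inline comment suggests; its real implementation is not shown in this diff, and the key names in the usage note are made up.

```python
def expected_audio_count(keys: list[str]) -> int:
    """Manifests minus video files, per the comment in the diff."""
    manifests = sum(1 for k in keys if k.endswith(".json"))
    videos = sum(1 for k in keys if k.endswith(".webm"))
    return manifests - videos


def should_wait_for_late_tracks(keys: list[str]) -> bool:
    """True when manifests imply more audio files than are listed so far."""
    # Assumption: audio tracks are the .ogg keys (stand-in for filter_audio_tracks).
    audio = [k for k in keys if k.endswith(".ogg")]
    expected = expected_audio_count(keys)
    return expected > 0 and expected > len(audio)
```

For example, a listing with three manifests, one `.webm`, and one `.ogg` implies two audio tracks, so one is presumably still flushing and a second listing is worth the wait.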
@shared_task
@asynctask
async def process_livekit_ended_meetings():
"""Check for inactive LiveKit meetings that need multitrack processing.
Runs on a beat schedule. Catches cases where room_finished webhook was missed.
Only processes meetings that:
- Use the "livekit" platform
- Have is_active=False (already deactivated by process_meetings)
- Have no associated transcript yet
"""
from reflector.db.transcripts import transcripts_controller as tc # noqa: PLC0415
all_livekit = await meetings_controller.get_all_inactive_livekit()
queued = 0
for meeting in all_livekit:
# Skip if already has a transcript
existing = await tc.get_by_meeting_id(meeting.id)
if existing:
continue
logger.info(
"Found unprocessed inactive LiveKit meeting",
meeting_id=meeting.id,
room_name=meeting.room_name,
)
process_livekit_multitrack.delay(
room_name=meeting.room_name,
meeting_id=meeting.id,
)
queued += 1
if queued > 0:
logger.info("Queued LiveKit multitrack processing", count=queued)
@shared_task
@asynctask
async def reprocess_failed_livekit_recordings():
"""Reprocess LiveKit recordings that failed.
Runs daily at 5 AM. Finds recordings with livekit prefix and error status.
"""
bucket_name = settings.LIVEKIT_STORAGE_AWS_BUCKET_NAME
if not bucket_name:
return
failed = await recordings_controller.get_multitrack_needing_reprocessing(
bucket_name
)
livekit_failed = [r for r in failed if r.id.startswith("lk-")]
for recording in livekit_failed:
if not recording.meeting_id:
logger.warning(
"Skipping reprocess — no meeting_id",
recording_id=recording.id,
)
continue
meeting = await meetings_controller.get_by_id(recording.meeting_id)
if not meeting:
continue
logger.info(
"Reprocessing failed LiveKit recording",
recording_id=recording.id,
meeting_id=meeting.id,
)
process_livekit_multitrack.delay(
room_name=meeting.room_name,
meeting_id=meeting.id,
)
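The no-session deactivation rule added to `process_meetings` earlier in this file can be isolated as a pure function, which makes the two branches easy to test. This is a sketch, not the task's code: `should_deactivate_without_sessions` and its parameters are hypothetical names, and the 30-minute threshold is taken from the diff.

```python
from datetime import datetime, timedelta, timezone


def should_deactivate_without_sessions(
    *,
    start_date: datetime,
    end_date: datetime,
    is_scheduled: bool,
    now: datetime,
    max_idle_minutes: float = 30.0,
) -> bool:
    """Decide whether a meeting that never had a session should be closed."""
    # Treat naive timestamps as UTC, as the diff does for meeting.start_date.
    if start_date.tzinfo is None:
        start_date = start_date.replace(tzinfo=timezone.utc)
    if end_date.tzinfo is None:
        end_date = end_date.replace(tzinfo=timezone.utc)
    if is_scheduled:
        # Scheduled (calendar) meeting: close only once its end time passed.
        return now > end_date
    # On-the-fly meeting: close after the idle threshold.
    age_minutes = (now - start_date).total_seconds() / 60
    return age_minutes > max_idle_minutes
```

Keeping the rule free of database and logging calls means the scheduled and on-the-fly cases can be checked without a running worker.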


@@ -8,8 +8,8 @@ import structlog
from celery import shared_task
from celery.utils.log import get_task_logger
from reflector.asynctask import asynctask
from reflector.db.rooms import rooms_controller
from reflector.pipelines.main_live_pipeline import asynctask
from reflector.utils.webhook import (
WebhookRoomPayload,
WebhookTestPayload,


@@ -113,6 +113,7 @@ TranscriptWsEvent = Annotated[
UserEventName = Literal[
"TRANSCRIPT_CREATED",
"TRANSCRIPT_DELETED",
"TRANSCRIPT_RESTORED",
"TRANSCRIPT_STATUS",
"TRANSCRIPT_FINAL_TITLE",
"TRANSCRIPT_DURATION",
@@ -161,6 +162,15 @@ class UserWsTranscriptDeleted(BaseModel):
data: UserTranscriptDeletedData
class UserTranscriptRestoredData(BaseModel):
id: NonEmptyString
class UserWsTranscriptRestored(BaseModel):
event: Literal["TRANSCRIPT_RESTORED"] = "TRANSCRIPT_RESTORED"
data: UserTranscriptRestoredData
class UserWsTranscriptStatus(BaseModel):
event: Literal["TRANSCRIPT_STATUS"] = "TRANSCRIPT_STATUS"
data: UserTranscriptStatusData
@@ -180,6 +190,7 @@ UserWsEvent = Annotated[
Union[
UserWsTranscriptCreated,
UserWsTranscriptDeleted,
UserWsTranscriptRestored,
UserWsTranscriptStatus,
UserWsTranscriptFinalTitle,
UserWsTranscriptDuration,


@@ -107,7 +107,8 @@ class WebsocketManager:
while True:
# timeout=1.0 prevents tight CPU loop when no messages available
message = await pubsub_subscriber.get_message(
ignore_subscribe_messages=True
ignore_subscribe_messages=True,
timeout=1.0,
)
if message is not None:
room_id = message["channel"].decode("utf-8")


@@ -40,6 +40,11 @@ x-backend-env: &backend-env
# Garage S3 credentials — hardcoded test keys, containers are ephemeral
TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID: GK0123456789abcdef01234567 # gitleaks:allow
TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY: "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef" # gitleaks:allow
# Email / SMTP — Mailpit captures emails without sending
SMTP_HOST: mailpit
SMTP_PORT: "1025"
SMTP_FROM_EMAIL: test@reflector.local
SMTP_USE_TLS: "false"
# NOTE: DAILYCO_STORAGE_AWS_* intentionally NOT set — forces fallback to
# get_transcripts_storage() which has ENDPOINT_URL pointing at Garage.
# Setting them would bypass the endpoint and generate presigned URLs for AWS.
@@ -101,6 +106,14 @@ services:
retries: 10
start_period: 5s
mailpit:
image: axllent/mailpit:latest
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8025/api/v1/messages"]
interval: 5s
timeout: 3s
retries: 5
mock-daily:
build:
context: .
@@ -131,6 +144,8 @@ services:
condition: service_healthy
mock-daily:
condition: service_healthy
mailpit:
condition: service_healthy
volumes:
- server_data:/app/data
@@ -194,6 +209,7 @@ services:
DATABASE_URL: postgresql+asyncpg://reflector:reflector@postgres:5432/reflector
SERVER_URL: http://server:1250
GARAGE_ENDPOINT: http://garage:3900
MAILPIT_URL: http://mailpit:8025
depends_on:
server:
condition: service_started


@@ -17,6 +17,7 @@ from sqlalchemy.ext.asyncio import create_async_engine
SERVER_URL = os.environ.get("SERVER_URL", "http://server:1250")
GARAGE_ENDPOINT = os.environ.get("GARAGE_ENDPOINT", "http://garage:3900")
MAILPIT_URL = os.environ.get("MAILPIT_URL", "http://mailpit:8025")
DATABASE_URL = os.environ.get(
"DATABASE_URL_ASYNC",
os.environ.get(
@@ -114,3 +115,44 @@ async def _poll_transcript_status(
def poll_transcript_status():
"""Returns the poll_transcript_status async helper function."""
return _poll_transcript_status
@pytest_asyncio.fixture
async def mailpit_client():
"""HTTP client for Mailpit API — query captured emails."""
async with httpx.AsyncClient(
base_url=MAILPIT_URL,
timeout=httpx.Timeout(10.0),
) as client:
# Clear inbox before each test
await client.delete("/api/v1/messages")
yield client
async def _poll_mailpit_messages(
mailpit: httpx.AsyncClient,
to_email: str,
max_wait: int = 30,
interval: int = 2,
) -> list[dict]:
"""
Poll Mailpit API until at least one message is delivered to the given address.
Returns the list of matching messages.
"""
elapsed = 0
while elapsed < max_wait:
resp = await mailpit.get("/api/v1/messages", params={"query": f"to:{to_email}"})
resp.raise_for_status()
data = resp.json()
messages = data.get("messages", [])
if messages:
return messages
await asyncio.sleep(interval)
elapsed += interval
raise TimeoutError(f"No email delivered to {to_email} within {max_wait}s")
@pytest_asyncio.fixture
def poll_mailpit_messages():
"""Returns the poll_mailpit_messages async helper function."""
return _poll_mailpit_messages
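The polling loop in `_poll_mailpit_messages` can be sketched independently of Mailpit and httpx. The helper below is a hypothetical generalisation: `poll_until_nonempty` and its `fetch` callback are illustrative names, with `fetch` standing in for the HTTP query against the Mailpit API.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_nonempty(
    fetch: Callable[[], Awaitable[list]],
    max_wait: float = 30,
    interval: float = 2,
) -> list:
    """Call fetch() until it returns a non-empty list or max_wait elapses."""
    elapsed = 0.0
    while elapsed < max_wait:
        items = await fetch()
        if items:
            return items
        await asyncio.sleep(interval)
        elapsed += interval
    raise TimeoutError(f"Nothing returned within {max_wait}s")
```

Like the fixture above, this counts only the sleep intervals toward `max_wait`, so slow requests can stretch the real wall-clock time beyond the nominal limit.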


@@ -4,10 +4,12 @@ Integration test: Multitrack → DailyMultitrackPipeline → full processing.
Exercises: S3 upload → DB recording setup → process endpoint →
Hatchet DiarizationPipeline → mock Daily API → whisper per-track transcription →
diarization → mixdown → LLM summarization/topics → status "ended".
Also tests email transcript notification via Mailpit SMTP sink.
"""
import json
from datetime import datetime, timezone
import uuid
from datetime import datetime, timedelta, timezone
import pytest
from sqlalchemy import text
@@ -22,6 +24,9 @@ TRACK_KEYS = [
]
TEST_EMAIL = "integration-test@reflector.local"
@pytest.mark.asyncio
async def test_multitrack_pipeline_end_to_end(
api_client,
@@ -30,6 +35,8 @@ async def test_multitrack_pipeline_end_to_end(
test_records_dir,
bucket_name,
poll_transcript_status,
mailpit_client,
poll_mailpit_messages,
):
"""Set up multitrack recording in S3/DB and verify the full pipeline completes."""
# 1. Upload test audio as two separate tracks to Garage S3
@@ -52,16 +59,41 @@ async def test_multitrack_pipeline_end_to_end(
transcript = resp.json()
transcript_id = transcript["id"]
# 3. Insert Recording row and link to transcript via direct DB access
# 3. Insert Meeting, Recording, and link to transcript via direct DB access
recording_id = f"rec-integration-{transcript_id[:8]}"
meeting_id = str(uuid.uuid4())
now = datetime.now(timezone.utc)
async with db_engine.begin() as conn:
# Insert recording with track_keys
# Insert meeting with email_recipients for email notification test
await conn.execute(
text("""
INSERT INTO recording (id, bucket_name, object_key, recorded_at, status, track_keys)
VALUES (:id, :bucket_name, :object_key, :recorded_at, :status, CAST(:track_keys AS json))
INSERT INTO meeting (
id, room_name, room_url, host_room_url,
start_date, end_date, platform, email_recipients
)
VALUES (
:id, :room_name, :room_url, :host_room_url,
:start_date, :end_date, :platform, CAST(:email_recipients AS json)
)
"""),
{
"id": meeting_id,
"room_name": "integration-test-room",
"room_url": "https://test.daily.co/integration-test-room",
"host_room_url": "https://test.daily.co/integration-test-room",
"start_date": now,
"end_date": now + timedelta(hours=1),
"platform": "daily",
"email_recipients": json.dumps([TEST_EMAIL]),
},
)
# Insert recording with track_keys, linked to meeting
await conn.execute(
text("""
INSERT INTO recording (id, bucket_name, object_key, recorded_at, status, track_keys, meeting_id)
VALUES (:id, :bucket_name, :object_key, :recorded_at, :status, CAST(:track_keys AS json), :meeting_id)
"""),
{
"id": recording_id,
@@ -70,6 +102,7 @@ async def test_multitrack_pipeline_end_to_end(
"recorded_at": now,
"status": "completed",
"track_keys": json.dumps(TRACK_KEYS),
"meeting_id": meeting_id,
},
)
@@ -127,3 +160,16 @@ async def test_multitrack_pipeline_end_to_end(
assert (
len(participants) >= 2
), f"Expected at least 2 speakers for multitrack, got {len(participants)}"
# 7. Verify email transcript notification
# The send_email pipeline task should have sent an email to TEST_EMAIL via Mailpit.
# Note: share_mode is only set to "public" when meeting has email_recipients;
# room-level emails do NOT change share_mode.
# Poll Mailpit for the delivered email (send_email task runs async after finalize)
messages = await poll_mailpit_messages(mailpit_client, TEST_EMAIL, max_wait=30)
assert len(messages) >= 1, "Should have received at least 1 email"
email_msg = messages[0]
assert (
"Reflector:" in email_msg.get("Subject", "")
), f"Email subject should contain 'Reflector:', got: {email_msg.get('Subject')}"


@@ -76,8 +76,10 @@ async def test_cleanup_old_public_data_deletes_old_anonymous_transcripts():
assert result["transcripts_deleted"] == 1
assert result["errors"] == []
# Verify old anonymous transcript was deleted
assert await transcripts_controller.get_by_id(old_transcript.id) is None
# Verify old anonymous transcript was soft-deleted
old = await transcripts_controller.get_by_id(old_transcript.id)
assert old is not None
assert old.deleted_at is not None
# Verify new anonymous transcript still exists
assert await transcripts_controller.get_by_id(new_transcript.id) is not None
@@ -150,15 +152,17 @@ async def test_cleanup_deletes_associated_meeting_and_recording():
assert result["recordings_deleted"] == 1
assert result["errors"] == []
# Verify transcript was deleted
assert await transcripts_controller.get_by_id(old_transcript.id) is None
# Verify transcript was soft-deleted
old = await transcripts_controller.get_by_id(old_transcript.id)
assert old is not None
assert old.deleted_at is not None
# Verify meeting was deleted
# Verify meeting was hard-deleted (cleanup deletes meetings directly)
query = meetings.select().where(meetings.c.id == meeting_id)
meeting_result = await get_database().fetch_one(query)
assert meeting_result is None
# Verify recording was deleted
# Verify recording was hard-deleted (cleanup deletes recordings directly)
assert await recordings_controller.get_by_id(recording.id) is None

server/tests/test_email.py Normal file

@@ -0,0 +1,206 @@
"""Tests for reflector.email — transcript email composition and sending."""
from unittest.mock import AsyncMock, patch
import pytest
from reflector.db.transcripts import (
SourceKind,
Transcript,
TranscriptParticipant,
TranscriptTopic,
)
from reflector.email import (
_build_html,
_build_plain_text,
get_transcript_url,
send_transcript_email,
)
from reflector.processors.types import Word
def _make_transcript(
*,
title: str | None = "Weekly Standup",
short_summary: str | None = "Team discussed sprint progress.",
with_topics: bool = True,
share_mode: str = "private",
source_kind: SourceKind = SourceKind.FILE,
) -> Transcript:
topics = []
participants = []
if with_topics:
participants = [
TranscriptParticipant(id="p1", speaker=0, name="Alice"),
TranscriptParticipant(id="p2", speaker=1, name="Bob"),
]
topics = [
TranscriptTopic(
title="Intro",
summary="Greetings",
timestamp=0.0,
duration=10.0,
words=[
Word(text="Hello", start=0.0, end=0.5, speaker=0),
Word(text="everyone", start=0.5, end=1.0, speaker=0),
Word(text="Thanks", start=5.0, end=5.5, speaker=1),
Word(text="for", start=5.5, end=5.8, speaker=1),
Word(text="joining", start=5.8, end=6.2, speaker=1),
],
),
]
return Transcript(
id="tx-123",
title=title,
short_summary=short_summary,
topics=topics,
participants=participants,
share_mode=share_mode,
source_kind=source_kind,
)
URL = "http://localhost:3000/transcripts/tx-123"
class TestBuildPlainText:
def test_full_content_with_link(self):
t = _make_transcript()
text = _build_plain_text(t, URL, include_link=True)
assert text.startswith("Reflector: Weekly Standup")
assert "Team discussed sprint progress." in text
assert "[00:00] Alice:" in text
assert "[00:05] Bob:" in text
assert URL in text
def test_full_content_without_link(self):
t = _make_transcript()
text = _build_plain_text(t, URL, include_link=False)
assert "Reflector: Weekly Standup" in text
assert "Team discussed sprint progress." in text
assert "[00:00] Alice:" in text
assert URL not in text
def test_no_summary(self):
t = _make_transcript(short_summary=None)
text = _build_plain_text(t, URL, include_link=True)
assert "Summary:" not in text
assert "[00:00] Alice:" in text
def test_no_topics(self):
t = _make_transcript(with_topics=False)
text = _build_plain_text(t, URL, include_link=True)
assert "Transcript:" not in text
assert "Reflector: Weekly Standup" in text
def test_unnamed_recording(self):
t = _make_transcript(title=None)
text = _build_plain_text(t, URL, include_link=True)
assert "Reflector: Unnamed recording" in text
class TestBuildHtml:
def test_full_content_with_link(self):
t = _make_transcript()
html = _build_html(t, URL, include_link=True)
assert "Weekly Standup" in html
assert "Team discussed sprint progress." in html
assert "Alice" in html
assert "Bob" in html
assert URL in html
assert "View Transcript" in html
def test_full_content_without_link(self):
t = _make_transcript()
html = _build_html(t, URL, include_link=False)
assert "Weekly Standup" in html
assert "Alice" in html
assert URL not in html
assert "View Transcript" not in html
def test_no_summary(self):
t = _make_transcript(short_summary=None)
html = _build_html(t, URL, include_link=True)
assert "sprint progress" not in html
assert "Alice" in html
def test_no_topics(self):
t = _make_transcript(with_topics=False)
html = _build_html(t, URL, include_link=True)
assert "Transcript" not in html or "View Transcript" in html
def test_html_escapes_title(self):
t = _make_transcript(title='<script>alert("xss")</script>')
html = _build_html(t, URL, include_link=True)
assert "<script>" not in html
assert "&lt;script&gt;" in html
class TestGetTranscriptUrl:
def test_url_format(self):
t = _make_transcript()
url = get_transcript_url(t)
assert url.endswith("/transcripts/tx-123")
class TestSendTranscriptEmail:
@pytest.mark.asyncio
async def test_include_link_default_true(self):
t = _make_transcript()
with (
patch("reflector.email.is_email_configured", return_value=True),
patch(
"reflector.email.aiosmtplib.send", new_callable=AsyncMock
) as mock_send,
):
count = await send_transcript_email(["a@test.com"], t)
assert count == 1
call_args = mock_send.call_args
msg = call_args[0][0]
assert msg["Subject"] == "Reflector: Weekly Standup"
# Default include_link=True, so HTML part should contain the URL
html_part = msg.get_payload()[1].get_payload()
assert "/transcripts/tx-123" in html_part
@pytest.mark.asyncio
async def test_include_link_false(self):
t = _make_transcript()
with (
patch("reflector.email.is_email_configured", return_value=True),
patch(
"reflector.email.aiosmtplib.send", new_callable=AsyncMock
) as mock_send,
):
count = await send_transcript_email(["a@test.com"], t, include_link=False)
assert count == 1
msg = mock_send.call_args[0][0]
html_part = msg.get_payload()[1].get_payload()
assert "/transcripts/tx-123" not in html_part
plain_part = msg.get_payload()[0].get_payload()
assert "/transcripts/tx-123" not in plain_part
@pytest.mark.asyncio
async def test_skips_when_not_configured(self):
t = _make_transcript()
with patch("reflector.email.is_email_configured", return_value=False):
count = await send_transcript_email(["a@test.com"], t)
assert count == 0
@pytest.mark.asyncio
async def test_skips_empty_recipients(self):
t = _make_transcript()
with patch("reflector.email.is_email_configured", return_value=True):
count = await send_transcript_email([], t)
assert count == 0
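
The title handling these email tests check (a fallback for untitled recordings, and HTML escaping of user-supplied titles) can be sketched on its own. This is a minimal illustration and not the code in `reflector.email`: `build_subject` and `render_title_html` are hypothetical names, and the `<h1>` wrapper is an assumption made only for the example.

```python
from html import escape

FALLBACK_TITLE = "Unnamed recording"


def build_subject(title):
    """Subject line; untitled transcripts fall back to a fixed label."""
    return f"Reflector: {title or FALLBACK_TITLE}"


def render_title_html(title):
    """Escape the title before interpolating it into markup."""
    # escape() converts <, >, & and quotes into HTML entities
    return f"<h1>{escape(title or FALLBACK_TITLE)}</h1>"


subject = build_subject(None)
heading = render_title_html('<script>alert("xss")</script>')
```

Escaping at the point of interpolation keeps the plain-text part free to use the raw title while the HTML part never receives unescaped input.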

@@ -0,0 +1,290 @@
"""
Tests for FailedRunsMonitor Hatchet cron workflow.
Tests cover:
- No Zulip message sent when no failures found
- Messages sent for failed main pipeline runs
- Child workflow failures filtered out
- Errors in the monitor itself are caught and logged
"""
from datetime import timezone
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from hatchet_sdk.clients.rest.models import V1TaskStatus
def _make_task_summary(
workflow_name: str,
workflow_run_external_id: str = "run-123",
status: V1TaskStatus = V1TaskStatus.FAILED,
):
"""Create a mock V1TaskSummary."""
mock = MagicMock()
mock.workflow_name = workflow_name
mock.workflow_run_external_id = workflow_run_external_id
mock.status = status
return mock
@pytest.mark.asyncio
class TestCheckFailedRuns:
async def test_no_failures_sends_no_message(self):
mock_result = MagicMock()
mock_result.rows = []
mock_client = MagicMock()
mock_client.runs.aio_list = AsyncMock(return_value=mock_result)
with (
patch(
"reflector.hatchet.workflows.failed_runs_monitor.HatchetClientManager.get_client",
return_value=mock_client,
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.send_message_to_zulip",
new_callable=AsyncMock,
) as mock_send,
):
from reflector.hatchet.workflows.failed_runs_monitor import (
_check_failed_runs,
)
result = await _check_failed_runs()
assert result["checked"] == 0
assert result["reported"] == 0
mock_send.assert_not_called()
async def test_reports_failed_main_pipeline_runs(self):
failed_runs = [
_make_task_summary("DiarizationPipeline", "run-1"),
_make_task_summary("FilePipeline", "run-2"),
]
mock_result = MagicMock()
mock_result.rows = failed_runs
mock_details = MagicMock()
mock_client = MagicMock()
mock_client.runs.aio_list = AsyncMock(return_value=mock_result)
mock_client.runs.aio_get = AsyncMock(return_value=mock_details)
with (
patch(
"reflector.hatchet.workflows.failed_runs_monitor.HatchetClientManager.get_client",
return_value=mock_client,
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.render_run_detail",
return_value="**rendered DAG**",
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.send_message_to_zulip",
new_callable=AsyncMock,
return_value={"id": 1},
) as mock_send,
patch(
"reflector.hatchet.workflows.failed_runs_monitor.settings"
) as mock_settings,
):
mock_settings.ZULIP_DAG_STREAM = "dag-stream"
mock_settings.ZULIP_DAG_TOPIC = "dag-topic"
from reflector.hatchet.workflows.failed_runs_monitor import (
_check_failed_runs,
)
result = await _check_failed_runs()
assert result["checked"] == 2
assert result["reported"] == 2
assert mock_send.call_count == 2
mock_send.assert_any_call("dag-stream", "dag-topic", "**rendered DAG**")
async def test_filters_out_child_workflows(self):
runs = [
_make_task_summary("DiarizationPipeline", "run-1"),
_make_task_summary("TrackProcessing", "run-2"),
_make_task_summary("TopicChunkProcessing", "run-3"),
_make_task_summary("SubjectProcessing", "run-4"),
]
mock_result = MagicMock()
mock_result.rows = runs
mock_details = MagicMock()
mock_client = MagicMock()
mock_client.runs.aio_list = AsyncMock(return_value=mock_result)
mock_client.runs.aio_get = AsyncMock(return_value=mock_details)
with (
patch(
"reflector.hatchet.workflows.failed_runs_monitor.HatchetClientManager.get_client",
return_value=mock_client,
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.render_run_detail",
return_value="**rendered**",
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.send_message_to_zulip",
new_callable=AsyncMock,
return_value={"id": 1},
) as mock_send,
patch(
"reflector.hatchet.workflows.failed_runs_monitor.settings"
) as mock_settings,
):
mock_settings.ZULIP_DAG_STREAM = "dag-stream"
mock_settings.ZULIP_DAG_TOPIC = "dag-topic"
from reflector.hatchet.workflows.failed_runs_monitor import (
_check_failed_runs,
)
result = await _check_failed_runs()
# Only DiarizationPipeline should be reported
assert result["checked"] == 4
assert result["reported"] == 1
assert mock_send.call_count == 1
async def test_all_three_pipelines_reported(self):
runs = [
_make_task_summary("DiarizationPipeline", "run-1"),
_make_task_summary("FilePipeline", "run-2"),
_make_task_summary("LivePostProcessingPipeline", "run-3"),
]
mock_result = MagicMock()
mock_result.rows = runs
mock_details = MagicMock()
mock_client = MagicMock()
mock_client.runs.aio_list = AsyncMock(return_value=mock_result)
mock_client.runs.aio_get = AsyncMock(return_value=mock_details)
with (
patch(
"reflector.hatchet.workflows.failed_runs_monitor.HatchetClientManager.get_client",
return_value=mock_client,
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.render_run_detail",
return_value="**rendered**",
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.send_message_to_zulip",
new_callable=AsyncMock,
return_value={"id": 1},
) as mock_send,
patch(
"reflector.hatchet.workflows.failed_runs_monitor.settings"
) as mock_settings,
):
mock_settings.ZULIP_DAG_STREAM = "dag-stream"
mock_settings.ZULIP_DAG_TOPIC = "dag-topic"
from reflector.hatchet.workflows.failed_runs_monitor import (
_check_failed_runs,
)
result = await _check_failed_runs()
assert result["reported"] == 3
assert mock_send.call_count == 3
async def test_continues_on_individual_run_failure(self):
"""If one run fails to report, the others should still be reported."""
runs = [
_make_task_summary("DiarizationPipeline", "run-1"),
_make_task_summary("FilePipeline", "run-2"),
]
mock_result = MagicMock()
mock_result.rows = runs
mock_client = MagicMock()
mock_client.runs.aio_list = AsyncMock(return_value=mock_result)
# First call raises, second succeeds
mock_client.runs.aio_get = AsyncMock(
side_effect=[Exception("Hatchet API error"), MagicMock()]
)
with (
patch(
"reflector.hatchet.workflows.failed_runs_monitor.HatchetClientManager.get_client",
return_value=mock_client,
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.render_run_detail",
return_value="**rendered**",
),
patch(
"reflector.hatchet.workflows.failed_runs_monitor.send_message_to_zulip",
new_callable=AsyncMock,
return_value={"id": 1},
) as mock_send,
patch(
"reflector.hatchet.workflows.failed_runs_monitor.settings"
) as mock_settings,
):
mock_settings.ZULIP_DAG_STREAM = "dag-stream"
mock_settings.ZULIP_DAG_TOPIC = "dag-topic"
from reflector.hatchet.workflows.failed_runs_monitor import (
_check_failed_runs,
)
result = await _check_failed_runs()
# First run failed to report, second succeeded
assert result["reported"] == 1
assert mock_send.call_count == 1
async def test_handles_list_api_failure(self):
"""If aio_list fails, should return error and not crash."""
mock_client = MagicMock()
mock_client.runs.aio_list = AsyncMock(
side_effect=Exception("Connection refused")
)
with patch(
"reflector.hatchet.workflows.failed_runs_monitor.HatchetClientManager.get_client",
return_value=mock_client,
):
from reflector.hatchet.workflows.failed_runs_monitor import (
_check_failed_runs,
)
result = await _check_failed_runs()
assert result["checked"] == 0
assert result["reported"] == 0
assert "error" in result
async def test_uses_correct_time_window(self):
"""Verify the correct since/until parameters are passed to aio_list."""
mock_result = MagicMock()
mock_result.rows = []
mock_client = MagicMock()
mock_client.runs.aio_list = AsyncMock(return_value=mock_result)
with patch(
"reflector.hatchet.workflows.failed_runs_monitor.HatchetClientManager.get_client",
return_value=mock_client,
):
from reflector.hatchet.workflows.failed_runs_monitor import (
_check_failed_runs,
)
await _check_failed_runs()
call_kwargs = mock_client.runs.aio_list.call_args
assert call_kwargs.kwargs["statuses"] == [V1TaskStatus.FAILED]
since = call_kwargs.kwargs["since"]
until = call_kwargs.kwargs["until"]
assert since.tzinfo == timezone.utc
assert until.tzinfo == timezone.utc
# Window should be ~1 hour
delta = until - since
assert 3590 < delta.total_seconds() < 3610
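
The behaviour pinned down by these tests (a timezone-aware lookback window and a filter that keeps only top-level pipelines) can be sketched without Hatchet. This is an illustration under stated assumptions, not the monitor's implementation: `MAIN_PIPELINES`, `failure_window`, and `select_reportable` are hypothetical names, and the one-hour lookback is inferred from the roughly 3600-second assertion above.

```python
from datetime import datetime, timedelta, timezone

# Workflow names taken from the tests above; child workflows are not listed.
MAIN_PIPELINES = frozenset(
    {"DiarizationPipeline", "FilePipeline", "LivePostProcessingPipeline"}
)


def failure_window(now=None, lookback=timedelta(hours=1)):
    """Return a (since, until) pair of timezone-aware UTC datetimes."""
    until = now or datetime.now(timezone.utc)
    return until - lookback, until


def select_reportable(workflow_names):
    """Keep only top-level pipelines, preserving input order."""
    return [name for name in workflow_names if name in MAIN_PIPELINES]


since, until = failure_window(datetime(2026, 4, 8, 12, 0, tzinfo=timezone.utc))
reportable = select_reportable(
    ["DiarizationPipeline", "TrackProcessing", "SubjectProcessing"]
)
```

Using an allow-list rather than a deny-list means a newly added child workflow is ignored by default instead of producing alerts.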

@@ -0,0 +1,408 @@
"""
Tests for LiveKit backend: webhook verification, token generation,
display_name sanitization, and platform client behavior.
"""
import re
import pytest
from reflector.livekit_api.webhooks import create_webhook_receiver, verify_webhook
# ── Webhook verification ──────────────────────────────────────
class TestWebhookVerification:
def _make_receiver(self):
"""Create a receiver with test credentials."""
return create_webhook_receiver(
api_key="test_key",
api_secret="test_secret_that_is_long_enough_for_hmac",
)
def test_rejects_empty_auth_header(self):
receiver = self._make_receiver()
result = verify_webhook(receiver, b'{"event":"test"}', "")
assert result is None
def test_rejects_garbage_auth_header(self):
receiver = self._make_receiver()
result = verify_webhook(receiver, b'{"event":"test"}', "not-a-jwt")
assert result is None
def test_rejects_empty_body(self):
receiver = self._make_receiver()
result = verify_webhook(receiver, b"", "Bearer some.jwt.token")
assert result is None
def test_handles_bytes_body(self):
receiver = self._make_receiver()
# Should not crash on bytes input
result = verify_webhook(receiver, b'{"event":"test"}', "invalid")
assert result is None
def test_handles_string_body(self):
receiver = self._make_receiver()
result = verify_webhook(receiver, '{"event":"test"}', "invalid")
assert result is None
def test_rejects_wrong_secret(self):
"""Webhook signed with different secret should be rejected."""
receiver = self._make_receiver()
# A JWT signed with a different secret
fake_jwt = "eyJhbGciOiJIUzI1NiJ9.eyJ0ZXN0IjoxfQ.wrong_signature"
result = verify_webhook(receiver, b"{}", fake_jwt)
assert result is None
# ── Token generation ──────────────────────────────────────────
class TestTokenGeneration:
"""Test token generation using the LiveKit SDK directly (no client instantiation)."""
def _generate_token(
self, room_name="room", identity="user", name=None, admin=False, ttl=86400
):
"""Generate a token using the SDK directly, avoiding LiveKitAPI client session."""
from datetime import timedelta
from livekit.api import AccessToken, VideoGrants
token = AccessToken(
api_key="test_key", api_secret="test_secret_that_is_long_enough_for_hmac"
)
token.identity = identity
token.name = name or identity
token.ttl = timedelta(seconds=ttl)
token.with_grants(
VideoGrants(
room_join=True,
room=room_name,
can_publish=True,
can_subscribe=True,
room_admin=admin,
)
)
return token.to_jwt()
def _decode_claims(self, token):
import base64
import json
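payload = token.split(".")[1]
# JWT segments are unpadded base64url: restore only the missing padding
# and decode with the URL-safe alphabet ("-" and "_").
payload += "=" * (-len(payload) % 4)
return json.loads(base64.urlsafe_b64decode(payload))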
def test_creates_valid_jwt(self):
token = self._generate_token(
room_name="test-room", identity="user123", name="Test User"
)
assert isinstance(token, str)
assert len(token.split(".")) == 3
def test_token_includes_room_name(self):
token = self._generate_token(room_name="my-room-20260401", identity="alice")
claims = self._decode_claims(token)
assert claims.get("video", {}).get("room") == "my-room-20260401"
assert claims.get("sub") == "alice"
def test_token_respects_admin_flag(self):
token = self._generate_token(identity="admin", admin=True)
claims = self._decode_claims(token)
assert claims["video"]["roomAdmin"] is True
def test_token_non_admin_by_default(self):
token = self._generate_token(identity="user")
claims = self._decode_claims(token)
assert claims.get("video", {}).get("roomAdmin") in (None, False)
def test_ttl_is_timedelta(self):
"""Verify ttl as timedelta works (previous bug: int caused TypeError)."""
token = self._generate_token(ttl=3600)
assert isinstance(token, str)
# ── Display name sanitization ─────────────────────────────────
class TestDisplayNameSanitization:
"""Test the sanitization logic from rooms.py join endpoint."""
def _sanitize(self, display_name: str) -> str:
"""Replicate the sanitization from rooms_join_meeting."""
safe_name = re.sub(r"[^a-zA-Z0-9_-]", "_", display_name.strip())[:40]
return safe_name
def test_normal_name(self):
assert self._sanitize("Alice") == "Alice"
def test_name_with_spaces(self):
assert self._sanitize("John Doe") == "John_Doe"
def test_name_with_special_chars(self):
assert self._sanitize("user@email.com") == "user_email_com"
def test_name_with_unicode(self):
result = self._sanitize("José García")
assert result == "Jos__Garc_a"
assert all(
c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-"
for c in result
)
def test_name_with_emoji(self):
result = self._sanitize("👋 Hello")
assert "_" in result # Emoji replaced with underscore
assert "Hello" in result
def test_very_long_name(self):
long_name = "A" * 100
result = self._sanitize(long_name)
assert len(result) == 40
def test_empty_name(self):
result = self._sanitize("")
assert result == ""
def test_only_special_chars(self):
result = self._sanitize("!!!")
assert result == "___"
def test_whitespace_stripped(self):
result = self._sanitize(" Alice ")
assert result == "Alice"
def test_hyphens_preserved(self):
assert self._sanitize("first-last") == "first-last"
def test_underscores_preserved(self):
assert self._sanitize("first_last") == "first_last"
def test_html_injection(self):
result = self._sanitize("<script>alert('xss')</script>")
assert "<" not in result
assert ">" not in result
assert "'" not in result
# ── S3 egress configuration ───────────────────────────────────
class TestS3EgressConfig:
"""Test S3Upload construction using the SDK directly."""
def test_build_s3_upload_requires_all_fields(self):
# Missing fields should raise or produce invalid config
# The validation happens in our client wrapper, not the SDK
# Test the validation logic directly
s3_bucket = None
s3_access_key = "AKID"
s3_secret_key = "secret"
assert not all([s3_bucket, s3_access_key, s3_secret_key])
def test_s3_upload_with_credentials(self):
from livekit.api import S3Upload
upload = S3Upload(
access_key="AKID",
secret="secret123",
bucket="test-bucket",
region="us-east-1",
force_path_style=True,
)
assert upload.bucket == "test-bucket"
assert upload.force_path_style is True
def test_s3_upload_with_endpoint(self):
from livekit.api import S3Upload
upload = S3Upload(
access_key="AKID",
secret="secret",
bucket="bucket",
region="us-east-1",
force_path_style=True,
endpoint="http://garage:3900",
)
assert upload.endpoint == "http://garage:3900"
# ── Redis participant mapping ──────────────────────────────
class TestParticipantIdentityMapping:
"""Test the identity → user_id Redis mapping pattern."""
def test_mapping_key_format(self):
room_name = "myroom-20260401172036"
mapping_key = f"livekit:participant_map:{room_name}"
assert mapping_key == "livekit:participant_map:myroom-20260401172036"
def test_identity_with_uuid_suffix_is_unique(self):
import uuid
name = "Juan"
id1 = f"{name}-{uuid.uuid4().hex[:6]}"
id2 = f"{name}-{uuid.uuid4().hex[:6]}"
assert id1 != id2
assert id1.startswith("Juan-")
assert id2.startswith("Juan-")
def test_strip_uuid_suffix_for_display(self):
"""Pipeline strips UUID suffix for display name."""
identity = "Juan-2bcea0"
display_name = identity.rsplit("-", 1)[0] if "-" in identity else identity
assert display_name == "Juan"
def test_strip_uuid_preserves_hyphenated_names(self):
identity = "Mary-Jane-abc123"
display_name = identity.rsplit("-", 1)[0] if "-" in identity else identity
assert display_name == "Mary-Jane"
def test_anon_identity_no_user_id(self):
"""Anonymous participants should not have a user_id mapping."""
identity = "anon-abc123"
# In the pipeline, anon identities don't get looked up
assert identity.startswith("anon-")
@pytest.mark.asyncio
async def test_redis_hset_hgetall_roundtrip(self):
"""Test the actual Redis operations used for participant mapping."""
try:
from reflector.redis_cache import get_async_redis_client
redis_client = await get_async_redis_client()
test_key = "livekit:participant_map:__test_room__"
# Write
await redis_client.hset(test_key, "Juan-abc123", "user-id-1")
await redis_client.hset(test_key, "Alice-def456", "user-id-2")
# Read
raw_map = await redis_client.hgetall(test_key)
decoded = {
k.decode() if isinstance(k, bytes) else k: v.decode()
if isinstance(v, bytes)
else v
for k, v in raw_map.items()
}
assert decoded["Juan-abc123"] == "user-id-1"
assert decoded["Alice-def456"] == "user-id-2"
# Cleanup
await redis_client.delete(test_key)
except AssertionError:
# A failed assertion is a real test failure, not a missing Redis
raise
except Exception:
pytest.skip("Redis not available")
# ── Egress video cleanup safety ────────────────────────────────
class TestEgressVideoCleanup:
"""Ensure video cleanup logic NEVER deletes audio files."""
AUDIO_FILES = [
"livekit/room-20260401/juan-abc123-2026-04-01T100000-TR_AMR3SWs74Divho.ogg",
"livekit/room-20260401/alice-def456-2026-04-01T100030-TR_AMirKjdAvLteAZ.ogg",
"livekit/room-20260401/bob-789abc-2026-04-01T100100-TR_AMyoSbM7tAQbYj.ogg",
]
VIDEO_FILES = [
"livekit/room-20260401/juan-abc123-2026-04-01T100000-TR_VC679dgMQBdfhT.webm",
"livekit/room-20260401/alice-def456-2026-04-01T100030-TR_VCLsuRuxLp4eik.webm",
]
MANIFEST_FILES = [
"livekit/room-20260401/EG_K5sipvfB5fTM.json",
"livekit/room-20260401/EG_nzwBsH9xzgoj.json",
]
def _should_delete(self, filename: str) -> bool:
"""Replicate the deletion logic from _handle_egress_ended."""
return filename.endswith(".webm")
def test_audio_files_never_deleted(self):
"""CRITICAL: Audio files must NEVER be marked for deletion."""
for f in self.AUDIO_FILES:
assert not self._should_delete(f), f"Audio file would be deleted: {f}"
def test_video_files_are_deleted(self):
for f in self.VIDEO_FILES:
assert self._should_delete(f), f"Video file NOT marked for deletion: {f}"
def test_manifests_are_kept(self):
for f in self.MANIFEST_FILES:
assert not self._should_delete(f), f"Manifest would be deleted: {f}"
def test_ogg_extension_never_matches_delete(self):
"""Double-check: no .ogg file ever matches the deletion condition."""
test_names = [
"anything.ogg",
"livekit/room/track.ogg",
"video.ogg", # Even if someone names it "video.ogg"
".ogg",
"TR_VC_fake_video.ogg", # Video-like track ID but .ogg extension
]
for f in test_names:
assert not self._should_delete(f), f".ogg file would be deleted: {f}"
def test_webm_always_matches_delete(self):
test_names = [
"anything.webm",
"livekit/room/track.webm",
"audio.webm", # Even if someone names it "audio.webm"
".webm",
]
for f in test_names:
assert self._should_delete(f), f".webm file NOT marked for deletion: {f}"
def test_unknown_extensions_are_kept(self):
"""Unknown file types should NOT be deleted (safe by default)."""
test_names = [
"file.mp4",
"file.wav",
"file.mp3",
"file.txt",
"file",
"",
]
for f in test_names:
assert not self._should_delete(
f
), f"Unknown file type would be deleted: {f}"
# ── Platform detection ────────────────────────────────────────
class TestSourcePlatformDetection:
"""Test the recording ID prefix-based platform detection from transcript_process.py."""
def test_livekit_prefix(self):
recording_id = "lk-livekit-20260401234423"
platform = "livekit" if recording_id.startswith("lk-") else "daily"
assert platform == "livekit"
def test_daily_no_prefix(self):
recording_id = "08fa0b24-9220-44c5-846c-3f116cf8e738"
platform = "livekit" if recording_id.startswith("lk-") else "daily"
assert platform == "daily"
def test_none_recording_id(self):
recording_id = None
platform = (
"livekit" if recording_id and recording_id.startswith("lk-") else "daily"
)
assert platform == "daily"
def test_empty_recording_id(self):
recording_id = ""
platform = (
"livekit" if recording_id and recording_id.startswith("lk-") else "daily"
)
assert platform == "daily"

@@ -0,0 +1,393 @@
"""
Tests for LiveKit track processing: filepath parsing, offset calculation,
and pad_track padding_seconds behavior.
"""
from datetime import datetime, timezone
from fractions import Fraction
import av
import pytest
from reflector.utils.livekit import (
LiveKitTrackFile,
calculate_track_offsets,
extract_livekit_base_room_name,
filter_audio_tracks,
parse_livekit_track_filepath,
)
# ── Filepath parsing ──────────────────────────────────────────
class TestParseLiveKitTrackFilepath:
def test_parses_ogg_audio_track(self):
result = parse_livekit_track_filepath(
"livekit/myroom-20260401172036/juan-4b82ed-2026-04-01T195758-TR_AMR3SWs74Divho.ogg"
)
assert result.room_name == "myroom-20260401172036"
assert result.participant_identity == "juan-4b82ed"
assert result.track_id == "TR_AMR3SWs74Divho"
assert result.timestamp == datetime(2026, 4, 1, 19, 57, 58, tzinfo=timezone.utc)
def test_parses_different_identities(self):
r1 = parse_livekit_track_filepath(
"livekit/room-20260401/alice-a1b2c3-2026-04-01T100000-TR_abc123.ogg"
)
r2 = parse_livekit_track_filepath(
"livekit/room-20260401/bob_smith-d4e5f6-2026-04-01T100030-TR_def456.ogg"
)
assert r1.participant_identity == "alice-a1b2c3"
assert r2.participant_identity == "bob_smith-d4e5f6"
def test_rejects_json_manifest(self):
with pytest.raises(ValueError, match="doesn't match expected format"):
parse_livekit_track_filepath("livekit/myroom-20260401/EG_K5sipvfB5fTM.json")
def test_parses_webm_video(self):
# webm paths match the pattern too; filter_audio_tracks is what excludes them
result = parse_livekit_track_filepath(
"livekit/myroom-20260401/juan-4b82ed-2026-04-01T195727-TR_VC679dgMQBdfhT.webm"
)
# webm parses successfully (TR_ prefix matches video tracks too)
assert result.track_id == "TR_VC679dgMQBdfhT"
def test_rejects_invalid_path(self):
with pytest.raises(ValueError):
parse_livekit_track_filepath("not/a/valid/path.ogg")
def test_rejects_missing_track_id(self):
with pytest.raises(ValueError):
parse_livekit_track_filepath("livekit/room/user-2026-04-01T100000.ogg")
def test_parses_timestamp_correctly(self):
result = parse_livekit_track_filepath(
"livekit/room-20260401/user-abc123-2026-12-25T235959-TR_test.ogg"
)
assert result.timestamp == datetime(
2026, 12, 25, 23, 59, 59, tzinfo=timezone.utc
)
# ── Audio track filtering ─────────────────────────────────────
class TestFilterAudioTracks:
def test_filters_to_ogg_only(self):
keys = [
"livekit/room/EG_abc.json",
"livekit/room/user-abc-2026-04-01T100000-TR_audio.ogg",
"livekit/room/user-abc-2026-04-01T100000-TR_video.webm",
"livekit/room/EG_def.json",
"livekit/room/user2-def-2026-04-01T100030-TR_audio2.ogg",
]
result = filter_audio_tracks(keys)
assert len(result) == 2
assert all(k.endswith(".ogg") for k in result)
def test_empty_input(self):
assert filter_audio_tracks([]) == []
def test_no_audio_tracks(self):
keys = ["livekit/room/EG_abc.json", "livekit/room/user-TR_v.webm"]
assert filter_audio_tracks(keys) == []
# ── Offset calculation ─────────────────────────────────────────
class TestCalculateTrackOffsets:
def test_single_track_zero_offset(self):
tracks = [
LiveKitTrackFile(
s3_key="k1",
room_name="r",
participant_identity="alice",
timestamp=datetime(2026, 4, 1, 10, 0, 0, tzinfo=timezone.utc),
track_id="TR_1",
)
]
offsets = calculate_track_offsets(tracks)
assert len(offsets) == 1
assert offsets[0][1] == 0.0
def test_two_tracks_correct_offset(self):
tracks = [
LiveKitTrackFile(
s3_key="k1",
room_name="r",
participant_identity="alice",
timestamp=datetime(2026, 4, 1, 10, 0, 0, tzinfo=timezone.utc),
track_id="TR_1",
),
LiveKitTrackFile(
s3_key="k2",
room_name="r",
participant_identity="bob",
timestamp=datetime(2026, 4, 1, 10, 1, 10, tzinfo=timezone.utc),
track_id="TR_2",
),
]
offsets = calculate_track_offsets(tracks)
assert offsets[0][1] == 0.0 # alice (earliest)
assert offsets[1][1] == 70.0 # bob (70 seconds later)
def test_three_tracks_earliest_is_zero(self):
tracks = [
LiveKitTrackFile(
s3_key="k2",
room_name="r",
participant_identity="bob",
timestamp=datetime(2026, 4, 1, 10, 0, 30, tzinfo=timezone.utc),
track_id="TR_2",
),
LiveKitTrackFile(
s3_key="k1",
room_name="r",
participant_identity="alice",
timestamp=datetime(2026, 4, 1, 10, 0, 0, tzinfo=timezone.utc),
track_id="TR_1",
),
LiveKitTrackFile(
s3_key="k3",
room_name="r",
participant_identity="charlie",
timestamp=datetime(2026, 4, 1, 10, 1, 0, tzinfo=timezone.utc),
track_id="TR_3",
),
]
offsets = calculate_track_offsets(tracks)
offset_map = {t.participant_identity: o for t, o in offsets}
assert offset_map["alice"] == 0.0
assert offset_map["bob"] == 30.0
assert offset_map["charlie"] == 60.0
def test_empty_tracks(self):
assert calculate_track_offsets([]) == []
def test_simultaneous_tracks_zero_offsets(self):
ts = datetime(2026, 4, 1, 10, 0, 0, tzinfo=timezone.utc)
tracks = [
LiveKitTrackFile(
s3_key="k1",
room_name="r",
participant_identity="a",
timestamp=ts,
track_id="TR_1",
),
LiveKitTrackFile(
s3_key="k2",
room_name="r",
participant_identity="b",
timestamp=ts,
track_id="TR_2",
),
]
offsets = calculate_track_offsets(tracks)
assert all(o == 0.0 for _, o in offsets)
# ── Room name extraction ───────────────────────────────────────
class TestExtractLiveKitBaseRoomName:
def test_strips_timestamp_suffix(self):
assert extract_livekit_base_room_name("myroom-20260401172036") == "myroom"
def test_preserves_hyphenated_name(self):
assert (
extract_livekit_base_room_name("my-room-name-20260401172036")
== "my-room-name"
)
def test_single_segment(self):
assert extract_livekit_base_room_name("room-20260401") == "room"
# ── pad_track padding_seconds behavior ─────────────────────────
class TestPadTrackPaddingSeconds:
"""Test that pad_track correctly uses pre-calculated padding_seconds
for LiveKit (skipping container metadata) vs extracting from container
for Daily (when padding_seconds is None).
"""
def _make_test_ogg(self, path: str, duration_seconds: float = 5.0):
"""Create a minimal OGG/Opus file for testing."""
with av.open(path, "w", format="ogg") as out:
stream = out.add_stream("libopus", rate=48000)
stream.bit_rate = 64000
samples_per_frame = 960 # Opus standard
total_samples = int(duration_seconds * 48000)
pts = 0
while pts < total_samples:
frame = av.AudioFrame(
format="s16", layout="stereo", samples=samples_per_frame
)
# Fill with silence (zeros)
frame.planes[0].update(bytes(samples_per_frame * 2 * 2)) # s16 * stereo
frame.sample_rate = 48000
frame.pts = pts
frame.time_base = Fraction(1, 48000)
for packet in stream.encode(frame):
out.mux(packet)
pts += samples_per_frame
for packet in stream.encode(None):
out.mux(packet)
def test_ogg_has_zero_start_time(self, tmp_path):
"""Verify that OGG files (like LiveKit produces) have start_time=0,
confirming why pre-calculated padding is needed."""
ogg_path = str(tmp_path / "test.ogg")
self._make_test_ogg(ogg_path)
with av.open(ogg_path) as container:
from reflector.utils.audio_padding import (
extract_stream_start_time_from_container,
)
start_time = extract_stream_start_time_from_container(container, 0)
assert start_time <= 0.0, (
"OGG files should have start_time<=0 (no usable offset), confirming "
f"LiveKit tracks need pre-calculated padding_seconds. Got: {start_time}"
)
def test_precalculated_padding_skips_metadata_extraction(self, tmp_path):
"""When padding_seconds is set, pad_track should use it directly
and NOT call extract_stream_start_time_from_container."""
from reflector.hatchet.workflows.track_processing import TrackInput
input_data = TrackInput(
track_index=0,
s3_key="livekit/room/user-abc-2026-04-01T100000-TR_audio.ogg",
bucket_name="test-bucket",
transcript_id="test-transcript",
source_platform="livekit",
padding_seconds=70.0,
)
assert input_data.padding_seconds == 70.0
# The pad_track function checks: if input.padding_seconds is not None → use it
# This means extract_stream_start_time_from_container is never called for LiveKit
def test_none_padding_falls_back_to_metadata(self, tmp_path):
"""When padding_seconds is None (Daily), pad_track should extract
start_time from container metadata."""
from reflector.hatchet.workflows.track_processing import TrackInput
input_data = TrackInput(
track_index=0,
s3_key="daily/room/track.webm",
bucket_name="test-bucket",
transcript_id="test-transcript",
source_platform="daily",
padding_seconds=None,
)
assert input_data.padding_seconds is None
# pad_track will call extract_stream_start_time_from_container for this case
def test_zero_padding_returns_original_key(self):
"""When padding_seconds=0.0, pad_track should return the original S3 key
without applying any padding (same as start_time=0 from metadata)."""
from reflector.hatchet.workflows.track_processing import TrackInput
input_data = TrackInput(
track_index=0,
s3_key="livekit/room/earliest-track.ogg",
bucket_name="test-bucket",
transcript_id="test-transcript",
source_platform="livekit",
padding_seconds=0.0,
)
# padding_seconds=0.0 → start_time_seconds=0.0 → "no padding needed" branch
assert input_data.padding_seconds == 0.0
# ── Pipeline offset calculation (process_tracks logic) ─────────
class TestProcessTracksOffsetCalculation:
"""Test the offset calculation logic used in process_tracks
for LiveKit source_platform."""
def test_livekit_offsets_from_timestamps(self):
"""Simulate the offset calculation done in process_tracks."""
tracks = [
{
"s3_key": "track1.ogg",
"participant_identity": "admin-0129c3",
"timestamp": "2026-04-01T23:44:50+00:00",
},
{
"s3_key": "track2.ogg",
"participant_identity": "juan-5a5b41",
"timestamp": "2026-04-01T23:46:00+00:00",
},
]
# Replicate the logic from process_tracks
timestamps = []
for i, track in enumerate(tracks):
ts_str = track.get("timestamp")
if ts_str:
ts = datetime.fromisoformat(ts_str)
timestamps.append((i, ts))
earliest = min(ts for _, ts in timestamps)
track_padding = {}
for i, ts in timestamps:
track_padding[i] = (ts - earliest).total_seconds()
assert track_padding[0] == 0.0 # admin (earliest)
assert track_padding[1] == 70.0 # juan (70s later)
def test_daily_tracks_get_no_precalculated_padding(self):
"""Daily tracks should NOT get padding_seconds (use container metadata)."""
tracks = [
{"s3_key": "daily-track1.webm"},
{"s3_key": "daily-track2.webm"},
]
# Daily tracks don't have "timestamp" field
track_padding = {}
source_platform = "daily"
if source_platform == "livekit":
# This block should NOT execute for daily
pass
# Daily tracks get no pre-calculated padding
assert track_padding == {}
for i, _ in enumerate(tracks):
assert track_padding.get(i) is None

    def test_livekit_missing_timestamp_graceful(self):
        """If a LiveKit track is missing timestamp, it should be skipped."""
        tracks = [
            {
                "s3_key": "track1.ogg",
                "participant_identity": "alice",
                "timestamp": "2026-04-01T10:00:00+00:00",
            },
            {"s3_key": "track2.ogg", "participant_identity": "bob"},  # no timestamp
        ]
        timestamps = []
        for i, track in enumerate(tracks):
            ts_str = track.get("timestamp")
            if ts_str:
                try:
                    ts = datetime.fromisoformat(ts_str)
                    timestamps.append((i, ts))
                except (ValueError, TypeError):
                    timestamps.append((i, None))
            else:
                timestamps.append((i, None))
        valid = [(i, ts) for i, ts in timestamps if ts is not None]
        assert len(valid) == 1  # only alice has a timestamp
        assert valid[0][0] == 0  # track index 0
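
The three tests above each re-implement the offset logic inline. As a rough sketch only, the same calculation could be expressed as one helper. The name `compute_track_padding` is hypothetical and is not taken from `process_tracks`. Skipping tracks whose timestamp is missing or unparseable is an assumption based on the "should be skipped" docstring above.

```python
from datetime import datetime
from typing import Dict, List


def compute_track_padding(tracks: List[dict]) -> Dict[int, float]:
    """Map track index -> seconds after the earliest timestamped track.

    Hypothetical helper: tracks without a usable "timestamp" are left out
    of the result, so callers can fall back to other metadata for them.
    """
    parsed: Dict[int, datetime] = {}
    for i, track in enumerate(tracks):
        ts_str = track.get("timestamp")
        if not ts_str:
            continue  # e.g. Daily tracks carry no "timestamp" field
        try:
            parsed[i] = datetime.fromisoformat(ts_str)
        except (ValueError, TypeError):
            continue  # malformed timestamp: skip rather than fail
    if not parsed:
        return {}
    earliest = min(parsed.values())
    return {i: (ts - earliest).total_seconds() for i, ts in parsed.items()}


if __name__ == "__main__":
    demo = [
        {"s3_key": "track1.ogg", "timestamp": "2026-04-01T23:44:50+00:00"},
        {"s3_key": "track2.ogg", "timestamp": "2026-04-01T23:46:00+00:00"},
        {"s3_key": "track3.ogg"},
    ]
    print(compute_track_padding(demo))
```

Note that this sketch assumes all timestamps are either timezone-aware or all naive; mixing the two would make the `min` comparison raise `TypeError`.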


@@ -137,6 +137,7 @@ async def mock_storage():
operation: str = "get_object",
expires_in: int = 3600,
bucket=None,
extra_params=None,
):
return f"http://test-storage/{path}"


@@ -373,9 +373,9 @@ async def test_audio_mp3_requires_token_for_owned_transcript(
tr.audio_mp3_filename.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(audio_path, tr.audio_mp3_filename)
- # Anonymous GET without token should be 403 or 404 depending on access; we call mp3
+ # Anonymous GET without token should be 401 (auth required)
resp = await client.get(f"/transcripts/{t.id}/audio/mp3")
- assert resp.status_code == 403
+ assert resp.status_code == 401
# With token should succeed
token = create_access_token(
@@ -452,9 +452,11 @@ async def test_anonymous_cannot_webrtc_record_when_not_public(client, monkeypatc
@pytest.mark.asyncio
- async def test_anonymous_cannot_start_meeting_recording_when_not_public(
+ async def test_anonymous_can_start_meeting_recording_when_not_public(
client, monkeypatch
):
"""Anonymous users can start recording since it's triggered from the frontend
and recording is at room level via Daily REST API."""
monkeypatch.setattr(settings, "PUBLIC_MODE", False)
room = await rooms_controller.add(
@@ -486,7 +488,8 @@ async def test_anonymous_cannot_start_meeting_recording_when_not_public(
f"/meetings/{meeting.id}/recordings/start",
json={"type": "cloud", "instanceId": "00000000-0000-0000-0000-000000000001"},
)
- assert resp.status_code == 401, resp.text
+ # Should not be 401 (may fail for other reasons like no Daily API, but auth passes)
+ assert resp.status_code != 401, f"Should not get 401: {resp.text}"
# ======================================================================
@@ -898,7 +901,7 @@ async def test_anonymous_transcript_in_list_when_public_mode(client, monkeypatch
@pytest.mark.asyncio
async def test_anonymous_transcript_audio_accessible(client, monkeypatch, tmpdir):
"""Anonymous transcript audio (mp3) is accessible without authentication
- because user_id=None bypasses share_mode checks."""
+ because user_id=None bypasses the auth requirement (pipeline access)."""
monkeypatch.setattr(settings, "PUBLIC_MODE", True)
monkeypatch.setattr(settings, "DATA_DIR", Path(tmpdir).as_posix())
@@ -920,7 +923,7 @@ async def test_anonymous_transcript_audio_accessible(client, monkeypatch, tmpdir
resp = await client.get(f"/transcripts/{t.id}/audio/mp3")
assert (
resp.status_code == 200
- ), f"Anonymous transcript audio should be accessible: {resp.text}"
+ ), f"Anonymous transcript audio should be accessible for pipeline: {resp.text}"
@pytest.mark.asyncio


@@ -1,7 +1,12 @@
from datetime import datetime, timedelta, timezone
from unittest.mock import AsyncMock, patch
import pytest
from reflector.db.meetings import meetings_controller
from reflector.db.recordings import Recording, recordings_controller
from reflector.db.rooms import rooms_controller
- from reflector.db.transcripts import transcripts_controller
+ from reflector.db.transcripts import SourceKind, transcripts_controller
@pytest.mark.asyncio
@@ -192,9 +197,93 @@ async def test_transcript_delete(authenticated_client, client):
assert response.status_code == 200
assert response.json()["status"] == "ok"
# API returns 404 for soft-deleted transcripts
response = await client.get(f"/transcripts/{tid}")
assert response.status_code == 404
# But the transcript still exists in DB with deleted_at set
transcript = await transcripts_controller.get_by_id(tid)
assert transcript is not None
assert transcript.deleted_at is not None
@pytest.mark.asyncio
async def test_deleted_transcript_not_in_list(authenticated_client, client):
"""Soft-deleted transcripts should not appear in the list endpoint."""
response = await client.post("/transcripts", json={"name": "testdel_list"})
assert response.status_code == 200
tid = response.json()["id"]
# Verify it appears in the list
response = await client.get("/transcripts")
assert response.status_code == 200
ids = [t["id"] for t in response.json()["items"]]
assert tid in ids
# Delete it
response = await client.delete(f"/transcripts/{tid}")
assert response.status_code == 200
# Verify it no longer appears in the list
response = await client.get("/transcripts")
assert response.status_code == 200
ids = [t["id"] for t in response.json()["items"]]
assert tid not in ids
@pytest.mark.asyncio
async def test_delete_already_deleted_is_idempotent(authenticated_client, client):
"""Deleting an already-deleted transcript is idempotent (returns 200)."""
response = await client.post("/transcripts", json={"name": "testdel_idem"})
assert response.status_code == 200
tid = response.json()["id"]
# First delete
response = await client.delete(f"/transcripts/{tid}")
assert response.status_code == 200
# Second delete — idempotent, still returns ok
response = await client.delete(f"/transcripts/{tid}")
assert response.status_code == 200
# Transcript still exists in DB with deleted_at set
transcript = await transcripts_controller.get_by_id(tid)
assert transcript is not None
assert transcript.deleted_at is not None
@pytest.mark.asyncio
async def test_deleted_transcript_recording_soft_deleted(authenticated_client, client):
"""Soft-deleting a transcript also soft-deletes its recording."""
from datetime import datetime, timezone
recording = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="test.mp4",
recorded_at=datetime.now(timezone.utc),
)
)
transcript = await transcripts_controller.add(
name="with-recording",
source_kind=SourceKind.ROOM,
recording_id=recording.id,
user_id="randomuserid",
)
response = await client.delete(f"/transcripts/{transcript.id}")
assert response.status_code == 200
# Recording still in DB with deleted_at set
rec = await recordings_controller.get_by_id(recording.id)
assert rec is not None
assert rec.deleted_at is not None
# Transcript still in DB with deleted_at set
tr = await transcripts_controller.get_by_id(transcript.id)
assert tr is not None
assert tr.deleted_at is not None
@pytest.mark.asyncio
async def test_transcript_mark_reviewed(authenticated_client, client):
@@ -305,3 +394,463 @@ async def test_transcripts_list_filtered_by_room_id(authenticated_client, client
ids = [t["id"] for t in items]
assert in_room.id in ids
assert other.id not in ids
# ---------------------------------------------------------------------------
# Restore tests
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_transcript_restore(authenticated_client, client):
"""Soft-delete then restore, verify accessible again."""
response = await client.post("/transcripts", json={"name": "restore-me"})
assert response.status_code == 200
tid = response.json()["id"]
# Soft-delete
response = await client.delete(f"/transcripts/{tid}")
assert response.status_code == 200
# 404 while deleted
response = await client.get(f"/transcripts/{tid}")
assert response.status_code == 404
# Restore
response = await client.post(f"/transcripts/{tid}/restore")
assert response.status_code == 200
assert response.json()["status"] == "ok"
# Accessible again
response = await client.get(f"/transcripts/{tid}")
assert response.status_code == 200
assert response.json()["name"] == "restore-me"
# deleted_at is cleared
transcript = await transcripts_controller.get_by_id(tid)
assert transcript.deleted_at is None
@pytest.mark.asyncio
async def test_transcript_restore_recording_also_restored(authenticated_client, client):
"""Restoring a transcript also restores its recording."""
recording = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="restore-test.mp4",
recorded_at=datetime.now(timezone.utc),
)
)
transcript = await transcripts_controller.add(
name="restore-with-recording",
source_kind=SourceKind.ROOM,
recording_id=recording.id,
user_id="randomuserid",
)
# Soft-delete
response = await client.delete(f"/transcripts/{transcript.id}")
assert response.status_code == 200
# Both should be soft-deleted
rec = await recordings_controller.get_by_id(recording.id)
assert rec.deleted_at is not None
# Restore
response = await client.post(f"/transcripts/{transcript.id}/restore")
assert response.status_code == 200
# Recording also restored
rec = await recordings_controller.get_by_id(recording.id)
assert rec.deleted_at is None
tr = await transcripts_controller.get_by_id(transcript.id)
assert tr.deleted_at is None
@pytest.mark.asyncio
async def test_transcript_restore_not_deleted(authenticated_client, client):
"""Restoring a non-deleted transcript returns 400."""
response = await client.post("/transcripts", json={"name": "not-deleted"})
assert response.status_code == 200
tid = response.json()["id"]
response = await client.post(f"/transcripts/{tid}/restore")
assert response.status_code == 400
@pytest.mark.asyncio
async def test_transcript_restore_not_found(authenticated_client, client):
"""Restoring a nonexistent transcript returns 404."""
response = await client.post("/transcripts/nonexistent-id/restore")
assert response.status_code == 404
@pytest.mark.asyncio
async def test_transcript_restore_forbidden(authenticated_client, client):
"""Cannot restore another user's deleted transcript."""
# Create transcript owned by a different user
transcript = await transcripts_controller.add(
name="other-user-restore",
source_kind=SourceKind.FILE,
user_id="some-other-user",
)
# Soft-delete directly in DB
await transcripts_controller.remove_by_id(transcript.id, user_id="some-other-user")
# Try to restore as randomuserid (authenticated_client)
response = await client.post(f"/transcripts/{transcript.id}/restore")
assert response.status_code == 403
# ---------------------------------------------------------------------------
# Destroy tests
# ---------------------------------------------------------------------------
@pytest.fixture
def mock_destroy_storage():
"""Mock storage backends so hard_delete doesn't require S3 credentials."""
with (
patch(
"reflector.db.transcripts.get_transcripts_storage",
return_value=AsyncMock(delete_file=AsyncMock()),
),
patch(
"reflector.db.transcripts.get_source_storage",
return_value=AsyncMock(delete_file=AsyncMock()),
),
):
yield
@pytest.mark.asyncio
async def test_transcript_destroy(authenticated_client, client, mock_destroy_storage):
"""Soft-delete then destroy, verify transcript gone from DB."""
response = await client.post("/transcripts", json={"name": "destroy-me"})
assert response.status_code == 200
tid = response.json()["id"]
# Soft-delete first
response = await client.delete(f"/transcripts/{tid}")
assert response.status_code == 200
# Destroy
response = await client.delete(f"/transcripts/{tid}/destroy")
assert response.status_code == 200
assert response.json()["status"] == "ok"
# Gone from DB entirely
transcript = await transcripts_controller.get_by_id(tid)
assert transcript is None
@pytest.mark.asyncio
async def test_transcript_destroy_not_soft_deleted(authenticated_client, client):
"""Cannot destroy a transcript that hasn't been soft-deleted."""
response = await client.post("/transcripts", json={"name": "not-soft-deleted"})
assert response.status_code == 200
tid = response.json()["id"]
response = await client.delete(f"/transcripts/{tid}/destroy")
assert response.status_code == 400
@pytest.mark.asyncio
async def test_transcript_destroy_with_recording(
authenticated_client, client, mock_destroy_storage
):
"""Destroying a transcript also hard-deletes its recording from DB."""
recording = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="destroy-test.mp4",
recorded_at=datetime.now(timezone.utc),
)
)
transcript = await transcripts_controller.add(
name="destroy-with-recording",
source_kind=SourceKind.ROOM,
recording_id=recording.id,
user_id="randomuserid",
)
# Soft-delete
response = await client.delete(f"/transcripts/{transcript.id}")
assert response.status_code == 200
# Destroy
response = await client.delete(f"/transcripts/{transcript.id}/destroy")
assert response.status_code == 200
# Both gone from DB
assert await transcripts_controller.get_by_id(transcript.id) is None
assert await recordings_controller.get_by_id(recording.id) is None
@pytest.mark.asyncio
async def test_transcript_destroy_forbidden(authenticated_client, client):
"""Cannot destroy another user's deleted transcript."""
transcript = await transcripts_controller.add(
name="other-user-destroy",
source_kind=SourceKind.FILE,
user_id="some-other-user",
)
await transcripts_controller.remove_by_id(transcript.id, user_id="some-other-user")
# Try to destroy as randomuserid (authenticated_client)
response = await client.delete(f"/transcripts/{transcript.id}/destroy")
assert response.status_code == 403
# ---------------------------------------------------------------------------
# Isolation tests — verify unrelated data is NOT deleted
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_transcript_destroy_does_not_delete_meeting(
authenticated_client, client, mock_destroy_storage
):
"""Destroying a transcript must NOT delete its associated meeting."""
room = await rooms_controller.add(
name="room-for-meeting-isolation",
user_id="randomuserid",
zulip_auto_post=False,
zulip_stream="",
zulip_topic="",
is_locked=False,
room_mode="normal",
recording_type="cloud",
recording_trigger="automatic-2nd-participant",
is_shared=False,
webhook_url="",
webhook_secret="",
)
now = datetime.now(timezone.utc)
meeting = await meetings_controller.create(
id="meeting-isolation-test",
room_name=room.name,
room_url="https://example.com/room",
host_room_url="https://example.com/room-host",
start_date=now,
end_date=now + timedelta(hours=1),
room=room,
)
recording = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="meeting-iso.mp4",
recorded_at=now,
meeting_id=meeting.id,
)
)
transcript = await transcripts_controller.add(
name="transcript-with-meeting",
source_kind=SourceKind.ROOM,
recording_id=recording.id,
meeting_id=meeting.id,
room_id=room.id,
user_id="randomuserid",
)
# Soft-delete then destroy
await transcripts_controller.remove_by_id(transcript.id, user_id="randomuserid")
response = await client.delete(f"/transcripts/{transcript.id}/destroy")
assert response.status_code == 200
# Transcript and recording are gone
assert await transcripts_controller.get_by_id(transcript.id) is None
assert await recordings_controller.get_by_id(recording.id) is None
# Meeting still exists
m = await meetings_controller.get_by_id(meeting.id)
assert m is not None
assert m.id == meeting.id
@pytest.mark.asyncio
async def test_transcript_destroy_does_not_affect_other_transcripts(
authenticated_client, client, mock_destroy_storage
):
"""Destroying one transcript must not affect another transcript or its recording."""
user_id = "randomuserid"
rec1 = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="sibling1.mp4",
recorded_at=datetime.now(timezone.utc),
)
)
rec2 = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="sibling2.mp4",
recorded_at=datetime.now(timezone.utc),
)
)
t1 = await transcripts_controller.add(
name="sibling-1",
source_kind=SourceKind.FILE,
recording_id=rec1.id,
user_id=user_id,
)
t2 = await transcripts_controller.add(
name="sibling-2",
source_kind=SourceKind.FILE,
recording_id=rec2.id,
user_id=user_id,
)
# Soft-delete and destroy t1
await transcripts_controller.remove_by_id(t1.id, user_id=user_id)
response = await client.delete(f"/transcripts/{t1.id}/destroy")
assert response.status_code == 200
# t1 and rec1 gone
assert await transcripts_controller.get_by_id(t1.id) is None
assert await recordings_controller.get_by_id(rec1.id) is None
# t2 and rec2 untouched
t2_after = await transcripts_controller.get_by_id(t2.id)
assert t2_after is not None
assert t2_after.deleted_at is None
rec2_after = await recordings_controller.get_by_id(rec2.id)
assert rec2_after is not None
assert rec2_after.deleted_at is None
@pytest.mark.asyncio
async def test_transcript_destroy_meeting_with_multiple_transcripts(
authenticated_client, client, mock_destroy_storage
):
"""Destroying one transcript from a meeting must not affect the other
transcript, its recording, or the shared meeting."""
user_id = "randomuserid"
room = await rooms_controller.add(
name="room-multi-transcript",
user_id=user_id,
zulip_auto_post=False,
zulip_stream="",
zulip_topic="",
is_locked=False,
room_mode="normal",
recording_type="cloud",
recording_trigger="automatic-2nd-participant",
is_shared=False,
webhook_url="",
webhook_secret="",
)
now = datetime.now(timezone.utc)
meeting = await meetings_controller.create(
id="meeting-multi-transcript-test",
room_name=room.name,
room_url="https://example.com/room",
host_room_url="https://example.com/room-host",
start_date=now,
end_date=now + timedelta(hours=1),
room=room,
)
rec1 = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="multi1.mp4",
recorded_at=now,
meeting_id=meeting.id,
)
)
rec2 = await recordings_controller.create(
Recording(
bucket_name="test-bucket",
object_key="multi2.mp4",
recorded_at=now,
meeting_id=meeting.id,
)
)
t1 = await transcripts_controller.add(
name="multi-t1",
source_kind=SourceKind.ROOM,
recording_id=rec1.id,
meeting_id=meeting.id,
room_id=room.id,
user_id=user_id,
)
t2 = await transcripts_controller.add(
name="multi-t2",
source_kind=SourceKind.ROOM,
recording_id=rec2.id,
meeting_id=meeting.id,
room_id=room.id,
user_id=user_id,
)
# Soft-delete and destroy t1
await transcripts_controller.remove_by_id(t1.id, user_id=user_id)
response = await client.delete(f"/transcripts/{t1.id}/destroy")
assert response.status_code == 200
# t1 + rec1 gone
assert await transcripts_controller.get_by_id(t1.id) is None
assert await recordings_controller.get_by_id(rec1.id) is None
# t2 + rec2 + meeting all still exist
assert (await transcripts_controller.get_by_id(t2.id)) is not None
assert (await recordings_controller.get_by_id(rec2.id)) is not None
assert (await meetings_controller.get_by_id(meeting.id)) is not None
# ---------------------------------------------------------------------------
# Search tests
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_search_include_deleted(authenticated_client, client):
"""Search with include_deleted=true returns only deleted transcripts."""
response = await client.post("/transcripts", json={"name": "search-deleted"})
assert response.status_code == 200
tid = response.json()["id"]
# Soft-delete
response = await client.delete(f"/transcripts/{tid}")
assert response.status_code == 200
# Normal search should not include it
response = await client.get("/transcripts/search", params={"q": ""})
assert response.status_code == 200
ids = [r["id"] for r in response.json()["results"]]
assert tid not in ids
# Search with include_deleted should include it
response = await client.get(
"/transcripts/search", params={"q": "", "include_deleted": True}
)
assert response.status_code == 200
ids = [r["id"] for r in response.json()["results"]]
assert tid in ids
@pytest.mark.asyncio
async def test_search_exclude_deleted_by_default(authenticated_client, client):
"""Normal search excludes deleted transcripts by default."""
response = await client.post(
"/transcripts", json={"name": "search-exclude-deleted"}
)
assert response.status_code == 200
tid = response.json()["id"]
# Verify it appears in search
response = await client.get("/transcripts/search", params={"q": ""})
assert response.status_code == 200
ids = [r["id"] for r in response.json()["results"]]
assert tid in ids
# Soft-delete
response = await client.delete(f"/transcripts/{tid}")
assert response.status_code == 200
# Verify it no longer appears in default search
response = await client.get("/transcripts/search", params={"q": ""})
assert response.status_code == 200
ids = [r["id"] for r in response.json()["results"]]
assert tid not in ids
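
The delete, restore, and destroy tests above imply a small state machine. The following is only an in-memory sketch of that lifecycle, not the reflector implementation. `TrashStore` and `Entry` are hypothetical names. The status codes mirror what the tests assert, but two details are assumptions: that ownership is checked before the deleted state, and that soft-deleting a missing entry returns 404.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    owner: str
    deleted_at: Optional[datetime] = None


class TrashStore:
    """Hypothetical in-memory stand-in for the transcript endpoints."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def _lookup(self, entry_id: str, user: str) -> Tuple[int, Optional[Entry]]:
        entry = self.entries.get(entry_id)
        if entry is None:
            return 404, None  # unknown id
        if entry.owner != user:
            return 403, None  # assumed: ownership is checked before state
        return 200, entry

    def soft_delete(self, entry_id: str, user: str) -> int:
        status, entry = self._lookup(entry_id, user)
        if entry is None:
            return status
        if entry.deleted_at is None:  # idempotent: keep the first timestamp
            entry.deleted_at = datetime.now(timezone.utc)
        return 200

    def restore(self, entry_id: str, user: str) -> int:
        status, entry = self._lookup(entry_id, user)
        if entry is None:
            return status
        if entry.deleted_at is None:
            return 400  # nothing to restore
        entry.deleted_at = None
        return 200

    def destroy(self, entry_id: str, user: str) -> int:
        status, entry = self._lookup(entry_id, user)
        if entry is None:
            return status
        if entry.deleted_at is None:
            return 400  # must be soft-deleted first
        del self.entries[entry_id]
        return 200
```

In this sketch, requiring a soft-delete before `destroy` keeps the irreversible step two calls away from live data, which matches the 400 responses the tests expect for transcripts that were never soft-deleted.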


@@ -40,7 +40,7 @@ async def fake_transcript(tmpdir, client, monkeypatch):
],
)
async def test_transcript_audio_download(
- fake_transcript, url_suffix, content_type, client
+ authenticated_client, fake_transcript, url_suffix, content_type, client
):
response = await client.get(f"/transcripts/{fake_transcript.id}/audio{url_suffix}")
assert response.status_code == 200
@@ -61,7 +61,7 @@ async def test_transcript_audio_download(
],
)
async def test_transcript_audio_download_head(
- fake_transcript, url_suffix, content_type, client
+ authenticated_client, fake_transcript, url_suffix, content_type, client
):
response = await client.head(f"/transcripts/{fake_transcript.id}/audio{url_suffix}")
assert response.status_code == 200
@@ -82,7 +82,7 @@ async def test_transcript_audio_download_head(
],
)
async def test_transcript_audio_download_range(
- fake_transcript, url_suffix, content_type, client
+ authenticated_client, fake_transcript, url_suffix, content_type, client
):
response = await client.get(
f"/transcripts/{fake_transcript.id}/audio{url_suffix}",
@@ -102,7 +102,7 @@ async def test_transcript_audio_download_range(
],
)
async def test_transcript_audio_download_range_with_seek(
- fake_transcript, url_suffix, content_type, client
+ authenticated_client, fake_transcript, url_suffix, content_type, client
):
response = await client.get(
f"/transcripts/{fake_transcript.id}/audio{url_suffix}",

Some files were not shown because too many files have changed in this diff