# Reflector Architecture: Whereby + Daily.co Recording Storage

## System Overview
```mermaid
graph TB
    subgraph "Actors"
        APP[Our App<br/>Reflector]
        WHEREBY[Whereby Service<br/>External]
        DAILY[Daily.co Service<br/>External]
    end

    subgraph "AWS S3 Buckets"
        TRANSCRIPT_BUCKET[Transcript Bucket<br/>reflector-transcripts<br/>Output: Processed MP3s]
        WHEREBY_BUCKET[Whereby Bucket<br/>reflector-whereby-recordings<br/>Input: Raw MP4s]
        DAILY_BUCKET[Daily.co Bucket<br/>reflector-dailyco-recordings<br/>Input: Raw WebM tracks]
    end

    subgraph "AWS Infrastructure"
        SQS[SQS Queue<br/>Whereby notifications]
    end

    subgraph "Database"
        DB[(PostgreSQL<br/>Recordings, Transcripts, Meetings)]
    end

    APP -->|Write processed| TRANSCRIPT_BUCKET
    APP -->|Read/Delete| WHEREBY_BUCKET
    APP -->|Read/Delete| DAILY_BUCKET
    APP -->|Poll| SQS
    APP -->|Store metadata| DB

    WHEREBY -->|Write recordings| WHEREBY_BUCKET
    WHEREBY_BUCKET -->|S3 Event| SQS
    WHEREBY -->|Participant webhooks<br/>room.client.joined/left| APP

    DAILY -->|Write recordings| DAILY_BUCKET
    DAILY -->|Recording webhook<br/>recording.ready-to-download| APP
```
**Note on Webhook vs S3 Event for Recording Processing:**

- **Whereby**: Uses S3 Events → SQS for recording availability (S3 as source of truth, no race conditions)
- **Daily.co**: Uses webhooks for recording availability (more immediate, built-in reliability)
- **Both**: Use webhooks for participant tracking (real-time updates)
## Credentials & Permissions
```mermaid
graph LR
    subgraph "Master Credentials"
        MASTER[TRANSCRIPT_STORAGE_AWS_*<br/>Access Key ID + Secret]
    end

    subgraph "Whereby Upload Credentials"
        WHEREBY_CREDS[AWS_WHEREBY_ACCESS_KEY_*<br/>Access Key ID + Secret]
    end

    subgraph "Daily.co Upload Role"
        DAILY_ROLE[DAILY_STORAGE_AWS_ROLE_ARN<br/>IAM Role ARN]
    end

    subgraph "Our App Uses"
        MASTER -->|Read/Write/Delete| TRANSCRIPT_BUCKET[Transcript Bucket]
        MASTER -->|Read/Delete| WHEREBY_BUCKET[Whereby Bucket]
        MASTER -->|Read/Delete| DAILY_BUCKET[Daily.co Bucket]
        MASTER -->|Poll/Delete| SQS[SQS Queue]
    end

    subgraph "We Give To Services"
        WHEREBY_CREDS -->|Passed in API call| WHEREBY_SERVICE[Whereby Service]
        WHEREBY_SERVICE -->|Write Only| WHEREBY_BUCKET
        DAILY_ROLE -->|Passed in API call| DAILY_SERVICE[Daily.co Service]
        DAILY_SERVICE -->|Assume Role| DAILY_ROLE
        DAILY_SERVICE -->|Write Only| DAILY_BUCKET
    end
```
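As a sketch, the app-side clients could be built from the master credentials like this (the exact variable names behind the `TRANSCRIPT_STORAGE_AWS_*` prefix are assumptions here):

```python
import os

import boto3

# Master credentials (TRANSCRIPT_STORAGE_AWS_*); the exact env var names
# are assumptions based on the prefix shown in the diagram above.
session = boto3.Session(
    aws_access_key_id=os.environ["TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY"],
)
s3 = session.client("s3")    # read/write transcript bucket, read/delete raw buckets
sqs = session.client("sqs")  # poll/delete the Whereby notification queue
```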
## Video Platform Recording Integration
This document explains how Reflector receives and identifies multitrack audio recordings from different video platforms.
### Platform Comparison
| Platform | Delivery Method | Track Identification |
|---|---|---|
| Daily.co | Webhook | Explicit track list in payload |
| Whereby | SQS (S3 notifications) | Single file per notification |
### Daily.co (Webhook-based)
Daily.co uses webhooks to notify Reflector when recordings are ready.
#### How It Works
1. Daily.co sends a webhook when the recording is ready
   - Event type: `recording.ready-to-download`
   - Endpoint: `/v1/daily/webhook` (`reflector/views/daily.py:46-102`)

2. The webhook payload explicitly includes the track list:

   ```json
   {
     "recording_id": "7443ee0a-dab1-40eb-b316-33d6c0d5ff88",
     "room_name": "daily-20251020193458",
     "tracks": [
       {
         "type": "audio",
         "s3Key": "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
         "size": 831843
       },
       {
         "type": "audio",
         "s3Key": "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
         "size": 408438
       },
       {
         "type": "video",
         "s3Key": "monadical/daily-20251020193458/...-video.webm",
         "size": 30000000
       }
     ]
   }
   ```

3. The system extracts the audio tracks (`daily.py:211`):

   ```python
   track_keys = [t.s3Key for t in tracks if t.type == "audio"]
   ```

4. Triggers multitrack processing (`daily.py:213-218`):

   ```python
   process_multitrack_recording.delay(
       bucket_name=bucket_name,    # reflector-dailyco-local
       room_name=room_name,        # daily-20251020193458
       recording_id=recording_id,  # 7443ee0a-dab1-40eb-b316-33d6c0d5ff88
       track_keys=track_keys,      # only audio s3Keys
   )
   ```
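For reference, a minimal sketch of how a payload like the one above can back the `t.type` / `t.s3Key` attribute access used in step 3 (the model names here are assumptions, not the actual classes in `reflector/views/daily.py`):

```python
from pydantic import BaseModel

# Hypothetical models mirroring the webhook payload shape shown above
class Track(BaseModel):
    type: str   # "audio" or "video"
    s3Key: str  # object key inside the Daily.co recordings bucket
    size: int   # bytes

class RecordingReadyPayload(BaseModel):
    recording_id: str
    room_name: str
    tracks: list[Track]

# webhook_body: the parsed JSON dict from the webhook request
payload = RecordingReadyPayload.model_validate(webhook_body)
track_keys = [t.s3Key for t in payload.tracks if t.type == "audio"]
```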
#### Key Advantage: No Ambiguity
Even though multiple meetings may share the same S3 bucket/folder (`monadical/`), there's no ambiguity because:

- Each webhook payload contains the exact `s3Key` list for that specific `recording_id`
- No need to scan folders or guess which files belong together
- Each track's `s3Key` includes the room timestamp subfolder (e.g., `daily-20251020193458/`)
The room name includes a timestamp (`daily-20251020193458`) to keep recordings organized, but the webhook's explicit track list is what prevents mixing files from different meetings.
#### Track Timeline Extraction
Daily.co provides timing information in two places:
1. **PyAV WebM metadata** (current approach):

   ```python
   # Read from the WebM container stream metadata
   stream.start_time  # 8.130s, meeting-relative timing
   ```

2. **Filename timestamps** (alternative approach, commit `3bae9076`):

   Filename format: `{recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm`
   Example: `1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm`

   Parse the timestamps (see the sketch after this list):
   - `recording_start_ts`: 1760988935484 (Unix ms)
   - `track_start_ts`: 1760988935922 (Unix ms)
   - offset: (1760988935922 - 1760988935484) / 1000 = 0.438s
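A sketch of the filename-based extraction (the regex is an assumption derived from the format above):

```python
import re

# Filename format: {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm
name = "1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm"

m = re.match(r"^(\d+)-.+-cam-audio-(\d+)\.webm$", name)
recording_start_ts, track_start_ts = int(m.group(1)), int(m.group(2))
offset_s = (track_start_ts - recording_start_ts) / 1000  # 0.438
```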
**Time difference (PyAV vs filename):**

| Track | Filename offset | PyAV metadata | Difference |
|---|---|---|---|
| 0 | 438ms | 229ms | 209ms |
| 1 | 8339ms | 8130ms | 209ms |
The consistent 209ms delta suggests a network/encoding delay between file upload initiation (filename) and the actual start of the audio stream (metadata).
The current implementation uses the PyAV metadata (read as sketched below) because:

- It's more accurate (represents when the audio actually started)
- Padding BEFORE transcription produces correct Whisper timestamps automatically
- No manual offset adjustment is needed during the transcript merge
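For illustration, reading that start offset with PyAV might look like this (a sketch; note that `stream.start_time` is expressed in `time_base` units and needs converting to seconds):

```python
import av

# Sketch: read the meeting-relative start offset of a WebM audio track
with av.open("track.webm") as container:
    stream = container.streams.audio[0]
    # start_time is in time_base units; convert to seconds
    offset_s = float(stream.start_time * stream.time_base)  # e.g. 8.130
```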
#### Why Re-encoding During Padding

Padding happens to involve re-encoding, which turns out to be important for the Daily.co + Whisper combination:

**Problem**: Daily.co skips frames in recordings while the microphone is muted or paused

- WebM containers have gaps where audio frames should be
- Whisper doesn't understand these gaps and produces incorrect timestamps
- Example: 5s of audio with 2s muted → the file only has frames for 3s, so Whisper thinks the duration is 3s

**Solution**: Re-encode via a PyAV filter graph (`adelay` + `aresample`)

- Restores the missing frames as silence
- Produces a continuous audio stream without gaps
- Whisper now sees the correct duration and produces accurate timestamps

**Why it's combined with padding**:

- We're already re-encoding for padding (adding initial silence)
- It's more performant to do both operations in a single PyAV pipeline
- The padded values are needed for the mixdown anyway (creating the final MP3)

Implementation: `main_multitrack_pipeline.py:_apply_audio_padding_streaming()`
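A condensed sketch of what a pad-and-re-encode pass with `adelay` + `aresample` can look like in PyAV (function name, output codec, and filter arguments are illustrative assumptions; the actual logic lives in `_apply_audio_padding_streaming()`):

```python
import av

def pad_and_reencode(src_path: str, dst_path: str, delay_ms: int) -> None:
    """Prepend delay_ms of silence and fill intra-stream gaps, re-encoding to MP3."""
    with av.open(src_path) as inp, av.open(dst_path, mode="w") as out:
        in_stream = inp.streams.audio[0]
        out_stream = out.add_stream("mp3", rate=in_stream.rate)

        graph = av.filter.Graph()
        src = graph.add_abuffer(template=in_stream)
        # adelay prepends silence on all channels; aresample async=1
        # injects silence for missing frames (the muted-mic gaps)
        delay = graph.add("adelay", f"delays={delay_ms}:all=1")
        resample = graph.add("aresample", "async=1")
        sink = graph.add("abuffersink")
        src.link_to(delay)
        delay.link_to(resample)
        resample.link_to(sink)
        graph.configure()

        def drain() -> None:
            # Pull whatever the filter graph has ready and encode it
            while True:
                try:
                    frame = sink.pull()
                except (av.error.BlockingIOError, av.error.EOFError):
                    return
                out.mux(out_stream.encode(frame))

        for frame in inp.decode(in_stream):
            src.push(frame)
            drain()
        src.push(None)  # flush the filter graph
        drain()
        out.mux(out_stream.encode(None))  # flush the encoder
```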
### Whereby (SQS-based)
Whereby uses AWS SQS (via S3 notifications) to notify Reflector when files are uploaded.
#### How It Works
1. Whereby uploads the recording to S3
2. S3 sends a notification to the SQS queue (one notification per file)
3. Reflector polls the SQS queue (`worker/process.py:process_messages()`; sketched below)
4. The system processes the single file (`worker/process.py:process_recording()`)
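A rough sketch of that polling flow with boto3 (queue URL handling and the downstream call are assumptions; the real loop is `worker/process.py:process_messages()`):

```python
import json

import boto3

sqs = boto3.client("sqs")

def process_messages(queue_url: str) -> None:
    # Long-poll the queue for S3 "object created" notifications
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):  # standard S3 event format
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            process_recording(bucket, key)  # hypothetical downstream handler
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```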
#### Key Difference from Daily.co
**Whereby (SQS)**: The system receives an S3 notification that "file X was created". It only knows about one file at a time and would need to scan the folder to find related files.

**Daily.co (Webhook)**: Daily explicitly tells the system which files belong together in the webhook payload.