# Reflector Architecture: Whereby + Daily.co Recording Storage
## System Overview
```mermaid
graph TB
subgraph "Actors"
APP[Our App
Reflector]
WHEREBY[Whereby Service
External]
DAILY[Daily.co Service
External]
end
subgraph "AWS S3 Buckets"
TRANSCRIPT_BUCKET[Transcript Bucket
reflector-transcripts
Output: Processed MP3s]
WHEREBY_BUCKET[Whereby Bucket
reflector-whereby-recordings
Input: Raw MP4s]
DAILY_BUCKET[Daily.co Bucket
reflector-dailyco-recordings
Input: Raw WebM tracks]
end
subgraph "AWS Infrastructure"
SQS[SQS Queue
Whereby notifications]
end
subgraph "Database"
DB[(PostgreSQL
Recordings, Transcripts, Meetings)]
end
APP -->|Write processed| TRANSCRIPT_BUCKET
APP -->|Read/Delete| WHEREBY_BUCKET
APP -->|Read/Delete| DAILY_BUCKET
APP -->|Poll| SQS
APP -->|Store metadata| DB
WHEREBY -->|Write recordings| WHEREBY_BUCKET
WHEREBY_BUCKET -->|S3 Event| SQS
WHEREBY -->|Participant webhooks
room.client.joined/left| APP
DAILY -->|Write recordings| DAILY_BUCKET
DAILY -->|Recording webhook
recording.ready-to-download| APP
```
**Note on Webhook vs S3 Event for Recording Processing:**
- **Whereby**: Uses S3 Events → SQS for recording availability (S3 as source of truth, no race conditions)
- **Daily.co**: Uses webhooks for recording availability (more immediate, built-in reliability)
- **Both**: Use webhooks for participant tracking (real-time updates)
## Credentials & Permissions
```mermaid
graph LR
subgraph "Master Credentials"
MASTER[TRANSCRIPT_STORAGE_AWS_*
Access Key ID + Secret]
end
subgraph "Whereby Upload Credentials"
WHEREBY_CREDS[AWS_WHEREBY_ACCESS_KEY_*
Access Key ID + Secret]
end
subgraph "Daily.co Upload Role"
DAILY_ROLE[DAILY_STORAGE_AWS_ROLE_ARN
IAM Role ARN]
end
subgraph "Our App Uses"
MASTER -->|Read/Write/Delete| TRANSCRIPT_BUCKET[Transcript Bucket]
MASTER -->|Read/Delete| WHEREBY_BUCKET[Whereby Bucket]
MASTER -->|Read/Delete| DAILY_BUCKET[Daily.co Bucket]
MASTER -->|Poll/Delete| SQS[SQS Queue]
end
subgraph "We Give To Services"
WHEREBY_CREDS -->|Passed in API call| WHEREBY_SERVICE[Whereby Service]
WHEREBY_SERVICE -->|Write Only| WHEREBY_BUCKET
DAILY_ROLE -->|Passed in API call| DAILY_SERVICE[Daily.co Service]
DAILY_SERVICE -->|Assume Role| DAILY_ROLE
DAILY_SERVICE -->|Write Only| DAILY_BUCKET
end
```
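As a concrete illustration, here is a minimal sketch of the app-side use of the master credentials with boto3. The exact environment variable suffixes and the helper are assumptions, not the actual Reflector code:

```python
import os

import boto3

# Assumed env var names derived from the TRANSCRIPT_STORAGE_AWS_* prefix above
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY"],
)

def consume_whereby_recording(key: str, local_path: str) -> None:
    # Master credentials grant Read/Delete on the input bucket:
    # download the raw MP4, then clean it up after successful processing.
    s3.download_file("reflector-whereby-recordings", key, local_path)
    s3.delete_object(Bucket="reflector-whereby-recordings", Key=key)
```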
# Video Platform Recording Integration
This document explains how Reflector receives and identifies multitrack audio recordings from different video platforms.
## Platform Comparison
| Platform | Delivery Method | Track Identification |
|----------|----------------|---------------------|
| **Daily.co** | Webhook | Explicit track list in payload |
| **Whereby** | SQS (S3 notifications) | Single file per notification |
---
## Daily.co
**Note:** Primary discovery runs via polling (`poll_daily_recordings`); the webhook flow described below serves as a backup.

Daily.co uses **webhooks** to notify Reflector when recordings are ready.
### How It Works
1. **Daily.co sends webhook** when recording is ready
   - Event type: `recording.ready-to-download`
   - Endpoint: `/v1/daily/webhook` (`reflector/views/daily.py:46-102`; see the handler sketch after this list)
2. **Webhook payload explicitly includes track list**:
```json
{
  "recording_id": "7443ee0a-dab1-40eb-b316-33d6c0d5ff88",
  "room_name": "daily-20251020193458",
  "tracks": [
    {
      "type": "audio",
      "s3Key": "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
      "size": 831843
    },
    {
      "type": "audio",
      "s3Key": "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
      "size": 408438
    },
    {
      "type": "video",
      "s3Key": "monadical/daily-20251020193458/...-video.webm",
      "size": 30000000
    }
  ]
}
```
3. **System extracts audio tracks** (`daily.py:211`):
```python
track_keys = [t.s3Key for t in tracks if t.type == "audio"]
```
4. **Triggers multitrack processing** (`daily.py:213-218`):
```python
process_multitrack_recording.delay(
    bucket_name=bucket_name,      # reflector-dailyco-local
    room_name=room_name,          # daily-20251020193458
    recording_id=recording_id,    # 7443ee0a-dab1-40eb-b316-33d6c0d5ff88
    track_keys=track_keys,        # Only audio s3Keys
)
```
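Putting the four steps together, here is a minimal FastAPI-style sketch of such a handler. The model names and response shape are illustrative, not the actual code in `reflector/views/daily.py`:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Track(BaseModel):
    type: str
    s3Key: str
    size: int

class RecordingReady(BaseModel):
    recording_id: str
    room_name: str
    tracks: list[Track]

@app.post("/v1/daily/webhook")
async def daily_webhook(payload: RecordingReady) -> dict:
    # Keep only audio tracks; the video s3Key is not transcribed
    track_keys = [t.s3Key for t in payload.tracks if t.type == "audio"]
    # Hand off to the Celery task shown above, e.g.:
    # process_multitrack_recording.delay(
    #     bucket_name=bucket_name, room_name=payload.room_name,
    #     recording_id=payload.recording_id, track_keys=track_keys)
    return {"received_audio_tracks": len(track_keys)}
```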
### Key Advantage: No Ambiguity
Even though multiple meetings may share the same S3 bucket/folder (`monadical/`), **there's no ambiguity** because:
- Each webhook payload contains the exact `s3Key` list for that specific `recording_id`
- No need to scan folders or guess which files belong together
- Each track's s3Key includes the room timestamp subfolder (e.g., `daily-20251020193458/`)
The room name includes timestamp (`daily-20251020193458`) to keep recordings organized, but **the webhook's explicit track list is what prevents mixing files from different meetings**.
### Track Timeline Extraction
Daily.co provides timing information in two places:
**1. PyAV WebM Metadata (current approach)**:
```python
import av

# Meeting-relative start time from the WebM container stream metadata
container = av.open("track.webm")
stream = container.streams.audio[0]
start_time = float(stream.start_time * stream.time_base)  # e.g. 8.130s
```
**2. Filename Timestamps (alternative approach, commit 3bae9076)**:
```
Filename format: {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm
Example: 1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm
Parse timestamps:
- recording_start_ts: 1760988935484 (Unix ms)
- track_start_ts: 1760988935922 (Unix ms)
- offset: (1760988935922 - 1760988935484) / 1000 = 0.438s
```
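A small sketch of that parse (the regex and helper name are illustrative):

```python
import re

# {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm
TRACK_NAME_RE = re.compile(r"(\d+)-[0-9a-f-]+-cam-audio-(\d+)(?:\.webm)?$")

def filename_offset_seconds(filename: str) -> float:
    """Track start offset relative to recording start, in seconds."""
    m = TRACK_NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected track filename: {filename}")
    recording_start_ms, track_start_ms = map(int, m.groups())
    return (track_start_ms - recording_start_ms) / 1000.0

# (1760988935922 - 1760988935484) / 1000 = 0.438
print(filename_offset_seconds(
    "1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm"
))
```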
**Time Difference (PyAV vs Filename)**:
```
Track 0:
  Filename offset: 438 ms
  PyAV metadata:   229 ms
  Difference:      209 ms

Track 1:
  Filename offset: 8339 ms
  PyAV metadata:   8130 ms
  Difference:      209 ms
```
**Consistent 209ms delta** suggests network/encoding delay between file upload initiation (filename) and actual audio stream start (metadata).
**Current implementation uses PyAV metadata** because:
- More accurate (represents when audio actually started)
- Padding BEFORE transcription produces correct Whisper timestamps automatically
- No manual offset adjustment needed during transcript merge
### Why Re-encoding During Padding
Padding happens to require re-encoding, and that re-encode turns out to be important for Daily.co + Whisper:
**Problem:** Daily.co skips frames in recordings when microphone is muted or paused
- WebM containers have gaps where audio frames should be
- Whisper doesn't understand these gaps and produces incorrect timestamps
- Example: 5s of audio with 2s muted → file has frames only for 3s, Whisper thinks duration is 3s
**Solution:** Re-encoding via PyAV filter graph (`adelay` + `aresample`)
- Restores missing frames as silence
- Produces continuous audio stream without gaps
- Whisper now sees correct duration and produces accurate timestamps
**Why combined with padding:**
- Already re-encoding for padding (adding initial silence)
- More performant to do both operations in single PyAV pipeline
- Padded values needed for mixdown anyway (creating final MP3)
Implementation: `main_multitrack_pipeline.py:_apply_audio_padding_streaming()`
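A minimal sketch of that combined pass, assuming PyAV's filter-graph API. This is illustrative, not the actual `_apply_audio_padding_streaming()` implementation; the output codec, sample rate, and function name are placeholder choices:

```python
import av

def pad_and_reencode(src_path: str, dst_path: str, delay_ms: int) -> None:
    """Prepend delay_ms of silence and re-encode into a gapless stream."""
    with av.open(src_path) as src, av.open(dst_path, mode="w") as dst:
        in_stream = src.streams.audio[0]
        out_stream = dst.add_stream("mp3", rate=48000)

        # adelay prepends the initial silence (padding); aresample with
        # async=1 synthesizes silence for skipped frames so the decoded
        # timeline is continuous and Whisper sees the true duration.
        graph = av.filter.Graph()
        abuffer = graph.add_abuffer(template=in_stream)
        adelay = graph.add("adelay", f"delays={delay_ms}:all=1")
        aresample = graph.add("aresample", "async=1")
        sink = graph.add("abuffersink")
        abuffer.link_to(adelay)
        adelay.link_to(aresample)
        aresample.link_to(sink)
        graph.configure()

        def drain() -> None:
            while True:
                try:
                    filtered = graph.pull()
                except (BlockingIOError, EOFError):
                    return
                dst.mux(out_stream.encode(filtered))

        for frame in src.decode(in_stream):
            graph.push(frame)
            drain()
        graph.push(None)  # flush buffered frames out of the graph
        drain()
        dst.mux(out_stream.encode(None))  # flush the encoder
```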
---
## Whereby (SQS-based)
Whereby uses **AWS SQS** (via S3 notifications) to notify Reflector when files are uploaded.
### How It Works
1. **Whereby uploads recording** to S3
2. **S3 sends notification** to SQS queue (one notification per file)
3. **Reflector polls SQS queue** (`worker/process.py:process_messages()`)
4. **System processes single file** (`worker/process.py:process_recording()`)
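A rough sketch of that polling loop with boto3 (the queue URL and handler are placeholders; the real logic lives in `worker/process.py`):

```python
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.<region>.amazonaws.com/<account>/<queue>"  # placeholder

def process_recording(bucket: str, key: str) -> None:
    ...  # hand the single file off to the Whereby pipeline

def process_messages() -> None:
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for message in response.get("Messages", []):
        body = json.loads(message["Body"])
        # Each S3 event notification describes exactly one created object
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            process_recording(bucket, key)
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```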
### Key Difference from Daily.co
**Whereby (SQS):** The system receives an S3 notification that "file X was created"; it only learns about one file at a time and would need to scan the folder to find related files.
**Daily.co (Webhook):** Daily explicitly tells the system which files belong together in the webhook payload.
---