# Reflector Architecture: Whereby + Daily.co Recording Storage

## System Overview
```mermaid
graph TB
    subgraph "Actors"
        APP[Our App<br/>Reflector]
        WHEREBY[Whereby Service<br/>External]
        DAILY[Daily.co Service<br/>External]
    end

    subgraph "AWS S3 Buckets"
        TRANSCRIPT_BUCKET[Transcript Bucket<br/>reflector-transcripts<br/>Output: Processed MP3s]
        WHEREBY_BUCKET[Whereby Bucket<br/>reflector-whereby-recordings<br/>Input: Raw MP4s]
        DAILY_BUCKET[Daily.co Bucket<br/>reflector-dailyco-recordings<br/>Input: Raw WebM tracks]
    end

    subgraph "AWS Infrastructure"
        SQS[SQS Queue<br/>Whereby notifications]
    end

    subgraph "Database"
        DB[(PostgreSQL<br/>Recordings, Transcripts, Meetings)]
    end

    APP -->|Write processed| TRANSCRIPT_BUCKET
    APP -->|Read/Delete| WHEREBY_BUCKET
    APP -->|Read/Delete| DAILY_BUCKET
    APP -->|Poll| SQS
    APP -->|Store metadata| DB

    WHEREBY -->|Write recordings| WHEREBY_BUCKET
    WHEREBY_BUCKET -->|S3 Event| SQS
    WHEREBY -->|Participant webhooks<br/>room.client.joined/left| APP

    DAILY -->|Write recordings| DAILY_BUCKET
    DAILY -->|Recording webhook<br/>recording.ready-to-download| APP
```
**Note on Webhook vs S3 Event for Recording Processing:**

- **Whereby**: Uses S3 Events → SQS for recording availability (S3 as source of truth, no race conditions)
- **Daily.co**: Uses webhooks for recording availability (more immediate, built-in reliability)
- **Both**: Use webhooks for participant tracking (real-time updates)
## Credentials & Permissions
```mermaid
graph LR
    subgraph "Master Credentials"
        MASTER[TRANSCRIPT_STORAGE_AWS_*<br/>Access Key ID + Secret]
    end

    subgraph "Whereby Upload Credentials"
        WHEREBY_CREDS[AWS_WHEREBY_ACCESS_KEY_*<br/>Access Key ID + Secret]
    end

    subgraph "Daily.co Upload Role"
        DAILY_ROLE[DAILY_STORAGE_AWS_ROLE_ARN<br/>IAM Role ARN]
    end

    subgraph "Our App Uses"
        MASTER -->|Read/Write/Delete| TRANSCRIPT_BUCKET[Transcript Bucket]
        MASTER -->|Read/Delete| WHEREBY_BUCKET[Whereby Bucket]
        MASTER -->|Read/Delete| DAILY_BUCKET[Daily.co Bucket]
        MASTER -->|Poll/Delete| SQS[SQS Queue]
    end

    subgraph "We Give To Services"
        WHEREBY_CREDS -->|Passed in API call| WHEREBY_SERVICE[Whereby Service]
        WHEREBY_SERVICE -->|Write Only| WHEREBY_BUCKET
        DAILY_ROLE -->|Passed in API call| DAILY_SERVICE[Daily.co Service]
        DAILY_SERVICE -->|Assume Role| DAILY_ROLE
        DAILY_SERVICE -->|Write Only| DAILY_BUCKET
    end
```
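As a sketch, the app-side clients could be built from the master credentials like this (the exact variable names behind the `TRANSCRIPT_STORAGE_AWS_*` prefix are assumptions here):

```python
import os

import boto3

# Master credentials (TRANSCRIPT_STORAGE_AWS_*); the exact env var names
# are assumptions based on the prefix shown in the diagram above.
session = boto3.Session(
    aws_access_key_id=os.environ["TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY"],
)
s3 = session.client("s3")    # read/write transcript bucket, read/delete raw buckets
sqs = session.client("sqs")  # poll/delete the Whereby notification queue
```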
## Video Platform Recording Integration
This document explains how Reflector receives and identifies multitrack audio recordings from different video platforms.
### Platform Comparison
| Platform | Delivery Method | Track Identification |
|---|---|---|
| Daily.co | Webhook | Explicit track list in payload |
| Whereby | SQS (S3 notifications) | Single file per notification |
### Daily.co (Webhook-based)
Daily.co uses webhooks to notify Reflector when recordings are ready.
#### How It Works
1. Daily.co sends a webhook when the recording is ready
   - Event type: `recording.ready-to-download`
   - Endpoint: `/v1/daily/webhook` (`reflector/views/daily.py:46-102`)

2. The webhook payload explicitly includes the track list:

   ```json
   {
     "recording_id": "7443ee0a-dab1-40eb-b316-33d6c0d5ff88",
     "room_name": "daily-20251020193458",
     "tracks": [
       {
         "type": "audio",
         "s3Key": "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
         "size": 831843
       },
       {
         "type": "audio",
         "s3Key": "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
         "size": 408438
       },
       {
         "type": "video",
         "s3Key": "monadical/daily-20251020193458/...-video.webm",
         "size": 30000000
       }
     ]
   }
   ```

3. The system extracts the audio tracks (`daily.py:211`):

   ```python
   track_keys = [t.s3Key for t in tracks if t.type == "audio"]
   ```

4. Triggers multitrack processing (`daily.py:213-218`):

   ```python
   process_multitrack_recording.delay(
       bucket_name=bucket_name,    # reflector-dailyco-local
       room_name=room_name,        # daily-20251020193458
       recording_id=recording_id,  # 7443ee0a-dab1-40eb-b316-33d6c0d5ff88
       track_keys=track_keys,      # only audio s3Keys
   )
   ```
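For reference, a minimal sketch of how a payload like the one above can back the `t.type` / `t.s3Key` attribute access used in step 3 (the model names here are assumptions, not the actual classes in `reflector/views/daily.py`):

```python
from pydantic import BaseModel

# Hypothetical models mirroring the webhook payload shape shown above
class Track(BaseModel):
    type: str   # "audio" or "video"
    s3Key: str  # object key inside the Daily.co recordings bucket
    size: int   # bytes

class RecordingReadyPayload(BaseModel):
    recording_id: str
    room_name: str
    tracks: list[Track]

# webhook_body: the parsed JSON dict from the webhook request
payload = RecordingReadyPayload.model_validate(webhook_body)
track_keys = [t.s3Key for t in payload.tracks if t.type == "audio"]
```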
#### Key Advantage: No Ambiguity
Even though multiple meetings may share the same S3 bucket/folder (`monadical/`), there's no ambiguity because:

- Each webhook payload contains the exact `s3Key` list for that specific `recording_id`
- No need to scan folders or guess which files belong together
- Each track's `s3Key` includes the room timestamp subfolder (e.g., `daily-20251020193458/`)
The room name includes a timestamp (`daily-20251020193458`) to keep recordings organized, but the webhook's explicit track list is what prevents mixing files from different meetings.
#### Track Timeline Extraction
Daily.co provides timing information in two places:
1. **PyAV WebM metadata** (current approach):

   ```python
   # Read from the WebM container stream metadata
   stream.start_time  # 8.130s, meeting-relative timing
   ```

2. **Filename timestamps** (alternative approach, commit `3bae9076`):

   Filename format: `{recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm`
   Example: `1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm`

   Parse the timestamps (see the sketch after this list):
   - `recording_start_ts`: 1760988935484 (Unix ms)
   - `track_start_ts`: 1760988935922 (Unix ms)
   - offset: (1760988935922 - 1760988935484) / 1000 = 0.438s
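A sketch of the filename-based extraction (the regex is an assumption derived from the format above):

```python
import re

# Filename format: {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm
name = "1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm"

m = re.match(r"^(\d+)-.+-cam-audio-(\d+)\.webm$", name)
recording_start_ts, track_start_ts = int(m.group(1)), int(m.group(2))
offset_s = (track_start_ts - recording_start_ts) / 1000  # 0.438
```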
**Time difference (PyAV vs filename):**

| Track | Filename offset | PyAV metadata | Difference |
|---|---|---|---|
| 0 | 438ms | 229ms | 209ms |
| 1 | 8339ms | 8130ms | 209ms |
The consistent 209ms delta suggests a network/encoding delay between file upload initiation (filename) and the actual start of the audio stream (metadata).
The current implementation uses the PyAV metadata (read as sketched below) because:

- It's more accurate (represents when the audio actually started)
- Padding BEFORE transcription produces correct Whisper timestamps automatically
- No manual offset adjustment is needed during the transcript merge
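For illustration, reading that start offset with PyAV might look like this (a sketch; note that `stream.start_time` is expressed in `time_base` units and needs converting to seconds):

```python
import av

# Sketch: read the meeting-relative start offset of a WebM audio track
with av.open("track.webm") as container:
    stream = container.streams.audio[0]
    # start_time is in time_base units; convert to seconds
    offset_s = float(stream.start_time * stream.time_base)  # e.g. 8.130
```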
#### Why Re-encoding During Padding

Padding happens to involve re-encoding, which turns out to be important for the Daily.co + Whisper combination:

**Problem**: Daily.co skips frames in recordings while the microphone is muted or paused

- WebM containers have gaps where audio frames should be
- Whisper doesn't understand these gaps and produces incorrect timestamps
- Example: 5s of audio with 2s muted → the file only has frames for 3s, so Whisper thinks the duration is 3s

**Solution**: Re-encode via a PyAV filter graph (`adelay` + `aresample`)

- Restores the missing frames as silence
- Produces a continuous audio stream without gaps
- Whisper now sees the correct duration and produces accurate timestamps

**Why it's combined with padding**:

- We're already re-encoding for padding (adding initial silence)
- It's more performant to do both operations in a single PyAV pipeline
- The padded values are needed for the mixdown anyway (creating the final MP3)

Implementation: `main_multitrack_pipeline.py:_apply_audio_padding_streaming()`
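A condensed sketch of what a pad-and-re-encode pass with `adelay` + `aresample` can look like in PyAV (function name, output codec, and filter arguments are illustrative assumptions; the actual logic lives in `_apply_audio_padding_streaming()`):

```python
import av

def pad_and_reencode(src_path: str, dst_path: str, delay_ms: int) -> None:
    """Prepend delay_ms of silence and fill intra-stream gaps, re-encoding to MP3."""
    with av.open(src_path) as inp, av.open(dst_path, mode="w") as out:
        in_stream = inp.streams.audio[0]
        out_stream = out.add_stream("mp3", rate=in_stream.rate)

        graph = av.filter.Graph()
        src = graph.add_abuffer(template=in_stream)
        # adelay prepends silence on all channels; aresample async=1
        # injects silence for missing frames (the muted-mic gaps)
        delay = graph.add("adelay", f"delays={delay_ms}:all=1")
        resample = graph.add("aresample", "async=1")
        sink = graph.add("abuffersink")
        src.link_to(delay)
        delay.link_to(resample)
        resample.link_to(sink)
        graph.configure()

        def drain() -> None:
            # Pull whatever the filter graph has ready and encode it
            while True:
                try:
                    frame = sink.pull()
                except (av.error.BlockingIOError, av.error.EOFError):
                    return
                out.mux(out_stream.encode(frame))

        for frame in inp.decode(in_stream):
            src.push(frame)
            drain()
        src.push(None)  # flush the filter graph
        drain()
        out.mux(out_stream.encode(None))  # flush the encoder
```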
### Whereby (SQS-based)
Whereby uses AWS SQS (via S3 notifications) to notify Reflector when files are uploaded.
#### How It Works
1. Whereby uploads the recording to S3
2. S3 sends a notification to the SQS queue (one notification per file)
3. Reflector polls the SQS queue (`worker/process.py:process_messages()`; sketched below)
4. The system processes the single file (`worker/process.py:process_recording()`)
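A rough sketch of that polling flow with boto3 (queue URL handling and the downstream call are assumptions; the real loop is `worker/process.py:process_messages()`):

```python
import json

import boto3

sqs = boto3.client("sqs")

def process_messages(queue_url: str) -> None:
    # Long-poll the queue for S3 "object created" notifications
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):  # standard S3 event format
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            process_recording(bucket, key)  # hypothetical downstream handler
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```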
#### Key Difference from Daily.co
**Whereby (SQS)**: The system receives an S3 notification that "file X was created". It only knows about one file at a time and would need to scan the folder to find related files.

**Daily.co (Webhook)**: Daily explicitly tells the system which files belong together in the webhook payload.