diff --git a/server/docs/video-platforms/README.md b/server/docs/video-platforms/README.md
new file mode 100644
index 00000000..45a615c3
--- /dev/null
+++ b/server/docs/video-platforms/README.md
@@ -0,0 +1,234 @@
+# Reflector Architecture: Whereby + Daily.co Recording Storage
+
+## System Overview
+
+```mermaid
+graph TB
+ subgraph "Actors"
+ APP[Our App<br/>Reflector]
+ WHEREBY[Whereby Service<br/>External]
+ DAILY[Daily.co Service<br/>External]
+ end
+
+ subgraph "AWS S3 Buckets"
+ TRANSCRIPT_BUCKET[Transcript Bucket<br/>reflector-transcripts<br/>Output: Processed MP3s]
+ WHEREBY_BUCKET[Whereby Bucket<br/>reflector-whereby-recordings<br/>Input: Raw MP4s]
+ DAILY_BUCKET[Daily.co Bucket<br/>reflector-dailyco-recordings<br/>Input: Raw WebM tracks]
+ end
+
+ subgraph "AWS Infrastructure"
+ SQS[SQS Queue<br/>Whereby notifications]
+ end
+
+ subgraph "Database"
+ DB[(PostgreSQL<br/>Recordings, Transcripts, Meetings)]
+ end
+
+ APP -->|Write processed| TRANSCRIPT_BUCKET
+ APP -->|Read/Delete| WHEREBY_BUCKET
+ APP -->|Read/Delete| DAILY_BUCKET
+ APP -->|Poll| SQS
+ APP -->|Store metadata| DB
+
+ WHEREBY -->|Write recordings| WHEREBY_BUCKET
+ WHEREBY_BUCKET -->|S3 Event| SQS
+ WHEREBY -->|Participant webhooks<br/>room.client.joined/left| APP
+
+ DAILY -->|Write recordings| DAILY_BUCKET
+ DAILY -->|Recording webhook<br/>recording.ready-to-download| APP
+```
+
+**Note on Webhook vs S3 Event for Recording Processing:**
+- **Whereby**: Uses S3 Events → SQS for recording availability (S3 as source of truth, no race conditions)
+- **Daily.co**: Uses webhooks for recording availability (more immediate, built-in reliability)
+- **Both**: Use webhooks for participant tracking (real-time updates)
+
+## Credentials & Permissions
+
+```mermaid
+graph LR
+ subgraph "Master Credentials"
+ MASTER[TRANSCRIPT_STORAGE_AWS_*<br/>Access Key ID + Secret]
+ end
+
+ subgraph "Whereby Upload Credentials"
+ WHEREBY_CREDS[AWS_WHEREBY_ACCESS_KEY_*<br/>Access Key ID + Secret]
+ end
+
+ subgraph "Daily.co Upload Role"
+ DAILY_ROLE[DAILY_STORAGE_AWS_ROLE_ARN<br/>IAM Role ARN]
+ end
+
+ subgraph "Our App Uses"
+ MASTER -->|Read/Write/Delete| TRANSCRIPT_BUCKET[Transcript Bucket]
+ MASTER -->|Read/Delete| WHEREBY_BUCKET[Whereby Bucket]
+ MASTER -->|Read/Delete| DAILY_BUCKET[Daily.co Bucket]
+ MASTER -->|Poll/Delete| SQS[SQS Queue]
+ end
+
+ subgraph "We Give To Services"
+ WHEREBY_CREDS -->|Passed in API call| WHEREBY_SERVICE[Whereby Service]
+ WHEREBY_SERVICE -->|Write Only| WHEREBY_BUCKET
+
+ DAILY_ROLE -->|Passed in API call| DAILY_SERVICE[Daily.co Service]
+ DAILY_SERVICE -->|Assume Role| DAILY_ROLE
+ DAILY_SERVICE -->|Write Only| DAILY_BUCKET
+ end
+```
+
+# Video Platform Recording Integration
+
+This document explains how Reflector receives and identifies multitrack audio recordings from different video platforms.
+
+## Platform Comparison
+
+| Platform | Delivery Method | Track Identification |
+|----------|----------------|---------------------|
+| **Daily.co** | Webhook | Explicit track list in payload |
+| **Whereby** | SQS (S3 notifications) | Single file per notification |
+
+---
+
+## Daily.co (Webhook-based)
+
+Daily.co uses **webhooks** to notify Reflector when recordings are ready.
+
+### How It Works
+
+1. **Daily.co sends webhook** when recording is ready
+ - Event type: `recording.ready-to-download`
+ - Endpoint: `/v1/daily/webhook` (`reflector/views/daily.py:46-102`)
+
+2. **Webhook payload explicitly includes track list**:
+```json
+{
+ "recording_id": "7443ee0a-dab1-40eb-b316-33d6c0d5ff88",
+ "room_name": "daily-20251020193458",
+ "tracks": [
+ {
+ "type": "audio",
+ "s3Key": "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
+ "size": 831843
+ },
+ {
+ "type": "audio",
+ "s3Key": "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
+ "size": 408438
+ },
+ {
+ "type": "video",
+ "s3Key": "monadical/daily-20251020193458/...-video.webm",
+ "size": 30000000
+ }
+ ]
+}
+```
+
+3. **System extracts audio tracks** (`daily.py:211`):
+```python
+track_keys = [t.s3Key for t in tracks if t.type == "audio"]
+```
+
+4. **Triggers multitrack processing** (`daily.py:213-218`):
+```python
+process_multitrack_recording.delay(
+ bucket_name=bucket_name, # reflector-dailyco-local
+ room_name=room_name, # daily-20251020193458
+ recording_id=recording_id, # 7443ee0a-dab1-40eb-b316-33d6c0d5ff88
+ track_keys=track_keys # Only audio s3Keys
+)
+```
+
+### Key Advantage: No Ambiguity
+
+Even though multiple meetings may share the same S3 bucket/folder (`monadical/`), **there's no ambiguity** because:
+- Each webhook payload contains the exact `s3Key` list for that specific `recording_id`
+- No need to scan folders or guess which files belong together
+- Each track's s3Key includes the room timestamp subfolder (e.g., `daily-20251020193458/`)
+
+The room name includes timestamp (`daily-20251020193458`) to keep recordings organized, but **the webhook's explicit track list is what prevents mixing files from different meetings**.
+
+### Track Timeline Extraction
+
+Daily.co provides timing information in two places:
+
+**1. PyAV WebM Metadata (current approach)**:
+```python
+# Read the meeting-relative start time from the WebM stream metadata (assumes `import av`)
+stream = av.open(track_url).streams.audio[0]  # track_url: presigned URL of the raw track
+start_time_seconds = float(stream.start_time * stream.time_base)  # e.g. 8.130
+```
+
+**2. Filename Timestamps (alternative approach, commit 3bae9076)**:
+```
+Filename format: {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm
+Example: 1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm
+
+Parse timestamps:
+- recording_start_ts: 1760988935484 (Unix ms)
+- track_start_ts: 1760988935922 (Unix ms)
+- offset: (1760988935922 - 1760988935484) / 1000 = 0.438s
+```
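+
+A minimal sketch of this filename-based calculation (not what the pipeline currently uses; the regex and helper name are illustrative):
+
+```python
+import re
+
+# Hypothetical helper: derive a track's meeting-relative offset from its filename.
+# Format: {recording_start_ts}-{participant_uuid}-cam-audio-{track_start_ts}.webm
+FILENAME_RE = re.compile(r"^(?P<rec_ts>\d+)-[0-9a-f-]+-cam-audio-(?P<track_ts>\d+)\.webm$")
+
+def offset_seconds_from_filename(filename: str) -> float:
+    match = FILENAME_RE.match(filename)
+    if match is None:
+        raise ValueError(f"Unexpected Daily.co track filename: {filename}")
+    recording_start_ms = int(match.group("rec_ts"))  # Unix ms
+    track_start_ms = int(match.group("track_ts"))  # Unix ms
+    return (track_start_ms - recording_start_ms) / 1000.0
+
+# offset_seconds_from_filename(
+#     "1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm"
+# ) == 0.438
+```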
+
+**Time Difference (PyAV vs Filename)**:
+```
+Track 0:
+ Filename offset: 438ms
+ PyAV metadata: 229ms
+ Difference: 209ms
+
+Track 1:
+ Filename offset: 8339ms
+ PyAV metadata: 8130ms
+ Difference: 209ms
+```
+
+**Consistent 209ms delta** suggests network/encoding delay between file upload initiation (filename) and actual audio stream start (metadata).
+
+**Current implementation uses PyAV metadata** because:
+- More accurate (represents when audio actually started)
+- Padding BEFORE transcription produces correct Whisper timestamps automatically
+- No manual offset adjustment needed during transcript merge
+
+### Why Re-encoding During Padding
+
+Padding involves re-encoding as a side effect, and that re-encoding turns out to matter for Daily.co + Whisper:
+
+**Problem:** Daily.co skips frames in recordings when microphone is muted or paused
+- WebM containers have gaps where audio frames should be
+- Whisper doesn't understand these gaps and produces incorrect timestamps
+- Example: 5s of audio with 2s muted → file has frames only for 3s, Whisper thinks duration is 3s
+
+**Solution:** Re-encoding via PyAV filter graph (`adelay` + `aresample`)
+- Restores missing frames as silence
+- Produces continuous audio stream without gaps
+- Whisper now sees correct duration and produces accurate timestamps
+
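+For intuition, a rough sketch of how the gap shows up when decoding such a track (hypothetical file name; assumes PyAV is installed):
+
+```python
+import av
+
+# Sum of decoded audio vs. the span the segment should cover (hypothetical file).
+with av.open("track_with_muted_sections.webm") as container:
+    stream = container.streams.audio[0]
+    decoded_seconds = sum(f.samples / f.sample_rate for f in container.decode(stream))
+# decoded_seconds lands near 3.0 for a segment spanning ~5.0s of the meeting,
+# because frames recorded while the microphone was muted were never written.
+```
+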
+**Why combined with padding:**
+- Already re-encoding for padding (adding initial silence)
+- More performant to do both operations in a single PyAV pipeline
+- Padded values needed for mixdown anyway (creating final MP3)
+
+Implementation: `main_multitrack_pipeline.py:_apply_audio_padding_to_file()` (called from `pad_track_for_transcription()`)
+
+---
+
+## Whereby (SQS-based)
+
+Whereby uses **AWS SQS** (via S3 notifications) to notify Reflector when files are uploaded.
+
+### How It Works
+
+1. **Whereby uploads recording** to S3
+2. **S3 sends notification** to SQS queue (one notification per file)
+3. **Reflector polls SQS queue** (`worker/process.py:process_messages()`)
+4. **System processes single file** (`worker/process.py:process_recording()`)
+
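+A minimal sketch of that polling loop, assuming boto3 and the standard S3 event notification schema (the queue URL and `handle_recording` helper are illustrative, not the actual `worker/process.py` code):
+
+```python
+import json
+
+import boto3
+
+QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/123456789012/reflector-whereby"  # illustrative
+sqs = boto3.client("sqs")
+
+
+def handle_recording(bucket: str, key: str) -> None:
+    # Placeholder: the real worker would create a Recording row and start processing.
+    print(f"new recording object: s3://{bucket}/{key}")
+
+
+def poll_once() -> None:
+    resp = sqs.receive_message(
+        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
+    )
+    for message in resp.get("Messages", []):
+        body = json.loads(message["Body"])
+        # Each S3 event notification describes exactly one created object.
+        for record in body.get("Records", []):
+            bucket = record["s3"]["bucket"]["name"]
+            key = record["s3"]["object"]["key"]
+            handle_recording(bucket, key)
+        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
+```
+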
+### Key Difference from Daily.co
+
+**Whereby (SQS):** The system receives an S3 notification that "file X was created"; it only knows about one file at a time and would need to scan the folder to find related files.
+
+**Daily.co (Webhook):** Daily.co explicitly tells the system which files belong together in the webhook payload.
+
+---
+
+
diff --git a/server/env.example b/server/env.example
index ff0f4211..7375bf0a 100644
--- a/server/env.example
+++ b/server/env.example
@@ -71,3 +71,30 @@ DIARIZATION_URL=https://monadical-sas--reflector-diarizer-web.modal.run
## Sentry DSN configuration
#SENTRY_DSN=
+
+## =======================================================
+## Video Platform Configuration
+## =======================================================
+
+## Whereby
+#WHEREBY_API_KEY=your-whereby-api-key
+#WHEREBY_WEBHOOK_SECRET=your-whereby-webhook-secret
+#WHEREBY_STORAGE_AWS_ACCESS_KEY_ID=your-aws-key
+#WHEREBY_STORAGE_AWS_SECRET_ACCESS_KEY=your-aws-secret
+#AWS_PROCESS_RECORDING_QUEUE_URL=https://sqs.us-west-2.amazonaws.com/...
+
+## Daily.co
+#DAILY_API_KEY=your-daily-api-key
+#DAILY_WEBHOOK_SECRET=your-daily-webhook-secret
+#DAILY_SUBDOMAIN=your-subdomain
+#DAILY_WEBHOOK_UUID= # Auto-populated by recreate_daily_webhook.py script
+#DAILYCO_STORAGE_AWS_ROLE_ARN=... # IAM role ARN for Daily.co S3 access
+#DAILYCO_STORAGE_AWS_BUCKET_NAME=reflector-dailyco
+#DAILYCO_STORAGE_AWS_REGION=us-west-2
+
+## Whereby (optional separate bucket)
+#WHEREBY_STORAGE_AWS_BUCKET_NAME=reflector-whereby
+#WHEREBY_STORAGE_AWS_REGION=us-east-1
+
+## Platform Configuration
+#DEFAULT_VIDEO_PLATFORM=whereby # Default platform for new rooms
diff --git a/server/migrations/versions/1e49625677e4_add_platform_support.py b/server/migrations/versions/1e49625677e4_add_platform_support.py
new file mode 100644
index 00000000..fa403f92
--- /dev/null
+++ b/server/migrations/versions/1e49625677e4_add_platform_support.py
@@ -0,0 +1,50 @@
+"""add_platform_support
+
+Revision ID: 1e49625677e4
+Revises: 9e3f7b2a4c8e
+Create Date: 2025-10-08 13:17:29.943612
+
+"""
+
+from typing import Sequence, Union
+
+import sqlalchemy as sa
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision: str = "1e49625677e4"
+down_revision: Union[str, None] = "9e3f7b2a4c8e"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    """Add a nullable platform column to room, and a platform column to meeting defaulting to 'whereby' for backward compatibility."""
+ with op.batch_alter_table("room", schema=None) as batch_op:
+ batch_op.add_column(
+ sa.Column(
+ "platform",
+ sa.String(),
+ nullable=True,
+ server_default=None,
+ )
+ )
+
+ with op.batch_alter_table("meeting", schema=None) as batch_op:
+ batch_op.add_column(
+ sa.Column(
+ "platform",
+ sa.String(),
+ nullable=False,
+ server_default="whereby",
+ )
+ )
+
+
+def downgrade() -> None:
+ """Remove platform field."""
+ with op.batch_alter_table("meeting", schema=None) as batch_op:
+ batch_op.drop_column("platform")
+
+ with op.batch_alter_table("room", schema=None) as batch_op:
+ batch_op.drop_column("platform")
diff --git a/server/migrations/versions/f8294b31f022_add_track_keys.py b/server/migrations/versions/f8294b31f022_add_track_keys.py
new file mode 100644
index 00000000..7eda6ccc
--- /dev/null
+++ b/server/migrations/versions/f8294b31f022_add_track_keys.py
@@ -0,0 +1,28 @@
+"""add_track_keys
+
+Revision ID: f8294b31f022
+Revises: 1e49625677e4
+Create Date: 2025-10-27 18:52:17.589167
+
+"""
+
+from typing import Sequence, Union
+
+import sqlalchemy as sa
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision: str = "f8294b31f022"
+down_revision: Union[str, None] = "1e49625677e4"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+ with op.batch_alter_table("recording", schema=None) as batch_op:
+ batch_op.add_column(sa.Column("track_keys", sa.JSON(), nullable=True))
+
+
+def downgrade() -> None:
+ with op.batch_alter_table("recording", schema=None) as batch_op:
+ batch_op.drop_column("track_keys")
diff --git a/server/reflector/app.py b/server/reflector/app.py
index a15934f5..2ca76acb 100644
--- a/server/reflector/app.py
+++ b/server/reflector/app.py
@@ -12,6 +12,7 @@ from reflector.events import subscribers_shutdown, subscribers_startup
from reflector.logger import logger
from reflector.metrics import metrics_init
from reflector.settings import settings
+from reflector.views.daily import router as daily_router
from reflector.views.meetings import router as meetings_router
from reflector.views.rooms import router as rooms_router
from reflector.views.rtc_offer import router as rtc_offer_router
@@ -96,6 +97,7 @@ app.include_router(user_api_keys_router, prefix="/v1")
app.include_router(user_ws_router, prefix="/v1")
app.include_router(zulip_router, prefix="/v1")
app.include_router(whereby_router, prefix="/v1")
+app.include_router(daily_router, prefix="/v1/daily")
add_pagination(app)
# prepare celery
diff --git a/server/reflector/db/meetings.py b/server/reflector/db/meetings.py
index 12a0c187..6912b285 100644
--- a/server/reflector/db/meetings.py
+++ b/server/reflector/db/meetings.py
@@ -7,7 +7,10 @@ from sqlalchemy.dialects.postgresql import JSONB
from reflector.db import get_database, metadata
from reflector.db.rooms import Room
+from reflector.schemas.platform import WHEREBY_PLATFORM, Platform
from reflector.utils import generate_uuid4
+from reflector.utils.string import assert_equal
+from reflector.video_platforms.factory import get_platform
meetings = sa.Table(
"meeting",
@@ -55,6 +58,12 @@ meetings = sa.Table(
),
),
sa.Column("calendar_metadata", JSONB),
+ sa.Column(
+ "platform",
+ sa.String,
+ nullable=False,
+ server_default=assert_equal(WHEREBY_PLATFORM, "whereby"),
+ ),
sa.Index("idx_meeting_room_id", "room_id"),
sa.Index("idx_meeting_calendar_event", "calendar_event_id"),
)
@@ -94,13 +103,14 @@ class Meeting(BaseModel):
is_locked: bool = False
room_mode: Literal["normal", "group"] = "normal"
recording_type: Literal["none", "local", "cloud"] = "cloud"
- recording_trigger: Literal[
+ recording_trigger: Literal[ # whereby-specific
"none", "prompt", "automatic", "automatic-2nd-participant"
] = "automatic-2nd-participant"
num_clients: int = 0
is_active: bool = True
calendar_event_id: str | None = None
calendar_metadata: dict[str, Any] | None = None
+ platform: Platform = WHEREBY_PLATFORM
class MeetingController:
@@ -130,6 +140,7 @@ class MeetingController:
recording_trigger=room.recording_trigger,
calendar_event_id=calendar_event_id,
calendar_metadata=calendar_metadata,
+ platform=get_platform(room.platform),
)
query = meetings.insert().values(**meeting.model_dump())
await get_database().execute(query)
@@ -137,7 +148,8 @@ class MeetingController:
async def get_all_active(self) -> list[Meeting]:
query = meetings.select().where(meetings.c.is_active)
- return await get_database().fetch_all(query)
+ results = await get_database().fetch_all(query)
+ return [Meeting(**result) for result in results]
async def get_by_room_name(
self,
@@ -147,16 +159,14 @@ class MeetingController:
Get a meeting by room name.
For backward compatibility, returns the most recent meeting.
"""
- end_date = getattr(meetings.c, "end_date")
query = (
meetings.select()
.where(meetings.c.room_name == room_name)
- .order_by(end_date.desc())
+ .order_by(meetings.c.end_date.desc())
)
result = await get_database().fetch_one(query)
if not result:
return None
-
return Meeting(**result)
async def get_active(self, room: Room, current_time: datetime) -> Meeting | None:
@@ -179,7 +189,6 @@ class MeetingController:
result = await get_database().fetch_one(query)
if not result:
return None
-
return Meeting(**result)
async def get_all_active_for_room(
@@ -219,17 +228,27 @@ class MeetingController:
return None
return Meeting(**result)
- async def get_by_id(self, meeting_id: str, **kwargs) -> Meeting | None:
+ async def get_by_id(
+ self, meeting_id: str, room: Room | None = None
+ ) -> Meeting | None:
query = meetings.select().where(meetings.c.id == meeting_id)
+
+ if room:
+ query = query.where(meetings.c.room_id == room.id)
+
result = await get_database().fetch_one(query)
if not result:
return None
return Meeting(**result)
- async def get_by_calendar_event(self, calendar_event_id: str) -> Meeting | None:
+ async def get_by_calendar_event(
+ self, calendar_event_id: str, room: Room
+ ) -> Meeting | None:
query = meetings.select().where(
meetings.c.calendar_event_id == calendar_event_id
)
+ if room:
+ query = query.where(meetings.c.room_id == room.id)
result = await get_database().fetch_one(query)
if not result:
return None
@@ -239,6 +258,28 @@ class MeetingController:
query = meetings.update().where(meetings.c.id == meeting_id).values(**kwargs)
await get_database().execute(query)
+ async def increment_num_clients(self, meeting_id: str) -> None:
+ """Atomically increment participant count."""
+ query = (
+ meetings.update()
+ .where(meetings.c.id == meeting_id)
+ .values(num_clients=meetings.c.num_clients + 1)
+ )
+ await get_database().execute(query)
+
+ async def decrement_num_clients(self, meeting_id: str) -> None:
+ """Atomically decrement participant count (min 0)."""
+ query = (
+ meetings.update()
+ .where(meetings.c.id == meeting_id)
+ .values(
+ num_clients=sa.case(
+ (meetings.c.num_clients > 0, meetings.c.num_clients - 1), else_=0
+ )
+ )
+ )
+ await get_database().execute(query)
+
class MeetingConsentController:
async def get_by_meeting_id(self, meeting_id: str) -> list[MeetingConsent]:
diff --git a/server/reflector/db/recordings.py b/server/reflector/db/recordings.py
index 0d05790d..bde4afa5 100644
--- a/server/reflector/db/recordings.py
+++ b/server/reflector/db/recordings.py
@@ -21,6 +21,7 @@ recordings = sa.Table(
server_default="pending",
),
sa.Column("meeting_id", sa.String),
+ sa.Column("track_keys", sa.JSON, nullable=True),
sa.Index("idx_recording_meeting_id", "meeting_id"),
)
@@ -28,10 +29,13 @@ recordings = sa.Table(
class Recording(BaseModel):
id: str = Field(default_factory=generate_uuid4)
bucket_name: str
+ # for single-track
object_key: str
recorded_at: datetime
status: Literal["pending", "processing", "completed", "failed"] = "pending"
meeting_id: str | None = None
+ # for multitrack reprocessing
+ track_keys: list[str] | None = None
class RecordingController:
diff --git a/server/reflector/db/rooms.py b/server/reflector/db/rooms.py
index 396c818a..1081ac38 100644
--- a/server/reflector/db/rooms.py
+++ b/server/reflector/db/rooms.py
@@ -9,6 +9,7 @@ from pydantic import BaseModel, Field
from sqlalchemy.sql import false, or_
from reflector.db import get_database, metadata
+from reflector.schemas.platform import Platform
from reflector.utils import generate_uuid4
rooms = sqlalchemy.Table(
@@ -50,6 +51,12 @@ rooms = sqlalchemy.Table(
),
sqlalchemy.Column("ics_last_sync", sqlalchemy.DateTime(timezone=True)),
sqlalchemy.Column("ics_last_etag", sqlalchemy.Text),
+ sqlalchemy.Column(
+ "platform",
+ sqlalchemy.String,
+ nullable=True,
+ server_default=None,
+ ),
sqlalchemy.Index("idx_room_is_shared", "is_shared"),
sqlalchemy.Index("idx_room_ics_enabled", "ics_enabled"),
)
@@ -66,7 +73,7 @@ class Room(BaseModel):
is_locked: bool = False
room_mode: Literal["normal", "group"] = "normal"
recording_type: Literal["none", "local", "cloud"] = "cloud"
- recording_trigger: Literal[
+ recording_trigger: Literal[ # whereby-specific
"none", "prompt", "automatic", "automatic-2nd-participant"
] = "automatic-2nd-participant"
is_shared: bool = False
@@ -77,6 +84,7 @@ class Room(BaseModel):
ics_enabled: bool = False
ics_last_sync: datetime | None = None
ics_last_etag: str | None = None
+ platform: Platform | None = None
class RoomController:
@@ -130,6 +138,7 @@ class RoomController:
ics_url: str | None = None,
ics_fetch_interval: int = 300,
ics_enabled: bool = False,
+ platform: Platform | None = None,
):
"""
Add a new room
@@ -153,6 +162,7 @@ class RoomController:
ics_url=ics_url,
ics_fetch_interval=ics_fetch_interval,
ics_enabled=ics_enabled,
+ platform=platform,
)
query = rooms.insert().values(**room.model_dump())
try:
diff --git a/server/reflector/db/transcripts.py b/server/reflector/db/transcripts.py
index b82e4fe1..f9c3c057 100644
--- a/server/reflector/db/transcripts.py
+++ b/server/reflector/db/transcripts.py
@@ -21,7 +21,7 @@ from reflector.db.utils import is_postgresql
from reflector.logger import logger
from reflector.processors.types import Word as ProcessorWord
from reflector.settings import settings
-from reflector.storage import get_recordings_storage, get_transcripts_storage
+from reflector.storage import get_transcripts_storage
from reflector.utils import generate_uuid4
from reflector.utils.webvtt import topics_to_webvtt
@@ -186,6 +186,7 @@ class TranscriptParticipant(BaseModel):
id: str = Field(default_factory=generate_uuid4)
speaker: int | None
name: str
+ user_id: str | None = None
class Transcript(BaseModel):
@@ -623,7 +624,9 @@ class TranscriptController:
)
if recording:
try:
- await get_recordings_storage().delete_file(recording.object_key)
+ await get_transcripts_storage().delete_file(
+ recording.object_key, bucket=recording.bucket_name
+ )
except Exception as e:
logger.warning(
"Failed to delete recording object from S3",
@@ -725,11 +728,13 @@ class TranscriptController:
"""
Download audio from storage
"""
- transcript.audio_mp3_filename.write_bytes(
- await get_transcripts_storage().get_file(
- transcript.storage_audio_path,
- )
- )
+ storage = get_transcripts_storage()
+ try:
+ with open(transcript.audio_mp3_filename, "wb") as f:
+ await storage.stream_to_fileobj(transcript.storage_audio_path, f)
+ except Exception:
+ transcript.audio_mp3_filename.unlink(missing_ok=True)
+ raise
async def upsert_participant(
self,
diff --git a/server/reflector/pipelines/__init__.py b/server/reflector/pipelines/__init__.py
new file mode 100644
index 00000000..89d3e9de
--- /dev/null
+++ b/server/reflector/pipelines/__init__.py
@@ -0,0 +1 @@
+"""Pipeline modules for audio processing."""
diff --git a/server/reflector/pipelines/main_file_pipeline.py b/server/reflector/pipelines/main_file_pipeline.py
index 0a05d593..6f8e8011 100644
--- a/server/reflector/pipelines/main_file_pipeline.py
+++ b/server/reflector/pipelines/main_file_pipeline.py
@@ -23,23 +23,18 @@ from reflector.db.transcripts import (
transcripts_controller,
)
from reflector.logger import logger
+from reflector.pipelines import topic_processing
from reflector.pipelines.main_live_pipeline import (
PipelineMainBase,
broadcast_to_sockets,
task_cleanup_consent,
task_pipeline_post_to_zulip,
)
-from reflector.processors import (
- AudioFileWriterProcessor,
- TranscriptFinalSummaryProcessor,
- TranscriptFinalTitleProcessor,
- TranscriptTopicDetectorProcessor,
-)
+from reflector.pipelines.transcription_helpers import transcribe_file_with_processor
+from reflector.processors import AudioFileWriterProcessor
from reflector.processors.audio_waveform_processor import AudioWaveformProcessor
from reflector.processors.file_diarization import FileDiarizationInput
from reflector.processors.file_diarization_auto import FileDiarizationAutoProcessor
-from reflector.processors.file_transcript import FileTranscriptInput
-from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
from reflector.processors.transcript_diarization_assembler import (
TranscriptDiarizationAssemblerInput,
TranscriptDiarizationAssemblerProcessor,
@@ -56,19 +51,6 @@ from reflector.storage import get_transcripts_storage
from reflector.worker.webhook import send_transcript_webhook
-class EmptyPipeline:
- """Empty pipeline for processors that need a pipeline reference"""
-
- def __init__(self, logger: structlog.BoundLogger):
- self.logger = logger
-
- def get_pref(self, k, d=None):
- return d
-
- async def emit(self, event):
- pass
-
-
class PipelineMainFile(PipelineMainBase):
"""
Optimized file processing pipeline.
@@ -81,7 +63,7 @@ class PipelineMainFile(PipelineMainBase):
def __init__(self, transcript_id: str):
super().__init__(transcript_id=transcript_id)
self.logger = logger.bind(transcript_id=self.transcript_id)
- self.empty_pipeline = EmptyPipeline(logger=self.logger)
+ self.empty_pipeline = topic_processing.EmptyPipeline(logger=self.logger)
def _handle_gather_exceptions(self, results: list, operation: str) -> None:
"""Handle exceptions from asyncio.gather with return_exceptions=True"""
@@ -262,24 +244,7 @@ class PipelineMainFile(PipelineMainBase):
async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType:
"""Transcribe complete file"""
- processor = FileTranscriptAutoProcessor()
- input_data = FileTranscriptInput(audio_url=audio_url, language=language)
-
- # Store result for retrieval
- result: TranscriptType | None = None
-
- async def capture_result(transcript):
- nonlocal result
- result = transcript
-
- processor.on(capture_result)
- await processor.push(input_data)
- await processor.flush()
-
- if not result:
- raise ValueError("No transcript captured")
-
- return result
+ return await transcribe_file_with_processor(audio_url, language)
async def diarize_file(self, audio_url: str) -> list[DiarizationSegment] | None:
"""Get diarization for file"""
@@ -322,63 +287,31 @@ class PipelineMainFile(PipelineMainBase):
async def detect_topics(
self, transcript: TranscriptType, target_language: str
) -> list[TitleSummary]:
- """Detect topics from complete transcript"""
- chunk_size = 300
- topics: list[TitleSummary] = []
-
- async def on_topic(topic: TitleSummary):
- topics.append(topic)
- return await self.on_topic(topic)
-
- topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic)
- topic_detector.set_pipeline(self.empty_pipeline)
-
- for i in range(0, len(transcript.words), chunk_size):
- chunk_words = transcript.words[i : i + chunk_size]
- if not chunk_words:
- continue
-
- chunk_transcript = TranscriptType(
- words=chunk_words, translation=transcript.translation
- )
-
- await topic_detector.push(chunk_transcript)
-
- await topic_detector.flush()
- return topics
+ return await topic_processing.detect_topics(
+ transcript,
+ target_language,
+ on_topic_callback=self.on_topic,
+ empty_pipeline=self.empty_pipeline,
+ )
async def generate_title(self, topics: list[TitleSummary]):
- """Generate title from topics"""
- if not topics:
- self.logger.warning("No topics for title generation")
- return
-
- processor = TranscriptFinalTitleProcessor(callback=self.on_title)
- processor.set_pipeline(self.empty_pipeline)
-
- for topic in topics:
- await processor.push(topic)
-
- await processor.flush()
+ return await topic_processing.generate_title(
+ topics,
+ on_title_callback=self.on_title,
+ empty_pipeline=self.empty_pipeline,
+ logger=self.logger,
+ )
async def generate_summaries(self, topics: list[TitleSummary]):
- """Generate long and short summaries from topics"""
- if not topics:
- self.logger.warning("No topics for summary generation")
- return
-
transcript = await self.get_transcript()
- processor = TranscriptFinalSummaryProcessor(
- transcript=transcript,
- callback=self.on_long_summary,
- on_short_summary=self.on_short_summary,
+ return await topic_processing.generate_summaries(
+ topics,
+ transcript,
+ on_long_summary_callback=self.on_long_summary,
+ on_short_summary_callback=self.on_short_summary,
+ empty_pipeline=self.empty_pipeline,
+ logger=self.logger,
)
- processor.set_pipeline(self.empty_pipeline)
-
- for topic in topics:
- await processor.push(topic)
-
- await processor.flush()
@shared_task
diff --git a/server/reflector/pipelines/main_live_pipeline.py b/server/reflector/pipelines/main_live_pipeline.py
index f6fe6a83..83e560d6 100644
--- a/server/reflector/pipelines/main_live_pipeline.py
+++ b/server/reflector/pipelines/main_live_pipeline.py
@@ -17,7 +17,6 @@ from contextlib import asynccontextmanager
from typing import Generic
import av
-import boto3
from celery import chord, current_task, group, shared_task
from pydantic import BaseModel
from structlog import BoundLogger as Logger
@@ -584,6 +583,7 @@ async def cleanup_consent(transcript: Transcript, logger: Logger):
consent_denied = False
recording = None
+ meeting = None
try:
if transcript.recording_id:
recording = await recordings_controller.get_by_id(transcript.recording_id)
@@ -594,8 +594,8 @@ async def cleanup_consent(transcript: Transcript, logger: Logger):
meeting.id
)
except Exception as e:
- logger.error(f"Failed to get fetch consent: {e}", exc_info=e)
- consent_denied = True
+ logger.error(f"Failed to fetch consent: {e}", exc_info=e)
+ raise
if not consent_denied:
logger.info("Consent approved, keeping all files")
@@ -603,25 +603,24 @@ async def cleanup_consent(transcript: Transcript, logger: Logger):
logger.info("Consent denied, cleaning up all related audio files")
- if recording and recording.bucket_name and recording.object_key:
- s3_whereby = boto3.client(
- "s3",
- aws_access_key_id=settings.AWS_WHEREBY_ACCESS_KEY_ID,
- aws_secret_access_key=settings.AWS_WHEREBY_ACCESS_KEY_SECRET,
- )
- try:
- s3_whereby.delete_object(
- Bucket=recording.bucket_name, Key=recording.object_key
- )
- logger.info(
- f"Deleted original Whereby recording: {recording.bucket_name}/{recording.object_key}"
- )
- except Exception as e:
- logger.error(f"Failed to delete Whereby recording: {e}", exc_info=e)
+ deletion_errors = []
+ if recording and recording.bucket_name:
+ keys_to_delete = []
+ if recording.track_keys:
+ keys_to_delete = recording.track_keys
+ elif recording.object_key:
+ keys_to_delete = [recording.object_key]
+
+ master_storage = get_transcripts_storage()
+ for key in keys_to_delete:
+ try:
+ await master_storage.delete_file(key, bucket=recording.bucket_name)
+ logger.info(f"Deleted recording file: {recording.bucket_name}/{key}")
+ except Exception as e:
+ error_msg = f"Failed to delete {key}: {e}"
+ logger.error(error_msg, exc_info=e)
+ deletion_errors.append(error_msg)
- # non-transactional, files marked for deletion not actually deleted is possible
- await transcripts_controller.update(transcript, {"audio_deleted": True})
- # 2. Delete processed audio from transcript storage S3 bucket
if transcript.audio_location == "storage":
storage = get_transcripts_storage()
try:
@@ -630,18 +629,28 @@ async def cleanup_consent(transcript: Transcript, logger: Logger):
f"Deleted processed audio from storage: {transcript.storage_audio_path}"
)
except Exception as e:
- logger.error(f"Failed to delete processed audio: {e}", exc_info=e)
+ error_msg = f"Failed to delete processed audio: {e}"
+ logger.error(error_msg, exc_info=e)
+ deletion_errors.append(error_msg)
- # 3. Delete local audio files
try:
if hasattr(transcript, "audio_mp3_filename") and transcript.audio_mp3_filename:
transcript.audio_mp3_filename.unlink(missing_ok=True)
if hasattr(transcript, "audio_wav_filename") and transcript.audio_wav_filename:
transcript.audio_wav_filename.unlink(missing_ok=True)
except Exception as e:
- logger.error(f"Failed to delete local audio files: {e}", exc_info=e)
+ error_msg = f"Failed to delete local audio files: {e}"
+ logger.error(error_msg, exc_info=e)
+ deletion_errors.append(error_msg)
- logger.info("Consent cleanup done")
+ if deletion_errors:
+ logger.warning(
+ f"Consent cleanup completed with {len(deletion_errors)} errors",
+ errors=deletion_errors,
+ )
+ else:
+ await transcripts_controller.update(transcript, {"audio_deleted": True})
+ logger.info("Consent cleanup done - all audio deleted")
@get_transcript
diff --git a/server/reflector/pipelines/main_multitrack_pipeline.py b/server/reflector/pipelines/main_multitrack_pipeline.py
new file mode 100644
index 00000000..addcd9b4
--- /dev/null
+++ b/server/reflector/pipelines/main_multitrack_pipeline.py
@@ -0,0 +1,694 @@
+import asyncio
+import math
+import tempfile
+from fractions import Fraction
+from pathlib import Path
+
+import av
+from av.audio.resampler import AudioResampler
+from celery import chain, shared_task
+
+from reflector.asynctask import asynctask
+from reflector.db.transcripts import (
+ TranscriptStatus,
+ TranscriptWaveform,
+ transcripts_controller,
+)
+from reflector.logger import logger
+from reflector.pipelines import topic_processing
+from reflector.pipelines.main_file_pipeline import task_send_webhook_if_needed
+from reflector.pipelines.main_live_pipeline import (
+ PipelineMainBase,
+ broadcast_to_sockets,
+ task_cleanup_consent,
+ task_pipeline_post_to_zulip,
+)
+from reflector.pipelines.transcription_helpers import transcribe_file_with_processor
+from reflector.processors import AudioFileWriterProcessor
+from reflector.processors.audio_waveform_processor import AudioWaveformProcessor
+from reflector.processors.types import TitleSummary
+from reflector.processors.types import Transcript as TranscriptType
+from reflector.storage import Storage, get_transcripts_storage
+from reflector.utils.string import NonEmptyString
+
+# Audio encoding constants
+OPUS_STANDARD_SAMPLE_RATE = 48000
+OPUS_DEFAULT_BIT_RATE = 128000
+
+# Storage operation constants
+PRESIGNED_URL_EXPIRATION_SECONDS = 7200 # 2 hours
+
+
+class PipelineMainMultitrack(PipelineMainBase):
+ def __init__(self, transcript_id: str):
+ super().__init__(transcript_id=transcript_id)
+ self.logger = logger.bind(transcript_id=self.transcript_id)
+ self.empty_pipeline = topic_processing.EmptyPipeline(logger=self.logger)
+
+ async def pad_track_for_transcription(
+ self,
+ track_url: NonEmptyString,
+ track_idx: int,
+ storage: Storage,
+ ) -> NonEmptyString:
+ """
+ Pad a single track with silence based on stream metadata start_time.
+ Downloads from S3 presigned URL, processes via PyAV using tempfile, uploads to S3.
+ Returns presigned URL of padded track (or original URL if no padding needed).
+
+ Memory usage:
+ - Pattern: fixed_overhead(2-5MB) for PyAV codec/filters
+ - PyAV streams input efficiently (no full download, verified)
+ - Output written to tempfile (disk-based, not memory)
+ - Upload streams from file handle (boto3 chunks, typically 5-10MB)
+
+ Daily.co raw-tracks timing - Two approaches:
+
+ CURRENT APPROACH (PyAV metadata):
+ The WebM stream.start_time field encodes MEETING-RELATIVE timing:
+ - t=0: When Daily.co recording started (first participant joined)
+ - start_time=8.13s: This participant's track began 8.13s after recording started
+ - Purpose: Enables track alignment without external manifest files
+
+ This is NOT:
+ - Stream-internal offset (first packet timestamp relative to stream start)
+ - Absolute/wall-clock time
+ - Recording duration
+
+ ALTERNATIVE APPROACH (filename parsing):
+ Daily.co filenames contain Unix timestamps (milliseconds):
+ Format: {recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}.webm
+ Example: 1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm
+
+ Can calculate offset: (track_start_ts - recording_start_ts) / 1000
+ - Track 0: (1760988935922 - 1760988935484) / 1000 = 0.438s
+ - Track 1: (1760988943823 - 1760988935484) / 1000 = 8.339s
+
+ TIME DIFFERENCE: PyAV metadata vs filename timestamps differ by ~209ms:
+ - Track 0: filename=438ms, metadata=229ms (diff: 209ms)
+ - Track 1: filename=8339ms, metadata=8130ms (diff: 209ms)
+
+ Consistent delta suggests network/encoding delay. PyAV metadata is ground truth
+ (represents when audio stream actually started vs when file upload initiated).
+
+ Example with 2 participants:
+ Track A: start_time=0.2s → Joined 200ms after recording began
+ Track B: start_time=8.1s → Joined 8.1 seconds later
+
+ After padding:
+ Track A: [0.2s silence] + [speech...]
+ Track B: [8.1s silence] + [speech...]
+
+ Whisper transcription timestamps are now synchronized:
+ Track A word at 5.0s → happened at meeting t=5.0s
+ Track B word at 10.0s → happened at meeting t=10.0s
+
+ Merging just sorts by timestamp - no offset calculation needed.
+
+        Padding involves re-encoding as a side effect, and that re-encoding matters for Daily.co + Whisper:
+        Daily.co returns recordings with skipped frames (e.g. while the microphone is muted), and
+        Whisper does not account for those gaps, which skews transcription timestamps.
+        Re-encoding restores the missing frames as silence. We do padding and re-encoding together
+        because it is convenient and more performant: the padded tracks are needed for the final MP3 mixdown anyway.
+ """
+
+ transcript = await self.get_transcript()
+
+ try:
+ # PyAV streams input from S3 URL efficiently (2-5MB fixed overhead for codec/filters)
+ with av.open(track_url) as in_container:
+ start_time_seconds = self._extract_stream_start_time_from_container(
+ in_container, track_idx
+ )
+
+ if start_time_seconds <= 0:
+ self.logger.info(
+ f"Track {track_idx} requires no padding (start_time={start_time_seconds}s)",
+ track_idx=track_idx,
+ )
+ return track_url
+
+ # Use tempfile instead of BytesIO for better memory efficiency
+ # Reduces peak memory usage during encoding/upload
+ with tempfile.NamedTemporaryFile(
+ suffix=".webm", delete=False
+ ) as temp_file:
+ temp_path = temp_file.name
+
+ try:
+ self._apply_audio_padding_to_file(
+ in_container, temp_path, start_time_seconds, track_idx
+ )
+
+ storage_path = (
+ f"file_pipeline/{transcript.id}/tracks/padded_{track_idx}.webm"
+ )
+
+ # Upload using file handle for streaming
+ with open(temp_path, "rb") as padded_file:
+ await storage.put_file(storage_path, padded_file)
+ finally:
+ # Clean up temp file
+ Path(temp_path).unlink(missing_ok=True)
+
+ padded_url = await storage.get_file_url(
+ storage_path,
+ operation="get_object",
+ expires_in=PRESIGNED_URL_EXPIRATION_SECONDS,
+ )
+
+ self.logger.info(
+ f"Successfully padded track {track_idx}",
+ track_idx=track_idx,
+ start_time_seconds=start_time_seconds,
+ padded_url=padded_url,
+ )
+
+ return padded_url
+
+ except Exception as e:
+ self.logger.error(
+ f"Failed to process track {track_idx}",
+ track_idx=track_idx,
+ url=track_url,
+ error=str(e),
+ exc_info=True,
+ )
+ raise Exception(
+ f"Track {track_idx} padding failed - transcript would have incorrect timestamps"
+ ) from e
+
+ def _extract_stream_start_time_from_container(
+ self, container, track_idx: int
+ ) -> float:
+ """
+ Extract meeting-relative start time from WebM stream metadata.
+ Uses PyAV to read stream.start_time from WebM container.
+ More accurate than filename timestamps by ~209ms due to network/encoding delays.
+ """
+ start_time_seconds = 0.0
+ try:
+ audio_streams = [s for s in container.streams if s.type == "audio"]
+ stream = audio_streams[0] if audio_streams else container.streams[0]
+
+ # 1) Try stream-level start_time (most reliable for Daily.co tracks)
+ if stream.start_time is not None and stream.time_base is not None:
+ start_time_seconds = float(stream.start_time * stream.time_base)
+
+ # 2) Fallback to container-level start_time (in av.time_base units)
+ if (start_time_seconds <= 0) and (container.start_time is not None):
+ start_time_seconds = float(container.start_time * av.time_base)
+
+ # 3) Fallback to first packet DTS in stream.time_base
+ if start_time_seconds <= 0:
+ for packet in container.demux(stream):
+ if packet.dts is not None:
+ start_time_seconds = float(packet.dts * stream.time_base)
+ break
+ except Exception as e:
+ self.logger.warning(
+ "PyAV metadata read failed; assuming 0 start_time",
+ track_idx=track_idx,
+ error=str(e),
+ )
+ start_time_seconds = 0.0
+
+ self.logger.info(
+ f"Track {track_idx} stream metadata: start_time={start_time_seconds:.3f}s",
+ track_idx=track_idx,
+ )
+ return start_time_seconds
+
+ def _apply_audio_padding_to_file(
+ self,
+ in_container,
+ output_path: str,
+ start_time_seconds: float,
+ track_idx: int,
+ ) -> None:
+ """Apply silence padding to audio track using PyAV filter graph, writing to file"""
+ delay_ms = math.floor(start_time_seconds * 1000)
+
+ self.logger.info(
+ f"Padding track {track_idx} with {delay_ms}ms delay using PyAV",
+ track_idx=track_idx,
+ delay_ms=delay_ms,
+ )
+
+ try:
+ with av.open(output_path, "w", format="webm") as out_container:
+ in_stream = next(
+ (s for s in in_container.streams if s.type == "audio"), None
+ )
+ if in_stream is None:
+ raise Exception("No audio stream in input")
+
+ out_stream = out_container.add_stream(
+ "libopus", rate=OPUS_STANDARD_SAMPLE_RATE
+ )
+ out_stream.bit_rate = OPUS_DEFAULT_BIT_RATE
+ graph = av.filter.Graph()
+
+ abuf_args = (
+ f"time_base=1/{OPUS_STANDARD_SAMPLE_RATE}:"
+ f"sample_rate={OPUS_STANDARD_SAMPLE_RATE}:"
+ f"sample_fmt=s16:"
+ f"channel_layout=stereo"
+ )
+ src = graph.add("abuffer", args=abuf_args, name="src")
+ aresample_f = graph.add("aresample", args="async=1", name="ares")
+ # adelay requires one delay value per channel separated by '|'
+ delays_arg = f"{delay_ms}|{delay_ms}"
+ adelay_f = graph.add(
+ "adelay", args=f"delays={delays_arg}:all=1", name="delay"
+ )
+ sink = graph.add("abuffersink", name="sink")
+
+ src.link_to(aresample_f)
+ aresample_f.link_to(adelay_f)
+ adelay_f.link_to(sink)
+ graph.configure()
+
+ resampler = AudioResampler(
+ format="s16", layout="stereo", rate=OPUS_STANDARD_SAMPLE_RATE
+ )
+ # Decode -> resample -> push through graph -> encode Opus
+ for frame in in_container.decode(in_stream):
+ out_frames = resampler.resample(frame) or []
+ for rframe in out_frames:
+ rframe.sample_rate = OPUS_STANDARD_SAMPLE_RATE
+ rframe.time_base = Fraction(1, OPUS_STANDARD_SAMPLE_RATE)
+ src.push(rframe)
+
+ while True:
+ try:
+ f_out = sink.pull()
+ except Exception:
+ break
+ f_out.sample_rate = OPUS_STANDARD_SAMPLE_RATE
+ f_out.time_base = Fraction(1, OPUS_STANDARD_SAMPLE_RATE)
+ for packet in out_stream.encode(f_out):
+ out_container.mux(packet)
+
+ src.push(None)
+ while True:
+ try:
+ f_out = sink.pull()
+ except Exception:
+ break
+ f_out.sample_rate = OPUS_STANDARD_SAMPLE_RATE
+ f_out.time_base = Fraction(1, OPUS_STANDARD_SAMPLE_RATE)
+ for packet in out_stream.encode(f_out):
+ out_container.mux(packet)
+
+ for packet in out_stream.encode(None):
+ out_container.mux(packet)
+ except Exception as e:
+ self.logger.error(
+ "PyAV padding failed for track",
+ track_idx=track_idx,
+ delay_ms=delay_ms,
+ error=str(e),
+ exc_info=True,
+ )
+ raise
+
+ async def mixdown_tracks(
+ self,
+ track_urls: list[str],
+ writer: AudioFileWriterProcessor,
+ offsets_seconds: list[float] | None = None,
+ ) -> None:
+ """Multi-track mixdown using PyAV filter graph (amix), reading from S3 presigned URLs"""
+
+ target_sample_rate: int | None = None
+ for url in track_urls:
+ if not url:
+ continue
+ container = None
+ try:
+ container = av.open(url)
+ for frame in container.decode(audio=0):
+ target_sample_rate = frame.sample_rate
+ break
+ except Exception:
+ continue
+ finally:
+ if container is not None:
+ container.close()
+ if target_sample_rate:
+ break
+
+ if not target_sample_rate:
+ self.logger.error("Mixdown failed - no decodable audio frames found")
+ raise Exception("Mixdown failed: No decodable audio frames in any track")
+ # Build PyAV filter graph:
+ # N abuffer (s32/stereo)
+ # -> optional adelay per input (for alignment)
+ # -> amix (s32)
+ # -> aformat(s16)
+ # -> sink
+ graph = av.filter.Graph()
+ inputs = []
+ valid_track_urls = [url for url in track_urls if url]
+ input_offsets_seconds = None
+ if offsets_seconds is not None:
+ input_offsets_seconds = [
+ offsets_seconds[i] for i, url in enumerate(track_urls) if url
+ ]
+ for idx, url in enumerate(valid_track_urls):
+ args = (
+ f"time_base=1/{target_sample_rate}:"
+ f"sample_rate={target_sample_rate}:"
+ f"sample_fmt=s32:"
+ f"channel_layout=stereo"
+ )
+ in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
+ inputs.append(in_ctx)
+
+ if not inputs:
+ self.logger.error("Mixdown failed - no valid inputs for graph")
+ raise Exception("Mixdown failed: No valid inputs for filter graph")
+
+ mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")
+
+ fmt = graph.add(
+ "aformat",
+ args=(
+ f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}"
+ ),
+ name="fmt",
+ )
+
+ sink = graph.add("abuffersink", name="out")
+
+ # Optional per-input delay before mixing
+ delays_ms: list[int] = []
+ if input_offsets_seconds is not None:
+ base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
+ delays_ms = [
+ max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds
+ ]
+ else:
+ delays_ms = [0 for _ in inputs]
+
+ for idx, in_ctx in enumerate(inputs):
+ delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
+ if delay_ms > 0:
+ # adelay requires one value per channel; use same for stereo
+ adelay = graph.add(
+ "adelay",
+ args=f"delays={delay_ms}|{delay_ms}:all=1",
+ name=f"delay{idx}",
+ )
+ in_ctx.link_to(adelay)
+ adelay.link_to(mixer, 0, idx)
+ else:
+ in_ctx.link_to(mixer, 0, idx)
+ mixer.link_to(fmt)
+ fmt.link_to(sink)
+ graph.configure()
+
+ containers = []
+ try:
+ # Open all containers with cleanup guaranteed
+ for i, url in enumerate(valid_track_urls):
+ try:
+ c = av.open(url)
+ containers.append(c)
+ except Exception as e:
+ self.logger.warning(
+ "Mixdown: failed to open container from URL",
+ input=i,
+ url=url,
+ error=str(e),
+ )
+
+ if not containers:
+ self.logger.error("Mixdown failed - no valid containers opened")
+ raise Exception("Mixdown failed: Could not open any track containers")
+
+ decoders = [c.decode(audio=0) for c in containers]
+ active = [True] * len(decoders)
+ resamplers = [
+ AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
+ for _ in decoders
+ ]
+
+ while any(active):
+ for i, (dec, is_active) in enumerate(zip(decoders, active)):
+ if not is_active:
+ continue
+ try:
+ frame = next(dec)
+ except StopIteration:
+ active[i] = False
+ continue
+
+ if frame.sample_rate != target_sample_rate:
+ continue
+ out_frames = resamplers[i].resample(frame) or []
+ for rf in out_frames:
+ rf.sample_rate = target_sample_rate
+ rf.time_base = Fraction(1, target_sample_rate)
+ inputs[i].push(rf)
+
+ while True:
+ try:
+ mixed = sink.pull()
+ except Exception:
+ break
+ mixed.sample_rate = target_sample_rate
+ mixed.time_base = Fraction(1, target_sample_rate)
+ await writer.push(mixed)
+
+ for in_ctx in inputs:
+ in_ctx.push(None)
+ while True:
+ try:
+ mixed = sink.pull()
+ except Exception:
+ break
+ mixed.sample_rate = target_sample_rate
+ mixed.time_base = Fraction(1, target_sample_rate)
+ await writer.push(mixed)
+ finally:
+ # Cleanup all containers, even if processing failed
+ for c in containers:
+ if c is not None:
+ try:
+ c.close()
+ except Exception:
+ pass # Best effort cleanup
+
+ @broadcast_to_sockets
+ async def set_status(self, transcript_id: str, status: TranscriptStatus):
+ async with self.lock_transaction():
+ return await transcripts_controller.set_status(transcript_id, status)
+
+ async def on_waveform(self, data):
+ async with self.transaction():
+ waveform = TranscriptWaveform(waveform=data)
+ transcript = await self.get_transcript()
+ return await transcripts_controller.append_event(
+ transcript=transcript, event="WAVEFORM", data=waveform
+ )
+
+ async def process(self, bucket_name: str, track_keys: list[str]):
+ transcript = await self.get_transcript()
+ async with self.transaction():
+ await transcripts_controller.update(
+ transcript,
+ {
+ "events": [],
+ "topics": [],
+ },
+ )
+
+ source_storage = get_transcripts_storage()
+ transcript_storage = source_storage
+
+ track_urls: list[str] = []
+ for key in track_keys:
+ url = await source_storage.get_file_url(
+ key,
+ operation="get_object",
+ expires_in=PRESIGNED_URL_EXPIRATION_SECONDS,
+ bucket=bucket_name,
+ )
+ track_urls.append(url)
+ self.logger.info(
+ f"Generated presigned URL for track from {bucket_name}",
+ key=key,
+ )
+
+ created_padded_files = set()
+ padded_track_urls: list[str] = []
+ for idx, url in enumerate(track_urls):
+ padded_url = await self.pad_track_for_transcription(
+ url, idx, transcript_storage
+ )
+ padded_track_urls.append(padded_url)
+ if padded_url != url:
+ storage_path = f"file_pipeline/{transcript.id}/tracks/padded_{idx}.webm"
+ created_padded_files.add(storage_path)
+ self.logger.info(f"Track {idx} processed, padded URL: {padded_url}")
+
+ transcript.data_path.mkdir(parents=True, exist_ok=True)
+
+ mp3_writer = AudioFileWriterProcessor(
+ path=str(transcript.audio_mp3_filename),
+ on_duration=self.on_duration,
+ )
+ await self.mixdown_tracks(padded_track_urls, mp3_writer, offsets_seconds=None)
+ await mp3_writer.flush()
+
+ if not transcript.audio_mp3_filename.exists():
+ raise Exception(
+ "Mixdown failed - no MP3 file generated. Cannot proceed without playable audio."
+ )
+
+ storage_path = f"{transcript.id}/audio.mp3"
+ # Use file handle streaming to avoid loading entire MP3 into memory
+ mp3_size = transcript.audio_mp3_filename.stat().st_size
+ with open(transcript.audio_mp3_filename, "rb") as mp3_file:
+ await transcript_storage.put_file(storage_path, mp3_file)
+ mp3_url = await transcript_storage.get_file_url(storage_path)
+
+ await transcripts_controller.update(transcript, {"audio_location": "storage"})
+
+ self.logger.info(
+ f"Uploaded mixed audio to storage",
+ storage_path=storage_path,
+ size=mp3_size,
+ url=mp3_url,
+ )
+
+ self.logger.info("Generating waveform from mixed audio")
+ waveform_processor = AudioWaveformProcessor(
+ audio_path=transcript.audio_mp3_filename,
+ waveform_path=transcript.audio_waveform_filename,
+ on_waveform=self.on_waveform,
+ )
+ waveform_processor.set_pipeline(self.empty_pipeline)
+ await waveform_processor.flush()
+ self.logger.info("Waveform generated successfully")
+
+ speaker_transcripts: list[TranscriptType] = []
+ for idx, padded_url in enumerate(padded_track_urls):
+ if not padded_url:
+ continue
+
+ t = await self.transcribe_file(padded_url, transcript.source_language)
+
+ if not t.words:
+ continue
+
+ for w in t.words:
+ w.speaker = idx
+
+ speaker_transcripts.append(t)
+ self.logger.info(
+ f"Track {idx} transcribed successfully with {len(t.words)} words",
+ track_idx=idx,
+ )
+
+ valid_track_count = len([url for url in padded_track_urls if url])
+ if valid_track_count > 0 and len(speaker_transcripts) != valid_track_count:
+ raise Exception(
+ f"Only {len(speaker_transcripts)}/{valid_track_count} tracks transcribed successfully. "
+ f"All tracks must succeed to avoid incomplete transcripts."
+ )
+
+ if not speaker_transcripts:
+ raise Exception("No valid track transcriptions")
+
+ self.logger.info(f"Cleaning up {len(created_padded_files)} temporary S3 files")
+ cleanup_tasks = []
+ for storage_path in created_padded_files:
+ cleanup_tasks.append(transcript_storage.delete_file(storage_path))
+
+ if cleanup_tasks:
+ cleanup_results = await asyncio.gather(
+ *cleanup_tasks, return_exceptions=True
+ )
+ for storage_path, result in zip(created_padded_files, cleanup_results):
+ if isinstance(result, Exception):
+ self.logger.warning(
+ "Failed to cleanup temporary padded track",
+ storage_path=storage_path,
+ error=str(result),
+ )
+
+ merged_words = []
+ for t in speaker_transcripts:
+ merged_words.extend(t.words)
+ merged_words.sort(
+ key=lambda w: w.start if hasattr(w, "start") and w.start is not None else 0
+ )
+
+ merged_transcript = TranscriptType(words=merged_words, translation=None)
+
+ await self.on_transcript(merged_transcript)
+
+ topics = await self.detect_topics(merged_transcript, transcript.target_language)
+ await asyncio.gather(
+ self.generate_title(topics),
+ self.generate_summaries(topics),
+ return_exceptions=False,
+ )
+
+ await self.set_status(transcript.id, "ended")
+
+ async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType:
+ return await transcribe_file_with_processor(audio_url, language)
+
+ async def detect_topics(
+ self, transcript: TranscriptType, target_language: str
+ ) -> list[TitleSummary]:
+ return await topic_processing.detect_topics(
+ transcript,
+ target_language,
+ on_topic_callback=self.on_topic,
+ empty_pipeline=self.empty_pipeline,
+ )
+
+ async def generate_title(self, topics: list[TitleSummary]):
+ return await topic_processing.generate_title(
+ topics,
+ on_title_callback=self.on_title,
+ empty_pipeline=self.empty_pipeline,
+ logger=self.logger,
+ )
+
+ async def generate_summaries(self, topics: list[TitleSummary]):
+ transcript = await self.get_transcript()
+ return await topic_processing.generate_summaries(
+ topics,
+ transcript,
+ on_long_summary_callback=self.on_long_summary,
+ on_short_summary_callback=self.on_short_summary,
+ empty_pipeline=self.empty_pipeline,
+ logger=self.logger,
+ )
+
+
+@shared_task
+@asynctask
+async def task_pipeline_multitrack_process(
+ *, transcript_id: str, bucket_name: str, track_keys: list[str]
+):
+ pipeline = PipelineMainMultitrack(transcript_id=transcript_id)
+ try:
+ await pipeline.set_status(transcript_id, "processing")
+ await pipeline.process(bucket_name, track_keys)
+ except Exception:
+ await pipeline.set_status(transcript_id, "error")
+ raise
+
+ post_chain = chain(
+ task_cleanup_consent.si(transcript_id=transcript_id),
+ task_pipeline_post_to_zulip.si(transcript_id=transcript_id),
+ task_send_webhook_if_needed.si(transcript_id=transcript_id),
+ )
+ post_chain.delay()
diff --git a/server/reflector/pipelines/topic_processing.py b/server/reflector/pipelines/topic_processing.py
new file mode 100644
index 00000000..7f055025
--- /dev/null
+++ b/server/reflector/pipelines/topic_processing.py
@@ -0,0 +1,109 @@
+"""
+Topic processing utilities
+==========================
+
+Shared topic detection, title generation, and summarization logic
+used across file and multitrack pipelines.
+"""
+
+from typing import Callable
+
+import structlog
+
+from reflector.db.transcripts import Transcript
+from reflector.processors import (
+ TranscriptFinalSummaryProcessor,
+ TranscriptFinalTitleProcessor,
+ TranscriptTopicDetectorProcessor,
+)
+from reflector.processors.types import TitleSummary
+from reflector.processors.types import Transcript as TranscriptType
+
+
+class EmptyPipeline:
+ def __init__(self, logger: structlog.BoundLogger):
+ self.logger = logger
+
+ def get_pref(self, k, d=None):
+ return d
+
+ async def emit(self, event):
+ pass
+
+
+async def detect_topics(
+ transcript: TranscriptType,
+ target_language: str,
+ *,
+ on_topic_callback: Callable,
+ empty_pipeline: EmptyPipeline,
+) -> list[TitleSummary]:
+ chunk_size = 300
+ topics: list[TitleSummary] = []
+
+ async def on_topic(topic: TitleSummary):
+ topics.append(topic)
+ return await on_topic_callback(topic)
+
+ topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic)
+ topic_detector.set_pipeline(empty_pipeline)
+
+ for i in range(0, len(transcript.words), chunk_size):
+ chunk_words = transcript.words[i : i + chunk_size]
+ if not chunk_words:
+ continue
+
+ chunk_transcript = TranscriptType(
+ words=chunk_words, translation=transcript.translation
+ )
+
+ await topic_detector.push(chunk_transcript)
+
+ await topic_detector.flush()
+ return topics
+
+
+async def generate_title(
+ topics: list[TitleSummary],
+ *,
+ on_title_callback: Callable,
+ empty_pipeline: EmptyPipeline,
+ logger: structlog.BoundLogger,
+):
+ if not topics:
+ logger.warning("No topics for title generation")
+ return
+
+ processor = TranscriptFinalTitleProcessor(callback=on_title_callback)
+ processor.set_pipeline(empty_pipeline)
+
+ for topic in topics:
+ await processor.push(topic)
+
+ await processor.flush()
+
+
+async def generate_summaries(
+ topics: list[TitleSummary],
+ transcript: Transcript,
+ *,
+ on_long_summary_callback: Callable,
+ on_short_summary_callback: Callable,
+ empty_pipeline: EmptyPipeline,
+ logger: structlog.BoundLogger,
+):
+ if not topics:
+ logger.warning("No topics for summary generation")
+ return
+
+ processor = TranscriptFinalSummaryProcessor(
+ transcript=transcript,
+ callback=on_long_summary_callback,
+ on_short_summary=on_short_summary_callback,
+ )
+ processor.set_pipeline(empty_pipeline)
+
+ for topic in topics:
+ await processor.push(topic)
+
+ await processor.flush()
diff --git a/server/reflector/pipelines/transcription_helpers.py b/server/reflector/pipelines/transcription_helpers.py
new file mode 100644
index 00000000..b0cc5858
--- /dev/null
+++ b/server/reflector/pipelines/transcription_helpers.py
@@ -0,0 +1,34 @@
+from reflector.processors.file_transcript import FileTranscriptInput
+from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
+from reflector.processors.types import Transcript as TranscriptType
+
+
+async def transcribe_file_with_processor(
+ audio_url: str,
+ language: str,
+ processor_name: str | None = None,
+) -> TranscriptType:
+ processor = (
+ FileTranscriptAutoProcessor(name=processor_name)
+ if processor_name
+ else FileTranscriptAutoProcessor()
+ )
+ input_data = FileTranscriptInput(audio_url=audio_url, language=language)
+
+ result: TranscriptType | None = None
+
+ async def capture_result(transcript):
+ nonlocal result
+ result = transcript
+
+ processor.on(capture_result)
+ await processor.push(input_data)
+ await processor.flush()
+
+ if not result:
+ processor_label = processor_name or "default"
+ raise ValueError(
+ f"No transcript captured from {processor_label} processor for audio: {audio_url}"
+ )
+
+ return result
diff --git a/server/reflector/processors/summary/summary_builder.py b/server/reflector/processors/summary/summary_builder.py
index efcf9227..df348093 100644
--- a/server/reflector/processors/summary/summary_builder.py
+++ b/server/reflector/processors/summary/summary_builder.py
@@ -165,6 +165,7 @@ class SummaryBuilder:
self.llm: LLM = llm
self.model_name: str = llm.model_name
self.logger = logger or structlog.get_logger()
+ self.participant_instructions: str | None = None
if filename:
self.read_transcript_from_file(filename)
@@ -191,14 +192,61 @@ class SummaryBuilder:
self, prompt: str, output_cls: Type[T], tone_name: str | None = None
) -> T:
"""Generic function to get structured output from LLM for non-function-calling models."""
+ # Add participant instructions to the prompt if available
+ enhanced_prompt = self._enhance_prompt_with_participants(prompt)
return await self.llm.get_structured_response(
- prompt, [self.transcript], output_cls, tone_name=tone_name
+ enhanced_prompt, [self.transcript], output_cls, tone_name=tone_name
)
+ async def _get_response(
+ self, prompt: str, texts: list[str], tone_name: str | None = None
+ ) -> str:
+ """Get text response with automatic participant instructions injection."""
+ enhanced_prompt = self._enhance_prompt_with_participants(prompt)
+ return await self.llm.get_response(enhanced_prompt, texts, tone_name=tone_name)
+
+ def _enhance_prompt_with_participants(self, prompt: str) -> str:
+ """Add participant instructions to any prompt if participants are known."""
+ if self.participant_instructions:
+ self.logger.debug("Adding participant instructions to prompt")
+ return f"{prompt}\n\n{self.participant_instructions}"
+ return prompt
+
# ----------------------------------------------------------------------------
# Participants
# ----------------------------------------------------------------------------
+ def set_known_participants(self, participants: list[str]) -> None:
+ """
+ Set known participants directly without LLM identification.
+ This is used when participants are already identified and stored.
+ They are appended at the end of the transcript, providing more context for the assistant.
+ """
+ if not participants:
+ self.logger.warning("No participants provided")
+ return
+
+ self.logger.info(
+ "Using known participants",
+ participants=participants,
+ )
+
+ participants_md = self.format_list_md(participants)
+ self.transcript += f"\n\n# Participants\n\n{participants_md}"
+
+ # Set instructions that will be automatically added to all prompts
+ participants_list = ", ".join(participants)
+ self.participant_instructions = dedent(
+ f"""
+ # IMPORTANT: Participant Names
+ The following participants are identified in this conversation: {participants_list}
+
+ You MUST use these specific participant names when referring to people in your response.
+ Do NOT use generic terms like "a participant", "someone", "attendee", "Speaker 1", "Speaker 2", etc.
+ Always refer to people by their actual names (e.g., "John suggested..." not "A participant suggested...").
+ """
+ ).strip()
+
async def identify_participants(self) -> None:
"""
From a transcript, try to identify the participants using TreeSummarize with structured output.
@@ -232,6 +280,19 @@ class SummaryBuilder:
if unique_participants:
participants_md = self.format_list_md(unique_participants)
self.transcript += f"\n\n# Participants\n\n{participants_md}"
+
+ # Set instructions that will be automatically added to all prompts
+ participants_list = ", ".join(unique_participants)
+ self.participant_instructions = dedent(
+ f"""
+ # IMPORTANT: Participant Names
+ The following participants are identified in this conversation: {participants_list}
+
+ You MUST use these specific participant names when referring to people in your response.
+ Do NOT use generic terms like "a participant", "someone", "attendee", "Speaker 1", "Speaker 2", etc.
+ Always refer to people by their actual names (e.g., "John suggested..." not "A participant suggested...").
+ """
+ ).strip()
else:
self.logger.warning("No participants identified in the transcript")
@@ -318,13 +379,13 @@ class SummaryBuilder:
for subject in self.subjects:
detailed_prompt = DETAILED_SUBJECT_PROMPT_TEMPLATE.format(subject=subject)
- detailed_response = await self.llm.get_response(
+ detailed_response = await self._get_response(
detailed_prompt, [self.transcript], tone_name="Topic assistant"
)
paragraph_prompt = PARAGRAPH_SUMMARY_PROMPT
- paragraph_response = await self.llm.get_response(
+ paragraph_response = await self._get_response(
paragraph_prompt, [str(detailed_response)], tone_name="Topic summarizer"
)
@@ -345,7 +406,7 @@ class SummaryBuilder:
recap_prompt = RECAP_PROMPT
- recap_response = await self.llm.get_response(
+ recap_response = await self._get_response(
recap_prompt, [summaries_text], tone_name="Recap summarizer"
)
diff --git a/server/reflector/processors/transcript_final_summary.py b/server/reflector/processors/transcript_final_summary.py
index 0b4a594c..dfe07aad 100644
--- a/server/reflector/processors/transcript_final_summary.py
+++ b/server/reflector/processors/transcript_final_summary.py
@@ -26,7 +26,25 @@ class TranscriptFinalSummaryProcessor(Processor):
async def get_summary_builder(self, text) -> SummaryBuilder:
builder = SummaryBuilder(self.llm, logger=self.logger)
builder.set_transcript(text)
- await builder.identify_participants()
+
+ # Use known participants if available, otherwise identify them
+ if self.transcript and self.transcript.participants:
+ # Extract participant names from the stored participants
+ participant_names = [p.name for p in self.transcript.participants if p.name]
+ if participant_names:
+ self.logger.info(
+ f"Using {len(participant_names)} known participants from transcript"
+ )
+ builder.set_known_participants(participant_names)
+ else:
+ self.logger.info(
+ "Participants field exists but is empty, identifying participants"
+ )
+ await builder.identify_participants()
+ else:
+ self.logger.info("No participants stored, identifying participants")
+ await builder.identify_participants()
+
await builder.generate_summary()
return builder
@@ -49,18 +67,30 @@ class TranscriptFinalSummaryProcessor(Processor):
speakermap = {}
if self.transcript:
speakermap = {
- participant["speaker"]: participant["name"]
- for participant in self.transcript.participants
+ p.speaker: p.name
+ for p in (self.transcript.participants or [])
+ if p.speaker is not None and p.name
}
+ self.logger.info(
+ f"Built speaker map with {len(speakermap)} participants",
+ speakermap=speakermap,
+ )
# build the transcript as a single string
- # XXX: unsure if the participants name as replaced directly in speaker ?
+ # Replace speaker IDs with actual participant names if available
text_transcript = []
+ unique_speakers = set()
for topic in self.chunks:
for segment in topic.transcript.as_segments():
name = speakermap.get(segment.speaker, f"Speaker {segment.speaker}")
+ unique_speakers.add((segment.speaker, name))
text_transcript.append(f"{name}: {segment.text}")
+ self.logger.info(
+ f"Built transcript with {len(unique_speakers)} unique speakers",
+ speakers=list(unique_speakers),
+ )
+
text_transcript = "\n".join(text_transcript)
last_chunk = self.chunks[-1]
diff --git a/server/reflector/processors/transcript_topic_detector.py b/server/reflector/processors/transcript_topic_detector.py
index 317e2d9c..695d3af3 100644
--- a/server/reflector/processors/transcript_topic_detector.py
+++ b/server/reflector/processors/transcript_topic_detector.py
@@ -1,6 +1,6 @@
from textwrap import dedent
-from pydantic import BaseModel, Field
+from pydantic import AliasChoices, BaseModel, Field
from reflector.llm import LLM
from reflector.processors.base import Processor
@@ -36,15 +36,13 @@ class TopicResponse(BaseModel):
title: str = Field(
description="A descriptive title for the topic being discussed",
- validation_alias="Title",
+ validation_alias=AliasChoices("title", "Title"),
)
summary: str = Field(
description="A concise 1-2 sentence summary of the discussion",
- validation_alias="Summary",
+ validation_alias=AliasChoices("summary", "Summary"),
)
- model_config = {"populate_by_name": True}
-
class TranscriptTopicDetectorProcessor(Processor):
"""
diff --git a/server/reflector/schemas/platform.py b/server/reflector/schemas/platform.py
new file mode 100644
index 00000000..7b945841
--- /dev/null
+++ b/server/reflector/schemas/platform.py
@@ -0,0 +1,5 @@
+from typing import Literal
+
+Platform = Literal["whereby", "daily"]
+WHEREBY_PLATFORM: Platform = "whereby"
+DAILY_PLATFORM: Platform = "daily"
diff --git a/server/reflector/settings.py b/server/reflector/settings.py
index 9659f648..0e3fb3f7 100644
--- a/server/reflector/settings.py
+++ b/server/reflector/settings.py
@@ -1,6 +1,7 @@
from pydantic.types import PositiveInt
from pydantic_settings import BaseSettings, SettingsConfigDict
+from reflector.schemas.platform import WHEREBY_PLATFORM, Platform
from reflector.utils.string import NonEmptyString
@@ -47,14 +48,17 @@ class Settings(BaseSettings):
TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID: str | None = None
TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None
- # Recording storage
- RECORDING_STORAGE_BACKEND: str | None = None
+ # Platform-specific recording storage (follows {PREFIX}_STORAGE_AWS_{CREDENTIAL} pattern)
+ # Whereby storage configuration
+ WHEREBY_STORAGE_AWS_BUCKET_NAME: str | None = None
+ WHEREBY_STORAGE_AWS_REGION: str | None = None
+ WHEREBY_STORAGE_AWS_ACCESS_KEY_ID: str | None = None
+ WHEREBY_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None
- # Recording storage configuration for AWS
- RECORDING_STORAGE_AWS_BUCKET_NAME: str = "recording-bucket"
- RECORDING_STORAGE_AWS_REGION: str = "us-east-1"
- RECORDING_STORAGE_AWS_ACCESS_KEY_ID: str | None = None
- RECORDING_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None
+ # Daily.co storage configuration
+ DAILYCO_STORAGE_AWS_BUCKET_NAME: str | None = None
+ DAILYCO_STORAGE_AWS_REGION: str | None = None
+ DAILYCO_STORAGE_AWS_ROLE_ARN: str | None = None
# Translate into the target language
TRANSLATION_BACKEND: str = "passthrough"
@@ -124,11 +128,20 @@ class Settings(BaseSettings):
WHEREBY_API_URL: str = "https://api.whereby.dev/v1"
WHEREBY_API_KEY: NonEmptyString | None = None
WHEREBY_WEBHOOK_SECRET: str | None = None
- AWS_WHEREBY_ACCESS_KEY_ID: str | None = None
- AWS_WHEREBY_ACCESS_KEY_SECRET: str | None = None
AWS_PROCESS_RECORDING_QUEUE_URL: str | None = None
SQS_POLLING_TIMEOUT_SECONDS: int = 60
+ # Daily.co integration
+ DAILY_API_KEY: str | None = None
+ DAILY_WEBHOOK_SECRET: str | None = None
+ DAILY_SUBDOMAIN: str | None = None
+ DAILY_WEBHOOK_UUID: str | None = (
+ None # Webhook UUID for this environment. Not used by production code
+ )
+
+ # Platform Configuration
+ DEFAULT_VIDEO_PLATFORM: Platform = WHEREBY_PLATFORM
+
# Zulip integration
ZULIP_REALM: str | None = None
ZULIP_API_KEY: str | None = None
diff --git a/server/reflector/storage/__init__.py b/server/reflector/storage/__init__.py
index 3db8a77b..aff6c767 100644
--- a/server/reflector/storage/__init__.py
+++ b/server/reflector/storage/__init__.py
@@ -3,6 +3,13 @@ from reflector.settings import settings
def get_transcripts_storage() -> Storage:
+ """
+ Get storage for processed transcript files (master credentials).
+
+ Also use this for ALL our file operations with bucket override:
+ master = get_transcripts_storage()
+ await master.delete_file(key, bucket=recording.bucket_name)
+ """
assert settings.TRANSCRIPT_STORAGE_BACKEND
return Storage.get_instance(
name=settings.TRANSCRIPT_STORAGE_BACKEND,
@@ -10,8 +17,53 @@ def get_transcripts_storage() -> Storage:
)
-def get_recordings_storage() -> Storage:
+def get_whereby_storage() -> Storage:
+ """
+ Get storage config for Whereby (for passing to Whereby API).
+
+ Usage:
+ whereby_storage = get_whereby_storage()
+ key_id, secret = whereby_storage.key_credentials
+ whereby_api.create_meeting(
+ bucket=whereby_storage.bucket_name,
+ access_key_id=key_id,
+ secret=secret,
+ )
+
+ Do NOT use for our file operations - use get_transcripts_storage() instead.
+ """
+ if not settings.WHEREBY_STORAGE_AWS_BUCKET_NAME:
+ raise ValueError(
+ "WHEREBY_STORAGE_AWS_BUCKET_NAME required for Whereby with AWS storage"
+ )
+
return Storage.get_instance(
- name=settings.RECORDING_STORAGE_BACKEND,
- settings_prefix="RECORDING_STORAGE_",
+ name="aws",
+ settings_prefix="WHEREBY_STORAGE_",
+ )
+
+
+def get_dailyco_storage() -> Storage:
+ """
+ Get storage config for Daily.co (for passing to Daily API).
+
+ Usage:
+ daily_storage = get_dailyco_storage()
+ daily_api.create_meeting(
+ bucket=daily_storage.bucket_name,
+ region=daily_storage.region,
+ role_arn=daily_storage.role_credential,
+ )
+
+ Do NOT use for our file operations - use get_transcripts_storage() instead.
+ """
+ # Fail fast if platform-specific config missing
+ if not settings.DAILYCO_STORAGE_AWS_BUCKET_NAME:
+ raise ValueError(
+ "DAILYCO_STORAGE_AWS_BUCKET_NAME required for Daily.co with AWS storage"
+ )
+
+ return Storage.get_instance(
+ name="aws",
+ settings_prefix="DAILYCO_STORAGE_",
)
diff --git a/server/reflector/storage/base.py b/server/reflector/storage/base.py
index 360930d8..ba4316d8 100644
--- a/server/reflector/storage/base.py
+++ b/server/reflector/storage/base.py
@@ -1,10 +1,23 @@
import importlib
+from typing import BinaryIO, Union
from pydantic import BaseModel
from reflector.settings import settings
+class StorageError(Exception):
+ """Base exception for storage operations."""
+
+ pass
+
+
+class StoragePermissionError(StorageError):
+ """Exception raised when storage operation fails due to permission issues."""
+
+ pass
+
+
class FileResult(BaseModel):
filename: str
url: str
@@ -36,26 +49,113 @@ class Storage:
return cls._registry[name](**config)
- async def put_file(self, filename: str, data: bytes) -> FileResult:
- return await self._put_file(filename, data)
-
- async def _put_file(self, filename: str, data: bytes) -> FileResult:
+ # Credential properties for API passthrough
+ @property
+ def bucket_name(self) -> str:
+ """Default bucket name for this storage instance."""
raise NotImplementedError
- async def delete_file(self, filename: str):
- return await self._delete_file(filename)
-
- async def _delete_file(self, filename: str):
+ @property
+ def region(self) -> str:
+ """AWS region for this storage instance."""
raise NotImplementedError
- async def get_file_url(self, filename: str) -> str:
- return await self._get_file_url(filename)
+ @property
+ def access_key_id(self) -> str | None:
+ """AWS access key ID (None for role-based auth). Prefer key_credentials property."""
+ return None
- async def _get_file_url(self, filename: str) -> str:
+ @property
+ def secret_access_key(self) -> str | None:
+ """AWS secret access key (None for role-based auth). Prefer key_credentials property."""
+ return None
+
+ @property
+ def role_arn(self) -> str | None:
+ """AWS IAM role ARN for role-based auth (None for key-based auth). Prefer role_credential property."""
+ return None
+
+ @property
+ def key_credentials(self) -> tuple[str, str]:
+ """
+ Get (access_key_id, secret_access_key) for key-based auth.
+ Raises ValueError if storage uses IAM role instead.
+ """
raise NotImplementedError
- async def get_file(self, filename: str):
- return await self._get_file(filename)
-
- async def _get_file(self, filename: str):
+ @property
+ def role_credential(self) -> str:
+ """
+ Get IAM role ARN for role-based auth.
+ Raises ValueError if storage uses access keys instead.
+ """
+ raise NotImplementedError
+
+ async def put_file(
+ self, filename: str, data: Union[bytes, BinaryIO], *, bucket: str | None = None
+ ) -> FileResult:
+ """Upload data. bucket: override instance default if provided."""
+ return await self._put_file(filename, data, bucket=bucket)
+
+ async def _put_file(
+ self, filename: str, data: Union[bytes, BinaryIO], *, bucket: str | None = None
+ ) -> FileResult:
+ raise NotImplementedError
+
+ async def delete_file(self, filename: str, *, bucket: str | None = None):
+ """Delete file. bucket: override instance default if provided."""
+ return await self._delete_file(filename, bucket=bucket)
+
+ async def _delete_file(self, filename: str, *, bucket: str | None = None):
+ raise NotImplementedError
+
+ async def get_file_url(
+ self,
+ filename: str,
+ operation: str = "get_object",
+ expires_in: int = 3600,
+ *,
+ bucket: str | None = None,
+ ) -> str:
+ """Generate presigned URL. bucket: override instance default if provided."""
+ return await self._get_file_url(filename, operation, expires_in, bucket=bucket)
+
+ async def _get_file_url(
+ self,
+ filename: str,
+ operation: str = "get_object",
+ expires_in: int = 3600,
+ *,
+ bucket: str | None = None,
+ ) -> str:
+ raise NotImplementedError
+
+ async def get_file(self, filename: str, *, bucket: str | None = None):
+ """Download file. bucket: override instance default if provided."""
+ return await self._get_file(filename, bucket=bucket)
+
+ async def _get_file(self, filename: str, *, bucket: str | None = None):
+ raise NotImplementedError
+
+ async def list_objects(
+ self, prefix: str = "", *, bucket: str | None = None
+ ) -> list[str]:
+ """List object keys. bucket: override instance default if provided."""
+ return await self._list_objects(prefix, bucket=bucket)
+
+ async def _list_objects(
+ self, prefix: str = "", *, bucket: str | None = None
+ ) -> list[str]:
+ raise NotImplementedError
+
+ async def stream_to_fileobj(
+ self, filename: str, fileobj: BinaryIO, *, bucket: str | None = None
+ ):
+ """Stream file directly to file object without loading into memory.
+ bucket: override instance default if provided."""
+ return await self._stream_to_fileobj(filename, fileobj, bucket=bucket)
+
+ async def _stream_to_fileobj(
+ self, filename: str, fileobj: BinaryIO, *, bucket: str | None = None
+ ):
raise NotImplementedError
diff --git a/server/reflector/storage/storage_aws.py b/server/reflector/storage/storage_aws.py
index de9ccf35..372af4aa 100644
--- a/server/reflector/storage/storage_aws.py
+++ b/server/reflector/storage/storage_aws.py
@@ -1,79 +1,236 @@
+from functools import wraps
+from typing import BinaryIO, Union
+
import aioboto3
+from botocore.config import Config
+from botocore.exceptions import ClientError
from reflector.logger import logger
-from reflector.storage.base import FileResult, Storage
+from reflector.storage.base import FileResult, Storage, StoragePermissionError
+
+
+def handle_s3_client_errors(operation_name: str):
+ """Decorator to handle S3 ClientError with bucket-aware messaging.
+
+ Args:
+ operation_name: Human-readable operation name for error messages (e.g., "upload", "delete")
+ """
+
+ def decorator(func):
+ @wraps(func)
+ async def wrapper(self, *args, **kwargs):
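+ # "bucket" is keyword-only in every storage method, so an override always arrives via kwargs.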
+ bucket = kwargs.get("bucket")
+ try:
+ return await func(self, *args, **kwargs)
+ except ClientError as e:
+ error_code = e.response.get("Error", {}).get("Code")
+ if error_code in ("AccessDenied", "NoSuchBucket"):
+ actual_bucket = bucket or self._bucket_name
+ bucket_context = (
+ f"overridden bucket '{actual_bucket}'"
+ if bucket
+ else f"default bucket '{actual_bucket}'"
+ )
+ raise StoragePermissionError(
+ f"S3 {operation_name} failed for {bucket_context}: {error_code}. "
+ f"Check TRANSCRIPT_STORAGE_AWS_* credentials have permission."
+ ) from e
+ raise
+
+ return wrapper
+
+ return decorator
class AwsStorage(Storage):
+ """AWS S3 storage with bucket override for multi-platform recording architecture.
+ Master credentials access all buckets via optional bucket parameter in operations."""
+
def __init__(
self,
- aws_access_key_id: str,
- aws_secret_access_key: str,
aws_bucket_name: str,
aws_region: str,
+ aws_access_key_id: str | None = None,
+ aws_secret_access_key: str | None = None,
+ aws_role_arn: str | None = None,
):
- if not aws_access_key_id:
- raise ValueError("Storage `aws_storage` require `aws_access_key_id`")
- if not aws_secret_access_key:
- raise ValueError("Storage `aws_storage` require `aws_secret_access_key`")
if not aws_bucket_name:
raise ValueError("Storage `aws_storage` require `aws_bucket_name`")
if not aws_region:
raise ValueError("Storage `aws_storage` require `aws_region`")
+ if not aws_access_key_id and not aws_role_arn:
+ raise ValueError(
+ "Storage `aws_storage` require either `aws_access_key_id` or `aws_role_arn`"
+ )
+ if aws_role_arn and (aws_access_key_id or aws_secret_access_key):
+ raise ValueError(
+ "Storage `aws_storage` cannot use both `aws_role_arn` and access keys"
+ )
super().__init__()
- self.aws_bucket_name = aws_bucket_name
+ self._bucket_name = aws_bucket_name
+ self._region = aws_region
+ self._access_key_id = aws_access_key_id
+ self._secret_access_key = aws_secret_access_key
+ self._role_arn = aws_role_arn
+
self.aws_folder = ""
if "/" in aws_bucket_name:
- self.aws_bucket_name, self.aws_folder = aws_bucket_name.split("/", 1)
+ self._bucket_name, self.aws_folder = aws_bucket_name.split("/", 1)
+ self.boto_config = Config(retries={"max_attempts": 3, "mode": "adaptive"})
self.session = aioboto3.Session(
aws_access_key_id=aws_access_key_id,
aws_secret_access_key=aws_secret_access_key,
region_name=aws_region,
)
- self.base_url = f"https://{aws_bucket_name}.s3.amazonaws.com/"
+ self.base_url = f"https://{self._bucket_name}.s3.amazonaws.com/"
- async def _put_file(self, filename: str, data: bytes) -> FileResult:
- bucket = self.aws_bucket_name
- folder = self.aws_folder
- logger.info(f"Uploading {filename} to S3 {bucket}/{folder}")
- s3filename = f"{folder}/{filename}" if folder else filename
- async with self.session.client("s3") as client:
- await client.put_object(
- Bucket=bucket,
- Key=s3filename,
- Body=data,
+ # Implement credential properties
+ @property
+ def bucket_name(self) -> str:
+ return self._bucket_name
+
+ @property
+ def region(self) -> str:
+ return self._region
+
+ @property
+ def access_key_id(self) -> str | None:
+ return self._access_key_id
+
+ @property
+ def secret_access_key(self) -> str | None:
+ return self._secret_access_key
+
+ @property
+ def role_arn(self) -> str | None:
+ return self._role_arn
+
+ @property
+ def key_credentials(self) -> tuple[str, str]:
+ """Get (access_key_id, secret_access_key) for key-based auth."""
+ if self._role_arn:
+ raise ValueError(
+ "Storage uses IAM role authentication. "
+ "Use role_credential property instead of key_credentials."
)
+ if not self._access_key_id or not self._secret_access_key:
+ raise ValueError("Storage access key credentials not configured")
+ return (self._access_key_id, self._secret_access_key)
- async def _get_file_url(self, filename: str) -> FileResult:
- bucket = self.aws_bucket_name
+ @property
+ def role_credential(self) -> str:
+ """Get IAM role ARN for role-based auth."""
+ if self._access_key_id or self._secret_access_key:
+ raise ValueError(
+ "Storage uses access key authentication. "
+ "Use key_credentials property instead of role_credential."
+ )
+ if not self._role_arn:
+ raise ValueError("Storage IAM role ARN not configured")
+ return self._role_arn
+
+ @handle_s3_client_errors("upload")
+ async def _put_file(
+ self, filename: str, data: Union[bytes, BinaryIO], *, bucket: str | None = None
+ ) -> FileResult:
+ actual_bucket = bucket or self._bucket_name
folder = self.aws_folder
s3filename = f"{folder}/{filename}" if folder else filename
- async with self.session.client("s3") as client:
+ logger.info(f"Uploading {filename} to S3 {actual_bucket}/{folder}")
+
+ async with self.session.client("s3", config=self.boto_config) as client:
+ if isinstance(data, bytes):
+ await client.put_object(Bucket=actual_bucket, Key=s3filename, Body=data)
+ else:
+ # boto3 reads the file-like object in chunks, avoiding an extra
+ # in-memory copy compared to passing bytes via getvalue()
+ await client.upload_fileobj(data, Bucket=actual_bucket, Key=s3filename)
+
+ url = await self._get_file_url(filename, bucket=bucket)
+ return FileResult(filename=filename, url=url)
+
+ @handle_s3_client_errors("presign")
+ async def _get_file_url(
+ self,
+ filename: str,
+ operation: str = "get_object",
+ expires_in: int = 3600,
+ *,
+ bucket: str | None = None,
+ ) -> str:
+ actual_bucket = bucket or self._bucket_name
+ folder = self.aws_folder
+ s3filename = f"{folder}/{filename}" if folder else filename
+ async with self.session.client("s3", config=self.boto_config) as client:
presigned_url = await client.generate_presigned_url(
- "get_object",
- Params={"Bucket": bucket, "Key": s3filename},
- ExpiresIn=3600,
+ operation,
+ Params={"Bucket": actual_bucket, "Key": s3filename},
+ ExpiresIn=expires_in,
)
return presigned_url
- async def _delete_file(self, filename: str):
- bucket = self.aws_bucket_name
+ @handle_s3_client_errors("delete")
+ async def _delete_file(self, filename: str, *, bucket: str | None = None):
+ actual_bucket = bucket or self._bucket_name
folder = self.aws_folder
- logger.info(f"Deleting {filename} from S3 {bucket}/{folder}")
+ logger.info(f"Deleting {filename} from S3 {actual_bucket}/{folder}")
s3filename = f"{folder}/{filename}" if folder else filename
- async with self.session.client("s3") as client:
- await client.delete_object(Bucket=bucket, Key=s3filename)
+ async with self.session.client("s3", config=self.boto_config) as client:
+ await client.delete_object(Bucket=actual_bucket, Key=s3filename)
- async def _get_file(self, filename: str):
- bucket = self.aws_bucket_name
+ @handle_s3_client_errors("download")
+ async def _get_file(self, filename: str, *, bucket: str | None = None):
+ actual_bucket = bucket or self._bucket_name
folder = self.aws_folder
- logger.info(f"Downloading {filename} from S3 {bucket}/{folder}")
+ logger.info(f"Downloading {filename} from S3 {actual_bucket}/{folder}")
s3filename = f"{folder}/{filename}" if folder else filename
- async with self.session.client("s3") as client:
- response = await client.get_object(Bucket=bucket, Key=s3filename)
+ async with self.session.client("s3", config=self.boto_config) as client:
+ response = await client.get_object(Bucket=actual_bucket, Key=s3filename)
return await response["Body"].read()
+ @handle_s3_client_errors("list_objects")
+ async def _list_objects(
+ self, prefix: str = "", *, bucket: str | None = None
+ ) -> list[str]:
+ actual_bucket = bucket or self._bucket_name
+ folder = self.aws_folder
+ # Combine folder and prefix
+ s3prefix = f"{folder}/{prefix}" if folder else prefix
+ logger.info(f"Listing objects from S3 {actual_bucket} with prefix '{s3prefix}'")
+
+ keys = []
+ async with self.session.client("s3", config=self.boto_config) as client:
+ paginator = client.get_paginator("list_objects_v2")
+ async for page in paginator.paginate(Bucket=actual_bucket, Prefix=s3prefix):
+ if "Contents" in page:
+ for obj in page["Contents"]:
+ # Strip folder prefix from keys if present
+ key = obj["Key"]
+ if folder:
+ if key.startswith(f"{folder}/"):
+ key = key[len(folder) + 1 :]
+ elif key == folder:
+ # Skip folder marker itself
+ continue
+ keys.append(key)
+
+ return keys
+
+ @handle_s3_client_errors("stream")
+ async def _stream_to_fileobj(
+ self, filename: str, fileobj: BinaryIO, *, bucket: str | None = None
+ ):
+ """Stream file from S3 directly to file object without loading into memory."""
+ actual_bucket = bucket or self._bucket_name
+ folder = self.aws_folder
+ logger.info(f"Streaming {filename} from S3 {actual_bucket}/{folder}")
+ s3filename = f"{folder}/{filename}" if folder else filename
+ async with self.session.client("s3", config=self.boto_config) as client:
+ await client.download_fileobj(
+ Bucket=actual_bucket, Key=s3filename, Fileobj=fileobj
+ )
+
Storage.register("aws", AwsStorage)
diff --git a/server/reflector/utils/daily.py b/server/reflector/utils/daily.py
new file mode 100644
index 00000000..1c3b367c
--- /dev/null
+++ b/server/reflector/utils/daily.py
@@ -0,0 +1,26 @@
+from reflector.utils.string import NonEmptyString
+
+DailyRoomName = str
+
+
+def extract_base_room_name(daily_room_name: DailyRoomName) -> NonEmptyString:
+ """
+ Extract base room name from Daily.co timestamped room name.
+
+ Reflector creates Daily.co rooms with a timestamp suffix: {base_name}-YYYYMMDDHHMMSS
+ This function removes the timestamp to get the original room name.
+
+ Examples:
+ "daily-20251020193458" → "daily"
+ "daily-2-20251020193458" → "daily-2"
+ "my-room-name-20251020193458" → "my-room-name"
+
+ Args:
+ daily_room_name: Full Daily.co room name with optional timestamp
+
+ Returns:
+ Base room name without timestamp suffix
+ """
+ base_name = daily_room_name.rsplit("-", 1)[0]
+ assert base_name, f"Extracted base name is empty from: {daily_room_name}"
+ return base_name
diff --git a/server/reflector/utils/datetime.py b/server/reflector/utils/datetime.py
new file mode 100644
index 00000000..d416412f
--- /dev/null
+++ b/server/reflector/utils/datetime.py
@@ -0,0 +1,9 @@
+from datetime import datetime, timezone
+
+
+def parse_datetime_with_timezone(iso_string: str) -> datetime:
+ """Parse ISO datetime string and ensure timezone awareness (defaults to UTC if naive)."""
+ dt = datetime.fromisoformat(iso_string)
+ if dt.tzinfo is None:
+ dt = dt.replace(tzinfo=timezone.utc)
+ return dt
diff --git a/server/reflector/utils/string.py b/server/reflector/utils/string.py
index 05f40e30..ae4277c5 100644
--- a/server/reflector/utils/string.py
+++ b/server/reflector/utils/string.py
@@ -1,4 +1,4 @@
-from typing import Annotated
+from typing import Annotated, TypeVar
from pydantic import Field, TypeAdapter, constr
@@ -21,3 +21,12 @@ def try_parse_non_empty_string(s: str) -> NonEmptyString | None:
if not s:
return None
return parse_non_empty_string(s)
+
+
+T = TypeVar("T", bound=str)
+
+
+def assert_equal(s1: T, s2: T) -> T:
+ if s1 != s2:
+ raise ValueError(f"assert_equal: {s1} != {s2}")
+ return s1
diff --git a/server/reflector/utils/url.py b/server/reflector/utils/url.py
new file mode 100644
index 00000000..e49a4cb0
--- /dev/null
+++ b/server/reflector/utils/url.py
@@ -0,0 +1,37 @@
+"""URL manipulation utilities."""
+
+from urllib.parse import parse_qs, urlencode, urlparse, urlunparse
+
+
+def add_query_param(url: str, key: str, value: str) -> str:
+ """
+ Add or update a query parameter in a URL.
+
+ Properly handles URLs with or without existing query parameters,
+ preserving fragments and encoding special characters.
+
+ Args:
+ url: The URL to modify
+ key: The query parameter name
+ value: The query parameter value
+
+ Returns:
+ The URL with the query parameter added or updated
+
+ Examples:
+ >>> add_query_param("https://example.com/room", "t", "token123")
+ 'https://example.com/room?t=token123'
+
+ >>> add_query_param("https://example.com/room?existing=param", "t", "token123")
+ 'https://example.com/room?existing=param&t=token123'
+ """
+ parsed = urlparse(url)
+
+ query_params = parse_qs(parsed.query, keep_blank_values=True)
+
+ query_params[key] = [value]
+
+ new_query = urlencode(query_params, doseq=True)
+
+ new_parsed = parsed._replace(query=new_query)
+ return urlunparse(new_parsed)
diff --git a/server/reflector/video_platforms/__init__.py b/server/reflector/video_platforms/__init__.py
new file mode 100644
index 00000000..dcbdc45b
--- /dev/null
+++ b/server/reflector/video_platforms/__init__.py
@@ -0,0 +1,11 @@
+from .base import VideoPlatformClient
+from .models import MeetingData, VideoPlatformConfig
+from .registry import get_platform_client, register_platform
+
+__all__ = [
+ "VideoPlatformClient",
+ "VideoPlatformConfig",
+ "MeetingData",
+ "get_platform_client",
+ "register_platform",
+]
diff --git a/server/reflector/video_platforms/base.py b/server/reflector/video_platforms/base.py
new file mode 100644
index 00000000..d208a75a
--- /dev/null
+++ b/server/reflector/video_platforms/base.py
@@ -0,0 +1,54 @@
+from abc import ABC, abstractmethod
+from datetime import datetime
+from typing import TYPE_CHECKING, Any, Dict, List, Optional
+
+from ..schemas.platform import Platform
+from ..utils.string import NonEmptyString
+from .models import MeetingData, VideoPlatformConfig
+
+if TYPE_CHECKING:
+ from reflector.db.rooms import Room
+
+# The separator can also appear inside the room name itself, so it does not unambiguously delimit the prefix.
+ROOM_PREFIX_SEPARATOR = "-"
+
+
+class VideoPlatformClient(ABC):
+ PLATFORM_NAME: Platform
+
+ def __init__(self, config: VideoPlatformConfig):
+ self.config = config
+
+ @abstractmethod
+ async def create_meeting(
+ self, room_name_prefix: NonEmptyString, end_date: datetime, room: "Room"
+ ) -> MeetingData:
+ pass
+
+ @abstractmethod
+ async def get_room_sessions(self, room_name: str) -> List[Any] | None:
+ pass
+
+ @abstractmethod
+ async def delete_room(self, room_name: str) -> bool:
+ pass
+
+ @abstractmethod
+ async def upload_logo(self, room_name: str, logo_path: str) -> bool:
+ pass
+
+ @abstractmethod
+ def verify_webhook_signature(
+ self, body: bytes, signature: str, timestamp: Optional[str] = None
+ ) -> bool:
+ pass
+
+ def format_recording_config(self, room: "Room") -> Dict[str, Any]:
+ if room.recording_type == "cloud" and self.config.s3_bucket:
+ return {
+ "type": room.recording_type,
+ "bucket": self.config.s3_bucket,
+ "region": self.config.s3_region,
+ "trigger": room.recording_trigger,
+ }
+ return {"type": room.recording_type}
diff --git a/server/reflector/video_platforms/daily.py b/server/reflector/video_platforms/daily.py
new file mode 100644
index 00000000..ec45d965
--- /dev/null
+++ b/server/reflector/video_platforms/daily.py
@@ -0,0 +1,198 @@
+import base64
+import hmac
+from datetime import datetime
+from hashlib import sha256
+from http import HTTPStatus
+from typing import Any, Dict, List, Optional
+
+import httpx
+
+from reflector.db.rooms import Room
+from reflector.logger import logger
+from reflector.storage import get_dailyco_storage
+
+from ..schemas.platform import Platform
+from ..utils.daily import DailyRoomName
+from ..utils.string import NonEmptyString
+from .base import ROOM_PREFIX_SEPARATOR, VideoPlatformClient
+from .models import MeetingData, RecordingType, VideoPlatformConfig
+
+
+class DailyClient(VideoPlatformClient):
+ PLATFORM_NAME: Platform = "daily"
+ TIMEOUT = 10
+ BASE_URL = "https://api.daily.co/v1"
+ TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"
+ RECORDING_NONE: RecordingType = "none"
+ RECORDING_CLOUD: RecordingType = "cloud"
+
+ def __init__(self, config: VideoPlatformConfig):
+ super().__init__(config)
+ self.headers = {
+ "Authorization": f"Bearer {config.api_key}",
+ "Content-Type": "application/json",
+ }
+
+ async def create_meeting(
+ self, room_name_prefix: NonEmptyString, end_date: datetime, room: Room
+ ) -> MeetingData:
+ """
+ Daily.co rooms vs meetings:
+ - We create a NEW Daily.co room for each Reflector meeting
+ - The Daily.co meeting/session starts automatically when the first participant joins
+ - The room auto-deletes after its exp time
+ - Meeting.room_name stores the timestamped Daily.co room name
+ """
+ timestamp = datetime.now().strftime(self.TIMESTAMP_FORMAT)
+ room_name = f"{room_name_prefix}{ROOM_PREFIX_SEPARATOR}{timestamp}"
+
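+ # "raw-tracks" makes Daily.co store each participant's tracks as separate S3 objects, which the multitrack pipeline consumes.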
+ data = {
+ "name": room_name,
+ "privacy": "private" if room.is_locked else "public",
+ "properties": {
+ "enable_recording": "raw-tracks"
+ if room.recording_type != self.RECORDING_NONE
+ else False,
+ "enable_chat": True,
+ "enable_screenshare": True,
+ "start_video_off": False,
+ "start_audio_off": False,
+ "exp": int(end_date.timestamp()),
+ },
+ }
+
+ # Get storage config for passing to Daily API
+ daily_storage = get_dailyco_storage()
+ assert daily_storage.bucket_name, "S3 bucket must be configured"
+ data["properties"]["recordings_bucket"] = {
+ "bucket_name": daily_storage.bucket_name,
+ "bucket_region": daily_storage.region,
+ "assume_role_arn": daily_storage.role_credential,
+ "allow_api_access": True,
+ }
+
+ async with httpx.AsyncClient() as client:
+ response = await client.post(
+ f"{self.BASE_URL}/rooms",
+ headers=self.headers,
+ json=data,
+ timeout=self.TIMEOUT,
+ )
+ if response.status_code >= 400:
+ logger.error(
+ "Daily.co API error",
+ status_code=response.status_code,
+ response_body=response.text,
+ request_data=data,
+ )
+ response.raise_for_status()
+ result = response.json()
+
+ room_url = result["url"]
+
+ return MeetingData(
+ meeting_id=result["id"],
+ room_name=result["name"],
+ room_url=room_url,
+ host_room_url=room_url,
+ platform=self.PLATFORM_NAME,
+ extra_data=result,
+ )
+
+ async def get_room_sessions(self, room_name: str) -> List[Any] | None:
+ # Daily.co has no equivalent room-sessions API
+ return None
+
+ async def get_room_presence(self, room_name: str) -> Dict[str, Any]:
+ async with httpx.AsyncClient() as client:
+ response = await client.get(
+ f"{self.BASE_URL}/rooms/{room_name}/presence",
+ headers=self.headers,
+ timeout=self.TIMEOUT,
+ )
+ response.raise_for_status()
+ return response.json()
+
+ async def get_meeting_participants(self, meeting_id: str) -> Dict[str, Any]:
+ async with httpx.AsyncClient() as client:
+ response = await client.get(
+ f"{self.BASE_URL}/meetings/{meeting_id}/participants",
+ headers=self.headers,
+ timeout=self.TIMEOUT,
+ )
+ response.raise_for_status()
+ return response.json()
+
+ async def get_recording(self, recording_id: str) -> Dict[str, Any]:
+ async with httpx.AsyncClient() as client:
+ response = await client.get(
+ f"{self.BASE_URL}/recordings/{recording_id}",
+ headers=self.headers,
+ timeout=self.TIMEOUT,
+ )
+ response.raise_for_status()
+ return response.json()
+
+ async def delete_room(self, room_name: str) -> bool:
+ async with httpx.AsyncClient() as client:
+ response = await client.delete(
+ f"{self.BASE_URL}/rooms/{room_name}",
+ headers=self.headers,
+ timeout=self.TIMEOUT,
+ )
+ return response.status_code in (HTTPStatus.OK, HTTPStatus.NOT_FOUND)
+
+ async def upload_logo(self, room_name: str, logo_path: str) -> bool:
+ return True
+
+ def verify_webhook_signature(
+ self, body: bytes, signature: str, timestamp: Optional[str] = None
+ ) -> bool:
+ """Verify Daily.co webhook signature.
+
+ Daily.co uses:
+ - X-Webhook-Signature header
+ - X-Webhook-Timestamp header
+ - Signature format: HMAC-SHA256(base64_decode(secret), timestamp + '.' + body)
+ - Result is base64 encoded
+ """
+ if not signature or not timestamp:
+ return False
+
+ try:
+ secret_bytes = base64.b64decode(self.config.webhook_secret)
+
+ signed_content = timestamp.encode() + b"." + body
+
+ expected = hmac.new(secret_bytes, signed_content, sha256).digest()
+ expected_b64 = base64.b64encode(expected).decode()
+
+ return hmac.compare_digest(expected_b64, signature)
+ except Exception as e:
+ logger.error("Daily.co webhook signature verification failed", exc_info=e)
+ return False
+
+ async def create_meeting_token(
+ self,
+ room_name: DailyRoomName,
+ enable_recording: bool,
+ user_id: Optional[str] = None,
+ ) -> str:
+ data = {"properties": {"room_name": room_name}}
+
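+ # When recording is enabled, the token auto-starts cloud recording and disables the in-call recording UI.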
+ if enable_recording:
+ data["properties"]["start_cloud_recording"] = True
+ data["properties"]["enable_recording_ui"] = False
+
+ if user_id:
+ data["properties"]["user_id"] = user_id
+
+ async with httpx.AsyncClient() as client:
+ response = await client.post(
+ f"{self.BASE_URL}/meeting-tokens",
+ headers=self.headers,
+ json=data,
+ timeout=self.TIMEOUT,
+ )
+ response.raise_for_status()
+ return response.json()["token"]
diff --git a/server/reflector/video_platforms/factory.py b/server/reflector/video_platforms/factory.py
new file mode 100644
index 00000000..172d45e7
--- /dev/null
+++ b/server/reflector/video_platforms/factory.py
@@ -0,0 +1,62 @@
+from typing import Optional
+
+from reflector.settings import settings
+from reflector.storage import get_dailyco_storage, get_whereby_storage
+
+from ..schemas.platform import WHEREBY_PLATFORM, Platform
+from .base import VideoPlatformClient, VideoPlatformConfig
+from .registry import get_platform_client
+
+
+def get_platform_config(platform: Platform) -> VideoPlatformConfig:
+ if platform == WHEREBY_PLATFORM:
+ if not settings.WHEREBY_API_KEY:
+ raise ValueError(
+ "WHEREBY_API_KEY is required when platform='whereby'. "
+ "Set WHEREBY_API_KEY environment variable."
+ )
+ whereby_storage = get_whereby_storage()
+ key_id, secret = whereby_storage.key_credentials
+ return VideoPlatformConfig(
+ api_key=settings.WHEREBY_API_KEY,
+ webhook_secret=settings.WHEREBY_WEBHOOK_SECRET or "",
+ api_url=settings.WHEREBY_API_URL,
+ s3_bucket=whereby_storage.bucket_name,
+ s3_region=whereby_storage.region,
+ aws_access_key_id=key_id,
+ aws_access_key_secret=secret,
+ )
+ elif platform == "daily":
+ if not settings.DAILY_API_KEY:
+ raise ValueError(
+ "DAILY_API_KEY is required when platform='daily'. "
+ "Set DAILY_API_KEY environment variable."
+ )
+ if not settings.DAILY_SUBDOMAIN:
+ raise ValueError(
+ "DAILY_SUBDOMAIN is required when platform='daily'. "
+ "Set DAILY_SUBDOMAIN environment variable."
+ )
+ daily_storage = get_dailyco_storage()
+ return VideoPlatformConfig(
+ api_key=settings.DAILY_API_KEY,
+ webhook_secret=settings.DAILY_WEBHOOK_SECRET or "",
+ subdomain=settings.DAILY_SUBDOMAIN,
+ s3_bucket=daily_storage.bucket_name,
+ s3_region=daily_storage.region,
+ aws_role_arn=daily_storage.role_credential,
+ )
+ else:
+ raise ValueError(f"Unknown platform: {platform}")
+
+
+def create_platform_client(platform: Platform) -> VideoPlatformClient:
+ config = get_platform_config(platform)
+ return get_platform_client(platform, config)
+
+
+def get_platform(room_platform: Optional[Platform] = None) -> Platform:
+ if room_platform:
+ return room_platform
+
+ return settings.DEFAULT_VIDEO_PLATFORM
diff --git a/server/reflector/video_platforms/models.py b/server/reflector/video_platforms/models.py
new file mode 100644
index 00000000..82876888
--- /dev/null
+++ b/server/reflector/video_platforms/models.py
@@ -0,0 +1,40 @@
+from typing import Any, Dict, Literal, Optional
+
+from pydantic import BaseModel, Field
+
+from reflector.schemas.platform import WHEREBY_PLATFORM, Platform
+
+RecordingType = Literal["none", "local", "cloud"]
+
+
+class MeetingData(BaseModel):
+ platform: Platform
+ meeting_id: str = Field(description="Platform-specific meeting identifier")
+ room_url: str = Field(description="URL for participants to join")
+ host_room_url: str = Field(description="URL for hosts (may be same as room_url)")
+ room_name: str = Field(description="Human-readable room name")
+ extra_data: Dict[str, Any] = Field(default_factory=dict)
+
+ class Config:
+ json_schema_extra = {
+ "example": {
+ "platform": WHEREBY_PLATFORM,
+ "meeting_id": "12345678",
+ "room_url": "https://subdomain.whereby.com/room-20251008120000",
+ "host_room_url": "https://subdomain.whereby.com/room-20251008120000?roomKey=abc123",
+ "room_name": "room-20251008120000",
+ }
+ }
+
+
+class VideoPlatformConfig(BaseModel):
+ api_key: str
+ webhook_secret: str
+ api_url: Optional[str] = None
+ subdomain: Optional[str] = None # Whereby/Daily subdomain
+ s3_bucket: Optional[str] = None
+ s3_region: Optional[str] = None
+ # Whereby uses access keys, Daily uses IAM role
+ aws_access_key_id: Optional[str] = None
+ aws_access_key_secret: Optional[str] = None
+ aws_role_arn: Optional[str] = None
diff --git a/server/reflector/video_platforms/registry.py b/server/reflector/video_platforms/registry.py
new file mode 100644
index 00000000..b4c10697
--- /dev/null
+++ b/server/reflector/video_platforms/registry.py
@@ -0,0 +1,35 @@
+from typing import Dict, Type
+
+from ..schemas.platform import DAILY_PLATFORM, WHEREBY_PLATFORM, Platform
+from .base import VideoPlatformClient, VideoPlatformConfig
+
+_PLATFORMS: Dict[Platform, Type[VideoPlatformClient]] = {}
+
+
+def register_platform(name: Platform, client_class: Type[VideoPlatformClient]):
+ _PLATFORMS[name] = client_class
+
+
+def get_platform_client(
+ platform: Platform, config: VideoPlatformConfig
+) -> VideoPlatformClient:
+ if platform not in _PLATFORMS:
+ raise ValueError(f"Unknown video platform: {platform}")
+
+ client_class = _PLATFORMS[platform]
+ return client_class(config)
+
+
+def get_available_platforms() -> list[Platform]:
+ return list(_PLATFORMS.keys())
+
+
+def _register_builtin_platforms():
+ from .daily import DailyClient # noqa: PLC0415
+ from .whereby import WherebyClient # noqa: PLC0415
+
+ register_platform(WHEREBY_PLATFORM, WherebyClient)
+ register_platform(DAILY_PLATFORM, DailyClient)
+
+
+_register_builtin_platforms()
diff --git a/server/reflector/video_platforms/whereby.py b/server/reflector/video_platforms/whereby.py
new file mode 100644
index 00000000..f856454a
--- /dev/null
+++ b/server/reflector/video_platforms/whereby.py
@@ -0,0 +1,141 @@
+import hmac
+import json
+import re
+import time
+from datetime import datetime
+from hashlib import sha256
+from typing import Any, Optional
+
+import httpx
+
+from reflector.db.rooms import Room
+from reflector.storage import get_whereby_storage
+
+from ..schemas.platform import WHEREBY_PLATFORM, Platform
+from ..utils.string import NonEmptyString
+from .base import (
+ MeetingData,
+ VideoPlatformClient,
+ VideoPlatformConfig,
+)
+from .whereby_utils import whereby_room_name_prefix
+
+
+class WherebyClient(VideoPlatformClient):
+ PLATFORM_NAME: Platform = WHEREBY_PLATFORM
+ TIMEOUT = 10 # seconds
+ MAX_ELAPSED_TIME = 60 * 1000 # 1 minute in milliseconds
+
+ def __init__(self, config: VideoPlatformConfig):
+ super().__init__(config)
+ self.headers = {
+ "Content-Type": "application/json; charset=utf-8",
+ "Authorization": f"Bearer {config.api_key}",
+ }
+
+ async def create_meeting(
+ self, room_name_prefix: NonEmptyString, end_date: datetime, room: Room
+ ) -> MeetingData:
+ data = {
+ "isLocked": room.is_locked,
+ "roomNamePrefix": whereby_room_name_prefix(room_name_prefix),
+ "roomNamePattern": "uuid",
+ "roomMode": room.room_mode,
+ "endDate": end_date.isoformat(),
+ "fields": ["hostRoomUrl"],
+ }
+
+ if room.recording_type == "cloud":
+ # Get storage config for passing credentials to Whereby API
+ whereby_storage = get_whereby_storage()
+ key_id, secret = whereby_storage.key_credentials
+ data["recording"] = {
+ "type": room.recording_type,
+ "destination": {
+ "provider": "s3",
+ "bucket": whereby_storage.bucket_name,
+ "accessKeyId": key_id,
+ "accessKeySecret": secret,
+ "fileFormat": "mp4",
+ },
+ "startTrigger": room.recording_trigger,
+ }
+
+ async with httpx.AsyncClient() as client:
+ response = await client.post(
+ f"{self.config.api_url}/meetings",
+ headers=self.headers,
+ json=data,
+ timeout=self.TIMEOUT,
+ )
+ response.raise_for_status()
+ result = response.json()
+
+ return MeetingData(
+ meeting_id=result["meetingId"],
+ room_name=result["roomName"],
+ room_url=result["roomUrl"],
+ host_room_url=result["hostRoomUrl"],
+ platform=self.PLATFORM_NAME,
+ extra_data=result,
+ )
+
+ async def get_room_sessions(self, room_name: str) -> list[Any]:
+ async with httpx.AsyncClient() as client:
+ response = await client.get(
+ f"{self.config.api_url}/insights/room-sessions?roomName={room_name}",
+ headers=self.headers,
+ timeout=self.TIMEOUT,
+ )
+ response.raise_for_status()
+ return response.json().get("results", [])
+
+ async def delete_room(self, room_name: str) -> bool:
+ return True
+
+ async def upload_logo(self, room_name: str, logo_path: str) -> bool:
+ async with httpx.AsyncClient() as client:
+ with open(logo_path, "rb") as f:
+ response = await client.put(
+ f"{self.config.api_url}/rooms/{room_name}/theme/logo",
+ headers={
+ "Authorization": f"Bearer {self.config.api_key}",
+ },
+ timeout=self.TIMEOUT,
+ files={"image": f},
+ )
+ response.raise_for_status()
+ return True
+
+ def verify_webhook_signature(
+ self, body: bytes, signature: str, timestamp: Optional[str] = None
+ ) -> bool:
+ if not signature:
+ return False
+
+ matches = re.match(r"t=(.*),v1=(.*)", signature)
+ if not matches:
+ return False
+
+ ts, sig = matches.groups()
+
+ current_time = int(time.time() * 1000)
+ diff_time = current_time - int(ts) * 1000
+ if diff_time >= self.MAX_ELAPSED_TIME:
+ return False
+
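+ # Rebuild the signed payload as "{timestamp}.{compact JSON body}" and compare HMAC-SHA256 hex digests in constant time.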
+ body_dict = json.loads(body)
+ signed_payload = f"{ts}.{json.dumps(body_dict, separators=(',', ':'))}"
+ hmac_obj = hmac.new(
+ self.config.webhook_secret.encode("utf-8"),
+ signed_payload.encode("utf-8"),
+ sha256,
+ )
+ expected_signature = hmac_obj.hexdigest()
+
+ try:
+ return hmac.compare_digest(
+ expected_signature.encode("utf-8"), sig.encode("utf-8")
+ )
+ except Exception:
+ return False
diff --git a/server/reflector/video_platforms/whereby_utils.py b/server/reflector/video_platforms/whereby_utils.py
new file mode 100644
index 00000000..2724a7b5
--- /dev/null
+++ b/server/reflector/video_platforms/whereby_utils.py
@@ -0,0 +1,38 @@
+import re
+from datetime import datetime
+
+from reflector.utils.datetime import parse_datetime_with_timezone
+from reflector.utils.string import NonEmptyString, parse_non_empty_string
+from reflector.video_platforms.base import ROOM_PREFIX_SEPARATOR
+
+
+def parse_whereby_recording_filename(
+ object_key: NonEmptyString,
+) -> tuple[NonEmptyString, datetime]:
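+ """Split a Whereby recording object key such as "myroom-2025-10-08T12:00:00Z.mp4" into (room name, embedded ISO timestamp)."""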
+ filename = parse_non_empty_string(object_key.rsplit(".", 1)[0])
+ timestamp_pattern = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
+ match = re.search(timestamp_pattern, filename)
+ if not match:
+ raise ValueError(f"No ISO timestamp found in filename: {object_key}")
+ timestamp_str = match.group(1)
+ timestamp_start = match.start(1)
+ room_name_part = filename[:timestamp_start]
+ if room_name_part.endswith(ROOM_PREFIX_SEPARATOR):
+ room_name_part = room_name_part[: -len(ROOM_PREFIX_SEPARATOR)]
+ else:
+ raise ValueError(
+ f"room name {room_name_part} doesnt have {ROOM_PREFIX_SEPARATOR} at the end of filename: {object_key}"
+ )
+
+ return parse_non_empty_string(room_name_part), parse_datetime_with_timezone(
+ timestamp_str
+ )
+
+
+def whereby_room_name_prefix(room_name_prefix: NonEmptyString) -> NonEmptyString:
+ return room_name_prefix + ROOM_PREFIX_SEPARATOR
+
+
+# Room names from the Whereby API carry a leading "/", but other sources (e.g. recording filenames) omit it.
+def room_name_to_whereby_api_room_name(room_name: NonEmptyString) -> NonEmptyString:
+ return f"/{room_name}"
diff --git a/server/reflector/views/daily.py b/server/reflector/views/daily.py
new file mode 100644
index 00000000..6f51cd1e
--- /dev/null
+++ b/server/reflector/views/daily.py
@@ -0,0 +1,233 @@
+import json
+from typing import Any, Dict, Literal
+
+from fastapi import APIRouter, HTTPException, Request
+from pydantic import BaseModel
+
+from reflector.db.meetings import meetings_controller
+from reflector.logger import logger as _logger
+from reflector.settings import settings
+from reflector.utils.daily import DailyRoomName
+from reflector.video_platforms.factory import create_platform_client
+from reflector.worker.process import process_multitrack_recording
+
+router = APIRouter()
+
+logger = _logger.bind(platform="daily")
+
+
+class DailyTrack(BaseModel):
+ type: Literal["audio", "video"]
+ s3Key: str
+ size: int
+
+
+class DailyWebhookEvent(BaseModel):
+ version: str
+ type: str
+ id: str
+ payload: Dict[str, Any]
+ event_ts: float
+
+
+def _extract_room_name(event: DailyWebhookEvent) -> DailyRoomName | None:
+ """Extract room name from Daily event payload.
+
+ Daily.co API inconsistency:
+ - participant.* events use "room" field
+ - recording.* events use "room_name" field
+ """
+ return event.payload.get("room_name") or event.payload.get("room")
+
+
+@router.post("/webhook")
+async def webhook(request: Request):
+ """Handle Daily webhook events.
+
+ Daily.co circuit breaker: after 3+ failed responses (4xx/5xx) the webhook state
+ becomes FAILED and Daily stops sending events. Reset it with scripts/recreate_daily_webhook.py.
+ """
+ body = await request.body()
+ signature = request.headers.get("X-Webhook-Signature", "")
+ timestamp = request.headers.get("X-Webhook-Timestamp", "")
+
+ client = create_platform_client("daily")
+
+ # TEMPORARY: Bypass signature check for testing
+ # TODO: Remove this after testing is complete
+ BYPASS_FOR_TESTING = True
+ if not BYPASS_FOR_TESTING:
+ if not client.verify_webhook_signature(body, signature, timestamp):
+ logger.warning(
+ "Invalid webhook signature",
+ signature=signature,
+ timestamp=timestamp,
+ has_body=bool(body),
+ )
+ raise HTTPException(status_code=401, detail="Invalid webhook signature")
+
+ try:
+ body_json = json.loads(body)
+ except json.JSONDecodeError:
+ raise HTTPException(status_code=422, detail="Invalid JSON")
+
+ if body_json.get("test") == "test":
+ logger.info("Received Daily webhook test event")
+ return {"status": "ok"}
+
+ # Parse as actual event
+ try:
+ event = DailyWebhookEvent(**body_json)
+ except Exception as e:
+ logger.error("Failed to parse webhook event", error=str(e), body=body.decode())
+ raise HTTPException(status_code=422, detail="Invalid event format")
+
+ # Handle participant events
+ if event.type == "participant.joined":
+ await _handle_participant_joined(event)
+ elif event.type == "participant.left":
+ await _handle_participant_left(event)
+ elif event.type == "recording.started":
+ await _handle_recording_started(event)
+ elif event.type == "recording.ready-to-download":
+ await _handle_recording_ready(event)
+ elif event.type == "recording.error":
+ await _handle_recording_error(event)
+ else:
+ logger.warning(
+ "Unhandled Daily webhook event type",
+ event_type=event.type,
+ payload=event.payload,
+ )
+
+ return {"status": "ok"}
+
+
+async def _handle_participant_joined(event: DailyWebhookEvent):
+ daily_room_name = _extract_room_name(event)
+ if not daily_room_name:
+ logger.warning("participant.joined: no room in payload", payload=event.payload)
+ return
+
+ meeting = await meetings_controller.get_by_room_name(daily_room_name)
+ if meeting:
+ await meetings_controller.increment_num_clients(meeting.id)
+ logger.info(
+ "Participant joined",
+ meeting_id=meeting.id,
+ room_name=daily_room_name,
+ recording_type=meeting.recording_type,
+ recording_trigger=meeting.recording_trigger,
+ )
+ else:
+ logger.warning(
+ "participant.joined: meeting not found", room_name=daily_room_name
+ )
+
+
+async def _handle_participant_left(event: DailyWebhookEvent):
+ room_name = _extract_room_name(event)
+ if not room_name:
+ return
+
+ meeting = await meetings_controller.get_by_room_name(room_name)
+ if meeting:
+ await meetings_controller.decrement_num_clients(meeting.id)
+
+
+async def _handle_recording_started(event: DailyWebhookEvent):
+ room_name = _extract_room_name(event)
+ if not room_name:
+ logger.warning(
+ "recording.started: no room_name in payload", payload=event.payload
+ )
+ return
+
+ meeting = await meetings_controller.get_by_room_name(room_name)
+ if meeting:
+ logger.info(
+ "Recording started",
+ meeting_id=meeting.id,
+ room_name=room_name,
+ recording_id=event.payload.get("recording_id"),
+ platform="daily",
+ )
+ else:
+ logger.warning("recording.started: meeting not found", room_name=room_name)
+
+
+async def _handle_recording_ready(event: DailyWebhookEvent):
+ """Handle recording ready for download event.
+
+ Daily.co webhook payload for raw-tracks recordings:
+ {
+ "recording_id": "...",
+ "room_name": "test2-20251009192341",
+ "tracks": [
+ {"type": "audio", "s3Key": "monadical/test2-.../uuid-cam-audio-123.webm", "size": 400000},
+ {"type": "video", "s3Key": "monadical/test2-.../uuid-cam-video-456.webm", "size": 30000000}
+ ]
+ }
+ """
+ room_name = _extract_room_name(event)
+ recording_id = event.payload.get("recording_id")
+ tracks_raw = event.payload.get("tracks", [])
+
+ if not room_name or not tracks_raw:
+ logger.warning(
+ "recording.ready-to-download: missing room_name or tracks",
+ room_name=room_name,
+ has_tracks=bool(tracks_raw),
+ payload=event.payload,
+ )
+ return
+
+ try:
+ tracks = [DailyTrack(**t) for t in tracks_raw]
+ except Exception as e:
+ logger.error(
+ "recording.ready-to-download: invalid tracks structure",
+ error=str(e),
+ tracks=tracks_raw,
+ )
+ return
+
+ logger.info(
+ "Recording ready for download",
+ room_name=room_name,
+ recording_id=recording_id,
+ num_tracks=len(tracks),
+ platform="daily",
+ )
+
+ bucket_name = settings.DAILYCO_STORAGE_AWS_BUCKET_NAME
+ if not bucket_name:
+ logger.error(
+ "DAILYCO_STORAGE_AWS_BUCKET_NAME not configured; cannot process Daily recording"
+ )
+ return
+
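+ # Only audio tracks are forwarded for processing; video tracks are ignored.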
+ track_keys = [t.s3Key for t in tracks if t.type == "audio"]
+
+ process_multitrack_recording.delay(
+ bucket_name=bucket_name,
+ daily_room_name=room_name,
+ recording_id=recording_id,
+ track_keys=track_keys,
+ )
+
+
+async def _handle_recording_error(event: DailyWebhookEvent):
+ room_name = _extract_room_name(event)
+ error = event.payload.get("error", "Unknown error")
+
+ if room_name:
+ meeting = await meetings_controller.get_by_room_name(room_name)
+ if meeting:
+ logger.error(
+ "Recording error",
+ meeting_id=meeting.id,
+ room_name=room_name,
+ error=error,
+ platform="daily",
+ )
diff --git a/server/reflector/views/rooms.py b/server/reflector/views/rooms.py
index 70e3f9e4..e786b0d9 100644
--- a/server/reflector/views/rooms.py
+++ b/server/reflector/views/rooms.py
@@ -15,9 +15,14 @@ from reflector.db.calendar_events import calendar_events_controller
from reflector.db.meetings import meetings_controller
from reflector.db.rooms import rooms_controller
from reflector.redis_cache import RedisAsyncLock
+from reflector.schemas.platform import Platform
from reflector.services.ics_sync import ics_sync_service
from reflector.settings import settings
-from reflector.whereby import create_meeting, upload_logo
+from reflector.utils.url import add_query_param
+from reflector.video_platforms.factory import (
+ create_platform_client,
+ get_platform,
+)
from reflector.worker.webhook import test_webhook
logger = logging.getLogger(__name__)
@@ -41,6 +46,7 @@ class Room(BaseModel):
ics_enabled: bool = False
ics_last_sync: Optional[datetime] = None
ics_last_etag: Optional[str] = None
+ platform: Platform
class RoomDetails(Room):
@@ -68,6 +74,7 @@ class Meeting(BaseModel):
is_active: bool = True
calendar_event_id: str | None = None
calendar_metadata: dict[str, Any] | None = None
+ platform: Platform
class CreateRoom(BaseModel):
@@ -85,6 +92,7 @@ class CreateRoom(BaseModel):
ics_url: Optional[str] = None
ics_fetch_interval: int = 300
ics_enabled: bool = False
+ platform: Optional[Platform] = None
class UpdateRoom(BaseModel):
@@ -102,6 +110,7 @@ class UpdateRoom(BaseModel):
ics_url: Optional[str] = None
ics_fetch_interval: Optional[int] = None
ics_enabled: Optional[bool] = None
+ platform: Optional[Platform] = None
class CreateRoomMeeting(BaseModel):
@@ -165,14 +174,6 @@ class CalendarEventResponse(BaseModel):
router = APIRouter()
-def parse_datetime_with_timezone(iso_string: str) -> datetime:
- """Parse ISO datetime string and ensure timezone awareness (defaults to UTC if naive)."""
- dt = datetime.fromisoformat(iso_string)
- if dt.tzinfo is None:
- dt = dt.replace(tzinfo=timezone.utc)
- return dt
-
-
@router.get("/rooms", response_model=Page[RoomDetails])
async def rooms_list(
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
@@ -182,13 +183,18 @@ async def rooms_list(
user_id = user["sub"] if user else None
- return await apaginate(
+ paginated = await apaginate(
get_database(),
await rooms_controller.get_all(
user_id=user_id, order_by="-created_at", return_query=True
),
)
+ for room in paginated.items:
+ room.platform = get_platform(room.platform)
+
+ return paginated
+
@router.get("/rooms/{room_id}", response_model=RoomDetails)
async def rooms_get(
@@ -201,6 +207,7 @@ async def rooms_get(
raise HTTPException(status_code=404, detail="Room not found")
if not room.is_shared and (user_id is None or room.user_id != user_id):
raise HTTPException(status_code=403, detail="Room access denied")
+ room.platform = get_platform(room.platform)
return room
@@ -214,17 +221,16 @@ async def rooms_get_by_name(
if not room:
raise HTTPException(status_code=404, detail="Room not found")
- # Convert to RoomDetails format (add webhook fields if user is owner)
room_dict = room.__dict__.copy()
if user_id == room.user_id:
- # User is owner, include webhook details if available
room_dict["webhook_url"] = getattr(room, "webhook_url", None)
room_dict["webhook_secret"] = getattr(room, "webhook_secret", None)
else:
- # Non-owner, hide webhook details
room_dict["webhook_url"] = None
room_dict["webhook_secret"] = None
+ room_dict["platform"] = get_platform(room.platform)
+
return RoomDetails(**room_dict)
@@ -251,6 +257,7 @@ async def rooms_create(
ics_url=room.ics_url,
ics_fetch_interval=room.ics_fetch_interval,
ics_enabled=room.ics_enabled,
+ platform=room.platform,
)
@@ -268,6 +275,7 @@ async def rooms_update(
raise HTTPException(status_code=403, detail="Not authorized")
values = info.dict(exclude_unset=True)
await rooms_controller.update(room, values)
+ room.platform = get_platform(room.platform)
return room
@@ -315,19 +323,22 @@ async def rooms_create_meeting(
if meeting is None:
end_date = current_time + timedelta(hours=8)
- whereby_meeting = await create_meeting("", end_date=end_date, room=room)
+ platform = get_platform(room.platform)
+ client = create_platform_client(platform)
- await upload_logo(whereby_meeting["roomName"], "./images/logo.png")
+ meeting_data = await client.create_meeting(
+ room.name, end_date=end_date, room=room
+ )
+
+ await client.upload_logo(meeting_data.room_name, "./images/logo.png")
meeting = await meetings_controller.create(
- id=whereby_meeting["meetingId"],
- room_name=whereby_meeting["roomName"],
- room_url=whereby_meeting["roomUrl"],
- host_room_url=whereby_meeting["hostRoomUrl"],
- start_date=parse_datetime_with_timezone(
- whereby_meeting["startDate"]
- ),
- end_date=parse_datetime_with_timezone(whereby_meeting["endDate"]),
+ id=meeting_data.meeting_id,
+ room_name=meeting_data.room_name,
+ room_url=meeting_data.room_url,
+ host_room_url=meeting_data.host_room_url,
+ start_date=current_time,
+ end_date=end_date,
room=room,
)
except LockError:
@@ -336,6 +347,18 @@ async def rooms_create_meeting(
status_code=503, detail="Meeting creation in progress, please try again"
)
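+ # Daily rooms that should record need a meeting token with recording enabled, appended as the "t" query param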
+ if meeting.platform == "daily" and room.recording_trigger != "none":
+ client = create_platform_client(meeting.platform)
+ token = await client.create_meeting_token(
+ meeting.room_name,
+ enable_recording=True,
+ user_id=user_id,
+ )
+ meeting = meeting.model_copy()
+ meeting.room_url = add_query_param(meeting.room_url, "t", token)
+ if meeting.host_room_url:
+ meeting.host_room_url = add_query_param(meeting.host_room_url, "t", token)
+
if user_id != room.user_id:
meeting.host_room_url = ""
@@ -490,7 +513,10 @@ async def rooms_list_active_meetings(
room=room, current_time=current_time
)
- # Hide host URLs from non-owners
+ effective_platform = get_platform(room.platform)
+ for meeting in meetings:
+ meeting.platform = effective_platform
+
if user_id != room.user_id:
for meeting in meetings:
meeting.host_room_url = ""
@@ -511,15 +537,10 @@ async def rooms_get_meeting(
if not room:
raise HTTPException(status_code=404, detail="Room not found")
- meeting = await meetings_controller.get_by_id(meeting_id)
+ meeting = await meetings_controller.get_by_id(meeting_id, room=room)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
- if meeting.room_id != room.id:
- raise HTTPException(
- status_code=403, detail="Meeting does not belong to this room"
- )
-
if user_id != room.user_id and not room.is_shared:
meeting.host_room_url = ""
@@ -538,16 +559,11 @@ async def rooms_join_meeting(
if not room:
raise HTTPException(status_code=404, detail="Room not found")
- meeting = await meetings_controller.get_by_id(meeting_id)
+ meeting = await meetings_controller.get_by_id(meeting_id, room=room)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
- if meeting.room_id != room.id:
- raise HTTPException(
- status_code=403, detail="Meeting does not belong to this room"
- )
-
if not meeting.is_active:
raise HTTPException(status_code=400, detail="Meeting is not active")
@@ -555,7 +571,6 @@ async def rooms_join_meeting(
if meeting.end_date <= current_time:
raise HTTPException(status_code=400, detail="Meeting has ended")
- # Hide host URL from non-owners
if user_id != room.user_id:
meeting.host_room_url = ""
diff --git a/server/reflector/views/transcripts_process.py b/server/reflector/views/transcripts_process.py
index f9295765..46e070fd 100644
--- a/server/reflector/views/transcripts_process.py
+++ b/server/reflector/views/transcripts_process.py
@@ -5,8 +5,12 @@ from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
import reflector.auth as auth
+from reflector.db.recordings import recordings_controller
from reflector.db.transcripts import transcripts_controller
from reflector.pipelines.main_file_pipeline import task_pipeline_file_process
+from reflector.pipelines.main_multitrack_pipeline import (
+ task_pipeline_multitrack_process,
+)
router = APIRouter()
@@ -33,14 +37,35 @@ async def transcript_process(
status_code=400, detail="Recording is not ready for processing"
)
+ # avoid duplicate scheduling for either pipeline
if task_is_scheduled_or_active(
"reflector.pipelines.main_file_pipeline.task_pipeline_file_process",
transcript_id=transcript_id,
+ ) or task_is_scheduled_or_active(
+ "reflector.pipelines.main_multitrack_pipeline.task_pipeline_multitrack_process",
+ transcript_id=transcript_id,
):
return ProcessStatus(status="already running")
- # schedule a background task process the file
- task_pipeline_file_process.delay(transcript_id=transcript_id)
+ # Determine processing mode strictly from DB to avoid S3 scans
+ bucket_name = None
+ track_keys: list[str] = []
+
+ if transcript.recording_id:
+ recording = await recordings_controller.get_by_id(transcript.recording_id)
+ if recording:
+ bucket_name = recording.bucket_name
+ track_keys = list(getattr(recording, "track_keys", []) or [])
+
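+ # A recording row with a bucket_name routes to the multitrack pipeline; otherwise fall back to the single-file pipeline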
+ if bucket_name:
+ task_pipeline_multitrack_process.delay(
+ transcript_id=transcript_id,
+ bucket_name=bucket_name,
+ track_keys=track_keys,
+ )
+ else:
+ # Default single-file pipeline
+ task_pipeline_file_process.delay(transcript_id=transcript_id)
return ProcessStatus(status="ok")
diff --git a/server/reflector/whereby.py b/server/reflector/whereby.py
deleted file mode 100644
index 8b5c18fd..00000000
--- a/server/reflector/whereby.py
+++ /dev/null
@@ -1,114 +0,0 @@
-import logging
-from datetime import datetime
-
-import httpx
-
-from reflector.db.rooms import Room
-from reflector.settings import settings
-from reflector.utils.string import parse_non_empty_string
-
-logger = logging.getLogger(__name__)
-
-
-def _get_headers():
- api_key = parse_non_empty_string(
- settings.WHEREBY_API_KEY, "WHEREBY_API_KEY value is required."
- )
- return {
- "Content-Type": "application/json; charset=utf-8",
- "Authorization": f"Bearer {api_key}",
- }
-
-
-TIMEOUT = 10 # seconds
-
-
-def _get_whereby_s3_auth():
- errors = []
- try:
- bucket_name = parse_non_empty_string(
- settings.RECORDING_STORAGE_AWS_BUCKET_NAME,
- "RECORDING_STORAGE_AWS_BUCKET_NAME value is required.",
- )
- except Exception as e:
- errors.append(e)
- try:
- key_id = parse_non_empty_string(
- settings.AWS_WHEREBY_ACCESS_KEY_ID,
- "AWS_WHEREBY_ACCESS_KEY_ID value is required.",
- )
- except Exception as e:
- errors.append(e)
- try:
- key_secret = parse_non_empty_string(
- settings.AWS_WHEREBY_ACCESS_KEY_SECRET,
- "AWS_WHEREBY_ACCESS_KEY_SECRET value is required.",
- )
- except Exception as e:
- errors.append(e)
- if len(errors) > 0:
- raise Exception(
- f"Failed to get Whereby auth settings: {', '.join(str(e) for e in errors)}"
- )
- return bucket_name, key_id, key_secret
-
-
-async def create_meeting(room_name_prefix: str, end_date: datetime, room: Room):
- s3_bucket_name, s3_key_id, s3_key_secret = _get_whereby_s3_auth()
- data = {
- "isLocked": room.is_locked,
- "roomNamePrefix": room_name_prefix,
- "roomNamePattern": "uuid",
- "roomMode": room.room_mode,
- "endDate": end_date.isoformat(),
- "recording": {
- "type": room.recording_type,
- "destination": {
- "provider": "s3",
- "bucket": s3_bucket_name,
- "accessKeyId": s3_key_id,
- "accessKeySecret": s3_key_secret,
- "fileFormat": "mp4",
- },
- "startTrigger": room.recording_trigger,
- },
- "fields": ["hostRoomUrl"],
- }
- async with httpx.AsyncClient() as client:
- response = await client.post(
- f"{settings.WHEREBY_API_URL}/meetings",
- headers=_get_headers(),
- json=data,
- timeout=TIMEOUT,
- )
- if response.status_code == 403:
- logger.warning(
- f"Failed to create meeting: access denied on Whereby: {response.text}"
- )
- response.raise_for_status()
- return response.json()
-
-
-async def get_room_sessions(room_name: str):
- async with httpx.AsyncClient() as client:
- response = await client.get(
- f"{settings.WHEREBY_API_URL}/insights/room-sessions?roomName={room_name}",
- headers=_get_headers(),
- timeout=TIMEOUT,
- )
- response.raise_for_status()
- return response.json()
-
-
-async def upload_logo(room_name: str, logo_path: str):
- async with httpx.AsyncClient() as client:
- with open(logo_path, "rb") as f:
- response = await client.put(
- f"{settings.WHEREBY_API_URL}/rooms{room_name}/theme/logo",
- headers={
- "Authorization": f"Bearer {settings.WHEREBY_API_KEY}",
- },
- timeout=TIMEOUT,
- files={"image": f},
- )
- response.raise_for_status()
diff --git a/server/reflector/worker/cleanup.py b/server/reflector/worker/cleanup.py
index 66d45e94..43559e64 100644
--- a/server/reflector/worker/cleanup.py
+++ b/server/reflector/worker/cleanup.py
@@ -19,7 +19,7 @@ from reflector.db.meetings import meetings
from reflector.db.recordings import recordings
from reflector.db.transcripts import transcripts, transcripts_controller
from reflector.settings import settings
-from reflector.storage import get_recordings_storage
+from reflector.storage import get_transcripts_storage
logger = structlog.get_logger(__name__)
@@ -53,8 +53,8 @@ async def delete_single_transcript(
)
if recording:
try:
- await get_recordings_storage().delete_file(
- recording["object_key"]
+ await get_transcripts_storage().delete_file(
+ recording["object_key"], bucket=recording["bucket_name"]
)
except Exception as storage_error:
logger.warning(
diff --git a/server/reflector/worker/ics_sync.py b/server/reflector/worker/ics_sync.py
index faf62f4a..4d72d4ae 100644
--- a/server/reflector/worker/ics_sync.py
+++ b/server/reflector/worker/ics_sync.py
@@ -7,10 +7,10 @@ from celery.utils.log import get_task_logger
from reflector.asynctask import asynctask
from reflector.db.calendar_events import calendar_events_controller
from reflector.db.meetings import meetings_controller
-from reflector.db.rooms import rooms_controller
+from reflector.db.rooms import Room, rooms_controller
from reflector.redis_cache import RedisAsyncLock
from reflector.services.ics_sync import SyncStatus, ics_sync_service
-from reflector.whereby import create_meeting, upload_logo
+from reflector.video_platforms.factory import create_platform_client, get_platform
logger = structlog.wrap_logger(get_task_logger(__name__))
@@ -86,17 +86,17 @@ def _should_sync(room) -> bool:
MEETING_DEFAULT_DURATION = timedelta(hours=1)
-async def create_upcoming_meetings_for_event(event, create_window, room_id, room):
+async def create_upcoming_meetings_for_event(event, create_window, room: Room):
if event.start_time <= create_window:
return
- existing_meeting = await meetings_controller.get_by_calendar_event(event.id)
+ existing_meeting = await meetings_controller.get_by_calendar_event(event.id, room)
if existing_meeting:
return
logger.info(
"Pre-creating meeting for calendar event",
- room_id=room_id,
+ room_id=room.id,
event_id=event.id,
event_title=event.title,
)
@@ -104,20 +104,22 @@ async def create_upcoming_meetings_for_event(event, create_window, room_id, room
try:
end_date = event.end_time or (event.start_time + MEETING_DEFAULT_DURATION)
- whereby_meeting = await create_meeting(
+ client = create_platform_client(get_platform(room.platform))
+
+ meeting_data = await client.create_meeting(
"",
end_date=end_date,
room=room,
)
- await upload_logo(whereby_meeting["roomName"], "./images/logo.png")
+ await client.upload_logo(meeting_data.room_name, "./images/logo.png")
meeting = await meetings_controller.create(
- id=whereby_meeting["meetingId"],
- room_name=whereby_meeting["roomName"],
- room_url=whereby_meeting["roomUrl"],
- host_room_url=whereby_meeting["hostRoomUrl"],
- start_date=datetime.fromisoformat(whereby_meeting["startDate"]),
- end_date=datetime.fromisoformat(whereby_meeting["endDate"]),
+ id=meeting_data.meeting_id,
+ room_name=meeting_data.room_name,
+ room_url=meeting_data.room_url,
+ host_room_url=meeting_data.host_room_url,
+ start_date=event.start_time,
+ end_date=end_date,
room=room,
calendar_event_id=event.id,
calendar_metadata={
@@ -136,7 +138,7 @@ async def create_upcoming_meetings_for_event(event, create_window, room_id, room
except Exception as e:
logger.error(
"Failed to pre-create meeting",
- room_id=room_id,
+ room_id=room.id,
event_id=event.id,
error=str(e),
)
@@ -166,9 +168,7 @@ async def create_upcoming_meetings():
)
for event in events:
- await create_upcoming_meetings_for_event(
- event, create_window, room.id, room
- )
+ await create_upcoming_meetings_for_event(event, create_window, room)
logger.info("Completed pre-creation check for upcoming meetings")
except Exception as e:
diff --git a/server/reflector/worker/process.py b/server/reflector/worker/process.py
index e660e840..47cbb1cb 100644
--- a/server/reflector/worker/process.py
+++ b/server/reflector/worker/process.py
@@ -1,5 +1,6 @@
import json
import os
+import re
from datetime import datetime, timezone
from urllib.parse import unquote
@@ -14,24 +15,32 @@ from redis.exceptions import LockError
from reflector.db.meetings import meetings_controller
from reflector.db.recordings import Recording, recordings_controller
from reflector.db.rooms import rooms_controller
-from reflector.db.transcripts import SourceKind, transcripts_controller
+from reflector.db.transcripts import (
+ SourceKind,
+ TranscriptParticipant,
+ transcripts_controller,
+)
from reflector.pipelines.main_file_pipeline import task_pipeline_file_process
from reflector.pipelines.main_live_pipeline import asynctask
+from reflector.pipelines.main_multitrack_pipeline import (
+ task_pipeline_multitrack_process,
+)
+from reflector.pipelines.topic_processing import EmptyPipeline
+from reflector.processors import AudioFileWriterProcessor
+from reflector.processors.audio_waveform_processor import AudioWaveformProcessor
from reflector.redis_cache import get_redis_client
from reflector.settings import settings
-from reflector.whereby import get_room_sessions
+from reflector.storage import get_transcripts_storage
+from reflector.utils.daily import DailyRoomName, extract_base_room_name
+from reflector.video_platforms.factory import create_platform_client
+from reflector.video_platforms.whereby_utils import (
+ parse_whereby_recording_filename,
+ room_name_to_whereby_api_room_name,
+)
logger = structlog.wrap_logger(get_task_logger(__name__))
-def parse_datetime_with_timezone(iso_string: str) -> datetime:
- """Parse ISO datetime string and ensure timezone awareness (defaults to UTC if naive)."""
- dt = datetime.fromisoformat(iso_string)
- if dt.tzinfo is None:
- dt = dt.replace(tzinfo=timezone.utc)
- return dt
-
-
@shared_task
def process_messages():
queue_url = settings.AWS_PROCESS_RECORDING_QUEUE_URL
@@ -73,14 +82,16 @@ def process_messages():
logger.error("process_messages", error=str(e))
+# Only Whereby recordings are handled here; Daily.co recordings are processed via process_multitrack_recording (webhook-driven).
@shared_task
@asynctask
async def process_recording(bucket_name: str, object_key: str):
logger.info("Processing recording: %s/%s", bucket_name, object_key)
- # extract a guid and a datetime from the object key
- room_name = f"/{object_key[:36]}"
- recorded_at = parse_datetime_with_timezone(object_key[37:57])
+ room_name_part, recorded_at = parse_whereby_recording_filename(object_key)
+
+ # we store whereby api room names, NOT whereby room names
+ room_name = room_name_to_whereby_api_room_name(room_name_part)
meeting = await meetings_controller.get_by_room_name(room_name)
room = await rooms_controller.get_by_id(meeting.room_id)
@@ -102,6 +113,7 @@ async def process_recording(bucket_name: str, object_key: str):
transcript,
{
"topics": [],
+ "participants": [],
},
)
else:
@@ -121,15 +133,15 @@ async def process_recording(bucket_name: str, object_key: str):
upload_filename = transcript.data_path / f"upload{extension}"
upload_filename.parent.mkdir(parents=True, exist_ok=True)
- s3 = boto3.client(
- "s3",
- region_name=settings.TRANSCRIPT_STORAGE_AWS_REGION,
- aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID,
- aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY,
- )
+ storage = get_transcripts_storage()
- with open(upload_filename, "wb") as f:
- s3.download_fileobj(bucket_name, object_key, f)
+ try:
+ with open(upload_filename, "wb") as f:
+ await storage.stream_to_fileobj(object_key, f, bucket=bucket_name)
+ except Exception:
+ # Clean up partial file on stream failure
+ upload_filename.unlink(missing_ok=True)
+ raise
container = av.open(upload_filename.as_posix())
try:
@@ -146,6 +158,165 @@ async def process_recording(bucket_name: str, object_key: str):
task_pipeline_file_process.delay(transcript_id=transcript.id)
+@shared_task
+@asynctask
+async def process_multitrack_recording(
+ bucket_name: str,
+ daily_room_name: DailyRoomName,
+ recording_id: str,
+ track_keys: list[str],
+):
+ logger.info(
+ "Processing multitrack recording",
+ bucket=bucket_name,
+ room_name=daily_room_name,
+ recording_id=recording_id,
+ provided_keys=len(track_keys),
+ )
+
+ if not track_keys:
+ logger.warning("No audio track keys provided")
+ return
+
+ tz = timezone.utc
+ recorded_at = datetime.now(tz)
+ try:
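+ # If the recording folder name ends in a 14-digit timestamp (YYYYMMDDHHMMSS), use it as recorded_at; otherwise keep now()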
+ if track_keys:
+ folder = os.path.basename(os.path.dirname(track_keys[0]))
+ ts_match = re.search(r"(\d{14})$", folder)
+ if ts_match:
+ ts = ts_match.group(1)
+ recorded_at = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=tz)
+ except Exception as e:
+ logger.warning(
+ "Could not parse recorded_at from track keys; falling back to now()",
+ error=str(e),
+ recorded_at=recorded_at,
+ exc_info=True,
+ )
+
+ meeting = await meetings_controller.get_by_room_name(daily_room_name)
+
+ room_name_base = extract_base_room_name(daily_room_name)
+
+ room = await rooms_controller.get_by_name(room_name_base)
+ if not room:
+ raise Exception(f"Room not found: {room_name_base}")
+
+ if not meeting:
+ raise Exception(f"Meeting not found for room: {daily_room_name}")
+
+ logger.info(
+ "Found existing Meeting for recording",
+ meeting_id=meeting.id,
+ room_name=daily_room_name,
+ recording_id=recording_id,
+ )
+
+ recording = await recordings_controller.get_by_id(recording_id)
+ if not recording:
+ object_key_dir = os.path.dirname(track_keys[0]) if track_keys else ""
+ recording = await recordings_controller.create(
+ Recording(
+ id=recording_id,
+ bucket_name=bucket_name,
+ object_key=object_key_dir,
+ recorded_at=recorded_at,
+ meeting_id=meeting.id,
+ track_keys=track_keys,
+ )
+ )
+ else:
+ # Recording already exists; assume metadata was set at creation time
+ pass
+
+ transcript = await transcripts_controller.get_by_recording_id(recording.id)
+ if transcript:
+ await transcripts_controller.update(
+ transcript,
+ {
+ "topics": [],
+ "participants": [],
+ },
+ )
+ else:
+ transcript = await transcripts_controller.add(
+ "",
+ source_kind=SourceKind.ROOM,
+ source_language="en",
+ target_language="en",
+ user_id=room.user_id,
+ recording_id=recording.id,
+ share_mode="public",
+ meeting_id=meeting.id,
+ room_id=room.id,
+ )
+
+ try:
+ daily_client = create_platform_client("daily")
+
+ id_to_name = {}
+ id_to_user_id = {}
+
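+ # The recording's mtgSessionId is required to look up participant names from the Daily API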
+ mtg_session_id = None
+ try:
+ rec_details = await daily_client.get_recording(recording_id)
+ mtg_session_id = rec_details.get("mtgSessionId")
+ except Exception as e:
+ logger.warning(
+ "Failed to fetch Daily recording details",
+ error=str(e),
+ recording_id=recording_id,
+ exc_info=True,
+ )
+
+ if mtg_session_id:
+ try:
+ payload = await daily_client.get_meeting_participants(mtg_session_id)
+ for p in payload.get("data", []):
+ pid = p.get("participant_id")
+ name = p.get("user_name")
+ user_id = p.get("user_id")
+ if pid and name:
+ id_to_name[pid] = name
+ if pid and user_id:
+ id_to_user_id[pid] = user_id
+ except Exception as e:
+ logger.warning(
+ "Failed to fetch Daily meeting participants",
+ error=str(e),
+ mtg_session_id=mtg_session_id,
+ exc_info=True,
+ )
+ else:
+ logger.warning(
+ "No mtgSessionId found for recording; participant names may be generic",
+ recording_id=recording_id,
+ )
+
+ for idx, key in enumerate(track_keys):
+ base = os.path.basename(key)
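+ # Track filenames embed the participant UUID: "<epoch-ms>-<participant-uuid>-cam-audio-..."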
+ m = re.search(r"\d{13,}-([0-9a-fA-F-]{36})-cam-audio-", base)
+ participant_id = m.group(1) if m else None
+
+ default_name = f"Speaker {idx}"
+ name = id_to_name.get(participant_id, default_name)
+ user_id = id_to_user_id.get(participant_id)
+
+ participant = TranscriptParticipant(
+ id=participant_id, speaker=idx, name=name, user_id=user_id
+ )
+ await transcripts_controller.upsert_participant(transcript, participant)
+
+ except Exception as e:
+ logger.warning("Failed to map participant names", error=str(e), exc_info=True)
+
+ task_pipeline_multitrack_process.delay(
+ transcript_id=transcript.id,
+ bucket_name=bucket_name,
+ track_keys=track_keys,
+ )
+
+
@shared_task
@asynctask
async def process_meetings():
@@ -164,7 +335,7 @@ async def process_meetings():
Uses distributed locking to prevent race conditions when multiple workers
process the same meeting simultaneously.
"""
- logger.info("Processing meetings")
+ logger.debug("Processing meetings")
meetings = await meetings_controller.get_all_active()
current_time = datetime.now(timezone.utc)
redis_client = get_redis_client()
@@ -189,7 +360,8 @@ async def process_meetings():
end_date = end_date.replace(tzinfo=timezone.utc)
# This API call could be slow, extend lock if needed
- response = await get_room_sessions(meeting.room_name)
+ client = create_platform_client(meeting.platform)
+ room_sessions = await client.get_room_sessions(meeting.room_name)
try:
# Extend lock after slow operation to ensure we still hold it
@@ -198,7 +370,6 @@ async def process_meetings():
logger_.warning("Lost lock for meeting, skipping")
continue
- room_sessions = response.get("results", [])
has_active_sessions = room_sessions and any(
rs["endedAt"] is None for rs in room_sessions
)
@@ -231,69 +402,120 @@ async def process_meetings():
except LockError:
pass # Lock already released or expired
- logger.info(
+ logger.debug(
"Processed meetings finished",
processed_count=processed_count,
skipped_count=skipped_count,
)
+async def convert_audio_and_waveform(transcript) -> None:
+ """Convert WebM to MP3 and generate waveform for Daily.co recordings.
+
+ This bypasses the full file pipeline which would overwrite stub data.
+ """
+ try:
+ logger.info(
+ "Converting audio to MP3 and generating waveform",
+ transcript_id=transcript.id,
+ )
+
+ upload_path = transcript.data_path / "upload.webm"
+ mp3_path = transcript.audio_mp3_filename
+
+ # Convert WebM to MP3
+ mp3_writer = AudioFileWriterProcessor(path=mp3_path)
+
+ container = av.open(str(upload_path))
+ for frame in container.decode(audio=0):
+ await mp3_writer.push(frame)
+ await mp3_writer.flush()
+ container.close()
+
+ logger.info(
+ "Converted WebM to MP3",
+ transcript_id=transcript.id,
+ mp3_size=mp3_path.stat().st_size,
+ )
+
+ waveform_processor = AudioWaveformProcessor(
+ audio_path=mp3_path,
+ waveform_path=transcript.audio_waveform_filename,
+ )
+ waveform_processor.set_pipeline(EmptyPipeline(logger))
+ await waveform_processor.flush()
+
+ logger.info(
+ "Generated waveform",
+ transcript_id=transcript.id,
+ waveform_path=transcript.audio_waveform_filename,
+ )
+
+ # Update transcript status to ended (successful)
+ await transcripts_controller.update(transcript, {"status": "ended"})
+
+ except Exception as e:
+ logger.error(
+ "Failed to convert audio or generate waveform",
+ transcript_id=transcript.id,
+ error=str(e),
+ )
+ # Keep status as uploaded even if conversion fails
+ pass
+
+
@shared_task
@asynctask
async def reprocess_failed_recordings():
"""
- Find recordings in the S3 bucket and check if they have proper transcriptions.
+ Find recordings in the Whereby S3 bucket and check whether they have proper transcriptions.
If not, requeue them for processing.
- """
- logger.info("Checking for recordings that need processing or reprocessing")
- s3 = boto3.client(
- "s3",
- region_name=settings.TRANSCRIPT_STORAGE_AWS_REGION,
- aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID,
- aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY,
- )
+ Note: Daily.co recordings are processed via webhooks, not this cron job.
+ """
+ logger.info("Checking Whereby recordings that need processing or reprocessing")
+
+ if not settings.WHEREBY_STORAGE_AWS_BUCKET_NAME:
+ raise ValueError(
+ "WHEREBY_STORAGE_AWS_BUCKET_NAME required for Whereby recording reprocessing. "
+ "Set WHEREBY_STORAGE_AWS_BUCKET_NAME environment variable."
+ )
+
+ storage = get_transcripts_storage()
+ bucket_name = settings.WHEREBY_STORAGE_AWS_BUCKET_NAME
reprocessed_count = 0
try:
- paginator = s3.get_paginator("list_objects_v2")
- bucket_name = settings.RECORDING_STORAGE_AWS_BUCKET_NAME
- pages = paginator.paginate(Bucket=bucket_name)
+ object_keys = await storage.list_objects(prefix="", bucket=bucket_name)
- for page in pages:
- if "Contents" not in page:
+ for object_key in object_keys:
+ if not object_key.endswith(".mp4"):
continue
- for obj in page["Contents"]:
- object_key = obj["Key"]
+ recording = await recordings_controller.get_by_object_key(
+ bucket_name, object_key
+ )
+ if not recording:
+ logger.info(f"Queueing recording for processing: {object_key}")
+ process_recording.delay(bucket_name, object_key)
+ reprocessed_count += 1
+ continue
- if not (object_key.endswith(".mp4")):
- continue
-
- recording = await recordings_controller.get_by_object_key(
- bucket_name, object_key
+ transcript = None
+ try:
+ transcript = await transcripts_controller.get_by_recording_id(
+ recording.id
+ )
+ except ValidationError:
+ await transcripts_controller.remove_by_recording_id(recording.id)
+ logger.warning(
+ f"Removed invalid transcript for recording: {recording.id}"
)
- if not recording:
- logger.info(f"Queueing recording for processing: {object_key}")
- process_recording.delay(bucket_name, object_key)
- reprocessed_count += 1
- continue
- transcript = None
- try:
- transcript = await transcripts_controller.get_by_recording_id(
- recording.id
- )
- except ValidationError:
- await transcripts_controller.remove_by_recording_id(recording.id)
- logger.warning(
- f"Removed invalid transcript for recording: {recording.id}"
- )
-
- if transcript is None or transcript.status == "error":
- logger.info(f"Queueing recording for processing: {object_key}")
- process_recording.delay(bucket_name, object_key)
- reprocessed_count += 1
+ if transcript is None or transcript.status == "error":
+ logger.info(f"Queueing recording for processing: {object_key}")
+ process_recording.delay(bucket_name, object_key)
+ reprocessed_count += 1
except Exception as e:
logger.error(f"Error checking S3 bucket: {str(e)}")
diff --git a/server/scripts/recreate_daily_webhook.py b/server/scripts/recreate_daily_webhook.py
new file mode 100644
index 00000000..a378baf2
--- /dev/null
+++ b/server/scripts/recreate_daily_webhook.py
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+
+import asyncio
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+import httpx
+
+from reflector.settings import settings
+
+
+async def setup_webhook(webhook_url: str):
+ """
+ Create or update Daily.co webhook for this environment.
+ Uses DAILY_WEBHOOK_UUID to identify existing webhook.
+ """
+ if not settings.DAILY_API_KEY:
+ print("Error: DAILY_API_KEY not set")
+ return 1
+
+ headers = {
+ "Authorization": f"Bearer {settings.DAILY_API_KEY}",
+ "Content-Type": "application/json",
+ }
+
+ webhook_data = {
+ "url": webhook_url,
+ "eventTypes": [
+ "participant.joined",
+ "participant.left",
+ "recording.started",
+ "recording.ready-to-download",
+ "recording.error",
+ ],
+ "hmac": settings.DAILY_WEBHOOK_SECRET,
+ }
+
+ async with httpx.AsyncClient() as client:
+ webhook_uuid = settings.DAILY_WEBHOOK_UUID
+
+ if webhook_uuid:
+ # Update existing webhook
+ print(f"Updating existing webhook {webhook_uuid}...")
+ try:
+ resp = await client.patch(
+ f"https://api.daily.co/v1/webhooks/{webhook_uuid}",
+ headers=headers,
+ json=webhook_data,
+ )
+ resp.raise_for_status()
+ result = resp.json()
+ print(f"✓ Updated webhook {result['uuid']} (state: {result['state']})")
+ print(f" URL: {result['url']}")
+ return 0
+ except httpx.HTTPStatusError as e:
+ if e.response.status_code == 404:
+ print(f"Webhook {webhook_uuid} not found, creating new one...")
+ webhook_uuid = None # Fall through to creation
+ else:
+ print(f"Error updating webhook: {e}")
+ return 1
+
+ if not webhook_uuid:
+ # Create new webhook
+ print("Creating new webhook...")
+ resp = await client.post(
+ "https://api.daily.co/v1/webhooks", headers=headers, json=webhook_data
+ )
+ resp.raise_for_status()
+ result = resp.json()
+ webhook_uuid = result["uuid"]
+
+ print(f"✓ Created webhook {webhook_uuid} (state: {result['state']})")
+ print(f" URL: {result['url']}")
+ print()
+ print("=" * 60)
+ print("IMPORTANT: Add this to your environment variables:")
+ print("=" * 60)
+ print(f"DAILY_WEBHOOK_UUID: {webhook_uuid}")
+ print("=" * 60)
+ print()
+
+ # Try to write UUID to .env file
+ env_file = Path(__file__).parent.parent / ".env"
+ if env_file.exists():
+ lines = env_file.read_text().splitlines()
+ updated = False
+
+ # Update existing DAILY_WEBHOOK_UUID line or add it
+ for i, line in enumerate(lines):
+ if line.startswith("DAILY_WEBHOOK_UUID="):
+ lines[i] = f"DAILY_WEBHOOK_UUID={webhook_uuid}"
+ updated = True
+ break
+
+ if not updated:
+ lines.append(f"DAILY_WEBHOOK_UUID={webhook_uuid}")
+
+ env_file.write_text("\n".join(lines) + "\n")
+ print(f"✓ Also saved to local .env file")
+ else:
+ print(f"⚠ Local .env file not found - please add manually")
+
+ return 0
+
+
+if __name__ == "__main__":
+ if len(sys.argv) != 2:
+ print("Usage: python recreate_daily_webhook.py <webhook_url>")
+ print(
+ "Example: python recreate_daily_webhook.py https://example.com/v1/daily/webhook"
+ )
+ print()
+ print("Behavior:")
+ print(" - If DAILY_WEBHOOK_UUID set: Updates existing webhook")
+ print(
+ " - If DAILY_WEBHOOK_UUID empty: Creates new webhook, saves UUID to .env"
+ )
+ sys.exit(1)
+
+ sys.exit(asyncio.run(setup_webhook(sys.argv[1])))
diff --git a/server/tests/conftest.py b/server/tests/conftest.py
index a70604ae..7d6c4302 100644
--- a/server/tests/conftest.py
+++ b/server/tests/conftest.py
@@ -5,6 +5,18 @@ from unittest.mock import patch
import pytest
+from reflector.schemas.platform import WHEREBY_PLATFORM
+
+
+@pytest.fixture(scope="session", autouse=True)
+def register_mock_platform():
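+ # Register the mock client for the Whereby platform so tests never hit the real video API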
+ from mocks.mock_platform import MockPlatformClient
+
+ from reflector.video_platforms.registry import register_platform
+
+ register_platform(WHEREBY_PLATFORM, MockPlatformClient)
+ yield
+
@pytest.fixture(scope="session", autouse=True)
def settings_configuration():
diff --git a/server/tests/mocks/__init__.py b/server/tests/mocks/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/server/tests/mocks/mock_platform.py b/server/tests/mocks/mock_platform.py
new file mode 100644
index 00000000..0f84a271
--- /dev/null
+++ b/server/tests/mocks/mock_platform.py
@@ -0,0 +1,112 @@
+import uuid
+from datetime import datetime
+from typing import Any, Dict, Literal, Optional
+
+from reflector.db.rooms import Room
+from reflector.video_platforms.base import (
+ ROOM_PREFIX_SEPARATOR,
+ MeetingData,
+ VideoPlatformClient,
+ VideoPlatformConfig,
+)
+
+MockPlatform = Literal["mock"]
+
+
+class MockPlatformClient(VideoPlatformClient):
+ PLATFORM_NAME: MockPlatform = "mock"
+
+ def __init__(self, config: VideoPlatformConfig):
+ super().__init__(config)
+ self._rooms: Dict[str, Dict[str, Any]] = {}
+ self._webhook_calls: list[Dict[str, Any]] = []
+
+ async def create_meeting(
+ self, room_name_prefix: str, end_date: datetime, room: Room
+ ) -> MeetingData:
+ meeting_id = str(uuid.uuid4())
+ room_name = f"{room_name_prefix}{ROOM_PREFIX_SEPARATOR}{meeting_id[:8]}"
+ room_url = f"https://mock.video/{room_name}"
+ host_room_url = f"{room_url}?host=true"
+
+ self._rooms[room_name] = {
+ "id": meeting_id,
+ "name": room_name,
+ "url": room_url,
+ "host_url": host_room_url,
+ "end_date": end_date,
+ "room": room,
+ "participants": [],
+ "is_active": True,
+ }
+
+ return MeetingData.model_construct(
+ meeting_id=meeting_id,
+ room_name=room_name,
+ room_url=room_url,
+ host_room_url=host_room_url,
+ platform="whereby",
+ extra_data={"mock": True},
+ )
+
+ async def get_room_sessions(self, room_name: str) -> Dict[str, Any]:
+ if room_name not in self._rooms:
+ return {"error": "Room not found"}
+
+ room_data = self._rooms[room_name]
+ return {
+ "roomName": room_name,
+ "sessions": [
+ {
+ "sessionId": room_data["id"],
+ "startTime": datetime.utcnow().isoformat(),
+ "participants": room_data["participants"],
+ "isActive": room_data["is_active"],
+ }
+ ],
+ }
+
+ async def delete_room(self, room_name: str) -> bool:
+ if room_name in self._rooms:
+ self._rooms[room_name]["is_active"] = False
+ return True
+ return False
+
+ async def upload_logo(self, room_name: str, logo_path: str) -> bool:
+ if room_name in self._rooms:
+ self._rooms[room_name]["logo_path"] = logo_path
+ return True
+ return False
+
+ def verify_webhook_signature(
+ self, body: bytes, signature: str, timestamp: Optional[str] = None
+ ) -> bool:
+ return signature == "valid"
+
+ def add_participant(
+ self, room_name: str, participant_id: str, participant_name: str
+ ):
+ if room_name in self._rooms:
+ self._rooms[room_name]["participants"].append(
+ {
+ "id": participant_id,
+ "name": participant_name,
+ "joined_at": datetime.utcnow().isoformat(),
+ }
+ )
+
+ def trigger_webhook(self, event_type: str, data: Dict[str, Any]):
+ self._webhook_calls.append(
+ {
+ "type": event_type,
+ "data": data,
+ "timestamp": datetime.utcnow().isoformat(),
+ }
+ )
+
+ def get_webhook_calls(self) -> list[Dict[str, Any]]:
+ return self._webhook_calls.copy()
+
+ def clear_data(self):
+ self._rooms.clear()
+ self._webhook_calls.clear()
diff --git a/server/tests/test_cleanup.py b/server/tests/test_cleanup.py
index 2cb8614c..0c968941 100644
--- a/server/tests/test_cleanup.py
+++ b/server/tests/test_cleanup.py
@@ -139,14 +139,10 @@ async def test_cleanup_deletes_associated_meeting_and_recording():
mock_settings.PUBLIC_DATA_RETENTION_DAYS = 7
# Mock storage deletion
- with patch("reflector.db.transcripts.get_transcripts_storage") as mock_storage:
+ with patch("reflector.worker.cleanup.get_transcripts_storage") as mock_storage:
mock_storage.return_value.delete_file = AsyncMock()
- with patch(
- "reflector.worker.cleanup.get_recordings_storage"
- ) as mock_rec_storage:
- mock_rec_storage.return_value.delete_file = AsyncMock()
- result = await cleanup_old_public_data()
+ result = await cleanup_old_public_data()
# Check results
assert result["transcripts_deleted"] == 1
diff --git a/server/tests/test_consent_multitrack.py b/server/tests/test_consent_multitrack.py
new file mode 100644
index 00000000..15948708
--- /dev/null
+++ b/server/tests/test_consent_multitrack.py
@@ -0,0 +1,330 @@
+from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from reflector.db.meetings import (
+ MeetingConsent,
+ meeting_consent_controller,
+ meetings_controller,
+)
+from reflector.db.recordings import Recording, recordings_controller
+from reflector.db.rooms import rooms_controller
+from reflector.db.transcripts import SourceKind, transcripts_controller
+from reflector.pipelines.main_live_pipeline import cleanup_consent
+
+
+@pytest.mark.asyncio
+async def test_consent_cleanup_deletes_multitrack_files():
+ room = await rooms_controller.add(
+ name="Test Room",
+ user_id="test-user",
+ zulip_auto_post=False,
+ zulip_stream="",
+ zulip_topic="",
+ is_locked=False,
+ room_mode="normal",
+ recording_type="cloud",
+ recording_trigger="automatic",
+ is_shared=False,
+ platform="daily",
+ )
+
+ # Create meeting
+ meeting = await meetings_controller.create(
+ id="test-multitrack-meeting",
+ room_name="test-room-20250101120000",
+ room_url="https://test.daily.co/test-room",
+ host_room_url="https://test.daily.co/test-room",
+ start_date=datetime.now(timezone.utc),
+ end_date=datetime.now(timezone.utc),
+ room=room,
+ )
+
+ track_keys = [
+ "recordings/test-room-20250101120000/track-0.webm",
+ "recordings/test-room-20250101120000/track-1.webm",
+ "recordings/test-room-20250101120000/track-2.webm",
+ ]
+ recording = await recordings_controller.create(
+ Recording(
+ bucket_name="test-bucket",
+ object_key="recordings/test-room-20250101120000", # Folder path
+ recorded_at=datetime.now(timezone.utc),
+ meeting_id=meeting.id,
+ track_keys=track_keys,
+ )
+ )
+
+ # Create transcript
+ transcript = await transcripts_controller.add(
+ name="Test Multitrack Transcript",
+ source_kind=SourceKind.ROOM,
+ recording_id=recording.id,
+ meeting_id=meeting.id,
+ )
+
+ # Add consent denial
+ await meeting_consent_controller.upsert(
+ MeetingConsent(
+ meeting_id=meeting.id,
+ user_id="test-user",
+ consent_given=False,
+ consent_timestamp=datetime.now(timezone.utc),
+ )
+ )
+
+ # Mock get_transcripts_storage (master credentials with bucket override)
+ with patch(
+ "reflector.pipelines.main_live_pipeline.get_transcripts_storage"
+ ) as mock_get_transcripts_storage:
+ mock_master_storage = MagicMock()
+ mock_master_storage.delete_file = AsyncMock()
+ mock_get_transcripts_storage.return_value = mock_master_storage
+
+ await cleanup_consent(transcript_id=transcript.id)
+
+ # Verify master storage was used with bucket override for all track keys
+ assert mock_master_storage.delete_file.call_count == 3
+ deleted_keys = []
+ for call_args in mock_master_storage.delete_file.call_args_list:
+ key = call_args[0][0]
+ bucket_kwarg = call_args[1].get("bucket")
+ deleted_keys.append(key)
+ assert bucket_kwarg == "test-bucket" # Verify bucket override!
+ assert set(deleted_keys) == set(track_keys)
+
+ updated_transcript = await transcripts_controller.get_by_id(transcript.id)
+ assert updated_transcript.audio_deleted is True
+
+
+@pytest.mark.asyncio
+async def test_consent_cleanup_handles_missing_track_keys():
+ room = await rooms_controller.add(
+ name="Test Room 2",
+ user_id="test-user",
+ zulip_auto_post=False,
+ zulip_stream="",
+ zulip_topic="",
+ is_locked=False,
+ room_mode="normal",
+ recording_type="cloud",
+ recording_trigger="automatic",
+ is_shared=False,
+ platform="daily",
+ )
+
+ # Create meeting
+ meeting = await meetings_controller.create(
+ id="test-multitrack-meeting-2",
+ room_name="test-room-20250101120001",
+ room_url="https://test.daily.co/test-room-2",
+ host_room_url="https://test.daily.co/test-room-2",
+ start_date=datetime.now(timezone.utc),
+ end_date=datetime.now(timezone.utc),
+ room=room,
+ )
+
+ recording = await recordings_controller.create(
+ Recording(
+ bucket_name="test-bucket",
+ object_key="recordings/old-style-recording.mp4",
+ recorded_at=datetime.now(timezone.utc),
+ meeting_id=meeting.id,
+ track_keys=None,
+ )
+ )
+
+ transcript = await transcripts_controller.add(
+ name="Test Old-Style Transcript",
+ source_kind=SourceKind.ROOM,
+ recording_id=recording.id,
+ meeting_id=meeting.id,
+ )
+
+ # Add consent denial
+ await meeting_consent_controller.upsert(
+ MeetingConsent(
+ meeting_id=meeting.id,
+ user_id="test-user-2",
+ consent_given=False,
+ consent_timestamp=datetime.now(timezone.utc),
+ )
+ )
+
+ # Mock get_transcripts_storage (master credentials with bucket override)
+ with patch(
+ "reflector.pipelines.main_live_pipeline.get_transcripts_storage"
+ ) as mock_get_transcripts_storage:
+ mock_master_storage = MagicMock()
+ mock_master_storage.delete_file = AsyncMock()
+ mock_get_transcripts_storage.return_value = mock_master_storage
+
+ await cleanup_consent(transcript_id=transcript.id)
+
+ # Verify master storage was used with bucket override
+ assert mock_master_storage.delete_file.call_count == 1
+ call_args = mock_master_storage.delete_file.call_args
+ assert call_args[0][0] == recording.object_key
+ assert call_args[1].get("bucket") == "test-bucket" # Verify bucket override!
+
+
+@pytest.mark.asyncio
+async def test_consent_cleanup_empty_track_keys_falls_back():
+ room = await rooms_controller.add(
+ name="Test Room 3",
+ user_id="test-user",
+ zulip_auto_post=False,
+ zulip_stream="",
+ zulip_topic="",
+ is_locked=False,
+ room_mode="normal",
+ recording_type="cloud",
+ recording_trigger="automatic",
+ is_shared=False,
+ platform="daily",
+ )
+
+ # Create meeting
+ meeting = await meetings_controller.create(
+ id="test-multitrack-meeting-3",
+ room_name="test-room-20250101120002",
+ room_url="https://test.daily.co/test-room-3",
+ host_room_url="https://test.daily.co/test-room-3",
+ start_date=datetime.now(timezone.utc),
+ end_date=datetime.now(timezone.utc),
+ room=room,
+ )
+
+ recording = await recordings_controller.create(
+ Recording(
+ bucket_name="test-bucket",
+ object_key="recordings/fallback-recording.mp4",
+ recorded_at=datetime.now(timezone.utc),
+ meeting_id=meeting.id,
+ track_keys=[],
+ )
+ )
+
+ transcript = await transcripts_controller.add(
+ name="Test Empty Track Keys Transcript",
+ source_kind=SourceKind.ROOM,
+ recording_id=recording.id,
+ meeting_id=meeting.id,
+ )
+
+ # Add consent denial
+ await meeting_consent_controller.upsert(
+ MeetingConsent(
+ meeting_id=meeting.id,
+ user_id="test-user-3",
+ consent_given=False,
+ consent_timestamp=datetime.now(timezone.utc),
+ )
+ )
+
+ # Mock get_transcripts_storage (master credentials with bucket override)
+ with patch(
+ "reflector.pipelines.main_live_pipeline.get_transcripts_storage"
+ ) as mock_get_transcripts_storage:
+ mock_master_storage = MagicMock()
+ mock_master_storage.delete_file = AsyncMock()
+ mock_get_transcripts_storage.return_value = mock_master_storage
+
+ # Run cleanup
+ await cleanup_consent(transcript_id=transcript.id)
+
+ # Verify master storage was used with bucket override
+ assert mock_master_storage.delete_file.call_count == 1
+ call_args = mock_master_storage.delete_file.call_args
+ assert call_args[0][0] == recording.object_key
+ assert call_args[1].get("bucket") == "test-bucket" # Verify bucket override!
+
+
+@pytest.mark.asyncio
+async def test_consent_cleanup_partial_failure_doesnt_mark_deleted():
+ room = await rooms_controller.add(
+ name="Test Room 4",
+ user_id="test-user",
+ zulip_auto_post=False,
+ zulip_stream="",
+ zulip_topic="",
+ is_locked=False,
+ room_mode="normal",
+ recording_type="cloud",
+ recording_trigger="automatic",
+ is_shared=False,
+ platform="daily",
+ )
+
+ # Create meeting
+ meeting = await meetings_controller.create(
+ id="test-multitrack-meeting-4",
+ room_name="test-room-20250101120003",
+ room_url="https://test.daily.co/test-room-4",
+ host_room_url="https://test.daily.co/test-room-4",
+ start_date=datetime.now(timezone.utc),
+ end_date=datetime.now(timezone.utc),
+ room=room,
+ )
+
+ track_keys = [
+ "recordings/test-room-20250101120003/track-0.webm",
+ "recordings/test-room-20250101120003/track-1.webm",
+ "recordings/test-room-20250101120003/track-2.webm",
+ ]
+ recording = await recordings_controller.create(
+ Recording(
+ bucket_name="test-bucket",
+ object_key="recordings/test-room-20250101120003",
+ recorded_at=datetime.now(timezone.utc),
+ meeting_id=meeting.id,
+ track_keys=track_keys,
+ )
+ )
+
+ # Create transcript
+ transcript = await transcripts_controller.add(
+ name="Test Partial Failure Transcript",
+ source_kind=SourceKind.ROOM,
+ recording_id=recording.id,
+ meeting_id=meeting.id,
+ )
+
+ # Add consent denial
+ await meeting_consent_controller.upsert(
+ MeetingConsent(
+ meeting_id=meeting.id,
+ user_id="test-user-4",
+ consent_given=False,
+ consent_timestamp=datetime.now(timezone.utc),
+ )
+ )
+
+ # Mock get_transcripts_storage (master credentials with bucket override) with partial failure
+ with patch(
+ "reflector.pipelines.main_live_pipeline.get_transcripts_storage"
+ ) as mock_get_transcripts_storage:
+ mock_master_storage = MagicMock()
+
+ call_count = 0
+
+ async def delete_side_effect(key, bucket=None):
+ nonlocal call_count
+ call_count += 1
+ if call_count == 2:
+ raise Exception("S3 deletion failed")
+
+ mock_master_storage.delete_file = AsyncMock(side_effect=delete_side_effect)
+ mock_get_transcripts_storage.return_value = mock_master_storage
+
+ await cleanup_consent(transcript_id=transcript.id)
+
+ # Verify master storage was called with bucket override
+ assert mock_master_storage.delete_file.call_count == 3
+
+ updated_transcript = await transcripts_controller.get_by_id(transcript.id)
+ assert (
+ updated_transcript.audio_deleted is None
+ or updated_transcript.audio_deleted is False
+ )
diff --git a/server/tests/test_pipeline_main_file.py b/server/tests/test_pipeline_main_file.py
index f86dc85d..825c8389 100644
--- a/server/tests/test_pipeline_main_file.py
+++ b/server/tests/test_pipeline_main_file.py
@@ -127,18 +127,27 @@ async def mock_storage():
from reflector.storage.base import Storage
class TestStorage(Storage):
- async def _put_file(self, path, data):
+ async def _put_file(self, path, data, bucket=None):
return None
- async def _get_file_url(self, path):
+ async def _get_file_url(
+ self,
+ path,
+ operation: str = "get_object",
+ expires_in: int = 3600,
+ bucket=None,
+ ):
return f"http://test-storage/{path}"
- async def _get_file(self, path):
+ async def _get_file(self, path, bucket=None):
return b"test_audio_data"
- async def _delete_file(self, path):
+ async def _delete_file(self, path, bucket=None):
return None
+ async def _stream_to_fileobj(self, path, fileobj, bucket=None):
+ fileobj.write(b"test_audio_data")
+
storage = TestStorage()
# Add mock tracking for verification
storage._put_file = AsyncMock(side_effect=storage._put_file)
@@ -181,7 +190,7 @@ async def mock_waveform_processor():
async def mock_topic_detector():
"""Mock TranscriptTopicDetectorProcessor"""
with patch(
- "reflector.pipelines.main_file_pipeline.TranscriptTopicDetectorProcessor"
+ "reflector.pipelines.topic_processing.TranscriptTopicDetectorProcessor"
) as mock_topic_class:
mock_topic = AsyncMock()
mock_topic.set_pipeline = MagicMock()
@@ -218,7 +227,7 @@ async def mock_topic_detector():
async def mock_title_processor():
"""Mock TranscriptFinalTitleProcessor"""
with patch(
- "reflector.pipelines.main_file_pipeline.TranscriptFinalTitleProcessor"
+ "reflector.pipelines.topic_processing.TranscriptFinalTitleProcessor"
) as mock_title_class:
mock_title = AsyncMock()
mock_title.set_pipeline = MagicMock()
@@ -247,7 +256,7 @@ async def mock_title_processor():
async def mock_summary_processor():
"""Mock TranscriptFinalSummaryProcessor"""
with patch(
- "reflector.pipelines.main_file_pipeline.TranscriptFinalSummaryProcessor"
+ "reflector.pipelines.topic_processing.TranscriptFinalSummaryProcessor"
) as mock_summary_class:
mock_summary = AsyncMock()
mock_summary.set_pipeline = MagicMock()
diff --git a/server/tests/test_room_ics_api.py b/server/tests/test_room_ics_api.py
index 8e7cf76f..79512995 100644
--- a/server/tests/test_room_ics_api.py
+++ b/server/tests/test_room_ics_api.py
@@ -48,6 +48,7 @@ async def test_create_room_with_ics_fields(authenticated_client):
"ics_url": "https://calendar.example.com/test.ics",
"ics_fetch_interval": 600,
"ics_enabled": True,
+ "platform": "daily",
},
)
assert response.status_code == 200
@@ -75,6 +76,7 @@ async def test_update_room_ics_configuration(authenticated_client):
"is_shared": False,
"webhook_url": "",
"webhook_secret": "",
+ "platform": "daily",
},
)
assert response.status_code == 200
@@ -111,6 +113,7 @@ async def test_trigger_ics_sync(authenticated_client):
is_shared=False,
ics_url="https://calendar.example.com/api.ics",
ics_enabled=True,
+ platform="daily",
)
cal = Calendar()
@@ -154,6 +157,7 @@ async def test_trigger_ics_sync_unauthorized(client):
is_shared=False,
ics_url="https://calendar.example.com/api.ics",
ics_enabled=True,
+ platform="daily",
)
response = await client.post(f"/rooms/{room.name}/ics/sync")
@@ -176,6 +180,7 @@ async def test_trigger_ics_sync_not_configured(authenticated_client):
recording_trigger="automatic-2nd-participant",
is_shared=False,
ics_enabled=False,
+ platform="daily",
)
response = await client.post(f"/rooms/{room.name}/ics/sync")
@@ -200,6 +205,7 @@ async def test_get_ics_status(authenticated_client):
ics_url="https://calendar.example.com/status.ics",
ics_enabled=True,
ics_fetch_interval=300,
+ platform="daily",
)
now = datetime.now(timezone.utc)
@@ -231,6 +237,7 @@ async def test_get_ics_status_unauthorized(client):
is_shared=False,
ics_url="https://calendar.example.com/status.ics",
ics_enabled=True,
+ platform="daily",
)
response = await client.get(f"/rooms/{room.name}/ics/status")
@@ -252,6 +259,7 @@ async def test_list_room_meetings(authenticated_client):
recording_type="cloud",
recording_trigger="automatic-2nd-participant",
is_shared=False,
+ platform="daily",
)
now = datetime.now(timezone.utc)
@@ -298,6 +306,7 @@ async def test_list_room_meetings_non_owner(client):
recording_type="cloud",
recording_trigger="automatic-2nd-participant",
is_shared=False,
+ platform="daily",
)
event = CalendarEvent(
@@ -334,6 +343,7 @@ async def test_list_upcoming_meetings(authenticated_client):
recording_type="cloud",
recording_trigger="automatic-2nd-participant",
is_shared=False,
+ platform="daily",
)
now = datetime.now(timezone.utc)
diff --git a/server/tests/test_storage.py b/server/tests/test_storage.py
new file mode 100644
index 00000000..ccfc3dbd
--- /dev/null
+++ b/server/tests/test_storage.py
@@ -0,0 +1,321 @@
+"""Tests for storage abstraction layer."""
+
+import io
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from botocore.exceptions import ClientError
+
+from reflector.storage.base import StoragePermissionError
+from reflector.storage.storage_aws import AwsStorage
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_stream_to_fileobj():
+ """Test that AWS storage can stream directly to a file object without loading into memory."""
+ # Setup
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ # Mock download_fileobj to write data
+ async def mock_download(Bucket, Key, Fileobj, **kwargs):
+ Fileobj.write(b"chunk1chunk2")
+
+ mock_client = AsyncMock()
+ mock_client.download_fileobj = AsyncMock(side_effect=mock_download)
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ # Patch the session client
+ with patch.object(storage.session, "client", return_value=mock_client):
+ # Create a file-like object to stream to
+ output = io.BytesIO()
+
+ # Act - stream to file object
+ await storage.stream_to_fileobj("test-file.mp4", output, bucket="test-bucket")
+
+ # Assert
+ mock_client.download_fileobj.assert_called_once_with(
+ Bucket="test-bucket", Key="test-file.mp4", Fileobj=output
+ )
+
+ # Check that data was written to output
+ output.seek(0)
+ assert output.read() == b"chunk1chunk2"
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_stream_to_fileobj_with_folder():
+ """Test streaming with folder prefix in bucket name."""
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket/recordings",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ async def mock_download(Bucket, Key, Fileobj, **kwargs):
+ Fileobj.write(b"data")
+
+ mock_client = AsyncMock()
+ mock_client.download_fileobj = AsyncMock(side_effect=mock_download)
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ output = io.BytesIO()
+ await storage.stream_to_fileobj("file.mp4", output, bucket="other-bucket")
+
+ # Should use folder prefix from instance config
+ mock_client.download_fileobj.assert_called_once_with(
+ Bucket="other-bucket", Key="recordings/file.mp4", Fileobj=output
+ )
+
+
+@pytest.mark.asyncio
+async def test_storage_base_class_stream_to_fileobj():
+ """Test that base Storage class has stream_to_fileobj method."""
+ from reflector.storage.base import Storage
+
+ # Verify method exists in base class
+ assert hasattr(Storage, "stream_to_fileobj")
+
+ # Create a mock storage instance
+ storage = MagicMock(spec=Storage)
+ storage.stream_to_fileobj = AsyncMock()
+
+ # Should be callable
+ await storage.stream_to_fileobj("file.mp4", io.BytesIO())
+ storage.stream_to_fileobj.assert_called_once()
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_stream_uses_download_fileobj():
+ """Test that download_fileobj is called correctly."""
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ async def mock_download(Bucket, Key, Fileobj, **kwargs):
+ Fileobj.write(b"data")
+
+ mock_client = AsyncMock()
+ mock_client.download_fileobj = AsyncMock(side_effect=mock_download)
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ output = io.BytesIO()
+ await storage.stream_to_fileobj("test.mp4", output)
+
+ # Verify download_fileobj was called with correct parameters
+ mock_client.download_fileobj.assert_called_once_with(
+ Bucket="test-bucket", Key="test.mp4", Fileobj=output
+ )
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_handles_access_denied_error():
+ """Test that AccessDenied errors are caught and wrapped in StoragePermissionError."""
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ # Mock ClientError with AccessDenied
+ error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}}
+ mock_client = AsyncMock()
+ mock_client.put_object = AsyncMock(
+ side_effect=ClientError(error_response, "PutObject")
+ )
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ with pytest.raises(StoragePermissionError) as exc_info:
+ await storage.put_file("test.txt", b"data")
+
+ # Verify error message contains expected information
+ error_msg = str(exc_info.value)
+ assert "AccessDenied" in error_msg
+ assert "default bucket 'test-bucket'" in error_msg
+ assert "S3 upload failed" in error_msg
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_handles_no_such_bucket_error():
+ """Test that NoSuchBucket errors are caught and wrapped in StoragePermissionError."""
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ # Mock ClientError with NoSuchBucket
+ error_response = {
+ "Error": {
+ "Code": "NoSuchBucket",
+ "Message": "The specified bucket does not exist",
+ }
+ }
+ mock_client = AsyncMock()
+ mock_client.delete_object = AsyncMock(
+ side_effect=ClientError(error_response, "DeleteObject")
+ )
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ with pytest.raises(StoragePermissionError) as exc_info:
+ await storage.delete_file("test.txt")
+
+ # Verify error message contains expected information
+ error_msg = str(exc_info.value)
+ assert "NoSuchBucket" in error_msg
+ assert "default bucket 'test-bucket'" in error_msg
+ assert "S3 delete failed" in error_msg
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_error_message_with_bucket_override():
+ """Test that error messages correctly show overridden bucket."""
+ storage = AwsStorage(
+ aws_bucket_name="default-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ # Mock ClientError with AccessDenied
+ error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}}
+ mock_client = AsyncMock()
+ mock_client.get_object = AsyncMock(
+ side_effect=ClientError(error_response, "GetObject")
+ )
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ with pytest.raises(StoragePermissionError) as exc_info:
+ await storage.get_file("test.txt", bucket="override-bucket")
+
+ # Verify error message shows overridden bucket, not default
+ error_msg = str(exc_info.value)
+ assert "overridden bucket 'override-bucket'" in error_msg
+ assert "default-bucket" not in error_msg
+ assert "S3 download failed" in error_msg
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_reraises_non_handled_errors():
+ """Test that non-AccessDenied/NoSuchBucket errors are re-raised as-is."""
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ # Mock ClientError with different error code
+ error_response = {
+ "Error": {"Code": "InternalError", "Message": "Internal Server Error"}
+ }
+ mock_client = AsyncMock()
+ mock_client.put_object = AsyncMock(
+ side_effect=ClientError(error_response, "PutObject")
+ )
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ # Should raise ClientError, not StoragePermissionError
+ with pytest.raises(ClientError) as exc_info:
+ await storage.put_file("test.txt", b"data")
+
+ # Verify it's the original ClientError
+ assert exc_info.value.response["Error"]["Code"] == "InternalError"
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_presign_url_handles_errors():
+ """Test that presigned URL generation handles permission errors."""
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ # Mock ClientError with AccessDenied during presign operation
+ error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}}
+ mock_client = AsyncMock()
+ mock_client.generate_presigned_url = AsyncMock(
+ side_effect=ClientError(error_response, "GeneratePresignedUrl")
+ )
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ with pytest.raises(StoragePermissionError) as exc_info:
+ await storage.get_file_url("test.txt")
+
+ # Verify error message
+ error_msg = str(exc_info.value)
+ assert "S3 presign failed" in error_msg
+ assert "AccessDenied" in error_msg
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_list_objects_handles_errors():
+ """Test that list_objects handles permission errors."""
+ storage = AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ )
+
+ # Mock ClientError during list operation
+ error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}}
+ mock_paginator = MagicMock()
+
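+ # Simulate a paginator whose first iteration fails: the unreachable yield makes this an async generator, matching the client's paginate() interface.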
+ async def mock_paginate(*args, **kwargs):
+ raise ClientError(error_response, "ListObjectsV2")
+ yield # Make it an async generator
+
+ mock_paginator.paginate = mock_paginate
+
+ mock_client = AsyncMock()
+ mock_client.get_paginator = MagicMock(return_value=mock_paginator)
+ mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+ mock_client.__aexit__ = AsyncMock(return_value=None)
+
+ with patch.object(storage.session, "client", return_value=mock_client):
+ with pytest.raises(StoragePermissionError) as exc_info:
+ await storage.list_objects(prefix="test/")
+
+ error_msg = str(exc_info.value)
+ assert "S3 list_objects failed" in error_msg
+ assert "AccessDenied" in error_msg
+
+
+def test_aws_storage_constructor_rejects_mixed_auth():
+ """Test that constructor rejects both role_arn and access keys."""
+ with pytest.raises(ValueError, match="cannot use both.*role_arn.*access keys"):
+ AwsStorage(
+ aws_bucket_name="test-bucket",
+ aws_region="us-east-1",
+ aws_access_key_id="test-key",
+ aws_secret_access_key="test-secret",
+ aws_role_arn="arn:aws:iam::123456789012:role/test-role",
+ )
diff --git a/server/tests/test_transcripts_recording_deletion.py b/server/tests/test_transcripts_recording_deletion.py
index 810fe567..3a632612 100644
--- a/server/tests/test_transcripts_recording_deletion.py
+++ b/server/tests/test_transcripts_recording_deletion.py
@@ -22,13 +22,16 @@ async def test_recording_deleted_with_transcript():
recording_id=recording.id,
)
- with patch("reflector.db.transcripts.get_recordings_storage") as mock_get_storage:
+ with patch("reflector.db.transcripts.get_transcripts_storage") as mock_get_storage:
storage_instance = mock_get_storage.return_value
storage_instance.delete_file = AsyncMock()
await transcripts_controller.remove_by_id(transcript.id)
- storage_instance.delete_file.assert_awaited_once_with(recording.object_key)
+ # Should be called with bucket override
+ storage_instance.delete_file.assert_awaited_once_with(
+ recording.object_key, bucket=recording.bucket_name
+ )
assert await recordings_controller.get_by_id(recording.id) is None
assert await transcripts_controller.get_by_id(transcript.id) is None
diff --git a/server/tests/test_utils_daily.py b/server/tests/test_utils_daily.py
new file mode 100644
index 00000000..356ffc94
--- /dev/null
+++ b/server/tests/test_utils_daily.py
@@ -0,0 +1,17 @@
+import pytest
+
+from reflector.utils.daily import extract_base_room_name
+
+
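+# Daily room names carry a trailing timestamp suffix (e.g. "daily-20251020193458"); only that suffix should be stripped, hyphens in the base name are preserved.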
+@pytest.mark.parametrize(
+ "daily_room_name,expected",
+ [
+ ("daily-20251020193458", "daily"),
+ ("daily-2-20251020193458", "daily-2"),
+ ("my-room-name-20251020193458", "my-room-name"),
+ ("room-with-numbers-123-20251020193458", "room-with-numbers-123"),
+ ("x-20251020193458", "x"),
+ ],
+)
+def test_extract_base_room_name(daily_room_name, expected):
+ assert extract_base_room_name(daily_room_name) == expected
diff --git a/server/tests/test_utils_url.py b/server/tests/test_utils_url.py
new file mode 100644
index 00000000..c833983c
--- /dev/null
+++ b/server/tests/test_utils_url.py
@@ -0,0 +1,63 @@
+"""Tests for URL utility functions."""
+
+from reflector.utils.url import add_query_param
+
+
+class TestAddQueryParam:
+ """Test the add_query_param function."""
+
+ def test_add_param_to_url_without_query(self):
+ """Should add query param with ? to URL without existing params."""
+ url = "https://example.com/room"
+ result = add_query_param(url, "t", "token123")
+ assert result == "https://example.com/room?t=token123"
+
+ def test_add_param_to_url_with_existing_query(self):
+ """Should add query param with & to URL with existing params."""
+ url = "https://example.com/room?existing=param"
+ result = add_query_param(url, "t", "token123")
+ assert result == "https://example.com/room?existing=param&t=token123"
+
+ def test_add_param_to_url_with_multiple_existing_params(self):
+ """Should add query param to URL with multiple existing params."""
+ url = "https://example.com/room?param1=value1¶m2=value2"
+ result = add_query_param(url, "t", "token123")
+ assert (
+ result == "https://example.com/room?param1=value1¶m2=value2&t=token123"
+ )
+
+ def test_add_param_with_special_characters(self):
+ """Should properly encode special characters in param value."""
+ url = "https://example.com/room"
+ result = add_query_param(url, "name", "hello world")
+ assert result == "https://example.com/room?name=hello+world"
+
+ def test_add_param_to_url_with_fragment(self):
+ """Should preserve URL fragment when adding query param."""
+ url = "https://example.com/room#section"
+ result = add_query_param(url, "t", "token123")
+ assert result == "https://example.com/room?t=token123#section"
+
+ def test_add_param_to_url_with_query_and_fragment(self):
+ """Should preserve fragment when adding param to URL with existing query."""
+ url = "https://example.com/room?existing=param#section"
+ result = add_query_param(url, "t", "token123")
+ assert result == "https://example.com/room?existing=param&t=token123#section"
+
+ def test_add_param_overwrites_existing_param(self):
+ """Should overwrite existing param with same name."""
+ url = "https://example.com/room?t=oldtoken"
+ result = add_query_param(url, "t", "newtoken")
+ assert result == "https://example.com/room?t=newtoken"
+
+ def test_url_without_scheme(self):
+ """Should handle URLs without scheme (relative URLs)."""
+ url = "/room/path"
+ result = add_query_param(url, "t", "token123")
+ assert result == "/room/path?t=token123"
+
+ def test_empty_url(self):
+ """Should handle empty URL."""
+ url = ""
+ result = add_query_param(url, "t", "token123")
+ assert result == "?t=token123"
diff --git a/server/tests/test_video_platforms_factory.py b/server/tests/test_video_platforms_factory.py
new file mode 100644
index 00000000..6c8c02c5
--- /dev/null
+++ b/server/tests/test_video_platforms_factory.py
@@ -0,0 +1,58 @@
+"""Tests for video_platforms.factory module."""
+
+from unittest.mock import patch
+
+from reflector.video_platforms.factory import get_platform
+
+
+class TestGetPlatform:
+ """Test suite for get_platform function."""
+
+ @patch("reflector.video_platforms.factory.settings")
+ def test_with_room_platform(self, mock_settings):
+ """When room_platform provided, should return room_platform."""
+ mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby"
+
+ # Should return the room's platform when provided
+ assert get_platform(room_platform="daily") == "daily"
+ assert get_platform(room_platform="whereby") == "whereby"
+
+ @patch("reflector.video_platforms.factory.settings")
+ def test_without_room_platform_uses_default(self, mock_settings):
+ """When no room_platform, should return DEFAULT_VIDEO_PLATFORM."""
+ mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby"
+
+ # Should return default when room_platform is None
+ assert get_platform(room_platform=None) == "whereby"
+
+ @patch("reflector.video_platforms.factory.settings")
+ def test_with_daily_default(self, mock_settings):
+ """When DEFAULT_VIDEO_PLATFORM is 'daily', should return 'daily' when no room_platform."""
+ mock_settings.DEFAULT_VIDEO_PLATFORM = "daily"
+
+ # Should return default 'daily' when room_platform is None
+ assert get_platform(room_platform=None) == "daily"
+
+ @patch("reflector.video_platforms.factory.settings")
+ def test_no_room_id_provided(self, mock_settings):
+ """Should work correctly even when room_id is not provided."""
+ mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby"
+
+ # Should use room_platform when provided
+ assert get_platform(room_platform="daily") == "daily"
+
+ # Should use default when room_platform not provided
+ assert get_platform(room_platform=None) == "whereby"
+
+ @patch("reflector.video_platforms.factory.settings")
+ def test_room_platform_always_takes_precedence(self, mock_settings):
+ """room_platform should always be used when provided."""
+ mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby"
+
+ # room_platform should take precedence over default
+ assert get_platform(room_platform="daily") == "daily"
+ assert get_platform(room_platform="whereby") == "whereby"
+
+ # Different default shouldn't matter when room_platform provided
+ mock_settings.DEFAULT_VIDEO_PLATFORM = "daily"
+ assert get_platform(room_platform="whereby") == "whereby"
diff --git a/www/app/[roomName]/[meetingId]/page.tsx b/www/app/[roomName]/[meetingId]/page.tsx
index 8ce405ba..725aa571 100644
--- a/www/app/[roomName]/[meetingId]/page.tsx
+++ b/www/app/[roomName]/[meetingId]/page.tsx
@@ -1,3 +1,3 @@
-import Room from "../room";
+import RoomContainer from "../components/RoomContainer";
-export default Room;
+export default RoomContainer;
diff --git a/www/app/[roomName]/components/DailyRoom.tsx b/www/app/[roomName]/components/DailyRoom.tsx
new file mode 100644
index 00000000..920f8624
--- /dev/null
+++ b/www/app/[roomName]/components/DailyRoom.tsx
@@ -0,0 +1,93 @@
+"use client";
+
+import { useCallback, useEffect, useRef } from "react";
+import { Box } from "@chakra-ui/react";
+import { useRouter } from "next/navigation";
+import DailyIframe, { DailyCall } from "@daily-co/daily-js";
+import type { components } from "../../reflector-api";
+import { useAuth } from "../../lib/AuthProvider";
+import {
+ ConsentDialogButton,
+ recordingTypeRequiresConsent,
+} from "../../lib/consent";
+
+type Meeting = components["schemas"]["Meeting"];
+
+interface DailyRoomProps {
+ meeting: Meeting;
+}
+
+export default function DailyRoom({ meeting }: DailyRoomProps) {
+ const router = useRouter();
+ const auth = useAuth();
+ const status = auth.status;
+ const containerRef = useRef<HTMLDivElement>(null);
+
+ const roomUrl = meeting?.host_room_url || meeting?.room_url;
+
+ const isLoading = status === "loading";
+
+ const handleLeave = useCallback(() => {
+ router.push("/browse");
+ }, [router]);
+
+ useEffect(() => {
+ if (isLoading || !roomUrl || !containerRef.current) return;
+
+ let frame: DailyCall | null = null;
+ let destroyed = false;
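+ // Tracks whether the effect cleaned up while createAndJoin() was still awaiting, so a late-created frame is destroyed immediately.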
+
+ const createAndJoin = async () => {
+ try {
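+ // daily-js only supports one call instance at a time, so tear down any leftover instance before creating a new frame.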
+ const existingFrame = DailyIframe.getCallInstance();
+ if (existingFrame) {
+ await existingFrame.destroy();
+ }
+
+ frame = DailyIframe.createFrame(containerRef.current!, {
+ iframeStyle: {
+ width: "100vw",
+ height: "100vh",
+ border: "none",
+ },
+ showLeaveButton: true,
+ showFullscreenButton: true,
+ });
+
+ if (destroyed) {
+ await frame.destroy();
+ return;
+ }
+
+ frame.on("left-meeting", handleLeave);
+ await frame.join({ url: roomUrl });
+ } catch (error) {
+ console.error("Error creating Daily frame:", error);
+ }
+ };
+
+ createAndJoin();
+
+ return () => {
+ destroyed = true;
+ if (frame) {
+ frame.destroy().catch((e) => {
+ console.error("Error destroying frame:", e);
+ });
+ }
+ };
+ }, [roomUrl, isLoading, handleLeave]);
+
+ if (!roomUrl) {
+ return null;
+ }
+
+ return (
+ <Box ref={containerRef}>
+ {meeting.recording_type &&
+ recordingTypeRequiresConsent(meeting.recording_type) &&
+ meeting.id && <ConsentDialogButton meetingId={meeting.id} />}
+ </Box>
+ );
+}
diff --git a/www/app/[roomName]/components/RoomContainer.tsx b/www/app/[roomName]/components/RoomContainer.tsx
new file mode 100644
index 00000000..bfcd82f7
--- /dev/null
+++ b/www/app/[roomName]/components/RoomContainer.tsx
@@ -0,0 +1,214 @@
+"use client";
+
+import { roomMeetingUrl } from "../../lib/routes";
+import { useCallback, useEffect, useState, use } from "react";
+import { Box, Text, Spinner } from "@chakra-ui/react";
+import { useRouter } from "next/navigation";
+import {
+ useRoomGetByName,
+ useRoomsCreateMeeting,
+ useRoomGetMeeting,
+} from "../../lib/apiHooks";
+import type { components } from "../../reflector-api";
+import MeetingSelection from "../MeetingSelection";
+import useRoomDefaultMeeting from "../useRoomDefaultMeeting";
+import WherebyRoom from "./WherebyRoom";
+import DailyRoom from "./DailyRoom";
+import { useAuth } from "../../lib/AuthProvider";
+import { useError } from "../../(errors)/errorContext";
+import { parseNonEmptyString } from "../../lib/utils";
+import { printApiError } from "../../api/_error";
+
+type Meeting = components["schemas"]["Meeting"];
+
+export type RoomDetails = {
+ params: Promise<{
+ roomName: string;
+ meetingId?: string;
+ }>;
+};
+
+function LoadingSpinner() {
+ return (
+ <Box display="flex" justifyContent="center" alignItems="center" height="100vh">
+ <Spinner />
+ </Box>
+ );
+}
+
+export default function RoomContainer(details: RoomDetails) {
+ const params = use(details.params);
+ const roomName = parseNonEmptyString(
+ params.roomName,
+ true,
+ "panic! params.roomName is required",
+ );
+ const router = useRouter();
+ const auth = useAuth();
+ const status = auth.status;
+ const isAuthenticated = status === "authenticated";
+ const { setError } = useError();
+
+ const roomQuery = useRoomGetByName(roomName);
+ const createMeetingMutation = useRoomsCreateMeeting();
+
+ const room = roomQuery.data;
+
+ const pageMeetingId = params.meetingId;
+
+ const defaultMeeting = useRoomDefaultMeeting(
+ room && !room.ics_enabled && !pageMeetingId ? roomName : null,
+ );
+
+ const explicitMeeting = useRoomGetMeeting(roomName, pageMeetingId || null);
+
+ const meeting = explicitMeeting.data || defaultMeeting.response;
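+ // A meeting explicitly selected via the URL takes precedence over the room's default meeting.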
+
+ const isLoading =
+ status === "loading" ||
+ roomQuery.isLoading ||
+ defaultMeeting?.loading ||
+ explicitMeeting.isLoading ||
+ createMeetingMutation.isPending;
+
+ const errors = [
+ explicitMeeting.error,
+ defaultMeeting.error,
+ roomQuery.error,
+ createMeetingMutation.error,
+ ].filter(Boolean);
+
+ const isOwner =
+ isAuthenticated && room ? auth.user?.id === room.user_id : false;
+
+ const handleMeetingSelect = (selectedMeeting: Meeting) => {
+ router.push(
+ roomMeetingUrl(
+ roomName,
+ parseNonEmptyString(
+ selectedMeeting.id,
+ true,
+ "panic! selectedMeeting.id is required",
+ ),
+ ),
+ );
+ };
+
+ const handleCreateUnscheduled = async () => {
+ try {
+ const newMeeting = await createMeetingMutation.mutateAsync({
+ params: {
+ path: { room_name: roomName },
+ },
+ body: {
+ allow_duplicated: room ? room.ics_enabled : false,
+ },
+ });
+ handleMeetingSelect(newMeeting);
+ } catch (err) {
+ console.error("Failed to create meeting:", err);
+ }
+ };
+
+ if (isLoading) {
+ return <LoadingSpinner />;
+ }
+
+ if (!room) {
+ return (
+ <Text>
+ Room not found
+ </Text>
+ );
+ }
+
+ if (room.ics_enabled && !params.meetingId) {
+ return (
+ <MeetingSelection
+ room={room}
+ isOwner={isOwner}
+ onMeetingSelect={handleMeetingSelect}
+ onCreateUnscheduled={handleCreateUnscheduled}
+ />
+ );
+ }
+
+ if (errors.length > 0) {
+ return (
+ <Box>
+ {errors.map((error, i) => (
+ <Text key={i}>
+ {printApiError(error)}
+ </Text>
+ ))}
+ </Box>
+ );
+ }
+
+ if (!meeting) {
+ return <LoadingSpinner />;
+ }
+
+ const platform = meeting.platform;
+
+ if (!platform) {
+ return (
+ <Text>
+ Meeting platform not configured
+ </Text>
+ );
+ }
+
+ switch (platform) {
+ case "daily":
+ return <DailyRoom meeting={meeting} />;
+ case "whereby":
+ return <WherebyRoom meeting={meeting} />;
+ default: {
+ const _exhaustive: never = platform;
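+ // Exhaustiveness check: the never assignment fails to compile if a new platform value is added without a matching case above.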
+ return (
+ <Text>
+ Unknown platform: {platform}
+ </Text>
+ );
+ }
+ }
+}
diff --git a/www/app/[roomName]/components/WherebyRoom.tsx b/www/app/[roomName]/components/WherebyRoom.tsx
new file mode 100644
index 00000000..d670b4e2
--- /dev/null
+++ b/www/app/[roomName]/components/WherebyRoom.tsx
@@ -0,0 +1,101 @@
+"use client";
+
+import { useCallback, useEffect, useRef, RefObject } from "react";
+import { useRouter } from "next/navigation";
+import type { components } from "../../reflector-api";
+import { useAuth } from "../../lib/AuthProvider";
+import { getWherebyUrl, useWhereby } from "../../lib/wherebyClient";
+import { assertExistsAndNonEmptyString, NonEmptyString } from "../../lib/utils";
+import {
+ ConsentDialogButton as BaseConsentDialogButton,
+ useConsentDialog,
+ recordingTypeRequiresConsent,
+} from "../../lib/consent";
+
+type Meeting = components["schemas"]["Meeting"];
+
+interface WherebyRoomProps {
+ meeting: Meeting;
+}
+
+function WherebyConsentDialogButton({
+ meetingId,
+ wherebyRef,
+}: {
+ meetingId: NonEmptyString;
+ wherebyRef: React.RefObject<HTMLElement>;
+}) {
+ const previousFocusRef = useRef<HTMLElement | null>(null);
+
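+ // Remember which element had focus when the Whereby embed signalled "ready", and restore it on unmount if the embed still holds focus.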
+ useEffect(() => {
+ const element = wherebyRef.current;
+ if (!element) return;
+
+ const handleWherebyReady = () => {
+ previousFocusRef.current = document.activeElement as HTMLElement;
+ };
+
+ element.addEventListener("ready", handleWherebyReady);
+
+ return () => {
+ element.removeEventListener("ready", handleWherebyReady);
+ if (previousFocusRef.current && document.activeElement === element) {
+ previousFocusRef.current.focus();
+ }
+ };
+ }, [wherebyRef]);
+
+ return <BaseConsentDialogButton meetingId={meetingId} />;
+}
+
+export default function WherebyRoom({ meeting }: WherebyRoomProps) {
+ const wherebyLoaded = useWhereby();
+ const wherebyRef = useRef<HTMLElement>(null);
+ const router = useRouter();
+ const auth = useAuth();
+ const status = auth.status;
+ const isAuthenticated = status === "authenticated";
+
+ const wherebyRoomUrl = getWherebyUrl(meeting);
+ const recordingType = meeting.recording_type;
+ const meetingId = meeting.id;
+
+ const isLoading = status === "loading";
+
+ const handleLeave = useCallback(() => {
+ router.push("/browse");
+ }, [router]);
+
+ useEffect(() => {
+ if (isLoading || !isAuthenticated || !wherebyRoomUrl || !wherebyLoaded)
+ return;
+
+ wherebyRef.current?.addEventListener("leave", handleLeave);
+
+ return () => {
+ wherebyRef.current?.removeEventListener("leave", handleLeave);
+ };
+ }, [handleLeave, wherebyRoomUrl, isLoading, isAuthenticated, wherebyLoaded]);
+
+ if (!wherebyRoomUrl || !wherebyLoaded) {
+ return null;
+ }
+
+ return (
+ <>
+ <whereby-embed ref={wherebyRef} room={wherebyRoomUrl} />
+ {recordingType &&
+ recordingTypeRequiresConsent(recordingType) &&
+ meetingId && (
+ <WherebyConsentDialogButton
+ meetingId={assertExistsAndNonEmptyString(meetingId)}
+ wherebyRef={wherebyRef}
+ />
+ )}
+ </>
+ );
+}
diff --git a/www/app/[roomName]/page.tsx b/www/app/[roomName]/page.tsx
index 1aaca4c7..87651a50 100644
--- a/www/app/[roomName]/page.tsx
+++ b/www/app/[roomName]/page.tsx
@@ -1,3 +1,3 @@
-import Room from "./room";
+import RoomContainer from "./components/RoomContainer";
-export default Room;
+export default RoomContainer;
diff --git a/www/app/lib/consent/ConsentDialog.tsx b/www/app/lib/consent/ConsentDialog.tsx
new file mode 100644
index 00000000..488599d0
--- /dev/null
+++ b/www/app/lib/consent/ConsentDialog.tsx
@@ -0,0 +1,36 @@
+"use client";
+
+import { Box, Button, Text, VStack, HStack } from "@chakra-ui/react";
+import { CONSENT_DIALOG_TEXT } from "./constants";
+
+interface ConsentDialogProps {
+ onAccept: () => void;
+ onReject: () => void;
+}
+
+export function ConsentDialog({ onAccept, onReject }: ConsentDialogProps) {
+ return (
+ <Box>
+ <VStack>
+ <Text>
+ {CONSENT_DIALOG_TEXT.question}
+ </Text>
+ <HStack>
+ <Button onClick={onAccept}>{CONSENT_DIALOG_TEXT.acceptButton}</Button>
+ <Button onClick={onReject}>{CONSENT_DIALOG_TEXT.rejectButton}</Button>
+ </HStack>
+ </VStack>
+ </Box>
+ );
+}
diff --git a/www/app/lib/consent/ConsentDialogButton.tsx b/www/app/lib/consent/ConsentDialogButton.tsx
new file mode 100644
index 00000000..2c1d084b
--- /dev/null
+++ b/www/app/lib/consent/ConsentDialogButton.tsx
@@ -0,0 +1,39 @@
+"use client";
+
+import { Button, Icon } from "@chakra-ui/react";
+import { FaBars } from "react-icons/fa6";
+import { useConsentDialog } from "./useConsentDialog";
+import {
+ CONSENT_BUTTON_TOP_OFFSET,
+ CONSENT_BUTTON_LEFT_OFFSET,
+ CONSENT_BUTTON_Z_INDEX,
+ CONSENT_DIALOG_TEXT,
+} from "./constants";
+
+interface ConsentDialogButtonProps {
+ meetingId: string;
+}
+
+export function ConsentDialogButton({ meetingId }: ConsentDialogButtonProps) {
+ const { showConsentModal, consentState, hasConsent, consentLoading } =
+ useConsentDialog(meetingId);
+
+ if (!consentState.ready || hasConsent(meetingId) || consentLoading) {
+ return null;
+ }
+
+ return (
+ <Button
+ position="fixed"
+ top={CONSENT_BUTTON_TOP_OFFSET}
+ left={CONSENT_BUTTON_LEFT_OFFSET}
+ zIndex={CONSENT_BUTTON_Z_INDEX}
+ onClick={showConsentModal}
+ >
+ <Icon as={FaBars} />
+ {CONSENT_DIALOG_TEXT.triggerButton}
+ </Button>
+ );
+}
diff --git a/www/app/lib/consent/constants.ts b/www/app/lib/consent/constants.ts
new file mode 100644
index 00000000..41e7c7e1
--- /dev/null
+++ b/www/app/lib/consent/constants.ts
@@ -0,0 +1,12 @@
+export const CONSENT_BUTTON_TOP_OFFSET = "56px";
+export const CONSENT_BUTTON_LEFT_OFFSET = "8px";
+export const CONSENT_BUTTON_Z_INDEX = 1000;
+export const TOAST_CHECK_INTERVAL_MS = 100;
+
+export const CONSENT_DIALOG_TEXT = {
+ question:
+ "Can we have your permission to store this meeting's audio recording on our servers?",
+ acceptButton: "Yes, store the audio",
+ rejectButton: "No, delete after transcription",
+ triggerButton: "Meeting is being recorded",
+} as const;
diff --git a/www/app/lib/consent/index.ts b/www/app/lib/consent/index.ts
new file mode 100644
index 00000000..eabca8ac
--- /dev/null
+++ b/www/app/lib/consent/index.ts
@@ -0,0 +1,8 @@
+"use client";
+
+export { ConsentDialogButton } from "./ConsentDialogButton";
+export { ConsentDialog } from "./ConsentDialog";
+export { useConsentDialog } from "./useConsentDialog";
+export { recordingTypeRequiresConsent } from "./utils";
+export * from "./constants";
+export * from "./types";
diff --git a/www/app/lib/consent/types.ts b/www/app/lib/consent/types.ts
new file mode 100644
index 00000000..0bd15202
--- /dev/null
+++ b/www/app/lib/consent/types.ts
@@ -0,0 +1,9 @@
+export interface ConsentDialogResult {
+ showConsentModal: () => void;
+ consentState: {
+ ready: boolean;
+ consentAnsweredForMeetings?: Set<string>;
+ };
+ hasConsent: (meetingId: string) => boolean;
+ consentLoading: boolean;
+}
diff --git a/www/app/lib/consent/useConsentDialog.tsx b/www/app/lib/consent/useConsentDialog.tsx
new file mode 100644
index 00000000..2a5c0ab3
--- /dev/null
+++ b/www/app/lib/consent/useConsentDialog.tsx
@@ -0,0 +1,109 @@
+"use client";
+
+import { useCallback, useState, useEffect, useRef } from "react";
+import { toaster } from "../../components/ui/toaster";
+import { useRecordingConsent } from "../../recordingConsentContext";
+import { useMeetingAudioConsent } from "../apiHooks";
+import { ConsentDialog } from "./ConsentDialog";
+import { TOAST_CHECK_INTERVAL_MS } from "./constants";
+import type { ConsentDialogResult } from "./types";
+
+export function useConsentDialog(meetingId: string): ConsentDialogResult {
+ const { state: consentState, touch, hasConsent } = useRecordingConsent();
+ const [modalOpen, setModalOpen] = useState(false);
+ const audioConsentMutation = useMeetingAudioConsent();
+ const intervalRef = useRef<ReturnType<typeof setInterval> | null>(null);
+ const keydownHandlerRef = useRef<((event: KeyboardEvent) => void) | null>(
+ null,
+ );
+
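+ // On unmount, clear any pending toast-dismissal poll and the Escape-key listener registered by showConsentModal.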
+ useEffect(() => {
+ return () => {
+ if (intervalRef.current) {
+ clearInterval(intervalRef.current);
+ intervalRef.current = null;
+ }
+ if (keydownHandlerRef.current) {
+ document.removeEventListener("keydown", keydownHandlerRef.current);
+ keydownHandlerRef.current = null;
+ }
+ };
+ }, []);
+
+ const handleConsent = useCallback(
+ async (given: boolean) => {
+ try {
+ await audioConsentMutation.mutateAsync({
+ params: {
+ path: { meeting_id: meetingId },
+ },
+ body: {
+ consent_given: given,
+ },
+ });
+
+ touch(meetingId);
+ } catch (error) {
+ console.error("Error submitting consent:", error);
+ }
+ },
+ [audioConsentMutation, touch, meetingId],
+ );
+
+ const showConsentModal = useCallback(() => {
+ if (modalOpen) return;
+
+ setModalOpen(true);
+
+ const toastId = toaster.create({
+ placement: "top",
+ duration: null,
+ render: ({ dismiss }) => (
+ <ConsentDialog
+ onAccept={() => {
+ handleConsent(true);
+ dismiss();
+ }}
+ onReject={() => {
+ handleConsent(false);
+ dismiss();
+ }}
+ />
+ ),
+ });
+
+ const handleKeyDown = (event: KeyboardEvent) => {
+ if (event.key === "Escape") {
+ toastId.then((id) => toaster.dismiss(id));
+ }
+ };
+
+ keydownHandlerRef.current = handleKeyDown;
+ document.addEventListener("keydown", handleKeyDown);
+
+ toastId.then((id) => {
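+ // Poll the toaster until the consent toast is gone, then reset modal state and detach the Escape handler.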
+ intervalRef.current = setInterval(() => {
+ if (!toaster.isActive(id)) {
+ setModalOpen(false);
+
+ if (intervalRef.current) {
+ clearInterval(intervalRef.current);
+ intervalRef.current = null;
+ }
+
+ if (keydownHandlerRef.current) {
+ document.removeEventListener("keydown", keydownHandlerRef.current);
+ keydownHandlerRef.current = null;
+ }
+ }
+ }, TOAST_CHECK_INTERVAL_MS);
+ });
+ }, [handleConsent, modalOpen]);
+
+ return {
+ showConsentModal,
+ consentState,
+ hasConsent,
+ consentLoading: audioConsentMutation.isPending,
+ };
+}
diff --git a/www/app/lib/consent/utils.ts b/www/app/lib/consent/utils.ts
new file mode 100644
index 00000000..146bdd68
--- /dev/null
+++ b/www/app/lib/consent/utils.ts
@@ -0,0 +1,13 @@
+import type { components } from "../../reflector-api";
+
+type Meeting = components["schemas"]["Meeting"];
+
+/**
+ * Determines if a meeting's recording type requires user consent.
+ * Currently only "cloud" recordings require consent.
+ */
+export function recordingTypeRequiresConsent(
+ recordingType: Meeting["recording_type"],
+): boolean {
+ return recordingType === "cloud";
+}
diff --git a/www/app/lib/useLoginRequiredPages.ts b/www/app/lib/useLoginRequiredPages.ts
index 37ee96b1..d0dee1b6 100644
--- a/www/app/lib/useLoginRequiredPages.ts
+++ b/www/app/lib/useLoginRequiredPages.ts
@@ -3,6 +3,7 @@ import { PROTECTED_PAGES } from "./auth";
import { usePathname } from "next/navigation";
import { useAuth } from "./AuthProvider";
import { useEffect } from "react";
+import { featureEnabled } from "./features";
const HOME = "/" as const;
@@ -13,7 +14,9 @@ export const useLoginRequiredPages = () => {
const isNotLoggedIn = auth.status === "unauthenticated";
// safety
const isLastDestination = pathname === HOME;
- const shouldRedirect = isNotLoggedIn && isProtected && !isLastDestination;
+ const requireLogin = featureEnabled("requireLogin");
+ const shouldRedirect =
+ requireLogin && isNotLoggedIn && isProtected && !isLastDestination;
useEffect(() => {
if (!shouldRedirect) return;
// on the backend, the redirect goes straight to the auth provider, but we don't have it because it's hidden inside next-auth middleware
diff --git a/www/app/reflector-api.d.ts b/www/app/reflector-api.d.ts
index 1dc92f2b..9b9582ba 100644
--- a/www/app/reflector-api.d.ts
+++ b/www/app/reflector-api.d.ts
@@ -696,6 +696,26 @@ export interface paths {
patch?: never;
trace?: never;
};
+ "/v1/webhook": {
+ parameters: {
+ query?: never;
+ header?: never;
+ path?: never;
+ cookie?: never;
+ };
+ get?: never;
+ put?: never;
+ /**
+ * Webhook
+ * @description Handle Daily webhook events.
+ */
+ post: operations["v1_webhook"];
+ delete?: never;
+ options?: never;
+ head?: never;
+ patch?: never;
+ trace?: never;
+ };
}
export type webhooks = Record<string, never>;
export interface components {
@@ -852,6 +872,8 @@ export interface components {
* @default false
*/
ics_enabled: boolean;
+ /** Platform */
+ platform?: ("whereby" | "daily") | null;
};
/** CreateRoomMeeting */
CreateRoomMeeting: {
@@ -877,6 +899,22 @@ export interface components {
target_language: string;
source_kind?: components["schemas"]["SourceKind"] | null;
};
+ /**
+ * DailyWebhookEvent
+ * @description Daily webhook event structure.
+ */
+ DailyWebhookEvent: {
+ /** Type */
+ type: string;
+ /** Id */
+ id: string;
+ /** Ts */
+ ts: number;
+ /** Data */
+ data: {
+ [key: string]: unknown;
+ };
+ };
/** DeletionStatus */
DeletionStatus: {
/** Status */
@@ -1193,6 +1231,12 @@ export interface components {
calendar_metadata?: {
[key: string]: unknown;
} | null;
+ /**
+ * Platform
+ * @default whereby
+ * @enum {string}
+ */
+ platform: "whereby" | "daily";
};
/** MeetingConsentRequest */
MeetingConsentRequest: {
@@ -1279,6 +1323,12 @@ export interface components {
ics_last_sync?: string | null;
/** Ics Last Etag */
ics_last_etag?: string | null;
+ /**
+ * Platform
+ * @default whereby
+ * @enum {string}
+ */
+ platform: "whereby" | "daily";
};
/** RoomDetails */
RoomDetails: {
@@ -1325,6 +1375,12 @@ export interface components {
ics_last_sync?: string | null;
/** Ics Last Etag */
ics_last_etag?: string | null;
+ /**
+ * Platform
+ * @default whereby
+ * @enum {string}
+ */
+ platform: "whereby" | "daily";
/** Webhook Url */
webhook_url: string | null;
/** Webhook Secret */
@@ -1505,6 +1561,8 @@ export interface components {
ics_fetch_interval?: number | null;
/** Ics Enabled */
ics_enabled?: boolean | null;
+ /** Platform */
+ platform?: ("whereby" | "daily") | null;
};
/** UpdateTranscript */
UpdateTranscript: {
@@ -3191,4 +3249,37 @@ export interface operations {
};
};
};
+ v1_webhook: {
+ parameters: {
+ query?: never;
+ header?: never;
+ path?: never;
+ cookie?: never;
+ };
+ requestBody: {
+ content: {
+ "application/json": components["schemas"]["DailyWebhookEvent"];
+ };
+ };
+ responses: {
+ /** @description Successful Response */
+ 200: {
+ headers: {
+ [name: string]: unknown;
+ };
+ content: {
+ "application/json": unknown;
+ };
+ };
+ /** @description Validation Error */
+ 422: {
+ headers: {
+ [name: string]: unknown;
+ };
+ content: {
+ "application/json": components["schemas"]["HTTPValidationError"];
+ };
+ };
+ };
+ };
}
diff --git a/www/package.json b/www/package.json
index 5169dbe2..f4412db0 100644
--- a/www/package.json
+++ b/www/package.json
@@ -14,6 +14,7 @@
},
"dependencies": {
"@chakra-ui/react": "^3.24.2",
+ "@daily-co/daily-js": "^0.84.0",
"@emotion/react": "^11.14.0",
"@fortawesome/fontawesome-svg-core": "^6.4.0",
"@fortawesome/free-solid-svg-icons": "^6.4.0",
diff --git a/www/pnpm-lock.yaml b/www/pnpm-lock.yaml
index 6c0a3d83..92667b7e 100644
--- a/www/pnpm-lock.yaml
+++ b/www/pnpm-lock.yaml
@@ -10,6 +10,9 @@ importers:
"@chakra-ui/react":
specifier: ^3.24.2
version: 3.24.2(@emotion/react@11.14.0(@types/react@18.2.20)(react@18.3.1))(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
+ "@daily-co/daily-js":
+ specifier: ^0.84.0
+ version: 0.84.0
"@emotion/react":
specifier: ^11.14.0
version: 11.14.0(@types/react@18.2.20)(react@18.3.1)
@@ -487,6 +490,13 @@ packages:
}
engines: { node: ">=12" }
+ "@daily-co/daily-js@0.84.0":
+ resolution:
+ {
+ integrity: sha512-/ynXrMDDkRXhLlHxiFNf9QU5yw4ZGPr56wNARgja/Tiid71UIniundTavCNF5cMb2I1vNoMh7oEJ/q8stg/V7g==,
+ }
+ engines: { node: ">=10.0.0" }
+
"@emnapi/core@1.4.5":
resolution:
{
@@ -2293,6 +2303,13 @@ packages:
}
engines: { node: ">=18" }
+ "@sentry-internal/browser-utils@8.55.0":
+ resolution:
+ {
+ integrity: sha512-ROgqtQfpH/82AQIpESPqPQe0UyWywKJsmVIqi3c5Fh+zkds5LUxnssTj3yNd1x+kxaPDVB023jAP+3ibNgeNDw==,
+ }
+ engines: { node: ">=14.18" }
+
"@sentry-internal/feedback@10.11.0":
resolution:
{
@@ -2300,6 +2317,13 @@ packages:
}
engines: { node: ">=18" }
+ "@sentry-internal/feedback@8.55.0":
+ resolution:
+ {
+ integrity: sha512-cP3BD/Q6pquVQ+YL+rwCnorKuTXiS9KXW8HNKu4nmmBAyf7urjs+F6Hr1k9MXP5yQ8W3yK7jRWd09Yu6DHWOiw==,
+ }
+ engines: { node: ">=14.18" }
+
"@sentry-internal/replay-canvas@10.11.0":
resolution:
{
@@ -2307,6 +2331,13 @@ packages:
}
engines: { node: ">=18" }
+ "@sentry-internal/replay-canvas@8.55.0":
+ resolution:
+ {
+ integrity: sha512-nIkfgRWk1091zHdu4NbocQsxZF1rv1f7bbp3tTIlZYbrH62XVZosx5iHAuZG0Zc48AETLE7K4AX9VGjvQj8i9w==,
+ }
+ engines: { node: ">=14.18" }
+
"@sentry-internal/replay@10.11.0":
resolution:
{
@@ -2314,6 +2345,13 @@ packages:
}
engines: { node: ">=18" }
+ "@sentry-internal/replay@8.55.0":
+ resolution:
+ {
+ integrity: sha512-roCDEGkORwolxBn8xAKedybY+Jlefq3xYmgN2fr3BTnsXjSYOPC7D1/mYqINBat99nDtvgFvNfRcZPiwwZ1hSw==,
+ }
+ engines: { node: ">=14.18" }
+
"@sentry/babel-plugin-component-annotate@4.3.0":
resolution:
{
@@ -2328,6 +2366,13 @@ packages:
}
engines: { node: ">=18" }
+ "@sentry/browser@8.55.0":
+ resolution:
+ {
+ integrity: sha512-1A31mCEWCjaMxJt6qGUK+aDnLDcK6AwLAZnqpSchNysGni1pSn1RWSmk9TBF8qyTds5FH8B31H480uxMPUJ7Cw==,
+ }
+ engines: { node: ">=14.18" }
+
"@sentry/bundler-plugin-core@4.3.0":
resolution:
{
@@ -2421,6 +2466,13 @@ packages:
}
engines: { node: ">=18" }
+ "@sentry/core@8.55.0":
+ resolution:
+ {
+ integrity: sha512-6g7jpbefjHYs821Z+EBJ8r4Z7LT5h80YSWRJaylGS4nW5W5Z2KXzpdnyFarv37O7QjauzVC2E+PABmpkw5/JGA==,
+ }
+ engines: { node: ">=14.18" }
+
"@sentry/nextjs@10.11.0":
resolution:
{
@@ -4029,6 +4081,12 @@ packages:
}
engines: { node: ">=8" }
+ bowser@2.12.1:
+ resolution:
+ {
+ integrity: sha512-z4rE2Gxh7tvshQ4hluIT7XcFrgLIQaw9X3A+kTTRdovCz5PMukm/0QC/BKSYPj3omF5Qfypn9O/c5kgpmvYUCw==,
+ }
+
brace-expansion@1.1.12:
resolution:
{
@@ -9288,6 +9346,14 @@ snapshots:
"@jridgewell/trace-mapping": 0.3.9
optional: true
+ "@daily-co/daily-js@0.84.0":
+ dependencies:
+ "@babel/runtime": 7.28.2
+ "@sentry/browser": 8.55.0
+ bowser: 2.12.1
+ dequal: 2.0.3
+ events: 3.3.0
+
"@emnapi/core@1.4.5":
dependencies:
"@emnapi/wasi-threads": 1.0.4
@@ -10506,20 +10572,38 @@ snapshots:
dependencies:
"@sentry/core": 10.11.0
+ "@sentry-internal/browser-utils@8.55.0":
+ dependencies:
+ "@sentry/core": 8.55.0
+
"@sentry-internal/feedback@10.11.0":
dependencies:
"@sentry/core": 10.11.0
+ "@sentry-internal/feedback@8.55.0":
+ dependencies:
+ "@sentry/core": 8.55.0
+
"@sentry-internal/replay-canvas@10.11.0":
dependencies:
"@sentry-internal/replay": 10.11.0
"@sentry/core": 10.11.0
+ "@sentry-internal/replay-canvas@8.55.0":
+ dependencies:
+ "@sentry-internal/replay": 8.55.0
+ "@sentry/core": 8.55.0
+
"@sentry-internal/replay@10.11.0":
dependencies:
"@sentry-internal/browser-utils": 10.11.0
"@sentry/core": 10.11.0
+ "@sentry-internal/replay@8.55.0":
+ dependencies:
+ "@sentry-internal/browser-utils": 8.55.0
+ "@sentry/core": 8.55.0
+
"@sentry/babel-plugin-component-annotate@4.3.0": {}
"@sentry/browser@10.11.0":
@@ -10530,6 +10614,14 @@ snapshots:
"@sentry-internal/replay-canvas": 10.11.0
"@sentry/core": 10.11.0
+ "@sentry/browser@8.55.0":
+ dependencies:
+ "@sentry-internal/browser-utils": 8.55.0
+ "@sentry-internal/feedback": 8.55.0
+ "@sentry-internal/replay": 8.55.0
+ "@sentry-internal/replay-canvas": 8.55.0
+ "@sentry/core": 8.55.0
+
"@sentry/bundler-plugin-core@4.3.0":
dependencies:
"@babel/core": 7.28.3
@@ -10590,6 +10682,8 @@ snapshots:
"@sentry/core@10.11.0": {}
+ "@sentry/core@8.55.0": {}
+
"@sentry/nextjs@10.11.0(@opentelemetry/context-async-hooks@2.1.0(@opentelemetry/api@1.9.0))(@opentelemetry/core@2.1.0(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-base@2.1.0(@opentelemetry/api@1.9.0))(next@15.5.3(@babel/core@7.28.3)(@opentelemetry/api@1.9.0)(babel-plugin-macros@3.1.0)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)(sass@1.90.0))(react@18.3.1)(webpack@5.101.3)":
dependencies:
"@opentelemetry/api": 1.9.0
@@ -11967,6 +12061,8 @@ snapshots:
binary-extensions@2.3.0: {}
+ bowser@2.12.1: {}
+
brace-expansion@1.1.12:
dependencies:
balanced-match: 1.0.2