diff --git a/server/docs/video-platforms/README.md b/server/docs/video-platforms/README.md new file mode 100644 index 00000000..45a615c3 --- /dev/null +++ b/server/docs/video-platforms/README.md @@ -0,0 +1,234 @@ +# Reflector Architecture: Whereby + Daily.co Recording Storage + +## System Overview + +```mermaid +graph TB + subgraph "Actors" + APP[Our App
Reflector] + WHEREBY[Whereby Service
External] + DAILY[Daily.co Service
External] + end + + subgraph "AWS S3 Buckets" + TRANSCRIPT_BUCKET[Transcript Bucket
reflector-transcripts
Output: Processed MP3s] + WHEREBY_BUCKET[Whereby Bucket
reflector-whereby-recordings
Input: Raw MP4s] + DAILY_BUCKET[Daily.co Bucket
reflector-dailyco-recordings
Input: Raw WebM tracks] + end + + subgraph "AWS Infrastructure" + SQS[SQS Queue
Whereby notifications] + end + + subgraph "Database" + DB[(PostgreSQL
Recordings, Transcripts, Meetings)] + end + + APP -->|Write processed| TRANSCRIPT_BUCKET + APP -->|Read/Delete| WHEREBY_BUCKET + APP -->|Read/Delete| DAILY_BUCKET + APP -->|Poll| SQS + APP -->|Store metadata| DB + + WHEREBY -->|Write recordings| WHEREBY_BUCKET + WHEREBY_BUCKET -->|S3 Event| SQS + WHEREBY -->|Participant webhooks
room.client.joined/left| APP + + DAILY -->|Write recordings| DAILY_BUCKET + DAILY -->|Recording webhook
recording.ready-to-download| APP +``` + +**Note on Webhook vs S3 Event for Recording Processing:** +- **Whereby**: Uses S3 Events → SQS for recording availability (S3 as source of truth, no race conditions) +- **Daily.co**: Uses webhooks for recording availability (more immediate, built-in reliability) +- **Both**: Use webhooks for participant tracking (real-time updates) + +## Credentials & Permissions + +```mermaid +graph LR + subgraph "Master Credentials" + MASTER[TRANSCRIPT_STORAGE_AWS_*
Access Key ID + Secret] + end + + subgraph "Whereby Upload Credentials" + WHEREBY_CREDS[AWS_WHEREBY_ACCESS_KEY_*
Access Key ID + Secret] + end + + subgraph "Daily.co Upload Role" + DAILY_ROLE[DAILY_STORAGE_AWS_ROLE_ARN
IAM Role ARN] + end + + subgraph "Our App Uses" + MASTER -->|Read/Write/Delete| TRANSCRIPT_BUCKET[Transcript Bucket] + MASTER -->|Read/Delete| WHEREBY_BUCKET[Whereby Bucket] + MASTER -->|Read/Delete| DAILY_BUCKET[Daily.co Bucket] + MASTER -->|Poll/Delete| SQS[SQS Queue] + end + + subgraph "We Give To Services" + WHEREBY_CREDS -->|Passed in API call| WHEREBY_SERVICE[Whereby Service] + WHEREBY_SERVICE -->|Write Only| WHEREBY_BUCKET + + DAILY_ROLE -->|Passed in API call| DAILY_SERVICE[Daily.co Service] + DAILY_SERVICE -->|Assume Role| DAILY_ROLE + DAILY_SERVICE -->|Write Only| DAILY_BUCKET + end +``` + +# Video Platform Recording Integration + +This document explains how Reflector receives and identifies multitrack audio recordings from different video platforms. + +## Platform Comparison + +| Platform | Delivery Method | Track Identification | +|----------|----------------|---------------------| +| **Daily.co** | Webhook | Explicit track list in payload | +| **Whereby** | SQS (S3 notifications) | Single file per notification | + +--- + +## Daily.co (Webhook-based) + +Daily.co uses **webhooks** to notify Reflector when recordings are ready. + +### How It Works + +1. **Daily.co sends webhook** when recording is ready + - Event type: `recording.ready-to-download` + - Endpoint: `/v1/daily/webhook` (`reflector/views/daily.py:46-102`) + +2. **Webhook payload explicitly includes track list**: +```json +{ + "recording_id": "7443ee0a-dab1-40eb-b316-33d6c0d5ff88", + "room_name": "daily-20251020193458", + "tracks": [ + { + "type": "audio", + "s3Key": "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922", + "size": 831843 + }, + { + "type": "audio", + "s3Key": "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823", + "size": 408438 + }, + { + "type": "video", + "s3Key": "monadical/daily-20251020193458/...-video.webm", + "size": 30000000 + } + ] +} +``` + +3. **System extracts audio tracks** (`daily.py:211`): +```python +track_keys = [t.s3Key for t in tracks if t.type == "audio"] +``` + +4. **Triggers multitrack processing** (`daily.py:213-218`): +```python +process_multitrack_recording.delay( + bucket_name=bucket_name, # reflector-dailyco-local + room_name=room_name, # daily-20251020193458 + recording_id=recording_id, # 7443ee0a-dab1-40eb-b316-33d6c0d5ff88 + track_keys=track_keys # Only audio s3Keys +) +``` + +### Key Advantage: No Ambiguity + +Even though multiple meetings may share the same S3 bucket/folder (`monadical/`), **there's no ambiguity** because: +- Each webhook payload contains the exact `s3Key` list for that specific `recording_id` +- No need to scan folders or guess which files belong together +- Each track's s3Key includes the room timestamp subfolder (e.g., `daily-20251020193458/`) + +The room name includes timestamp (`daily-20251020193458`) to keep recordings organized, but **the webhook's explicit track list is what prevents mixing files from different meetings**. + +### Track Timeline Extraction + +Daily.co provides timing information in two places: + +**1. PyAV WebM Metadata (current approach)**: +```python +# Read from WebM container stream metadata +stream.start_time = 8.130s # Meeting-relative timing +``` + +**2. 
Filename Timestamps (alternative approach, commit 3bae9076)**:
+```
+Filename format: {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}.webm
+Example: 1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm
+
+Parse timestamps:
+- recording_start_ts: 1760988935484 (Unix ms)
+- track_start_ts: 1760988935922 (Unix ms)
+- offset: (1760988935922 - 1760988935484) / 1000 = 0.438s
+```
+
+**Time Difference (PyAV vs Filename)**:
+```
+Track 0:
+  Filename offset: 438ms
+  PyAV metadata: 229ms
+  Difference: 209ms
+
+Track 1:
+  Filename offset: 8339ms
+  PyAV metadata: 8130ms
+  Difference: 209ms
+```
+
+**Consistent 209ms delta** suggests a network/encoding delay between file upload initiation (filename) and actual audio stream start (metadata).
+
+**Current implementation uses PyAV metadata** because:
+- More accurate (represents when audio actually started)
+- Padding BEFORE transcription produces correct Whisper timestamps automatically
+- No manual offset adjustment needed during transcript merge
+
+### Why Re-encoding During Padding
+
+Padding happens to involve re-encoding, which turns out to be important for Daily.co + Whisper:
+
+**Problem:** Daily.co skips frames in recordings while a microphone is muted or paused
+- WebM containers have gaps where audio frames should be
+- Whisper doesn't understand these gaps and produces incorrect timestamps
+- Example: 5s of audio with 2s muted → the file has frames for only 3s, so Whisper thinks the duration is 3s
+
+**Solution:** Re-encoding via PyAV filter graph (`adelay` + `aresample`)
+- Restores missing frames as silence
+- Produces continuous audio stream without gaps
+- Whisper now sees correct duration and produces accurate timestamps
+
+**Why combined with padding:**
+- Already re-encoding for padding (adding initial silence)
+- More performant to do both operations in a single PyAV pipeline
+- Padded values needed for mixdown anyway (creating final MP3)
+
+Implementation: `main_multitrack_pipeline.py:_apply_audio_padding_to_file()`
+
+---
+
+## Whereby (SQS-based)
+
+Whereby uses **AWS SQS** (via S3 notifications) to notify Reflector when files are uploaded.
+
+### How It Works
+
+1. **Whereby uploads recording** to S3
+2. **S3 sends notification** to SQS queue (one notification per file)
+3. **Reflector polls SQS queue** (`worker/process.py:process_messages()`)
+4. **System processes single file** (`worker/process.py:process_recording()`)
+
+### Key Difference from Daily.co
+
+**Whereby (SQS):** The system receives an S3 notification that "file X was created"; it only knows about one file at a time and would need to scan the folder to find related files
+
+**Daily.co (Webhook):** Daily.co explicitly tells the system which files belong together in the webhook payload
+
+---
+
+
diff --git a/server/env.example b/server/env.example
index ff0f4211..7375bf0a 100644
--- a/server/env.example
+++ b/server/env.example
@@ -71,3 +71,30 @@ DIARIZATION_URL=https://monadical-sas--reflector-diarizer-web.modal.run
 
 ## Sentry DSN configuration
 #SENTRY_DSN=
+
+## =======================================================
+## Video Platform Configuration
+## =======================================================
+
+## Whereby
+#WHEREBY_API_KEY=your-whereby-api-key
+#WHEREBY_WEBHOOK_SECRET=your-whereby-webhook-secret
+#WHEREBY_STORAGE_AWS_ACCESS_KEY_ID=your-aws-key
+#WHEREBY_STORAGE_AWS_SECRET_ACCESS_KEY=your-aws-secret
+#AWS_PROCESS_RECORDING_QUEUE_URL=https://sqs.us-west-2.amazonaws.com/...
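+# AWS_PROCESS_RECORDING_QUEUE_URL receives the S3 "object created" notifications emitted by the Whereby recordings bucket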
+ +## Daily.co +#DAILY_API_KEY=your-daily-api-key +#DAILY_WEBHOOK_SECRET=your-daily-webhook-secret +#DAILY_SUBDOMAIN=your-subdomain +#DAILY_WEBHOOK_UUID= # Auto-populated by recreate_daily_webhook.py script +#DAILYCO_STORAGE_AWS_ROLE_ARN=... # IAM role ARN for Daily.co S3 access +#DAILYCO_STORAGE_AWS_BUCKET_NAME=reflector-dailyco +#DAILYCO_STORAGE_AWS_REGION=us-west-2 + +## Whereby (optional separate bucket) +#WHEREBY_STORAGE_AWS_BUCKET_NAME=reflector-whereby +#WHEREBY_STORAGE_AWS_REGION=us-east-1 + +## Platform Configuration +#DEFAULT_VIDEO_PLATFORM=whereby # Default platform for new rooms diff --git a/server/migrations/versions/1e49625677e4_add_platform_support.py b/server/migrations/versions/1e49625677e4_add_platform_support.py new file mode 100644 index 00000000..fa403f92 --- /dev/null +++ b/server/migrations/versions/1e49625677e4_add_platform_support.py @@ -0,0 +1,50 @@ +"""add_platform_support + +Revision ID: 1e49625677e4 +Revises: 9e3f7b2a4c8e +Create Date: 2025-10-08 13:17:29.943612 + +""" + +from typing import Sequence, Union + +import sqlalchemy as sa +from alembic import op + +# revision identifiers, used by Alembic. +revision: str = "1e49625677e4" +down_revision: Union[str, None] = "9e3f7b2a4c8e" +branch_labels: Union[str, Sequence[str], None] = None +depends_on: Union[str, Sequence[str], None] = None + + +def upgrade() -> None: + """Add platform field with default 'whereby' for backward compatibility.""" + with op.batch_alter_table("room", schema=None) as batch_op: + batch_op.add_column( + sa.Column( + "platform", + sa.String(), + nullable=True, + server_default=None, + ) + ) + + with op.batch_alter_table("meeting", schema=None) as batch_op: + batch_op.add_column( + sa.Column( + "platform", + sa.String(), + nullable=False, + server_default="whereby", + ) + ) + + +def downgrade() -> None: + """Remove platform field.""" + with op.batch_alter_table("meeting", schema=None) as batch_op: + batch_op.drop_column("platform") + + with op.batch_alter_table("room", schema=None) as batch_op: + batch_op.drop_column("platform") diff --git a/server/migrations/versions/f8294b31f022_add_track_keys.py b/server/migrations/versions/f8294b31f022_add_track_keys.py new file mode 100644 index 00000000..7eda6ccc --- /dev/null +++ b/server/migrations/versions/f8294b31f022_add_track_keys.py @@ -0,0 +1,28 @@ +"""add_track_keys + +Revision ID: f8294b31f022 +Revises: 1e49625677e4 +Create Date: 2025-10-27 18:52:17.589167 + +""" + +from typing import Sequence, Union + +import sqlalchemy as sa +from alembic import op + +# revision identifiers, used by Alembic. 
+revision: str = "f8294b31f022" +down_revision: Union[str, None] = "1e49625677e4" +branch_labels: Union[str, Sequence[str], None] = None +depends_on: Union[str, Sequence[str], None] = None + + +def upgrade() -> None: + with op.batch_alter_table("recording", schema=None) as batch_op: + batch_op.add_column(sa.Column("track_keys", sa.JSON(), nullable=True)) + + +def downgrade() -> None: + with op.batch_alter_table("recording", schema=None) as batch_op: + batch_op.drop_column("track_keys") diff --git a/server/reflector/app.py b/server/reflector/app.py index a15934f5..2ca76acb 100644 --- a/server/reflector/app.py +++ b/server/reflector/app.py @@ -12,6 +12,7 @@ from reflector.events import subscribers_shutdown, subscribers_startup from reflector.logger import logger from reflector.metrics import metrics_init from reflector.settings import settings +from reflector.views.daily import router as daily_router from reflector.views.meetings import router as meetings_router from reflector.views.rooms import router as rooms_router from reflector.views.rtc_offer import router as rtc_offer_router @@ -96,6 +97,7 @@ app.include_router(user_api_keys_router, prefix="/v1") app.include_router(user_ws_router, prefix="/v1") app.include_router(zulip_router, prefix="/v1") app.include_router(whereby_router, prefix="/v1") +app.include_router(daily_router, prefix="/v1/daily") add_pagination(app) # prepare celery diff --git a/server/reflector/db/meetings.py b/server/reflector/db/meetings.py index 12a0c187..6912b285 100644 --- a/server/reflector/db/meetings.py +++ b/server/reflector/db/meetings.py @@ -7,7 +7,10 @@ from sqlalchemy.dialects.postgresql import JSONB from reflector.db import get_database, metadata from reflector.db.rooms import Room +from reflector.schemas.platform import WHEREBY_PLATFORM, Platform from reflector.utils import generate_uuid4 +from reflector.utils.string import assert_equal +from reflector.video_platforms.factory import get_platform meetings = sa.Table( "meeting", @@ -55,6 +58,12 @@ meetings = sa.Table( ), ), sa.Column("calendar_metadata", JSONB), + sa.Column( + "platform", + sa.String, + nullable=False, + server_default=assert_equal(WHEREBY_PLATFORM, "whereby"), + ), sa.Index("idx_meeting_room_id", "room_id"), sa.Index("idx_meeting_calendar_event", "calendar_event_id"), ) @@ -94,13 +103,14 @@ class Meeting(BaseModel): is_locked: bool = False room_mode: Literal["normal", "group"] = "normal" recording_type: Literal["none", "local", "cloud"] = "cloud" - recording_trigger: Literal[ + recording_trigger: Literal[ # whereby-specific "none", "prompt", "automatic", "automatic-2nd-participant" ] = "automatic-2nd-participant" num_clients: int = 0 is_active: bool = True calendar_event_id: str | None = None calendar_metadata: dict[str, Any] | None = None + platform: Platform = WHEREBY_PLATFORM class MeetingController: @@ -130,6 +140,7 @@ class MeetingController: recording_trigger=room.recording_trigger, calendar_event_id=calendar_event_id, calendar_metadata=calendar_metadata, + platform=get_platform(room.platform), ) query = meetings.insert().values(**meeting.model_dump()) await get_database().execute(query) @@ -137,7 +148,8 @@ class MeetingController: async def get_all_active(self) -> list[Meeting]: query = meetings.select().where(meetings.c.is_active) - return await get_database().fetch_all(query) + results = await get_database().fetch_all(query) + return [Meeting(**result) for result in results] async def get_by_room_name( self, @@ -147,16 +159,14 @@ class MeetingController: Get a meeting by room name. 
For backward compatibility, returns the most recent meeting. """ - end_date = getattr(meetings.c, "end_date") query = ( meetings.select() .where(meetings.c.room_name == room_name) - .order_by(end_date.desc()) + .order_by(meetings.c.end_date.desc()) ) result = await get_database().fetch_one(query) if not result: return None - return Meeting(**result) async def get_active(self, room: Room, current_time: datetime) -> Meeting | None: @@ -179,7 +189,6 @@ class MeetingController: result = await get_database().fetch_one(query) if not result: return None - return Meeting(**result) async def get_all_active_for_room( @@ -219,17 +228,27 @@ class MeetingController: return None return Meeting(**result) - async def get_by_id(self, meeting_id: str, **kwargs) -> Meeting | None: + async def get_by_id( + self, meeting_id: str, room: Room | None = None + ) -> Meeting | None: query = meetings.select().where(meetings.c.id == meeting_id) + + if room: + query = query.where(meetings.c.room_id == room.id) + result = await get_database().fetch_one(query) if not result: return None return Meeting(**result) - async def get_by_calendar_event(self, calendar_event_id: str) -> Meeting | None: + async def get_by_calendar_event( + self, calendar_event_id: str, room: Room + ) -> Meeting | None: query = meetings.select().where( meetings.c.calendar_event_id == calendar_event_id ) + if room: + query = query.where(meetings.c.room_id == room.id) result = await get_database().fetch_one(query) if not result: return None @@ -239,6 +258,28 @@ class MeetingController: query = meetings.update().where(meetings.c.id == meeting_id).values(**kwargs) await get_database().execute(query) + async def increment_num_clients(self, meeting_id: str) -> None: + """Atomically increment participant count.""" + query = ( + meetings.update() + .where(meetings.c.id == meeting_id) + .values(num_clients=meetings.c.num_clients + 1) + ) + await get_database().execute(query) + + async def decrement_num_clients(self, meeting_id: str) -> None: + """Atomically decrement participant count (min 0).""" + query = ( + meetings.update() + .where(meetings.c.id == meeting_id) + .values( + num_clients=sa.case( + (meetings.c.num_clients > 0, meetings.c.num_clients - 1), else_=0 + ) + ) + ) + await get_database().execute(query) + class MeetingConsentController: async def get_by_meeting_id(self, meeting_id: str) -> list[MeetingConsent]: diff --git a/server/reflector/db/recordings.py b/server/reflector/db/recordings.py index 0d05790d..bde4afa5 100644 --- a/server/reflector/db/recordings.py +++ b/server/reflector/db/recordings.py @@ -21,6 +21,7 @@ recordings = sa.Table( server_default="pending", ), sa.Column("meeting_id", sa.String), + sa.Column("track_keys", sa.JSON, nullable=True), sa.Index("idx_recording_meeting_id", "meeting_id"), ) @@ -28,10 +29,13 @@ recordings = sa.Table( class Recording(BaseModel): id: str = Field(default_factory=generate_uuid4) bucket_name: str + # for single-track object_key: str recorded_at: datetime status: Literal["pending", "processing", "completed", "failed"] = "pending" meeting_id: str | None = None + # for multitrack reprocessing + track_keys: list[str] | None = None class RecordingController: diff --git a/server/reflector/db/rooms.py b/server/reflector/db/rooms.py index 396c818a..1081ac38 100644 --- a/server/reflector/db/rooms.py +++ b/server/reflector/db/rooms.py @@ -9,6 +9,7 @@ from pydantic import BaseModel, Field from sqlalchemy.sql import false, or_ from reflector.db import get_database, metadata +from reflector.schemas.platform import 
Platform from reflector.utils import generate_uuid4 rooms = sqlalchemy.Table( @@ -50,6 +51,12 @@ rooms = sqlalchemy.Table( ), sqlalchemy.Column("ics_last_sync", sqlalchemy.DateTime(timezone=True)), sqlalchemy.Column("ics_last_etag", sqlalchemy.Text), + sqlalchemy.Column( + "platform", + sqlalchemy.String, + nullable=True, + server_default=None, + ), sqlalchemy.Index("idx_room_is_shared", "is_shared"), sqlalchemy.Index("idx_room_ics_enabled", "ics_enabled"), ) @@ -66,7 +73,7 @@ class Room(BaseModel): is_locked: bool = False room_mode: Literal["normal", "group"] = "normal" recording_type: Literal["none", "local", "cloud"] = "cloud" - recording_trigger: Literal[ + recording_trigger: Literal[ # whereby-specific "none", "prompt", "automatic", "automatic-2nd-participant" ] = "automatic-2nd-participant" is_shared: bool = False @@ -77,6 +84,7 @@ class Room(BaseModel): ics_enabled: bool = False ics_last_sync: datetime | None = None ics_last_etag: str | None = None + platform: Platform | None = None class RoomController: @@ -130,6 +138,7 @@ class RoomController: ics_url: str | None = None, ics_fetch_interval: int = 300, ics_enabled: bool = False, + platform: Platform | None = None, ): """ Add a new room @@ -153,6 +162,7 @@ class RoomController: ics_url=ics_url, ics_fetch_interval=ics_fetch_interval, ics_enabled=ics_enabled, + platform=platform, ) query = rooms.insert().values(**room.model_dump()) try: diff --git a/server/reflector/db/transcripts.py b/server/reflector/db/transcripts.py index b82e4fe1..f9c3c057 100644 --- a/server/reflector/db/transcripts.py +++ b/server/reflector/db/transcripts.py @@ -21,7 +21,7 @@ from reflector.db.utils import is_postgresql from reflector.logger import logger from reflector.processors.types import Word as ProcessorWord from reflector.settings import settings -from reflector.storage import get_recordings_storage, get_transcripts_storage +from reflector.storage import get_transcripts_storage from reflector.utils import generate_uuid4 from reflector.utils.webvtt import topics_to_webvtt @@ -186,6 +186,7 @@ class TranscriptParticipant(BaseModel): id: str = Field(default_factory=generate_uuid4) speaker: int | None name: str + user_id: str | None = None class Transcript(BaseModel): @@ -623,7 +624,9 @@ class TranscriptController: ) if recording: try: - await get_recordings_storage().delete_file(recording.object_key) + await get_transcripts_storage().delete_file( + recording.object_key, bucket=recording.bucket_name + ) except Exception as e: logger.warning( "Failed to delete recording object from S3", @@ -725,11 +728,13 @@ class TranscriptController: """ Download audio from storage """ - transcript.audio_mp3_filename.write_bytes( - await get_transcripts_storage().get_file( - transcript.storage_audio_path, - ) - ) + storage = get_transcripts_storage() + try: + with open(transcript.audio_mp3_filename, "wb") as f: + await storage.stream_to_fileobj(transcript.storage_audio_path, f) + except Exception: + transcript.audio_mp3_filename.unlink(missing_ok=True) + raise async def upsert_participant( self, diff --git a/server/reflector/pipelines/__init__.py b/server/reflector/pipelines/__init__.py new file mode 100644 index 00000000..89d3e9de --- /dev/null +++ b/server/reflector/pipelines/__init__.py @@ -0,0 +1 @@ +"""Pipeline modules for audio processing.""" diff --git a/server/reflector/pipelines/main_file_pipeline.py b/server/reflector/pipelines/main_file_pipeline.py index 0a05d593..6f8e8011 100644 --- a/server/reflector/pipelines/main_file_pipeline.py +++ 
b/server/reflector/pipelines/main_file_pipeline.py @@ -23,23 +23,18 @@ from reflector.db.transcripts import ( transcripts_controller, ) from reflector.logger import logger +from reflector.pipelines import topic_processing from reflector.pipelines.main_live_pipeline import ( PipelineMainBase, broadcast_to_sockets, task_cleanup_consent, task_pipeline_post_to_zulip, ) -from reflector.processors import ( - AudioFileWriterProcessor, - TranscriptFinalSummaryProcessor, - TranscriptFinalTitleProcessor, - TranscriptTopicDetectorProcessor, -) +from reflector.pipelines.transcription_helpers import transcribe_file_with_processor +from reflector.processors import AudioFileWriterProcessor from reflector.processors.audio_waveform_processor import AudioWaveformProcessor from reflector.processors.file_diarization import FileDiarizationInput from reflector.processors.file_diarization_auto import FileDiarizationAutoProcessor -from reflector.processors.file_transcript import FileTranscriptInput -from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor from reflector.processors.transcript_diarization_assembler import ( TranscriptDiarizationAssemblerInput, TranscriptDiarizationAssemblerProcessor, @@ -56,19 +51,6 @@ from reflector.storage import get_transcripts_storage from reflector.worker.webhook import send_transcript_webhook -class EmptyPipeline: - """Empty pipeline for processors that need a pipeline reference""" - - def __init__(self, logger: structlog.BoundLogger): - self.logger = logger - - def get_pref(self, k, d=None): - return d - - async def emit(self, event): - pass - - class PipelineMainFile(PipelineMainBase): """ Optimized file processing pipeline. @@ -81,7 +63,7 @@ class PipelineMainFile(PipelineMainBase): def __init__(self, transcript_id: str): super().__init__(transcript_id=transcript_id) self.logger = logger.bind(transcript_id=self.transcript_id) - self.empty_pipeline = EmptyPipeline(logger=self.logger) + self.empty_pipeline = topic_processing.EmptyPipeline(logger=self.logger) def _handle_gather_exceptions(self, results: list, operation: str) -> None: """Handle exceptions from asyncio.gather with return_exceptions=True""" @@ -262,24 +244,7 @@ class PipelineMainFile(PipelineMainBase): async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType: """Transcribe complete file""" - processor = FileTranscriptAutoProcessor() - input_data = FileTranscriptInput(audio_url=audio_url, language=language) - - # Store result for retrieval - result: TranscriptType | None = None - - async def capture_result(transcript): - nonlocal result - result = transcript - - processor.on(capture_result) - await processor.push(input_data) - await processor.flush() - - if not result: - raise ValueError("No transcript captured") - - return result + return await transcribe_file_with_processor(audio_url, language) async def diarize_file(self, audio_url: str) -> list[DiarizationSegment] | None: """Get diarization for file""" @@ -322,63 +287,31 @@ class PipelineMainFile(PipelineMainBase): async def detect_topics( self, transcript: TranscriptType, target_language: str ) -> list[TitleSummary]: - """Detect topics from complete transcript""" - chunk_size = 300 - topics: list[TitleSummary] = [] - - async def on_topic(topic: TitleSummary): - topics.append(topic) - return await self.on_topic(topic) - - topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic) - topic_detector.set_pipeline(self.empty_pipeline) - - for i in range(0, len(transcript.words), chunk_size): - chunk_words 
= transcript.words[i : i + chunk_size] - if not chunk_words: - continue - - chunk_transcript = TranscriptType( - words=chunk_words, translation=transcript.translation - ) - - await topic_detector.push(chunk_transcript) - - await topic_detector.flush() - return topics + return await topic_processing.detect_topics( + transcript, + target_language, + on_topic_callback=self.on_topic, + empty_pipeline=self.empty_pipeline, + ) async def generate_title(self, topics: list[TitleSummary]): - """Generate title from topics""" - if not topics: - self.logger.warning("No topics for title generation") - return - - processor = TranscriptFinalTitleProcessor(callback=self.on_title) - processor.set_pipeline(self.empty_pipeline) - - for topic in topics: - await processor.push(topic) - - await processor.flush() + return await topic_processing.generate_title( + topics, + on_title_callback=self.on_title, + empty_pipeline=self.empty_pipeline, + logger=self.logger, + ) async def generate_summaries(self, topics: list[TitleSummary]): - """Generate long and short summaries from topics""" - if not topics: - self.logger.warning("No topics for summary generation") - return - transcript = await self.get_transcript() - processor = TranscriptFinalSummaryProcessor( - transcript=transcript, - callback=self.on_long_summary, - on_short_summary=self.on_short_summary, + return await topic_processing.generate_summaries( + topics, + transcript, + on_long_summary_callback=self.on_long_summary, + on_short_summary_callback=self.on_short_summary, + empty_pipeline=self.empty_pipeline, + logger=self.logger, ) - processor.set_pipeline(self.empty_pipeline) - - for topic in topics: - await processor.push(topic) - - await processor.flush() @shared_task diff --git a/server/reflector/pipelines/main_live_pipeline.py b/server/reflector/pipelines/main_live_pipeline.py index f6fe6a83..83e560d6 100644 --- a/server/reflector/pipelines/main_live_pipeline.py +++ b/server/reflector/pipelines/main_live_pipeline.py @@ -17,7 +17,6 @@ from contextlib import asynccontextmanager from typing import Generic import av -import boto3 from celery import chord, current_task, group, shared_task from pydantic import BaseModel from structlog import BoundLogger as Logger @@ -584,6 +583,7 @@ async def cleanup_consent(transcript: Transcript, logger: Logger): consent_denied = False recording = None + meeting = None try: if transcript.recording_id: recording = await recordings_controller.get_by_id(transcript.recording_id) @@ -594,8 +594,8 @@ async def cleanup_consent(transcript: Transcript, logger: Logger): meeting.id ) except Exception as e: - logger.error(f"Failed to get fetch consent: {e}", exc_info=e) - consent_denied = True + logger.error(f"Failed to fetch consent: {e}", exc_info=e) + raise if not consent_denied: logger.info("Consent approved, keeping all files") @@ -603,25 +603,24 @@ async def cleanup_consent(transcript: Transcript, logger: Logger): logger.info("Consent denied, cleaning up all related audio files") - if recording and recording.bucket_name and recording.object_key: - s3_whereby = boto3.client( - "s3", - aws_access_key_id=settings.AWS_WHEREBY_ACCESS_KEY_ID, - aws_secret_access_key=settings.AWS_WHEREBY_ACCESS_KEY_SECRET, - ) - try: - s3_whereby.delete_object( - Bucket=recording.bucket_name, Key=recording.object_key - ) - logger.info( - f"Deleted original Whereby recording: {recording.bucket_name}/{recording.object_key}" - ) - except Exception as e: - logger.error(f"Failed to delete Whereby recording: {e}", exc_info=e) + deletion_errors = [] + if 
recording and recording.bucket_name: + keys_to_delete = [] + if recording.track_keys: + keys_to_delete = recording.track_keys + elif recording.object_key: + keys_to_delete = [recording.object_key] + + master_storage = get_transcripts_storage() + for key in keys_to_delete: + try: + await master_storage.delete_file(key, bucket=recording.bucket_name) + logger.info(f"Deleted recording file: {recording.bucket_name}/{key}") + except Exception as e: + error_msg = f"Failed to delete {key}: {e}" + logger.error(error_msg, exc_info=e) + deletion_errors.append(error_msg) - # non-transactional, files marked for deletion not actually deleted is possible - await transcripts_controller.update(transcript, {"audio_deleted": True}) - # 2. Delete processed audio from transcript storage S3 bucket if transcript.audio_location == "storage": storage = get_transcripts_storage() try: @@ -630,18 +629,28 @@ async def cleanup_consent(transcript: Transcript, logger: Logger): f"Deleted processed audio from storage: {transcript.storage_audio_path}" ) except Exception as e: - logger.error(f"Failed to delete processed audio: {e}", exc_info=e) + error_msg = f"Failed to delete processed audio: {e}" + logger.error(error_msg, exc_info=e) + deletion_errors.append(error_msg) - # 3. Delete local audio files try: if hasattr(transcript, "audio_mp3_filename") and transcript.audio_mp3_filename: transcript.audio_mp3_filename.unlink(missing_ok=True) if hasattr(transcript, "audio_wav_filename") and transcript.audio_wav_filename: transcript.audio_wav_filename.unlink(missing_ok=True) except Exception as e: - logger.error(f"Failed to delete local audio files: {e}", exc_info=e) + error_msg = f"Failed to delete local audio files: {e}" + logger.error(error_msg, exc_info=e) + deletion_errors.append(error_msg) - logger.info("Consent cleanup done") + if deletion_errors: + logger.warning( + f"Consent cleanup completed with {len(deletion_errors)} errors", + errors=deletion_errors, + ) + else: + await transcripts_controller.update(transcript, {"audio_deleted": True}) + logger.info("Consent cleanup done - all audio deleted") @get_transcript diff --git a/server/reflector/pipelines/main_multitrack_pipeline.py b/server/reflector/pipelines/main_multitrack_pipeline.py new file mode 100644 index 00000000..addcd9b4 --- /dev/null +++ b/server/reflector/pipelines/main_multitrack_pipeline.py @@ -0,0 +1,694 @@ +import asyncio +import math +import tempfile +from fractions import Fraction +from pathlib import Path + +import av +from av.audio.resampler import AudioResampler +from celery import chain, shared_task + +from reflector.asynctask import asynctask +from reflector.db.transcripts import ( + TranscriptStatus, + TranscriptWaveform, + transcripts_controller, +) +from reflector.logger import logger +from reflector.pipelines import topic_processing +from reflector.pipelines.main_file_pipeline import task_send_webhook_if_needed +from reflector.pipelines.main_live_pipeline import ( + PipelineMainBase, + broadcast_to_sockets, + task_cleanup_consent, + task_pipeline_post_to_zulip, +) +from reflector.pipelines.transcription_helpers import transcribe_file_with_processor +from reflector.processors import AudioFileWriterProcessor +from reflector.processors.audio_waveform_processor import AudioWaveformProcessor +from reflector.processors.types import TitleSummary +from reflector.processors.types import Transcript as TranscriptType +from reflector.storage import Storage, get_transcripts_storage +from reflector.utils.string import NonEmptyString + +# Audio encoding 
constants +OPUS_STANDARD_SAMPLE_RATE = 48000 +OPUS_DEFAULT_BIT_RATE = 128000 + +# Storage operation constants +PRESIGNED_URL_EXPIRATION_SECONDS = 7200 # 2 hours + + +class PipelineMainMultitrack(PipelineMainBase): + def __init__(self, transcript_id: str): + super().__init__(transcript_id=transcript_id) + self.logger = logger.bind(transcript_id=self.transcript_id) + self.empty_pipeline = topic_processing.EmptyPipeline(logger=self.logger) + + async def pad_track_for_transcription( + self, + track_url: NonEmptyString, + track_idx: int, + storage: Storage, + ) -> NonEmptyString: + """ + Pad a single track with silence based on stream metadata start_time. + Downloads from S3 presigned URL, processes via PyAV using tempfile, uploads to S3. + Returns presigned URL of padded track (or original URL if no padding needed). + + Memory usage: + - Pattern: fixed_overhead(2-5MB) for PyAV codec/filters + - PyAV streams input efficiently (no full download, verified) + - Output written to tempfile (disk-based, not memory) + - Upload streams from file handle (boto3 chunks, typically 5-10MB) + + Daily.co raw-tracks timing - Two approaches: + + CURRENT APPROACH (PyAV metadata): + The WebM stream.start_time field encodes MEETING-RELATIVE timing: + - t=0: When Daily.co recording started (first participant joined) + - start_time=8.13s: This participant's track began 8.13s after recording started + - Purpose: Enables track alignment without external manifest files + + This is NOT: + - Stream-internal offset (first packet timestamp relative to stream start) + - Absolute/wall-clock time + - Recording duration + + ALTERNATIVE APPROACH (filename parsing): + Daily.co filenames contain Unix timestamps (milliseconds): + Format: {recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}.webm + Example: 1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm + + Can calculate offset: (track_start_ts - recording_start_ts) / 1000 + - Track 0: (1760988935922 - 1760988935484) / 1000 = 0.438s + - Track 1: (1760988943823 - 1760988935484) / 1000 = 8.339s + + TIME DIFFERENCE: PyAV metadata vs filename timestamps differ by ~209ms: + - Track 0: filename=438ms, metadata=229ms (diff: 209ms) + - Track 1: filename=8339ms, metadata=8130ms (diff: 209ms) + + Consistent delta suggests network/encoding delay. PyAV metadata is ground truth + (represents when audio stream actually started vs when file upload initiated). + + Example with 2 participants: + Track A: start_time=0.2s → Joined 200ms after recording began + Track B: start_time=8.1s → Joined 8.1 seconds later + + After padding: + Track A: [0.2s silence] + [speech...] + Track B: [8.1s silence] + [speech...] + + Whisper transcription timestamps are now synchronized: + Track A word at 5.0s → happened at meeting t=5.0s + Track B word at 10.0s → happened at meeting t=10.0s + + Merging just sorts by timestamp - no offset calculation needed. + + Padding coincidentally involves re-encoding. It's important when we work with Daily.co + Whisper. + This is because Daily.co returns recordings with skipped frames e.g. when microphone muted. + Daily.co doesn't understand those frames and ignores them, causing timestamp issues in transcription. + Re-encoding restores those frames. 
We do padding and re-encoding together just because it's convenient and more performant: + we need padded values for mix mp3 anyways + """ + + transcript = await self.get_transcript() + + try: + # PyAV streams input from S3 URL efficiently (2-5MB fixed overhead for codec/filters) + with av.open(track_url) as in_container: + start_time_seconds = self._extract_stream_start_time_from_container( + in_container, track_idx + ) + + if start_time_seconds <= 0: + self.logger.info( + f"Track {track_idx} requires no padding (start_time={start_time_seconds}s)", + track_idx=track_idx, + ) + return track_url + + # Use tempfile instead of BytesIO for better memory efficiency + # Reduces peak memory usage during encoding/upload + with tempfile.NamedTemporaryFile( + suffix=".webm", delete=False + ) as temp_file: + temp_path = temp_file.name + + try: + self._apply_audio_padding_to_file( + in_container, temp_path, start_time_seconds, track_idx + ) + + storage_path = ( + f"file_pipeline/{transcript.id}/tracks/padded_{track_idx}.webm" + ) + + # Upload using file handle for streaming + with open(temp_path, "rb") as padded_file: + await storage.put_file(storage_path, padded_file) + finally: + # Clean up temp file + Path(temp_path).unlink(missing_ok=True) + + padded_url = await storage.get_file_url( + storage_path, + operation="get_object", + expires_in=PRESIGNED_URL_EXPIRATION_SECONDS, + ) + + self.logger.info( + f"Successfully padded track {track_idx}", + track_idx=track_idx, + start_time_seconds=start_time_seconds, + padded_url=padded_url, + ) + + return padded_url + + except Exception as e: + self.logger.error( + f"Failed to process track {track_idx}", + track_idx=track_idx, + url=track_url, + error=str(e), + exc_info=True, + ) + raise Exception( + f"Track {track_idx} padding failed - transcript would have incorrect timestamps" + ) from e + + def _extract_stream_start_time_from_container( + self, container, track_idx: int + ) -> float: + """ + Extract meeting-relative start time from WebM stream metadata. + Uses PyAV to read stream.start_time from WebM container. + More accurate than filename timestamps by ~209ms due to network/encoding delays. 
+ """ + start_time_seconds = 0.0 + try: + audio_streams = [s for s in container.streams if s.type == "audio"] + stream = audio_streams[0] if audio_streams else container.streams[0] + + # 1) Try stream-level start_time (most reliable for Daily.co tracks) + if stream.start_time is not None and stream.time_base is not None: + start_time_seconds = float(stream.start_time * stream.time_base) + + # 2) Fallback to container-level start_time (in av.time_base units) + if (start_time_seconds <= 0) and (container.start_time is not None): + start_time_seconds = float(container.start_time * av.time_base) + + # 3) Fallback to first packet DTS in stream.time_base + if start_time_seconds <= 0: + for packet in container.demux(stream): + if packet.dts is not None: + start_time_seconds = float(packet.dts * stream.time_base) + break + except Exception as e: + self.logger.warning( + "PyAV metadata read failed; assuming 0 start_time", + track_idx=track_idx, + error=str(e), + ) + start_time_seconds = 0.0 + + self.logger.info( + f"Track {track_idx} stream metadata: start_time={start_time_seconds:.3f}s", + track_idx=track_idx, + ) + return start_time_seconds + + def _apply_audio_padding_to_file( + self, + in_container, + output_path: str, + start_time_seconds: float, + track_idx: int, + ) -> None: + """Apply silence padding to audio track using PyAV filter graph, writing to file""" + delay_ms = math.floor(start_time_seconds * 1000) + + self.logger.info( + f"Padding track {track_idx} with {delay_ms}ms delay using PyAV", + track_idx=track_idx, + delay_ms=delay_ms, + ) + + try: + with av.open(output_path, "w", format="webm") as out_container: + in_stream = next( + (s for s in in_container.streams if s.type == "audio"), None + ) + if in_stream is None: + raise Exception("No audio stream in input") + + out_stream = out_container.add_stream( + "libopus", rate=OPUS_STANDARD_SAMPLE_RATE + ) + out_stream.bit_rate = OPUS_DEFAULT_BIT_RATE + graph = av.filter.Graph() + + abuf_args = ( + f"time_base=1/{OPUS_STANDARD_SAMPLE_RATE}:" + f"sample_rate={OPUS_STANDARD_SAMPLE_RATE}:" + f"sample_fmt=s16:" + f"channel_layout=stereo" + ) + src = graph.add("abuffer", args=abuf_args, name="src") + aresample_f = graph.add("aresample", args="async=1", name="ares") + # adelay requires one delay value per channel separated by '|' + delays_arg = f"{delay_ms}|{delay_ms}" + adelay_f = graph.add( + "adelay", args=f"delays={delays_arg}:all=1", name="delay" + ) + sink = graph.add("abuffersink", name="sink") + + src.link_to(aresample_f) + aresample_f.link_to(adelay_f) + adelay_f.link_to(sink) + graph.configure() + + resampler = AudioResampler( + format="s16", layout="stereo", rate=OPUS_STANDARD_SAMPLE_RATE + ) + # Decode -> resample -> push through graph -> encode Opus + for frame in in_container.decode(in_stream): + out_frames = resampler.resample(frame) or [] + for rframe in out_frames: + rframe.sample_rate = OPUS_STANDARD_SAMPLE_RATE + rframe.time_base = Fraction(1, OPUS_STANDARD_SAMPLE_RATE) + src.push(rframe) + + while True: + try: + f_out = sink.pull() + except Exception: + break + f_out.sample_rate = OPUS_STANDARD_SAMPLE_RATE + f_out.time_base = Fraction(1, OPUS_STANDARD_SAMPLE_RATE) + for packet in out_stream.encode(f_out): + out_container.mux(packet) + + src.push(None) + while True: + try: + f_out = sink.pull() + except Exception: + break + f_out.sample_rate = OPUS_STANDARD_SAMPLE_RATE + f_out.time_base = Fraction(1, OPUS_STANDARD_SAMPLE_RATE) + for packet in out_stream.encode(f_out): + out_container.mux(packet) + + for packet in 
out_stream.encode(None): + out_container.mux(packet) + except Exception as e: + self.logger.error( + "PyAV padding failed for track", + track_idx=track_idx, + delay_ms=delay_ms, + error=str(e), + exc_info=True, + ) + raise + + async def mixdown_tracks( + self, + track_urls: list[str], + writer: AudioFileWriterProcessor, + offsets_seconds: list[float] | None = None, + ) -> None: + """Multi-track mixdown using PyAV filter graph (amix), reading from S3 presigned URLs""" + + target_sample_rate: int | None = None + for url in track_urls: + if not url: + continue + container = None + try: + container = av.open(url) + for frame in container.decode(audio=0): + target_sample_rate = frame.sample_rate + break + except Exception: + continue + finally: + if container is not None: + container.close() + if target_sample_rate: + break + + if not target_sample_rate: + self.logger.error("Mixdown failed - no decodable audio frames found") + raise Exception("Mixdown failed: No decodable audio frames in any track") + # Build PyAV filter graph: + # N abuffer (s32/stereo) + # -> optional adelay per input (for alignment) + # -> amix (s32) + # -> aformat(s16) + # -> sink + graph = av.filter.Graph() + inputs = [] + valid_track_urls = [url for url in track_urls if url] + input_offsets_seconds = None + if offsets_seconds is not None: + input_offsets_seconds = [ + offsets_seconds[i] for i, url in enumerate(track_urls) if url + ] + for idx, url in enumerate(valid_track_urls): + args = ( + f"time_base=1/{target_sample_rate}:" + f"sample_rate={target_sample_rate}:" + f"sample_fmt=s32:" + f"channel_layout=stereo" + ) + in_ctx = graph.add("abuffer", args=args, name=f"in{idx}") + inputs.append(in_ctx) + + if not inputs: + self.logger.error("Mixdown failed - no valid inputs for graph") + raise Exception("Mixdown failed: No valid inputs for filter graph") + + mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix") + + fmt = graph.add( + "aformat", + args=( + f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}" + ), + name="fmt", + ) + + sink = graph.add("abuffersink", name="out") + + # Optional per-input delay before mixing + delays_ms: list[int] = [] + if input_offsets_seconds is not None: + base = min(input_offsets_seconds) if input_offsets_seconds else 0.0 + delays_ms = [ + max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds + ] + else: + delays_ms = [0 for _ in inputs] + + for idx, in_ctx in enumerate(inputs): + delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0 + if delay_ms > 0: + # adelay requires one value per channel; use same for stereo + adelay = graph.add( + "adelay", + args=f"delays={delay_ms}|{delay_ms}:all=1", + name=f"delay{idx}", + ) + in_ctx.link_to(adelay) + adelay.link_to(mixer, 0, idx) + else: + in_ctx.link_to(mixer, 0, idx) + mixer.link_to(fmt) + fmt.link_to(sink) + graph.configure() + + containers = [] + try: + # Open all containers with cleanup guaranteed + for i, url in enumerate(valid_track_urls): + try: + c = av.open(url) + containers.append(c) + except Exception as e: + self.logger.warning( + "Mixdown: failed to open container from URL", + input=i, + url=url, + error=str(e), + ) + + if not containers: + self.logger.error("Mixdown failed - no valid containers opened") + raise Exception("Mixdown failed: Could not open any track containers") + + decoders = [c.decode(audio=0) for c in containers] + active = [True] * len(decoders) + resamplers = [ + AudioResampler(format="s32", layout="stereo", rate=target_sample_rate) + for _ in 
decoders + ] + + while any(active): + for i, (dec, is_active) in enumerate(zip(decoders, active)): + if not is_active: + continue + try: + frame = next(dec) + except StopIteration: + active[i] = False + continue + + if frame.sample_rate != target_sample_rate: + continue + out_frames = resamplers[i].resample(frame) or [] + for rf in out_frames: + rf.sample_rate = target_sample_rate + rf.time_base = Fraction(1, target_sample_rate) + inputs[i].push(rf) + + while True: + try: + mixed = sink.pull() + except Exception: + break + mixed.sample_rate = target_sample_rate + mixed.time_base = Fraction(1, target_sample_rate) + await writer.push(mixed) + + for in_ctx in inputs: + in_ctx.push(None) + while True: + try: + mixed = sink.pull() + except Exception: + break + mixed.sample_rate = target_sample_rate + mixed.time_base = Fraction(1, target_sample_rate) + await writer.push(mixed) + finally: + # Cleanup all containers, even if processing failed + for c in containers: + if c is not None: + try: + c.close() + except Exception: + pass # Best effort cleanup + + @broadcast_to_sockets + async def set_status(self, transcript_id: str, status: TranscriptStatus): + async with self.lock_transaction(): + return await transcripts_controller.set_status(transcript_id, status) + + async def on_waveform(self, data): + async with self.transaction(): + waveform = TranscriptWaveform(waveform=data) + transcript = await self.get_transcript() + return await transcripts_controller.append_event( + transcript=transcript, event="WAVEFORM", data=waveform + ) + + async def process(self, bucket_name: str, track_keys: list[str]): + transcript = await self.get_transcript() + async with self.transaction(): + await transcripts_controller.update( + transcript, + { + "events": [], + "topics": [], + }, + ) + + source_storage = get_transcripts_storage() + transcript_storage = source_storage + + track_urls: list[str] = [] + for key in track_keys: + url = await source_storage.get_file_url( + key, + operation="get_object", + expires_in=PRESIGNED_URL_EXPIRATION_SECONDS, + bucket=bucket_name, + ) + track_urls.append(url) + self.logger.info( + f"Generated presigned URL for track from {bucket_name}", + key=key, + ) + + created_padded_files = set() + padded_track_urls: list[str] = [] + for idx, url in enumerate(track_urls): + padded_url = await self.pad_track_for_transcription( + url, idx, transcript_storage + ) + padded_track_urls.append(padded_url) + if padded_url != url: + storage_path = f"file_pipeline/{transcript.id}/tracks/padded_{idx}.webm" + created_padded_files.add(storage_path) + self.logger.info(f"Track {idx} processed, padded URL: {padded_url}") + + transcript.data_path.mkdir(parents=True, exist_ok=True) + + mp3_writer = AudioFileWriterProcessor( + path=str(transcript.audio_mp3_filename), + on_duration=self.on_duration, + ) + await self.mixdown_tracks(padded_track_urls, mp3_writer, offsets_seconds=None) + await mp3_writer.flush() + + if not transcript.audio_mp3_filename.exists(): + raise Exception( + "Mixdown failed - no MP3 file generated. Cannot proceed without playable audio." 
+ ) + + storage_path = f"{transcript.id}/audio.mp3" + # Use file handle streaming to avoid loading entire MP3 into memory + mp3_size = transcript.audio_mp3_filename.stat().st_size + with open(transcript.audio_mp3_filename, "rb") as mp3_file: + await transcript_storage.put_file(storage_path, mp3_file) + mp3_url = await transcript_storage.get_file_url(storage_path) + + await transcripts_controller.update(transcript, {"audio_location": "storage"}) + + self.logger.info( + f"Uploaded mixed audio to storage", + storage_path=storage_path, + size=mp3_size, + url=mp3_url, + ) + + self.logger.info("Generating waveform from mixed audio") + waveform_processor = AudioWaveformProcessor( + audio_path=transcript.audio_mp3_filename, + waveform_path=transcript.audio_waveform_filename, + on_waveform=self.on_waveform, + ) + waveform_processor.set_pipeline(self.empty_pipeline) + await waveform_processor.flush() + self.logger.info("Waveform generated successfully") + + speaker_transcripts: list[TranscriptType] = [] + for idx, padded_url in enumerate(padded_track_urls): + if not padded_url: + continue + + t = await self.transcribe_file(padded_url, transcript.source_language) + + if not t.words: + continue + + for w in t.words: + w.speaker = idx + + speaker_transcripts.append(t) + self.logger.info( + f"Track {idx} transcribed successfully with {len(t.words)} words", + track_idx=idx, + ) + + valid_track_count = len([url for url in padded_track_urls if url]) + if valid_track_count > 0 and len(speaker_transcripts) != valid_track_count: + raise Exception( + f"Only {len(speaker_transcripts)}/{valid_track_count} tracks transcribed successfully. " + f"All tracks must succeed to avoid incomplete transcripts." + ) + + if not speaker_transcripts: + raise Exception("No valid track transcriptions") + + self.logger.info(f"Cleaning up {len(created_padded_files)} temporary S3 files") + cleanup_tasks = [] + for storage_path in created_padded_files: + cleanup_tasks.append(transcript_storage.delete_file(storage_path)) + + if cleanup_tasks: + cleanup_results = await asyncio.gather( + *cleanup_tasks, return_exceptions=True + ) + for storage_path, result in zip(created_padded_files, cleanup_results): + if isinstance(result, Exception): + self.logger.warning( + "Failed to cleanup temporary padded track", + storage_path=storage_path, + error=str(result), + ) + + merged_words = [] + for t in speaker_transcripts: + merged_words.extend(t.words) + merged_words.sort( + key=lambda w: w.start if hasattr(w, "start") and w.start is not None else 0 + ) + + merged_transcript = TranscriptType(words=merged_words, translation=None) + + await self.on_transcript(merged_transcript) + + topics = await self.detect_topics(merged_transcript, transcript.target_language) + await asyncio.gather( + self.generate_title(topics), + self.generate_summaries(topics), + return_exceptions=False, + ) + + await self.set_status(transcript.id, "ended") + + async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType: + return await transcribe_file_with_processor(audio_url, language) + + async def detect_topics( + self, transcript: TranscriptType, target_language: str + ) -> list[TitleSummary]: + return await topic_processing.detect_topics( + transcript, + target_language, + on_topic_callback=self.on_topic, + empty_pipeline=self.empty_pipeline, + ) + + async def generate_title(self, topics: list[TitleSummary]): + return await topic_processing.generate_title( + topics, + on_title_callback=self.on_title, + empty_pipeline=self.empty_pipeline, + 
logger=self.logger, + ) + + async def generate_summaries(self, topics: list[TitleSummary]): + transcript = await self.get_transcript() + return await topic_processing.generate_summaries( + topics, + transcript, + on_long_summary_callback=self.on_long_summary, + on_short_summary_callback=self.on_short_summary, + empty_pipeline=self.empty_pipeline, + logger=self.logger, + ) + + +@shared_task +@asynctask +async def task_pipeline_multitrack_process( + *, transcript_id: str, bucket_name: str, track_keys: list[str] +): + pipeline = PipelineMainMultitrack(transcript_id=transcript_id) + try: + await pipeline.set_status(transcript_id, "processing") + await pipeline.process(bucket_name, track_keys) + except Exception: + await pipeline.set_status(transcript_id, "error") + raise + + post_chain = chain( + task_cleanup_consent.si(transcript_id=transcript_id), + task_pipeline_post_to_zulip.si(transcript_id=transcript_id), + task_send_webhook_if_needed.si(transcript_id=transcript_id), + ) + post_chain.delay() diff --git a/server/reflector/pipelines/topic_processing.py b/server/reflector/pipelines/topic_processing.py new file mode 100644 index 00000000..7f055025 --- /dev/null +++ b/server/reflector/pipelines/topic_processing.py @@ -0,0 +1,109 @@ +""" +Topic processing utilities +========================== + +Shared topic detection, title generation, and summarization logic +used across file and multitrack pipelines. +""" + +from typing import Callable + +import structlog + +from reflector.db.transcripts import Transcript +from reflector.processors import ( + TranscriptFinalSummaryProcessor, + TranscriptFinalTitleProcessor, + TranscriptTopicDetectorProcessor, +) +from reflector.processors.types import TitleSummary +from reflector.processors.types import Transcript as TranscriptType + + +class EmptyPipeline: + def __init__(self, logger: structlog.BoundLogger): + self.logger = logger + + def get_pref(self, k, d=None): + return d + + async def emit(self, event): + pass + + +async def detect_topics( + transcript: TranscriptType, + target_language: str, + *, + on_topic_callback: Callable, + empty_pipeline: EmptyPipeline, +) -> list[TitleSummary]: + chunk_size = 300 + topics: list[TitleSummary] = [] + + async def on_topic(topic: TitleSummary): + topics.append(topic) + return await on_topic_callback(topic) + + topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic) + topic_detector.set_pipeline(empty_pipeline) + + for i in range(0, len(transcript.words), chunk_size): + chunk_words = transcript.words[i : i + chunk_size] + if not chunk_words: + continue + + chunk_transcript = TranscriptType( + words=chunk_words, translation=transcript.translation + ) + + await topic_detector.push(chunk_transcript) + + await topic_detector.flush() + return topics + + +async def generate_title( + topics: list[TitleSummary], + *, + on_title_callback: Callable, + empty_pipeline: EmptyPipeline, + logger: structlog.BoundLogger, +): + if not topics: + logger.warning("No topics for title generation") + return + + processor = TranscriptFinalTitleProcessor(callback=on_title_callback) + processor.set_pipeline(empty_pipeline) + + for topic in topics: + await processor.push(topic) + + await processor.flush() + + +async def generate_summaries( + topics: list[TitleSummary], + transcript: Transcript, + *, + on_long_summary_callback: Callable, + on_short_summary_callback: Callable, + empty_pipeline: EmptyPipeline, + logger: structlog.BoundLogger, +): + if not topics: + logger.warning("No topics for summary generation") + return + + 
processor = TranscriptFinalSummaryProcessor( + transcript=transcript, + callback=on_long_summary_callback, + on_short_summary=on_short_summary_callback, + ) + processor.set_pipeline(empty_pipeline) + + for topic in topics: + await processor.push(topic) + + await processor.flush() diff --git a/server/reflector/pipelines/transcription_helpers.py b/server/reflector/pipelines/transcription_helpers.py new file mode 100644 index 00000000..b0cc5858 --- /dev/null +++ b/server/reflector/pipelines/transcription_helpers.py @@ -0,0 +1,34 @@ +from reflector.processors.file_transcript import FileTranscriptInput +from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor +from reflector.processors.types import Transcript as TranscriptType + + +async def transcribe_file_with_processor( + audio_url: str, + language: str, + processor_name: str | None = None, +) -> TranscriptType: + processor = ( + FileTranscriptAutoProcessor(name=processor_name) + if processor_name + else FileTranscriptAutoProcessor() + ) + input_data = FileTranscriptInput(audio_url=audio_url, language=language) + + result: TranscriptType | None = None + + async def capture_result(transcript): + nonlocal result + result = transcript + + processor.on(capture_result) + await processor.push(input_data) + await processor.flush() + + if not result: + processor_label = processor_name or "default" + raise ValueError( + f"No transcript captured from {processor_label} processor for audio: {audio_url}" + ) + + return result diff --git a/server/reflector/processors/summary/summary_builder.py b/server/reflector/processors/summary/summary_builder.py index efcf9227..df348093 100644 --- a/server/reflector/processors/summary/summary_builder.py +++ b/server/reflector/processors/summary/summary_builder.py @@ -165,6 +165,7 @@ class SummaryBuilder: self.llm: LLM = llm self.model_name: str = llm.model_name self.logger = logger or structlog.get_logger() + self.participant_instructions: str | None = None if filename: self.read_transcript_from_file(filename) @@ -191,14 +192,61 @@ class SummaryBuilder: self, prompt: str, output_cls: Type[T], tone_name: str | None = None ) -> T: """Generic function to get structured output from LLM for non-function-calling models.""" + # Add participant instructions to the prompt if available + enhanced_prompt = self._enhance_prompt_with_participants(prompt) return await self.llm.get_structured_response( - prompt, [self.transcript], output_cls, tone_name=tone_name + enhanced_prompt, [self.transcript], output_cls, tone_name=tone_name ) + async def _get_response( + self, prompt: str, texts: list[str], tone_name: str | None = None + ) -> str: + """Get text response with automatic participant instructions injection.""" + enhanced_prompt = self._enhance_prompt_with_participants(prompt) + return await self.llm.get_response(enhanced_prompt, texts, tone_name=tone_name) + + def _enhance_prompt_with_participants(self, prompt: str) -> str: + """Add participant instructions to any prompt if participants are known.""" + if self.participant_instructions: + self.logger.debug("Adding participant instructions to prompt") + return f"{prompt}\n\n{self.participant_instructions}" + return prompt + # ---------------------------------------------------------------------------- # Participants # ---------------------------------------------------------------------------- + def set_known_participants(self, participants: list[str]) -> None: + """ + Set known participants directly without LLM identification. 
+ This is used when participants are already identified and stored. + They are appended at the end of the transcript, providing more context for the assistant. + """ + if not participants: + self.logger.warning("No participants provided") + return + + self.logger.info( + "Using known participants", + participants=participants, + ) + + participants_md = self.format_list_md(participants) + self.transcript += f"\n\n# Participants\n\n{participants_md}" + + # Set instructions that will be automatically added to all prompts + participants_list = ", ".join(participants) + self.participant_instructions = dedent( + f""" + # IMPORTANT: Participant Names + The following participants are identified in this conversation: {participants_list} + + You MUST use these specific participant names when referring to people in your response. + Do NOT use generic terms like "a participant", "someone", "attendee", "Speaker 1", "Speaker 2", etc. + Always refer to people by their actual names (e.g., "John suggested..." not "A participant suggested..."). + """ + ).strip() + async def identify_participants(self) -> None: """ From a transcript, try to identify the participants using TreeSummarize with structured output. @@ -232,6 +280,19 @@ class SummaryBuilder: if unique_participants: participants_md = self.format_list_md(unique_participants) self.transcript += f"\n\n# Participants\n\n{participants_md}" + + # Set instructions that will be automatically added to all prompts + participants_list = ", ".join(unique_participants) + self.participant_instructions = dedent( + f""" + # IMPORTANT: Participant Names + The following participants are identified in this conversation: {participants_list} + + You MUST use these specific participant names when referring to people in your response. + Do NOT use generic terms like "a participant", "someone", "attendee", "Speaker 1", "Speaker 2", etc. + Always refer to people by their actual names (e.g., "John suggested..." not "A participant suggested..."). 
+ """ + ).strip() else: self.logger.warning("No participants identified in the transcript") @@ -318,13 +379,13 @@ class SummaryBuilder: for subject in self.subjects: detailed_prompt = DETAILED_SUBJECT_PROMPT_TEMPLATE.format(subject=subject) - detailed_response = await self.llm.get_response( + detailed_response = await self._get_response( detailed_prompt, [self.transcript], tone_name="Topic assistant" ) paragraph_prompt = PARAGRAPH_SUMMARY_PROMPT - paragraph_response = await self.llm.get_response( + paragraph_response = await self._get_response( paragraph_prompt, [str(detailed_response)], tone_name="Topic summarizer" ) @@ -345,7 +406,7 @@ class SummaryBuilder: recap_prompt = RECAP_PROMPT - recap_response = await self.llm.get_response( + recap_response = await self._get_response( recap_prompt, [summaries_text], tone_name="Recap summarizer" ) diff --git a/server/reflector/processors/transcript_final_summary.py b/server/reflector/processors/transcript_final_summary.py index 0b4a594c..dfe07aad 100644 --- a/server/reflector/processors/transcript_final_summary.py +++ b/server/reflector/processors/transcript_final_summary.py @@ -26,7 +26,25 @@ class TranscriptFinalSummaryProcessor(Processor): async def get_summary_builder(self, text) -> SummaryBuilder: builder = SummaryBuilder(self.llm, logger=self.logger) builder.set_transcript(text) - await builder.identify_participants() + + # Use known participants if available, otherwise identify them + if self.transcript and self.transcript.participants: + # Extract participant names from the stored participants + participant_names = [p.name for p in self.transcript.participants if p.name] + if participant_names: + self.logger.info( + f"Using {len(participant_names)} known participants from transcript" + ) + builder.set_known_participants(participant_names) + else: + self.logger.info( + "Participants field exists but is empty, identifying participants" + ) + await builder.identify_participants() + else: + self.logger.info("No participants stored, identifying participants") + await builder.identify_participants() + await builder.generate_summary() return builder @@ -49,18 +67,30 @@ class TranscriptFinalSummaryProcessor(Processor): speakermap = {} if self.transcript: speakermap = { - participant["speaker"]: participant["name"] - for participant in self.transcript.participants + p.speaker: p.name + for p in (self.transcript.participants or []) + if p.speaker is not None and p.name } + self.logger.info( + f"Built speaker map with {len(speakermap)} participants", + speakermap=speakermap, + ) # build the transcript as a single string - # XXX: unsure if the participants name as replaced directly in speaker ? 
+ # Replace speaker IDs with actual participant names if available text_transcript = [] + unique_speakers = set() for topic in self.chunks: for segment in topic.transcript.as_segments(): name = speakermap.get(segment.speaker, f"Speaker {segment.speaker}") + unique_speakers.add((segment.speaker, name)) text_transcript.append(f"{name}: {segment.text}") + self.logger.info( + f"Built transcript with {len(unique_speakers)} unique speakers", + speakers=list(unique_speakers), + ) + text_transcript = "\n".join(text_transcript) last_chunk = self.chunks[-1] diff --git a/server/reflector/processors/transcript_topic_detector.py b/server/reflector/processors/transcript_topic_detector.py index 317e2d9c..695d3af3 100644 --- a/server/reflector/processors/transcript_topic_detector.py +++ b/server/reflector/processors/transcript_topic_detector.py @@ -1,6 +1,6 @@ from textwrap import dedent -from pydantic import BaseModel, Field +from pydantic import AliasChoices, BaseModel, Field from reflector.llm import LLM from reflector.processors.base import Processor @@ -36,15 +36,13 @@ class TopicResponse(BaseModel): title: str = Field( description="A descriptive title for the topic being discussed", - validation_alias="Title", + validation_alias=AliasChoices("title", "Title"), ) summary: str = Field( description="A concise 1-2 sentence summary of the discussion", - validation_alias="Summary", + validation_alias=AliasChoices("summary", "Summary"), ) - model_config = {"populate_by_name": True} - class TranscriptTopicDetectorProcessor(Processor): """ diff --git a/server/reflector/schemas/platform.py b/server/reflector/schemas/platform.py new file mode 100644 index 00000000..7b945841 --- /dev/null +++ b/server/reflector/schemas/platform.py @@ -0,0 +1,5 @@ +from typing import Literal + +Platform = Literal["whereby", "daily"] +WHEREBY_PLATFORM: Platform = "whereby" +DAILY_PLATFORM: Platform = "daily" diff --git a/server/reflector/settings.py b/server/reflector/settings.py index 9659f648..0e3fb3f7 100644 --- a/server/reflector/settings.py +++ b/server/reflector/settings.py @@ -1,6 +1,7 @@ from pydantic.types import PositiveInt from pydantic_settings import BaseSettings, SettingsConfigDict +from reflector.schemas.platform import WHEREBY_PLATFORM, Platform from reflector.utils.string import NonEmptyString @@ -47,14 +48,17 @@ class Settings(BaseSettings): TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID: str | None = None TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None - # Recording storage - RECORDING_STORAGE_BACKEND: str | None = None + # Platform-specific recording storage (follows {PREFIX}_STORAGE_AWS_{CREDENTIAL} pattern) + # Whereby storage configuration + WHEREBY_STORAGE_AWS_BUCKET_NAME: str | None = None + WHEREBY_STORAGE_AWS_REGION: str | None = None + WHEREBY_STORAGE_AWS_ACCESS_KEY_ID: str | None = None + WHEREBY_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None - # Recording storage configuration for AWS - RECORDING_STORAGE_AWS_BUCKET_NAME: str = "recording-bucket" - RECORDING_STORAGE_AWS_REGION: str = "us-east-1" - RECORDING_STORAGE_AWS_ACCESS_KEY_ID: str | None = None - RECORDING_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None + # Daily.co storage configuration + DAILYCO_STORAGE_AWS_BUCKET_NAME: str | None = None + DAILYCO_STORAGE_AWS_REGION: str | None = None + DAILYCO_STORAGE_AWS_ROLE_ARN: str | None = None # Translate into the target language TRANSLATION_BACKEND: str = "passthrough" @@ -124,11 +128,20 @@ class Settings(BaseSettings): WHEREBY_API_URL: str = "https://api.whereby.dev/v1" WHEREBY_API_KEY: 
NonEmptyString | None = None WHEREBY_WEBHOOK_SECRET: str | None = None - AWS_WHEREBY_ACCESS_KEY_ID: str | None = None - AWS_WHEREBY_ACCESS_KEY_SECRET: str | None = None AWS_PROCESS_RECORDING_QUEUE_URL: str | None = None SQS_POLLING_TIMEOUT_SECONDS: int = 60 + # Daily.co integration + DAILY_API_KEY: str | None = None + DAILY_WEBHOOK_SECRET: str | None = None + DAILY_SUBDOMAIN: str | None = None + DAILY_WEBHOOK_UUID: str | None = ( + None # Webhook UUID for this environment. Not used by production code + ) + + # Platform Configuration + DEFAULT_VIDEO_PLATFORM: Platform = WHEREBY_PLATFORM + # Zulip integration ZULIP_REALM: str | None = None ZULIP_API_KEY: str | None = None diff --git a/server/reflector/storage/__init__.py b/server/reflector/storage/__init__.py index 3db8a77b..aff6c767 100644 --- a/server/reflector/storage/__init__.py +++ b/server/reflector/storage/__init__.py @@ -3,6 +3,13 @@ from reflector.settings import settings def get_transcripts_storage() -> Storage: + """ + Get storage for processed transcript files (master credentials). + + Also use this for ALL our file operations with bucket override: + master = get_transcripts_storage() + master.delete_file(key, bucket=recording.bucket_name) + """ assert settings.TRANSCRIPT_STORAGE_BACKEND return Storage.get_instance( name=settings.TRANSCRIPT_STORAGE_BACKEND, @@ -10,8 +17,53 @@ def get_transcripts_storage() -> Storage: ) -def get_recordings_storage() -> Storage: +def get_whereby_storage() -> Storage: + """ + Get storage config for Whereby (for passing to Whereby API). + + Usage: + whereby_storage = get_whereby_storage() + key_id, secret = whereby_storage.key_credentials + whereby_api.create_meeting( + bucket=whereby_storage.bucket_name, + access_key_id=key_id, + secret=secret, + ) + + Do NOT use for our file operations - use get_transcripts_storage() instead. + """ + if not settings.WHEREBY_STORAGE_AWS_BUCKET_NAME: + raise ValueError( + "WHEREBY_STORAGE_AWS_BUCKET_NAME required for Whereby with AWS storage" + ) + return Storage.get_instance( - name=settings.RECORDING_STORAGE_BACKEND, - settings_prefix="RECORDING_STORAGE_", + name="aws", + settings_prefix="WHEREBY_STORAGE_", + ) + + +def get_dailyco_storage() -> Storage: + """ + Get storage config for Daily.co (for passing to Daily API). + + Usage: + daily_storage = get_dailyco_storage() + daily_api.create_meeting( + bucket=daily_storage.bucket_name, + region=daily_storage.region, + role_arn=daily_storage.role_credential, + ) + + Do NOT use for our file operations - use get_transcripts_storage() instead. 
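The split described in these docstrings is: master transcript-storage credentials for all of our own S3 calls, platform-specific config only handed to the external APIs. A small sketch of the bucket-override pattern from the `get_transcripts_storage` docstring (the `recording` object with a `bucket_name` attribute follows that example):

```python
# Sketch: master credentials + bucket override for touching a platform's input bucket.
from reflector.storage import get_transcripts_storage


async def delete_raw_recording(key: str, recording) -> None:
    master = get_transcripts_storage()
    # Same master credentials, different bucket (the Whereby/Daily input bucket).
    await master.delete_file(key, bucket=recording.bucket_name)
```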
+ """ + # Fail fast if platform-specific config missing + if not settings.DAILYCO_STORAGE_AWS_BUCKET_NAME: + raise ValueError( + "DAILYCO_STORAGE_AWS_BUCKET_NAME required for Daily.co with AWS storage" + ) + + return Storage.get_instance( + name="aws", + settings_prefix="DAILYCO_STORAGE_", ) diff --git a/server/reflector/storage/base.py b/server/reflector/storage/base.py index 360930d8..ba4316d8 100644 --- a/server/reflector/storage/base.py +++ b/server/reflector/storage/base.py @@ -1,10 +1,23 @@ import importlib +from typing import BinaryIO, Union from pydantic import BaseModel from reflector.settings import settings +class StorageError(Exception): + """Base exception for storage operations.""" + + pass + + +class StoragePermissionError(StorageError): + """Exception raised when storage operation fails due to permission issues.""" + + pass + + class FileResult(BaseModel): filename: str url: str @@ -36,26 +49,113 @@ class Storage: return cls._registry[name](**config) - async def put_file(self, filename: str, data: bytes) -> FileResult: - return await self._put_file(filename, data) - - async def _put_file(self, filename: str, data: bytes) -> FileResult: + # Credential properties for API passthrough + @property + def bucket_name(self) -> str: + """Default bucket name for this storage instance.""" raise NotImplementedError - async def delete_file(self, filename: str): - return await self._delete_file(filename) - - async def _delete_file(self, filename: str): + @property + def region(self) -> str: + """AWS region for this storage instance.""" raise NotImplementedError - async def get_file_url(self, filename: str) -> str: - return await self._get_file_url(filename) + @property + def access_key_id(self) -> str | None: + """AWS access key ID (None for role-based auth). Prefer key_credentials property.""" + return None - async def _get_file_url(self, filename: str) -> str: + @property + def secret_access_key(self) -> str | None: + """AWS secret access key (None for role-based auth). Prefer key_credentials property.""" + return None + + @property + def role_arn(self) -> str | None: + """AWS IAM role ARN for role-based auth (None for key-based auth). Prefer role_credential property.""" + return None + + @property + def key_credentials(self) -> tuple[str, str]: + """ + Get (access_key_id, secret_access_key) for key-based auth. + Raises ValueError if storage uses IAM role instead. + """ raise NotImplementedError - async def get_file(self, filename: str): - return await self._get_file(filename) - - async def _get_file(self, filename: str): + @property + def role_credential(self) -> str: + """ + Get IAM role ARN for role-based auth. + Raises ValueError if storage uses access keys instead. + """ + raise NotImplementedError + + async def put_file( + self, filename: str, data: Union[bytes, BinaryIO], *, bucket: str | None = None + ) -> FileResult: + """Upload data. bucket: override instance default if provided.""" + return await self._put_file(filename, data, bucket=bucket) + + async def _put_file( + self, filename: str, data: Union[bytes, BinaryIO], *, bucket: str | None = None + ) -> FileResult: + raise NotImplementedError + + async def delete_file(self, filename: str, *, bucket: str | None = None): + """Delete file. 
bucket: override instance default if provided.""" + return await self._delete_file(filename, bucket=bucket) + + async def _delete_file(self, filename: str, *, bucket: str | None = None): + raise NotImplementedError + + async def get_file_url( + self, + filename: str, + operation: str = "get_object", + expires_in: int = 3600, + *, + bucket: str | None = None, + ) -> str: + """Generate presigned URL. bucket: override instance default if provided.""" + return await self._get_file_url(filename, operation, expires_in, bucket=bucket) + + async def _get_file_url( + self, + filename: str, + operation: str = "get_object", + expires_in: int = 3600, + *, + bucket: str | None = None, + ) -> str: + raise NotImplementedError + + async def get_file(self, filename: str, *, bucket: str | None = None): + """Download file. bucket: override instance default if provided.""" + return await self._get_file(filename, bucket=bucket) + + async def _get_file(self, filename: str, *, bucket: str | None = None): + raise NotImplementedError + + async def list_objects( + self, prefix: str = "", *, bucket: str | None = None + ) -> list[str]: + """List object keys. bucket: override instance default if provided.""" + return await self._list_objects(prefix, bucket=bucket) + + async def _list_objects( + self, prefix: str = "", *, bucket: str | None = None + ) -> list[str]: + raise NotImplementedError + + async def stream_to_fileobj( + self, filename: str, fileobj: BinaryIO, *, bucket: str | None = None + ): + """Stream file directly to file object without loading into memory. + bucket: override instance default if provided.""" + return await self._stream_to_fileobj(filename, fileobj, bucket=bucket) + + async def _stream_to_fileobj( + self, filename: str, fileobj: BinaryIO, *, bucket: str | None = None + ): raise NotImplementedError diff --git a/server/reflector/storage/storage_aws.py b/server/reflector/storage/storage_aws.py index de9ccf35..372af4aa 100644 --- a/server/reflector/storage/storage_aws.py +++ b/server/reflector/storage/storage_aws.py @@ -1,79 +1,236 @@ +from functools import wraps +from typing import BinaryIO, Union + import aioboto3 +from botocore.config import Config +from botocore.exceptions import ClientError from reflector.logger import logger -from reflector.storage.base import FileResult, Storage +from reflector.storage.base import FileResult, Storage, StoragePermissionError + + +def handle_s3_client_errors(operation_name: str): + """Decorator to handle S3 ClientError with bucket-aware messaging. + + Args: + operation_name: Human-readable operation name for error messages (e.g., "upload", "delete") + """ + + def decorator(func): + @wraps(func) + async def wrapper(self, *args, **kwargs): + bucket = kwargs.get("bucket") + try: + return await func(self, *args, **kwargs) + except ClientError as e: + error_code = e.response.get("Error", {}).get("Code") + if error_code in ("AccessDenied", "NoSuchBucket"): + actual_bucket = bucket or self._bucket_name + bucket_context = ( + f"overridden bucket '{actual_bucket}'" + if bucket + else f"default bucket '{actual_bucket}'" + ) + raise StoragePermissionError( + f"S3 {operation_name} failed for {bucket_context}: {error_code}. " + f"Check TRANSCRIPT_STORAGE_AWS_* credentials have permission." + ) from e + raise + + return wrapper + + return decorator class AwsStorage(Storage): + """AWS S3 storage with bucket override for multi-platform recording architecture. 
+ Master credentials access all buckets via optional bucket parameter in operations.""" + def __init__( self, - aws_access_key_id: str, - aws_secret_access_key: str, aws_bucket_name: str, aws_region: str, + aws_access_key_id: str | None = None, + aws_secret_access_key: str | None = None, + aws_role_arn: str | None = None, ): - if not aws_access_key_id: - raise ValueError("Storage `aws_storage` require `aws_access_key_id`") - if not aws_secret_access_key: - raise ValueError("Storage `aws_storage` require `aws_secret_access_key`") if not aws_bucket_name: raise ValueError("Storage `aws_storage` require `aws_bucket_name`") if not aws_region: raise ValueError("Storage `aws_storage` require `aws_region`") + if not aws_access_key_id and not aws_role_arn: + raise ValueError( + "Storage `aws_storage` require either `aws_access_key_id` or `aws_role_arn`" + ) + if aws_role_arn and (aws_access_key_id or aws_secret_access_key): + raise ValueError( + "Storage `aws_storage` cannot use both `aws_role_arn` and access keys" + ) super().__init__() - self.aws_bucket_name = aws_bucket_name + self._bucket_name = aws_bucket_name + self._region = aws_region + self._access_key_id = aws_access_key_id + self._secret_access_key = aws_secret_access_key + self._role_arn = aws_role_arn + self.aws_folder = "" if "/" in aws_bucket_name: - self.aws_bucket_name, self.aws_folder = aws_bucket_name.split("/", 1) + self._bucket_name, self.aws_folder = aws_bucket_name.split("/", 1) + self.boto_config = Config(retries={"max_attempts": 3, "mode": "adaptive"}) self.session = aioboto3.Session( aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, region_name=aws_region, ) - self.base_url = f"https://{aws_bucket_name}.s3.amazonaws.com/" + self.base_url = f"https://{self._bucket_name}.s3.amazonaws.com/" - async def _put_file(self, filename: str, data: bytes) -> FileResult: - bucket = self.aws_bucket_name - folder = self.aws_folder - logger.info(f"Uploading {filename} to S3 {bucket}/{folder}") - s3filename = f"{folder}/{filename}" if folder else filename - async with self.session.client("s3") as client: - await client.put_object( - Bucket=bucket, - Key=s3filename, - Body=data, + # Implement credential properties + @property + def bucket_name(self) -> str: + return self._bucket_name + + @property + def region(self) -> str: + return self._region + + @property + def access_key_id(self) -> str | None: + return self._access_key_id + + @property + def secret_access_key(self) -> str | None: + return self._secret_access_key + + @property + def role_arn(self) -> str | None: + return self._role_arn + + @property + def key_credentials(self) -> tuple[str, str]: + """Get (access_key_id, secret_access_key) for key-based auth.""" + if self._role_arn: + raise ValueError( + "Storage uses IAM role authentication. " + "Use role_credential property instead of key_credentials." ) + if not self._access_key_id or not self._secret_access_key: + raise ValueError("Storage access key credentials not configured") + return (self._access_key_id, self._secret_access_key) - async def _get_file_url(self, filename: str) -> FileResult: - bucket = self.aws_bucket_name + @property + def role_credential(self) -> str: + """Get IAM role ARN for role-based auth.""" + if self._access_key_id or self._secret_access_key: + raise ValueError( + "Storage uses access key authentication. " + "Use key_credentials property instead of role_credential." 
+ ) + if not self._role_arn: + raise ValueError("Storage IAM role ARN not configured") + return self._role_arn + + @handle_s3_client_errors("upload") + async def _put_file( + self, filename: str, data: Union[bytes, BinaryIO], *, bucket: str | None = None + ) -> FileResult: + actual_bucket = bucket or self._bucket_name folder = self.aws_folder s3filename = f"{folder}/{filename}" if folder else filename - async with self.session.client("s3") as client: + logger.info(f"Uploading {filename} to S3 {actual_bucket}/{folder}") + + async with self.session.client("s3", config=self.boto_config) as client: + if isinstance(data, bytes): + await client.put_object(Bucket=actual_bucket, Key=s3filename, Body=data) + else: + # boto3 reads file-like object in chunks + # avoids creating extra memory copy vs bytes.getvalue() approach + await client.upload_fileobj(data, Bucket=actual_bucket, Key=s3filename) + + url = await self._get_file_url(filename, bucket=bucket) + return FileResult(filename=filename, url=url) + + @handle_s3_client_errors("presign") + async def _get_file_url( + self, + filename: str, + operation: str = "get_object", + expires_in: int = 3600, + *, + bucket: str | None = None, + ) -> str: + actual_bucket = bucket or self._bucket_name + folder = self.aws_folder + s3filename = f"{folder}/{filename}" if folder else filename + async with self.session.client("s3", config=self.boto_config) as client: presigned_url = await client.generate_presigned_url( - "get_object", - Params={"Bucket": bucket, "Key": s3filename}, - ExpiresIn=3600, + operation, + Params={"Bucket": actual_bucket, "Key": s3filename}, + ExpiresIn=expires_in, ) return presigned_url - async def _delete_file(self, filename: str): - bucket = self.aws_bucket_name + @handle_s3_client_errors("delete") + async def _delete_file(self, filename: str, *, bucket: str | None = None): + actual_bucket = bucket or self._bucket_name folder = self.aws_folder - logger.info(f"Deleting {filename} from S3 {bucket}/{folder}") + logger.info(f"Deleting {filename} from S3 {actual_bucket}/{folder}") s3filename = f"{folder}/{filename}" if folder else filename - async with self.session.client("s3") as client: - await client.delete_object(Bucket=bucket, Key=s3filename) + async with self.session.client("s3", config=self.boto_config) as client: + await client.delete_object(Bucket=actual_bucket, Key=s3filename) - async def _get_file(self, filename: str): - bucket = self.aws_bucket_name + @handle_s3_client_errors("download") + async def _get_file(self, filename: str, *, bucket: str | None = None): + actual_bucket = bucket or self._bucket_name folder = self.aws_folder - logger.info(f"Downloading {filename} from S3 {bucket}/{folder}") + logger.info(f"Downloading {filename} from S3 {actual_bucket}/{folder}") s3filename = f"{folder}/{filename}" if folder else filename - async with self.session.client("s3") as client: - response = await client.get_object(Bucket=bucket, Key=s3filename) + async with self.session.client("s3", config=self.boto_config) as client: + response = await client.get_object(Bucket=actual_bucket, Key=s3filename) return await response["Body"].read() + @handle_s3_client_errors("list_objects") + async def _list_objects( + self, prefix: str = "", *, bucket: str | None = None + ) -> list[str]: + actual_bucket = bucket or self._bucket_name + folder = self.aws_folder + # Combine folder and prefix + s3prefix = f"{folder}/{prefix}" if folder else prefix + logger.info(f"Listing objects from S3 {actual_bucket} with prefix '{s3prefix}'") + + keys = [] + async with 
self.session.client("s3", config=self.boto_config) as client: + paginator = client.get_paginator("list_objects_v2") + async for page in paginator.paginate(Bucket=actual_bucket, Prefix=s3prefix): + if "Contents" in page: + for obj in page["Contents"]: + # Strip folder prefix from keys if present + key = obj["Key"] + if folder: + if key.startswith(f"{folder}/"): + key = key[len(folder) + 1 :] + elif key == folder: + # Skip folder marker itself + continue + keys.append(key) + + return keys + + @handle_s3_client_errors("stream") + async def _stream_to_fileobj( + self, filename: str, fileobj: BinaryIO, *, bucket: str | None = None + ): + """Stream file from S3 directly to file object without loading into memory.""" + actual_bucket = bucket or self._bucket_name + folder = self.aws_folder + logger.info(f"Streaming {filename} from S3 {actual_bucket}/{folder}") + s3filename = f"{folder}/{filename}" if folder else filename + async with self.session.client("s3", config=self.boto_config) as client: + await client.download_fileobj( + Bucket=actual_bucket, Key=s3filename, Fileobj=fileobj + ) + Storage.register("aws", AwsStorage) diff --git a/server/reflector/utils/daily.py b/server/reflector/utils/daily.py new file mode 100644 index 00000000..1c3b367c --- /dev/null +++ b/server/reflector/utils/daily.py @@ -0,0 +1,26 @@ +from reflector.utils.string import NonEmptyString + +DailyRoomName = str + + +def extract_base_room_name(daily_room_name: DailyRoomName) -> NonEmptyString: + """ + Extract base room name from Daily.co timestamped room name. + + Daily.co creates rooms with timestamp suffix: {base_name}-YYYYMMDDHHMMSS + This function removes the timestamp to get the original room name. + + Examples: + "daily-20251020193458" → "daily" + "daily-2-20251020193458" → "daily-2" + "my-room-name-20251020193458" → "my-room-name" + + Args: + daily_room_name: Full Daily.co room name with optional timestamp + + Returns: + Base room name without timestamp suffix + """ + base_name = daily_room_name.rsplit("-", 1)[0] + assert base_name, f"Extracted base name is empty from: {daily_room_name}" + return base_name diff --git a/server/reflector/utils/datetime.py b/server/reflector/utils/datetime.py new file mode 100644 index 00000000..d416412f --- /dev/null +++ b/server/reflector/utils/datetime.py @@ -0,0 +1,9 @@ +from datetime import datetime, timezone + + +def parse_datetime_with_timezone(iso_string: str) -> datetime: + """Parse ISO datetime string and ensure timezone awareness (defaults to UTC if naive).""" + dt = datetime.fromisoformat(iso_string) + if dt.tzinfo is None: + dt = dt.replace(tzinfo=timezone.utc) + return dt diff --git a/server/reflector/utils/string.py b/server/reflector/utils/string.py index 05f40e30..ae4277c5 100644 --- a/server/reflector/utils/string.py +++ b/server/reflector/utils/string.py @@ -1,4 +1,4 @@ -from typing import Annotated +from typing import Annotated, TypeVar from pydantic import Field, TypeAdapter, constr @@ -21,3 +21,12 @@ def try_parse_non_empty_string(s: str) -> NonEmptyString | None: if not s: return None return parse_non_empty_string(s) + + +T = TypeVar("T", bound=str) + + +def assert_equal[T](s1: T, s2: T) -> T: + if s1 != s2: + raise ValueError(f"assert_equal: {s1} != {s2}") + return s1 diff --git a/server/reflector/utils/url.py b/server/reflector/utils/url.py new file mode 100644 index 00000000..e49a4cb0 --- /dev/null +++ b/server/reflector/utils/url.py @@ -0,0 +1,37 @@ +"""URL manipulation utilities.""" + +from urllib.parse import parse_qs, urlencode, urlparse, urlunparse + + 
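The storage layer above enforces exactly one auth mode per instance, either access keys (Whereby) or an IAM role ARN (Daily.co), exposed through `key_credentials` / `role_credential`. A hedged sketch of both construction paths; the bucket names come from the architecture diagram, the key and ARN values are placeholders:

```python
# Sketch: the two mutually exclusive AwsStorage auth modes. Values are placeholders.
from reflector.storage.storage_aws import AwsStorage

# Key-based (Whereby upload bucket): key_credentials works, role_credential raises.
whereby_store = AwsStorage(
    aws_bucket_name="reflector-whereby-recordings",
    aws_region="us-east-1",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
)
key_id, secret = whereby_store.key_credentials

# Role-based (Daily.co upload bucket): role_credential works, key_credentials raises.
daily_store = AwsStorage(
    aws_bucket_name="reflector-dailyco-recordings",
    aws_region="us-east-1",
    aws_role_arn="arn:aws:iam::123456789012:role/daily-recordings",
)
role_arn = daily_store.role_credential
```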
+def add_query_param(url: str, key: str, value: str) -> str: + """ + Add or update a query parameter in a URL. + + Properly handles URLs with or without existing query parameters, + preserving fragments and encoding special characters. + + Args: + url: The URL to modify + key: The query parameter name + value: The query parameter value + + Returns: + The URL with the query parameter added or updated + + Examples: + >>> add_query_param("https://example.com/room", "t", "token123") + 'https://example.com/room?t=token123' + + >>> add_query_param("https://example.com/room?existing=param", "t", "token123") + 'https://example.com/room?existing=param&t=token123' + """ + parsed = urlparse(url) + + query_params = parse_qs(parsed.query, keep_blank_values=True) + + query_params[key] = [value] + + new_query = urlencode(query_params, doseq=True) + + new_parsed = parsed._replace(query=new_query) + return urlunparse(new_parsed) diff --git a/server/reflector/video_platforms/__init__.py b/server/reflector/video_platforms/__init__.py new file mode 100644 index 00000000..dcbdc45b --- /dev/null +++ b/server/reflector/video_platforms/__init__.py @@ -0,0 +1,11 @@ +from .base import VideoPlatformClient +from .models import MeetingData, VideoPlatformConfig +from .registry import get_platform_client, register_platform + +__all__ = [ + "VideoPlatformClient", + "VideoPlatformConfig", + "MeetingData", + "get_platform_client", + "register_platform", +] diff --git a/server/reflector/video_platforms/base.py b/server/reflector/video_platforms/base.py new file mode 100644 index 00000000..d208a75a --- /dev/null +++ b/server/reflector/video_platforms/base.py @@ -0,0 +1,54 @@ +from abc import ABC, abstractmethod +from datetime import datetime +from typing import TYPE_CHECKING, Any, Dict, List, Optional + +from ..schemas.platform import Platform +from ..utils.string import NonEmptyString +from .models import MeetingData, VideoPlatformConfig + +if TYPE_CHECKING: + from reflector.db.rooms import Room + +# separator doesn't guarantee there's no more "ROOM_PREFIX_SEPARATOR" strings in room name +ROOM_PREFIX_SEPARATOR = "-" + + +class VideoPlatformClient(ABC): + PLATFORM_NAME: Platform + + def __init__(self, config: VideoPlatformConfig): + self.config = config + + @abstractmethod + async def create_meeting( + self, room_name_prefix: NonEmptyString, end_date: datetime, room: "Room" + ) -> MeetingData: + pass + + @abstractmethod + async def get_room_sessions(self, room_name: str) -> List[Any] | None: + pass + + @abstractmethod + async def delete_room(self, room_name: str) -> bool: + pass + + @abstractmethod + async def upload_logo(self, room_name: str, logo_path: str) -> bool: + pass + + @abstractmethod + def verify_webhook_signature( + self, body: bytes, signature: str, timestamp: Optional[str] = None + ) -> bool: + pass + + def format_recording_config(self, room: "Room") -> Dict[str, Any]: + if room.recording_type == "cloud" and self.config.s3_bucket: + return { + "type": room.recording_type, + "bucket": self.config.s3_bucket, + "region": self.config.s3_region, + "trigger": room.recording_trigger, + } + return {"type": room.recording_type} diff --git a/server/reflector/video_platforms/daily.py b/server/reflector/video_platforms/daily.py new file mode 100644 index 00000000..ec45d965 --- /dev/null +++ b/server/reflector/video_platforms/daily.py @@ -0,0 +1,198 @@ +import base64 +import hmac +from datetime import datetime +from hashlib import sha256 +from http import HTTPStatus +from typing import Any, Dict, List, Optional + +import 
httpx + +from reflector.db.rooms import Room +from reflector.logger import logger +from reflector.storage import get_dailyco_storage + +from ..schemas.platform import Platform +from ..utils.daily import DailyRoomName +from ..utils.string import NonEmptyString +from .base import ROOM_PREFIX_SEPARATOR, VideoPlatformClient +from .models import MeetingData, RecordingType, VideoPlatformConfig + + +class DailyClient(VideoPlatformClient): + PLATFORM_NAME: Platform = "daily" + TIMEOUT = 10 + BASE_URL = "https://api.daily.co/v1" + TIMESTAMP_FORMAT = "%Y%m%d%H%M%S" + RECORDING_NONE: RecordingType = "none" + RECORDING_CLOUD: RecordingType = "cloud" + + def __init__(self, config: VideoPlatformConfig): + super().__init__(config) + self.headers = { + "Authorization": f"Bearer {config.api_key}", + "Content-Type": "application/json", + } + + async def create_meeting( + self, room_name_prefix: NonEmptyString, end_date: datetime, room: Room + ) -> MeetingData: + """ + Daily.co rooms vs meetings: + - We create a NEW Daily.co room for each Reflector meeting + - Daily.co meeting/session starts automatically when first participant joins + - Room auto-deletes after exp time + - Meeting.room_name stores the timestamped Daily.co room name + """ + timestamp = datetime.now().strftime(self.TIMESTAMP_FORMAT) + room_name = f"{room_name_prefix}{ROOM_PREFIX_SEPARATOR}{timestamp}" + + data = { + "name": room_name, + "privacy": "private" if room.is_locked else "public", + "properties": { + "enable_recording": "raw-tracks" + if room.recording_type != self.RECORDING_NONE + else False, + "enable_chat": True, + "enable_screenshare": True, + "start_video_off": False, + "start_audio_off": False, + "exp": int(end_date.timestamp()), + }, + } + + # Get storage config for passing to Daily API + daily_storage = get_dailyco_storage() + assert daily_storage.bucket_name, "S3 bucket must be configured" + data["properties"]["recordings_bucket"] = { + "bucket_name": daily_storage.bucket_name, + "bucket_region": daily_storage.region, + "assume_role_arn": daily_storage.role_credential, + "allow_api_access": True, + } + + async with httpx.AsyncClient() as client: + response = await client.post( + f"{self.BASE_URL}/rooms", + headers=self.headers, + json=data, + timeout=self.TIMEOUT, + ) + if response.status_code >= 400: + logger.error( + "Daily.co API error", + status_code=response.status_code, + response_body=response.text, + request_data=data, + ) + response.raise_for_status() + result = response.json() + + room_url = result["url"] + + return MeetingData( + meeting_id=result["id"], + room_name=result["name"], + room_url=room_url, + host_room_url=room_url, + platform=self.PLATFORM_NAME, + extra_data=result, + ) + + async def get_room_sessions(self, room_name: str) -> List[Any] | None: + # no such api + return None + + async def get_room_presence(self, room_name: str) -> Dict[str, Any]: + async with httpx.AsyncClient() as client: + response = await client.get( + f"{self.BASE_URL}/rooms/{room_name}/presence", + headers=self.headers, + timeout=self.TIMEOUT, + ) + response.raise_for_status() + return response.json() + + async def get_meeting_participants(self, meeting_id: str) -> Dict[str, Any]: + async with httpx.AsyncClient() as client: + response = await client.get( + f"{self.BASE_URL}/meetings/{meeting_id}/participants", + headers=self.headers, + timeout=self.TIMEOUT, + ) + response.raise_for_status() + return response.json() + + async def get_recording(self, recording_id: str) -> Dict[str, Any]: + async with httpx.AsyncClient() as client: + 
response = await client.get( + f"{self.BASE_URL}/recordings/{recording_id}", + headers=self.headers, + timeout=self.TIMEOUT, + ) + response.raise_for_status() + return response.json() + + async def delete_room(self, room_name: str) -> bool: + async with httpx.AsyncClient() as client: + response = await client.delete( + f"{self.BASE_URL}/rooms/{room_name}", + headers=self.headers, + timeout=self.TIMEOUT, + ) + return response.status_code in (HTTPStatus.OK, HTTPStatus.NOT_FOUND) + + async def upload_logo(self, room_name: str, logo_path: str) -> bool: + return True + + def verify_webhook_signature( + self, body: bytes, signature: str, timestamp: Optional[str] = None + ) -> bool: + """Verify Daily.co webhook signature. + + Daily.co uses: + - X-Webhook-Signature header + - X-Webhook-Timestamp header + - Signature format: HMAC-SHA256(base64_decode(secret), timestamp + '.' + body) + - Result is base64 encoded + """ + if not signature or not timestamp: + return False + + try: + secret_bytes = base64.b64decode(self.config.webhook_secret) + + signed_content = timestamp.encode() + b"." + body + + expected = hmac.new(secret_bytes, signed_content, sha256).digest() + expected_b64 = base64.b64encode(expected).decode() + + return hmac.compare_digest(expected_b64, signature) + except Exception as e: + logger.error("Daily.co webhook signature verification failed", exc_info=e) + return False + + async def create_meeting_token( + self, + room_name: DailyRoomName, + enable_recording: bool, + user_id: Optional[str] = None, + ) -> str: + data = {"properties": {"room_name": room_name}} + + if enable_recording: + data["properties"]["start_cloud_recording"] = True + data["properties"]["enable_recording_ui"] = False + + if user_id: + data["properties"]["user_id"] = user_id + + async with httpx.AsyncClient() as client: + response = await client.post( + f"{self.BASE_URL}/meeting-tokens", + headers=self.headers, + json=data, + timeout=self.TIMEOUT, + ) + response.raise_for_status() + return response.json()["token"] diff --git a/server/reflector/video_platforms/factory.py b/server/reflector/video_platforms/factory.py new file mode 100644 index 00000000..172d45e7 --- /dev/null +++ b/server/reflector/video_platforms/factory.py @@ -0,0 +1,62 @@ +from typing import Optional + +from reflector.settings import settings +from reflector.storage import get_dailyco_storage, get_whereby_storage + +from ..schemas.platform import WHEREBY_PLATFORM, Platform +from .base import VideoPlatformClient, VideoPlatformConfig +from .registry import get_platform_client + + +def get_platform_config(platform: Platform) -> VideoPlatformConfig: + if platform == WHEREBY_PLATFORM: + if not settings.WHEREBY_API_KEY: + raise ValueError( + "WHEREBY_API_KEY is required when platform='whereby'. " + "Set WHEREBY_API_KEY environment variable." + ) + whereby_storage = get_whereby_storage() + key_id, secret = whereby_storage.key_credentials + return VideoPlatformConfig( + api_key=settings.WHEREBY_API_KEY, + webhook_secret=settings.WHEREBY_WEBHOOK_SECRET or "", + api_url=settings.WHEREBY_API_URL, + s3_bucket=whereby_storage.bucket_name, + s3_region=whereby_storage.region, + aws_access_key_id=key_id, + aws_access_key_secret=secret, + ) + elif platform == "daily": + if not settings.DAILY_API_KEY: + raise ValueError( + "DAILY_API_KEY is required when platform='daily'. " + "Set DAILY_API_KEY environment variable." + ) + if not settings.DAILY_SUBDOMAIN: + raise ValueError( + "DAILY_SUBDOMAIN is required when platform='daily'. 
" + "Set DAILY_SUBDOMAIN environment variable." + ) + daily_storage = get_dailyco_storage() + return VideoPlatformConfig( + api_key=settings.DAILY_API_KEY, + webhook_secret=settings.DAILY_WEBHOOK_SECRET or "", + subdomain=settings.DAILY_SUBDOMAIN, + s3_bucket=daily_storage.bucket_name, + s3_region=daily_storage.region, + aws_role_arn=daily_storage.role_credential, + ) + else: + raise ValueError(f"Unknown platform: {platform}") + + +def create_platform_client(platform: Platform) -> VideoPlatformClient: + config = get_platform_config(platform) + return get_platform_client(platform, config) + + +def get_platform(room_platform: Optional[Platform] = None) -> Platform: + if room_platform: + return room_platform + + return settings.DEFAULT_VIDEO_PLATFORM diff --git a/server/reflector/video_platforms/models.py b/server/reflector/video_platforms/models.py new file mode 100644 index 00000000..82876888 --- /dev/null +++ b/server/reflector/video_platforms/models.py @@ -0,0 +1,40 @@ +from typing import Any, Dict, Literal, Optional + +from pydantic import BaseModel, Field + +from reflector.schemas.platform import WHEREBY_PLATFORM, Platform + +RecordingType = Literal["none", "local", "cloud"] + + +class MeetingData(BaseModel): + platform: Platform + meeting_id: str = Field(description="Platform-specific meeting identifier") + room_url: str = Field(description="URL for participants to join") + host_room_url: str = Field(description="URL for hosts (may be same as room_url)") + room_name: str = Field(description="Human-readable room name") + extra_data: Dict[str, Any] = Field(default_factory=dict) + + class Config: + json_schema_extra = { + "example": { + "platform": WHEREBY_PLATFORM, + "meeting_id": "12345678", + "room_url": "https://subdomain.whereby.com/room-20251008120000", + "host_room_url": "https://subdomain.whereby.com/room-20251008120000?roomKey=abc123", + "room_name": "room-20251008120000", + } + } + + +class VideoPlatformConfig(BaseModel): + api_key: str + webhook_secret: str + api_url: Optional[str] = None + subdomain: Optional[str] = None # Whereby/Daily subdomain + s3_bucket: Optional[str] = None + s3_region: Optional[str] = None + # Whereby uses access keys, Daily uses IAM role + aws_access_key_id: Optional[str] = None + aws_access_key_secret: Optional[str] = None + aws_role_arn: Optional[str] = None diff --git a/server/reflector/video_platforms/registry.py b/server/reflector/video_platforms/registry.py new file mode 100644 index 00000000..b4c10697 --- /dev/null +++ b/server/reflector/video_platforms/registry.py @@ -0,0 +1,35 @@ +from typing import Dict, Type + +from ..schemas.platform import DAILY_PLATFORM, WHEREBY_PLATFORM, Platform +from .base import VideoPlatformClient, VideoPlatformConfig + +_PLATFORMS: Dict[Platform, Type[VideoPlatformClient]] = {} + + +def register_platform(name: Platform, client_class: Type[VideoPlatformClient]): + _PLATFORMS[name] = client_class + + +def get_platform_client( + platform: Platform, config: VideoPlatformConfig +) -> VideoPlatformClient: + if platform not in _PLATFORMS: + raise ValueError(f"Unknown video platform: {platform}") + + client_class = _PLATFORMS[platform] + return client_class(config) + + +def get_available_platforms() -> list[Platform]: + return list(_PLATFORMS.keys()) + + +def _register_builtin_platforms(): + from .daily import DailyClient # noqa: PLC0415 + from .whereby import WherebyClient # noqa: PLC0415 + + register_platform(WHEREBY_PLATFORM, WherebyClient) + register_platform(DAILY_PLATFORM, DailyClient) + + 
+_register_builtin_platforms() diff --git a/server/reflector/video_platforms/whereby.py b/server/reflector/video_platforms/whereby.py new file mode 100644 index 00000000..f856454a --- /dev/null +++ b/server/reflector/video_platforms/whereby.py @@ -0,0 +1,141 @@ +import hmac +import json +import re +import time +from datetime import datetime +from hashlib import sha256 +from typing import Any, Dict, Optional + +import httpx + +from reflector.db.rooms import Room +from reflector.storage import get_whereby_storage + +from ..schemas.platform import WHEREBY_PLATFORM, Platform +from ..utils.string import NonEmptyString +from .base import ( + MeetingData, + VideoPlatformClient, + VideoPlatformConfig, +) +from .whereby_utils import whereby_room_name_prefix + + +class WherebyClient(VideoPlatformClient): + PLATFORM_NAME: Platform = WHEREBY_PLATFORM + TIMEOUT = 10 # seconds + MAX_ELAPSED_TIME = 60 * 1000 # 1 minute in milliseconds + + def __init__(self, config: VideoPlatformConfig): + super().__init__(config) + self.headers = { + "Content-Type": "application/json; charset=utf-8", + "Authorization": f"Bearer {config.api_key}", + } + + async def create_meeting( + self, room_name_prefix: NonEmptyString, end_date: datetime, room: Room + ) -> MeetingData: + data = { + "isLocked": room.is_locked, + "roomNamePrefix": whereby_room_name_prefix(room_name_prefix), + "roomNamePattern": "uuid", + "roomMode": room.room_mode, + "endDate": end_date.isoformat(), + "fields": ["hostRoomUrl"], + } + + if room.recording_type == "cloud": + # Get storage config for passing credentials to Whereby API + whereby_storage = get_whereby_storage() + key_id, secret = whereby_storage.key_credentials + data["recording"] = { + "type": room.recording_type, + "destination": { + "provider": "s3", + "bucket": whereby_storage.bucket_name, + "accessKeyId": key_id, + "accessKeySecret": secret, + "fileFormat": "mp4", + }, + "startTrigger": room.recording_trigger, + } + + async with httpx.AsyncClient() as client: + response = await client.post( + f"{self.config.api_url}/meetings", + headers=self.headers, + json=data, + timeout=self.TIMEOUT, + ) + response.raise_for_status() + result = response.json() + + return MeetingData( + meeting_id=result["meetingId"], + room_name=result["roomName"], + room_url=result["roomUrl"], + host_room_url=result["hostRoomUrl"], + platform=self.PLATFORM_NAME, + extra_data=result, + ) + + async def get_room_sessions(self, room_name: str) -> Dict[str, Any]: + async with httpx.AsyncClient() as client: + response = await client.get( + f"{self.config.api_url}/insights/room-sessions?roomName={room_name}", + headers=self.headers, + timeout=self.TIMEOUT, + ) + response.raise_for_status() + return response.json().get("results", []) + + async def delete_room(self, room_name: str) -> bool: + return True + + async def upload_logo(self, room_name: str, logo_path: str) -> bool: + async with httpx.AsyncClient() as client: + with open(logo_path, "rb") as f: + response = await client.put( + f"{self.config.api_url}/rooms/{room_name}/theme/logo", + headers={ + "Authorization": f"Bearer {self.config.api_key}", + }, + timeout=self.TIMEOUT, + files={"image": f}, + ) + response.raise_for_status() + return True + + def verify_webhook_signature( + self, body: bytes, signature: str, timestamp: Optional[str] = None + ) -> bool: + if not signature: + return False + + matches = re.match(r"t=(.*),v1=(.*)", signature) + if not matches: + return False + + ts, sig = matches.groups() + + current_time = int(time.time() * 1000) + diff_time = 
current_time - int(ts) * 1000 + if diff_time >= self.MAX_ELAPSED_TIME: + return False + + body_dict = json.loads(body) + signed_payload = f"{ts}.{json.dumps(body_dict, separators=(',', ':'))}" + hmac_obj = hmac.new( + self.config.webhook_secret.encode("utf-8"), + signed_payload.encode("utf-8"), + sha256, + ) + expected_signature = hmac_obj.hexdigest() + + try: + return hmac.compare_digest( + expected_signature.encode("utf-8"), sig.encode("utf-8") + ) + except Exception: + return False diff --git a/server/reflector/video_platforms/whereby_utils.py b/server/reflector/video_platforms/whereby_utils.py new file mode 100644 index 00000000..2724a7b5 --- /dev/null +++ b/server/reflector/video_platforms/whereby_utils.py @@ -0,0 +1,38 @@ +import re +from datetime import datetime + +from reflector.utils.datetime import parse_datetime_with_timezone +from reflector.utils.string import NonEmptyString, parse_non_empty_string +from reflector.video_platforms.base import ROOM_PREFIX_SEPARATOR + + +def parse_whereby_recording_filename( + object_key: NonEmptyString, +) -> (NonEmptyString, datetime): + filename = parse_non_empty_string(object_key.rsplit(".", 1)[0]) + timestamp_pattern = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)" + match = re.search(timestamp_pattern, filename) + if not match: + raise ValueError(f"No ISO timestamp found in filename: {object_key}") + timestamp_str = match.group(1) + timestamp_start = match.start(1) + room_name_part = filename[:timestamp_start] + if room_name_part.endswith(ROOM_PREFIX_SEPARATOR): + room_name_part = room_name_part[: -len(ROOM_PREFIX_SEPARATOR)] + else: + raise ValueError( + f"room name {room_name_part} doesnt have {ROOM_PREFIX_SEPARATOR} at the end of filename: {object_key}" + ) + + return parse_non_empty_string(room_name_part), parse_datetime_with_timezone( + timestamp_str + ) + + +def whereby_room_name_prefix(room_name_prefix: NonEmptyString) -> NonEmptyString: + return room_name_prefix + ROOM_PREFIX_SEPARATOR + + +# room name comes with "/" from whereby api but lacks "/" e.g. in recording filenames +def room_name_to_whereby_api_room_name(room_name: NonEmptyString) -> NonEmptyString: + return f"/{room_name}" diff --git a/server/reflector/views/daily.py b/server/reflector/views/daily.py new file mode 100644 index 00000000..6f51cd1e --- /dev/null +++ b/server/reflector/views/daily.py @@ -0,0 +1,233 @@ +import json +from typing import Any, Dict, Literal + +from fastapi import APIRouter, HTTPException, Request +from pydantic import BaseModel + +from reflector.db.meetings import meetings_controller +from reflector.logger import logger as _logger +from reflector.settings import settings +from reflector.utils.daily import DailyRoomName +from reflector.video_platforms.factory import create_platform_client +from reflector.worker.process import process_multitrack_recording + +router = APIRouter() + +logger = _logger.bind(platform="daily") + + +class DailyTrack(BaseModel): + type: Literal["audio", "video"] + s3Key: str + size: int + + +class DailyWebhookEvent(BaseModel): + version: str + type: str + id: str + payload: Dict[str, Any] + event_ts: float + + +def _extract_room_name(event: DailyWebhookEvent) -> DailyRoomName | None: + """Extract room name from Daily event payload. + + Daily.co API inconsistency: + - participant.* events use "room" field + - recording.* events use "room_name" field + """ + return event.payload.get("room_name") or event.payload.get("room") + + +@router.post("/webhook") +async def webhook(request: Request): + """Handle Daily webhook events. 
+ + Daily.co circuit-breaker: After 3+ failed responses (4xx/5xx), webhook + state→FAILED, stops sending events. Reset: scripts/recreate_daily_webhook.py + """ + body = await request.body() + signature = request.headers.get("X-Webhook-Signature", "") + timestamp = request.headers.get("X-Webhook-Timestamp", "") + + client = create_platform_client("daily") + + # TEMPORARY: Bypass signature check for testing + # TODO: Remove this after testing is complete + BYPASS_FOR_TESTING = True + if not BYPASS_FOR_TESTING: + if not client.verify_webhook_signature(body, signature, timestamp): + logger.warning( + "Invalid webhook signature", + signature=signature, + timestamp=timestamp, + has_body=bool(body), + ) + raise HTTPException(status_code=401, detail="Invalid webhook signature") + + try: + body_json = json.loads(body) + except json.JSONDecodeError: + raise HTTPException(status_code=422, detail="Invalid JSON") + + if body_json.get("test") == "test": + logger.info("Received Daily webhook test event") + return {"status": "ok"} + + # Parse as actual event + try: + event = DailyWebhookEvent(**body_json) + except Exception as e: + logger.error("Failed to parse webhook event", error=str(e), body=body.decode()) + raise HTTPException(status_code=422, detail="Invalid event format") + + # Handle participant events + if event.type == "participant.joined": + await _handle_participant_joined(event) + elif event.type == "participant.left": + await _handle_participant_left(event) + elif event.type == "recording.started": + await _handle_recording_started(event) + elif event.type == "recording.ready-to-download": + await _handle_recording_ready(event) + elif event.type == "recording.error": + await _handle_recording_error(event) + else: + logger.warning( + "Unhandled Daily webhook event type", + event_type=event.type, + payload=event.payload, + ) + + return {"status": "ok"} + + +async def _handle_participant_joined(event: DailyWebhookEvent): + daily_room_name = _extract_room_name(event) + if not daily_room_name: + logger.warning("participant.joined: no room in payload", payload=event.payload) + return + + meeting = await meetings_controller.get_by_room_name(daily_room_name) + if meeting: + await meetings_controller.increment_num_clients(meeting.id) + logger.info( + "Participant joined", + meeting_id=meeting.id, + room_name=daily_room_name, + recording_type=meeting.recording_type, + recording_trigger=meeting.recording_trigger, + ) + else: + logger.warning( + "participant.joined: meeting not found", room_name=daily_room_name + ) + + +async def _handle_participant_left(event: DailyWebhookEvent): + room_name = _extract_room_name(event) + if not room_name: + return + + meeting = await meetings_controller.get_by_room_name(room_name) + if meeting: + await meetings_controller.decrement_num_clients(meeting.id) + + +async def _handle_recording_started(event: DailyWebhookEvent): + room_name = _extract_room_name(event) + if not room_name: + logger.warning( + "recording.started: no room_name in payload", payload=event.payload + ) + return + + meeting = await meetings_controller.get_by_room_name(room_name) + if meeting: + logger.info( + "Recording started", + meeting_id=meeting.id, + room_name=room_name, + recording_id=event.payload.get("recording_id"), + platform="daily", + ) + else: + logger.warning("recording.started: meeting not found", room_name=room_name) + + +async def _handle_recording_ready(event: DailyWebhookEvent): + """Handle recording ready for download event. 
+ + Daily.co webhook payload for raw-tracks recordings: + { + "recording_id": "...", + "room_name": "test2-20251009192341", + "tracks": [ + {"type": "audio", "s3Key": "monadical/test2-.../uuid-cam-audio-123.webm", "size": 400000}, + {"type": "video", "s3Key": "monadical/test2-.../uuid-cam-video-456.webm", "size": 30000000} + ] + } + """ + room_name = _extract_room_name(event) + recording_id = event.payload.get("recording_id") + tracks_raw = event.payload.get("tracks", []) + + if not room_name or not tracks_raw: + logger.warning( + "recording.ready-to-download: missing room_name or tracks", + room_name=room_name, + has_tracks=bool(tracks_raw), + payload=event.payload, + ) + return + + try: + tracks = [DailyTrack(**t) for t in tracks_raw] + except Exception as e: + logger.error( + "recording.ready-to-download: invalid tracks structure", + error=str(e), + tracks=tracks_raw, + ) + return + + logger.info( + "Recording ready for download", + room_name=room_name, + recording_id=recording_id, + num_tracks=len(tracks), + platform="daily", + ) + + bucket_name = settings.DAILYCO_STORAGE_AWS_BUCKET_NAME + if not bucket_name: + logger.error( + "DAILYCO_STORAGE_AWS_BUCKET_NAME not configured; cannot process Daily recording" + ) + return + + track_keys = [t.s3Key for t in tracks if t.type == "audio"] + + process_multitrack_recording.delay( + bucket_name=bucket_name, + daily_room_name=room_name, + recording_id=recording_id, + track_keys=track_keys, + ) + + +async def _handle_recording_error(event: DailyWebhookEvent): + room_name = _extract_room_name(event) + error = event.payload.get("error", "Unknown error") + + if room_name: + meeting = await meetings_controller.get_by_room_name(room_name) + if meeting: + logger.error( + "Recording error", + meeting_id=meeting.id, + room_name=room_name, + error=error, + platform="daily", + ) diff --git a/server/reflector/views/rooms.py b/server/reflector/views/rooms.py index 70e3f9e4..e786b0d9 100644 --- a/server/reflector/views/rooms.py +++ b/server/reflector/views/rooms.py @@ -15,9 +15,14 @@ from reflector.db.calendar_events import calendar_events_controller from reflector.db.meetings import meetings_controller from reflector.db.rooms import rooms_controller from reflector.redis_cache import RedisAsyncLock +from reflector.schemas.platform import Platform from reflector.services.ics_sync import ics_sync_service from reflector.settings import settings -from reflector.whereby import create_meeting, upload_logo +from reflector.utils.url import add_query_param +from reflector.video_platforms.factory import ( + create_platform_client, + get_platform, +) from reflector.worker.webhook import test_webhook logger = logging.getLogger(__name__) @@ -41,6 +46,7 @@ class Room(BaseModel): ics_enabled: bool = False ics_last_sync: Optional[datetime] = None ics_last_etag: Optional[str] = None + platform: Platform class RoomDetails(Room): @@ -68,6 +74,7 @@ class Meeting(BaseModel): is_active: bool = True calendar_event_id: str | None = None calendar_metadata: dict[str, Any] | None = None + platform: Platform class CreateRoom(BaseModel): @@ -85,6 +92,7 @@ class CreateRoom(BaseModel): ics_url: Optional[str] = None ics_fetch_interval: int = 300 ics_enabled: bool = False + platform: Optional[Platform] = None class UpdateRoom(BaseModel): @@ -102,6 +110,7 @@ class UpdateRoom(BaseModel): ics_url: Optional[str] = None ics_fetch_interval: Optional[int] = None ics_enabled: Optional[bool] = None + platform: Optional[Platform] = None class CreateRoomMeeting(BaseModel): @@ -165,14 +174,6 @@ class 
CalendarEventResponse(BaseModel): router = APIRouter() -def parse_datetime_with_timezone(iso_string: str) -> datetime: - """Parse ISO datetime string and ensure timezone awareness (defaults to UTC if naive).""" - dt = datetime.fromisoformat(iso_string) - if dt.tzinfo is None: - dt = dt.replace(tzinfo=timezone.utc) - return dt - - @router.get("/rooms", response_model=Page[RoomDetails]) async def rooms_list( user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)], @@ -182,13 +183,18 @@ async def rooms_list( user_id = user["sub"] if user else None - return await apaginate( + paginated = await apaginate( get_database(), await rooms_controller.get_all( user_id=user_id, order_by="-created_at", return_query=True ), ) + for room in paginated.items: + room.platform = get_platform(room.platform) + + return paginated + @router.get("/rooms/{room_id}", response_model=RoomDetails) async def rooms_get( @@ -201,6 +207,7 @@ async def rooms_get( raise HTTPException(status_code=404, detail="Room not found") if not room.is_shared and (user_id is None or room.user_id != user_id): raise HTTPException(status_code=403, detail="Room access denied") + room.platform = get_platform(room.platform) return room @@ -214,17 +221,16 @@ async def rooms_get_by_name( if not room: raise HTTPException(status_code=404, detail="Room not found") - # Convert to RoomDetails format (add webhook fields if user is owner) room_dict = room.__dict__.copy() if user_id == room.user_id: - # User is owner, include webhook details if available room_dict["webhook_url"] = getattr(room, "webhook_url", None) room_dict["webhook_secret"] = getattr(room, "webhook_secret", None) else: - # Non-owner, hide webhook details room_dict["webhook_url"] = None room_dict["webhook_secret"] = None + room_dict["platform"] = get_platform(room.platform) + return RoomDetails(**room_dict) @@ -251,6 +257,7 @@ async def rooms_create( ics_url=room.ics_url, ics_fetch_interval=room.ics_fetch_interval, ics_enabled=room.ics_enabled, + platform=room.platform, ) @@ -268,6 +275,7 @@ async def rooms_update( raise HTTPException(status_code=403, detail="Not authorized") values = info.dict(exclude_unset=True) await rooms_controller.update(room, values) + room.platform = get_platform(room.platform) return room @@ -315,19 +323,22 @@ async def rooms_create_meeting( if meeting is None: end_date = current_time + timedelta(hours=8) - whereby_meeting = await create_meeting("", end_date=end_date, room=room) + platform = get_platform(room.platform) + client = create_platform_client(platform) - await upload_logo(whereby_meeting["roomName"], "./images/logo.png") + meeting_data = await client.create_meeting( + room.name, end_date=end_date, room=room + ) + + await client.upload_logo(meeting_data.room_name, "./images/logo.png") meeting = await meetings_controller.create( - id=whereby_meeting["meetingId"], - room_name=whereby_meeting["roomName"], - room_url=whereby_meeting["roomUrl"], - host_room_url=whereby_meeting["hostRoomUrl"], - start_date=parse_datetime_with_timezone( - whereby_meeting["startDate"] - ), - end_date=parse_datetime_with_timezone(whereby_meeting["endDate"]), + id=meeting_data.meeting_id, + room_name=meeting_data.room_name, + room_url=meeting_data.room_url, + host_room_url=meeting_data.host_room_url, + start_date=current_time, + end_date=end_date, room=room, ) except LockError: @@ -336,6 +347,18 @@ async def rooms_create_meeting( status_code=503, detail="Meeting creation in progress, please try again" ) + if meeting.platform == "daily" and 
room.recording_trigger != "none": + client = create_platform_client(meeting.platform) + token = await client.create_meeting_token( + meeting.room_name, + enable_recording=True, + user_id=user_id, + ) + meeting = meeting.model_copy() + meeting.room_url = add_query_param(meeting.room_url, "t", token) + if meeting.host_room_url: + meeting.host_room_url = add_query_param(meeting.host_room_url, "t", token) + if user_id != room.user_id: meeting.host_room_url = "" @@ -490,7 +513,10 @@ async def rooms_list_active_meetings( room=room, current_time=current_time ) - # Hide host URLs from non-owners + effective_platform = get_platform(room.platform) + for meeting in meetings: + meeting.platform = effective_platform + if user_id != room.user_id: for meeting in meetings: meeting.host_room_url = "" @@ -511,15 +537,10 @@ async def rooms_get_meeting( if not room: raise HTTPException(status_code=404, detail="Room not found") - meeting = await meetings_controller.get_by_id(meeting_id) + meeting = await meetings_controller.get_by_id(meeting_id, room=room) if not meeting: raise HTTPException(status_code=404, detail="Meeting not found") - if meeting.room_id != room.id: - raise HTTPException( - status_code=403, detail="Meeting does not belong to this room" - ) - if user_id != room.user_id and not room.is_shared: meeting.host_room_url = "" @@ -538,16 +559,11 @@ async def rooms_join_meeting( if not room: raise HTTPException(status_code=404, detail="Room not found") - meeting = await meetings_controller.get_by_id(meeting_id) + meeting = await meetings_controller.get_by_id(meeting_id, room=room) if not meeting: raise HTTPException(status_code=404, detail="Meeting not found") - if meeting.room_id != room.id: - raise HTTPException( - status_code=403, detail="Meeting does not belong to this room" - ) - if not meeting.is_active: raise HTTPException(status_code=400, detail="Meeting is not active") @@ -555,7 +571,6 @@ async def rooms_join_meeting( if meeting.end_date <= current_time: raise HTTPException(status_code=400, detail="Meeting has ended") - # Hide host URL from non-owners if user_id != room.user_id: meeting.host_room_url = "" diff --git a/server/reflector/views/transcripts_process.py b/server/reflector/views/transcripts_process.py index f9295765..46e070fd 100644 --- a/server/reflector/views/transcripts_process.py +++ b/server/reflector/views/transcripts_process.py @@ -5,8 +5,12 @@ from fastapi import APIRouter, Depends, HTTPException from pydantic import BaseModel import reflector.auth as auth +from reflector.db.recordings import recordings_controller from reflector.db.transcripts import transcripts_controller from reflector.pipelines.main_file_pipeline import task_pipeline_file_process +from reflector.pipelines.main_multitrack_pipeline import ( + task_pipeline_multitrack_process, +) router = APIRouter() @@ -33,14 +37,35 @@ async def transcript_process( status_code=400, detail="Recording is not ready for processing" ) + # avoid duplicate scheduling for either pipeline if task_is_scheduled_or_active( "reflector.pipelines.main_file_pipeline.task_pipeline_file_process", transcript_id=transcript_id, + ) or task_is_scheduled_or_active( + "reflector.pipelines.main_multitrack_pipeline.task_pipeline_multitrack_process", + transcript_id=transcript_id, ): return ProcessStatus(status="already running") - # schedule a background task process the file - task_pipeline_file_process.delay(transcript_id=transcript_id) + # Determine processing mode strictly from DB to avoid S3 scans + bucket_name = None + track_keys: list[str] = 
[] + + if transcript.recording_id: + recording = await recordings_controller.get_by_id(transcript.recording_id) + if recording: + bucket_name = recording.bucket_name + track_keys = list(getattr(recording, "track_keys", []) or []) + + if bucket_name: + task_pipeline_multitrack_process.delay( + transcript_id=transcript_id, + bucket_name=bucket_name, + track_keys=track_keys, + ) + else: + # Default single-file pipeline + task_pipeline_file_process.delay(transcript_id=transcript_id) return ProcessStatus(status="ok") diff --git a/server/reflector/whereby.py b/server/reflector/whereby.py deleted file mode 100644 index 8b5c18fd..00000000 --- a/server/reflector/whereby.py +++ /dev/null @@ -1,114 +0,0 @@ -import logging -from datetime import datetime - -import httpx - -from reflector.db.rooms import Room -from reflector.settings import settings -from reflector.utils.string import parse_non_empty_string - -logger = logging.getLogger(__name__) - - -def _get_headers(): - api_key = parse_non_empty_string( - settings.WHEREBY_API_KEY, "WHEREBY_API_KEY value is required." - ) - return { - "Content-Type": "application/json; charset=utf-8", - "Authorization": f"Bearer {api_key}", - } - - -TIMEOUT = 10 # seconds - - -def _get_whereby_s3_auth(): - errors = [] - try: - bucket_name = parse_non_empty_string( - settings.RECORDING_STORAGE_AWS_BUCKET_NAME, - "RECORDING_STORAGE_AWS_BUCKET_NAME value is required.", - ) - except Exception as e: - errors.append(e) - try: - key_id = parse_non_empty_string( - settings.AWS_WHEREBY_ACCESS_KEY_ID, - "AWS_WHEREBY_ACCESS_KEY_ID value is required.", - ) - except Exception as e: - errors.append(e) - try: - key_secret = parse_non_empty_string( - settings.AWS_WHEREBY_ACCESS_KEY_SECRET, - "AWS_WHEREBY_ACCESS_KEY_SECRET value is required.", - ) - except Exception as e: - errors.append(e) - if len(errors) > 0: - raise Exception( - f"Failed to get Whereby auth settings: {', '.join(str(e) for e in errors)}" - ) - return bucket_name, key_id, key_secret - - -async def create_meeting(room_name_prefix: str, end_date: datetime, room: Room): - s3_bucket_name, s3_key_id, s3_key_secret = _get_whereby_s3_auth() - data = { - "isLocked": room.is_locked, - "roomNamePrefix": room_name_prefix, - "roomNamePattern": "uuid", - "roomMode": room.room_mode, - "endDate": end_date.isoformat(), - "recording": { - "type": room.recording_type, - "destination": { - "provider": "s3", - "bucket": s3_bucket_name, - "accessKeyId": s3_key_id, - "accessKeySecret": s3_key_secret, - "fileFormat": "mp4", - }, - "startTrigger": room.recording_trigger, - }, - "fields": ["hostRoomUrl"], - } - async with httpx.AsyncClient() as client: - response = await client.post( - f"{settings.WHEREBY_API_URL}/meetings", - headers=_get_headers(), - json=data, - timeout=TIMEOUT, - ) - if response.status_code == 403: - logger.warning( - f"Failed to create meeting: access denied on Whereby: {response.text}" - ) - response.raise_for_status() - return response.json() - - -async def get_room_sessions(room_name: str): - async with httpx.AsyncClient() as client: - response = await client.get( - f"{settings.WHEREBY_API_URL}/insights/room-sessions?roomName={room_name}", - headers=_get_headers(), - timeout=TIMEOUT, - ) - response.raise_for_status() - return response.json() - - -async def upload_logo(room_name: str, logo_path: str): - async with httpx.AsyncClient() as client: - with open(logo_path, "rb") as f: - response = await client.put( - f"{settings.WHEREBY_API_URL}/rooms{room_name}/theme/logo", - headers={ - "Authorization": f"Bearer 
{settings.WHEREBY_API_KEY}", - }, - timeout=TIMEOUT, - files={"image": f}, - ) - response.raise_for_status() diff --git a/server/reflector/worker/cleanup.py b/server/reflector/worker/cleanup.py index 66d45e94..43559e64 100644 --- a/server/reflector/worker/cleanup.py +++ b/server/reflector/worker/cleanup.py @@ -19,7 +19,7 @@ from reflector.db.meetings import meetings from reflector.db.recordings import recordings from reflector.db.transcripts import transcripts, transcripts_controller from reflector.settings import settings -from reflector.storage import get_recordings_storage +from reflector.storage import get_transcripts_storage logger = structlog.get_logger(__name__) @@ -53,8 +53,8 @@ async def delete_single_transcript( ) if recording: try: - await get_recordings_storage().delete_file( - recording["object_key"] + await get_transcripts_storage().delete_file( + recording["object_key"], bucket=recording["bucket_name"] ) except Exception as storage_error: logger.warning( diff --git a/server/reflector/worker/ics_sync.py b/server/reflector/worker/ics_sync.py index faf62f4a..4d72d4ae 100644 --- a/server/reflector/worker/ics_sync.py +++ b/server/reflector/worker/ics_sync.py @@ -7,10 +7,10 @@ from celery.utils.log import get_task_logger from reflector.asynctask import asynctask from reflector.db.calendar_events import calendar_events_controller from reflector.db.meetings import meetings_controller -from reflector.db.rooms import rooms_controller +from reflector.db.rooms import Room, rooms_controller from reflector.redis_cache import RedisAsyncLock from reflector.services.ics_sync import SyncStatus, ics_sync_service -from reflector.whereby import create_meeting, upload_logo +from reflector.video_platforms.factory import create_platform_client, get_platform logger = structlog.wrap_logger(get_task_logger(__name__)) @@ -86,17 +86,17 @@ def _should_sync(room) -> bool: MEETING_DEFAULT_DURATION = timedelta(hours=1) -async def create_upcoming_meetings_for_event(event, create_window, room_id, room): +async def create_upcoming_meetings_for_event(event, create_window, room: Room): if event.start_time <= create_window: return - existing_meeting = await meetings_controller.get_by_calendar_event(event.id) + existing_meeting = await meetings_controller.get_by_calendar_event(event.id, room) if existing_meeting: return logger.info( "Pre-creating meeting for calendar event", - room_id=room_id, + room_id=room.id, event_id=event.id, event_title=event.title, ) @@ -104,20 +104,22 @@ async def create_upcoming_meetings_for_event(event, create_window, room_id, room try: end_date = event.end_time or (event.start_time + MEETING_DEFAULT_DURATION) - whereby_meeting = await create_meeting( + client = create_platform_client(get_platform(room.platform)) + + meeting_data = await client.create_meeting( "", end_date=end_date, room=room, ) - await upload_logo(whereby_meeting["roomName"], "./images/logo.png") + await client.upload_logo(meeting_data.room_name, "./images/logo.png") meeting = await meetings_controller.create( - id=whereby_meeting["meetingId"], - room_name=whereby_meeting["roomName"], - room_url=whereby_meeting["roomUrl"], - host_room_url=whereby_meeting["hostRoomUrl"], - start_date=datetime.fromisoformat(whereby_meeting["startDate"]), - end_date=datetime.fromisoformat(whereby_meeting["endDate"]), + id=meeting_data.meeting_id, + room_name=meeting_data.room_name, + room_url=meeting_data.room_url, + host_room_url=meeting_data.host_room_url, + start_date=event.start_time, + end_date=end_date, room=room, 
calendar_event_id=event.id, calendar_metadata={ @@ -136,7 +138,7 @@ async def create_upcoming_meetings_for_event(event, create_window, room_id, room except Exception as e: logger.error( "Failed to pre-create meeting", - room_id=room_id, + room_id=room.id, event_id=event.id, error=str(e), ) @@ -166,9 +168,7 @@ async def create_upcoming_meetings(): ) for event in events: - await create_upcoming_meetings_for_event( - event, create_window, room.id, room - ) + await create_upcoming_meetings_for_event(event, create_window, room) logger.info("Completed pre-creation check for upcoming meetings") except Exception as e: diff --git a/server/reflector/worker/process.py b/server/reflector/worker/process.py index e660e840..47cbb1cb 100644 --- a/server/reflector/worker/process.py +++ b/server/reflector/worker/process.py @@ -1,5 +1,6 @@ import json import os +import re from datetime import datetime, timezone from urllib.parse import unquote @@ -14,24 +15,32 @@ from redis.exceptions import LockError from reflector.db.meetings import meetings_controller from reflector.db.recordings import Recording, recordings_controller from reflector.db.rooms import rooms_controller -from reflector.db.transcripts import SourceKind, transcripts_controller +from reflector.db.transcripts import ( + SourceKind, + TranscriptParticipant, + transcripts_controller, +) from reflector.pipelines.main_file_pipeline import task_pipeline_file_process from reflector.pipelines.main_live_pipeline import asynctask +from reflector.pipelines.main_multitrack_pipeline import ( + task_pipeline_multitrack_process, +) +from reflector.pipelines.topic_processing import EmptyPipeline +from reflector.processors import AudioFileWriterProcessor +from reflector.processors.audio_waveform_processor import AudioWaveformProcessor from reflector.redis_cache import get_redis_client from reflector.settings import settings -from reflector.whereby import get_room_sessions +from reflector.storage import get_transcripts_storage +from reflector.utils.daily import DailyRoomName, extract_base_room_name +from reflector.video_platforms.factory import create_platform_client +from reflector.video_platforms.whereby_utils import ( + parse_whereby_recording_filename, + room_name_to_whereby_api_room_name, +) logger = structlog.wrap_logger(get_task_logger(__name__)) -def parse_datetime_with_timezone(iso_string: str) -> datetime: - """Parse ISO datetime string and ensure timezone awareness (defaults to UTC if naive).""" - dt = datetime.fromisoformat(iso_string) - if dt.tzinfo is None: - dt = dt.replace(tzinfo=timezone.utc) - return dt - - @shared_task def process_messages(): queue_url = settings.AWS_PROCESS_RECORDING_QUEUE_URL @@ -73,14 +82,16 @@ def process_messages(): logger.error("process_messages", error=str(e)) +# only whereby supported. 
@shared_task @asynctask async def process_recording(bucket_name: str, object_key: str): logger.info("Processing recording: %s/%s", bucket_name, object_key) - # extract a guid and a datetime from the object key - room_name = f"/{object_key[:36]}" - recorded_at = parse_datetime_with_timezone(object_key[37:57]) + room_name_part, recorded_at = parse_whereby_recording_filename(object_key) + + # we store whereby api room names, NOT whereby room names + room_name = room_name_to_whereby_api_room_name(room_name_part) meeting = await meetings_controller.get_by_room_name(room_name) room = await rooms_controller.get_by_id(meeting.room_id) @@ -102,6 +113,7 @@ async def process_recording(bucket_name: str, object_key: str): transcript, { "topics": [], + "participants": [], }, ) else: @@ -121,15 +133,15 @@ async def process_recording(bucket_name: str, object_key: str): upload_filename = transcript.data_path / f"upload{extension}" upload_filename.parent.mkdir(parents=True, exist_ok=True) - s3 = boto3.client( - "s3", - region_name=settings.TRANSCRIPT_STORAGE_AWS_REGION, - aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID, - aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY, - ) + storage = get_transcripts_storage() - with open(upload_filename, "wb") as f: - s3.download_fileobj(bucket_name, object_key, f) + try: + with open(upload_filename, "wb") as f: + await storage.stream_to_fileobj(object_key, f, bucket=bucket_name) + except Exception: + # Clean up partial file on stream failure + upload_filename.unlink(missing_ok=True) + raise container = av.open(upload_filename.as_posix()) try: @@ -146,6 +158,165 @@ async def process_recording(bucket_name: str, object_key: str): task_pipeline_file_process.delay(transcript_id=transcript.id) +@shared_task +@asynctask +async def process_multitrack_recording( + bucket_name: str, + daily_room_name: DailyRoomName, + recording_id: str, + track_keys: list[str], +): + logger.info( + "Processing multitrack recording", + bucket=bucket_name, + room_name=daily_room_name, + recording_id=recording_id, + provided_keys=len(track_keys), + ) + + if not track_keys: + logger.warning("No audio track keys provided") + return + + tz = timezone.utc + recorded_at = datetime.now(tz) + try: + if track_keys: + folder = os.path.basename(os.path.dirname(track_keys[0])) + ts_match = re.search(r"(\d{14})$", folder) + if ts_match: + ts = ts_match.group(1) + recorded_at = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=tz) + except Exception as e: + logger.warning( + f"Could not parse recorded_at from keys, using now() {recorded_at}", + e, + exc_info=True, + ) + + meeting = await meetings_controller.get_by_room_name(daily_room_name) + + room_name_base = extract_base_room_name(daily_room_name) + + room = await rooms_controller.get_by_name(room_name_base) + if not room: + raise Exception(f"Room not found: {room_name_base}") + + if not meeting: + raise Exception(f"Meeting not found: {room_name_base}") + + logger.info( + "Found existing Meeting for recording", + meeting_id=meeting.id, + room_name=daily_room_name, + recording_id=recording_id, + ) + + recording = await recordings_controller.get_by_id(recording_id) + if not recording: + object_key_dir = os.path.dirname(track_keys[0]) if track_keys else "" + recording = await recordings_controller.create( + Recording( + id=recording_id, + bucket_name=bucket_name, + object_key=object_key_dir, + recorded_at=recorded_at, + meeting_id=meeting.id, + track_keys=track_keys, + ) + ) + else: + # Recording already exists; assume 
metadata was set at creation time + pass + + transcript = await transcripts_controller.get_by_recording_id(recording.id) + if transcript: + await transcripts_controller.update( + transcript, + { + "topics": [], + "participants": [], + }, + ) + else: + transcript = await transcripts_controller.add( + "", + source_kind=SourceKind.ROOM, + source_language="en", + target_language="en", + user_id=room.user_id, + recording_id=recording.id, + share_mode="public", + meeting_id=meeting.id, + room_id=room.id, + ) + + try: + daily_client = create_platform_client("daily") + + id_to_name = {} + id_to_user_id = {} + + mtg_session_id = None + try: + rec_details = await daily_client.get_recording(recording_id) + mtg_session_id = rec_details.get("mtgSessionId") + except Exception as e: + logger.warning( + "Failed to fetch Daily recording details", + error=str(e), + recording_id=recording_id, + exc_info=True, + ) + + if mtg_session_id: + try: + payload = await daily_client.get_meeting_participants(mtg_session_id) + for p in payload.get("data", []): + pid = p.get("participant_id") + name = p.get("user_name") + user_id = p.get("user_id") + if pid and name: + id_to_name[pid] = name + if pid and user_id: + id_to_user_id[pid] = user_id + except Exception as e: + logger.warning( + "Failed to fetch Daily meeting participants", + error=str(e), + mtg_session_id=mtg_session_id, + exc_info=True, + ) + else: + logger.warning( + "No mtgSessionId found for recording; participant names may be generic", + recording_id=recording_id, + ) + + for idx, key in enumerate(track_keys): + base = os.path.basename(key) + m = re.search(r"\d{13,}-([0-9a-fA-F-]{36})-cam-audio-", base) + participant_id = m.group(1) if m else None + + default_name = f"Speaker {idx}" + name = id_to_name.get(participant_id, default_name) + user_id = id_to_user_id.get(participant_id) + + participant = TranscriptParticipant( + id=participant_id, speaker=idx, name=name, user_id=user_id + ) + await transcripts_controller.upsert_participant(transcript, participant) + + except Exception as e: + logger.warning("Failed to map participant names", error=str(e), exc_info=True) + + task_pipeline_multitrack_process.delay( + transcript_id=transcript.id, + bucket_name=bucket_name, + track_keys=track_keys, + ) + + @shared_task @asynctask async def process_meetings(): @@ -164,7 +335,7 @@ async def process_meetings(): Uses distributed locking to prevent race conditions when multiple workers process the same meeting simultaneously. 
""" - logger.info("Processing meetings") + logger.debug("Processing meetings") meetings = await meetings_controller.get_all_active() current_time = datetime.now(timezone.utc) redis_client = get_redis_client() @@ -189,7 +360,8 @@ async def process_meetings(): end_date = end_date.replace(tzinfo=timezone.utc) # This API call could be slow, extend lock if needed - response = await get_room_sessions(meeting.room_name) + client = create_platform_client(meeting.platform) + room_sessions = await client.get_room_sessions(meeting.room_name) try: # Extend lock after slow operation to ensure we still hold it @@ -198,7 +370,6 @@ async def process_meetings(): logger_.warning("Lost lock for meeting, skipping") continue - room_sessions = response.get("results", []) has_active_sessions = room_sessions and any( rs["endedAt"] is None for rs in room_sessions ) @@ -231,69 +402,120 @@ async def process_meetings(): except LockError: pass # Lock already released or expired - logger.info( + logger.debug( "Processed meetings finished", processed_count=processed_count, skipped_count=skipped_count, ) +async def convert_audio_and_waveform(transcript) -> None: + """Convert WebM to MP3 and generate waveform for Daily.co recordings. + + This bypasses the full file pipeline which would overwrite stub data. + """ + try: + logger.info( + "Converting audio to MP3 and generating waveform", + transcript_id=transcript.id, + ) + + upload_path = transcript.data_path / "upload.webm" + mp3_path = transcript.audio_mp3_filename + + # Convert WebM to MP3 + mp3_writer = AudioFileWriterProcessor(path=mp3_path) + + container = av.open(str(upload_path)) + for frame in container.decode(audio=0): + await mp3_writer.push(frame) + await mp3_writer.flush() + container.close() + + logger.info( + "Converted WebM to MP3", + transcript_id=transcript.id, + mp3_size=mp3_path.stat().st_size, + ) + + waveform_processor = AudioWaveformProcessor( + audio_path=mp3_path, + waveform_path=transcript.audio_waveform_filename, + ) + waveform_processor.set_pipeline(EmptyPipeline(logger)) + await waveform_processor.flush() + + logger.info( + "Generated waveform", + transcript_id=transcript.id, + waveform_path=transcript.audio_waveform_filename, + ) + + # Update transcript status to ended (successful) + await transcripts_controller.update(transcript, {"status": "ended"}) + + except Exception as e: + logger.error( + "Failed to convert audio or generate waveform", + transcript_id=transcript.id, + error=str(e), + ) + # Keep status as uploaded even if conversion fails + pass + + @shared_task @asynctask async def reprocess_failed_recordings(): """ - Find recordings in the S3 bucket and check if they have proper transcriptions. + Find recordings in Whereby S3 bucket and check if they have proper transcriptions. If not, requeue them for processing. - """ - logger.info("Checking for recordings that need processing or reprocessing") - s3 = boto3.client( - "s3", - region_name=settings.TRANSCRIPT_STORAGE_AWS_REGION, - aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID, - aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY, - ) + Note: Daily.co recordings are processed via webhooks, not this cron job. + """ + logger.info("Checking Whereby recordings that need processing or reprocessing") + + if not settings.WHEREBY_STORAGE_AWS_BUCKET_NAME: + raise ValueError( + "WHEREBY_STORAGE_AWS_BUCKET_NAME required for Whereby recording reprocessing. " + "Set WHEREBY_STORAGE_AWS_BUCKET_NAME environment variable." 
+ ) + + storage = get_transcripts_storage() + bucket_name = settings.WHEREBY_STORAGE_AWS_BUCKET_NAME reprocessed_count = 0 try: - paginator = s3.get_paginator("list_objects_v2") - bucket_name = settings.RECORDING_STORAGE_AWS_BUCKET_NAME - pages = paginator.paginate(Bucket=bucket_name) + object_keys = await storage.list_objects(prefix="", bucket=bucket_name) - for page in pages: - if "Contents" not in page: + for object_key in object_keys: + if not object_key.endswith(".mp4"): continue - for obj in page["Contents"]: - object_key = obj["Key"] + recording = await recordings_controller.get_by_object_key( + bucket_name, object_key + ) + if not recording: + logger.info(f"Queueing recording for processing: {object_key}") + process_recording.delay(bucket_name, object_key) + reprocessed_count += 1 + continue - if not (object_key.endswith(".mp4")): - continue - - recording = await recordings_controller.get_by_object_key( - bucket_name, object_key + transcript = None + try: + transcript = await transcripts_controller.get_by_recording_id( + recording.id + ) + except ValidationError: + await transcripts_controller.remove_by_recording_id(recording.id) + logger.warning( + f"Removed invalid transcript for recording: {recording.id}" ) - if not recording: - logger.info(f"Queueing recording for processing: {object_key}") - process_recording.delay(bucket_name, object_key) - reprocessed_count += 1 - continue - transcript = None - try: - transcript = await transcripts_controller.get_by_recording_id( - recording.id - ) - except ValidationError: - await transcripts_controller.remove_by_recording_id(recording.id) - logger.warning( - f"Removed invalid transcript for recording: {recording.id}" - ) - - if transcript is None or transcript.status == "error": - logger.info(f"Queueing recording for processing: {object_key}") - process_recording.delay(bucket_name, object_key) - reprocessed_count += 1 + if transcript is None or transcript.status == "error": + logger.info(f"Queueing recording for processing: {object_key}") + process_recording.delay(bucket_name, object_key) + reprocessed_count += 1 except Exception as e: logger.error(f"Error checking S3 bucket: {str(e)}") diff --git a/server/scripts/recreate_daily_webhook.py b/server/scripts/recreate_daily_webhook.py new file mode 100644 index 00000000..a378baf2 --- /dev/null +++ b/server/scripts/recreate_daily_webhook.py @@ -0,0 +1,123 @@ +#!/usr/bin/env python3 + +import asyncio +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent.parent)) + +import httpx + +from reflector.settings import settings + + +async def setup_webhook(webhook_url: str): + """ + Create or update Daily.co webhook for this environment. + Uses DAILY_WEBHOOK_UUID to identify existing webhook. 
+ """ + if not settings.DAILY_API_KEY: + print("Error: DAILY_API_KEY not set") + return 1 + + headers = { + "Authorization": f"Bearer {settings.DAILY_API_KEY}", + "Content-Type": "application/json", + } + + webhook_data = { + "url": webhook_url, + "eventTypes": [ + "participant.joined", + "participant.left", + "recording.started", + "recording.ready-to-download", + "recording.error", + ], + "hmac": settings.DAILY_WEBHOOK_SECRET, + } + + async with httpx.AsyncClient() as client: + webhook_uuid = settings.DAILY_WEBHOOK_UUID + + if webhook_uuid: + # Update existing webhook + print(f"Updating existing webhook {webhook_uuid}...") + try: + resp = await client.patch( + f"https://api.daily.co/v1/webhooks/{webhook_uuid}", + headers=headers, + json=webhook_data, + ) + resp.raise_for_status() + result = resp.json() + print(f"✓ Updated webhook {result['uuid']} (state: {result['state']})") + print(f" URL: {result['url']}") + return 0 + except httpx.HTTPStatusError as e: + if e.response.status_code == 404: + print(f"Webhook {webhook_uuid} not found, creating new one...") + webhook_uuid = None # Fall through to creation + else: + print(f"Error updating webhook: {e}") + return 1 + + if not webhook_uuid: + # Create new webhook + print("Creating new webhook...") + resp = await client.post( + "https://api.daily.co/v1/webhooks", headers=headers, json=webhook_data + ) + resp.raise_for_status() + result = resp.json() + webhook_uuid = result["uuid"] + + print(f"✓ Created webhook {webhook_uuid} (state: {result['state']})") + print(f" URL: {result['url']}") + print() + print("=" * 60) + print("IMPORTANT: Add this to your environment variables:") + print("=" * 60) + print(f"DAILY_WEBHOOK_UUID: {webhook_uuid}") + print("=" * 60) + print() + + # Try to write UUID to .env file + env_file = Path(__file__).parent.parent / ".env" + if env_file.exists(): + lines = env_file.read_text().splitlines() + updated = False + + # Update existing DAILY_WEBHOOK_UUID line or add it + for i, line in enumerate(lines): + if line.startswith("DAILY_WEBHOOK_UUID="): + lines[i] = f"DAILY_WEBHOOK_UUID={webhook_uuid}" + updated = True + break + + if not updated: + lines.append(f"DAILY_WEBHOOK_UUID={webhook_uuid}") + + env_file.write_text("\n".join(lines) + "\n") + print(f"✓ Also saved to local .env file") + else: + print(f"⚠ Local .env file not found - please add manually") + + return 0 + + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python recreate_daily_webhook.py ") + print( + "Example: python recreate_daily_webhook.py https://example.com/v1/daily/webhook" + ) + print() + print("Behavior:") + print(" - If DAILY_WEBHOOK_UUID set: Updates existing webhook") + print( + " - If DAILY_WEBHOOK_UUID empty: Creates new webhook, saves UUID to .env" + ) + sys.exit(1) + + sys.exit(asyncio.run(setup_webhook(sys.argv[1]))) diff --git a/server/tests/conftest.py b/server/tests/conftest.py index a70604ae..7d6c4302 100644 --- a/server/tests/conftest.py +++ b/server/tests/conftest.py @@ -5,6 +5,18 @@ from unittest.mock import patch import pytest +from reflector.schemas.platform import WHEREBY_PLATFORM + + +@pytest.fixture(scope="session", autouse=True) +def register_mock_platform(): + from mocks.mock_platform import MockPlatformClient + + from reflector.video_platforms.registry import register_platform + + register_platform(WHEREBY_PLATFORM, MockPlatformClient) + yield + @pytest.fixture(scope="session", autouse=True) def settings_configuration(): diff --git a/server/tests/mocks/__init__.py b/server/tests/mocks/__init__.py new file 
mode 100644 index 00000000..e69de29b diff --git a/server/tests/mocks/mock_platform.py b/server/tests/mocks/mock_platform.py new file mode 100644 index 00000000..0f84a271 --- /dev/null +++ b/server/tests/mocks/mock_platform.py @@ -0,0 +1,112 @@ +import uuid +from datetime import datetime +from typing import Any, Dict, Literal, Optional + +from reflector.db.rooms import Room +from reflector.video_platforms.base import ( + ROOM_PREFIX_SEPARATOR, + MeetingData, + VideoPlatformClient, + VideoPlatformConfig, +) + +MockPlatform = Literal["mock"] + + +class MockPlatformClient(VideoPlatformClient): + PLATFORM_NAME: MockPlatform = "mock" + + def __init__(self, config: VideoPlatformConfig): + super().__init__(config) + self._rooms: Dict[str, Dict[str, Any]] = {} + self._webhook_calls: list[Dict[str, Any]] = [] + + async def create_meeting( + self, room_name_prefix: str, end_date: datetime, room: Room + ) -> MeetingData: + meeting_id = str(uuid.uuid4()) + room_name = f"{room_name_prefix}{ROOM_PREFIX_SEPARATOR}{meeting_id[:8]}" + room_url = f"https://mock.video/{room_name}" + host_room_url = f"{room_url}?host=true" + + self._rooms[room_name] = { + "id": meeting_id, + "name": room_name, + "url": room_url, + "host_url": host_room_url, + "end_date": end_date, + "room": room, + "participants": [], + "is_active": True, + } + + return MeetingData.model_construct( + meeting_id=meeting_id, + room_name=room_name, + room_url=room_url, + host_room_url=host_room_url, + platform="whereby", + extra_data={"mock": True}, + ) + + async def get_room_sessions(self, room_name: str) -> Dict[str, Any]: + if room_name not in self._rooms: + return {"error": "Room not found"} + + room_data = self._rooms[room_name] + return { + "roomName": room_name, + "sessions": [ + { + "sessionId": room_data["id"], + "startTime": datetime.utcnow().isoformat(), + "participants": room_data["participants"], + "isActive": room_data["is_active"], + } + ], + } + + async def delete_room(self, room_name: str) -> bool: + if room_name in self._rooms: + self._rooms[room_name]["is_active"] = False + return True + return False + + async def upload_logo(self, room_name: str, logo_path: str) -> bool: + if room_name in self._rooms: + self._rooms[room_name]["logo_path"] = logo_path + return True + return False + + def verify_webhook_signature( + self, body: bytes, signature: str, timestamp: Optional[str] = None + ) -> bool: + return signature == "valid" + + def add_participant( + self, room_name: str, participant_id: str, participant_name: str + ): + if room_name in self._rooms: + self._rooms[room_name]["participants"].append( + { + "id": participant_id, + "name": participant_name, + "joined_at": datetime.utcnow().isoformat(), + } + ) + + def trigger_webhook(self, event_type: str, data: Dict[str, Any]): + self._webhook_calls.append( + { + "type": event_type, + "data": data, + "timestamp": datetime.utcnow().isoformat(), + } + ) + + def get_webhook_calls(self) -> list[Dict[str, Any]]: + return self._webhook_calls.copy() + + def clear_data(self): + self._rooms.clear() + self._webhook_calls.clear() diff --git a/server/tests/test_cleanup.py b/server/tests/test_cleanup.py index 2cb8614c..0c968941 100644 --- a/server/tests/test_cleanup.py +++ b/server/tests/test_cleanup.py @@ -139,14 +139,10 @@ async def test_cleanup_deletes_associated_meeting_and_recording(): mock_settings.PUBLIC_DATA_RETENTION_DAYS = 7 # Mock storage deletion - with patch("reflector.db.transcripts.get_transcripts_storage") as mock_storage: + with 
patch("reflector.worker.cleanup.get_transcripts_storage") as mock_storage: mock_storage.return_value.delete_file = AsyncMock() - with patch( - "reflector.worker.cleanup.get_recordings_storage" - ) as mock_rec_storage: - mock_rec_storage.return_value.delete_file = AsyncMock() - result = await cleanup_old_public_data() + result = await cleanup_old_public_data() # Check results assert result["transcripts_deleted"] == 1 diff --git a/server/tests/test_consent_multitrack.py b/server/tests/test_consent_multitrack.py new file mode 100644 index 00000000..15948708 --- /dev/null +++ b/server/tests/test_consent_multitrack.py @@ -0,0 +1,330 @@ +from datetime import datetime, timezone +from unittest.mock import AsyncMock, MagicMock, patch + +import pytest + +from reflector.db.meetings import ( + MeetingConsent, + meeting_consent_controller, + meetings_controller, +) +from reflector.db.recordings import Recording, recordings_controller +from reflector.db.rooms import rooms_controller +from reflector.db.transcripts import SourceKind, transcripts_controller +from reflector.pipelines.main_live_pipeline import cleanup_consent + + +@pytest.mark.asyncio +async def test_consent_cleanup_deletes_multitrack_files(): + room = await rooms_controller.add( + name="Test Room", + user_id="test-user", + zulip_auto_post=False, + zulip_stream="", + zulip_topic="", + is_locked=False, + room_mode="normal", + recording_type="cloud", + recording_trigger="automatic", + is_shared=False, + platform="daily", + ) + + # Create meeting + meeting = await meetings_controller.create( + id="test-multitrack-meeting", + room_name="test-room-20250101120000", + room_url="https://test.daily.co/test-room", + host_room_url="https://test.daily.co/test-room", + start_date=datetime.now(timezone.utc), + end_date=datetime.now(timezone.utc), + room=room, + ) + + track_keys = [ + "recordings/test-room-20250101120000/track-0.webm", + "recordings/test-room-20250101120000/track-1.webm", + "recordings/test-room-20250101120000/track-2.webm", + ] + recording = await recordings_controller.create( + Recording( + bucket_name="test-bucket", + object_key="recordings/test-room-20250101120000", # Folder path + recorded_at=datetime.now(timezone.utc), + meeting_id=meeting.id, + track_keys=track_keys, + ) + ) + + # Create transcript + transcript = await transcripts_controller.add( + name="Test Multitrack Transcript", + source_kind=SourceKind.ROOM, + recording_id=recording.id, + meeting_id=meeting.id, + ) + + # Add consent denial + await meeting_consent_controller.upsert( + MeetingConsent( + meeting_id=meeting.id, + user_id="test-user", + consent_given=False, + consent_timestamp=datetime.now(timezone.utc), + ) + ) + + # Mock get_transcripts_storage (master credentials with bucket override) + with patch( + "reflector.pipelines.main_live_pipeline.get_transcripts_storage" + ) as mock_get_transcripts_storage: + mock_master_storage = MagicMock() + mock_master_storage.delete_file = AsyncMock() + mock_get_transcripts_storage.return_value = mock_master_storage + + await cleanup_consent(transcript_id=transcript.id) + + # Verify master storage was used with bucket override for all track keys + assert mock_master_storage.delete_file.call_count == 3 + deleted_keys = [] + for call_args in mock_master_storage.delete_file.call_args_list: + key = call_args[0][0] + bucket_kwarg = call_args[1].get("bucket") + deleted_keys.append(key) + assert bucket_kwarg == "test-bucket" # Verify bucket override! 
+ assert set(deleted_keys) == set(track_keys) + + updated_transcript = await transcripts_controller.get_by_id(transcript.id) + assert updated_transcript.audio_deleted is True + + +@pytest.mark.asyncio +async def test_consent_cleanup_handles_missing_track_keys(): + room = await rooms_controller.add( + name="Test Room 2", + user_id="test-user", + zulip_auto_post=False, + zulip_stream="", + zulip_topic="", + is_locked=False, + room_mode="normal", + recording_type="cloud", + recording_trigger="automatic", + is_shared=False, + platform="daily", + ) + + # Create meeting + meeting = await meetings_controller.create( + id="test-multitrack-meeting-2", + room_name="test-room-20250101120001", + room_url="https://test.daily.co/test-room-2", + host_room_url="https://test.daily.co/test-room-2", + start_date=datetime.now(timezone.utc), + end_date=datetime.now(timezone.utc), + room=room, + ) + + recording = await recordings_controller.create( + Recording( + bucket_name="test-bucket", + object_key="recordings/old-style-recording.mp4", + recorded_at=datetime.now(timezone.utc), + meeting_id=meeting.id, + track_keys=None, + ) + ) + + transcript = await transcripts_controller.add( + name="Test Old-Style Transcript", + source_kind=SourceKind.ROOM, + recording_id=recording.id, + meeting_id=meeting.id, + ) + + # Add consent denial + await meeting_consent_controller.upsert( + MeetingConsent( + meeting_id=meeting.id, + user_id="test-user-2", + consent_given=False, + consent_timestamp=datetime.now(timezone.utc), + ) + ) + + # Mock get_transcripts_storage (master credentials with bucket override) + with patch( + "reflector.pipelines.main_live_pipeline.get_transcripts_storage" + ) as mock_get_transcripts_storage: + mock_master_storage = MagicMock() + mock_master_storage.delete_file = AsyncMock() + mock_get_transcripts_storage.return_value = mock_master_storage + + await cleanup_consent(transcript_id=transcript.id) + + # Verify master storage was used with bucket override + assert mock_master_storage.delete_file.call_count == 1 + call_args = mock_master_storage.delete_file.call_args + assert call_args[0][0] == recording.object_key + assert call_args[1].get("bucket") == "test-bucket" # Verify bucket override! 
+ + +@pytest.mark.asyncio +async def test_consent_cleanup_empty_track_keys_falls_back(): + room = await rooms_controller.add( + name="Test Room 3", + user_id="test-user", + zulip_auto_post=False, + zulip_stream="", + zulip_topic="", + is_locked=False, + room_mode="normal", + recording_type="cloud", + recording_trigger="automatic", + is_shared=False, + platform="daily", + ) + + # Create meeting + meeting = await meetings_controller.create( + id="test-multitrack-meeting-3", + room_name="test-room-20250101120002", + room_url="https://test.daily.co/test-room-3", + host_room_url="https://test.daily.co/test-room-3", + start_date=datetime.now(timezone.utc), + end_date=datetime.now(timezone.utc), + room=room, + ) + + recording = await recordings_controller.create( + Recording( + bucket_name="test-bucket", + object_key="recordings/fallback-recording.mp4", + recorded_at=datetime.now(timezone.utc), + meeting_id=meeting.id, + track_keys=[], + ) + ) + + transcript = await transcripts_controller.add( + name="Test Empty Track Keys Transcript", + source_kind=SourceKind.ROOM, + recording_id=recording.id, + meeting_id=meeting.id, + ) + + # Add consent denial + await meeting_consent_controller.upsert( + MeetingConsent( + meeting_id=meeting.id, + user_id="test-user-3", + consent_given=False, + consent_timestamp=datetime.now(timezone.utc), + ) + ) + + # Mock get_transcripts_storage (master credentials with bucket override) + with patch( + "reflector.pipelines.main_live_pipeline.get_transcripts_storage" + ) as mock_get_transcripts_storage: + mock_master_storage = MagicMock() + mock_master_storage.delete_file = AsyncMock() + mock_get_transcripts_storage.return_value = mock_master_storage + + # Run cleanup + await cleanup_consent(transcript_id=transcript.id) + + # Verify master storage was used with bucket override + assert mock_master_storage.delete_file.call_count == 1 + call_args = mock_master_storage.delete_file.call_args + assert call_args[0][0] == recording.object_key + assert call_args[1].get("bucket") == "test-bucket" # Verify bucket override! 
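# --- Editorial sketch (not part of the diff): the deletion contract the surrounding
# consent-cleanup tests pin down. Assumes cleanup_consent resolves object keys from the
# Recording row: per-track keys when track_keys is populated, otherwise the single
# object_key, always passing the recording's bucket as an override to the transcripts
# storage client. The helper name and signature below are hypothetical, chosen only
# to illustrate the call pattern asserted by these tests.
async def _sketch_delete_recording_objects(recording, storage) -> None:
    keys = recording.track_keys or [recording.object_key]
    for key in keys:
        # The bucket= override routes the delete to the recording's own bucket
        # rather than the storage client's default transcript bucket.
        await storage.delete_file(key, bucket=recording.bucket_name)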
+ + +@pytest.mark.asyncio +async def test_consent_cleanup_partial_failure_doesnt_mark_deleted(): + room = await rooms_controller.add( + name="Test Room 4", + user_id="test-user", + zulip_auto_post=False, + zulip_stream="", + zulip_topic="", + is_locked=False, + room_mode="normal", + recording_type="cloud", + recording_trigger="automatic", + is_shared=False, + platform="daily", + ) + + # Create meeting + meeting = await meetings_controller.create( + id="test-multitrack-meeting-4", + room_name="test-room-20250101120003", + room_url="https://test.daily.co/test-room-4", + host_room_url="https://test.daily.co/test-room-4", + start_date=datetime.now(timezone.utc), + end_date=datetime.now(timezone.utc), + room=room, + ) + + track_keys = [ + "recordings/test-room-20250101120003/track-0.webm", + "recordings/test-room-20250101120003/track-1.webm", + "recordings/test-room-20250101120003/track-2.webm", + ] + recording = await recordings_controller.create( + Recording( + bucket_name="test-bucket", + object_key="recordings/test-room-20250101120003", + recorded_at=datetime.now(timezone.utc), + meeting_id=meeting.id, + track_keys=track_keys, + ) + ) + + # Create transcript + transcript = await transcripts_controller.add( + name="Test Partial Failure Transcript", + source_kind=SourceKind.ROOM, + recording_id=recording.id, + meeting_id=meeting.id, + ) + + # Add consent denial + await meeting_consent_controller.upsert( + MeetingConsent( + meeting_id=meeting.id, + user_id="test-user-4", + consent_given=False, + consent_timestamp=datetime.now(timezone.utc), + ) + ) + + # Mock get_transcripts_storage (master credentials with bucket override) with partial failure + with patch( + "reflector.pipelines.main_live_pipeline.get_transcripts_storage" + ) as mock_get_transcripts_storage: + mock_master_storage = MagicMock() + + call_count = 0 + + async def delete_side_effect(key, bucket=None): + nonlocal call_count + call_count += 1 + if call_count == 2: + raise Exception("S3 deletion failed") + + mock_master_storage.delete_file = AsyncMock(side_effect=delete_side_effect) + mock_get_transcripts_storage.return_value = mock_master_storage + + await cleanup_consent(transcript_id=transcript.id) + + # Verify master storage was called with bucket override + assert mock_master_storage.delete_file.call_count == 3 + + updated_transcript = await transcripts_controller.get_by_id(transcript.id) + assert ( + updated_transcript.audio_deleted is None + or updated_transcript.audio_deleted is False + ) diff --git a/server/tests/test_pipeline_main_file.py b/server/tests/test_pipeline_main_file.py index f86dc85d..825c8389 100644 --- a/server/tests/test_pipeline_main_file.py +++ b/server/tests/test_pipeline_main_file.py @@ -127,18 +127,27 @@ async def mock_storage(): from reflector.storage.base import Storage class TestStorage(Storage): - async def _put_file(self, path, data): + async def _put_file(self, path, data, bucket=None): return None - async def _get_file_url(self, path): + async def _get_file_url( + self, + path, + operation: str = "get_object", + expires_in: int = 3600, + bucket=None, + ): return f"http://test-storage/{path}" - async def _get_file(self, path): + async def _get_file(self, path, bucket=None): return b"test_audio_data" - async def _delete_file(self, path): + async def _delete_file(self, path, bucket=None): return None + async def _stream_to_fileobj(self, path, fileobj, bucket=None): + fileobj.write(b"test_audio_data") + storage = TestStorage() # Add mock tracking for verification storage._put_file = 
AsyncMock(side_effect=storage._put_file) @@ -181,7 +190,7 @@ async def mock_waveform_processor(): async def mock_topic_detector(): """Mock TranscriptTopicDetectorProcessor""" with patch( - "reflector.pipelines.main_file_pipeline.TranscriptTopicDetectorProcessor" + "reflector.pipelines.topic_processing.TranscriptTopicDetectorProcessor" ) as mock_topic_class: mock_topic = AsyncMock() mock_topic.set_pipeline = MagicMock() @@ -218,7 +227,7 @@ async def mock_topic_detector(): async def mock_title_processor(): """Mock TranscriptFinalTitleProcessor""" with patch( - "reflector.pipelines.main_file_pipeline.TranscriptFinalTitleProcessor" + "reflector.pipelines.topic_processing.TranscriptFinalTitleProcessor" ) as mock_title_class: mock_title = AsyncMock() mock_title.set_pipeline = MagicMock() @@ -247,7 +256,7 @@ async def mock_title_processor(): async def mock_summary_processor(): """Mock TranscriptFinalSummaryProcessor""" with patch( - "reflector.pipelines.main_file_pipeline.TranscriptFinalSummaryProcessor" + "reflector.pipelines.topic_processing.TranscriptFinalSummaryProcessor" ) as mock_summary_class: mock_summary = AsyncMock() mock_summary.set_pipeline = MagicMock() diff --git a/server/tests/test_room_ics_api.py b/server/tests/test_room_ics_api.py index 8e7cf76f..79512995 100644 --- a/server/tests/test_room_ics_api.py +++ b/server/tests/test_room_ics_api.py @@ -48,6 +48,7 @@ async def test_create_room_with_ics_fields(authenticated_client): "ics_url": "https://calendar.example.com/test.ics", "ics_fetch_interval": 600, "ics_enabled": True, + "platform": "daily", }, ) assert response.status_code == 200 @@ -75,6 +76,7 @@ async def test_update_room_ics_configuration(authenticated_client): "is_shared": False, "webhook_url": "", "webhook_secret": "", + "platform": "daily", }, ) assert response.status_code == 200 @@ -111,6 +113,7 @@ async def test_trigger_ics_sync(authenticated_client): is_shared=False, ics_url="https://calendar.example.com/api.ics", ics_enabled=True, + platform="daily", ) cal = Calendar() @@ -154,6 +157,7 @@ async def test_trigger_ics_sync_unauthorized(client): is_shared=False, ics_url="https://calendar.example.com/api.ics", ics_enabled=True, + platform="daily", ) response = await client.post(f"/rooms/{room.name}/ics/sync") @@ -176,6 +180,7 @@ async def test_trigger_ics_sync_not_configured(authenticated_client): recording_trigger="automatic-2nd-participant", is_shared=False, ics_enabled=False, + platform="daily", ) response = await client.post(f"/rooms/{room.name}/ics/sync") @@ -200,6 +205,7 @@ async def test_get_ics_status(authenticated_client): ics_url="https://calendar.example.com/status.ics", ics_enabled=True, ics_fetch_interval=300, + platform="daily", ) now = datetime.now(timezone.utc) @@ -231,6 +237,7 @@ async def test_get_ics_status_unauthorized(client): is_shared=False, ics_url="https://calendar.example.com/status.ics", ics_enabled=True, + platform="daily", ) response = await client.get(f"/rooms/{room.name}/ics/status") @@ -252,6 +259,7 @@ async def test_list_room_meetings(authenticated_client): recording_type="cloud", recording_trigger="automatic-2nd-participant", is_shared=False, + platform="daily", ) now = datetime.now(timezone.utc) @@ -298,6 +306,7 @@ async def test_list_room_meetings_non_owner(client): recording_type="cloud", recording_trigger="automatic-2nd-participant", is_shared=False, + platform="daily", ) event = CalendarEvent( @@ -334,6 +343,7 @@ async def test_list_upcoming_meetings(authenticated_client): recording_type="cloud", 
recording_trigger="automatic-2nd-participant", is_shared=False, + platform="daily", ) now = datetime.now(timezone.utc) diff --git a/server/tests/test_storage.py b/server/tests/test_storage.py new file mode 100644 index 00000000..ccfc3dbd --- /dev/null +++ b/server/tests/test_storage.py @@ -0,0 +1,321 @@ +"""Tests for storage abstraction layer.""" + +import io +from unittest.mock import AsyncMock, MagicMock, patch + +import pytest +from botocore.exceptions import ClientError + +from reflector.storage.base import StoragePermissionError +from reflector.storage.storage_aws import AwsStorage + + +@pytest.mark.asyncio +async def test_aws_storage_stream_to_fileobj(): + """Test that AWS storage can stream directly to a file object without loading into memory.""" + # Setup + storage = AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + # Mock download_fileobj to write data + async def mock_download(Bucket, Key, Fileobj, **kwargs): + Fileobj.write(b"chunk1chunk2") + + mock_client = AsyncMock() + mock_client.download_fileobj = AsyncMock(side_effect=mock_download) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + # Patch the session client + with patch.object(storage.session, "client", return_value=mock_client): + # Create a file-like object to stream to + output = io.BytesIO() + + # Act - stream to file object + await storage.stream_to_fileobj("test-file.mp4", output, bucket="test-bucket") + + # Assert + mock_client.download_fileobj.assert_called_once_with( + Bucket="test-bucket", Key="test-file.mp4", Fileobj=output + ) + + # Check that data was written to output + output.seek(0) + assert output.read() == b"chunk1chunk2" + + +@pytest.mark.asyncio +async def test_aws_storage_stream_to_fileobj_with_folder(): + """Test streaming with folder prefix in bucket name.""" + storage = AwsStorage( + aws_bucket_name="test-bucket/recordings", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + async def mock_download(Bucket, Key, Fileobj, **kwargs): + Fileobj.write(b"data") + + mock_client = AsyncMock() + mock_client.download_fileobj = AsyncMock(side_effect=mock_download) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): + output = io.BytesIO() + await storage.stream_to_fileobj("file.mp4", output, bucket="other-bucket") + + # Should use folder prefix from instance config + mock_client.download_fileobj.assert_called_once_with( + Bucket="other-bucket", Key="recordings/file.mp4", Fileobj=output + ) + + +@pytest.mark.asyncio +async def test_storage_base_class_stream_to_fileobj(): + """Test that base Storage class has stream_to_fileobj method.""" + from reflector.storage.base import Storage + + # Verify method exists in base class + assert hasattr(Storage, "stream_to_fileobj") + + # Create a mock storage instance + storage = MagicMock(spec=Storage) + storage.stream_to_fileobj = AsyncMock() + + # Should be callable + await storage.stream_to_fileobj("file.mp4", io.BytesIO()) + storage.stream_to_fileobj.assert_called_once() + + +@pytest.mark.asyncio +async def test_aws_storage_stream_uses_download_fileobj(): + """Test that download_fileobj is called correctly.""" + storage = AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + 
aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + async def mock_download(Bucket, Key, Fileobj, **kwargs): + Fileobj.write(b"data") + + mock_client = AsyncMock() + mock_client.download_fileobj = AsyncMock(side_effect=mock_download) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): + output = io.BytesIO() + await storage.stream_to_fileobj("test.mp4", output) + + # Verify download_fileobj was called with correct parameters + mock_client.download_fileobj.assert_called_once_with( + Bucket="test-bucket", Key="test.mp4", Fileobj=output + ) + + +@pytest.mark.asyncio +async def test_aws_storage_handles_access_denied_error(): + """Test that AccessDenied errors are caught and wrapped in StoragePermissionError.""" + storage = AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + # Mock ClientError with AccessDenied + error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}} + mock_client = AsyncMock() + mock_client.put_object = AsyncMock( + side_effect=ClientError(error_response, "PutObject") + ) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): + with pytest.raises(StoragePermissionError) as exc_info: + await storage.put_file("test.txt", b"data") + + # Verify error message contains expected information + error_msg = str(exc_info.value) + assert "AccessDenied" in error_msg + assert "default bucket 'test-bucket'" in error_msg + assert "S3 upload failed" in error_msg + + +@pytest.mark.asyncio +async def test_aws_storage_handles_no_such_bucket_error(): + """Test that NoSuchBucket errors are caught and wrapped in StoragePermissionError.""" + storage = AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + # Mock ClientError with NoSuchBucket + error_response = { + "Error": { + "Code": "NoSuchBucket", + "Message": "The specified bucket does not exist", + } + } + mock_client = AsyncMock() + mock_client.delete_object = AsyncMock( + side_effect=ClientError(error_response, "DeleteObject") + ) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): + with pytest.raises(StoragePermissionError) as exc_info: + await storage.delete_file("test.txt") + + # Verify error message contains expected information + error_msg = str(exc_info.value) + assert "NoSuchBucket" in error_msg + assert "default bucket 'test-bucket'" in error_msg + assert "S3 delete failed" in error_msg + + +@pytest.mark.asyncio +async def test_aws_storage_error_message_with_bucket_override(): + """Test that error messages correctly show overridden bucket.""" + storage = AwsStorage( + aws_bucket_name="default-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + # Mock ClientError with AccessDenied + error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}} + mock_client = AsyncMock() + mock_client.get_object = AsyncMock( + side_effect=ClientError(error_response, "GetObject") + ) + mock_client.__aenter__ = 
AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): + with pytest.raises(StoragePermissionError) as exc_info: + await storage.get_file("test.txt", bucket="override-bucket") + + # Verify error message shows overridden bucket, not default + error_msg = str(exc_info.value) + assert "overridden bucket 'override-bucket'" in error_msg + assert "default-bucket" not in error_msg + assert "S3 download failed" in error_msg + + +@pytest.mark.asyncio +async def test_aws_storage_reraises_non_handled_errors(): + """Test that non-AccessDenied/NoSuchBucket errors are re-raised as-is.""" + storage = AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + # Mock ClientError with different error code + error_response = { + "Error": {"Code": "InternalError", "Message": "Internal Server Error"} + } + mock_client = AsyncMock() + mock_client.put_object = AsyncMock( + side_effect=ClientError(error_response, "PutObject") + ) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): + # Should raise ClientError, not StoragePermissionError + with pytest.raises(ClientError) as exc_info: + await storage.put_file("test.txt", b"data") + + # Verify it's the original ClientError + assert exc_info.value.response["Error"]["Code"] == "InternalError" + + +@pytest.mark.asyncio +async def test_aws_storage_presign_url_handles_errors(): + """Test that presigned URL generation handles permission errors.""" + storage = AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + # Mock ClientError with AccessDenied during presign operation + error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}} + mock_client = AsyncMock() + mock_client.generate_presigned_url = AsyncMock( + side_effect=ClientError(error_response, "GeneratePresignedUrl") + ) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): + with pytest.raises(StoragePermissionError) as exc_info: + await storage.get_file_url("test.txt") + + # Verify error message + error_msg = str(exc_info.value) + assert "S3 presign failed" in error_msg + assert "AccessDenied" in error_msg + + +@pytest.mark.asyncio +async def test_aws_storage_list_objects_handles_errors(): + """Test that list_objects handles permission errors.""" + storage = AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + ) + + # Mock ClientError during list operation + error_response = {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}} + mock_paginator = MagicMock() + + async def mock_paginate(*args, **kwargs): + raise ClientError(error_response, "ListObjectsV2") + yield # Make it an async generator + + mock_paginator.paginate = mock_paginate + + mock_client = AsyncMock() + mock_client.get_paginator = MagicMock(return_value=mock_paginator) + mock_client.__aenter__ = AsyncMock(return_value=mock_client) + mock_client.__aexit__ = AsyncMock(return_value=None) + + with patch.object(storage.session, "client", return_value=mock_client): 
+ with pytest.raises(StoragePermissionError) as exc_info: + await storage.list_objects(prefix="test/") + + error_msg = str(exc_info.value) + assert "S3 list_objects failed" in error_msg + assert "AccessDenied" in error_msg + + +def test_aws_storage_constructor_rejects_mixed_auth(): + """Test that constructor rejects both role_arn and access keys.""" + with pytest.raises(ValueError, match="cannot use both.*role_arn.*access keys"): + AwsStorage( + aws_bucket_name="test-bucket", + aws_region="us-east-1", + aws_access_key_id="test-key", + aws_secret_access_key="test-secret", + aws_role_arn="arn:aws:iam::123456789012:role/test-role", + ) diff --git a/server/tests/test_transcripts_recording_deletion.py b/server/tests/test_transcripts_recording_deletion.py index 810fe567..3a632612 100644 --- a/server/tests/test_transcripts_recording_deletion.py +++ b/server/tests/test_transcripts_recording_deletion.py @@ -22,13 +22,16 @@ async def test_recording_deleted_with_transcript(): recording_id=recording.id, ) - with patch("reflector.db.transcripts.get_recordings_storage") as mock_get_storage: + with patch("reflector.db.transcripts.get_transcripts_storage") as mock_get_storage: storage_instance = mock_get_storage.return_value storage_instance.delete_file = AsyncMock() await transcripts_controller.remove_by_id(transcript.id) - storage_instance.delete_file.assert_awaited_once_with(recording.object_key) + # Should be called with bucket override + storage_instance.delete_file.assert_awaited_once_with( + recording.object_key, bucket=recording.bucket_name + ) assert await recordings_controller.get_by_id(recording.id) is None assert await transcripts_controller.get_by_id(transcript.id) is None diff --git a/server/tests/test_utils_daily.py b/server/tests/test_utils_daily.py new file mode 100644 index 00000000..356ffc94 --- /dev/null +++ b/server/tests/test_utils_daily.py @@ -0,0 +1,17 @@ +import pytest + +from reflector.utils.daily import extract_base_room_name + + +@pytest.mark.parametrize( + "daily_room_name,expected", + [ + ("daily-20251020193458", "daily"), + ("daily-2-20251020193458", "daily-2"), + ("my-room-name-20251020193458", "my-room-name"), + ("room-with-numbers-123-20251020193458", "room-with-numbers-123"), + ("x-20251020193458", "x"), + ], +) +def test_extract_base_room_name(daily_room_name, expected): + assert extract_base_room_name(daily_room_name) == expected diff --git a/server/tests/test_utils_url.py b/server/tests/test_utils_url.py new file mode 100644 index 00000000..c833983c --- /dev/null +++ b/server/tests/test_utils_url.py @@ -0,0 +1,63 @@ +"""Tests for URL utility functions.""" + +from reflector.utils.url import add_query_param + + +class TestAddQueryParam: + """Test the add_query_param function.""" + + def test_add_param_to_url_without_query(self): + """Should add query param with ? 
to URL without existing params."""
+        url = "https://example.com/room"
+        result = add_query_param(url, "t", "token123")
+        assert result == "https://example.com/room?t=token123"
+
+    def test_add_param_to_url_with_existing_query(self):
+        """Should add query param with & to URL with existing params."""
+        url = "https://example.com/room?existing=param"
+        result = add_query_param(url, "t", "token123")
+        assert result == "https://example.com/room?existing=param&t=token123"
+
+    def test_add_param_to_url_with_multiple_existing_params(self):
+        """Should add query param to URL with multiple existing params."""
+        url = "https://example.com/room?param1=value1&param2=value2"
+        result = add_query_param(url, "t", "token123")
+        assert (
+            result == "https://example.com/room?param1=value1&param2=value2&t=token123"
+        )
+
+    def test_add_param_with_special_characters(self):
+        """Should properly encode special characters in param value."""
+        url = "https://example.com/room"
+        result = add_query_param(url, "name", "hello world")
+        assert result == "https://example.com/room?name=hello+world"
+
+    def test_add_param_to_url_with_fragment(self):
+        """Should preserve URL fragment when adding query param."""
+        url = "https://example.com/room#section"
+        result = add_query_param(url, "t", "token123")
+        assert result == "https://example.com/room?t=token123#section"
+
+    def test_add_param_to_url_with_query_and_fragment(self):
+        """Should preserve fragment when adding param to URL with existing query."""
+        url = "https://example.com/room?existing=param#section"
+        result = add_query_param(url, "t", "token123")
+        assert result == "https://example.com/room?existing=param&t=token123#section"
+
+    def test_add_param_overwrites_existing_param(self):
+        """Should overwrite existing param with same name."""
+        url = "https://example.com/room?t=oldtoken"
+        result = add_query_param(url, "t", "newtoken")
+        assert result == "https://example.com/room?t=newtoken"
+
+    def test_url_without_scheme(self):
+        """Should handle URLs without scheme (relative URLs)."""
+        url = "/room/path"
+        result = add_query_param(url, "t", "token123")
+        assert result == "/room/path?t=token123"
+
+    def test_empty_url(self):
+        """Should handle empty URL."""
+        url = ""
+        result = add_query_param(url, "t", "token123")
+        assert result == "?t=token123"
diff --git a/server/tests/test_video_platforms_factory.py b/server/tests/test_video_platforms_factory.py
new file mode 100644
index 00000000..6c8c02c5
--- /dev/null
+++ b/server/tests/test_video_platforms_factory.py
@@ -0,0 +1,58 @@
+"""Tests for video_platforms.factory module."""
+
+from unittest.mock import patch
+
+from reflector.video_platforms.factory import get_platform
+
+
+class TestGetPlatform:
+    """Test suite for get_platform function."""
+
+    @patch("reflector.video_platforms.factory.settings")
+    def test_with_room_platform(self, mock_settings):
+        """When room_platform provided, should return room_platform."""
+        mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby"
+
+        # Should return the room's platform when provided
+        assert get_platform(room_platform="daily") == "daily"
+        assert get_platform(room_platform="whereby") == "whereby"
+
+    @patch("reflector.video_platforms.factory.settings")
+    def test_without_room_platform_uses_default(self, mock_settings):
+        """When no room_platform, should return DEFAULT_VIDEO_PLATFORM."""
+        mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby"
+
+        # Should return default when room_platform is None
+        assert get_platform(room_platform=None) == "whereby"
+
+    
@patch("reflector.video_platforms.factory.settings") + def test_with_daily_default(self, mock_settings): + """When DEFAULT_VIDEO_PLATFORM is 'daily', should return 'daily' when no room_platform.""" + mock_settings.DEFAULT_VIDEO_PLATFORM = "daily" + + # Should return default 'daily' when room_platform is None + assert get_platform(room_platform=None) == "daily" + + @patch("reflector.video_platforms.factory.settings") + def test_no_room_id_provided(self, mock_settings): + """Should work correctly even when room_id is not provided.""" + mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby" + + # Should use room_platform when provided + assert get_platform(room_platform="daily") == "daily" + + # Should use default when room_platform not provided + assert get_platform(room_platform=None) == "whereby" + + @patch("reflector.video_platforms.factory.settings") + def test_room_platform_always_takes_precedence(self, mock_settings): + """room_platform should always be used when provided.""" + mock_settings.DEFAULT_VIDEO_PLATFORM = "whereby" + + # room_platform should take precedence over default + assert get_platform(room_platform="daily") == "daily" + assert get_platform(room_platform="whereby") == "whereby" + + # Different default shouldn't matter when room_platform provided + mock_settings.DEFAULT_VIDEO_PLATFORM = "daily" + assert get_platform(room_platform="whereby") == "whereby" diff --git a/www/app/[roomName]/[meetingId]/page.tsx b/www/app/[roomName]/[meetingId]/page.tsx index 8ce405ba..725aa571 100644 --- a/www/app/[roomName]/[meetingId]/page.tsx +++ b/www/app/[roomName]/[meetingId]/page.tsx @@ -1,3 +1,3 @@ -import Room from "../room"; +import RoomContainer from "../components/RoomContainer"; -export default Room; +export default RoomContainer; diff --git a/www/app/[roomName]/components/DailyRoom.tsx b/www/app/[roomName]/components/DailyRoom.tsx new file mode 100644 index 00000000..920f8624 --- /dev/null +++ b/www/app/[roomName]/components/DailyRoom.tsx @@ -0,0 +1,93 @@ +"use client"; + +import { useCallback, useEffect, useRef } from "react"; +import { Box } from "@chakra-ui/react"; +import { useRouter } from "next/navigation"; +import DailyIframe, { DailyCall } from "@daily-co/daily-js"; +import type { components } from "../../reflector-api"; +import { useAuth } from "../../lib/AuthProvider"; +import { + ConsentDialogButton, + recordingTypeRequiresConsent, +} from "../../lib/consent"; + +type Meeting = components["schemas"]["Meeting"]; + +interface DailyRoomProps { + meeting: Meeting; +} + +export default function DailyRoom({ meeting }: DailyRoomProps) { + const router = useRouter(); + const auth = useAuth(); + const status = auth.status; + const containerRef = useRef(null); + + const roomUrl = meeting?.host_room_url || meeting?.room_url; + + const isLoading = status === "loading"; + + const handleLeave = useCallback(() => { + router.push("/browse"); + }, [router]); + + useEffect(() => { + if (isLoading || !roomUrl || !containerRef.current) return; + + let frame: DailyCall | null = null; + let destroyed = false; + + const createAndJoin = async () => { + try { + const existingFrame = DailyIframe.getCallInstance(); + if (existingFrame) { + await existingFrame.destroy(); + } + + frame = DailyIframe.createFrame(containerRef.current!, { + iframeStyle: { + width: "100vw", + height: "100vh", + border: "none", + }, + showLeaveButton: true, + showFullscreenButton: true, + }); + + if (destroyed) { + await frame.destroy(); + return; + } + + frame.on("left-meeting", handleLeave); + await frame.join({ url: 
roomUrl }); + } catch (error) { + console.error("Error creating Daily frame:", error); + } + }; + + createAndJoin(); + + return () => { + destroyed = true; + if (frame) { + frame.destroy().catch((e) => { + console.error("Error destroying frame:", e); + }); + } + }; + }, [roomUrl, isLoading, handleLeave]); + + if (!roomUrl) { + return null; + } + + return ( + +
+ {meeting.recording_type && + recordingTypeRequiresConsent(meeting.recording_type) && + meeting.id && } + + ); +} diff --git a/www/app/[roomName]/components/RoomContainer.tsx b/www/app/[roomName]/components/RoomContainer.tsx new file mode 100644 index 00000000..bfcd82f7 --- /dev/null +++ b/www/app/[roomName]/components/RoomContainer.tsx @@ -0,0 +1,214 @@ +"use client"; + +import { roomMeetingUrl } from "../../lib/routes"; +import { useCallback, useEffect, useState, use } from "react"; +import { Box, Text, Spinner } from "@chakra-ui/react"; +import { useRouter } from "next/navigation"; +import { + useRoomGetByName, + useRoomsCreateMeeting, + useRoomGetMeeting, +} from "../../lib/apiHooks"; +import type { components } from "../../reflector-api"; +import MeetingSelection from "../MeetingSelection"; +import useRoomDefaultMeeting from "../useRoomDefaultMeeting"; +import WherebyRoom from "./WherebyRoom"; +import DailyRoom from "./DailyRoom"; +import { useAuth } from "../../lib/AuthProvider"; +import { useError } from "../../(errors)/errorContext"; +import { parseNonEmptyString } from "../../lib/utils"; +import { printApiError } from "../../api/_error"; + +type Meeting = components["schemas"]["Meeting"]; + +export type RoomDetails = { + params: Promise<{ + roomName: string; + meetingId?: string; + }>; +}; + +function LoadingSpinner() { + return ( + + + + ); +} + +export default function RoomContainer(details: RoomDetails) { + const params = use(details.params); + const roomName = parseNonEmptyString( + params.roomName, + true, + "panic! params.roomName is required", + ); + const router = useRouter(); + const auth = useAuth(); + const status = auth.status; + const isAuthenticated = status === "authenticated"; + const { setError } = useError(); + + const roomQuery = useRoomGetByName(roomName); + const createMeetingMutation = useRoomsCreateMeeting(); + + const room = roomQuery.data; + + const pageMeetingId = params.meetingId; + + const defaultMeeting = useRoomDefaultMeeting( + room && !room.ics_enabled && !pageMeetingId ? roomName : null, + ); + + const explicitMeeting = useRoomGetMeeting(roomName, pageMeetingId || null); + + const meeting = explicitMeeting.data || defaultMeeting.response; + + const isLoading = + status === "loading" || + roomQuery.isLoading || + defaultMeeting?.loading || + explicitMeeting.isLoading || + createMeetingMutation.isPending; + + const errors = [ + explicitMeeting.error, + defaultMeeting.error, + roomQuery.error, + createMeetingMutation.error, + ].filter(Boolean); + + const isOwner = + isAuthenticated && room ? auth.user?.id === room.user_id : false; + + const handleMeetingSelect = (selectedMeeting: Meeting) => { + router.push( + roomMeetingUrl( + roomName, + parseNonEmptyString( + selectedMeeting.id, + true, + "panic! selectedMeeting.id is required", + ), + ), + ); + }; + + const handleCreateUnscheduled = async () => { + try { + const newMeeting = await createMeetingMutation.mutateAsync({ + params: { + path: { room_name: roomName }, + }, + body: { + allow_duplicated: room ? 
room.ics_enabled : false, + }, + }); + handleMeetingSelect(newMeeting); + } catch (err) { + console.error("Failed to create meeting:", err); + } + }; + + if (isLoading) { + return ; + } + + if (!room) { + return ( + + Room not found + + ); + } + + if (room.ics_enabled && !params.meetingId) { + return ( + + ); + } + + if (errors.length > 0) { + return ( + + {errors.map((error, i) => ( + + {printApiError(error)} + + ))} + + ); + } + + if (!meeting) { + return ; + } + + const platform = meeting.platform; + + if (!platform) { + return ( + + Meeting platform not configured + + ); + } + + switch (platform) { + case "daily": + return ; + case "whereby": + return ; + default: { + const _exhaustive: never = platform; + return ( + + Unknown platform: {platform} + + ); + } + } +} diff --git a/www/app/[roomName]/components/WherebyRoom.tsx b/www/app/[roomName]/components/WherebyRoom.tsx new file mode 100644 index 00000000..d670b4e2 --- /dev/null +++ b/www/app/[roomName]/components/WherebyRoom.tsx @@ -0,0 +1,101 @@ +"use client"; + +import { useCallback, useEffect, useRef, RefObject } from "react"; +import { useRouter } from "next/navigation"; +import type { components } from "../../reflector-api"; +import { useAuth } from "../../lib/AuthProvider"; +import { getWherebyUrl, useWhereby } from "../../lib/wherebyClient"; +import { assertExistsAndNonEmptyString, NonEmptyString } from "../../lib/utils"; +import { + ConsentDialogButton as BaseConsentDialogButton, + useConsentDialog, + recordingTypeRequiresConsent, +} from "../../lib/consent"; + +type Meeting = components["schemas"]["Meeting"]; + +interface WherebyRoomProps { + meeting: Meeting; +} + +function WherebyConsentDialogButton({ + meetingId, + wherebyRef, +}: { + meetingId: NonEmptyString; + wherebyRef: React.RefObject; +}) { + const previousFocusRef = useRef(null); + + useEffect(() => { + const element = wherebyRef.current; + if (!element) return; + + const handleWherebyReady = () => { + previousFocusRef.current = document.activeElement as HTMLElement; + }; + + element.addEventListener("ready", handleWherebyReady); + + return () => { + element.removeEventListener("ready", handleWherebyReady); + if (previousFocusRef.current && document.activeElement === element) { + previousFocusRef.current.focus(); + } + }; + }, [wherebyRef]); + + return ; +} + +export default function WherebyRoom({ meeting }: WherebyRoomProps) { + const wherebyLoaded = useWhereby(); + const wherebyRef = useRef(null); + const router = useRouter(); + const auth = useAuth(); + const status = auth.status; + const isAuthenticated = status === "authenticated"; + + const wherebyRoomUrl = getWherebyUrl(meeting); + const recordingType = meeting.recording_type; + const meetingId = meeting.id; + + const isLoading = status === "loading"; + + const handleLeave = useCallback(() => { + router.push("/browse"); + }, [router]); + + useEffect(() => { + if (isLoading || !isAuthenticated || !wherebyRoomUrl || !wherebyLoaded) + return; + + wherebyRef.current?.addEventListener("leave", handleLeave); + + return () => { + wherebyRef.current?.removeEventListener("leave", handleLeave); + }; + }, [handleLeave, wherebyRoomUrl, isLoading, isAuthenticated, wherebyLoaded]); + + if (!wherebyRoomUrl || !wherebyLoaded) { + return null; + } + + return ( + <> + + {recordingType && + recordingTypeRequiresConsent(recordingType) && + meetingId && ( + + )} + + ); +} diff --git a/www/app/[roomName]/page.tsx b/www/app/[roomName]/page.tsx index 1aaca4c7..87651a50 100644 --- a/www/app/[roomName]/page.tsx +++ 
b/www/app/[roomName]/page.tsx @@ -1,3 +1,3 @@ -import Room from "./room"; +import RoomContainer from "./components/RoomContainer"; -export default Room; +export default RoomContainer; diff --git a/www/app/lib/consent/ConsentDialog.tsx b/www/app/lib/consent/ConsentDialog.tsx new file mode 100644 index 00000000..488599d0 --- /dev/null +++ b/www/app/lib/consent/ConsentDialog.tsx @@ -0,0 +1,36 @@ +"use client"; + +import { Box, Button, Text, VStack, HStack } from "@chakra-ui/react"; +import { CONSENT_DIALOG_TEXT } from "./constants"; + +interface ConsentDialogProps { + onAccept: () => void; + onReject: () => void; +} + +export function ConsentDialog({ onAccept, onReject }: ConsentDialogProps) { + return ( + + + + {CONSENT_DIALOG_TEXT.question} + + + + + + + + ); +} diff --git a/www/app/lib/consent/ConsentDialogButton.tsx b/www/app/lib/consent/ConsentDialogButton.tsx new file mode 100644 index 00000000..2c1d084b --- /dev/null +++ b/www/app/lib/consent/ConsentDialogButton.tsx @@ -0,0 +1,39 @@ +"use client"; + +import { Button, Icon } from "@chakra-ui/react"; +import { FaBars } from "react-icons/fa6"; +import { useConsentDialog } from "./useConsentDialog"; +import { + CONSENT_BUTTON_TOP_OFFSET, + CONSENT_BUTTON_LEFT_OFFSET, + CONSENT_BUTTON_Z_INDEX, + CONSENT_DIALOG_TEXT, +} from "./constants"; + +interface ConsentDialogButtonProps { + meetingId: string; +} + +export function ConsentDialogButton({ meetingId }: ConsentDialogButtonProps) { + const { showConsentModal, consentState, hasConsent, consentLoading } = + useConsentDialog(meetingId); + + if (!consentState.ready || hasConsent(meetingId) || consentLoading) { + return null; + } + + return ( + + ); +} diff --git a/www/app/lib/consent/constants.ts b/www/app/lib/consent/constants.ts new file mode 100644 index 00000000..41e7c7e1 --- /dev/null +++ b/www/app/lib/consent/constants.ts @@ -0,0 +1,12 @@ +export const CONSENT_BUTTON_TOP_OFFSET = "56px"; +export const CONSENT_BUTTON_LEFT_OFFSET = "8px"; +export const CONSENT_BUTTON_Z_INDEX = 1000; +export const TOAST_CHECK_INTERVAL_MS = 100; + +export const CONSENT_DIALOG_TEXT = { + question: + "Can we have your permission to store this meeting's audio recording on our servers?", + acceptButton: "Yes, store the audio", + rejectButton: "No, delete after transcription", + triggerButton: "Meeting is being recorded", +} as const; diff --git a/www/app/lib/consent/index.ts b/www/app/lib/consent/index.ts new file mode 100644 index 00000000..eabca8ac --- /dev/null +++ b/www/app/lib/consent/index.ts @@ -0,0 +1,8 @@ +"use client"; + +export { ConsentDialogButton } from "./ConsentDialogButton"; +export { ConsentDialog } from "./ConsentDialog"; +export { useConsentDialog } from "./useConsentDialog"; +export { recordingTypeRequiresConsent } from "./utils"; +export * from "./constants"; +export * from "./types"; diff --git a/www/app/lib/consent/types.ts b/www/app/lib/consent/types.ts new file mode 100644 index 00000000..0bd15202 --- /dev/null +++ b/www/app/lib/consent/types.ts @@ -0,0 +1,9 @@ +export interface ConsentDialogResult { + showConsentModal: () => void; + consentState: { + ready: boolean; + consentAnsweredForMeetings?: Set; + }; + hasConsent: (meetingId: string) => boolean; + consentLoading: boolean; +} diff --git a/www/app/lib/consent/useConsentDialog.tsx b/www/app/lib/consent/useConsentDialog.tsx new file mode 100644 index 00000000..2a5c0ab3 --- /dev/null +++ b/www/app/lib/consent/useConsentDialog.tsx @@ -0,0 +1,109 @@ +"use client"; + +import { useCallback, useState, useEffect, useRef } from "react"; 
+import { toaster } from "../../components/ui/toaster"; +import { useRecordingConsent } from "../../recordingConsentContext"; +import { useMeetingAudioConsent } from "../apiHooks"; +import { ConsentDialog } from "./ConsentDialog"; +import { TOAST_CHECK_INTERVAL_MS } from "./constants"; +import type { ConsentDialogResult } from "./types"; + +export function useConsentDialog(meetingId: string): ConsentDialogResult { + const { state: consentState, touch, hasConsent } = useRecordingConsent(); + const [modalOpen, setModalOpen] = useState(false); + const audioConsentMutation = useMeetingAudioConsent(); + const intervalRef = useRef(null); + const keydownHandlerRef = useRef<((event: KeyboardEvent) => void) | null>( + null, + ); + + useEffect(() => { + return () => { + if (intervalRef.current) { + clearInterval(intervalRef.current); + intervalRef.current = null; + } + if (keydownHandlerRef.current) { + document.removeEventListener("keydown", keydownHandlerRef.current); + keydownHandlerRef.current = null; + } + }; + }, []); + + const handleConsent = useCallback( + async (given: boolean) => { + try { + await audioConsentMutation.mutateAsync({ + params: { + path: { meeting_id: meetingId }, + }, + body: { + consent_given: given, + }, + }); + + touch(meetingId); + } catch (error) { + console.error("Error submitting consent:", error); + } + }, + [audioConsentMutation, touch, meetingId], + ); + + const showConsentModal = useCallback(() => { + if (modalOpen) return; + + setModalOpen(true); + + const toastId = toaster.create({ + placement: "top", + duration: null, + render: ({ dismiss }) => ( + { + handleConsent(true); + dismiss(); + }} + onReject={() => { + handleConsent(false); + dismiss(); + }} + /> + ), + }); + + const handleKeyDown = (event: KeyboardEvent) => { + if (event.key === "Escape") { + toastId.then((id) => toaster.dismiss(id)); + } + }; + + keydownHandlerRef.current = handleKeyDown; + document.addEventListener("keydown", handleKeyDown); + + toastId.then((id) => { + intervalRef.current = setInterval(() => { + if (!toaster.isActive(id)) { + setModalOpen(false); + + if (intervalRef.current) { + clearInterval(intervalRef.current); + intervalRef.current = null; + } + + if (keydownHandlerRef.current) { + document.removeEventListener("keydown", keydownHandlerRef.current); + keydownHandlerRef.current = null; + } + } + }, TOAST_CHECK_INTERVAL_MS); + }); + }, [handleConsent, modalOpen]); + + return { + showConsentModal, + consentState, + hasConsent, + consentLoading: audioConsentMutation.isPending, + }; +} diff --git a/www/app/lib/consent/utils.ts b/www/app/lib/consent/utils.ts new file mode 100644 index 00000000..146bdd68 --- /dev/null +++ b/www/app/lib/consent/utils.ts @@ -0,0 +1,13 @@ +import type { components } from "../../reflector-api"; + +type Meeting = components["schemas"]["Meeting"]; + +/** + * Determines if a meeting's recording type requires user consent. + * Currently only "cloud" recordings require consent. 
+ */ +export function recordingTypeRequiresConsent( + recordingType: Meeting["recording_type"], +): boolean { + return recordingType === "cloud"; +} diff --git a/www/app/lib/useLoginRequiredPages.ts b/www/app/lib/useLoginRequiredPages.ts index 37ee96b1..d0dee1b6 100644 --- a/www/app/lib/useLoginRequiredPages.ts +++ b/www/app/lib/useLoginRequiredPages.ts @@ -3,6 +3,7 @@ import { PROTECTED_PAGES } from "./auth"; import { usePathname } from "next/navigation"; import { useAuth } from "./AuthProvider"; import { useEffect } from "react"; +import { featureEnabled } from "./features"; const HOME = "/" as const; @@ -13,7 +14,9 @@ export const useLoginRequiredPages = () => { const isNotLoggedIn = auth.status === "unauthenticated"; // safety const isLastDestination = pathname === HOME; - const shouldRedirect = isNotLoggedIn && isProtected && !isLastDestination; + const requireLogin = featureEnabled("requireLogin"); + const shouldRedirect = + requireLogin && isNotLoggedIn && isProtected && !isLastDestination; useEffect(() => { if (!shouldRedirect) return; // on the backend, the redirect goes straight to the auth provider, but we don't have it because it's hidden inside next-auth middleware diff --git a/www/app/reflector-api.d.ts b/www/app/reflector-api.d.ts index 1dc92f2b..9b9582ba 100644 --- a/www/app/reflector-api.d.ts +++ b/www/app/reflector-api.d.ts @@ -696,6 +696,26 @@ export interface paths { patch?: never; trace?: never; }; + "/v1/webhook": { + parameters: { + query?: never; + header?: never; + path?: never; + cookie?: never; + }; + get?: never; + put?: never; + /** + * Webhook + * @description Handle Daily webhook events. + */ + post: operations["v1_webhook"]; + delete?: never; + options?: never; + head?: never; + patch?: never; + trace?: never; + }; } export type webhooks = Record; export interface components { @@ -852,6 +872,8 @@ export interface components { * @default false */ ics_enabled: boolean; + /** Platform */ + platform?: ("whereby" | "daily") | null; }; /** CreateRoomMeeting */ CreateRoomMeeting: { @@ -877,6 +899,22 @@ export interface components { target_language: string; source_kind?: components["schemas"]["SourceKind"] | null; }; + /** + * DailyWebhookEvent + * @description Daily webhook event structure. 
+ */ + DailyWebhookEvent: { + /** Type */ + type: string; + /** Id */ + id: string; + /** Ts */ + ts: number; + /** Data */ + data: { + [key: string]: unknown; + }; + }; /** DeletionStatus */ DeletionStatus: { /** Status */ @@ -1193,6 +1231,12 @@ export interface components { calendar_metadata?: { [key: string]: unknown; } | null; + /** + * Platform + * @default whereby + * @enum {string} + */ + platform: "whereby" | "daily"; }; /** MeetingConsentRequest */ MeetingConsentRequest: { @@ -1279,6 +1323,12 @@ export interface components { ics_last_sync?: string | null; /** Ics Last Etag */ ics_last_etag?: string | null; + /** + * Platform + * @default whereby + * @enum {string} + */ + platform: "whereby" | "daily"; }; /** RoomDetails */ RoomDetails: { @@ -1325,6 +1375,12 @@ export interface components { ics_last_sync?: string | null; /** Ics Last Etag */ ics_last_etag?: string | null; + /** + * Platform + * @default whereby + * @enum {string} + */ + platform: "whereby" | "daily"; /** Webhook Url */ webhook_url: string | null; /** Webhook Secret */ @@ -1505,6 +1561,8 @@ export interface components { ics_fetch_interval?: number | null; /** Ics Enabled */ ics_enabled?: boolean | null; + /** Platform */ + platform?: ("whereby" | "daily") | null; }; /** UpdateTranscript */ UpdateTranscript: { @@ -3191,4 +3249,37 @@ export interface operations { }; }; }; + v1_webhook: { + parameters: { + query?: never; + header?: never; + path?: never; + cookie?: never; + }; + requestBody: { + content: { + "application/json": components["schemas"]["DailyWebhookEvent"]; + }; + }; + responses: { + /** @description Successful Response */ + 200: { + headers: { + [name: string]: unknown; + }; + content: { + "application/json": unknown; + }; + }; + /** @description Validation Error */ + 422: { + headers: { + [name: string]: unknown; + }; + content: { + "application/json": components["schemas"]["HTTPValidationError"]; + }; + }; + }; + }; } diff --git a/www/package.json b/www/package.json index 5169dbe2..f4412db0 100644 --- a/www/package.json +++ b/www/package.json @@ -14,6 +14,7 @@ }, "dependencies": { "@chakra-ui/react": "^3.24.2", + "@daily-co/daily-js": "^0.84.0", "@emotion/react": "^11.14.0", "@fortawesome/fontawesome-svg-core": "^6.4.0", "@fortawesome/free-solid-svg-icons": "^6.4.0", diff --git a/www/pnpm-lock.yaml b/www/pnpm-lock.yaml index 6c0a3d83..92667b7e 100644 --- a/www/pnpm-lock.yaml +++ b/www/pnpm-lock.yaml @@ -10,6 +10,9 @@ importers: "@chakra-ui/react": specifier: ^3.24.2 version: 3.24.2(@emotion/react@11.14.0(@types/react@18.2.20)(react@18.3.1))(react-dom@18.3.1(react@18.3.1))(react@18.3.1) + "@daily-co/daily-js": + specifier: ^0.84.0 + version: 0.84.0 "@emotion/react": specifier: ^11.14.0 version: 11.14.0(@types/react@18.2.20)(react@18.3.1) @@ -487,6 +490,13 @@ packages: } engines: { node: ">=12" } + "@daily-co/daily-js@0.84.0": + resolution: + { + integrity: sha512-/ynXrMDDkRXhLlHxiFNf9QU5yw4ZGPr56wNARgja/Tiid71UIniundTavCNF5cMb2I1vNoMh7oEJ/q8stg/V7g==, + } + engines: { node: ">=10.0.0" } + "@emnapi/core@1.4.5": resolution: { @@ -2293,6 +2303,13 @@ packages: } engines: { node: ">=18" } + "@sentry-internal/browser-utils@8.55.0": + resolution: + { + integrity: sha512-ROgqtQfpH/82AQIpESPqPQe0UyWywKJsmVIqi3c5Fh+zkds5LUxnssTj3yNd1x+kxaPDVB023jAP+3ibNgeNDw==, + } + engines: { node: ">=14.18" } + "@sentry-internal/feedback@10.11.0": resolution: { @@ -2300,6 +2317,13 @@ packages: } engines: { node: ">=18" } + "@sentry-internal/feedback@8.55.0": + resolution: + { + integrity: 
sha512-cP3BD/Q6pquVQ+YL+rwCnorKuTXiS9KXW8HNKu4nmmBAyf7urjs+F6Hr1k9MXP5yQ8W3yK7jRWd09Yu6DHWOiw==, + } + engines: { node: ">=14.18" } + "@sentry-internal/replay-canvas@10.11.0": resolution: { @@ -2307,6 +2331,13 @@ packages: } engines: { node: ">=18" } + "@sentry-internal/replay-canvas@8.55.0": + resolution: + { + integrity: sha512-nIkfgRWk1091zHdu4NbocQsxZF1rv1f7bbp3tTIlZYbrH62XVZosx5iHAuZG0Zc48AETLE7K4AX9VGjvQj8i9w==, + } + engines: { node: ">=14.18" } + "@sentry-internal/replay@10.11.0": resolution: { @@ -2314,6 +2345,13 @@ packages: } engines: { node: ">=18" } + "@sentry-internal/replay@8.55.0": + resolution: + { + integrity: sha512-roCDEGkORwolxBn8xAKedybY+Jlefq3xYmgN2fr3BTnsXjSYOPC7D1/mYqINBat99nDtvgFvNfRcZPiwwZ1hSw==, + } + engines: { node: ">=14.18" } + "@sentry/babel-plugin-component-annotate@4.3.0": resolution: { @@ -2328,6 +2366,13 @@ packages: } engines: { node: ">=18" } + "@sentry/browser@8.55.0": + resolution: + { + integrity: sha512-1A31mCEWCjaMxJt6qGUK+aDnLDcK6AwLAZnqpSchNysGni1pSn1RWSmk9TBF8qyTds5FH8B31H480uxMPUJ7Cw==, + } + engines: { node: ">=14.18" } + "@sentry/bundler-plugin-core@4.3.0": resolution: { @@ -2421,6 +2466,13 @@ packages: } engines: { node: ">=18" } + "@sentry/core@8.55.0": + resolution: + { + integrity: sha512-6g7jpbefjHYs821Z+EBJ8r4Z7LT5h80YSWRJaylGS4nW5W5Z2KXzpdnyFarv37O7QjauzVC2E+PABmpkw5/JGA==, + } + engines: { node: ">=14.18" } + "@sentry/nextjs@10.11.0": resolution: { @@ -4029,6 +4081,12 @@ packages: } engines: { node: ">=8" } + bowser@2.12.1: + resolution: + { + integrity: sha512-z4rE2Gxh7tvshQ4hluIT7XcFrgLIQaw9X3A+kTTRdovCz5PMukm/0QC/BKSYPj3omF5Qfypn9O/c5kgpmvYUCw==, + } + brace-expansion@1.1.12: resolution: { @@ -9288,6 +9346,14 @@ snapshots: "@jridgewell/trace-mapping": 0.3.9 optional: true + "@daily-co/daily-js@0.84.0": + dependencies: + "@babel/runtime": 7.28.2 + "@sentry/browser": 8.55.0 + bowser: 2.12.1 + dequal: 2.0.3 + events: 3.3.0 + "@emnapi/core@1.4.5": dependencies: "@emnapi/wasi-threads": 1.0.4 @@ -10506,20 +10572,38 @@ snapshots: dependencies: "@sentry/core": 10.11.0 + "@sentry-internal/browser-utils@8.55.0": + dependencies: + "@sentry/core": 8.55.0 + "@sentry-internal/feedback@10.11.0": dependencies: "@sentry/core": 10.11.0 + "@sentry-internal/feedback@8.55.0": + dependencies: + "@sentry/core": 8.55.0 + "@sentry-internal/replay-canvas@10.11.0": dependencies: "@sentry-internal/replay": 10.11.0 "@sentry/core": 10.11.0 + "@sentry-internal/replay-canvas@8.55.0": + dependencies: + "@sentry-internal/replay": 8.55.0 + "@sentry/core": 8.55.0 + "@sentry-internal/replay@10.11.0": dependencies: "@sentry-internal/browser-utils": 10.11.0 "@sentry/core": 10.11.0 + "@sentry-internal/replay@8.55.0": + dependencies: + "@sentry-internal/browser-utils": 8.55.0 + "@sentry/core": 8.55.0 + "@sentry/babel-plugin-component-annotate@4.3.0": {} "@sentry/browser@10.11.0": @@ -10530,6 +10614,14 @@ snapshots: "@sentry-internal/replay-canvas": 10.11.0 "@sentry/core": 10.11.0 + "@sentry/browser@8.55.0": + dependencies: + "@sentry-internal/browser-utils": 8.55.0 + "@sentry-internal/feedback": 8.55.0 + "@sentry-internal/replay": 8.55.0 + "@sentry-internal/replay-canvas": 8.55.0 + "@sentry/core": 8.55.0 + "@sentry/bundler-plugin-core@4.3.0": dependencies: "@babel/core": 7.28.3 @@ -10590,6 +10682,8 @@ snapshots: "@sentry/core@10.11.0": {} + "@sentry/core@8.55.0": {} + 
"@sentry/nextjs@10.11.0(@opentelemetry/context-async-hooks@2.1.0(@opentelemetry/api@1.9.0))(@opentelemetry/core@2.1.0(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-base@2.1.0(@opentelemetry/api@1.9.0))(next@15.5.3(@babel/core@7.28.3)(@opentelemetry/api@1.9.0)(babel-plugin-macros@3.1.0)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)(sass@1.90.0))(react@18.3.1)(webpack@5.101.3)": dependencies: "@opentelemetry/api": 1.9.0 @@ -11967,6 +12061,8 @@ snapshots: binary-extensions@2.3.0: {} + bowser@2.12.1: {} + brace-expansion@1.1.12: dependencies: balanced-match: 1.0.2