# Daily.co and Reflector Data Model

This document explains how Daily.co's API concepts map onto Reflector's database schema and clarifies common sources of confusion.

---

## Table of Contents

1. [Core Entities Overview](#core-entities-overview)
2. [Daily.co vs Reflector Terminology](#dailyco-vs-reflector-terminology)
3. [Entity Relationships](#entity-relationships)
4. [Recording Multiplicity](#recording-multiplicity)
5. [Session Identifiers Explained](#session-identifiers-explained)
6. [Time-Based Matching](#time-based-matching)
7. [Multitrack Recording Details](#multitrack-recording-details)
8. [Example](#example)

---

## Core Entities Overview

### Reflector's Four Primary Entities

```
┌─────────────────────────────────────────────────────────────────┐
│  Room (Reflector)                                               │
│  - Persistent meeting template                                  │
│  - User-created configuration                                   │
│  - Example: "team-standup"                                      │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:N
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│  Meeting (Reflector)                                            │
│  - Single session instance                                      │
│  - Creates NEW Daily.co room with timestamp                     │
│  - Example: "team-standup-20260115120000"                       │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:N
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│  Recording (Reflector + Daily.co)                               │
│  - One segment of audio/video                                   │
│  - New recording created on stop/restart                        │
│  - track_keys: JSON array of S3 file paths                      │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:1
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│  Transcript (Reflector)                                         │
│  - Processed audio with transcription                           │
│  - Diarization, summaries, topics                               │
│  - One transcript per recording                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Daily.co vs Reflector Terminology

### Room

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Virtual meeting space on Daily.co platform | User-created meeting template/configuration |
| **Lifetime** | Configurable expiration | Persistent until user deletes |
| **Creation** | API call for each meeting | Pre-created by user once |
| **Reuse** | Can host multiple sessions | Generates new Daily.co room per meeting |
| **Name Format** | `room-name` (reusable) | `room-name` (base identifier) |
| **Timestamping** | Not required | Meeting adds timestamp: `{name}-YYYYMMDDHHMMSS` |

**Example:**

```
Reflector Room: "daily-private-igor" (persistent config)
    ↓ starts meeting
Daily.co Room: "daily-private-igor-20260110042117"
```

### Meeting

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Session that starts when first participant joins | Explicit database record of a session |
| **Identifier** | `mtgSessionId` (generated by Daily.co) | `meeting.id` (UUID, generated by Reflector) |
| **Creation** | Implicit (first participant join) | Explicit API call before participants join |
| **Purpose** | Tracks active session state | Links recordings, transcripts, participants |
| **Scope** | Per room instance | Per Reflector room + timestamp |

**Critical Limitation:** Daily.co's recordings API often does NOT return `mtgSessionId`, so recordings must be linked to meetings by time-based matching (see [Time-Based Matching](#time-based-matching)).
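As a rough illustration of the idea, the sketch below picks the meeting with a matching `room_name` whose `start_date` is closest to the recording's `start_ts`. It is not the code in `get_by_room_name_and_time()`: the `Meeting` dataclass, the `match_meeting` helper, the closest-start rule, and the 24-hour tolerance are all assumptions made for this example. Only the field names and the sample values come from this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Meeting:
    """Hypothetical subset of a Reflector meeting row."""
    id: str
    room_name: str          # timestamped Daily.co room name
    start_date: datetime    # timezone-aware


def match_meeting(
    meetings: Sequence[Meeting],
    room_name: str,
    start_ts: int,
    tolerance: timedelta = timedelta(hours=24),  # assumed window, not Reflector's
) -> Optional[Meeting]:
    """Return the meeting in `room_name` that started closest to `start_ts`.

    `start_ts` is the recording start in Unix seconds, as returned by Daily.co.
    Returns None when no meeting in that room falls inside the tolerance.
    """
    recorded_at = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    candidates = [
        m for m in meetings
        if m.room_name == room_name
        and abs(m.start_date - recorded_at) <= tolerance
    ]
    return min(
        candidates,
        key=lambda m: abs(m.start_date - recorded_at),
        default=None,
    )


meetings = [
    Meeting(
        id="034804b8-cee2-4fb4-94d7-122f6f068a61",
        room_name="daily-private-igor-20260110042117",
        start_date=datetime(2026, 1, 10, 4, 21, 17, tzinfo=timezone.utc),
    ),
]
matched = match_meeting(meetings, "daily-private-igor-20260110042117", 1768018896)
```

Because every Reflector meeting gets its own timestamped Daily.co room, the `room_name` filter usually leaves a single candidate. The time comparison mainly guards against stale or duplicate rows.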
### Recording

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Audio/video files on S3 | Metadata + processing status |
| **Types** | `cloud` (composed video), `raw-tracks` (multitrack) | Stores references + `track_keys` array |
| **Multiplicity** | One recording object per start/stop cycle | One DB row per Daily.co recording object |
| **Identifier** | Daily.co `recording_id` | Same `recording_id` (stored in DB) |
| **Multitrack** | Array of `.webm` files (one per participant) | `track_keys` JSON array with S3 paths |
| **Linkage** | Via `room_name` + `start_ts` | FK `meeting_id` (set via time-based match) |

**Critical Behavior:** Each recording **stop/restart** creates a **separate recording object** with its own unique ID.

---

## Entity Relationships

### Database Schema Relationships

```sql
-- Simplified schema showing key relationships

CREATE TABLE room (
    id VARCHAR PRIMARY KEY,
    name VARCHAR UNIQUE,
    platform VARCHAR              -- 'whereby' | 'daily'
);

CREATE TABLE meeting (
    id VARCHAR PRIMARY KEY,
    room_id VARCHAR REFERENCES room(id) ON DELETE CASCADE,  -- nullable
    room_name VARCHAR,            -- Daily.co room name (timestamped)
    start_date TIMESTAMP,
    platform VARCHAR
);

CREATE TABLE recording (
    id VARCHAR PRIMARY KEY,       -- Daily.co recording_id
    meeting_id VARCHAR,           -- FK to meeting (set via time-based match)
    bucket_name VARCHAR,
    object_key VARCHAR,           -- S3 prefix
    track_keys JSON,              -- Array of S3 keys for multitrack
    recorded_at TIMESTAMP
);

CREATE TABLE transcript (
    id VARCHAR PRIMARY KEY,
    recording_id VARCHAR,         -- nullable FK
    meeting_id VARCHAR,           -- nullable FK
    room_id VARCHAR,              -- nullable FK
    participants JSON,            -- [{id, speaker, name, user_id}, ...]
    title VARCHAR,
    long_summary VARCHAR,
    webvtt TEXT
);
```

**Relationship Cardinalities:**

```
1 Room → N Meetings
1 Meeting → N Recordings (common: 1-21 recordings per meeting)
1 Recording → 1 Transcript
1 Meeting → N Transcripts (via recordings)
```

---

## Recording Multiplicity

### Why Multiple Recordings Per Meeting?
Daily.co creates a **new recording object** (new ID, new files) whenever recording stops and restarts. This happens for three reasons:

1. **Manual stop/start** - User clicks stop, then starts recording again
2. **Network reconnection** - Participant drops, reconnects → triggers restart
3. **Participant rejoin** - Last participant leaves, new one joins → new session

---

## Session Identifiers Explained

### The Hidden Entity: Daily.co Meeting Session

Daily.co has an **implicit ephemeral entity** that sits between Room and Recording:

```
Daily.co Room: "daily-private-igor-20260110042117"
│
├─ Daily.co Meeting Session #1 (mtgSessionId: c04334de...)
│  └─ Recording #1 (f4a50f94) - 4s, 1 track
│
└─ Daily.co Meeting Session #2 (mtgSessionId: 4cdae3c0...)
   ├─ Recording #2 (b0fa94da) - 80s, 2 tracks  ← recording stopped
   └─ Recording #3 (05edf519) - 62s, 1 track   ← then restarted
```

Recordings are numbered chronologically by `start_ts`, matching the numbering used in the [Example](#example) section.

**Daily.co Meeting Session:**

- **Lifecycle:** Starts when first participant joins, ends when last participant leaves
- **Identifier:** `mtgSessionId` (generated by Daily.co)
- **Persistence:** Ephemeral - new ID if everyone leaves and someone rejoins
- **Relationship:** 1 Session → N Recordings (if recording stops/restarts during session)

**Key Insight:** Multiple recordings can share the same `mtgSessionId` if recording was stopped and restarted while participants remained connected.

### mtgSessionId (Meeting Session Identifier)

`mtgSessionId` identifies a **Daily.co meeting session** (not an individual participant, and not a room).

### session_id (Per-Participant)

`session_id` is a **different concept**: a per-participant connection identifier delivered by webhooks.
**Reflector Tracking:** `daily_participant_session` table

```sql
CREATE TABLE daily_participant_session (
    id VARCHAR PRIMARY KEY,       -- {meeting_id}:{user_id}:{joined_at_ms}
    meeting_id VARCHAR,
    session_id VARCHAR,           -- From webhook (per-participant)
    user_id VARCHAR,
    user_name VARCHAR,
    joined_at TIMESTAMP,
    left_at TIMESTAMP
);
```

---

## Time-Based Matching

### Problem Statement

Daily.co's recordings API does not reliably return `mtgSessionId`, so recordings cannot be linked to meetings directly through Daily.co's identifiers.

**Example API response:**

```json
{
  "id": "recording-uuid",
  "room_name": "daily-private-igor-20260110042117",
  "start_ts": 1768018896,
  "mtgSessionId": null  ← Missing!
}
```

### Solution: Time-Based Matching

Reflector links each recording to a meeting by `room_name` and `start_ts` instead, and stores the result in `recording.meeting_id`.

**Implementation:** `reflector/db/meetings.py:get_by_room_name_and_time()`

---

## Multitrack Recording Details

### track_keys JSON Array

**Schema:** `recording.track_keys` (JSON, nullable)

Example recording with 2 audio tracks:

```json
{
  "id": "b0fa94da-73b5-4f95-9239-5216a682a505",
  "track_keys": [
    "igormonadical/daily-private-igor-20260110042117/1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565",
    "igormonadical/daily-private-igor-20260110042117/1768018896877-9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-1768018899286"
  ]
}
```

**Semantics:**

- `track_keys = null` → Not multitrack (cloud recording)
- `track_keys = []` → Multitrack recording with no audio captured (silence/muted)
- `track_keys = [...]` → Multitrack with N audio tracks

**Property:** `recording.is_multitrack` (Python)

```python
@property
def is_multitrack(self) -> bool:
    return self.track_keys is not None and len(self.track_keys) > 0
```

As written, this property returns `False` for `track_keys = []`, so a multitrack recording with no captured audio is not reported as multitrack by this check.

### Track Filename Format

Daily.co multitrack filenames encode timing and participant information:

**Format:** `{recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}`

**Example:** `1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565`

**Parsed Components:**

```python
# reflector/utils/daily.py:25-60
from typing import NamedTuple


class DailyRecordingFilename(NamedTuple):
    recording_start_ts: int  # 1768018896877 (milliseconds)
    participant_id: str      # 890c0eae-e186-4534-a7bd-7c794b7d6d7f
    track_start_ts: int      # 1768018914565 (milliseconds)
```

**Note:** Browser downloads from S3 add a `.webm` extension because of the MIME headers, but the S3 object keys themselves have no extension.

### Video Track Filtering

The Daily.co API returns both audio and video tracks, but Reflector only processes audio.

**Filtering Logic:** `reflector/worker/process.py:660`

```python
track_keys = [t.s3Key for t in recording.tracks if t.type == "audio"]
```

**Example API Response:**

```json
{
  "tracks": [
    {"type": "audio", "s3Key": "...cam-audio-1768018914565"},
    {"type": "audio", "s3Key": "...cam-audio-1768018899286"},
    {"type": "video", "s3Key": "...cam-video-1768018897095"}  ← Filtered out
  ]
}
```

**Result:** Only the 2 audio tracks are stored in `recording.track_keys`; the video track is discarded.

**Rationale:** Reflector is an audio transcription system, so video is not needed for processing.

### Track-to-Participant Mapping

**Flow:**

1. Daily.co webhook/polling provides the `track_keys` array
2. Each track filename contains a `participant_id`
3. Reflector queries the Daily.co API: `GET /meetings/{mtgSessionId}/participants`
4. Maps `participant_id` → `user_name`
5. Stores the result in `transcript.participants` JSON:

```json
[
  {
    "id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f",
    "speaker": 0,
    "name": "test2",
    "user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
  },
  {
    "id": "9660e8e9-4297-4f17-951d-0b2bf2401803",
    "speaker": 1,
    "name": "test",
    "user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
  }
]
```

**Diarization:** Multitrack recordings don't need AI speaker diarization, because each participant is recorded on a separate audio track and speaker identity follows from the track.

---

## Example

### Meeting: daily-private-igor-20260110042117

**Context:** A user ran a test recording with start/stop cycles, which produced 3 recordings.
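When reading the track keys in this example, it helps to split each filename into the three parts described under Track Filename Format. The sketch below does that with a regular expression. `parse_track_key` and the pattern are hypothetical and written for this document; the actual parser in `reflector/utils/daily.py` may differ. The sketch assumes that `participant_id` is always a 36-character UUID and that the track is a `cam-audio` track.

```python
import re
from typing import NamedTuple


class DailyRecordingFilename(NamedTuple):
    """Mirrors the NamedTuple shown in Track Filename Format."""
    recording_start_ts: int  # milliseconds
    participant_id: str
    track_start_ts: int      # milliseconds


# Assumed pattern: {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}
_TRACK_RE = re.compile(
    r"^(?P<rec>\d+)-(?P<pid>[0-9a-f-]{36})-cam-audio-(?P<track>\d+)$"
)


def parse_track_key(s3_key: str) -> DailyRecordingFilename:
    """Parse the last path segment of an S3 track key (hypothetical helper)."""
    filename = s3_key.rsplit("/", 1)[-1]
    match = _TRACK_RE.match(filename)
    if match is None:
        raise ValueError(f"not a cam-audio track key: {s3_key!r}")
    return DailyRecordingFilename(
        recording_start_ts=int(match["rec"]),
        participant_id=match["pid"],
        track_start_ts=int(match["track"]),
    )


key = (
    "igormonadical/daily-private-igor-20260110042117/"
    "1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565"
)
parsed = parse_track_key(key)
# Offset of this participant's track relative to the recording start
offset_ms = parsed.track_start_ts - parsed.recording_start_ts
```

For this key the offset is 17688 ms, meaning the participant's audio track began roughly 17.7 s after the recording itself started. Video keys (`cam-video`) are rejected with a `ValueError`, which matches the audio-only filtering described earlier.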
#### Database State

```sql
-- Meeting
id: 034804b8-cee2-4fb4-94d7-122f6f068a61
room_name: daily-private-igor-20260110042117
start_date: 2026-01-10 04:21:17+00
```

#### Daily.co API Response

```json
[
  {
    "id": "f4a50f94-053c-4f9d-bda6-78ad051fbc36",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018885,
    "duration": 4,
    "status": "finished",
    "mtgSessionId": "c04334de-42a0-4c2a-96be-a49b068dca85",
    "tracks": [
      {"type": "audio", "s3Key": "...62e8f3ae...cam-audio-1768018885417"}
    ]
  },
  {
    "id": "b0fa94da-73b5-4f95-9239-5216a682a505",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018896,
    "duration": 80,
    "status": "finished",
    "mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
    "tracks": [
      {"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914565"},
      {"type": "audio", "s3Key": "...9660e8e9...cam-audio-1768018899286"},
      {"type": "video", "s3Key": "...9660e8e9...cam-video-1768018897095"}
    ]
  },
  {
    "id": "05edf519-9048-4b49-9a75-73e9826fd950",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018914,
    "duration": 62,
    "status": "finished",
    "mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
    "tracks": [
      {"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914948"}
    ]
  }
]
```

**Key Observations:**

- 3 recording objects returned by Daily.co
- 2 different `mtgSessionId` values (2 different Daily.co meeting sessions)
- Recording #2 has 3 tracks (2 audio + 1 video)
- Timestamps: 1768018885 → 1768018896 (+11s) → 1768018914 (+18s)

#### Reflector Database

**Recordings:**

```
┌──────────────────────────────────────┬──────────────┬────────────┬──────────────────────────────────────┐
│ id                                   │ track_count  │ duration   │ mtgSessionId                         │
├──────────────────────────────────────┼──────────────┼────────────┼──────────────────────────────────────┤
│ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1            │ 4s         │ c04334de-42a0-4c2a-96be-a49b068dca85 │
│ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (video=0)  │ 80s        │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
│ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1            │ 62s        │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
└──────────────────────────────────────┴──────────────┴────────────┴──────────────────────────────────────┘
```

**Note:** Recording #2 has 2 audio tracks (video filtered out), not 3.

**Transcripts:**

```
┌──────────────────────────────────────┬──────────────────────────────────────┬──────────────┬──────────────────────────────────────────────┐
│ id                                   │ recording_id                         │ participants │ title                                        │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────┼──────────────────────────────────────────────┤
│ 17149b1f-546c-4837-80a0-f8140bd16592 │ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1 (test)     │ (empty - no speech)                          │
│ 49801332-3222-4c11-bdb2-375479fc87f2 │ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (test,     │ "Examination and Validation Procedures       │
│                                      │                                      │    test2)    │ Review"                                      │
│ e5271e12-20fb-42d2-b5a8-21438abadef9 │ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1 (test2)    │ "Technical Sound Check Procedure Review"     │
└──────────────────────────────────────┴──────────────────────────────────────┴──────────────┴──────────────────────────────────────────────┘
```

**Transcript Content:**

*Transcript #1* (17149b1f): Empty WebVTT (no audio captured)

*Transcript #2* (49801332):

```webvtt
WEBVTT

00:00:03.109 --> 00:00:05.589
Test, test, test. Test, test, test, test, test.

00:00:19.829 --> 00:00:22.710
Test test test test test test test test test test test.
```

**AI-Generated Summary:**

> "The meeting focused on the critical importance of rigorous testing for ensuring reliability and quality, with test and test2 emphasizing the need for a structured testing framework and meticulous documentation..."

*Transcript #3* (e5271e12):

```webvtt
WEBVTT

00:00:02.029 --> 00:00:04.910
Test, test, test, test, test, test, test, test, test, test, test.
```

#### Validation: track_keys → participants

**Recording #2 (b0fa94da) tracks:**

```json
[
  ".../890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-...",
  ".../9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-..."
]
```

**Transcript #2 (49801332) participants:**

```json
[
  {"id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f", "speaker": 0, "name": "test2"},
  {"id": "9660e8e9-4297-4f17-951d-0b2bf2401803", "speaker": 1, "name": "test"}
]
```

The participant IDs embedded in the two track filenames match the `id` values in the transcript's participant list.

### Data Flow

```
Daily.co API: 3 recordings
    ↓
Polling: _poll_raw_tracks_recordings()
    ↓
Worker: process_multitrack_recording.delay() × 3
    ↓
DB: 3 recording rows created
    ↓
Pipeline: Audio processing + transcription × 3
    ↓
DB: 3 transcript rows created (1:1 with recordings)
    ↓
UI: User sees 3 separate transcripts
```

**Result:** ✅ 1:1 Recording → Transcript relationship maintained.

---

**Document Version:** 1.0

**Last Verified:** 2026-01-15

**Data Source:** Production database + Daily.co API inspection