Daily.co and Reflector Data Model
This document explains the data model relationships between Daily.co's API concepts and Reflector's database schema, clarifying common sources of confusion.
Table of Contents
- Core Entities Overview
- Daily.co vs Reflector Terminology
- Entity Relationships
- Recording Multiplicity
- Session Identifiers Explained
- Time-Based Matching
- Multitrack Recording Details
- Verified Example
Core Entities Overview
Reflector's Four Primary Entities
┌─────────────────────────────────────────────────────────────────┐
│ Room (Reflector) │
│ - Persistent meeting template │
│ - User-created configuration │
│ - Example: "team-standup" │
└────────────────────┬────────────────────────────────────────────┘
│ 1:N
▼
┌─────────────────────────────────────────────────────────────────┐
│ Meeting (Reflector) │
│ - Single session instance │
│ - Creates NEW Daily.co room with timestamp │
│ - Example: "team-standup-20260115120000" │
└────────────────────┬────────────────────────────────────────────┘
│ 1:N
▼
┌─────────────────────────────────────────────────────────────────┐
│ Recording (Reflector + Daily.co) │
│ - One segment of audio/video │
│ - New recording created on stop/restart │
│ - track_keys: JSON array of S3 file paths │
└────────────────────┬────────────────────────────────────────────┘
│ 1:1
▼
┌─────────────────────────────────────────────────────────────────┐
│ Transcript (Reflector) │
│ - Processed audio with transcription │
│ - Diarization, summaries, topics │
│ - One transcript per recording │
└─────────────────────────────────────────────────────────────────┘
Daily.co vs Reflector Terminology
Room
| Aspect | Daily.co | Reflector |
|---|---|---|
| Definition | Virtual meeting space on Daily.co platform | User-created meeting template/configuration |
| Lifetime | Configurable expiration | Persistent until user deletes |
| Creation | API call for each meeting | Pre-created by user once |
| Reuse | Can host multiple sessions | Generates new Daily.co room per meeting |
| Name Format | room-name (reusable) | room-name (base identifier) |
| Timestamping | Not required | Meeting adds timestamp: {name}-YYYYMMDDHHMMSS |
Example:
Reflector Room: "daily-private-igor" (persistent config)
↓ starts meeting
Daily.co Room: "daily-private-igor-20260110042117"
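The naming convention above can be sketched in Python. This is a minimal illustration, not Reflector's actual code: the helper name daily_room_name is hypothetical, and the use of UTC is an assumption that happens to agree with the example meeting in this document (start_date 2026-01-10 04:21:17+00).

```python
from datetime import datetime, timezone


def daily_room_name(base_name: str, started_at: datetime) -> str:
    """Build a timestamped Daily.co room name: {name}-YYYYMMDDHHMMSS."""
    # Normalize to UTC so the suffix does not depend on the caller's timezone
    # (assumption: Reflector timestamps room names in UTC).
    utc = started_at.astimezone(timezone.utc)
    return f"{base_name}-{utc:%Y%m%d%H%M%S}"


started = datetime(2026, 1, 10, 4, 21, 17, tzinfo=timezone.utc)
room_name = daily_room_name("daily-private-igor", started)
print(room_name)  # daily-private-igor-20260110042117
```

Because the suffix has one-second resolution, two meetings started from the same Reflector room in the same second would collide; how Reflector handles that case is not covered here.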
Meeting
| Aspect | Daily.co | Reflector |
|---|---|---|
| Definition | Session that starts when first participant joins | Explicit database record of a session |
| Identifier | mtgSessionId (generated by Daily.co) | meeting.id (UUID, generated by Reflector) |
| Creation | Implicit (first participant join) | Explicit API call before participants join |
| Purpose | Tracks active session state | Links recordings, transcripts, participants |
| Scope | Per room instance | Per Reflector room + timestamp |
Critical Limitation: Daily.co's recordings API often does NOT return mtgSessionId (can be null), requiring time-based matching (see Time-Based Matching).
Recording
| Aspect | Daily.co | Reflector |
|---|---|---|
| Definition | Audio/video files on S3 | Metadata + processing status |
| Types | cloud (composed video), raw-tracks (multitrack) | Stores references + track_keys array |
| Multiplicity | One recording object per start/stop cycle | One DB row per Daily.co recording object |
| Identifier | Daily.co recording_id | Same recording_id (stored in DB) |
| Multitrack | Array of .webm files (one per participant) | track_keys JSON array with S3 paths |
| Linkage | Via room_name + start_ts | FK meeting_id (set via time-based match) |
Critical Behavior: Recording stops/restarts create separate recording objects with unique IDs.
instanceId (Reflector-Generated)
Definition: UUID we generate and send when starting recording via REST API.
Generation: Deterministic from meeting_id
- Cloud: instanceId = meeting_id directly
- Raw-tracks: instanceId = UUIDv5(meeting_id, namespace)
Key behaviors:
- ✅ Reuse allowed: Same instanceId can be used after stop (validated 2026-01-20)
- ❌ Not returned: Daily.co does NOT echo instanceId back in GET /recordings response
- ✅ Present in error webhooks: recording.error webhook includes instanceId
- Purpose: Allows multiple concurrent recordings (cloud + raw-tracks) in same room
Stop/restart example:
Recording 1: POST /start with instanceId="779e6376..." → recording_id="ee00c4e8..."
Stop recording
Recording 2: POST /start with instanceId="779e6376..." (SAME) → recording_id="b702f509..." (DIFFERENT)
✅ Both succeed, different recording_ids returned
Implication: Cannot match recordings by instanceId (not in response) - must use recording_id.
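The deterministic generation described above can be sketched with Python's standard uuid module. The namespace value and both helper names are assumptions made for illustration; the namespace Reflector actually uses is not stated in this document.

```python
import uuid

# Hypothetical namespace, for illustration only. The real namespace used by
# Reflector is not given in this document.
RAW_TRACKS_NAMESPACE = uuid.NAMESPACE_URL


def cloud_instance_id(meeting_id: str) -> str:
    """Cloud recordings use the meeting ID itself as the instanceId."""
    return meeting_id


def raw_tracks_instance_id(meeting_id: str) -> str:
    """Raw-tracks recordings derive a separate UUIDv5 from the meeting ID."""
    return str(uuid.uuid5(RAW_TRACKS_NAMESPACE, meeting_id))


meeting_id = "034804b8-cee2-4fb4-94d7-122f6f068a61"
cloud_id = cloud_instance_id(meeting_id)
raw_id = raw_tracks_instance_id(meeting_id)

# The same meeting always maps to the same pair of instanceIds, and the
# two values differ, so both recording types can run in one room.
assert raw_id == raw_tracks_instance_id(meeting_id)
assert raw_id != cloud_id
```

Deriving both values from meeting_id means nothing extra has to be stored to restart a recording: the same instanceId can be recomputed after a stop.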
Entity Relationships
Database Schema Relationships
-- Simplified schema showing key relationships
TABLE room (
    id VARCHAR PRIMARY KEY,
    name VARCHAR UNIQUE,
    platform VARCHAR -- 'whereby' | 'daily'
)
TABLE meeting (
    id VARCHAR PRIMARY KEY,
    room_id VARCHAR REFERENCES room(id) ON DELETE CASCADE, -- nullable
    room_name VARCHAR, -- Daily.co room name (timestamped)
    start_date TIMESTAMP,
    platform VARCHAR
)
TABLE recording (
    id VARCHAR PRIMARY KEY, -- Daily.co recording_id
    meeting_id VARCHAR, -- FK to meeting (set via time-based match)
    bucket_name VARCHAR,
    object_key VARCHAR, -- S3 prefix
    track_keys JSON, -- Array of S3 keys for multitrack
    recorded_at TIMESTAMP
)
TABLE transcript (
    id VARCHAR PRIMARY KEY,
    recording_id VARCHAR, -- nullable FK
    meeting_id VARCHAR, -- nullable FK
    room_id VARCHAR, -- nullable FK
    participants JSON, -- [{id, speaker, name, user_id}, ...]
    title VARCHAR,
    long_summary VARCHAR,
    webvtt TEXT
)
Relationship Cardinalities:
1 Room → N Meetings
1 Meeting → N Recordings (common: 1-21 recordings per meeting)
1 Recording → 1 Transcript
1 Meeting → N Transcripts (via recordings)
Recording Multiplicity
Why Multiple Recordings Per Meeting?
Daily.co creates a new recording object (new ID, new files) whenever recording stops and restarts. This happens due to:
- Manual stop/start - User clicks stop, then start recording again
- Network reconnection - Participant drops, reconnects → triggers restart
- Participant rejoin - Last participant leaves, new one joins → new session
Session Identifiers Explained
The Hidden Entity: Daily.co Meeting Session
Daily.co has an implicit ephemeral entity that sits between Room and Recording:
Daily.co Room: "daily-private-igor-20260110042117"
│
├─ Daily.co Meeting Session #1 (mtgSessionId: c04334de...)
│ └─ Recording #3 (f4a50f94) - 4s, 1 track
│
└─ Daily.co Meeting Session #2 (mtgSessionId: 4cdae3c0...)
├─ Recording #2 (b0fa94da) - 80s, 2 tracks ← recording stopped
└─ Recording #1 (05edf519) - 62s, 1 track ← then restarted
Daily.co Meeting Session:
- Lifecycle: Starts when first participant joins, ends when last participant leaves
- Identifier: mtgSessionId (generated by Daily.co)
- Persistence: Ephemeral - new ID if everyone leaves and someone rejoins
- Relationship: 1 Session → N Recordings (if recording stops/restarts during session)
Key Insight: Multiple recordings can share the same mtgSessionId if recording was stopped and restarted while participants remained connected.
mtgSessionId (Meeting Session Identifier)
mtgSessionId identifies a Daily.co meeting session (not individual participants, not a room).
Reliability: Unreliable. The field may be either null or populated in the GET /recordings response, so code must handle both cases.
When present: Multiple recordings from same session (stop/restart with participants connected) share same mtgSessionId.
Example (validated 2026-01-20):
Recording 1: {"id": "ee00c4e8...", "mtgSessionId": "92c4136a-a8da-41c5-9c45-e9a2baae6bd6"}
Recording 2: {"id": "b702f509...", "mtgSessionId": "92c4136a-a8da-41c5-9c45-e9a2baae6bd6"}
// Same mtgSessionId (stop/restart in same session)
When null: Common - Daily.co API does not reliably populate this field.
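As a sketch of how both cases might be handled, the snippet below groups recordings by mtgSessionId and collects recordings without one under None. The function name group_by_session is hypothetical, and the third record is modelled on the null-valued API response shown elsewhere in this document.

```python
from collections import defaultdict
from typing import Optional


def group_by_session(recordings: list[dict]) -> dict[Optional[str], list[str]]:
    """Group recording IDs by mtgSessionId; None holds recordings without one."""
    groups: dict[Optional[str], list[str]] = defaultdict(list)
    for rec in recordings:
        # .get() covers both a missing key and an explicit null.
        groups[rec.get("mtgSessionId")].append(rec["id"])
    return dict(groups)


recordings = [
    {"id": "ee00c4e8...", "mtgSessionId": "92c4136a-a8da-41c5-9c45-e9a2baae6bd6"},
    {"id": "b702f509...", "mtgSessionId": "92c4136a-a8da-41c5-9c45-e9a2baae6bd6"},
    {"id": "recording-uuid", "mtgSessionId": None},
]
groups = group_by_session(recordings)
for session_id, recording_ids in groups.items():
    print(session_id, recording_ids)
```

Recordings that land in the None group carry no session information at all, which is why the field is treated as supplementary rather than as a key for linking.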
session_id (Per-Participant)
Different concept: Per-participant connection identifier from webhooks.
Reflector Tracking: daily_participant_session table
TABLE daily_participant_session (
    id VARCHAR PRIMARY KEY, -- {meeting_id}:{user_id}:{joined_at_ms}
    meeting_id VARCHAR,
    session_id VARCHAR, -- From webhook (per-participant)
    user_id VARCHAR,
    user_name VARCHAR,
    joined_at TIMESTAMP,
    left_at TIMESTAMP
)
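The composite primary key format in the comment above could be built as follows. This is a sketch only: the helper name is hypothetical, the join time is an illustrative value, and the code assumes joined_at is a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def participant_session_row_id(meeting_id: str, user_id: str, joined_at: datetime) -> str:
    """Compose the key {meeting_id}:{user_id}:{joined_at_ms}."""
    # Integer division of timedeltas avoids float rounding on the milliseconds.
    joined_at_ms = (joined_at - EPOCH) // timedelta(milliseconds=1)
    return f"{meeting_id}:{user_id}:{joined_at_ms}"


row_id = participant_session_row_id(
    "034804b8-cee2-4fb4-94d7-122f6f068a61",
    "907f2cc1-eaab-435f-8ee2-09185f416b22",
    datetime(2026, 1, 10, 4, 21, 39, 286000, tzinfo=timezone.utc),  # illustrative
)
print(row_id)
```

Including the join time in the key lets the same user appear more than once per meeting, one row per join.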
Time-Based Matching
Problem Statement
Daily.co's recordings API does not reliably return mtgSessionId, so recordings cannot be linked to meetings through Daily.co's identifiers alone.
Example API response (mtgSessionId can be null OR present):
{
"id": "recording-uuid",
"room_name": "daily-private-igor-20260110042117",
"start_ts": 1768018896,
"mtgSessionId": null // ← Often null (unreliable)
}
// OR (when present):
{
"id": "recording-uuid",
"mtgSessionId": "92c4136a-a8da-41c5-9c45-e9a2baae6bd6" // ← Sometimes present
}
Key insight: Cannot rely on mtgSessionId for matching (unreliable). instanceId also not returned. Only reliable identifier is recording.id.
Solution: Time-Based Matching
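Reflector links each recording to a meeting by combining the recording's room_name with its start_ts, then stores the matched meeting's ID in recording.meeting_id.
Implementation: reflector/db/meetings.py:get_by_room_name_and_time()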
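As a rough illustration of time-based matching, the sketch below picks the meeting with the same room_name whose start_date is closest to the recording's start_ts. It is not the body of get_by_room_name_and_time(), which is not reproduced in this document; the Meeting dataclass, the match_meeting name, and the 24-hour window are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Meeting:
    id: str
    room_name: str
    start_date: datetime


def match_meeting(
    meetings: list[Meeting],
    room_name: str,
    start_ts: int,
    max_gap: timedelta = timedelta(hours=24),  # assumed window, not from the source
) -> Optional[Meeting]:
    """Return the meeting in room_name that started closest to the recording."""
    recorded_at = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    candidates = [
        m for m in meetings
        if m.room_name == room_name and abs(recorded_at - m.start_date) <= max_gap
    ]
    return min(candidates, key=lambda m: abs(recorded_at - m.start_date), default=None)


meetings = [
    Meeting(
        id="034804b8-cee2-4fb4-94d7-122f6f068a61",
        room_name="daily-private-igor-20260110042117",
        start_date=datetime(2026, 1, 10, 4, 21, 17, tzinfo=timezone.utc),
    ),
]
matched = match_meeting(meetings, "daily-private-igor-20260110042117", 1768018896)
print(matched.id if matched else None)
```

Because each meeting gets its own timestamped Daily.co room, the room_name filter usually leaves a single candidate; the time comparison acts as a tie-breaker and sanity check.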
Multitrack Recording Details
track_keys JSON Array
Schema: recording.track_keys (JSON, nullable)
-- Example recording with 2 audio tracks
{
"id": "b0fa94da-73b5-4f95-9239-5216a682a505",
"track_keys": [
"igormonadical/daily-private-igor-20260110042117/1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565",
"igormonadical/daily-private-igor-20260110042117/1768018896877-9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-1768018899286"
]
}
Semantics:
- track_keys = null → Not multitrack (cloud recording)
- track_keys = [] → Multitrack recording with no audio captured (silence/muted)
- track_keys = [...] → Multitrack with N audio tracks
Property: recording.is_multitrack (Python)
@property
def is_multitrack(self) -> bool:
    return self.track_keys is not None and len(self.track_keys) > 0
Note: As written, this property returns False for an empty list, so a recording with track_keys = [] is not reported as multitrack by this check even though the semantics above classify it as a multitrack recording with no audio.
Track Filename Format
Daily.co multitrack filenames encode timing and participant information:
Format: {recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}
Example: 1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565
Parsed Components:
# reflector/utils/daily.py:25-60
from typing import NamedTuple

class DailyRecordingFilename(NamedTuple):
    recording_start_ts: int  # 1768018896877 (milliseconds)
    participant_id: str      # 890c0eae-e186-4534-a7bd-7c794b7d6d7f
    track_start_ts: int      # 1768018914565 (milliseconds)
Note: Browser downloads from S3 add .webm extension due to MIME headers, but S3 object keys have no extension.
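A parser for this filename format might look like the sketch below. The real parser in reflector/utils/daily.py is not shown in this document, so the function name parse_track_key and the regular expression are assumptions; this version only accepts cam-audio tracks with a lowercase UUID participant ID.

```python
import re
from typing import NamedTuple


class DailyRecordingFilename(NamedTuple):
    recording_start_ts: int  # milliseconds
    participant_id: str
    track_start_ts: int  # milliseconds


# {recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}
_TRACK_RE = re.compile(
    r"^(?P<rec>\d+)-(?P<pid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-cam-audio-(?P<track>\d+)$"
)


def parse_track_key(s3_key: str) -> DailyRecordingFilename:
    """Parse the last path component of an S3 track key."""
    filename = s3_key.rsplit("/", 1)[-1]
    match = _TRACK_RE.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized track filename: {filename!r}")
    return DailyRecordingFilename(int(match["rec"]), match["pid"], int(match["track"]))


key = (
    "igormonadical/daily-private-igor-20260110042117/"
    "1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565"
)
parsed = parse_track_key(key)
# Offset of this track relative to the start of the recording, in milliseconds.
offset_ms = parsed.track_start_ts - parsed.recording_start_ts
print(parsed.participant_id, offset_ms)  # 890c0eae-e186-4534-a7bd-7c794b7d6d7f 17688
```

The difference between the two timestamps gives how far into the recording a participant's track begins, which is useful when aligning tracks on a shared timeline.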
Video Track Filtering
Daily.co API returns both audio and video tracks, but Reflector only processes audio.
Filtering Logic: reflector/worker/process.py:660
track_keys = [t.s3Key for t in recording.tracks if t.type == "audio"]
Example API Response:
{
"tracks": [
{"type": "audio", "s3Key": "...cam-audio-1768018914565"},
{"type": "audio", "s3Key": "...cam-audio-1768018899286"},
{"type": "video", "s3Key": "...cam-video-1768018897095"} // ← Filtered out
]
}
Result: Only 2 audio tracks stored in recording.track_keys, video track discarded.
Rationale: Reflector is audio transcription system; video not needed for processing.
Track-to-Participant Mapping
Flow:
- Daily.co webhook/polling provides track_keys array
- Each track filename contains participant_id
- Reflector queries Daily.co API: GET /meetings/{mtgSessionId}/participants
- Maps participant_id → user_name
- Stores in transcript.participants JSON:
[
{
"id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f",
"speaker": 0,
"name": "test2",
"user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
},
{
"id": "9660e8e9-4297-4f17-951d-0b2bf2401803",
"speaker": 1,
"name": "test",
"user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
}
]
Diarization: Multitrack recordings don't need speaker diarization AI — speaker identity comes from separate audio tracks.
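The mapping flow can be sketched as below. Two points are assumptions: the names dictionary stands in for the Daily.co participants response (no API call is made), and the speaker index is taken from track order, which matches the example data in this document but is not confirmed as the actual rule. The function name build_participants is hypothetical, and user_id is omitted for brevity.

```python
def build_participants(track_keys: list[str], names: dict[str, str]) -> list[dict]:
    """Create one participant entry per audio track, in track order."""
    participants = []
    for speaker, key in enumerate(track_keys):
        filename = key.rsplit("/", 1)[-1]
        # Layout: {recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}
        head, _, _ = filename.partition("-cam-audio-")
        _, _, participant_id = head.partition("-")
        participants.append(
            {"id": participant_id, "speaker": speaker, "name": names.get(participant_id)}
        )
    return participants


track_keys = [
    "igormonadical/daily-private-igor-20260110042117/1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565",
    "igormonadical/daily-private-igor-20260110042117/1768018896877-9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-1768018899286",
]
names = {
    "890c0eae-e186-4534-a7bd-7c794b7d6d7f": "test2",
    "9660e8e9-4297-4f17-951d-0b2bf2401803": "test",
}
participants = build_participants(track_keys, names)
for p in participants:
    print(p)
```

A participant ID with no entry in names yields a name of None, so a failed participant lookup degrades to an unnamed speaker instead of dropping the track.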
Verified Example
Meeting: daily-private-igor-20260110042117
Context: User conducted test recording with start/stop cycles, producing 3 recordings.
Database State
-- Meeting
id: 034804b8-cee2-4fb4-94d7-122f6f068a61
room_name: daily-private-igor-20260110042117
start_date: 2026-01-10 04:21:17+00
Daily.co API Response
[
{
"id": "f4a50f94-053c-4f9d-bda6-78ad051fbc36",
"room_name": "daily-private-igor-20260110042117",
"start_ts": 1768018885,
"duration": 4,
"status": "finished",
"mtgSessionId": "c04334de-42a0-4c2a-96be-a49b068dca85",
"tracks": [
{"type": "audio", "s3Key": "...62e8f3ae...cam-audio-1768018885417"}
]
},
{
"id": "b0fa94da-73b5-4f95-9239-5216a682a505",
"room_name": "daily-private-igor-20260110042117",
"start_ts": 1768018896,
"duration": 80,
"status": "finished",
"mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
"tracks": [
{"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914565"},
{"type": "audio", "s3Key": "...9660e8e9...cam-audio-1768018899286"},
{"type": "video", "s3Key": "...9660e8e9...cam-video-1768018897095"}
]
},
{
"id": "05edf519-9048-4b49-9a75-73e9826fd950",
"room_name": "daily-private-igor-20260110042117",
"start_ts": 1768018914,
"duration": 62,
"status": "finished",
"mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
"tracks": [
{"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914948"}
]
}
]
Key Observations:
- 3 recording objects returned by Daily.co
- 2 different mtgSessionId values (2 different Daily.co meeting sessions)
- Recording #2 has 3 tracks (2 audio + 1 video)
- Timestamps: 1768018885 → 1768018896 (+11s) → 1768018914 (+18s)
Reflector Database
Recordings:
┌──────────────────────────────────────┬──────────────┬────────────┬──────────────────────────────────────┐
│ id │ track_count │ duration │ mtgSessionId │
├──────────────────────────────────────┼──────────────┼────────────┼──────────────────────────────────────┤
│ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1 │ 4s │ c04334de-42a0-4c2a-96be-a49b068dca85 │
│ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (video=0) │ 80s │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
│ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1 │ 62s │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
└──────────────────────────────────────┴──────────────┴────────────┴──────────────────────────────────────┘
Note: Recording #2 has 2 audio tracks (video filtered out), not 3.
Transcripts:
┌──────────────────────────────────────┬──────────────────────────────────────┬──────────────┬──────────────────────────────────────────────┐
│ id │ recording_id │ participants │ title │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────┼──────────────────────────────────────────────┤
│ 17149b1f-546c-4837-80a0-f8140bd16592 │ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1 (test) │ (empty - no speech) │
│ 49801332-3222-4c11-bdb2-375479fc87f2 │ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (test, │ "Examination and Validation Procedures │
│ │ │ test2) │ Review" │
│ e5271e12-20fb-42d2-b5a8-21438abadef9 │ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1 (test2) │ "Technical Sound Check Procedure Review" │
└──────────────────────────────────────┴──────────────────────────────────────┴──────────────┴──────────────────────────────────────────────┘
Transcript Content:
Transcript #1 (17149b1f): Empty WebVTT (no audio captured)
Transcript #2 (49801332):
WEBVTT
00:00:03.109 --> 00:00:05.589
<v Speaker1>Test, test, test. Test, test, test, test, test.
00:00:19.829 --> 00:00:22.710
<v Speaker0>Test test test test test test test test test test test.
AI-Generated Summary:
"The meeting focused on the critical importance of rigorous testing for ensuring reliability and quality, with test and test2 emphasizing the need for a structured testing framework and meticulous documentation..."
Transcript #3 (e5271e12):
WEBVTT
00:00:02.029 --> 00:00:04.910
<v Speaker0>Test, test, test, test, test, test, test, test, test, test, test.
Validation: track_keys → participants
Recording #2 (b0fa94da) tracks:
[
".../890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-...",
".../9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-..."
]
Transcript #2 (49801332) participants:
[
{"id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f", "speaker": 0, "name": "test2"},
{"id": "9660e8e9-4297-4f17-951d-0b2bf2401803", "speaker": 1, "name": "test"}
]
Data Flow
Daily.co API: 3 recordings
↓
Polling: _poll_raw_tracks_recordings()
↓
Worker: process_multitrack_recording.delay() × 3
↓
DB: 3 recording rows created
↓
Pipeline: Audio processing + transcription × 3
↓
DB: 3 transcript rows created (1:1 with recordings)
↓
UI: User sees 3 separate transcripts
Result: ✅ 1:1 Recording → Transcript relationship maintained.
Document Version: 1.1
Last Updated: 2026-01-20
Data Source: Production database + Daily.co API inspection + empirical testing
Changes in 1.1:
- Added instanceId behavior documentation (reuse allowed, not returned in API)
- Clarified mtgSessionId reliability (can be null or present)
- Added empirical validation of stop/restart behavior