# Daily.co and Reflector Data Model

This document explains the data model relationships between Daily.co's API concepts and Reflector's database schema, clarifying common sources of confusion.

---

## Table of Contents

1. [Core Entities Overview](#core-entities-overview)
2. [Daily.co vs Reflector Terminology](#dailyco-vs-reflector-terminology)
3. [Entity Relationships](#entity-relationships)
4. [Recording Multiplicity](#recording-multiplicity)
5. [Session Identifiers Explained](#session-identifiers-explained)
6. [Time-Based Matching](#time-based-matching)
7. [Multitrack Recording Details](#multitrack-recording-details)
8. [Verified Example](#verified-example)

---

## Core Entities Overview

### Reflector's Four Primary Entities

```
┌─────────────────────────────────────────────────────────────────┐
│                        Room (Reflector)                         │
│  - Persistent meeting template                                  │
│  - User-created configuration                                   │
│  - Example: "team-standup"                                      │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:N
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Meeting (Reflector)                       │
│  - Single session instance                                      │
│  - Creates NEW Daily.co room with timestamp                     │
│  - Example: "team-standup-20260115120000"                       │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:N
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                Recording (Reflector + Daily.co)                 │
│  - One segment of audio/video                                   │
│  - New recording created on stop/restart                        │
│  - track_keys: JSON array of S3 file paths                      │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:1
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Transcript (Reflector)                      │
│  - Processed audio with transcription                           │
│  - Diarization, summaries, topics                               │
│  - One transcript per recording                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Daily.co vs Reflector Terminology

### Room

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Virtual meeting space on Daily.co platform | User-created meeting template/configuration |
| **Lifetime** | Configurable expiration | Persistent until user deletes |
| **Creation** | API call for each meeting | Pre-created by user once |
| **Reuse** | Can host multiple sessions | Generates new Daily.co room per meeting |
| **Name Format** | `room-name` (reusable) | `room-name` (base identifier) |
| **Timestamping** | Not required | Meeting adds timestamp: `{name}-YYYYMMDDHHMMSS` |

**Example:**
```
Reflector Room:  "daily-private-igor" (persistent config)
    ↓ starts meeting
Daily.co Room:   "daily-private-igor-20260110042117"
```

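The naming rule in the table can be sketched in a few lines of Python. This is only an illustration: the helper name `daily_room_name` is hypothetical, and rendering the timestamp in UTC is an assumption (it is consistent with the example above, but the source does not state the timezone).

```python
from datetime import datetime, timezone


def daily_room_name(base_name: str, started_at: datetime) -> str:
    """Build a timestamped Daily.co room name from a Reflector room name."""
    # Assumption: the timestamp is rendered in UTC.
    stamp = started_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{base_name}-{stamp}"


started = datetime(2026, 1, 10, 4, 21, 17, tzinfo=timezone.utc)
print(daily_room_name("daily-private-igor", started))
# daily-private-igor-20260110042117
```

Because the suffix has one-second resolution, two meetings started from the same Reflector room in the same second would collide under this sketch.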
### Meeting

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Session that starts when first participant joins | Explicit database record of a session |
| **Identifier** | `mtgSessionId` (generated by Daily.co) | `meeting.id` (UUID, generated by Reflector) |
| **Creation** | Implicit (first participant join) | Explicit API call before participants join |
| **Purpose** | Tracks active session state | Links recordings, transcripts, participants |
| **Scope** | Per room instance | Per Reflector room + timestamp |

**Critical Limitation:** Daily.co's recordings API often does NOT return `mtgSessionId`, requiring time-based matching (see [Time-Based Matching](#time-based-matching)).

### Recording

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Audio/video files on S3 | Metadata + processing status |
| **Types** | `cloud` (composed video), `raw-tracks` (multitrack) | Stores references + `track_keys` array |
| **Multiplicity** | One recording object per start/stop cycle | One DB row per Daily.co recording object |
| **Identifier** | Daily.co `recording_id` | Same `recording_id` (stored in DB) |
| **Multitrack** | Array of `.webm` files (one per participant) | `track_keys` JSON array with S3 paths |
| **Linkage** | Via `room_name` + `start_ts` | FK `meeting_id` (set via time-based match) |

**Critical Behavior:** Recording **stops/restarts** create **separate recording objects** with unique IDs.

---

## Entity Relationships

### Database Schema Relationships

```sql
-- Simplified schema showing key relationships

TABLE room (
    id VARCHAR PRIMARY KEY,
    name VARCHAR UNIQUE,
    platform VARCHAR  -- 'whereby' | 'daily'
)

TABLE meeting (
    id VARCHAR PRIMARY KEY,
    room_id VARCHAR REFERENCES room(id) ON DELETE CASCADE,  -- nullable
    room_name VARCHAR,  -- Daily.co room name (timestamped)
    start_date TIMESTAMP,
    platform VARCHAR
)

TABLE recording (
    id VARCHAR PRIMARY KEY,  -- Daily.co recording_id
    meeting_id VARCHAR,  -- FK to meeting (set via time-based match)
    bucket_name VARCHAR,
    object_key VARCHAR,  -- S3 prefix
    track_keys JSON,  -- Array of S3 keys for multitrack
    recorded_at TIMESTAMP
)

TABLE transcript (
    id VARCHAR PRIMARY KEY,
    recording_id VARCHAR,  -- nullable FK
    meeting_id VARCHAR,  -- nullable FK
    room_id VARCHAR,  -- nullable FK
    participants JSON,  -- [{id, speaker, name, user_id}, ...]
    title VARCHAR,
    long_summary VARCHAR,
    webvtt TEXT
)
```

**Relationship Cardinalities:**
```
1 Room → N Meetings
1 Meeting → N Recordings (common: 1-21 recordings per meeting)
1 Recording → 1 Transcript
1 Meeting → N Transcripts (via recordings)
```

---

## Recording Multiplicity

### Why Multiple Recordings Per Meeting?

Daily.co creates a **new recording object** (new ID, new files) whenever recording stops and restarts. This happens due to:

1. **Manual stop/start** - User clicks stop, then start recording again
2. **Network reconnection** - Participant drops, reconnects → triggers restart
3. **Participant rejoin** - Last participant leaves, new one joins → new session

---

## Session Identifiers Explained

### The Hidden Entity: Daily.co Meeting Session

Daily.co has an **implicit ephemeral entity** that sits between Room and Recording:

```
Daily.co Room: "daily-private-igor-20260110042117"
    │
    ├─ Daily.co Meeting Session #1 (mtgSessionId: c04334de...)
    │   └─ Recording #3 (f4a50f94) - 4s, 1 track
    │
    └─ Daily.co Meeting Session #2 (mtgSessionId: 4cdae3c0...)
        ├─ Recording #2 (b0fa94da) - 80s, 2 tracks  ← recording stopped
        └─ Recording #1 (05edf519) - 62s, 1 track   ← then restarted
```

**Daily.co Meeting Session:**
- **Lifecycle:** Starts when first participant joins, ends when last participant leaves
- **Identifier:** `mtgSessionId` (generated by Daily.co)
- **Persistence:** Ephemeral - new ID if everyone leaves and someone rejoins
- **Relationship:** 1 Session → N Recordings (if recording stops/restarts during session)

**Key Insight:** Multiple recordings can share the same `mtgSessionId` if recording was stopped and restarted while participants remained connected.

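To make the 1 Session → N Recordings relationship concrete, the sketch below groups the three recordings from the diagram by `mtgSessionId`. The IDs are the abbreviated prefixes used in the diagram, and `group_by_session` is a hypothetical helper written for this illustration, not a Reflector function.

```python
from collections import defaultdict

# Abbreviated IDs taken from the diagram above.
recordings = [
    {"id": "f4a50f94", "mtgSessionId": "c04334de", "duration": 4},
    {"id": "b0fa94da", "mtgSessionId": "4cdae3c0", "duration": 80},
    {"id": "05edf519", "mtgSessionId": "4cdae3c0", "duration": 62},
]


def group_by_session(recs):
    """Return {mtgSessionId: [recording ids]}, preserving input order."""
    sessions = defaultdict(list)
    for rec in recs:
        # Recordings without a mtgSessionId are bucketed under None.
        sessions[rec.get("mtgSessionId")].append(rec["id"])
    return dict(sessions)


sessions = group_by_session(recordings)
print(sessions)
# {'c04334de': ['f4a50f94'], '4cdae3c0': ['b0fa94da', '05edf519']}
```

Three recordings collapse into two sessions, which is why `mtgSessionId` alone cannot identify a single recording.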
### mtgSessionId (Meeting Session Identifier)

`mtgSessionId` identifies a **Daily.co meeting session** (not individual participants, not a room).

### session_id (Per-Participant)

**Different concept:** Per-participant connection identifier from webhooks.

**Reflector Tracking:** `daily_participant_session` table

```sql
TABLE daily_participant_session (
    id VARCHAR PRIMARY KEY,  -- {meeting_id}:{user_id}:{joined_at_ms}
    meeting_id VARCHAR,
    session_id VARCHAR,  -- From webhook (per-participant)
    user_id VARCHAR,
    user_name VARCHAR,
    joined_at TIMESTAMP,
    left_at TIMESTAMP
)
```

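The composite primary key format `{meeting_id}:{user_id}:{joined_at_ms}` can be sketched as follows. The function name `participant_session_id` and the join time are hypothetical; the meeting and user IDs are the ones that appear elsewhere in this document. Integer timedelta division is used so that the millisecond value is not affected by float rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def participant_session_id(meeting_id: str, user_id: str, joined_at: datetime) -> str:
    """Compose the row id as {meeting_id}:{user_id}:{joined_at_ms}."""
    # joined_at must be timezone-aware; the result is whole milliseconds since the epoch.
    joined_at_ms = (joined_at - EPOCH) // timedelta(milliseconds=1)
    return f"{meeting_id}:{user_id}:{joined_at_ms}"


row_id = participant_session_id(
    "034804b8-cee2-4fb4-94d7-122f6f068a61",
    "907f2cc1-eaab-435f-8ee2-09185f416b22",
    datetime(2026, 1, 10, 4, 21, 25, tzinfo=timezone.utc),  # hypothetical join time
)
print(row_id)
```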

---

## Time-Based Matching

### Problem Statement

Daily.co's recordings API does not reliably return `mtgSessionId`, making it impossible to directly link recordings to meetings via Daily.co's identifiers.

**Example API response:**
```json
{
  "id": "recording-uuid",
  "room_name": "daily-private-igor-20260110042117",
  "start_ts": 1768018896,
  "mtgSessionId": null  ← Missing!
}
```

### Solution: Time-Based Matching

**Implementation:** `reflector/db/meetings.py:get_by_room_name_and_time()`

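This document does not show the body of `get_by_room_name_and_time()`, so the following is only a sketch of what time-based matching could look like, not the actual implementation. It assumes the match picks the meeting with the same `room_name` whose `start_date` is closest to the recording's `start_ts`, within a tolerance window. The `Meeting` dataclass, the `match_meeting` name, the one-hour window, and the second meeting row are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class Meeting:
    id: str
    room_name: str
    start_date: datetime


def match_meeting(
    meetings: Iterable[Meeting],
    room_name: str,
    start_ts: int,
    window: timedelta = timedelta(hours=1),  # assumed tolerance
) -> Optional[Meeting]:
    """Pick the meeting in room_name whose start is closest to start_ts."""
    recorded_at = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    candidates = [
        m
        for m in meetings
        if m.room_name == room_name and abs(recorded_at - m.start_date) <= window
    ]
    # min() with default=None returns None when nothing qualifies.
    return min(candidates, key=lambda m: abs(recorded_at - m.start_date), default=None)


room = "daily-private-igor-20260110042117"
meetings = [
    Meeting("034804b8-cee2-4fb4-94d7-122f6f068a61", room,
            datetime(2026, 1, 10, 4, 21, 17, tzinfo=timezone.utc)),
    # Hypothetical older meeting with the same room name, outside the window.
    Meeting("hypothetical-older-meeting", room,
            datetime(2026, 1, 9, 4, 21, 17, tzinfo=timezone.utc)),
]
matched = match_meeting(meetings, room, 1768018896)
print(matched.id if matched else None)
# 034804b8-cee2-4fb4-94d7-122f6f068a61
```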

---

## Multitrack Recording Details

### track_keys JSON Array

**Schema:** `recording.track_keys` (JSON, nullable)

Example recording with 2 audio tracks:

```json
{
  "id": "b0fa94da-73b5-4f95-9239-5216a682a505",
  "track_keys": [
    "igormonadical/daily-private-igor-20260110042117/1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565",
    "igormonadical/daily-private-igor-20260110042117/1768018896877-9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-1768018899286"
  ]
}
```

**Semantics:**
- `track_keys = null` → Not multitrack (cloud recording)
- `track_keys = []` → Multitrack recording with no audio captured (silence/muted)
- `track_keys = [...]` → Multitrack with N audio tracks

**Property:** `recording.is_multitrack` (Python)
```python
@property
def is_multitrack(self) -> bool:
    return self.track_keys is not None and len(self.track_keys) > 0
```

As written, this property returns `False` for both `null` and `[]`, so an empty multitrack recording is not reported as multitrack.

### Track Filename Format

Daily.co multitrack filenames encode timing and participant information:

**Format:** `{recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}`

**Example:** `1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565`

**Parsed Components:**
```python
# reflector/utils/daily.py:25-60
class DailyRecordingFilename(NamedTuple):
    recording_start_ts: int  # 1768018896877 (milliseconds)
    participant_id: str      # 890c0eae-e186-4534-a7bd-7c794b7d6d7f
    track_start_ts: int      # 1768018914565 (milliseconds)
```

**Note:** Browser downloads from S3 add a `.webm` extension due to MIME headers, but S3 object keys have no extension.

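A minimal sketch of parsing this filename format is shown below. The `DailyRecordingFilename` tuple mirrors the fields listed above, but the regular expression and the `parse_track_key` function are assumptions written for this illustration; the real parser in `reflector/utils/daily.py` may differ. The sketch handles only audio tracks (`cam-audio`) and assumes the participant ID is a 36-character lowercase UUID.

```python
import re
from typing import NamedTuple


class DailyRecordingFilename(NamedTuple):
    recording_start_ts: int  # milliseconds
    participant_id: str
    track_start_ts: int  # milliseconds


# Assumed pattern: {recording_start_ts}-{uuid}-cam-audio-{track_start_ts}
_TRACK_RE = re.compile(
    r"^(?P<rec>\d+)-(?P<pid>[0-9a-f-]{36})-cam-audio-(?P<track>\d+)$"
)


def parse_track_key(key: str) -> DailyRecordingFilename:
    """Parse an S3 key (or bare filename) of an audio track."""
    filename = key.rsplit("/", 1)[-1]  # drop any S3 prefix
    match = _TRACK_RE.match(filename)
    if match is None:
        raise ValueError(f"not a cam-audio track filename: {filename!r}")
    return DailyRecordingFilename(
        recording_start_ts=int(match["rec"]),
        participant_id=match["pid"],
        track_start_ts=int(match["track"]),
    )


parsed = parse_track_key(
    "igormonadical/daily-private-igor-20260110042117/"
    "1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565"
)
print(parsed.participant_id)
print(parsed.track_start_ts - parsed.recording_start_ts)  # offset in ms
```

For the example key, the difference between the two timestamps is 17688 ms, meaning this participant's track began about 17.7 seconds after the recording started.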
### Video Track Filtering

The Daily.co API returns both audio and video tracks, but Reflector only processes audio.

**Filtering Logic:** `reflector/worker/process.py:660`
```python
track_keys = [t.s3Key for t in recording.tracks if t.type == "audio"]
```

**Example API Response:**
```json
{
  "tracks": [
    {"type": "audio", "s3Key": "...cam-audio-1768018914565"},
    {"type": "audio", "s3Key": "...cam-audio-1768018899286"},
    {"type": "video", "s3Key": "...cam-video-1768018897095"}  ← Filtered out
  ]
}
```

**Result:** Only the 2 audio tracks are stored in `recording.track_keys`; the video track is discarded.

**Rationale:** Reflector is an audio transcription system; video is not needed for processing.

### Track-to-Participant Mapping

**Flow:**
1. Daily.co webhook/polling provides `track_keys` array
2. Each track filename contains `participant_id`
3. Reflector queries Daily.co API: `GET /meetings/{mtgSessionId}/participants`
4. Maps `participant_id` → `user_name`
5. Stores in `transcript.participants` JSON:

```json
[
  {
    "id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f",
    "speaker": 0,
    "name": "test2",
    "user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
  },
  {
    "id": "9660e8e9-4297-4f17-951d-0b2bf2401803",
    "speaker": 1,
    "name": "test",
    "user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
  }
]
```

**Diarization:** Multitrack recordings don't need speaker diarization AI, because speaker identity comes from the separate audio tracks.

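Steps 2, 4, and 5 of the flow can be sketched as below. This is an illustration under stated assumptions: the `build_participants` function is hypothetical, the speaker index is assumed to be the track's position in `track_keys` (which matches the example above), the fallback name is invented, and the name lookup is passed in as a plain dict rather than fetched from the Daily.co API.

```python
from typing import Dict, List


def build_participants(
    track_keys: List[str], names_by_participant_id: Dict[str, str]
) -> List[dict]:
    """Map each audio track to a participant entry with a speaker index."""
    participants = []
    for speaker, key in enumerate(track_keys):
        filename = key.rsplit("/", 1)[-1]
        # The participant UUID (36 chars) follows the first "-" in the filename.
        participant_id = filename.split("-", 1)[1][:36]
        participants.append(
            {
                "id": participant_id,
                "speaker": speaker,
                # Hypothetical fallback when the participant is unknown.
                "name": names_by_participant_id.get(participant_id, f"Speaker {speaker}"),
            }
        )
    return participants


track_keys = [
    "igormonadical/daily-private-igor-20260110042117/1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565",
    "igormonadical/daily-private-igor-20260110042117/1768018896877-9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-1768018899286",
]
names = {
    "890c0eae-e186-4534-a7bd-7c794b7d6d7f": "test2",
    "9660e8e9-4297-4f17-951d-0b2bf2401803": "test",
}
participants = build_participants(track_keys, names)
print(participants)
```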

---

## Verified Example

### Meeting: daily-private-igor-20260110042117

**Context:** A user conducted a test recording with start/stop cycles, producing 3 recordings.

#### Database State

```sql
-- Meeting
id: 034804b8-cee2-4fb4-94d7-122f6f068a61
room_name: daily-private-igor-20260110042117
start_date: 2026-01-10 04:21:17+00
```

#### Daily.co API Response

```json
[
  {
    "id": "f4a50f94-053c-4f9d-bda6-78ad051fbc36",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018885,
    "duration": 4,
    "status": "finished",
    "mtgSessionId": "c04334de-42a0-4c2a-96be-a49b068dca85",
    "tracks": [
      {"type": "audio", "s3Key": "...62e8f3ae...cam-audio-1768018885417"}
    ]
  },
  {
    "id": "b0fa94da-73b5-4f95-9239-5216a682a505",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018896,
    "duration": 80,
    "status": "finished",
    "mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
    "tracks": [
      {"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914565"},
      {"type": "audio", "s3Key": "...9660e8e9...cam-audio-1768018899286"},
      {"type": "video", "s3Key": "...9660e8e9...cam-video-1768018897095"}
    ]
  },
  {
    "id": "05edf519-9048-4b49-9a75-73e9826fd950",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018914,
    "duration": 62,
    "status": "finished",
    "mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
    "tracks": [
      {"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914948"}
    ]
  }
]
```

**Key Observations:**
- 3 recording objects returned by Daily.co
- 2 different `mtgSessionId` values (2 different meeting instances)
- Recording #2 has 3 tracks (2 audio + 1 video)
- Timestamps: 1768018885 → 1768018896 (+11s) → 1768018914 (+18s)

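The gaps quoted in the last observation can be reproduced from the `start_ts` values in the response above. The snippet is only a worked check of that arithmetic; it also converts the first timestamp to UTC for comparison with the meeting's `start_date`.

```python
from datetime import datetime, timezone

# start_ts values (seconds) from the API response above, in order.
start_ts = [1768018885, 1768018896, 1768018914]

# Gap between each recording's start and the previous one.
gaps = [later - earlier for earlier, later in zip(start_ts, start_ts[1:])]
print(gaps)  # [11, 18]

first_start = datetime.fromtimestamp(start_ts[0], tz=timezone.utc)
print(first_start.isoformat())  # 2026-01-10T04:21:25+00:00
```

The first recording therefore starts 8 seconds after the meeting's `start_date` of 04:21:17 UTC.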
#### Reflector Database

**Recordings:**
```
┌──────────────────────────────────────┬──────────────┬────────────┬──────────────────────────────────────┐
│ id                                   │ track_count  │ duration   │ mtgSessionId                         │
├──────────────────────────────────────┼──────────────┼────────────┼──────────────────────────────────────┤
│ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1            │ 4s         │ c04334de-42a0-4c2a-96be-a49b068dca85 │
│ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (video=0)  │ 80s        │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
│ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1            │ 62s        │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
└──────────────────────────────────────┴──────────────┴────────────┴──────────────────────────────────────┘
```

**Note:** Recording #2 has 2 audio tracks (video filtered out), not 3.

**Transcripts:**
```
┌──────────────────────────────────────┬──────────────────────────────────────┬──────────────┬──────────────────────────────────────────────┐
│ id                                   │ recording_id                         │ participants │ title                                        │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────┼──────────────────────────────────────────────┤
│ 17149b1f-546c-4837-80a0-f8140bd16592 │ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1 (test)     │ (empty - no speech)                          │
│ 49801332-3222-4c11-bdb2-375479fc87f2 │ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (test,     │ "Examination and Validation Procedures       │
│                                      │                                      │ test2)       │ Review"                                      │
│ e5271e12-20fb-42d2-b5a8-21438abadef9 │ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1 (test2)    │ "Technical Sound Check Procedure Review"     │
└──────────────────────────────────────┴──────────────────────────────────────┴──────────────┴──────────────────────────────────────────────┘
```

**Transcript Content:**

*Transcript #1* (17149b1f): Empty WebVTT (no audio captured)

*Transcript #2* (49801332):
```webvtt
WEBVTT

00:00:03.109 --> 00:00:05.589
<v Speaker1>Test, test, test. Test, test, test, test, test.

00:00:19.829 --> 00:00:22.710
<v Speaker0>Test test test test test test test test test test test.
```

**AI-Generated Summary:**
> "The meeting focused on the critical importance of rigorous testing for ensuring reliability and quality, with test and test2 emphasizing the need for a structured testing framework and meticulous documentation..."

*Transcript #3* (e5271e12):
```webvtt
WEBVTT

00:00:02.029 --> 00:00:04.910
<v Speaker0>Test, test, test, test, test, test, test, test, test, test, test.
```

#### Validation: track_keys → participants

**Recording #2 (b0fa94da) tracks:**
```json
[
  ".../890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-...",
  ".../9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-..."
]
```

**Transcript #2 (49801332) participants:**
```json
[
  {"id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f", "speaker": 0, "name": "test2"},
  {"id": "9660e8e9-4297-4f17-951d-0b2bf2401803", "speaker": 1, "name": "test"}
]
```

### Data Flow

```
Daily.co API: 3 recordings
         ↓
Polling: _poll_raw_tracks_recordings()
         ↓
Worker: process_multitrack_recording.delay() × 3
         ↓
DB: 3 recording rows created
         ↓
Pipeline: Audio processing + transcription × 3
         ↓
DB: 3 transcript rows created (1:1 with recordings)
         ↓
UI: User sees 3 separate transcripts
```

**Result:** ✅ 1:1 Recording → Transcript relationship maintained.

---

**Document Version:** 1.0
**Last Verified:** 2026-01-15
**Data Source:** Production database + Daily.co API inspection