mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2026-02-04 09:56:47 +00:00
feat: brady bunch (#816)
* brady bunch PRD/tasks * clean dead daily.co code * brady bunch prototype (no-mistakes) * brady bunch prototype (no-mistakes) review * self-review * daily poll time match (no-mistakes) * daily poll self-review (no-mistakes) * daily poll self-review (no-mistakes) * daily co doc * cleanup * cleanup * self-review (no-mistakes) * self-review (no-mistakes) * self-review * self-review * ui typefix * dupe calls error handling proper * daily reflector data model doc * logging style fix * migration merge --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
496
server/docs/DAILY_REFLECTOR_DATA_MODEL.md
Normal file
@@ -0,0 +1,496 @@
# Daily.co and Reflector Data Model

This document explains the data model relationships between Daily.co's API concepts and Reflector's database schema, clarifying common sources of confusion.

---

## Table of Contents

1. [Core Entities Overview](#core-entities-overview)
2. [Daily.co vs Reflector Terminology](#dailyco-vs-reflector-terminology)
3. [Entity Relationships](#entity-relationships)
4. [Recording Multiplicity](#recording-multiplicity)
5. [Session Identifiers Explained](#session-identifiers-explained)
6. [Time-Based Matching](#time-based-matching)
7. [Multitrack Recording Details](#multitrack-recording-details)
8. [Verified Example](#verified-example)

---

## Core Entities Overview

### Reflector's Four Primary Entities

```
┌─────────────────────────────────────────────────────────────────┐
│                        Room (Reflector)                         │
│  - Persistent meeting template                                  │
│  - User-created configuration                                   │
│  - Example: "team-standup"                                      │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:N
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Meeting (Reflector)                       │
│  - Single session instance                                      │
│  - Creates NEW Daily.co room with timestamp                     │
│  - Example: "team-standup-20260115120000"                       │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:N
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                Recording (Reflector + Daily.co)                 │
│  - One segment of audio/video                                   │
│  - New recording created on stop/restart                        │
│  - track_keys: JSON array of S3 file paths                      │
└────────────────────┬────────────────────────────────────────────┘
                     │ 1:1
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Transcript (Reflector)                      │
│  - Processed audio with transcription                           │
│  - Diarization, summaries, topics                               │
│  - One transcript per recording                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Daily.co vs Reflector Terminology

### Room

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Virtual meeting space on Daily.co platform | User-created meeting template/configuration |
| **Lifetime** | Configurable expiration | Persistent until user deletes |
| **Creation** | API call for each meeting | Pre-created by user once |
| **Reuse** | Can host multiple sessions | Generates new Daily.co room per meeting |
| **Name Format** | `room-name` (reusable) | `room-name` (base identifier) |
| **Timestamping** | Not required | Meeting adds timestamp: `{name}-YYYYMMDDHHMMSS` |

**Example:**
```
Reflector Room:  "daily-private-igor" (persistent config)
                 ↓ starts meeting
Daily.co Room:   "daily-private-igor-20260110042117"
```
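
The naming convention above can be sketched in a few lines of Python. This is an illustration only: `build_daily_room_name` is a hypothetical helper, not Reflector's actual function, and it assumes the timestamp suffix is rendered in UTC (which agrees with the example meeting shown later in this document).

```python
from datetime import datetime, timezone


def build_daily_room_name(room_name: str, started_at: datetime) -> str:
    """Append a YYYYMMDDHHMMSS suffix to a Reflector room name (hypothetical helper)."""
    # Assumption: the suffix is expressed in UTC.
    suffix = started_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{room_name}-{suffix}"


started = datetime(2026, 1, 10, 4, 21, 17, tzinfo=timezone.utc)
print(build_daily_room_name("daily-private-igor", started))
```

Because the suffix has one-second resolution, two meetings started from the same Reflector room in the same second would collide under this scheme; how Reflector handles that case is not covered here.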

### Meeting

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Session that starts when first participant joins | Explicit database record of a session |
| **Identifier** | `mtgSessionId` (generated by Daily.co) | `meeting.id` (UUID, generated by Reflector) |
| **Creation** | Implicit (first participant join) | Explicit API call before participants join |
| **Purpose** | Tracks active session state | Links recordings, transcripts, participants |
| **Scope** | Per room instance | Per Reflector room + timestamp |

**Critical Limitation:** Daily.co's recordings API often does NOT return `mtgSessionId`, requiring time-based matching (see [Time-Based Matching](#time-based-matching)).

### Recording

| Aspect | Daily.co | Reflector |
|--------|----------|-----------|
| **Definition** | Audio/video files on S3 | Metadata + processing status |
| **Types** | `cloud` (composed video), `raw-tracks` (multitrack) | Stores references + `track_keys` array |
| **Multiplicity** | One recording object per start/stop cycle | One DB row per Daily.co recording object |
| **Identifier** | Daily.co `recording_id` | Same `recording_id` (stored in DB) |
| **Multitrack** | Array of `.webm` files (one per participant) | `track_keys` JSON array with S3 paths |
| **Linkage** | Via `room_name` + `start_ts` | FK `meeting_id` (set via time-based match) |

**Critical Behavior:** Recording **stops/restarts** create **separate recording objects** with unique IDs.

---

## Entity Relationships

### Database Schema Relationships

```sql
-- Simplified schema showing key relationships

CREATE TABLE room (
    id VARCHAR PRIMARY KEY,
    name VARCHAR UNIQUE,
    platform VARCHAR  -- 'whereby' | 'daily'
);

CREATE TABLE meeting (
    id VARCHAR PRIMARY KEY,
    room_id VARCHAR REFERENCES room(id) ON DELETE CASCADE,  -- nullable
    room_name VARCHAR,  -- Daily.co room name (timestamped)
    start_date TIMESTAMP,
    platform VARCHAR
);

CREATE TABLE recording (
    id VARCHAR PRIMARY KEY,  -- Daily.co recording_id
    meeting_id VARCHAR,      -- FK to meeting (set via time-based match)
    bucket_name VARCHAR,
    object_key VARCHAR,      -- S3 prefix
    track_keys JSON,         -- Array of S3 keys for multitrack
    recorded_at TIMESTAMP
);

CREATE TABLE transcript (
    id VARCHAR PRIMARY KEY,
    recording_id VARCHAR,    -- nullable FK
    meeting_id VARCHAR,      -- nullable FK
    room_id VARCHAR,         -- nullable FK
    participants JSON,       -- [{id, speaker, name, user_id}, ...]
    title VARCHAR,
    long_summary VARCHAR,
    webvtt TEXT
);
```

**Relationship Cardinalities:**
```
1 Room      → N Meetings
1 Meeting   → N Recordings (common: 1-21 recordings per meeting)
1 Recording → 1 Transcript
1 Meeting   → N Transcripts (via recordings)
```

---

## Recording Multiplicity

### Why Multiple Recordings Per Meeting?

Daily.co creates a **new recording object** (new ID, new files) whenever recording stops and restarts. This happens due to:

1. **Manual stop/start** - User clicks stop, then start recording again
2. **Network reconnection** - Participant drops, reconnects → triggers restart
3. **Participant rejoin** - Last participant leaves, new one joins → new session

---

## Session Identifiers Explained

### The Hidden Entity: Daily.co Meeting Session

Daily.co has an **implicit ephemeral entity** that sits between Room and Recording:

```
Daily.co Room: "daily-private-igor-20260110042117"
    │
    ├─ Daily.co Meeting Session #1 (mtgSessionId: c04334de...)
    │   └─ Recording #3 (f4a50f94) - 4s, 1 track
    │
    └─ Daily.co Meeting Session #2 (mtgSessionId: 4cdae3c0...)
        ├─ Recording #2 (b0fa94da) - 80s, 2 tracks  ← recording stopped
        └─ Recording #1 (05edf519) - 62s, 1 track   ← then restarted
```

**Daily.co Meeting Session:**
- **Lifecycle:** Starts when first participant joins, ends when last participant leaves
- **Identifier:** `mtgSessionId` (generated by Daily.co)
- **Persistence:** Ephemeral - new ID if everyone leaves and someone rejoins
- **Relationship:** 1 Session → N Recordings (if recording stops/restarts during session)

**Key Insight:** Multiple recordings can share the same `mtgSessionId` if recording was stopped and restarted while participants remained connected.
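
As a small illustration of the 1 Session → N Recordings relationship, the snippet below groups recordings by `mtgSessionId`. The data is abbreviated from the example in this document; `group_by_session` is a hypothetical helper, and recordings whose `mtgSessionId` is missing are simply collected under `None`.

```python
from collections import defaultdict

# Recording metadata abbreviated from the example in this document.
recordings = [
    {"id": "f4a50f94", "mtgSessionId": "c04334de", "duration": 4},
    {"id": "b0fa94da", "mtgSessionId": "4cdae3c0", "duration": 80},
    {"id": "05edf519", "mtgSessionId": "4cdae3c0", "duration": 62},
]


def group_by_session(recs):
    """Group recording ids by mtgSessionId; missing ids fall under None."""
    grouped = defaultdict(list)
    for rec in recs:
        grouped[rec.get("mtgSessionId")].append(rec["id"])
    return dict(grouped)


sessions = group_by_session(recordings)
for session_id, recording_ids in sessions.items():
    print(session_id, recording_ids)
```

Here the second session owns two recordings, which is the stop/restart case described above.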

### mtgSessionId (Meeting Session Identifier)

`mtgSessionId` identifies a **Daily.co meeting session** (not individual participants, not a room).

### session_id (Per-Participant)

**Different concept:** Per-participant connection identifier from webhooks.

**Reflector Tracking:** `daily_participant_session` table
```sql
CREATE TABLE daily_participant_session (
    id VARCHAR PRIMARY KEY,  -- {meeting_id}:{user_id}:{joined_at_ms}
    meeting_id VARCHAR,
    session_id VARCHAR,      -- From webhook (per-participant)
    user_id VARCHAR,
    user_name VARCHAR,
    joined_at TIMESTAMP,
    left_at TIMESTAMP
);
```

---

## Time-Based Matching

### Problem Statement

Daily.co's recordings API does not reliably return `mtgSessionId`, so recordings cannot always be linked to meetings directly through Daily.co's identifiers.

**Example API response:**
```json
{
  "id": "recording-uuid",
  "room_name": "daily-private-igor-20260110042117",
  "start_ts": 1768018896,
  "mtgSessionId": null  ← Missing!
}
```

### Solution: Time-Based Matching

**Implementation:** `reflector/db/meetings.py:get_by_room_name_and_time()`
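
The query itself is not reproduced in this document. As a rough sketch of the idea only, the code below assumes the match picks, among meetings with the same `room_name`, the one whose `start_date` is closest to the recording's `start_ts` within a tolerance window. The `Meeting` dataclass, `match_meeting`, the closest-start rule, and the 24-hour window are all assumptions made for illustration, not the behaviour of `get_by_room_name_and_time()`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Meeting:
    id: str
    room_name: str
    start_date: datetime


def match_meeting(meetings, room_name, start_ts, window=timedelta(hours=24)):
    """Return the meeting in `room_name` whose start is closest to `start_ts`.

    Hypothetical sketch: the closest-start rule and the default window are
    assumptions, not Reflector's actual matching logic.
    """
    # Daily.co start_ts is a Unix timestamp in seconds.
    recorded_at = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    candidates = [
        m for m in meetings
        if m.room_name == room_name and abs(m.start_date - recorded_at) <= window
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: abs(m.start_date - recorded_at))


meetings = [
    Meeting(
        id="034804b8-cee2-4fb4-94d7-122f6f068a61",
        room_name="daily-private-igor-20260110042117",
        start_date=datetime(2026, 1, 10, 4, 21, 17, tzinfo=timezone.utc),
    ),
]
match = match_meeting(meetings, "daily-private-igor-20260110042117", 1768018896)
print(match.id if match else None)
```

Since each Reflector meeting gets its own timestamped Daily.co room name, the room name alone usually narrows the candidates to one; the time comparison matters when a room name is reused.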

---

## Multitrack Recording Details

### track_keys JSON Array

**Schema:** `recording.track_keys` (JSON, nullable)
```sql
-- Example recording with 2 audio tracks
{
  "id": "b0fa94da-73b5-4f95-9239-5216a682a505",
  "track_keys": [
    "igormonadical/daily-private-igor-20260110042117/1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565",
    "igormonadical/daily-private-igor-20260110042117/1768018896877-9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-1768018899286"
  ]
}
```

**Semantics:**
- `track_keys = null` → Not multitrack (cloud recording)
- `track_keys = []` → Multitrack recording with no audio captured (silence/muted)
- `track_keys = [...]` → Multitrack with N audio tracks
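
The three cases can be told apart with a simple check. The helper below is a hypothetical illustration (the name `describe_track_keys` does not come from Reflector); the point is that `None` and an empty list must be tested separately.

```python
from typing import Optional, Sequence


def describe_track_keys(track_keys: Optional[Sequence[str]]) -> str:
    """Map a track_keys value to one of the three states listed above."""
    if track_keys is None:
        return "cloud recording (not multitrack)"
    if len(track_keys) == 0:
        return "multitrack, no audio captured"
    return f"multitrack, {len(track_keys)} audio track(s)"


for value in (None, [], ["a-cam-audio-1", "b-cam-audio-2"]):
    print(describe_track_keys(value))
```

A plain truthiness test (`if track_keys:`) would merge the first two cases, which is why the sketch checks `is None` first.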

**Property:** `recording.is_multitrack` (Python)
```python
@property
def is_multitrack(self) -> bool:
    return self.track_keys is not None and len(self.track_keys) > 0
```

Note that this check returns `False` for an empty list, so a recording with `track_keys = []` is not reported as multitrack by this property.

### Track Filename Format

Daily.co multitrack filenames encode timing and participant information:

**Format:** `{recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}`

**Example:** `1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565`

**Parsed Components:**
```python
# reflector/utils/daily.py:25-60
from typing import NamedTuple


class DailyRecordingFilename(NamedTuple):
    recording_start_ts: int  # 1768018896877 (milliseconds)
    participant_id: str      # 890c0eae-e186-4534-a7bd-7c794b7d6d7f
    track_start_ts: int      # 1768018914565 (milliseconds)
```
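
The parsing code in `reflector/utils/daily.py` is not shown here. As a minimal sketch of how such a filename could be split, assuming the participant ID is always a lowercase UUID and the track is an audio track, a regular expression is enough. `TrackFilename` and `parse_track_key` are hypothetical names used only for this example.

```python
import re
from typing import NamedTuple


class TrackFilename(NamedTuple):
    recording_start_ts: int  # milliseconds
    participant_id: str
    track_start_ts: int  # milliseconds


# {recording_start_ts}-{participant_id}-cam-audio-{track_start_ts}
_TRACK_RE = re.compile(
    r"^(?P<rec>\d+)-(?P<pid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-cam-audio-(?P<track>\d+)$"
)


def parse_track_key(s3_key: str) -> TrackFilename:
    """Parse the last path segment of an S3 key (illustrative parser only)."""
    filename = s3_key.rsplit("/", 1)[-1]
    m = _TRACK_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognised track filename: {filename!r}")
    return TrackFilename(int(m["rec"]), m["pid"], int(m["track"]))


parsed = parse_track_key(
    "igormonadical/daily-private-igor-20260110042117/"
    "1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565"
)
# Offset of this track from the start of the recording, in milliseconds.
offset_ms = parsed.track_start_ts - parsed.recording_start_ts
print(parsed.participant_id, offset_ms)
```

The difference between the two timestamps gives each track's offset from the start of the recording, which is useful when aligning tracks that began at different times.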

**Note:** Browser downloads from S3 add a `.webm` extension due to MIME headers, but the S3 object keys themselves have no extension.

### Video Track Filtering

The Daily.co API returns both audio and video tracks, but Reflector only processes audio.

**Filtering Logic:** `reflector/worker/process.py:660`
```python
track_keys = [t.s3Key for t in recording.tracks if t.type == "audio"]
```

**Example API Response:**
```json
{
  "tracks": [
    {"type": "audio", "s3Key": "...cam-audio-1768018914565"},
    {"type": "audio", "s3Key": "...cam-audio-1768018899286"},
    {"type": "video", "s3Key": "...cam-video-1768018897095"}  ← Filtered out
  ]
}
```

**Result:** Only the 2 audio tracks are stored in `recording.track_keys`; the video track is discarded.

**Rationale:** Reflector is an audio transcription system; video is not needed for processing.

### Track-to-Participant Mapping

**Flow:**
1. Daily.co webhook/polling provides the `track_keys` array
2. Each track filename contains a `participant_id`
3. Reflector queries the Daily.co API: `GET /meetings/{mtgSessionId}/participants`
4. Maps `participant_id` → `user_name`
5. Stores the result in `transcript.participants` JSON:

```json
[
  {
    "id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f",
    "speaker": 0,
    "name": "test2",
    "user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
  },
  {
    "id": "9660e8e9-4297-4f17-951d-0b2bf2401803",
    "speaker": 1,
    "name": "test",
    "user_id": "907f2cc1-eaab-435f-8ee2-09185f416b22"
  }
]
```

**Diarization:** Multitrack recordings don't need AI speaker diarization; speaker identity comes from the separate per-participant audio tracks.
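
The mapping flow above can be sketched as follows. This is not Reflector's implementation: `build_participants` is a hypothetical helper, the name lookup is passed in as a plain dictionary instead of calling the Daily.co API, and the speaker index is assumed to be the position of the track in `track_keys` (which agrees with the example data in this document).

```python
import re

# Matches the participant UUID embedded in an audio track filename.
_PARTICIPANT_RE = re.compile(
    r"-([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})-cam-audio-"
)


def build_participants(track_keys, names_by_participant_id):
    """Build transcript.participants-style entries from track keys.

    Assumption: the speaker index is the position of the track in track_keys.
    """
    participants = []
    for speaker, key in enumerate(track_keys):
        m = _PARTICIPANT_RE.search(key)
        if m is None:
            continue  # skip keys that do not follow the documented format
        pid = m.group(1)
        participants.append(
            {"id": pid, "speaker": speaker, "name": names_by_participant_id.get(pid)}
        )
    return participants


track_keys = [
    "igormonadical/daily-private-igor-20260110042117/1768018896877-890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-1768018914565",
    "igormonadical/daily-private-igor-20260110042117/1768018896877-9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-1768018899286",
]
names = {
    "890c0eae-e186-4534-a7bd-7c794b7d6d7f": "test2",
    "9660e8e9-4297-4f17-951d-0b2bf2401803": "test",
}
participants = build_participants(track_keys, names)
for p in participants:
    print(p)
```

A participant missing from the lookup ends up with `name` set to `None` in this sketch; a real pipeline would need its own fallback for that case.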

---

## Verified Example

### Meeting: daily-private-igor-20260110042117

**Context:** A user conducted a test recording with start/stop cycles, producing 3 recordings.

#### Database State

```sql
-- Meeting
id: 034804b8-cee2-4fb4-94d7-122f6f068a61
room_name: daily-private-igor-20260110042117
start_date: 2026-01-10 04:21:17+00
```

#### Daily.co API Response

```json
[
  {
    "id": "f4a50f94-053c-4f9d-bda6-78ad051fbc36",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018885,
    "duration": 4,
    "status": "finished",
    "mtgSessionId": "c04334de-42a0-4c2a-96be-a49b068dca85",
    "tracks": [
      {"type": "audio", "s3Key": "...62e8f3ae...cam-audio-1768018885417"}
    ]
  },
  {
    "id": "b0fa94da-73b5-4f95-9239-5216a682a505",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018896,
    "duration": 80,
    "status": "finished",
    "mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
    "tracks": [
      {"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914565"},
      {"type": "audio", "s3Key": "...9660e8e9...cam-audio-1768018899286"},
      {"type": "video", "s3Key": "...9660e8e9...cam-video-1768018897095"}
    ]
  },
  {
    "id": "05edf519-9048-4b49-9a75-73e9826fd950",
    "room_name": "daily-private-igor-20260110042117",
    "start_ts": 1768018914,
    "duration": 62,
    "status": "finished",
    "mtgSessionId": "4cdae3c0-86cb-4578-8a6d-3a228bb48345",
    "tracks": [
      {"type": "audio", "s3Key": "...890c0eae...cam-audio-1768018914948"}
    ]
  }
]
```

**Key Observations:**
- 3 recording objects returned by Daily.co
- 2 different `mtgSessionId` values (2 different Daily.co meeting sessions)
- Recording #2 has 3 tracks (2 audio + 1 video)
- Timestamps: 1768018885 → 1768018896 (+11s) → 1768018914 (+18s)

#### Reflector Database

**Recordings:**
```
┌──────────────────────────────────────┬──────────────┬────────────┬──────────────────────────────────────┐
│ id                                   │ track_count  │ duration   │ mtgSessionId                         │
├──────────────────────────────────────┼──────────────┼────────────┼──────────────────────────────────────┤
│ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1            │ 4s         │ c04334de-42a0-4c2a-96be-a49b068dca85 │
│ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (video=0)  │ 80s        │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
│ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1            │ 62s        │ 4cdae3c0-86cb-4578-8a6d-3a228bb48345 │
└──────────────────────────────────────┴──────────────┴────────────┴──────────────────────────────────────┘
```
**Note:** Recording #2 has 2 audio tracks (video filtered out), not 3.

**Transcripts:**
```
┌──────────────────────────────────────┬──────────────────────────────────────┬──────────────┬──────────────────────────────────────────────┐
│ id                                   │ recording_id                         │ participants │ title                                        │
├──────────────────────────────────────┼──────────────────────────────────────┼──────────────┼──────────────────────────────────────────────┤
│ 17149b1f-546c-4837-80a0-f8140bd16592 │ f4a50f94-053c-4f9d-bda6-78ad051fbc36 │ 1 (test)     │ (empty - no speech)                          │
│ 49801332-3222-4c11-bdb2-375479fc87f2 │ b0fa94da-73b5-4f95-9239-5216a682a505 │ 2 (test,     │ "Examination and Validation Procedures       │
│                                      │                                      │    test2)    │  Review"                                     │
│ e5271e12-20fb-42d2-b5a8-21438abadef9 │ 05edf519-9048-4b49-9a75-73e9826fd950 │ 1 (test2)    │ "Technical Sound Check Procedure Review"     │
└──────────────────────────────────────┴──────────────────────────────────────┴──────────────┴──────────────────────────────────────────────┘
```

**Transcript Content:**

*Transcript #1* (17149b1f): Empty WebVTT (no speech detected)

*Transcript #2* (49801332):
```webvtt
WEBVTT

00:00:03.109 --> 00:00:05.589
<v Speaker1>Test, test, test. Test, test, test, test, test.

00:00:19.829 --> 00:00:22.710
<v Speaker0>Test test test test test test test test test test test.
```
**AI-Generated Summary:**
> "The meeting focused on the critical importance of rigorous testing for ensuring reliability and quality, with test and test2 emphasizing the need for a structured testing framework and meticulous documentation..."

*Transcript #3* (e5271e12):
```webvtt
WEBVTT

00:00:02.029 --> 00:00:04.910
<v Speaker0>Test, test, test, test, test, test, test, test, test, test, test.
```

#### Validation: track_keys → participants

**Recording #2 (b0fa94da) tracks:**
```json
[
  ".../890c0eae-e186-4534-a7bd-7c794b7d6d7f-cam-audio-...",
  ".../9660e8e9-4297-4f17-951d-0b2bf2401803-cam-audio-..."
]
```

**Transcript #2 (49801332) participants:**
```json
[
  {"id": "890c0eae-e186-4534-a7bd-7c794b7d6d7f", "speaker": 0, "name": "test2"},
  {"id": "9660e8e9-4297-4f17-951d-0b2bf2401803", "speaker": 1, "name": "test"}
]
```

### Data Flow

```
Daily.co API: 3 recordings
    ↓
Polling: _poll_raw_tracks_recordings()
    ↓
Worker: process_multitrack_recording.delay() × 3
    ↓
DB: 3 recording rows created
    ↓
Pipeline: Audio processing + transcription × 3
    ↓
DB: 3 transcript rows created (1:1 with recordings)
    ↓
UI: User sees 3 separate transcripts
```

**Result:** ✅ 1:1 Recording → Transcript relationship maintained.

---

**Document Version:** 1.0
**Last Verified:** 2026-01-15
**Data Source:** Production database + Daily.co API inspection