Daily.co Integration Test Plan
✅ IMPLEMENTATION STATUS: Real Transcription Active
This test validates Daily.co multitrack recording integration with REAL transcription/diarization.
The implementation includes a complete audio processing pipeline:
- Multitrack recordings from Daily.co S3 (separate audio stream per participant)
- PyAV-based audio mixdown with PTS-based track alignment
- Real transcription via Modal GPU backend (Whisper)
- Real diarization via Modal GPU backend (speaker identification)
- Per-track transcription with timestamp synchronization
- Complete database entities (recording, transcript, topics, participants, words)
Processing pipeline (PipelineMainMultitrack):
- Download all audio tracks from Daily.co S3
- Align tracks by PTS (presentation timestamp) to handle late joiners
- Mix tracks into single audio file for unified playback
- Transcribe each track individually with proper offset handling
- Perform diarization on mixed audio
- Generate topics, summaries, and word-level timestamps
- Convert audio to MP3 and generate waveform visualization
Note: A stub processor (process_daily_recording) exists for testing webhook flow without GPU costs, but the production code path uses process_multitrack_recording with full ML pipeline.
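The PTS alignment step above can be sketched in plain Python. The real pipeline (PipelineMainMultitrack) does this with PyAV on decoded audio frames; the dict shape and field names below are illustrative only.

```python
# Illustrative sketch of PTS-based track alignment. The real pipeline uses
# PyAV frames; here each track is a plain dict with assumed fields.

def align_tracks(tracks: list[dict]) -> list[dict]:
    """Compute each track's offset (seconds) relative to the earliest PTS.

    A late joiner's track starts later, so its transcript timestamps must be
    shifted by this offset before merging with the other tracks.
    """
    earliest = min(t["start_pts"] for t in tracks)
    return [
        {**t, "offset_s": (t["start_pts"] - earliest) / t["timebase"]}
        for t in tracks
    ]

tracks = [
    {"participant": "alice", "start_pts": 0, "timebase": 48000},
    {"participant": "bob", "start_pts": 480000, "timebase": 48000},  # joined 10s late
]
aligned = align_tracks(tracks)
# bob's transcript timestamps get shifted by 10.0 seconds
```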
Prerequisites
1. Environment Variables (check in .env.development.local):
# Daily.co API Configuration
DAILY_API_KEY=<key>
DAILY_SUBDOMAIN=monadical
DAILY_WEBHOOK_SECRET=<base64-encoded-secret>
AWS_DAILY_S3_BUCKET=reflector-dailyco-local
AWS_DAILY_S3_REGION=us-east-1
AWS_DAILY_ROLE_ARN=arn:aws:iam::950402358378:role/DailyCo
DAILY_MIGRATION_ENABLED=true
DAILY_MIGRATION_ROOM_IDS=["552640fd-16f2-4162-9526-8cf40cd2357e"]
# Transcription/Diarization Backend (Required for real processing)
DIARIZATION_BACKEND=modal
DIARIZATION_MODAL_API_KEY=<modal-api-key>
# TRANSCRIPTION_BACKEND is not explicitly set (uses default/modal)
2. Services Running:
docker compose ps # server, postgres, redis, worker, beat should be UP
IMPORTANT: Worker and beat services MUST be running for transcription processing:
docker compose up -d worker beat
3. ngrok Tunnel for Webhooks:
# Start ngrok (if not already running)
ngrok http 1250 --log=stdout > /tmp/ngrok.log 2>&1 &
# Get public URL
curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; data=json.load(sys.stdin); print(data['tunnels'][0]['public_url'])"
Current ngrok URL: https://0503947384a3.ngrok-free.app (as of last registration)
4. Webhook Created:
cd server
uv run python scripts/recreate_daily_webhook.py https://0503947384a3.ngrok-free.app/v1/daily/webhook
# Verify: "Created webhook <uuid> (state: ACTIVE)"
Current webhook status: ✅ ACTIVE (webhook ID: dad5ad16-ceca-488e-8fc5-dae8650b51d0)
Test 1: Database Configuration
Check room platform:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT id, name, platform, recording_type FROM room WHERE name = 'test2';"
Expected:
id: 552640fd-16f2-4162-9526-8cf40cd2357e
name: test2
platform: whereby # DB value (overridden by env var DAILY_MIGRATION_ROOM_IDS)
recording_type: cloud
Clear old meetings:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"UPDATE meeting SET is_active = false WHERE room_id = '552640fd-16f2-4162-9526-8cf40cd2357e';"
Test 2: Meeting Creation with Auto-Recording
Create meeting:
curl -s -X POST http://localhost:1250/v1/rooms/test2/meeting \
-H "Content-Type: application/json" \
-d '{"allow_duplicated":false}' | python3 -m json.tool
Expected Response:
{
"room_name": "test2-YYYYMMDDHHMMSS", // Includes "test2" prefix!
"room_url": "https://monadical.daily.co/test2-...?t=<JWT_TOKEN>", // Has token!
"platform": "daily",
"recording_type": "cloud" // DB value (Whereby-specific)
}
Decode token to verify auto-recording:
# Extract token from room_url, decode JWT payload (base64url, padding stripped)
echo "<token>" | python3 -c "
import sys, json, base64
token = sys.stdin.read().strip()
payload = token.split('.')[1]
payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
print(json.dumps(json.loads(base64.urlsafe_b64decode(payload)), indent=2))
"
Expected token payload:
{
"r": "test2-YYYYMMDDHHMMSS", // Room name
"sr": true, // start_recording: true ✅
"d": "...", // Domain ID
"iat": 1234567890
}
Test 3: Daily.co API Verification
Check room configuration:
ROOM_NAME="<from previous step>"
curl -s -X GET "https://api.daily.co/v1/rooms/$ROOM_NAME" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool
Expected config:
{
"config": {
"enable_recording": "raw-tracks", // ✅
"recordings_bucket": {
"bucket_name": "reflector-dailyco-local",
"bucket_region": "us-east-1",
"assume_role_arn": "arn:aws:iam::950402358378:role/DailyCo"
}
}
}
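The expected config above can be validated programmatically. This is a hedged helper: the JSON shape follows the expected response shown in this plan, not a guaranteed Daily.co API contract.

```python
# Validate that a Daily.co room response is configured for multitrack
# recording. Raising early makes misconfiguration obvious in test runs.

def check_room_config(room: dict) -> None:
    config = room.get("config", {})
    if config.get("enable_recording") != "raw-tracks":
        raise ValueError(
            f"expected raw-tracks, got {config.get('enable_recording')!r}"
        )
    bucket = config.get("recordings_bucket") or {}
    for key in ("bucket_name", "bucket_region", "assume_role_arn"):
        if not bucket.get(key):
            raise ValueError(f"recordings_bucket missing {key}")

check_room_config({
    "config": {
        "enable_recording": "raw-tracks",
        "recordings_bucket": {
            "bucket_name": "reflector-dailyco-local",
            "bucket_region": "us-east-1",
            "assume_role_arn": "arn:aws:iam::950402358378:role/DailyCo",
        },
    }
})  # passes silently when the room is configured correctly
```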
Test 4: Browser UI Test (Playwright MCP)
Using Claude Code MCP tools:
Load room:
Use: mcp__playwright__browser_navigate
Input: {"url": "http://localhost:3000/test2"}
Then wait 12 seconds for iframe to load
Verify Daily.co iframe loaded:
Use: mcp__playwright__browser_snapshot
Expected in snapshot:
- iframe element with src containing "monadical.daily.co"
- Daily.co pre-call UI visible
Take screenshot:
Use: mcp__playwright__browser_take_screenshot
Input: {"filename": "test2-before-join.png"}
Expected: Daily.co pre-call UI with "Join" button visible
Join meeting:
Note: Daily.co iframe interaction requires clicking inside iframe.
Use: mcp__playwright__browser_click
Input: {"element": "Join button in Daily.co iframe", "ref": "<ref-from-snapshot>"}
Then wait 5 seconds for call to connect
Verify in-call:
Use: mcp__playwright__browser_take_screenshot
Input: {"filename": "test2-in-call.png"}
Expected: "Waiting for others to join" or participant video visible
Leave meeting:
Use: mcp__playwright__browser_click
Input: {"element": "Leave button in Daily.co iframe", "ref": "<ref-from-snapshot>"}
Alternative: JavaScript snippets (for manual testing):
await page.goto('http://localhost:3000/test2');
await new Promise(f => setTimeout(f, 12000)); // Wait for load
// Verify iframe (use Playwright locators; document.* only works inside page.evaluate)
const iframeSrc = await page.locator('iframe').first().getAttribute('src');
// Expected: 1 iframe with src containing "monadical.daily.co"
// Screenshot
await page.screenshot({ path: 'test2-before-join.png' });
// Join
await page.locator('iframe').contentFrame().getByRole('button', { name: 'Join' }).click();
await new Promise(f => setTimeout(f, 5000));
// In-call screenshot
await page.screenshot({ path: 'test2-in-call.png' });
// Leave
await page.locator('iframe').contentFrame().getByRole('button', { name: 'Leave' }).click();
Test 5: Webhook Verification
Check server logs for webhooks:
docker-compose logs --since 15m server 2>&1 | grep -i "participant joined\|recording started"
Expected logs:
[info] Participant joined | meeting_id=... | num_clients=1 | recording_type=cloud | recording_trigger=automatic-2nd-participant
[info] Recording started | meeting_id=... | recording_id=... | platform=daily
Check Daily.co webhook delivery logs:
curl -s -X GET "https://api.daily.co/v1/logs/webhooks?limit=20" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "
import sys, json
logs = json.load(sys.stdin)
for log in logs[:10]:
req = json.loads(log['request'])
room = req.get('payload', {}).get('room') or req.get('payload', {}).get('room_name', 'N/A')
print(f\"{req['type']:30s} | room: {room:30s} | status: {log['status']}\")
"
Expected output:
participant.joined | room: test2-YYYYMMDDHHMMSS | status: 200
recording.started | room: test2-YYYYMMDDHHMMSS | status: 200
participant.left | room: test2-YYYYMMDDHHMMSS | status: 200
recording.ready-to-download | room: test2-YYYYMMDDHHMMSS | status: 200
Check database updated:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT room_name, num_clients FROM meeting WHERE room_name LIKE 'test2-%' ORDER BY end_date DESC LIMIT 1;"
Expected:
room_name: test2-YYYYMMDDHHMMSS
num_clients: 0 // After participant left
Test 6: Recording in S3
List recent recordings:
curl -s -X GET "https://api.daily.co/v1/recordings" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for rec in data.get('data', [])[:5]:
if 'test2-' in rec.get('room_name', ''):
print(f\"Room: {rec['room_name']}\")
print(f\"Status: {rec['status']}\")
print(f\"Duration: {rec.get('duration', 0)}s\")
print(f\"S3 key: {rec.get('s3key', 'N/A')}\")
print(f\"Tracks: {len(rec.get('tracks', []))} files\")
for track in rec.get('tracks', []):
print(f\" - {track['type']}: {track['s3Key'].split('/')[-1]} ({track['size']} bytes)\")
print()
"
Expected output:
Room: test2-20251009192341
Status: finished
Duration: ~30-120s
S3 key: monadical/test2-20251009192341/1760037914930
Tracks: 2 files
- audio: 1760037914930-<uuid>-cam-audio-1760037915265 (~400 KB)
- video: 1760037914930-<uuid>-cam-video-1760037915269 (~10-30 MB)
Verify S3 path structure:
- monadical/ - Daily.co subdomain
- test2-20251009192341/ - Reflector room name + timestamp
- <timestamp>-<participant-uuid>-<media-type>-<track-start>.webm - individual track files
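The key layout can be parsed with a small helper. The pattern below is inferred from the observed keys in this plan and should be treated as an assumption, not a documented Daily.co contract.

```python
import re

# Parse a Daily.co raw-tracks S3 key into its components.
# Pattern inferred from observed keys; treat as an assumption.
TRACK_KEY_RE = re.compile(
    r"^(?P<subdomain>[^/]+)/"
    r"(?P<room>[^/]+)/"
    r"(?P<rec_ts>\d+)-(?P<participant>[0-9a-f-]+)-cam-"
    r"(?P<media>audio|video)-(?P<track_ts>\d+)"
    r"(?:\.webm)?$"
)

def parse_track_key(key: str) -> dict:
    m = TRACK_KEY_RE.match(key)
    if not m:
        raise ValueError(f"unrecognized track key: {key}")
    return m.groupdict()

info = parse_track_key(
    "monadical/test2-20251009192341/1760037914930-"
    "123e4567-e89b-12d3-a456-426614174000-cam-audio-1760037915265"
)
# info["media"] == "audio", info["room"] == "test2-20251009192341"
```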
Test 7: Database Check - Recording and Transcript
Check recording created:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT id, bucket_name, object_key, status, meeting_id, recorded_at
FROM recording
ORDER BY recorded_at DESC LIMIT 1;"
Expected:
id: <recording-id-from-webhook>
bucket_name: reflector-dailyco-local
object_key: monadical/test2-<timestamp>/<recording-timestamp>-<uuid>-cam-audio-<track-start>.webm
status: completed
meeting_id: <meeting-id>
recorded_at: <recent-timestamp>
Check transcript created:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT id, title, status, duration, recording_id, meeting_id, room_id
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
Expected (REAL transcription):
id: <transcript-id>
title: <AI-generated title based on actual conversation content>
status: uploaded (audio file processed and available)
duration: <actual meeting duration in seconds>
recording_id: <same-as-recording-id-above>
meeting_id: <meeting-id>
room_id: 552640fd-16f2-4162-9526-8cf40cd2357e
Note: Title and content will reflect the ACTUAL conversation, not mock data. Processing time depends on recording length and GPU backend availability (Modal).
Verify audio file exists:
ls -lh data/<transcript-id>/upload.webm
Expected:
-rw-r--r-- 1 user staff ~100-200K Oct 10 18:48 upload.webm
Check transcript topics (REAL transcription):
# -t -A strips psql headers and padding so the id substitutes cleanly below
TRANSCRIPT_ID=$(docker compose exec -T postgres psql -U reflector -d reflector -t -A -c \
  "SELECT id FROM transcript ORDER BY created_at DESC LIMIT 1;")
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT
jsonb_array_length(topics) as num_topics,
jsonb_array_length(participants) as num_participants,
short_summary,
title
FROM transcript
WHERE id = '$TRANSCRIPT_ID';"
Expected (REAL data):
num_topics: <varies based on conversation>
num_participants: <actual number of participants who spoke>
short_summary: <AI-generated summary of actual conversation>
title: <AI-generated title based on content>
Check topics contain actual transcription:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT topics->0->'title', topics->0->'summary', topics->0->'transcript'
FROM transcript
ORDER BY created_at DESC LIMIT 1;" | head -20
Expected output: Will contain the ACTUAL transcribed conversation from the Daily.co meeting, not mock data.
Check participants:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT participants FROM transcript ORDER BY created_at DESC LIMIT 1;" \
| python3 -c "import sys, json; data=json.loads(sys.stdin.read()); print(json.dumps(data, indent=2))"
Expected (REAL diarization):
[
{
"id": "<uuid>",
"speaker": 0,
"name": "Speaker 1"
},
{
"id": "<uuid>",
"speaker": 1,
"name": "Speaker 2"
}
]
Note: Speaker names will be generic ("Speaker 1", "Speaker 2", etc.) as determined by the diarization backend. Number of participants depends on how many actually spoke during the meeting.
Check word-level data:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT jsonb_array_length(topics->0->'words') as num_words_first_topic
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
Expected:
num_words_first_topic: <varies based on actual conversation length and topic chunking>
Verify speaker diarization in words:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT
topics->0->'words'->0->>'text' as first_word,
topics->0->'words'->0->>'speaker' as speaker,
topics->0->'words'->0->>'start' as start_time,
topics->0->'words'->0->>'end' as end_time
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
Expected (REAL transcription):
first_word: <actual first word from transcription>
speaker: 0, 1, 2, ... (actual speaker ID from diarization)
start_time: <actual timestamp in seconds>
end_time: <actual end timestamp>
Note: All timestamps and speaker IDs are from real transcription/diarization, synchronized across tracks.
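Cross-track synchronization boils down to shifting each track's word timings by that track's alignment offset so all words share the mixed recording's timeline. A minimal sketch, with field names following the word objects queried above:

```python
# Shift per-track word timestamps onto the mixed recording's timeline.
# offset_s is the track's PTS-derived alignment offset in seconds.

def shift_words(words: list[dict], offset_s: float) -> list[dict]:
    return [
        {**w, "start": w["start"] + offset_s, "end": w["end"] + offset_s}
        for w in words
    ]

shifted = shift_words(
    [{"text": "hello", "speaker": 1, "start": 0.5, "end": 0.9}],
    10.0,  # this track's participant joined 10s after the earliest track
)
# shifted[0]["start"] == 10.5
```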
Test 8: Recording Type Verification
Check what Daily.co received:
curl -s -X GET "https://api.daily.co/v1/rooms/test2-<timestamp>" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool | grep "enable_recording"
Expected:
"enable_recording": "raw-tracks"
NOT: "enable_recording": "cloud" (that would be wrong - we want raw tracks)
Troubleshooting
Issue: No webhooks received
Check webhook state:
curl -s -X GET "https://api.daily.co/v1/webhooks" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool
If state is FAILED:
cd server
uv run python scripts/recreate_daily_webhook.py https://<ngrok-url>/v1/daily/webhook
Issue: Webhooks return 422
Check server logs:
docker-compose logs --tail=50 server | grep "Failed to parse webhook event"
Common cause: Event structure mismatch. Daily.co events use:
{
"version": "1.0.0",
"type": "participant.joined",
"payload": {...}, // NOT "data"
"event_ts": 123.456 // NOT "ts"
}
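A handler that accepts only this shape fails fast on mismatches instead of silently returning 422s. A minimal sketch (the function name is hypothetical, not the actual handler):

```python
# Normalize a Daily.co webhook event into the fields a handler needs.
# Daily uses "payload"/"event_ts" (not "data"/"ts"); reject anything else.

def parse_webhook_event(event: dict) -> tuple[str, dict, float]:
    try:
        return event["type"], event["payload"], float(event["event_ts"])
    except KeyError as exc:
        raise ValueError(f"unexpected webhook shape, missing {exc}") from exc

etype, payload, ts = parse_webhook_event({
    "version": "1.0.0",
    "type": "participant.joined",
    "payload": {"room": "test2-20251009192341"},
    "event_ts": 123.456,
})
```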
Issue: Recording not starting
- Check token has sr: true: decode the JWT from the room_url query param; payload should contain "sr": true
- Check Daily.co room config: enable_recording must be set (not false); for raw tracks it must be exactly "raw-tracks"
- Check the participant actually joined: logs should show "Participant joined"; the "Join" button must be clicked, not just the pre-call screen
Issue: Recording in S3 but wrong format
Daily.co recording types:
"cloud"→ Single MP4 file (download_linkin webhook)"raw-tracks"→ Multiple WebM files (tracksarray in webhook)"raw-tracks-audio-only"→ Only audio WebM files
Current implementation: Always uses "raw-tracks" (better for transcription)
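The branching above can be sketched as follows. The field names (download_link, tracks, s3Key) follow the webhook payloads described in this plan; the function itself is illustrative, not the production handler.

```python
# Resolve the downloadable artifacts for a finished recording, branching
# on recording type as described above.

def recording_artifacts(payload: dict, recording_type: str) -> list[str]:
    if recording_type == "cloud":
        return [payload["download_link"]]  # single MP4
    if recording_type in ("raw-tracks", "raw-tracks-audio-only"):
        return [t["s3Key"] for t in payload["tracks"]]  # one WebM per track
    raise ValueError(f"unknown recording type: {recording_type}")

url = recording_artifacts({"download_link": "https://example.com/rec.mp4"}, "cloud")
keys = recording_artifacts(
    {"tracks": [{"s3Key": "a.webm"}, {"s3Key": "b.webm"}]}, "raw-tracks"
)
```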
Quick Validation Commands
One-liner to verify everything:
# 1. Check room exists
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT name, platform FROM room WHERE name = 'test2';" && \
# 2. Create meeting
MEETING=$(curl -s -X POST http://localhost:1250/v1/rooms/test2/meeting \
-H "Content-Type: application/json" -d '{"allow_duplicated":false}') && \
echo "$MEETING" | python3 -c "import sys,json; m=json.load(sys.stdin); print(f'Room: {m[\"room_name\"]}\nURL: {m[\"room_url\"][:80]}...')" && \
# 3. Check Daily.co config
ROOM_NAME=$(echo "$MEETING" | python3 -c "import sys,json; print(json.load(sys.stdin)['room_name'])") && \
curl -s -X GET "https://api.daily.co/v1/rooms/$ROOM_NAME" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "import sys,json; print(f'Recording: {json.load(sys.stdin)[\"config\"][\"enable_recording\"]}')"
Expected output:
name: test2, platform: whereby
Room: test2-20251009192341
URL: https://monadical.daily.co/test2-20251009192341?t=eyJhbGc...
Recording: raw-tracks
Success Criteria Checklist
- Room name includes Reflector room prefix (test2-...)
- Meeting URL contains JWT token (?t=...)
- Token has sr: true (auto-recording enabled)
- Daily.co room config: enable_recording: "raw-tracks"
- Browser loads Daily.co interface (not Whereby)
- Recording auto-starts when participant joins
- Webhooks received: participant.joined, recording.started, participant.left, recording.ready-to-download
- Recording status: finished
- S3 contains 2 files: audio (.webm) and video (.webm)
- S3 path: monadical/test2-{timestamp}/{recording-start-ts}-{participant-uuid}-cam-{audio|video}-{track-start-ts}
- Database num_clients increments/decrements correctly
- Database recording entry created with correct S3 path and status completed
- Database transcript entry created with status uploaded
- Audio file downloaded to data/{transcript_id}/upload.webm
- Transcript has REAL data: AI-generated title based on conversation
- Transcript has topics generated from actual content
- Transcript has participants with proper speaker diarization
- Topics contain word-level data with accurate timestamps and speaker IDs
- Total duration matches actual meeting length
- MP3 and waveform files generated by file processing pipeline
- Frontend transcript page loads without "Failed to load audio" error
- Audio player functional with working playback and waveform visualization
- Multitrack processing completed without errors in worker logs
- Modal GPU backends accessible (transcription and diarization)