Daily.co Integration Test Plan

IMPLEMENTATION STATUS: Real Transcription Active

This test plan validates the Daily.co multitrack recording integration with REAL transcription/diarization.

The implementation includes the complete audio processing pipeline:

  • Multitrack recordings from Daily.co S3 (separate audio stream per participant)
  • PyAV-based audio mixdown with PTS-based track alignment
  • Real transcription via Modal GPU backend (Whisper)
  • Real diarization via Modal GPU backend (speaker identification)
  • Per-track transcription with timestamp synchronization
  • Complete database entities (recording, transcript, topics, participants, words)

Processing pipeline (PipelineMainMultitrack):

  1. Download all audio tracks from Daily.co S3
  2. Align tracks by PTS (presentation timestamp) to handle late joiners (see the sketch after this list)
  3. Mix tracks into single audio file for unified playback
  4. Transcribe each track individually with proper offset handling
  5. Perform diarization on mixed audio
  6. Generate topics, summaries, and word-level timestamps
  7. Convert audio to MP3 and generate waveform visualization

Note: A stub processor (process_daily_recording) exists for testing the webhook flow without GPU costs, but the production code path uses process_multitrack_recording with the full ML pipeline.


Prerequisites

1. Environment Variables (check in .env.development.local):

# Daily.co API Configuration
DAILY_API_KEY=<key>
DAILY_SUBDOMAIN=monadical
DAILY_WEBHOOK_SECRET=<base64-encoded-secret>
AWS_DAILY_S3_BUCKET=reflector-dailyco-local
AWS_DAILY_S3_REGION=us-east-1
AWS_DAILY_ROLE_ARN=arn:aws:iam::950402358378:role/DailyCo
DAILY_MIGRATION_ENABLED=true
DAILY_MIGRATION_ROOM_IDS=["552640fd-16f2-4162-9526-8cf40cd2357e"]

# Transcription/Diarization Backend (Required for real processing)
DIARIZATION_BACKEND=modal
DIARIZATION_MODAL_API_KEY=<modal-api-key>
# TRANSCRIPTION_BACKEND is not explicitly set (uses default/modal)

2. Services Running:

docker compose ps  # server, postgres, redis, worker, beat should be UP

IMPORTANT: Worker and beat services MUST be running for transcription processing:

docker compose up -d worker beat

3. ngrok Tunnel for Webhooks:

# Start ngrok (if not already running)
ngrok http 1250 --log=stdout > /tmp/ngrok.log 2>&1 &

# Get public URL
curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; data=json.load(sys.stdin); print(data['tunnels'][0]['public_url'])"

Current ngrok URL: https://0503947384a3.ngrok-free.app (as of last registration)

4. Webhook Created:

cd server
uv run python scripts/recreate_daily_webhook.py https://0503947384a3.ngrok-free.app/v1/daily/webhook
# Verify: "Created webhook <uuid> (state: ACTIVE)"

Current webhook status: ACTIVE (webhook ID: dad5ad16-ceca-488e-8fc5-dae8650b51d0)


Test 1: Database Configuration

Check room platform:

docker-compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT id, name, platform, recording_type FROM room WHERE name = 'test2';"

Expected:

id: 552640fd-16f2-4162-9526-8cf40cd2357e
name: test2
platform: whereby  # DB value (overridden by env var DAILY_MIGRATION_ROOM_IDS)
recording_type: cloud

Clear old meetings:

docker-compose exec -T postgres psql -U reflector -d reflector -c \
  "UPDATE meeting SET is_active = false WHERE room_id = '552640fd-16f2-4162-9526-8cf40cd2357e';"

Test 2: Meeting Creation with Auto-Recording

Create meeting:

curl -s -X POST http://localhost:1250/v1/rooms/test2/meeting \
  -H "Content-Type: application/json" \
  -d '{"allow_duplicated":false}' | python3 -m json.tool

Expected Response:

{
  "room_name": "test2-YYYYMMDDHHMMSS",  // Includes "test2" prefix!
  "room_url": "https://monadical.daily.co/test2-...?t=<JWT_TOKEN>",  // Has token!
  "platform": "daily",
  "recording_type": "cloud"  // DB value (Whereby-specific)
}

Decode token to verify auto-recording:

# Extract token from room_url, decode JWT payload
echo "<token>" | python3 -c "
import sys, json, base64
token = sys.stdin.read().strip()
payload = token.split('.')[1] + '=' * (4 - len(token.split('.')[1]) % 4)
print(json.dumps(json.loads(base64.b64decode(payload)), indent=2))
"

Expected token payload:

{
  "r": "test2-YYYYMMDDHHMMSS",  // Room name
  "sr": true,  // start_recording: true ✅
  "d": "...",  // Domain ID
  "iat": 1234567890
}

Test 3: Daily.co API Verification

Check room configuration:

ROOM_NAME="<from previous step>"
curl -s -X GET "https://api.daily.co/v1/rooms/$ROOM_NAME" \
  -H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool

Expected config:

{
  "config": {
    "enable_recording": "raw-tracks",  // ✅
    "recordings_bucket": {
      "bucket_name": "reflector-dailyco-local",
      "bucket_region": "us-east-1",
      "assume_role_arn": "arn:aws:iam::950402358378:role/DailyCo"
    }
  }
}
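
For a scripted variant of this check, here is a small helper that asserts the fields shown above (illustrative only; it relies solely on the response shape of this GET call and the DAILY_API_KEY env var):

import json, os, urllib.request

def verify_room_config(room_name: str) -> None:
    # Fetch the Daily.co room and assert the recording settings this test expects
    req = urllib.request.Request(
        f"https://api.daily.co/v1/rooms/{room_name}",
        headers={"Authorization": f"Bearer {os.environ['DAILY_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        config = json.load(resp)["config"]
    assert config["enable_recording"] == "raw-tracks", config
    assert config["recordings_bucket"]["bucket_name"] == "reflector-dailyco-local", config
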

Test 4: Browser UI Test (Playwright MCP)

Using Claude Code MCP tools:

Load room:

Use: mcp__playwright__browser_navigate
Input: {"url": "http://localhost:3000/test2"}

Then wait 12 seconds for the iframe to load

Verify Daily.co iframe loaded:

Use: mcp__playwright__browser_snapshot

Expected in snapshot:
- iframe element with src containing "monadical.daily.co"
- Daily.co pre-call UI visible

Take screenshot:

Use: mcp__playwright__browser_take_screenshot
Input: {"filename": "test2-before-join.png"}

Expected: Daily.co pre-call UI with "Join" button visible

Join meeting:

Note: Daily.co iframe interaction requires clicking inside the iframe.
Use: mcp__playwright__browser_click
Input: {"element": "Join button in Daily.co iframe", "ref": "<ref-from-snapshot>"}

Then wait 5 seconds for the call to connect

Verify in-call:

Use: mcp__playwright__browser_take_screenshot
Input: {"filename": "test2-in-call.png"}

Expected: "Waiting for others to join" or participant video visible

Leave meeting:

Use: mcp__playwright__browser_click
Input: {"element": "Leave button in Daily.co iframe", "ref": "<ref-from-snapshot>"}

Alternative: JavaScript snippets (for manual testing):

await page.goto('http://localhost:3000/test2');
await new Promise(f => setTimeout(f, 12000));  // Wait for load

// Verify iframe (Playwright runs in Node, so query via the page, not `document`)
const iframeCount = await page.locator('iframe').count();
// Expected: 1 iframe with src containing "monadical.daily.co"

// Screenshot
await page.screenshot({ path: 'test2-before-join.png' });

// Join
await page.locator('iframe').contentFrame().getByRole('button', { name: 'Join' }).click();
await new Promise(f => setTimeout(f, 5000));

// In-call screenshot
await page.screenshot({ path: 'test2-in-call.png' });

// Leave
await page.locator('iframe').contentFrame().getByRole('button', { name: 'Leave' }).click();

Test 5: Webhook Verification

Check server logs for webhooks:

docker-compose logs --since 15m server 2>&1 | grep -i "participant joined\|recording started"

Expected logs:

[info] Participant joined | meeting_id=... | num_clients=1 | recording_type=cloud | recording_trigger=automatic-2nd-participant
[info] Recording started | meeting_id=... | recording_id=... | platform=daily

Check Daily.co webhook delivery logs:

curl -s -X GET "https://api.daily.co/v1/logs/webhooks?limit=20" \
  -H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "
import sys, json
logs = json.load(sys.stdin)
for log in logs[:10]:
    req = json.loads(log['request'])
    room = req.get('payload', {}).get('room') or req.get('payload', {}).get('room_name', 'N/A')
    print(f\"{req['type']:30s} | room: {room:30s} | status: {log['status']}\")
"

Expected output:

participant.joined             | room: test2-YYYYMMDDHHMMSS       | status: 200
recording.started              | room: test2-YYYYMMDDHHMMSS       | status: 200
participant.left               | room: test2-YYYYMMDDHHMMSS       | status: 200
recording.ready-to-download    | room: test2-YYYYMMDDHHMMSS       | status: 200

Check database updated:

docker-compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT room_name, num_clients FROM meeting WHERE room_name LIKE 'test2-%' ORDER BY end_date DESC LIMIT 1;"

Expected:

room_name: test2-YYYYMMDDHHMMSS
num_clients: 0  // After participant left

Test 6: Recording in S3

List recent recordings:

curl -s -X GET "https://api.daily.co/v1/recordings" \
  -H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for rec in data.get('data', [])[:5]:
    if 'test2-' in rec.get('room_name', ''):
        print(f\"Room: {rec['room_name']}\")
        print(f\"Status: {rec['status']}\")
        print(f\"Duration: {rec.get('duration', 0)}s\")
        print(f\"S3 key: {rec.get('s3key', 'N/A')}\")
        print(f\"Tracks: {len(rec.get('tracks', []))} files\")
        for track in rec.get('tracks', []):
            print(f\"  - {track['type']}: {track['s3Key'].split('/')[-1]} ({track['size']} bytes)\")
        print()
"

Expected output:

Room: test2-20251009192341
Status: finished
Duration: ~30-120s
S3 key: monadical/test2-20251009192341/1760037914930
Tracks: 2 files
  - audio: 1760037914930-<uuid>-cam-audio-1760037915265 (~400 KB)
  - video: 1760037914930-<uuid>-cam-video-1760037915269 (~10-30 MB)

Verify S3 path structure:

  • monadical/ - Daily.co subdomain
  • test2-20251009192341/ - Reflector room name + timestamp
  • <timestamp>-<participant-uuid>-<media-type>-<track-start>.webm - Individual track files
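
If you want to sanity-check a key programmatically, here is a small hypothetical parser for the layout above (the regex follows the example object_key shown in Test 7; real keys may vary):

import re

# Matches keys like:
# monadical/test2-20251009192341/1760037914930-<uuid>-cam-audio-1760037915265.webm
TRACK_KEY_RE = re.compile(
    r"^(?P<subdomain>[^/]+)/"
    r"(?P<room_name>[^/]+)/"
    r"(?P<recording_start_ms>\d+)-"
    r"(?P<participant_id>[0-9a-fA-F-]+)-"
    r"cam-(?P<media_type>audio|video)-"
    r"(?P<track_start_ms>\d+)\.webm$"
)

def parse_track_key(key: str) -> dict:
    match = TRACK_KEY_RE.match(key)
    if not match:
        raise ValueError(f"Unexpected track key layout: {key}")
    return match.groupdict()
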

Test 7: Database Check - Recording and Transcript

Check recording created:

docker-compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT id, bucket_name, object_key, status, meeting_id, recorded_at
   FROM recording
   ORDER BY recorded_at DESC LIMIT 1;"

Expected:

id: <recording-id-from-webhook>
bucket_name: reflector-dailyco-local
object_key: monadical/test2-<timestamp>/<recording-timestamp>-<uuid>-cam-audio-<track-start>.webm
status: completed
meeting_id: <meeting-id>
recorded_at: <recent-timestamp>

Check transcript created:

docker compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT id, title, status, duration, recording_id, meeting_id, room_id
   FROM transcript
   ORDER BY created_at DESC LIMIT 1;"

Expected (REAL transcription):

id: <transcript-id>
title: <AI-generated title based on actual conversation content>
status: uploaded  (audio file processed and available)
duration: <actual meeting duration in seconds>
recording_id: <same-as-recording-id-above>
meeting_id: <meeting-id>
room_id: 552640fd-16f2-4162-9526-8cf40cd2357e

Note: Title and content will reflect the ACTUAL conversation, not mock data. Processing time depends on recording length and GPU backend availability (Modal).

Verify audio file exists:

ls -lh data/<transcript-id>/upload.webm

Expected:

-rw-r--r-- 1 user staff ~100-200K Oct 10 18:48 upload.webm

Check transcript topics (REAL transcription):

TRANSCRIPT_ID=$(docker compose exec -T postgres psql -U reflector -d reflector -tA -c \
  "SELECT id FROM transcript ORDER BY created_at DESC LIMIT 1;")  # -tA: bare value, no header/padding

docker compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT
     jsonb_array_length(topics) as num_topics,
     jsonb_array_length(participants) as num_participants,
     short_summary,
     title
   FROM transcript
   WHERE id = '$TRANSCRIPT_ID';"

Expected (REAL data):

num_topics: <varies based on conversation>
num_participants: <actual number of participants who spoke>
short_summary: <AI-generated summary of actual conversation>
title: <AI-generated title based on content>

Check topics contain actual transcription:

docker compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT topics->0->'title', topics->0->'summary', topics->0->'transcript'
   FROM transcript
   ORDER BY created_at DESC LIMIT 1;" | head -20

Expected output: Will contain the ACTUAL transcribed conversation from the Daily.co meeting, not mock data.

Check participants:

docker compose exec -T postgres psql -U reflector -d reflector -tA -c \
  "SELECT participants FROM transcript ORDER BY created_at DESC LIMIT 1;" \
  | python3 -c "import sys, json; data=json.loads(sys.stdin.read()); print(json.dumps(data, indent=2))"

Expected (REAL diarization):

[
  {
    "id": "<uuid>",
    "speaker": 0,
    "name": "Speaker 1"
  },
  {
    "id": "<uuid>",
    "speaker": 1,
    "name": "Speaker 2"
  }
]

Note: Speaker names will be generic ("Speaker 1", "Speaker 2", etc.) as determined by the diarization backend. Number of participants depends on how many actually spoke during the meeting.

Check word-level data:

docker compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT jsonb_array_length(topics->0->'words') as num_words_first_topic
   FROM transcript
   ORDER BY created_at DESC LIMIT 1;"

Expected:

num_words_first_topic: <varies based on actual conversation length and topic chunking>

Verify speaker diarization in words:

docker compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT
     topics->0->'words'->0->>'text' as first_word,
     topics->0->'words'->0->>'speaker' as speaker,
     topics->0->'words'->0->>'start' as start_time,
     topics->0->'words'->0->>'end' as end_time
   FROM transcript
   ORDER BY created_at DESC LIMIT 1;"

Expected (REAL transcription):

first_word: <actual first word from transcription>
speaker: 0, 1, 2, ... (actual speaker ID from diarization)
start_time: <actual timestamp in seconds>
end_time: <actual end timestamp>

Note: All timestamps and speaker IDs are from real transcription/diarization, synchronized across tracks.
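
Conceptually, each track's word timestamps are shifted by that track's alignment offset (from step 2 of the pipeline) so every word lands on the shared meeting timeline. A hedged illustration using the word fields shown above; track_offset stands in for the per-track value computed during alignment:

def shift_words_to_meeting_time(words: list[dict], track_offset: float) -> list[dict]:
    # Shift one track's word timestamps onto the shared meeting timeline
    return [
        {**word, "start": word["start"] + track_offset, "end": word["end"] + track_offset}
        for word in words
    ]
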


Test 8: Recording Type Verification

Check what Daily.co received:

curl -s -X GET "https://api.daily.co/v1/rooms/test2-<timestamp>" \
  -H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool | grep "enable_recording"

Expected:

"enable_recording": "raw-tracks"

NOT: "enable_recording": "cloud" (that would be wrong - we want raw tracks)


Troubleshooting

Issue: No webhooks received

Check webhook state:

curl -s -X GET "https://api.daily.co/v1/webhooks" \
  -H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool

If state is FAILED:

cd server
uv run python scripts/recreate_daily_webhook.py https://<ngrok-url>/v1/daily/webhook

Issue: Webhooks return 422

Check server logs:

docker-compose logs --tail=50 server | grep "Failed to parse webhook event"

Common cause: Event structure mismatch. Daily.co events use:

{
  "version": "1.0.0",
  "type": "participant.joined",
  "payload": {...},  // NOT "data"
  "event_ts": 123.456  // NOT "ts"
}
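
A minimal sketch of a parser that matches this structure (a plain dataclass for illustration; the server's actual webhook models may differ):

from dataclasses import dataclass
from typing import Any

@dataclass
class DailyWebhookEvent:
    version: str
    type: str                # e.g. "participant.joined", "recording.ready-to-download"
    payload: dict[str, Any]  # note: "payload", not "data"
    event_ts: float          # note: "event_ts", not "ts"

def parse_event(body: dict) -> DailyWebhookEvent:
    return DailyWebhookEvent(
        version=body["version"],
        type=body["type"],
        payload=body["payload"],
        event_ts=body["event_ts"],
    )
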

Issue: Recording not starting

  1. Check token has sr: true:

    • Decode JWT token from room_url query param
    • Should contain "sr": true
  2. Check Daily.co room config:

    • enable_recording must be set (not false)
    • For raw-tracks: must be exactly "raw-tracks"
  3. Check participant actually joined:

    • Logs should show "Participant joined"
    • Must actually click the "Join" button, not just load the pre-call screen

Issue: Recording in S3 but wrong format

Daily.co recording types:

  • "cloud" → Single MP4 file (download_link in webhook)
  • "raw-tracks" → Multiple WebM files (tracks array in webhook)
  • "raw-tracks-audio-only" → Only audio WebM files

Current implementation: Always uses "raw-tracks" (better for transcription)
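
A hedged sketch of how a handler could tell the two payload shapes apart, based only on the field differences described above (not the actual server code):

def classify_recording_payload(payload: dict) -> str:
    # Infer the recording type from which fields the webhook payload carries
    if payload.get("tracks"):
        return "raw-tracks"          # list of per-participant WebM track files
    if payload.get("download_link"):
        return "cloud"               # single composed MP4
    raise ValueError("Unrecognized recording payload shape")
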


Quick Validation Commands

One-liner to verify everything:

# 1. Check room exists
docker-compose exec -T postgres psql -U reflector -d reflector -c \
  "SELECT name, platform FROM room WHERE name = 'test2';" && \

# 2. Create meeting
MEETING=$(curl -s -X POST http://localhost:1250/v1/rooms/test2/meeting \
  -H "Content-Type: application/json" -d '{"allow_duplicated":false}') && \
echo "$MEETING" | python3 -c "import sys,json; m=json.load(sys.stdin); print(f'Room: {m[\"room_name\"]}\nURL: {m[\"room_url\"][:80]}...')" && \

# 3. Check Daily.co config
ROOM_NAME=$(echo "$MEETING" | python3 -c "import sys,json; print(json.load(sys.stdin)['room_name'])") && \
curl -s -X GET "https://api.daily.co/v1/rooms/$ROOM_NAME" \
  -H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "import sys,json; print(f'Recording: {json.load(sys.stdin)[\"config\"][\"enable_recording\"]}')"

Expected output:

name: test2, platform: whereby
Room: test2-20251009192341
URL: https://monadical.daily.co/test2-20251009192341?t=eyJhbGc...
Recording: raw-tracks

Success Criteria Checklist

  • Room name includes Reflector room prefix (test2-...)
  • Meeting URL contains JWT token (?t=...)
  • Token has sr: true (auto-recording enabled)
  • Daily.co room config: enable_recording: "raw-tracks"
  • Browser loads Daily.co interface (not Whereby)
  • Recording auto-starts when participant joins
  • Webhooks received: participant.joined, recording.started, participant.left, recording.ready-to-download
  • Recording status: finished
  • S3 contains 2 files: audio (.webm) and video (.webm)
  • S3 path: monadical/test2-{timestamp}/{recording-start-ts}-{participant-uuid}-cam-{audio|video}-{track-start-ts}
  • Database num_clients increments/decrements correctly
  • Database recording entry created with correct S3 path and status completed
  • Database transcript entry created with status uploaded
  • Audio file downloaded to data/{transcript_id}/upload.webm
  • Transcript has REAL data: AI-generated title based on conversation
  • Transcript has topics generated from actual content
  • Transcript has participants with proper speaker diarization
  • Topics contain word-level data with accurate timestamps and speaker IDs
  • Total duration matches actual meeting length
  • MP3 and waveform files generated by file processing pipeline
  • Frontend transcript page loads without "Failed to load audio" error
  • Audio player functional with working playback and waveform visualization
  • Multitrack processing completed without errors in worker logs
  • Modal GPU backends accessible (transcription and diarization)