Daily.co Integration Test Plan
✅ IMPLEMENTATION STATUS: Real Transcription Active
This test validates Daily.co multitrack recording integration with REAL transcription/diarization.
The implementation includes a complete audio processing pipeline:
- Multitrack recordings from Daily.co S3 (separate audio stream per participant)
- PyAV-based audio mixdown with PTS-based track alignment
- Real transcription via Modal GPU backend (Whisper)
- Real diarization via Modal GPU backend (speaker identification)
- Per-track transcription with timestamp synchronization
- Complete database entities (recording, transcript, topics, participants, words)
Processing pipeline (PipelineMainMultitrack):
- Download all audio tracks from Daily.co S3
- Align tracks by PTS (presentation timestamp) to handle late joiners
- Mix tracks into single audio file for unified playback
- Transcribe each track individually with proper offset handling
- Perform diarization on mixed audio
- Generate topics, summaries, and word-level timestamps
- Convert audio to MP3 and generate waveform visualization
Note: A stub processor (process_daily_recording) exists for testing webhook flow without GPU costs, but the production code path uses process_multitrack_recording with full ML pipeline.
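The PTS alignment step above can be sketched in plain Python. The real pipeline (PipelineMainMultitrack) does this with PyAV on decoded audio frames; the dict shape and field names below are illustrative only.

```python
# Illustrative sketch of PTS-based track alignment. The real pipeline uses
# PyAV frames; here each track is a plain dict with assumed fields.

def align_tracks(tracks: list[dict]) -> list[dict]:
    """Compute each track's offset (seconds) relative to the earliest PTS.

    A late joiner's track starts later, so its transcript timestamps must be
    shifted by this offset before merging with the other tracks.
    """
    earliest = min(t["start_pts"] for t in tracks)
    return [
        {**t, "offset_s": (t["start_pts"] - earliest) / t["timebase"]}
        for t in tracks
    ]

tracks = [
    {"participant": "alice", "start_pts": 0, "timebase": 48000},
    {"participant": "bob", "start_pts": 480000, "timebase": 48000},  # joined 10s late
]
aligned = align_tracks(tracks)
# bob's transcript timestamps get shifted by 10.0 seconds
```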
Prerequisites
1. Environment Variables (check in .env.development.local):
# Daily.co API Configuration
DAILY_API_KEY=<key>
DAILY_SUBDOMAIN=monadical
DAILY_WEBHOOK_SECRET=<base64-encoded-secret>
AWS_DAILY_S3_BUCKET=reflector-dailyco-local
AWS_DAILY_S3_REGION=us-east-1
AWS_DAILY_ROLE_ARN=arn:aws:iam::950402358378:role/DailyCo
DAILY_MIGRATION_ENABLED=true
DAILY_MIGRATION_ROOM_IDS=["552640fd-16f2-4162-9526-8cf40cd2357e"]
# Transcription/Diarization Backend (Required for real processing)
DIARIZATION_BACKEND=modal
DIARIZATION_MODAL_API_KEY=<modal-api-key>
# TRANSCRIPTION_BACKEND is not explicitly set (uses default/modal)
2. Services Running:
docker compose ps # server, postgres, redis, worker, beat should be UP
IMPORTANT: Worker and beat services MUST be running for transcription processing:
docker compose up -d worker beat
3. ngrok Tunnel for Webhooks:
# Start ngrok (if not already running)
ngrok http 1250 --log=stdout > /tmp/ngrok.log 2>&1 &
# Get public URL
curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; data=json.load(sys.stdin); print(data['tunnels'][0]['public_url'])"
Current ngrok URL: https://0503947384a3.ngrok-free.app (as of last registration)
4. Webhook Created:
cd server
uv run python scripts/recreate_daily_webhook.py https://0503947384a3.ngrok-free.app/v1/daily/webhook
# Verify: "Created webhook <uuid> (state: ACTIVE)"
Current webhook status: ✅ ACTIVE (webhook ID: dad5ad16-ceca-488e-8fc5-dae8650b51d0)
Test 1: Database Configuration
Check room platform:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT id, name, platform, recording_type FROM room WHERE name = 'test2';"
Expected:
id: 552640fd-16f2-4162-9526-8cf40cd2357e
name: test2
platform: whereby # DB value (overridden by env var DAILY_MIGRATION_ROOM_IDS)
recording_type: cloud
Clear old meetings:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"UPDATE meeting SET is_active = false WHERE room_id = '552640fd-16f2-4162-9526-8cf40cd2357e';"
Test 2: Meeting Creation with Auto-Recording
Create meeting:
curl -s -X POST http://localhost:1250/v1/rooms/test2/meeting \
-H "Content-Type: application/json" \
-d '{"allow_duplicated":false}' | python3 -m json.tool
Expected Response:
{
"room_name": "test2-YYYYMMDDHHMMSS", // Includes "test2" prefix!
"room_url": "https://monadical.daily.co/test2-...?t=<JWT_TOKEN>", // Has token!
"platform": "daily",
"recording_type": "cloud" // DB value (Whereby-specific)
}
Decode token to verify auto-recording:
# Extract token from room_url, decode JWT payload (base64url, padding stripped)
echo "<token>" | python3 -c "
import sys, json, base64
token = sys.stdin.read().strip()
payload = token.split('.')[1]
payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
print(json.dumps(json.loads(base64.urlsafe_b64decode(payload)), indent=2))
"
Expected token payload:
{
"r": "test2-YYYYMMDDHHMMSS", // Room name
"sr": true, // start_recording: true ✅
"d": "...", // Domain ID
"iat": 1234567890
}
Test 3: Daily.co API Verification
Check room configuration:
ROOM_NAME="<from previous step>"
curl -s -X GET "https://api.daily.co/v1/rooms/$ROOM_NAME" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool
Expected config:
{
"config": {
"enable_recording": "raw-tracks", // ✅
"recordings_bucket": {
"bucket_name": "reflector-dailyco-local",
"bucket_region": "us-east-1",
"assume_role_arn": "arn:aws:iam::950402358378:role/DailyCo"
}
}
}
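The expected config above can be validated programmatically. This is a hedged helper: the JSON shape follows the expected response shown in this plan, not a guaranteed Daily.co API contract.

```python
# Validate that a Daily.co room response is configured for multitrack
# recording. Raising early makes misconfiguration obvious in test runs.

def check_room_config(room: dict) -> None:
    config = room.get("config", {})
    if config.get("enable_recording") != "raw-tracks":
        raise ValueError(
            f"expected raw-tracks, got {config.get('enable_recording')!r}"
        )
    bucket = config.get("recordings_bucket") or {}
    for key in ("bucket_name", "bucket_region", "assume_role_arn"):
        if not bucket.get(key):
            raise ValueError(f"recordings_bucket missing {key}")

check_room_config({
    "config": {
        "enable_recording": "raw-tracks",
        "recordings_bucket": {
            "bucket_name": "reflector-dailyco-local",
            "bucket_region": "us-east-1",
            "assume_role_arn": "arn:aws:iam::950402358378:role/DailyCo",
        },
    }
})  # passes silently when the room is configured correctly
```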
Test 4: Browser UI Test (Playwright MCP)
Using Claude Code MCP tools:
Load room:
Use: mcp__playwright__browser_navigate
Input: {"url": "http://localhost:3000/test2"}
Then wait 12 seconds for iframe to load
Verify Daily.co iframe loaded:
Use: mcp__playwright__browser_snapshot
Expected in snapshot:
- iframe element with src containing "monadical.daily.co"
- Daily.co pre-call UI visible
Take screenshot:
Use: mcp__playwright__browser_take_screenshot
Input: {"filename": "test2-before-join.png"}
Expected: Daily.co pre-call UI with "Join" button visible
Join meeting:
Note: Daily.co iframe interaction requires clicking inside iframe.
Use: mcp__playwright__browser_click
Input: {"element": "Join button in Daily.co iframe", "ref": "<ref-from-snapshot>"}
Then wait 5 seconds for call to connect
Verify in-call:
Use: mcp__playwright__browser_take_screenshot
Input: {"filename": "test2-in-call.png"}
Expected: "Waiting for others to join" or participant video visible
Leave meeting:
Use: mcp__playwright__browser_click
Input: {"element": "Leave button in Daily.co iframe", "ref": "<ref-from-snapshot>"}
Alternative: JavaScript snippets (for manual testing):
await page.goto('http://localhost:3000/test2');
await new Promise(f => setTimeout(f, 12000)); // Wait for load
// Verify iframe (use Playwright locators; document.* only works inside page.evaluate)
const iframeSrc = await page.locator('iframe').first().getAttribute('src');
// Expected: 1 iframe with src containing "monadical.daily.co"
// Screenshot
await page.screenshot({ path: 'test2-before-join.png' });
// Join
await page.locator('iframe').contentFrame().getByRole('button', { name: 'Join' }).click();
await new Promise(f => setTimeout(f, 5000));
// In-call screenshot
await page.screenshot({ path: 'test2-in-call.png' });
// Leave
await page.locator('iframe').contentFrame().getByRole('button', { name: 'Leave' }).click();
Test 5: Webhook Verification
Check server logs for webhooks:
docker-compose logs --since 15m server 2>&1 | grep -i "participant joined\|recording started"
Expected logs:
[info] Participant joined | meeting_id=... | num_clients=1 | recording_type=cloud | recording_trigger=automatic-2nd-participant
[info] Recording started | meeting_id=... | recording_id=... | platform=daily
Check Daily.co webhook delivery logs:
curl -s -X GET "https://api.daily.co/v1/logs/webhooks?limit=20" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "
import sys, json
logs = json.load(sys.stdin)
for log in logs[:10]:
req = json.loads(log['request'])
room = req.get('payload', {}).get('room') or req.get('payload', {}).get('room_name', 'N/A')
print(f\"{req['type']:30s} | room: {room:30s} | status: {log['status']}\")
"
Expected output:
participant.joined | room: test2-YYYYMMDDHHMMSS | status: 200
recording.started | room: test2-YYYYMMDDHHMMSS | status: 200
participant.left | room: test2-YYYYMMDDHHMMSS | status: 200
recording.ready-to-download | room: test2-YYYYMMDDHHMMSS | status: 200
Check database updated:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT room_name, num_clients FROM meeting WHERE room_name LIKE 'test2-%' ORDER BY end_date DESC LIMIT 1;"
Expected:
room_name: test2-YYYYMMDDHHMMSS
num_clients: 0 // After participant left
Test 6: Recording in S3
List recent recordings:
curl -s -X GET "https://api.daily.co/v1/recordings" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for rec in data.get('data', [])[:5]:
if 'test2-' in rec.get('room_name', ''):
print(f\"Room: {rec['room_name']}\")
print(f\"Status: {rec['status']}\")
print(f\"Duration: {rec.get('duration', 0)}s\")
print(f\"S3 key: {rec.get('s3key', 'N/A')}\")
print(f\"Tracks: {len(rec.get('tracks', []))} files\")
for track in rec.get('tracks', []):
print(f\" - {track['type']}: {track['s3Key'].split('/')[-1]} ({track['size']} bytes)\")
print()
"
Expected output:
Room: test2-20251009192341
Status: finished
Duration: ~30-120s
S3 key: monadical/test2-20251009192341/1760037914930
Tracks: 2 files
- audio: 1760037914930-<uuid>-cam-audio-1760037915265 (~400 KB)
- video: 1760037914930-<uuid>-cam-video-1760037915269 (~10-30 MB)
Verify S3 path structure:
- monadical/ - Daily.co subdomain
- test2-20251009192341/ - Reflector room name + timestamp
- <timestamp>-<participant-uuid>-<media-type>-<track-start>.webm - individual track files
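The key layout can be parsed with a small helper. The pattern below is inferred from the observed keys in this plan and should be treated as an assumption, not a documented Daily.co contract.

```python
import re

# Parse a Daily.co raw-tracks S3 key into its components.
# Pattern inferred from observed keys; treat as an assumption.
TRACK_KEY_RE = re.compile(
    r"^(?P<subdomain>[^/]+)/"
    r"(?P<room>[^/]+)/"
    r"(?P<rec_ts>\d+)-(?P<participant>[0-9a-f-]+)-cam-"
    r"(?P<media>audio|video)-(?P<track_ts>\d+)"
    r"(?:\.webm)?$"
)

def parse_track_key(key: str) -> dict:
    m = TRACK_KEY_RE.match(key)
    if not m:
        raise ValueError(f"unrecognized track key: {key}")
    return m.groupdict()

info = parse_track_key(
    "monadical/test2-20251009192341/1760037914930-"
    "123e4567-e89b-12d3-a456-426614174000-cam-audio-1760037915265"
)
# info["media"] == "audio", info["room"] == "test2-20251009192341"
```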
Test 7: Database Check - Recording and Transcript
Check recording created:
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT id, bucket_name, object_key, status, meeting_id, recorded_at
FROM recording
ORDER BY recorded_at DESC LIMIT 1;"
Expected:
id: <recording-id-from-webhook>
bucket_name: reflector-dailyco-local
object_key: monadical/test2-<timestamp>/<recording-timestamp>-<uuid>-cam-audio-<track-start>.webm
status: completed
meeting_id: <meeting-id>
recorded_at: <recent-timestamp>
Check transcript created:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT id, title, status, duration, recording_id, meeting_id, room_id
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
Expected (REAL transcription):
id: <transcript-id>
title: <AI-generated title based on actual conversation content>
status: uploaded (audio file processed and available)
duration: <actual meeting duration in seconds>
recording_id: <same-as-recording-id-above>
meeting_id: <meeting-id>
room_id: 552640fd-16f2-4162-9526-8cf40cd2357e
Note: Title and content will reflect the ACTUAL conversation, not mock data. Processing time depends on recording length and GPU backend availability (Modal).
Verify audio file exists:
ls -lh data/<transcript-id>/upload.webm
Expected:
-rw-r--r-- 1 user staff ~100-200K Oct 10 18:48 upload.webm
Check transcript topics (REAL transcription):
# -t -A strips psql headers and padding so the id substitutes cleanly below
TRANSCRIPT_ID=$(docker compose exec -T postgres psql -U reflector -d reflector -t -A -c \
  "SELECT id FROM transcript ORDER BY created_at DESC LIMIT 1;")
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT
jsonb_array_length(topics) as num_topics,
jsonb_array_length(participants) as num_participants,
short_summary,
title
FROM transcript
WHERE id = '$TRANSCRIPT_ID';"
Expected (REAL data):
num_topics: <varies based on conversation>
num_participants: <actual number of participants who spoke>
short_summary: <AI-generated summary of actual conversation>
title: <AI-generated title based on content>
Check topics contain actual transcription:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT topics->0->'title', topics->0->'summary', topics->0->'transcript'
FROM transcript
ORDER BY created_at DESC LIMIT 1;" | head -20
Expected output: Will contain the ACTUAL transcribed conversation from the Daily.co meeting, not mock data.
Check participants:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT participants FROM transcript ORDER BY created_at DESC LIMIT 1;" \
| python3 -c "import sys, json; data=json.loads(sys.stdin.read()); print(json.dumps(data, indent=2))"
Expected (REAL diarization):
[
{
"id": "<uuid>",
"speaker": 0,
"name": "Speaker 1"
},
{
"id": "<uuid>",
"speaker": 1,
"name": "Speaker 2"
}
]
Note: Speaker names will be generic ("Speaker 1", "Speaker 2", etc.) as determined by the diarization backend. Number of participants depends on how many actually spoke during the meeting.
Check word-level data:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT jsonb_array_length(topics->0->'words') as num_words_first_topic
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
Expected:
num_words_first_topic: <varies based on actual conversation length and topic chunking>
Verify speaker diarization in words:
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT
topics->0->'words'->0->>'text' as first_word,
topics->0->'words'->0->>'speaker' as speaker,
topics->0->'words'->0->>'start' as start_time,
topics->0->'words'->0->>'end' as end_time
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
Expected (REAL transcription):
first_word: <actual first word from transcription>
speaker: 0, 1, 2, ... (actual speaker ID from diarization)
start_time: <actual timestamp in seconds>
end_time: <actual end timestamp>
Note: All timestamps and speaker IDs are from real transcription/diarization, synchronized across tracks.
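Cross-track synchronization boils down to shifting each track's word timings by that track's alignment offset so all words share the mixed recording's timeline. A minimal sketch, with field names following the word objects queried above:

```python
# Shift per-track word timestamps onto the mixed recording's timeline.
# offset_s is the track's PTS-derived alignment offset in seconds.

def shift_words(words: list[dict], offset_s: float) -> list[dict]:
    return [
        {**w, "start": w["start"] + offset_s, "end": w["end"] + offset_s}
        for w in words
    ]

shifted = shift_words(
    [{"text": "hello", "speaker": 1, "start": 0.5, "end": 0.9}],
    10.0,  # this track's participant joined 10s after the earliest track
)
# shifted[0]["start"] == 10.5
```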
Test 8: Recording Type Verification
Check what Daily.co received:
curl -s -X GET "https://api.daily.co/v1/rooms/test2-<timestamp>" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool | grep "enable_recording"
Expected:
"enable_recording": "raw-tracks"
NOT: "enable_recording": "cloud" (that would be wrong - we want raw tracks)
Troubleshooting
Issue: No webhooks received
Check webhook state:
curl -s -X GET "https://api.daily.co/v1/webhooks" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -m json.tool
If state is FAILED:
cd server
uv run python scripts/recreate_daily_webhook.py https://<ngrok-url>/v1/daily/webhook
Issue: Webhooks return 422
Check server logs:
docker-compose logs --tail=50 server | grep "Failed to parse webhook event"
Common cause: Event structure mismatch. Daily.co events use:
{
"version": "1.0.0",
"type": "participant.joined",
"payload": {...}, // NOT "data"
"event_ts": 123.456 // NOT "ts"
}
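A handler that accepts only this shape fails fast on mismatches instead of silently returning 422s. A minimal sketch (the function name is hypothetical, not the actual handler):

```python
# Normalize a Daily.co webhook event into the fields a handler needs.
# Daily uses "payload"/"event_ts" (not "data"/"ts"); reject anything else.

def parse_webhook_event(event: dict) -> tuple[str, dict, float]:
    try:
        return event["type"], event["payload"], float(event["event_ts"])
    except KeyError as exc:
        raise ValueError(f"unexpected webhook shape, missing {exc}") from exc

etype, payload, ts = parse_webhook_event({
    "version": "1.0.0",
    "type": "participant.joined",
    "payload": {"room": "test2-20251009192341"},
    "event_ts": 123.456,
})
```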
Issue: Recording not starting
- Check token has sr: true: decode the JWT from the room_url query param; payload should contain "sr": true
- Check Daily.co room config: enable_recording must be set (not false); for raw tracks it must be exactly "raw-tracks"
- Check the participant actually joined: logs should show "Participant joined"; the "Join" button must be clicked, not just the pre-call screen
Issue: Recording in S3 but wrong format
Daily.co recording types:
"cloud"→ Single MP4 file (download_linkin webhook)"raw-tracks"→ Multiple WebM files (tracksarray in webhook)"raw-tracks-audio-only"→ Only audio WebM files
Current implementation: Always uses "raw-tracks" (better for transcription)
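The branching above can be sketched as follows. The field names (download_link, tracks, s3Key) follow the webhook payloads described in this plan; the function itself is illustrative, not the production handler.

```python
# Resolve the downloadable artifacts for a finished recording, branching
# on recording type as described above.

def recording_artifacts(payload: dict, recording_type: str) -> list[str]:
    if recording_type == "cloud":
        return [payload["download_link"]]  # single MP4
    if recording_type in ("raw-tracks", "raw-tracks-audio-only"):
        return [t["s3Key"] for t in payload["tracks"]]  # one WebM per track
    raise ValueError(f"unknown recording type: {recording_type}")

url = recording_artifacts({"download_link": "https://example.com/rec.mp4"}, "cloud")
keys = recording_artifacts(
    {"tracks": [{"s3Key": "a.webm"}, {"s3Key": "b.webm"}]}, "raw-tracks"
)
```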
Quick Validation Commands
One-liner to verify everything:
# 1. Check room exists
docker-compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT name, platform FROM room WHERE name = 'test2';" && \
# 2. Create meeting
MEETING=$(curl -s -X POST http://localhost:1250/v1/rooms/test2/meeting \
-H "Content-Type: application/json" -d '{"allow_duplicated":false}') && \
echo "$MEETING" | python3 -c "import sys,json; m=json.load(sys.stdin); print(f'Room: {m[\"room_name\"]}\nURL: {m[\"room_url\"][:80]}...')" && \
# 3. Check Daily.co config
ROOM_NAME=$(echo "$MEETING" | python3 -c "import sys,json; print(json.load(sys.stdin)['room_name'])") && \
curl -s -X GET "https://api.daily.co/v1/rooms/$ROOM_NAME" \
-H "Authorization: Bearer $DAILY_API_KEY" | python3 -c "import sys,json; print(f'Recording: {json.load(sys.stdin)[\"config\"][\"enable_recording\"]}')"
Expected output:
name: test2, platform: whereby
Room: test2-20251009192341
URL: https://monadical.daily.co/test2-20251009192341?t=eyJhbGc...
Recording: raw-tracks
Success Criteria Checklist
- Room name includes Reflector room prefix (test2-...)
- Meeting URL contains JWT token (?t=...)
- Token has sr: true (auto-recording enabled)
- Daily.co room config: enable_recording: "raw-tracks"
- Browser loads Daily.co interface (not Whereby)
- Recording auto-starts when participant joins
- Webhooks received: participant.joined, recording.started, participant.left, recording.ready-to-download
- Recording status: finished
- S3 contains 2 files: audio (.webm) and video (.webm)
- S3 path: monadical/test2-{timestamp}/{recording-start-ts}-{participant-uuid}-cam-{audio|video}-{track-start-ts}
- Database num_clients increments/decrements correctly
- Database recording entry created with correct S3 path and status completed
- Database transcript entry created with status uploaded
- Audio file downloaded to data/{transcript_id}/upload.webm
- Transcript has REAL data: AI-generated title based on conversation
- Transcript has topics generated from actual content
- Transcript has participants with proper speaker diarization
- Topics contain word-level data with accurate timestamps and speaker IDs
- Total duration matches actual meeting length
- MP3 and waveform files generated by file processing pipeline
- Frontend transcript page loads without "Failed to load audio" error
- Audio player functional with working playback and waveform visualization
- Multitrack processing completed without errors in worker logs
- Modal GPU backends accessible (transcription and diarization)