dailico track merge vibe
@@ -1,27 +1,27 @@
 # Daily.co Integration Test Plan
 
-## ⚠️ IMPORTANT: Stub Implementation
+## ✅ IMPLEMENTATION STATUS: Real Transcription Active
 
-**This test validates Daily.co webhook integration with MOCK transcription data.**
+**This test validates Daily.co multitrack recording integration with REAL transcription/diarization.**
 
-The actual audio/video files are recorded to S3, but transcription/diarization is NOT performed. Instead:
-- A **stub processor** generates fake transcript with predetermined text ("The Great Fish Eating Argument")
-- **Audio track is downloaded from Daily.co S3** to local storage for playback in the frontend
-- All database entities (recording, transcript, topics, participants, words) are created with **fake "fish" conversation data**
-- This allows testing the complete webhook → database flow WITHOUT expensive GPU processing
+The implementation includes complete audio processing pipeline:
+- **Multitrack recordings** from Daily.co S3 (separate audio stream per participant)
+- **PyAV-based audio mixdown** with PTS-based track alignment
+- **Real transcription** via Modal GPU backend (Whisper)
+- **Real diarization** via Modal GPU backend (speaker identification)
+- **Per-track transcription** with timestamp synchronization
+- **Complete database entities** (recording, transcript, topics, participants, words)
 
-**Expected transcript content:**
-- Title: "The Great Fish Eating Argument"
-- Participants: "Fish Eater" (speaker 0), "Annoying Person" (speaker 1)
-- Transcription: Nonsensical argument about eating fish (see `reflector/worker/daily_stub_data.py`)
-- Audio file: Downloaded WebM from Daily.co S3 (stored in `data/{transcript_id}/upload.webm`)
+**Processing pipeline** (`PipelineMainMultitrack`):
+1. Download all audio tracks from Daily.co S3
+2. Align tracks by PTS (presentation timestamp) to handle late joiners
+3. Mix tracks into single audio file for unified playback
+4. Transcribe each track individually with proper offset handling
+5. Perform diarization on mixed audio
+6. Generate topics, summaries, and word-level timestamps
+7. Convert audio to MP3 and generate waveform visualization
 
-**File processing pipeline** then:
-- Converts WebM to MP3 format (for frontend audio player)
-- Generates waveform visualization data (audio.json)
-- These files enable proper frontend transcript page display
-
-**Next implementation step:** Replace stub with real transcription pipeline (merge audio tracks, run Whisper/diarization).
+**Note:** A stub processor (`process_daily_recording`) exists for testing webhook flow without GPU costs, but the production code path uses `process_multitrack_recording` with full ML pipeline.
 
 ---
 
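Step 2 above (PTS alignment) reduces to reading each track's start offset from its container metadata. A minimal sketch with PyAV, the library the pipeline already uses (`probe_track_offset` is a hypothetical helper name, not part of the codebase):

```python
import io

import av  # PyAV


def probe_track_offset(track_data: bytes) -> float:
    """Return a track's start offset in seconds, read from stream metadata."""
    container = av.open(io.BytesIO(track_data))
    try:
        stream = container.streams.audio[0]
        if stream.start_time is None or stream.time_base is None:
            return 0.0
        # start_time is expressed in time_base units (a fractions.Fraction)
        return float(stream.start_time * stream.time_base)
    finally:
        container.close()
```

A late joiner's track reports a larger offset, so the mixdown delays that track accordingly and all words land on one shared recording timeline.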
@@ -29,6 +29,7 @@ The actual audio/video files are recorded to S3, but transcription/diarization i
 
 **1. Environment Variables** (check in `.env.development.local`):
 ```bash
+# Daily.co API Configuration
 DAILY_API_KEY=<key>
 DAILY_SUBDOMAIN=monadical
 DAILY_WEBHOOK_SECRET=<base64-encoded-secret>
@@ -37,25 +38,43 @@ AWS_DAILY_S3_REGION=us-east-1
 AWS_DAILY_ROLE_ARN=arn:aws:iam::950402358378:role/DailyCo
 DAILY_MIGRATION_ENABLED=true
 DAILY_MIGRATION_ROOM_IDS=["552640fd-16f2-4162-9526-8cf40cd2357e"]
+
+# Transcription/Diarization Backend (Required for real processing)
+DIARIZATION_BACKEND=modal
+DIARIZATION_MODAL_API_KEY=<modal-api-key>
+# TRANSCRIPTION_BACKEND is not explicitly set (uses default/modal)
 ```
 
 **2. Services Running:**
 ```bash
-docker-compose ps  # server, postgres, redis should be UP
+docker compose ps  # server, postgres, redis, worker, beat should be UP
+```
+
+**IMPORTANT:** Worker and beat services MUST be running for transcription processing:
+```bash
+docker compose up -d worker beat
 ```
 
 **3. ngrok Tunnel for Webhooks:**
 ```bash
-ngrok http 1250  # Note the URL (e.g., https://abc123.ngrok-free.app)
+# Start ngrok (if not already running)
+ngrok http 1250 --log=stdout > /tmp/ngrok.log 2>&1 &
+
+# Get public URL
+curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; data=json.load(sys.stdin); print(data['tunnels'][0]['public_url'])"
 ```
 
+**Current ngrok URL:** `https://0503947384a3.ngrok-free.app` (as of last registration)
+
 **4. Webhook Created:**
 ```bash
 cd server
-uv run python scripts/recreate_daily_webhook.py https://abc123.ngrok-free.app/v1/daily/webhook
+uv run python scripts/recreate_daily_webhook.py https://0503947384a3.ngrok-free.app/v1/daily/webhook
 # Verify: "Created webhook <uuid> (state: ACTIVE)"
 ```
 
+**Current webhook status:** ✅ ACTIVE (webhook ID: dad5ad16-ceca-488e-8fc5-dae8650b51d0)
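The URL lookup can also be done without the shell one-liner; a sketch using only the Python standard library against ngrok's local inspection API (same endpoint as above):

```python
import json
import urllib.request

# ngrok exposes its tunnels on a local inspection API (port 4040 by default)
with urllib.request.urlopen("http://localhost:4040/api/tunnels") as resp:
    tunnels = json.load(resp)["tunnels"]

public_url = tunnels[0]["public_url"]
print(f"{public_url}/v1/daily/webhook")  # the URL to register as the webhook
```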
 
 ---
 
 ## Test 1: Database Configuration
@@ -338,23 +357,25 @@ recorded_at: <recent-timestamp>
 
 **Check transcript created:**
 ```bash
-docker-compose exec -T postgres psql -U reflector -d reflector -c \
+docker compose exec -T postgres psql -U reflector -d reflector -c \
 "SELECT id, title, status, duration, recording_id, meeting_id, room_id
 FROM transcript
 ORDER BY created_at DESC LIMIT 1;"
 ```
 
-**Expected:**
+**Expected (REAL transcription):**
 ```
 id: <transcript-id>
-title: The Great Fish Eating Argument
-status: uploaded (audio file downloaded for playback)
-duration: ~200-300 seconds (depends on fish text parsing)
+title: <AI-generated title based on actual conversation content>
+status: uploaded (audio file processed and available)
+duration: <actual meeting duration in seconds>
 recording_id: <same-as-recording-id-above>
 meeting_id: <meeting-id>
 room_id: 552640fd-16f2-4162-9526-8cf40cd2357e
 ```
 
+**Note:** Title and content will reflect the ACTUAL conversation, not mock data. Processing time depends on recording length and GPU backend availability (Modal).
+
 **Verify audio file exists:**
 ```bash
 ls -lh data/<transcript-id>/upload.webm
@@ -365,12 +386,12 @@ ls -lh data/<transcript-id>/upload.webm
 -rw-r--r--  1 user  staff  ~100-200K Oct 10 18:48 upload.webm
 ```
 
-**Check transcript topics (stub data):**
+**Check transcript topics (REAL transcription):**
 ```bash
-TRANSCRIPT_ID=$(docker-compose exec -T postgres psql -U reflector -d reflector -t -c \
+TRANSCRIPT_ID=$(docker compose exec -T postgres psql -U reflector -d reflector -t -c \
 "SELECT id FROM transcript ORDER BY created_at DESC LIMIT 1;")
 
-docker-compose exec -T postgres psql -U reflector -d reflector -c \
+docker compose exec -T postgres psql -U reflector -d reflector -c \
 "SELECT
 jsonb_array_length(topics) as num_topics,
 jsonb_array_length(participants) as num_participants,
@@ -380,55 +401,52 @@ docker compose exec -T postgres psql -U reflector -d reflector -c \
 WHERE id = '$TRANSCRIPT_ID';"
 ```
 
-**Expected:**
+**Expected (REAL data):**
 ```
-num_topics: 3
-num_participants: 2
-short_summary: Two people argue about eating fish
-title: The Great Fish Eating Argument
+num_topics: <varies based on conversation>
+num_participants: <actual number of participants who spoke>
+short_summary: <AI-generated summary of actual conversation>
+title: <AI-generated title based on content>
 ```
 
-**Check topics contain fish text:**
+**Check topics contain actual transcription:**
 ```bash
-docker-compose exec -T postgres psql -U reflector -d reflector -c \
+docker compose exec -T postgres psql -U reflector -d reflector -c \
 "SELECT topics->0->'title', topics->0->'summary', topics->0->'transcript'
 FROM transcript
 ORDER BY created_at DESC LIMIT 1;" | head -20
 ```
 
-**Expected output should contain:**
-```
-Fish Argument Part 1
-Argument about eating fish continues (part 1)
-Fish for dinner are nothing wrong with you? There's nothing...
-```
+**Expected output:** Will contain the ACTUAL transcribed conversation from the Daily.co meeting, not mock data.
 
 **Check participants:**
 ```bash
-docker-compose exec -T postgres psql -U reflector -d reflector -c \
+docker compose exec -T postgres psql -U reflector -d reflector -c \
 "SELECT participants FROM transcript ORDER BY created_at DESC LIMIT 1;" \
 | python3 -c "import sys, json; data=json.loads(sys.stdin.read()); print(json.dumps(data, indent=2))"
 ```
 
-**Expected:**
+**Expected (REAL diarization):**
 ```json
 [
   {
     "id": "<uuid>",
     "speaker": 0,
-    "name": "Fish Eater"
+    "name": "Speaker 1"
   },
   {
     "id": "<uuid>",
     "speaker": 1,
-    "name": "Annoying Person"
+    "name": "Speaker 2"
   }
 ]
 ```
 
+**Note:** Speaker names will be generic ("Speaker 1", "Speaker 2", etc.) as determined by the diarization backend. Number of participants depends on how many actually spoke during the meeting.
+
 **Check word-level data:**
 ```bash
-docker-compose exec -T postgres psql -U reflector -d reflector -c \
+docker compose exec -T postgres psql -U reflector -d reflector -c \
 "SELECT jsonb_array_length(topics->0->'words') as num_words_first_topic
 FROM transcript
 ORDER BY created_at DESC LIMIT 1;"
@@ -436,12 +454,12 @@ docker-compose exec -T postgres psql -U reflector -d reflector -c \
 
 **Expected:**
 ```
-num_words_first_topic: ~100-150 (varies based on topic chunking)
+num_words_first_topic: <varies based on actual conversation length and topic chunking>
 ```
 
 **Verify speaker diarization in words:**
 ```bash
-docker-compose exec -T postgres psql -U reflector -d reflector -c \
+docker compose exec -T postgres psql -U reflector -d reflector -c \
 "SELECT
 topics->0->'words'->0->>'text' as first_word,
 topics->0->'words'->0->>'speaker' as speaker,
@@ -451,14 +469,16 @@ docker-compose exec -T postgres psql -U reflector -d reflector -c \
 ORDER BY created_at DESC LIMIT 1;"
 ```
 
-**Expected:**
+**Expected (REAL transcription):**
 ```
-first_word: Fish
-speaker: 0 or 1 (depends on parsing)
-start_time: 0.0
-end_time: 0.35 (approximate)
+first_word: <actual first word from transcription>
+speaker: 0, 1, 2, ... (actual speaker ID from diarization)
+start_time: <actual timestamp in seconds>
+end_time: <actual end timestamp>
 ```
 
+**Note:** All timestamps and speaker IDs are from real transcription/diarization, synchronized across tracks.
+
 ---
 
 ## Test 8: Recording Type Verification
@@ -579,13 +599,15 @@ Recording: raw-tracks
 - [x] S3 path: `monadical/test2-{timestamp}/{recording-start-ts}-{participant-uuid}-cam-{audio|video}-{track-start-ts}`
 - [x] Database `num_clients` increments/decrements correctly
 - [x] **Database recording entry created** with correct S3 path and status `completed`
-- [x] **Database transcript entry created** with status `uploaded`
-- [x] **Audio file downloaded** to `data/{transcript_id}/upload.webm` (~100-200KB)
-- [x] **Transcript has stub data**: title "The Great Fish Eating Argument"
-- [x] **Transcript has 3 topics** about fish argument
-- [x] **Transcript has 2 participants**: "Fish Eater" (speaker 0) and "Annoying Person" (speaker 1)
-- [x] **Topics contain word-level data** with timestamps and speaker IDs
-- [x] **Total duration** ~200-300 seconds based on fish text parsing
-- [x] **MP3 and waveform files generated** by file processing pipeline
-- [x] **Frontend transcript page loads** without "Failed to load audio" error
-- [x] **Audio player functional** with working playback and waveform visualization
+- [ ] **Database transcript entry created** with status `uploaded`
+- [ ] **Audio file downloaded** to `data/{transcript_id}/upload.webm`
+- [ ] **Transcript has REAL data**: AI-generated title based on conversation
+- [ ] **Transcript has topics** generated from actual content
+- [ ] **Transcript has participants** with proper speaker diarization
+- [ ] **Topics contain word-level data** with accurate timestamps and speaker IDs
+- [ ] **Total duration** matches actual meeting length
+- [ ] **MP3 and waveform files generated** by file processing pipeline
+- [ ] **Frontend transcript page loads** without "Failed to load audio" error
+- [ ] **Audio player functional** with working playback and waveform visualization
+- [ ] **Multitrack processing completed** without errors in worker logs
+- [ ] **Modal GPU backends accessible** (transcription and diarization)
@@ -6,7 +6,7 @@ ENV PYTHONUNBUFFERED=1 \
 
 # builder install base dependencies
 WORKDIR /tmp
-RUN apt-get update && apt-get install -y curl && apt-get clean
+RUN apt-get update && apt-get install -y curl ffmpeg && apt-get clean
 ADD https://astral.sh/uv/install.sh /uv-installer.sh
 RUN sh /uv-installer.sh && rm /uv-installer.sh
 ENV PATH="/root/.local/bin/:$PATH"
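The `ffmpeg` package is added because the track-padding step in the pipeline changes below shells out to `ffmpeg` and `ffprobe`. A quick availability check (a sketch, not part of the codebase):

```python
import shutil

# The padding step invokes both binaries via subprocess, so both must be on PATH
for tool in ("ffmpeg", "ffprobe"):
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} not found on PATH - is the ffmpeg package installed?")
```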
server/reflector/pipelines/MULTITRACK_FIX_SUMMARY.md (new file, 84 lines)
@@ -0,0 +1,84 @@
# Multitrack Pipeline Fix Summary

## Problem

Whisper timestamps were incorrect because it ignores leading silence in audio files. Daily.co tracks can have arbitrary amounts of silence before speech starts.

## Solution

**Pad tracks BEFORE transcription using stream metadata `start_time`**

This makes Whisper timestamps automatically correct relative to recording start.

## Key Changes in `main_multitrack_pipeline_fixed.py`

### 1. Added `pad_track_for_transcription()` method (lines 55-172)

```python
async def pad_track_for_transcription(
    self,
    track_data: bytes,
    track_idx: int,
    storage,
) -> tuple[bytes, str]:
```

- Extracts stream metadata `start_time` using PyAV
- Creates PyAV filter graph with `adelay` filter to add padding
- Stores padded track to S3 and returns URL
- Uses same audio processing library (PyAV) already in the pipeline

### 2. Modified `process()` method

#### REMOVED (lines 255-302):

- Entire filename parsing for offsets - NOT NEEDED ANYMORE
- The complex regex parsing of Daily.co filenames
- Offset adjustment after transcription

#### ADDED (lines 371-382):

- Padding step BEFORE transcription:

```python
# PAD TRACKS BEFORE TRANSCRIPTION - THIS IS THE KEY FIX!
padded_track_urls: list[str] = []
for idx, data in enumerate(track_datas):
    if not data:
        padded_track_urls.append("")
        continue

    _, padded_url = await self.pad_track_for_transcription(
        data, idx, storage
    )
    padded_track_urls.append(padded_url)
```

#### MODIFIED (lines 385-435):

- Transcribe PADDED tracks instead of raw tracks
- Removed all timestamp offset adjustment code
- Just set speaker ID - timestamps already correct!

```python
# NO OFFSET ADJUSTMENT NEEDED!
# Timestamps are already correct because we transcribed padded tracks
# Just set speaker ID
for w in t.words:
    w.speaker = idx
```

## Why This Works

1. **Stream metadata is authoritative**: Daily.co sets `start_time` in the WebM container
2. **PyAV respects metadata**: `audio_stream.start_time * audio_stream.time_base` gives seconds
3. **Padding before transcription**: Whisper sees continuous audio from time 0
4. **Automatic alignment**: Word at 51s in padded track = 51s in recording

## Testing

Process the test recording (daily-20251020193458) and verify:

- Participant 0 words appear at ~2s
- Participant 1 words appear at ~51s
- No word interleaving
- Correct chronological order

## Files

- **Original**: `main_multitrack_pipeline.py`
- **Fixed**: `main_multitrack_pipeline_fixed.py`
- **Test data**: `/Users/firfi/work/clients/monadical/reflector/1760988935484-*.webm`
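To make the timestamp arithmetic concrete, a small sketch using illustrative numbers consistent with the testing section above (a participant whose track metadata says speech starts 49s into the recording):

```python
import math

start_time_seconds = 49.0  # stream metadata start_time for the late joiner
delay_ms = math.floor(start_time_seconds * 1000)  # 49000 ms of silence prepended

# After padding, Whisper hears 49s of silence before speech, so a word spoken
# 2s into the participant's own audio is reported at 51s - already relative
# to recording start, with no post-hoc offset adjustment.
word_time_in_track = 2.0
assert start_time_seconds + word_time_in_track == 51.0
```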
server/reflector/pipelines/main_multitrack_pipeline.backup.py (new file, 510 lines)
@@ -0,0 +1,510 @@
import asyncio
import io
from fractions import Fraction

import av
import boto3
import structlog
from av.audio.resampler import AudioResampler
from celery import chain, shared_task

from reflector.asynctask import asynctask
from reflector.db.transcripts import (
    TranscriptStatus,
    TranscriptText,
    transcripts_controller,
)
from reflector.logger import logger
from reflector.pipelines.main_file_pipeline import task_send_webhook_if_needed
from reflector.pipelines.main_live_pipeline import (
    PipelineMainBase,
    task_cleanup_consent,
    task_pipeline_post_to_zulip,
)
from reflector.processors import (
    AudioFileWriterProcessor,
    TranscriptFinalSummaryProcessor,
    TranscriptFinalTitleProcessor,
    TranscriptTopicDetectorProcessor,
)
from reflector.processors.file_transcript import FileTranscriptInput
from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
from reflector.processors.types import TitleSummary
from reflector.processors.types import (
    Transcript as TranscriptType,
)
from reflector.settings import settings
from reflector.storage import get_transcripts_storage


class EmptyPipeline:
    def __init__(self, logger: structlog.BoundLogger):
        self.logger = logger

    def get_pref(self, k, d=None):
        return d

    async def emit(self, event):
        pass


class PipelineMainMultitrack(PipelineMainBase):
    """Process multiple participant tracks for a transcript without mixing audio."""

    def __init__(self, transcript_id: str):
        super().__init__(transcript_id=transcript_id)
        self.logger = logger.bind(transcript_id=self.transcript_id)
        self.empty_pipeline = EmptyPipeline(logger=self.logger)

    async def mixdown_tracks(
        self,
        track_datas: list[bytes],
        writer: AudioFileWriterProcessor,
        offsets_seconds: list[float] | None = None,
    ) -> None:
        """
        Minimal multi-track mixdown using a PyAV filter graph (amix), no resampling.
        """

        # Discover target sample rate from first decodable frame
        target_sample_rate: int | None = None
        for data in track_datas:
            if not data:
                continue
            try:
                container = av.open(io.BytesIO(data))
                try:
                    for frame in container.decode(audio=0):
                        target_sample_rate = frame.sample_rate
                        break
                finally:
                    container.close()
            except Exception:
                continue
            if target_sample_rate:
                break

        if not target_sample_rate:
            self.logger.warning("Mixdown skipped - no decodable audio frames found")
            return

        # Build PyAV filter graph:
        # N abuffer (s32/stereo)
        #   -> optional adelay per input (for alignment)
        #   -> amix (s32)
        #   -> aformat(s16)
        #   -> sink
        graph = av.filter.Graph()
        inputs = []
        valid_track_datas = [d for d in track_datas if d]
        # Align offsets list with the filtered inputs (skip empties)
        input_offsets_seconds = None
        if offsets_seconds is not None:
            input_offsets_seconds = [
                offsets_seconds[i] for i, d in enumerate(track_datas) if d
            ]
        for idx, data in enumerate(valid_track_datas):
            args = (
                f"time_base=1/{target_sample_rate}:"
                f"sample_rate={target_sample_rate}:"
                f"sample_fmt=s32:"
                f"channel_layout=stereo"
            )
            in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
            inputs.append(in_ctx)

        if not inputs:
            self.logger.warning("Mixdown skipped - no valid inputs for graph")
            return

        mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")

        fmt = graph.add(
            "aformat",
            args=(
                f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}"
            ),
            name="fmt",
        )

        sink = graph.add("abuffersink", name="out")

        # Optional per-input delay before mixing
        delays_ms: list[int] = []
        if input_offsets_seconds is not None:
            base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
            delays_ms = [
                max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds
            ]
        else:
            delays_ms = [0 for _ in inputs]

        for idx, in_ctx in enumerate(inputs):
            delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
            if delay_ms > 0:
                # adelay requires one value per channel; use same for stereo
                adelay = graph.add(
                    "adelay",
                    args=f"delays={delay_ms}|{delay_ms}:all=1",
                    name=f"delay{idx}",
                )
                in_ctx.link_to(adelay)
                adelay.link_to(mixer, 0, idx)
            else:
                in_ctx.link_to(mixer, 0, idx)
        mixer.link_to(fmt)
        fmt.link_to(sink)
        graph.configure()

        # Open containers for decoding
        containers = []
        for i, d in enumerate(valid_track_datas):
            try:
                c = av.open(io.BytesIO(d))
                containers.append(c)
            except Exception as e:
                self.logger.warning(
                    "Mixdown: failed to open container", input=i, error=str(e)
                )
                containers.append(None)
        # Filter out Nones for decoders
        containers = [c for c in containers if c is not None]
        decoders = [c.decode(audio=0) for c in containers]
        active = [True] * len(decoders)
        # Per-input resamplers to enforce s32/stereo at the same rate (no resample of rate)
        resamplers = [
            AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
            for _ in decoders
        ]

        try:
            # Round-robin feed frames into graph, pull mixed frames as they become available
            while any(active):
                for i, (dec, is_active) in enumerate(zip(decoders, active)):
                    if not is_active:
                        continue
                    try:
                        frame = next(dec)
                    except StopIteration:
                        active[i] = False
                        continue

                    # Enforce same sample rate; convert format/layout to s16/stereo (no resample)
                    if frame.sample_rate != target_sample_rate:
                        # Skip frames with differing rate
                        continue
                    out_frames = resamplers[i].resample(frame) or []
                    for rf in out_frames:
                        rf.sample_rate = target_sample_rate
                        rf.time_base = Fraction(1, target_sample_rate)
                        inputs[i].push(rf)

                # Drain available mixed frames
                while True:
                    try:
                        mixed = sink.pull()
                    except Exception:
                        break
                    mixed.sample_rate = target_sample_rate
                    mixed.time_base = Fraction(1, target_sample_rate)
                    await writer.push(mixed)

            # Signal EOF to inputs and drain remaining
            for in_ctx in inputs:
                in_ctx.push(None)
            while True:
                try:
                    mixed = sink.pull()
                except Exception:
                    break
                mixed.sample_rate = target_sample_rate
                mixed.time_base = Fraction(1, target_sample_rate)
                await writer.push(mixed)
        finally:
            for c in containers:
                c.close()

    async def set_status(self, transcript_id: str, status: TranscriptStatus):
        async with self.lock_transaction():
            return await transcripts_controller.set_status(transcript_id, status)

    async def process(self, bucket_name: str, track_keys: list[str]):
        transcript = await self.get_transcript()

        s3 = boto3.client(
            "s3",
            region_name=settings.RECORDING_STORAGE_AWS_REGION,
            aws_access_key_id=settings.RECORDING_STORAGE_AWS_ACCESS_KEY_ID,
            aws_secret_access_key=settings.RECORDING_STORAGE_AWS_SECRET_ACCESS_KEY,
        )

        storage = get_transcripts_storage()

        # Pre-download bytes for all tracks for mixing and transcription
        track_datas: list[bytes] = []
        for key in track_keys:
            try:
                obj = s3.get_object(Bucket=bucket_name, Key=key)
                track_datas.append(obj["Body"].read())
            except Exception as e:
                self.logger.warning(
                    "Skipping track - cannot read S3 object", key=key, error=str(e)
                )
                track_datas.append(b"")

        # Extract offsets from Daily.co filename timestamps
        # Format: {rec_start_ts}-{uuid}-{media_type}-{track_start_ts}.{ext}
        # Example: 1760988935484-uuid-cam-audio-1760988935922
        import re

        offsets_seconds: list[float] = []
        recording_start_ts: int | None = None

        for key in track_keys:
            # Parse Daily.co raw-tracks filename pattern
            match = re.search(r"(\d+)-([0-9a-f-]{36})-(cam-audio)-(\d+)", key)
            if not match:
                self.logger.warning(
                    "Track key doesn't match Daily.co pattern, using 0.0 offset",
                    key=key,
                )
                offsets_seconds.append(0.0)
                continue

            rec_start_ts = int(match.group(1))
            track_start_ts = int(match.group(4))

            # Validate all tracks belong to same recording
            if recording_start_ts is None:
                recording_start_ts = rec_start_ts
            elif rec_start_ts != recording_start_ts:
                self.logger.error(
                    "Track belongs to different recording",
                    key=key,
                    expected_start=recording_start_ts,
                    got_start=rec_start_ts,
                )
                offsets_seconds.append(0.0)
                continue

            # Calculate offset in seconds
            offset_ms = track_start_ts - rec_start_ts
            offset_s = offset_ms / 1000.0

            self.logger.info(
                "Parsed track offset from filename",
                key=key,
                recording_start=rec_start_ts,
                track_start=track_start_ts,
                offset_seconds=offset_s,
            )

            offsets_seconds.append(max(0.0, offset_s))

        # Mixdown all available tracks into transcript.audio_mp3_filename, preserving sample rate
        try:
            mp3_writer = AudioFileWriterProcessor(
                path=str(transcript.audio_mp3_filename)
            )
            await self.mixdown_tracks(track_datas, mp3_writer, offsets_seconds)
            await mp3_writer.flush()
        except Exception as e:
            self.logger.error("Mixdown failed", error=str(e))

        speaker_transcripts: list[TranscriptType] = []
        for idx, key in enumerate(track_keys):
            ext = ".mp4"

            try:
                obj = s3.get_object(Bucket=bucket_name, Key=key)
                data = obj["Body"].read()
            except Exception as e:
                self.logger.error(
                    "Skipping track - cannot read S3 object", key=key, error=str(e)
                )
                continue

            storage_path = f"file_pipeline/{transcript.id}/tracks/track_{idx}{ext}"
            try:
                await storage.put_file(storage_path, data)
                audio_url = await storage.get_file_url(storage_path)
            except Exception as e:
                self.logger.error(
                    "Skipping track - cannot upload to storage", key=key, error=str(e)
                )
                continue

            try:
                t = await self.transcribe_file(audio_url, transcript.source_language)
            except Exception as e:
                self.logger.error(
                    "Transcription via default backend failed, trying local whisper",
                    key=key,
                    url=audio_url,
                    error=str(e),
                )
                try:
                    fallback = FileTranscriptAutoProcessor(name="whisper")
                    result = None

                    async def capture_result(r):
                        nonlocal result
                        result = r

                    fallback.on(capture_result)
                    await fallback.push(
                        FileTranscriptInput(
                            audio_url=audio_url, language=transcript.source_language
                        )
                    )
                    await fallback.flush()
                    if not result:
                        raise Exception("No transcript captured in fallback")
                    t = result
                except Exception as e2:
                    self.logger.error(
                        "Skipping track - transcription failed after fallback",
                        key=key,
                        url=audio_url,
                        error=str(e2),
                    )
                    continue

            if not t.words:
                continue
            # Shift word timestamps by the track's offset so all are relative to 00:00
            track_offset = offsets_seconds[idx] if idx < len(offsets_seconds) else 0.0
            for w in t.words:
                try:
                    if hasattr(w, "start") and w.start is not None:
                        w.start = float(w.start) + track_offset
                    if hasattr(w, "end") and w.end is not None:
                        w.end = float(w.end) + track_offset
                except Exception:
                    pass
                w.speaker = idx
            speaker_transcripts.append(t)

        if not speaker_transcripts:
            raise Exception("No valid track transcriptions")

        merged_words = []
        for t in speaker_transcripts:
            merged_words.extend(t.words)
        merged_words.sort(key=lambda w: w.start)

        merged_transcript = TranscriptType(words=merged_words, translation=None)

        await transcripts_controller.append_event(
            transcript,
            event="TRANSCRIPT",
            data=TranscriptText(
                text=merged_transcript.text, translation=merged_transcript.translation
            ),
        )

        topics = await self.detect_topics(merged_transcript, transcript.target_language)
        await asyncio.gather(
            self.generate_title(topics),
            self.generate_summaries(topics),
            return_exceptions=False,
        )

        await self.set_status(transcript.id, "ended")

    async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType:
        processor = FileTranscriptAutoProcessor()
        input_data = FileTranscriptInput(audio_url=audio_url, language=language)

        result: TranscriptType | None = None

        async def capture_result(transcript):
            nonlocal result
            result = transcript

        processor.on(capture_result)
        await processor.push(input_data)
        await processor.flush()

        if not result:
            raise ValueError("No transcript captured")

        return result

    async def detect_topics(
        self, transcript: TranscriptType, target_language: str
    ) -> list[TitleSummary]:
        chunk_size = 300
        topics: list[TitleSummary] = []

        async def on_topic(topic: TitleSummary):
            topics.append(topic)
            return await self.on_topic(topic)

        topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic)
        topic_detector.set_pipeline(self.empty_pipeline)

        for i in range(0, len(transcript.words), chunk_size):
            chunk_words = transcript.words[i : i + chunk_size]
            if not chunk_words:
                continue

            chunk_transcript = TranscriptType(
                words=chunk_words, translation=transcript.translation
            )
            await topic_detector.push(chunk_transcript)

        await topic_detector.flush()
        return topics

    async def generate_title(self, topics: list[TitleSummary]):
        if not topics:
            self.logger.warning("No topics for title generation")
            return

        processor = TranscriptFinalTitleProcessor(callback=self.on_title)
        processor.set_pipeline(self.empty_pipeline)

        for topic in topics:
            await processor.push(topic)

        await processor.flush()

    async def generate_summaries(self, topics: list[TitleSummary]):
        if not topics:
            self.logger.warning("No topics for summary generation")
            return

        transcript = await self.get_transcript()
        processor = TranscriptFinalSummaryProcessor(
            transcript=transcript,
            callback=self.on_long_summary,
            on_short_summary=self.on_short_summary,
        )
        processor.set_pipeline(self.empty_pipeline)

        for topic in topics:
            await processor.push(topic)

        await processor.flush()


@shared_task
@asynctask
async def task_pipeline_multitrack_process(
    *, transcript_id: str, bucket_name: str, track_keys: list[str]
):
    pipeline = PipelineMainMultitrack(transcript_id=transcript_id)
    try:
        await pipeline.set_status(transcript_id, "processing")
        await pipeline.process(bucket_name, track_keys)
    except Exception:
        await pipeline.set_status(transcript_id, "error")
        raise

    post_chain = chain(
        task_cleanup_consent.si(transcript_id=transcript_id),
        task_pipeline_post_to_zulip.si(transcript_id=transcript_id),
        task_send_webhook_if_needed.si(transcript_id=transcript_id),
    )
    post_chain.delay()
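For orientation, the entry point of this pipeline is the Celery task at the bottom of the file. Enqueueing it looks roughly like the sketch below (the import path assumes the active, non-backup module; the bucket name and track key are hypothetical placeholders following the Daily.co raw-tracks naming pattern):

```python
# Sketch only - assumes the task is importable from the active pipeline module
from reflector.pipelines.main_multitrack_pipeline import (
    task_pipeline_multitrack_process,
)

task_pipeline_multitrack_process.delay(
    transcript_id="<transcript-id>",
    bucket_name="<daily-recordings-bucket>",
    track_keys=[
        # {rec_start_ts}-{participant_uuid}-cam-audio-{track_start_ts}
        "monadical/test2-<ts>/1760988935484-<participant-uuid>-cam-audio-1760988935922",
    ],
)
```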
@@ -12,6 +12,7 @@ from reflector.asynctask import asynctask
 from reflector.db.transcripts import (
     TranscriptStatus,
     TranscriptText,
+    TranscriptWaveform,
     transcripts_controller,
 )
 from reflector.logger import logger
@@ -27,6 +28,7 @@ from reflector.processors import (
     TranscriptFinalTitleProcessor,
     TranscriptTopicDetectorProcessor,
 )
+from reflector.processors.audio_waveform_processor import AudioWaveformProcessor
 from reflector.processors.file_transcript import FileTranscriptInput
 from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
 from reflector.processors.types import TitleSummary
@@ -56,6 +58,145 @@ class PipelineMainMultitrack(PipelineMainBase):
         self.logger = logger.bind(transcript_id=self.transcript_id)
         self.empty_pipeline = EmptyPipeline(logger=self.logger)
 
+    async def pad_track_for_transcription(
+        self,
+        track_data: bytes,
+        track_idx: int,
+        storage,
+    ) -> tuple[bytes, str]:
+        """
+        Pad a single track with silence based on stream metadata start_time.
+        This ensures Whisper timestamps will be relative to recording start.
+        Uses ffmpeg subprocess approach proven to work with python-raw-tracks-align.
+
+        Returns: (padded_data, storage_url)
+        """
+        import json
+        import math
+        import subprocess
+        import tempfile
+
+        if not track_data:
+            return b"", ""
+
+        transcript = await self.get_transcript()
+
+        # Create temp files for ffmpeg processing
+        with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as input_file:
+            input_file.write(track_data)
+            input_file_path = input_file.name
+
+        output_file_path = input_file_path.replace(".webm", "_padded.webm")
+
+        try:
+            # Get stream metadata using ffprobe
+            ffprobe_cmd = [
+                "ffprobe",
+                "-v",
+                "error",
+                "-show_entries",
+                "stream=start_time",
+                "-of",
+                "json",
+                input_file_path,
+            ]
+
+            result = subprocess.run(
+                ffprobe_cmd, capture_output=True, text=True, check=True
+            )
+            metadata = json.loads(result.stdout)
+
+            # Extract start_time from stream metadata
+            start_time_seconds = 0.0
+            if metadata.get("streams") and len(metadata["streams"]) > 0:
+                start_time_str = metadata["streams"][0].get("start_time", "0")
+                start_time_seconds = float(start_time_str)
+
+            self.logger.info(
+                f"Track {track_idx} stream metadata: start_time={start_time_seconds:.3f}s",
+                track_idx=track_idx,
+            )
+
+            # If no padding needed, use original
+            if start_time_seconds <= 0:
+                storage_path = f"file_pipeline/{transcript.id}/tracks/original_track_{track_idx}.webm"
+                await storage.put_file(storage_path, track_data)
+                url = await storage.get_file_url(storage_path)
+                return track_data, url
+
+            # Calculate delay in milliseconds
+            delay_ms = math.floor(start_time_seconds * 1000)
+
+            # Run ffmpeg to pad the audio while maintaining WebM/Opus format for Modal compatibility
+            # ffmpeg quirk: aresample needs to come before adelay in the filter chain
+            ffmpeg_cmd = [
+                "ffmpeg",
+                "-hide_banner",
+                "-loglevel",
+                "error",
+                "-y",  # overwrite output
+                "-i",
+                input_file_path,
+                "-af",
+                f"aresample=async=1,adelay={delay_ms}:all=true",
+                "-c:a",
+                "libopus",  # Keep Opus codec for Modal compatibility
+                "-b:a",
+                "128k",  # Standard bitrate for Opus
+                output_file_path,
+            ]
+
+            self.logger.info(
+                f"Padding track {track_idx} with {delay_ms}ms delay using ffmpeg",
+                track_idx=track_idx,
+                delay_ms=delay_ms,
+                command=" ".join(ffmpeg_cmd),
+            )
+
+            result = subprocess.run(ffmpeg_cmd, capture_output=True, text=True)
+            if result.returncode != 0:
+                self.logger.error(
+                    f"ffmpeg padding failed for track {track_idx}",
+                    track_idx=track_idx,
+                    stderr=result.stderr,
+                    returncode=result.returncode,
+                )
+                raise Exception(f"ffmpeg padding failed: {result.stderr}")
+
+            # Read the padded output
+            with open(output_file_path, "rb") as f:
+                padded_data = f.read()
+
+            # Store padded track
+            storage_path = (
+                f"file_pipeline/{transcript.id}/tracks/padded_track_{track_idx}.webm"
+            )
+            await storage.put_file(storage_path, padded_data)
+            padded_url = await storage.get_file_url(storage_path)
+
+            self.logger.info(
+                f"Successfully padded track {track_idx} with {start_time_seconds:.3f}s offset, stored at {storage_path}",
+                track_idx=track_idx,
+                delay_ms=delay_ms,
+                padded_url=padded_url,
+                padded_size=len(padded_data),
+            )
+
+            return padded_data, padded_url
+
+        finally:
+            # Clean up temp files
+            import os
+
+            try:
+                os.unlink(input_file_path)
+            except:
+                pass
+            try:
+                os.unlink(output_file_path)
+            except:
+                pass
+
     async def mixdown_tracks(
         self,
         track_datas: list[bytes],
@@ -228,6 +369,14 @@ class PipelineMainMultitrack(PipelineMainBase):
         async with self.lock_transaction():
             return await transcripts_controller.set_status(transcript_id, status)
 
+    async def on_waveform(self, data):
+        async with self.transaction():
+            waveform = TranscriptWaveform(waveform=data)
+            transcript = await self.get_transcript()
+            return await transcripts_controller.append_event(
+                transcript=transcript, event="WAVEFORM", data=waveform
+            )
+
     async def process(self, bucket_name: str, track_keys: list[str]):
         transcript = await self.get_transcript()
 
@@ -252,64 +401,90 @@ class PipelineMainMultitrack(PipelineMainBase):
                 )
                 track_datas.append(b"")
 
-        # Estimate offsets from first frame PTS, aligned to track_keys
-        offsets_seconds: list[float] = []
-        for data, key in zip(track_datas, track_keys):
-            off_s = 0.0
-            if data:
-                try:
-                    c = av.open(io.BytesIO(data))
-                    try:
-                        for frame in c.decode(audio=0):
-                            if frame.pts is not None and frame.time_base:
-                                off_s = float(frame.pts * frame.time_base)
-                            break
-                    finally:
-                        c.close()
-                except Exception:
-                    pass
-            offsets_seconds.append(max(0.0, float(off_s)))
+        # PAD TRACKS FIRST - this creates full-length tracks with correct timeline
+        padded_track_datas: list[bytes] = []
+        padded_track_urls: list[str] = []
+        for idx, data in enumerate(track_datas):
+            if not data:
+                padded_track_datas.append(b"")
+                padded_track_urls.append("")
+                continue
 
-        # Mixdown all available tracks into transcript.audio_mp3_filename, preserving sample rate
+            padded_data, padded_url = await self.pad_track_for_transcription(
+                data, idx, storage
+            )
+            padded_track_datas.append(padded_data)
+            padded_track_urls.append(padded_url)
+            self.logger.info(f"Padded track {idx} for transcription: {padded_url}")
+
+        # Mixdown PADDED tracks (already aligned with timeline) into transcript.audio_mp3_filename
         try:
+            # Ensure data directory exists
+            transcript.data_path.mkdir(parents=True, exist_ok=True)
+
             mp3_writer = AudioFileWriterProcessor(
                 path=str(transcript.audio_mp3_filename)
             )
-            await self.mixdown_tracks(track_datas, mp3_writer, offsets_seconds)
+            # Use PADDED tracks with NO additional offsets (already aligned by padding)
+            await self.mixdown_tracks(
+                padded_track_datas, mp3_writer, offsets_seconds=None
+            )
             await mp3_writer.flush()
-        except Exception as e:
-            self.logger.error("Mixdown failed", error=str(e))
+
+            # Upload the mixed audio to S3 for web playback
+            if transcript.audio_mp3_filename.exists():
+                mp3_data = transcript.audio_mp3_filename.read_bytes()
+                storage_path = f"{transcript.id}/audio.mp3"
+                await storage.put_file(storage_path, mp3_data)
+                mp3_url = await storage.get_file_url(storage_path)
+
+                # Update transcript to indicate audio is in storage
+                await transcripts_controller.update(
+                    transcript, {"audio_location": "storage"}
+                )
+
+                self.logger.info(
+                    f"Uploaded mixed audio to storage",
+                    storage_path=storage_path,
+                    size=len(mp3_data),
+                    url=mp3_url,
+                )
+            else:
+                self.logger.warning("Mixdown file does not exist after processing")
+        except Exception as e:
+            self.logger.error("Mixdown failed", error=str(e), exc_info=True)
+
+        # Generate waveform from the mixed audio file
+        if transcript.audio_mp3_filename.exists():
+            try:
+                self.logger.info("Generating waveform from mixed audio")
+                waveform_processor = AudioWaveformProcessor(
+                    audio_path=transcript.audio_mp3_filename,
+                    waveform_path=transcript.audio_waveform_filename,
+                    on_waveform=self.on_waveform,
+                )
+                waveform_processor.set_pipeline(self.empty_pipeline)
+                await waveform_processor.flush()
+                self.logger.info("Waveform generated successfully")
+            except Exception as e:
+                self.logger.error(
+                    "Waveform generation failed", error=str(e), exc_info=True
+                )
+
+        # Transcribe PADDED tracks - timestamps will be automatically correct!
         speaker_transcripts: list[TranscriptType] = []
-        for idx, key in enumerate(track_keys):
-            ext = ".mp4"
-
-            try:
-                obj = s3.get_object(Bucket=bucket_name, Key=key)
-                data = obj["Body"].read()
-            except Exception as e:
-                self.logger.error(
-                    "Skipping track - cannot read S3 object", key=key, error=str(e)
-                )
-                continue
-
-            storage_path = f"file_pipeline/{transcript.id}/tracks/track_{idx}{ext}"
-            try:
-                await storage.put_file(storage_path, data)
-                audio_url = await storage.get_file_url(storage_path)
-            except Exception as e:
-                self.logger.error(
-                    "Skipping track - cannot upload to storage", key=key, error=str(e)
-                )
-                continue
+        for idx, padded_url in enumerate(padded_track_urls):
+            if not padded_url:
+                continue
 
             try:
-                t = await self.transcribe_file(audio_url, transcript.source_language)
+                # Transcribe the PADDED track
+                t = await self.transcribe_file(padded_url, transcript.source_language)
             except Exception as e:
                 self.logger.error(
                     "Transcription via default backend failed, trying local whisper",
-                    key=key,
-                    url=audio_url,
+                    track_idx=idx,
+                    url=padded_url,
                     error=str(e),
                 )
                 try:
@@ -323,7 +498,7 @@ class PipelineMainMultitrack(PipelineMainBase):
                 fallback.on(capture_result)
                 await fallback.push(
                     FileTranscriptInput(
-                        audio_url=audio_url, language=transcript.source_language
+                        audio_url=padded_url, language=transcript.source_language
                     )
                 )
                 await fallback.flush()
@@ -333,34 +508,37 @@ class PipelineMainMultitrack(PipelineMainBase):
             except Exception as e2:
                 self.logger.error(
                     "Skipping track - transcription failed after fallback",
-                    key=key,
-                    url=audio_url,
+                    track_idx=idx,
+                    url=padded_url,
                     error=str(e2),
                 )
                 continue
 
             if not t.words:
                 continue
-            # Shift word timestamps by the track's offset so all are relative to 00:00
-            track_offset = offsets_seconds[idx] if idx < len(offsets_seconds) else 0.0
+
+            # NO OFFSET ADJUSTMENT NEEDED!
+            # Timestamps are already correct because we transcribed padded tracks
+            # Just set speaker ID
             for w in t.words:
-                try:
-                    if hasattr(w, "start") and w.start is not None:
-                        w.start = float(w.start) + track_offset
-                    if hasattr(w, "end") and w.end is not None:
-                        w.end = float(w.end) + track_offset
-                except Exception:
-                    pass
                 w.speaker = idx
+
             speaker_transcripts.append(t)
+            self.logger.info(
+                f"Track {idx} transcribed successfully with {len(t.words)} words",
+                track_idx=idx,
+            )
 
         if not speaker_transcripts:
             raise Exception("No valid track transcriptions")
 
+        # Merge all words and sort by timestamp
         merged_words = []
         for t in speaker_transcripts:
             merged_words.extend(t.words)
-        merged_words.sort(key=lambda w: w.start)
+        merged_words.sort(
+            key=lambda w: w.start if hasattr(w, "start") and w.start is not None else 0
+        )
 
         merged_transcript = TranscriptType(words=merged_words, translation=None)
 
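The None-safe sort key introduced above can be exercised in isolation; a minimal sketch with a stand-in word type (the real word objects come from `reflector.processors.types`):

```python
from dataclasses import dataclass


@dataclass
class Word:  # stand-in for the pipeline's word objects
    text: str
    start: float | None
    speaker: int


words = [
    Word("later", 51.0, 1),
    Word("first", 2.0, 0),
    Word("no-timestamp", None, 1),
]

# Words missing a start timestamp sort to the front instead of raising TypeError
words.sort(key=lambda w: w.start if w.start is not None else 0)
assert [w.text for w in words] == ["no-timestamp", "first", "later"]
```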
server/reflector/pipelines/main_multitrack_pipeline_fixed.py (new file, 629 lines)
@@ -0,0 +1,629 @@
import asyncio
import io
from fractions import Fraction

import av
import boto3
import structlog
from av.audio.resampler import AudioResampler
from celery import chain, shared_task

from reflector.asynctask import asynctask
from reflector.db.transcripts import (
    TranscriptStatus,
    TranscriptText,
    transcripts_controller,
)
from reflector.logger import logger
from reflector.pipelines.main_file_pipeline import task_send_webhook_if_needed
from reflector.pipelines.main_live_pipeline import (
    PipelineMainBase,
    task_cleanup_consent,
    task_pipeline_post_to_zulip,
)
from reflector.processors import (
    AudioFileWriterProcessor,
    TranscriptFinalSummaryProcessor,
    TranscriptFinalTitleProcessor,
    TranscriptTopicDetectorProcessor,
)
from reflector.processors.file_transcript import FileTranscriptInput
from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
from reflector.processors.types import TitleSummary
from reflector.processors.types import (
    Transcript as TranscriptType,
)
from reflector.settings import settings
from reflector.storage import get_transcripts_storage


class EmptyPipeline:
    def __init__(self, logger: structlog.BoundLogger):
        self.logger = logger

    def get_pref(self, k, d=None):
        return d

    async def emit(self, event):
        pass


class PipelineMainMultitrack(PipelineMainBase):
    """Process multiple participant tracks: transcribe each track separately,
    mix them into a single audio file for playback."""

    def __init__(self, transcript_id: str):
        super().__init__(transcript_id=transcript_id)
        self.logger = logger.bind(transcript_id=self.transcript_id)
        self.empty_pipeline = EmptyPipeline(logger=self.logger)
    async def pad_track_for_transcription(
        self,
        track_data: bytes,
        track_idx: int,
        storage,
    ) -> tuple[bytes, str]:
        """
        Pad a single track with silence based on stream metadata start_time.
        This ensures Whisper timestamps will be relative to recording start.

        Returns: (padded_data, storage_url)
        """
        if not track_data:
            return b"", ""

        transcript = await self.get_transcript()

        # Get stream metadata start_time using PyAV
        container = av.open(io.BytesIO(track_data))
        try:
            audio_stream = container.streams.audio[0]

            # Extract start_time from stream metadata
            if (
                audio_stream.start_time is not None
                and audio_stream.time_base is not None
            ):
                start_time_seconds = float(
                    audio_stream.start_time * audio_stream.time_base
                )
            else:
                start_time_seconds = 0.0

            sample_rate = audio_stream.sample_rate
            codec_name = audio_stream.codec.name
        finally:
            container.close()

        self.logger.info(
            f"Track {track_idx} stream metadata: start_time={start_time_seconds:.3f}s, sample_rate={sample_rate}",
            track_idx=track_idx,
        )

        # If no padding needed, use original
        if start_time_seconds <= 0:
            storage_path = (
                f"file_pipeline/{transcript.id}/tracks/original_track_{track_idx}.webm"
            )
            await storage.put_file(storage_path, track_data)
            url = await storage.get_file_url(storage_path)
            return track_data, url

        # Create PyAV filter graph for padding
        graph = av.filter.Graph()

        # Input buffer
        in_args = (
            f"time_base=1/{sample_rate}:"
            f"sample_rate={sample_rate}:"
            f"sample_fmt=s16:"
            f"channel_layout=stereo"
        )
        input_buffer = graph.add("abuffer", args=in_args, name="in")

        # Add delay filter for padding
        delay_ms = int(start_time_seconds * 1000)
        delay_filter = graph.add(
            "adelay", args=f"delays={delay_ms}|{delay_ms}:all=1", name="delay"
        )

        # Output sink
        sink = graph.add("abuffersink", name="out")

        # Link filters
        input_buffer.link_to(delay_filter)
        delay_filter.link_to(sink)

        graph.configure()

        # Process audio through filter
        output_bytes = io.BytesIO()
        output_container = av.open(output_bytes, "w", format="webm")
        output_stream = output_container.add_stream("libopus", rate=sample_rate)
        output_stream.channels = 2

        # Reopen input for processing
        input_container = av.open(io.BytesIO(track_data))
        resampler = AudioResampler(format="s16", layout="stereo", rate=sample_rate)

        try:
            # Process frames
            for frame in input_container.decode(audio=0):
                # Resample to match filter requirements
                resampled_frames = resampler.resample(frame)
                for resampled_frame in resampled_frames:
                    resampled_frame.pts = frame.pts
                    resampled_frame.time_base = Fraction(1, sample_rate)
                    input_buffer.push(resampled_frame)

                # Pull from filter and encode
                while True:
                    try:
                        out_frame = sink.pull()
                        out_frame.pts = out_frame.pts if out_frame.pts else 0
                        out_frame.time_base = Fraction(1, sample_rate)
                        for packet in output_stream.encode(out_frame):
                            output_container.mux(packet)
                    except av.BlockingIOError:
                        break

            # Flush
            input_buffer.push(None)
            while True:
                try:
                    out_frame = sink.pull()
                    for packet in output_stream.encode(out_frame):
                        output_container.mux(packet)
                except (av.BlockingIOError, av.EOFError):
                    break

            # Flush encoder
            for packet in output_stream.encode(None):
                output_container.mux(packet)

        finally:
            input_container.close()
            output_container.close()

        padded_data = output_bytes.getvalue()

        # Store padded track
        storage_path = (
            f"file_pipeline/{transcript.id}/tracks/padded_track_{track_idx}.webm"
        )
        await storage.put_file(storage_path, padded_data)
        padded_url = await storage.get_file_url(storage_path)

        self.logger.info(
            f"Padded track {track_idx} with {start_time_seconds:.3f}s offset, stored at {storage_path}",
            track_idx=track_idx,
            delay_ms=delay_ms,
            padded_url=padded_url,
        )

        return padded_data, padded_url
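Editor's note: the method above derives the silence duration from container metadata rather than from filenames. A standalone sketch of that metadata read, assuming a local `track.webm` placeholder file with one audio stream:

```python
import io

import av

# Hypothetical input: raw bytes of one participant's WebM track.
track_data = open("track.webm", "rb").read()  # "track.webm" is a placeholder path

container = av.open(io.BytesIO(track_data))
try:
    stream = container.streams.audio[0]
    # start_time is expressed in time_base units; multiplying converts it to
    # seconds elapsed between recording start and this track's first sample.
    if stream.start_time is not None and stream.time_base is not None:
        start_seconds = float(stream.start_time * stream.time_base)
    else:
        start_seconds = 0.0
    print(f"track starts {start_seconds:.3f}s into the recording")
finally:
    container.close()
```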
    async def mixdown_tracks(
        self,
        track_datas: list[bytes],
        writer: AudioFileWriterProcessor,
        offsets_seconds: list[float] | None = None,
    ) -> None:
        """
        Minimal multi-track mixdown using a PyAV filter graph (amix), no resampling.
        """

        # Discover target sample rate from first decodable frame
        target_sample_rate: int | None = None
        for data in track_datas:
            if not data:
                continue
            try:
                container = av.open(io.BytesIO(data))
                try:
                    for frame in container.decode(audio=0):
                        target_sample_rate = frame.sample_rate
                        break
                finally:
                    container.close()
            except Exception:
                continue
            if target_sample_rate:
                break

        if not target_sample_rate:
            self.logger.warning("Mixdown skipped - no decodable audio frames found")
            return

        # Build PyAV filter graph:
        #   N abuffer (s32/stereo)
        #   -> optional adelay per input (for alignment)
        #   -> amix (s32)
        #   -> aformat(s16)
        #   -> sink
        graph = av.filter.Graph()
        inputs = []
        valid_track_datas = [d for d in track_datas if d]
        # Align offsets list with the filtered inputs (skip empties)
        input_offsets_seconds = None
        if offsets_seconds is not None:
            input_offsets_seconds = [
                offsets_seconds[i] for i, d in enumerate(track_datas) if d
            ]
        for idx, data in enumerate(valid_track_datas):
            args = (
                f"time_base=1/{target_sample_rate}:"
                f"sample_rate={target_sample_rate}:"
                f"sample_fmt=s32:"
                f"channel_layout=stereo"
            )
            in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
            inputs.append(in_ctx)

        if not inputs:
            self.logger.warning("Mixdown skipped - no valid inputs for graph")
            return

        mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")

        fmt = graph.add(
            "aformat",
            args=(
                f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}"
            ),
            name="fmt",
        )

        sink = graph.add("abuffersink", name="out")

        # Optional per-input delay before mixing
        delays_ms: list[int] = []
        if input_offsets_seconds is not None:
            base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
            delays_ms = [
                max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds
            ]
        else:
            delays_ms = [0 for _ in inputs]

        for idx, in_ctx in enumerate(inputs):
            delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
            if delay_ms > 0:
                # adelay requires one value per channel; use same for stereo
                adelay = graph.add(
                    "adelay",
                    args=f"delays={delay_ms}|{delay_ms}:all=1",
                    name=f"delay{idx}",
                )
                in_ctx.link_to(adelay)
                adelay.link_to(mixer, 0, idx)
            else:
                in_ctx.link_to(mixer, 0, idx)
        mixer.link_to(fmt)
        fmt.link_to(sink)
        graph.configure()

        # Open containers for decoding
        containers = []
        for i, d in enumerate(valid_track_datas):
            try:
                c = av.open(io.BytesIO(d))
                containers.append(c)
            except Exception as e:
                self.logger.warning(
                    "Mixdown: failed to open container", input=i, error=str(e)
                )
                containers.append(None)
        # Filter out Nones for decoders
        containers = [c for c in containers if c is not None]
        decoders = [c.decode(audio=0) for c in containers]
        active = [True] * len(decoders)
        # Per-input resamplers to enforce s32/stereo at the same rate (no resample of rate)
        resamplers = [
            AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
            for _ in decoders
        ]

        try:
            # Round-robin feed frames into graph, pull mixed frames as they become available
            while any(active):
                for i, (dec, is_active) in enumerate(zip(decoders, active)):
                    if not is_active:
                        continue
                    try:
                        frame = next(dec)
                    except StopIteration:
                        active[i] = False
                        continue

                    # Enforce same sample rate; convert format/layout to s32/stereo (no resample)
                    if frame.sample_rate != target_sample_rate:
                        # Skip frames with differing rate
                        continue
                    out_frames = resamplers[i].resample(frame) or []
                    for rf in out_frames:
                        rf.sample_rate = target_sample_rate
                        rf.time_base = Fraction(1, target_sample_rate)
                        inputs[i].push(rf)

                # Drain available mixed frames
                while True:
                    try:
                        mixed = sink.pull()
                    except Exception:
                        break
                    mixed.sample_rate = target_sample_rate
                    mixed.time_base = Fraction(1, target_sample_rate)
                    await writer.push(mixed)

            # Signal EOF to inputs and drain remaining
            for in_ctx in inputs:
                in_ctx.push(None)
            while True:
                try:
                    mixed = sink.pull()
                except Exception:
                    break
                mixed.sample_rate = target_sample_rate
                mixed.time_base = Fraction(1, target_sample_rate)
                await writer.push(mixed)
        finally:
            for c in containers:
                c.close()
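Editor's note: the per-input alignment fed to `adelay` above reduces to a small calculation: each delay is measured against the earliest-starting track, so the first speaker begins at 0 ms. A self-contained sketch with hypothetical offsets:

```python
# Hypothetical stream start times (seconds) read from two tracks' metadata
offsets_seconds = [2.1, 51.4]

# Delays are relative to the earliest track, so the first speaker starts at 0 ms
base = min(offsets_seconds)
delays_ms = [max(0, int(round((o - base) * 1000))) for o in offsets_seconds]

assert delays_ms == [0, 49300]  # second track is held back ~49.3 s before amix
```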
    async def set_status(self, transcript_id: str, status: TranscriptStatus):
        async with self.lock_transaction():
            return await transcripts_controller.set_status(transcript_id, status)

    async def process(self, bucket_name: str, track_keys: list[str]):
        transcript = await self.get_transcript()

        s3 = boto3.client(
            "s3",
            region_name=settings.RECORDING_STORAGE_AWS_REGION,
            aws_access_key_id=settings.RECORDING_STORAGE_AWS_ACCESS_KEY_ID,
            aws_secret_access_key=settings.RECORDING_STORAGE_AWS_SECRET_ACCESS_KEY,
        )

        storage = get_transcripts_storage()

        # Pre-download bytes for all tracks for mixing and transcription
        track_datas: list[bytes] = []
        for key in track_keys:
            try:
                obj = s3.get_object(Bucket=bucket_name, Key=key)
                track_datas.append(obj["Body"].read())
            except Exception as e:
                self.logger.warning(
                    "Skipping track - cannot read S3 object", key=key, error=str(e)
                )
                track_datas.append(b"")

        # REMOVED: Filename offset extraction - not needed anymore!
        # We use stream metadata start_time for padding instead

        # Get stream metadata start_times for mixing (still useful for mixdown)
        stream_start_times: list[float] = []
        for data in track_datas:
            if not data:
                stream_start_times.append(0.0)
                continue

            container = av.open(io.BytesIO(data))
            try:
                audio_stream = container.streams.audio[0]
                if (
                    audio_stream.start_time is not None
                    and audio_stream.time_base is not None
                ):
                    start_time = float(audio_stream.start_time * audio_stream.time_base)
                else:
                    start_time = 0.0
                stream_start_times.append(start_time)
            finally:
                container.close()

        # Mixdown all available tracks into transcript.audio_mp3_filename, using stream metadata offsets
        try:
            mp3_writer = AudioFileWriterProcessor(
                path=str(transcript.audio_mp3_filename)
            )
            await self.mixdown_tracks(track_datas, mp3_writer, stream_start_times)
            await mp3_writer.flush()
        except Exception as e:
            self.logger.error("Mixdown failed", error=str(e))

        # PAD TRACKS BEFORE TRANSCRIPTION - THIS IS THE KEY FIX!
        padded_track_urls: list[str] = []
        for idx, data in enumerate(track_datas):
            if not data:
                padded_track_urls.append("")
                continue

            _, padded_url = await self.pad_track_for_transcription(data, idx, storage)
            padded_track_urls.append(padded_url)
            self.logger.info(f"Padded track {idx} for transcription: {padded_url}")

        # Transcribe PADDED tracks - timestamps will be automatically correct!
        speaker_transcripts: list[TranscriptType] = []
        for idx, padded_url in enumerate(padded_track_urls):
            if not padded_url:
                continue

            try:
                # Transcribe the PADDED track
                t = await self.transcribe_file(padded_url, transcript.source_language)
            except Exception as e:
                self.logger.error(
                    "Transcription via default backend failed, trying local whisper",
                    track_idx=idx,
                    url=padded_url,
                    error=str(e),
                )
                try:
                    fallback = FileTranscriptAutoProcessor(name="whisper")
                    result = None

                    async def capture_result(r):
                        nonlocal result
                        result = r

                    fallback.on(capture_result)
                    await fallback.push(
                        FileTranscriptInput(
                            audio_url=padded_url, language=transcript.source_language
                        )
                    )
                    await fallback.flush()
                    if not result:
                        raise Exception("No transcript captured in fallback")
                    t = result
                except Exception as e2:
                    self.logger.error(
                        "Skipping track - transcription failed after fallback",
                        track_idx=idx,
                        url=padded_url,
                        error=str(e2),
                    )
                    continue

            if not t.words:
                continue

            # NO OFFSET ADJUSTMENT NEEDED!
            # Timestamps are already correct because we transcribed padded tracks
            # Just set speaker ID
            for w in t.words:
                w.speaker = idx

            speaker_transcripts.append(t)
            self.logger.info(
                f"Track {idx} transcribed successfully with {len(t.words)} words",
                track_idx=idx,
            )

        if not speaker_transcripts:
            raise Exception("No valid track transcriptions")

        # Merge all words and sort by timestamp
        merged_words = []
        for t in speaker_transcripts:
            merged_words.extend(t.words)
        merged_words.sort(
            key=lambda w: w.start if hasattr(w, "start") and w.start is not None else 0
        )

        merged_transcript = TranscriptType(words=merged_words, translation=None)

        await transcripts_controller.append_event(
            transcript,
            event="TRANSCRIPT",
            data=TranscriptText(
                text=merged_transcript.text, translation=merged_transcript.translation
            ),
        )

        topics = await self.detect_topics(merged_transcript, transcript.target_language)
        await asyncio.gather(
            self.generate_title(topics),
            self.generate_summaries(topics),
            return_exceptions=False,
        )

        await self.set_status(transcript.id, "ended")
    async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType:
        processor = FileTranscriptAutoProcessor()
        input_data = FileTranscriptInput(audio_url=audio_url, language=language)

        result: TranscriptType | None = None

        async def capture_result(transcript):
            nonlocal result
            result = transcript

        processor.on(capture_result)
        await processor.push(input_data)
        await processor.flush()

        if not result:
            raise ValueError("No transcript captured")

        return result
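Editor's note: `transcribe_file` (and the fallback path in `process` above) follows the same push/flush/callback-capture pattern. A minimal stand-in, with `TinyProcessor` entirely hypothetical, showing the shape of that pattern:

```python
import asyncio


class TinyProcessor:
    """Hypothetical processor: doubles its input and reports via callback."""

    def __init__(self):
        self._cb = None

    def on(self, cb):
        self._cb = cb

    async def push(self, value):
        await self._cb(value * 2)  # pretend processing

    async def flush(self):
        pass  # real processors drain buffered work here


async def main():
    result = None

    async def capture(r):
        nonlocal result
        result = r

    p = TinyProcessor()
    p.on(capture)          # register the callback first
    await p.push(21)       # feed input
    await p.flush()        # ensure everything has been emitted
    assert result == 42    # then read the captured output


asyncio.run(main())
```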
    async def detect_topics(
        self, transcript: TranscriptType, target_language: str
    ) -> list[TitleSummary]:
        chunk_size = 300
        topics: list[TitleSummary] = []

        async def on_topic(topic: TitleSummary):
            topics.append(topic)
            return await self.on_topic(topic)

        topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic)
        topic_detector.set_pipeline(self.empty_pipeline)

        for i in range(0, len(transcript.words), chunk_size):
            chunk_words = transcript.words[i : i + chunk_size]
            if not chunk_words:
                continue

            chunk_transcript = TranscriptType(
                words=chunk_words, translation=transcript.translation
            )
            await topic_detector.push(chunk_transcript)

        await topic_detector.flush()
        return topics

    async def generate_title(self, topics: list[TitleSummary]):
        if not topics:
            self.logger.warning("No topics for title generation")
            return

        processor = TranscriptFinalTitleProcessor(callback=self.on_title)
        processor.set_pipeline(self.empty_pipeline)

        for topic in topics:
            await processor.push(topic)

        await processor.flush()

    async def generate_summaries(self, topics: list[TitleSummary]):
        if not topics:
            self.logger.warning("No topics for summary generation")
            return

        transcript = await self.get_transcript()
        processor = TranscriptFinalSummaryProcessor(
            transcript=transcript,
            callback=self.on_long_summary,
            on_short_summary=self.on_short_summary,
        )
        processor.set_pipeline(self.empty_pipeline)

        for topic in topics:
            await processor.push(topic)

        await processor.flush()


@shared_task
@asynctask
async def task_pipeline_multitrack_process(
    *, transcript_id: str, bucket_name: str, track_keys: list[str]
):
    pipeline = PipelineMainMultitrack(transcript_id=transcript_id)
    try:
        await pipeline.set_status(transcript_id, "processing")
        await pipeline.process(bucket_name, track_keys)
    except Exception:
        await pipeline.set_status(transcript_id, "error")
        raise

    post_chain = chain(
        task_cleanup_consent.si(transcript_id=transcript_id),
        task_pipeline_post_to_zulip.si(transcript_id=transcript_id),
        task_send_webhook_if_needed.si(transcript_id=transcript_id),
    )
    post_chain.delay()
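Editor's note: the post-processing chain uses immutable signatures (`.si`), so each task receives only its own `transcript_id` instead of the previous task's return value. A self-contained sketch (the demo app and `step` task are hypothetical):

```python
from celery import Celery, chain

app = Celery("demo", broker="memory://")


@app.task
def step(name: str) -> str:
    return name


# .si() freezes each task's arguments and ignores the parent's result,
# mirroring how the pipeline chains cleanup -> zulip -> webhook above.
workflow = chain(step.si("cleanup"), step.si("zulip"), step.si("webhook"))
```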
@@ -1,6 +1,6 @@
 from textwrap import dedent

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, ConfigDict, Field

 from reflector.llm import LLM
 from reflector.processors.base import Processor
@@ -34,8 +34,14 @@ TOPIC_PROMPT = dedent(
 class TopicResponse(BaseModel):
     """Structured response for topic detection"""

-    title: str = Field(description="A descriptive title for the topic being discussed")
-    summary: str = Field(description="A concise 1-2 sentence summary of the discussion")
+    model_config = ConfigDict(populate_by_name=True)
+
+    title: str = Field(
+        description="A descriptive title for the topic being discussed", alias="Title"
+    )
+    summary: str = Field(
+        description="A concise 1-2 sentence summary of the discussion", alias="Summary"
+    )


 class TranscriptTopicDetectorProcessor(Processor):
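Editor's note: the aliases let the model validate LLM responses whose JSON keys are capitalized, while `populate_by_name` keeps the lowercase field names usable from code. A minimal check, assuming pydantic v2:

```python
from pydantic import BaseModel, ConfigDict, Field


class TopicResponse(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    title: str = Field(alias="Title")
    summary: str = Field(alias="Summary")


# Accepts the LLM's capitalized keys (validation uses the alias)...
TopicResponse.model_validate({"Title": "Fish", "Summary": "An argument."})
# ...and, thanks to populate_by_name, the lowercase field names too.
TopicResponse(title="Fish", summary="An argument.")
```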
server/reprocess_transcript.py (new file, 65 lines)
@@ -0,0 +1,65 @@
#!/usr/bin/env python
"""
Reprocess the Daily.co multitrack recording to fix audio mixdown
"""

import asyncio

from reflector.pipelines.main_multitrack_pipeline import (
    task_pipeline_multitrack_process,
)


async def reprocess():
    """Process the multitrack recording with fixed mixdown"""

    bucket_name = "reflector-dailyco-local"
    track_keys = [
        "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
        "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
    ]

    # Create a new transcript with fixed mixdown
    import uuid

    from reflector.db import get_database
    from reflector.db.transcripts import Transcript, transcripts

    db = get_database()
    await db.connect()

    try:
        transcript_id = str(uuid.uuid4())
        transcript = Transcript(
            id=transcript_id,
            name="Daily Multitrack - With Audio Mixdown",
            source_kind="file",
            source_language="en",
            target_language="en",
            status="idle",
            events=[],
            title="",
        )

        query = transcripts.insert().values(**transcript.model_dump())
        await db.execute(query)
        print(f"Created transcript: {transcript_id}")

        # Process with the fixed pipeline
        await task_pipeline_multitrack_process(
            transcript_id=transcript_id, bucket_name=bucket_name, track_keys=track_keys
        )

        print(
            f"Processing complete! Check: http://localhost:3000/transcripts/{transcript_id}"
        )

        return transcript_id
    finally:
        await db.disconnect()


if __name__ == "__main__":
    transcript_id = asyncio.run(reprocess())
    print("\n✅ Reprocessing complete!")
    print(f"📍 View at: http://localhost:3000/transcripts/{transcript_id}")
server/test_multitrack_ffmpeg.py (new file, 124 lines)
@@ -0,0 +1,124 @@
#!/usr/bin/env python
"""
Test script to trigger multitrack recording processing with ffmpeg padding fix
"""

import asyncio

from reflector.pipelines.main_multitrack_pipeline import PipelineMainMultitrack


async def test_processing():
    """Manually trigger multitrack processing for the test recording"""

    # Initialize database connection
    from reflector.db import get_database

    db = get_database()
    await db.connect()

    try:
        # The test recording with known speaker timeline
        bucket_name = "monadical"
        track_keys = [
            "daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm",
            "daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823.webm",
        ]

        # Create a new transcript ID
        import uuid

        transcript_id = str(uuid.uuid4())

        # Create transcript directly with SQL
        from reflector.db.transcripts import (
            Transcript,
            transcripts,
            transcripts_controller,
        )

        pipeline = PipelineMainMultitrack(transcript_id=transcript_id)

        # Create transcript model
        transcript = Transcript(
            id=transcript_id,
            name="FFMPEG Test - Daily Multitrack Recording",
            source_kind="file",
            source_language="en",
            target_language="en",
            status="idle",
            events=[],
            title="",
        )
        # Insert into database
        query = transcripts.insert().values(**transcript.model_dump())
        await db.execute(query)
        print(f"Created transcript: {transcript_id}")

        # Process the tracks using the pipeline
        print("Processing multitrack recording with ffmpeg padding...")
        print("Track 0: ...935922.webm (expected to start at ~2s)")
        print("Track 1: ...943823.webm (expected to start at ~51s)")

        try:
            await pipeline.set_status(transcript_id, "processing")
            await pipeline.process(bucket_name, track_keys)
            print("Processing complete!")
        except Exception as e:
            await pipeline.set_status(transcript_id, "error")
            print(f"Error during processing: {e}")
            import traceback

            traceback.print_exc()
            raise

        # Check the results
        final_transcript = await transcripts_controller.get(transcript_id)
        print(f"\nTranscript status: {final_transcript.status}")
        print(f"Transcript title: {final_transcript.title}")

        # Extract timeline from events
        if final_transcript.events:
            for event in final_transcript.events:
                if event.get("event") == "TRANSCRIPT":
                    text = event.get("data", {}).get("text", "")
                    # Show first 500 chars to check if speakers are properly separated
                    print("\nTranscript text (first 500 chars):")
                    print(text[:500])

                    # Show last 500 chars too to see if second speaker is at the end
                    print("\nTranscript text (last 500 chars):")
                    print(text[-500:])

                    # Count words per speaker
                    words = text.split()
                    print(f"\nTotal words in transcript: {len(words)}")

                    # Check if text has proper speaker separation
                    # Expected: First ~45% from speaker 0, then ~35% from speaker 1, then ~20% from speaker 0
                    first_third = " ".join(words[: len(words) // 3])
                    middle_third = " ".join(
                        words[len(words) // 3 : 2 * len(words) // 3]
                    )
                    last_third = " ".join(words[2 * len(words) // 3 :])

                    print(f"\nFirst third preview: {first_third[:100]}...")
                    print(f"Middle third preview: {middle_third[:100]}...")
                    print(f"Last third preview: {last_third[:100]}...")
                    break

        return transcript_id
    finally:
        await db.disconnect()


if __name__ == "__main__":
    transcript_id = asyncio.run(test_processing())
    print(f"\n✅ Test complete! Transcript ID: {transcript_id}")
    print("\nExpected timeline:")
    print("  Speaker 0: ~2s to ~49s (first participant speaks)")
    print("  Speaker 1: ~51s to ~70s (second participant speaks)")
    print("  Speaker 0: ~73s to end (first participant speaks again)")
    print(
        "\nIf the text shows proper chronological order (not interleaved), the fix worked!"
    )
server/test_multitrack_ffmpeg_local.py (new file, 162 lines)
@@ -0,0 +1,162 @@
#!/usr/bin/env python
"""
Test script to trigger multitrack recording processing with ffmpeg padding fix
This version loads tracks from local filesystem instead of S3
"""

import asyncio
import os

from reflector.pipelines.main_multitrack_pipeline import PipelineMainMultitrack


async def test_processing():
    """Manually trigger multitrack processing for the test recording"""

    # Initialize database connection
    from reflector.db import get_database

    db = get_database()
    await db.connect()

    try:
        # Create a new transcript ID
        import uuid

        transcript_id = str(uuid.uuid4())

        # Create transcript directly with SQL
        from reflector.db.transcripts import (
            Transcript,
            transcripts,
            transcripts_controller,
        )

        pipeline = PipelineMainMultitrack(transcript_id=transcript_id)

        # Create transcript model
        transcript = Transcript(
            id=transcript_id,
            name="FFMPEG Test - Daily Multitrack Recording",
            source_kind="file",
            source_language="en",
            target_language="en",
            status="idle",
            events=[],
            title="",
        )
        # Insert into database
        query = transcripts.insert().values(**transcript.model_dump())
        await db.execute(query)
        print(f"Created transcript: {transcript_id}")

        # Read track files from local filesystem (in the container they'll be at /app/)
        tracks_dir = "/app"
        track_files = [
            "1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm",
            "1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823.webm",
        ]

        # Read track data
        track_datas = []
        for track_file in track_files:
            file_path = os.path.join(tracks_dir, track_file)
            if os.path.exists(file_path):
                with open(file_path, "rb") as f:
                    track_datas.append(f.read())
                print(f"Loaded track: {track_file} ({len(track_datas[-1])} bytes)")
            else:
                print(f"Track file not found: {file_path}")
                track_datas.append(b"")

        # Process the tracks using the pipeline
        print("\nProcessing multitrack recording with ffmpeg padding...")
        print("Track 0: ...935922.webm (expected to start at ~2s)")
        print("Track 1: ...943823.webm (expected to start at ~51s)")

        # Call the process method directly with track data
        # We'll need to mock S3 operations and directly work with the data

        try:
            await pipeline.set_status(transcript_id, "processing")

            # Create a mock bucket and keys setup
            bucket_name = "test-bucket"
            track_keys = ["track0.webm", "track1.webm"]

            # Mock S3 client to return our local data
            from unittest.mock import MagicMock, patch

            mock_s3 = MagicMock()

            def mock_get_object(Bucket, Key):
                idx = 0 if "track0" in Key else 1
                return {"Body": MagicMock(read=lambda: track_datas[idx])}

            mock_s3.get_object = mock_get_object

            # Patch boto3.client to return our mock
            with patch("boto3.client", return_value=mock_s3):
                await pipeline.process(bucket_name, track_keys)

            print("Processing complete!")
        except Exception as e:
            await pipeline.set_status(transcript_id, "error")
            print(f"Error during processing: {e}")
            import traceback

            traceback.print_exc()
            raise

        # Check the results
        final_transcript = await transcripts_controller.get(transcript_id)
        print(f"\nTranscript status: {final_transcript.status}")
        print(f"Transcript title: {final_transcript.title}")

        # Extract timeline from events
        if final_transcript.events:
            for event in final_transcript.events:
                if event.get("event") == "TRANSCRIPT":
                    text = event.get("data", {}).get("text", "")
                    # Show first 500 chars to check if speakers are properly separated
                    print("\nTranscript text (first 500 chars):")
                    print(text[:500])

                    # Show last 500 chars too to see if second speaker is at the end
                    print("\nTranscript text (last 500 chars):")
                    print(text[-500:])

                    # Count words per speaker
                    words = text.split()
                    print(f"\nTotal words in transcript: {len(words)}")

                    # Check if text has proper speaker separation
                    # Expected: First ~45% from speaker 0, then ~35% from speaker 1, then ~20% from speaker 0
                    first_third = " ".join(words[: len(words) // 3])
                    middle_third = " ".join(
                        words[len(words) // 3 : 2 * len(words) // 3]
                    )
                    last_third = " ".join(words[2 * len(words) // 3 :])

                    print(f"\nFirst third preview: {first_third[:100]}...")
                    print(f"Middle third preview: {middle_third[:100]}...")
                    print(f"Last third preview: {last_third[:100]}...")
                    break

        return transcript_id
    finally:
        await db.disconnect()


if __name__ == "__main__":
    transcript_id = asyncio.run(test_processing())
    print(f"\n✅ Test complete! Transcript ID: {transcript_id}")
    print("\nExpected timeline:")
    print("  Speaker 0: ~2s to ~49s (first participant speaks)")
    print("  Speaker 1: ~51s to ~70s (second participant speaks)")
    print("  Speaker 0: ~73s to end (first participant speaks again)")
    print(
        "\nIf the text shows proper chronological order (not interleaved), the fix worked!"
    )
server/test_s3_multitrack.py (new file, 66 lines)
@@ -0,0 +1,66 @@
#!/usr/bin/env python
"""
Test multitrack processing with correct S3 bucket configuration
"""

import asyncio
import uuid

from reflector.db import get_database
from reflector.db.transcripts import Transcript, transcripts
from reflector.pipelines.main_multitrack_pipeline import (
    task_pipeline_multitrack_process,
)


async def create_and_process():
    """Create a new transcript and process with correct S3 bucket"""

    # Correct S3 configuration
    bucket_name = "reflector-dailyco-local"
    track_keys = [
        "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
        "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
    ]

    # Create a new transcript
    db = get_database()
    await db.connect()

    try:
        transcript_id = str(uuid.uuid4())
        transcript = Transcript(
            id=transcript_id,
            name="Daily Multitrack - Correct S3 Bucket Test",
            source_kind="file",
            source_language="en",
            target_language="en",
            status="idle",
            events=[],
            title="",
        )

        query = transcripts.insert().values(**transcript.model_dump())
        await db.execute(query)
        print(f"Created transcript: {transcript_id}")

        # Trigger processing with Celery
        result = task_pipeline_multitrack_process.delay(
            transcript_id=transcript_id, bucket_name=bucket_name, track_keys=track_keys
        )

        print(f"Task ID: {result.id}")
        print(
            f"Processing started! Check: http://localhost:3000/transcripts/{transcript_id}"
        )
        print(f"API Status: http://localhost:1250/v1/transcripts/{transcript_id}")

        return transcript_id
    finally:
        await db.disconnect()


if __name__ == "__main__":
    transcript_id = asyncio.run(create_and_process())
    print("\n✅ Task submitted successfully!")
    print(f"📍 Transcript ID: {transcript_id}")
server/trigger_reprocess.py (new file, 23 lines)
@@ -0,0 +1,23 @@
#!/usr/bin/env python
"""
Trigger reprocessing of Daily.co multitrack recording via Celery
"""

from reflector.pipelines.main_multitrack_pipeline import (
    task_pipeline_multitrack_process,
)

# Trigger the Celery task
result = task_pipeline_multitrack_process.delay(
    transcript_id="32fad706-f8cf-434c-94c8-1ee69f7be081",  # The ID that was created
    bucket_name="reflector-dailyco-local",
    track_keys=[
        "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
        "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
    ],
)

print(f"Task ID: {result.id}")
print(
    "Processing started! Check: http://localhost:3000/transcripts/32fad706-f8cf-434c-94c8-1ee69f7be081"
)