dailico track merge vibe

This commit is contained in:
Igor Loskutov
2025-10-21 10:30:19 -04:00
parent f844b9fc1f
commit 7d239fe380
12 changed files with 1993 additions and 124 deletions

View File

@@ -1,27 +1,27 @@
# Daily.co Integration Test Plan
## IMPLEMENTATION STATUS: Real Transcription Active
**This test validates Daily.co multitrack recording integration with REAL transcription/diarization.**
The implementation includes complete audio processing pipeline:
- **Multitrack recordings** from Daily.co S3 (separate audio stream per participant)
- **PyAV-based audio mixdown** with PTS-based track alignment
- **Real transcription** via Modal GPU backend (Whisper)
- **Real diarization** via Modal GPU backend (speaker identification)
- **Per-track transcription** with timestamp synchronization
- **Complete database entities** (recording, transcript, topics, participants, words)
**Processing pipeline** (`PipelineMainMultitrack`):
1. Download all audio tracks from Daily.co S3
2. Align tracks by PTS (presentation timestamp) to handle late joiners
3. Mix tracks into single audio file for unified playback
4. Transcribe each track individually with proper offset handling
5. Perform diarization on mixed audio
6. Generate topics, summaries, and word-level timestamps
7. Convert audio to MP3 and generate waveform visualization
**File processing pipeline** then:
- Converts WebM to MP3 format (for frontend audio player)
- Generates waveform visualization data (audio.json)
- These files enable proper frontend transcript page display
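The waveform data itself comes from `AudioWaveformProcessor`; as a rough illustration of the idea only (not the actual implementation), peak amplitudes can be binned into a fixed number of buckets for the frontend's visualization:

```python
import math

def waveform_peaks(samples: list[float], num_bins: int = 200) -> list[float]:
    """Downsample absolute amplitudes into fixed-size bins for a simple waveform view."""
    if not samples:
        return []
    bin_size = max(1, math.ceil(len(samples) / num_bins))
    return [
        max(abs(s) for s in samples[i : i + bin_size])
        for i in range(0, len(samples), bin_size)
    ]
```

The bin count, not the audio length, determines the payload size, which keeps `audio.json` small regardless of meeting duration.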
**Note:** A stub processor (`process_daily_recording`) exists for testing webhook flow without GPU costs, but the production code path uses `process_multitrack_recording` with full ML pipeline.
---
@@ -29,6 +29,7 @@ The actual audio/video files are recorded to S3, but transcription/diarization i
**1. Environment Variables** (check in `.env.development.local`):
```bash
# Daily.co API Configuration
DAILY_API_KEY=<key>
DAILY_SUBDOMAIN=monadical
DAILY_WEBHOOK_SECRET=<base64-encoded-secret>
@@ -37,25 +38,43 @@ AWS_DAILY_S3_REGION=us-east-1
AWS_DAILY_ROLE_ARN=arn:aws:iam::950402358378:role/DailyCo
DAILY_MIGRATION_ENABLED=true
DAILY_MIGRATION_ROOM_IDS=["552640fd-16f2-4162-9526-8cf40cd2357e"]
# Transcription/Diarization Backend (Required for real processing)
DIARIZATION_BACKEND=modal
DIARIZATION_MODAL_API_KEY=<modal-api-key>
# TRANSCRIPTION_BACKEND is not explicitly set (uses default/modal)
```
**2. Services Running:**
```bash
docker compose ps # server, postgres, redis, worker, beat should be UP
```
**IMPORTANT:** Worker and beat services MUST be running for transcription processing:
```bash
docker compose up -d worker beat
```
**3. ngrok Tunnel for Webhooks:**
```bash
# Start ngrok (if not already running)
ngrok http 1250 --log=stdout > /tmp/ngrok.log 2>&1 &
# Get public URL
curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; data=json.load(sys.stdin); print(data['tunnels'][0]['public_url'])"
```
**Current ngrok URL:** `https://0503947384a3.ngrok-free.app` (as of last registration)
**4. Webhook Created:**
```bash
cd server
uv run python scripts/recreate_daily_webhook.py https://0503947384a3.ngrok-free.app/v1/daily/webhook
# Verify: "Created webhook <uuid> (state: ACTIVE)"
```
**Current webhook status:** ✅ ACTIVE (webhook ID: dad5ad16-ceca-488e-8fc5-dae8650b51d0)
---
## Test 1: Database Configuration
@@ -338,23 +357,25 @@ recorded_at: <recent-timestamp>
**Check transcript created:**
```bash
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT id, title, status, duration, recording_id, meeting_id, room_id
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
```
**Expected (REAL transcription):**
```
id: <transcript-id>
title: <AI-generated title based on actual conversation content>
status: uploaded (audio file processed and available)
duration: <actual meeting duration in seconds>
recording_id: <same-as-recording-id-above>
meeting_id: <meeting-id>
room_id: 552640fd-16f2-4162-9526-8cf40cd2357e
```
**Note:** Title and content will reflect the ACTUAL conversation, not mock data. Processing time depends on recording length and GPU backend availability (Modal).
**Verify audio file exists:**
```bash
ls -lh data/<transcript-id>/upload.webm
@@ -365,12 +386,12 @@ ls -lh data/<transcript-id>/upload.webm
-rw-r--r-- 1 user staff ~100-200K Oct 10 18:48 upload.webm
```
**Check transcript topics (REAL transcription):**
```bash
TRANSCRIPT_ID=$(docker compose exec -T postgres psql -U reflector -d reflector -t -c \
"SELECT id FROM transcript ORDER BY created_at DESC LIMIT 1;")
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT
jsonb_array_length(topics) as num_topics,
jsonb_array_length(participants) as num_participants,
@@ -380,55 +401,52 @@ docker-compose exec -T postgres psql -U reflector -d reflector -c \
WHERE id = '$TRANSCRIPT_ID';"
```
**Expected (REAL data):**
```
num_topics: <varies based on conversation>
num_participants: <actual number of participants who spoke>
short_summary: <AI-generated summary of actual conversation>
title: <AI-generated title based on content>
```
**Check topics contain actual transcription:**
```bash
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT topics->0->'title', topics->0->'summary', topics->0->'transcript'
FROM transcript
ORDER BY created_at DESC LIMIT 1;" | head -20
```
**Expected output:** Will contain the ACTUAL transcribed conversation from the Daily.co meeting, not mock data.
**Check participants:**
```bash
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT participants FROM transcript ORDER BY created_at DESC LIMIT 1;" \
| python3 -c "import sys, json; data=json.loads(sys.stdin.read()); print(json.dumps(data, indent=2))"
```
**Expected (REAL diarization):**
```json
[
{
"id": "<uuid>",
"speaker": 0,
"name": "Speaker 1"
},
{
"id": "<uuid>",
"speaker": 1,
"name": "Speaker 2"
}
]
```
**Note:** Speaker names will be generic ("Speaker 1", "Speaker 2", etc.) as determined by the diarization backend. Number of participants depends on how many actually spoke during the meeting.
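The generic names follow directly from the numeric diarization IDs; a minimal sketch of that mapping (illustrative, not the actual naming code):

```python
def default_speaker_name(speaker_index: int) -> str:
    # Diarization assigns 0-based numeric speaker IDs; the UI label is 1-based.
    return f"Speaker {speaker_index + 1}"
```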
**Check word-level data:**
```bash
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT jsonb_array_length(topics->0->'words') as num_words_first_topic
FROM transcript
ORDER BY created_at DESC LIMIT 1;"
@@ -436,12 +454,12 @@ docker-compose exec -T postgres psql -U reflector -d reflector -c \
**Expected:**
```
num_words_first_topic: <varies based on actual conversation length and topic chunking>
```
**Verify speaker diarization in words:**
```bash
docker compose exec -T postgres psql -U reflector -d reflector -c \
"SELECT
topics->0->'words'->0->>'text' as first_word,
topics->0->'words'->0->>'speaker' as speaker,
@@ -451,14 +469,16 @@ docker-compose exec -T postgres psql -U reflector -d reflector -c \
ORDER BY created_at DESC LIMIT 1;"
```
**Expected (REAL transcription):**
```
first_word: <actual first word from transcription>
speaker: 0, 1, 2, ... (actual speaker ID from diarization)
start_time: <actual timestamp in seconds>
end_time: <actual end timestamp>
```
**Note:** All timestamps and speaker IDs are from real transcription/diarization, synchronized across tracks.
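That cross-track synchronization boils down to shifting each track's words by its offset and merging chronologically; a sketch assuming each word is a dict with a `start` key:

```python
def merge_track_words(per_track_words: list[list[dict]]) -> list[dict]:
    """Merge per-track word lists into one chronological stream.

    Assumes word timestamps are already recording-relative, i.e. the track
    offsets have been applied before merging.
    """
    merged = [w for words in per_track_words for w in words]
    merged.sort(key=lambda w: w["start"])
    return merged
```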
---
## Test 8: Recording Type Verification
@@ -579,13 +599,15 @@ Recording: raw-tracks
- [x] S3 path: `monadical/test2-{timestamp}/{recording-start-ts}-{participant-uuid}-cam-{audio|video}-{track-start-ts}`
- [x] Database `num_clients` increments/decrements correctly
- [x] **Database recording entry created** with correct S3 path and status `completed`
- [ ] **Database transcript entry created** with status `uploaded`
- [ ] **Audio file downloaded** to `data/{transcript_id}/upload.webm`
- [ ] **Transcript has REAL data**: AI-generated title based on conversation
- [ ] **Transcript has topics** generated from actual content
- [ ] **Transcript has participants** with proper speaker diarization
- [ ] **Topics contain word-level data** with accurate timestamps and speaker IDs
- [ ] **Total duration** matches actual meeting length
- [ ] **MP3 and waveform files generated** by file processing pipeline
- [ ] **Frontend transcript page loads** without "Failed to load audio" error
- [ ] **Audio player functional** with working playback and waveform visualization
- [ ] **Multitrack processing completed** without errors in worker logs
- [ ] **Modal GPU backends accessible** (transcription and diarization)

View File

@@ -6,7 +6,7 @@ ENV PYTHONUNBUFFERED=1 \
# builder install base dependencies
WORKDIR /tmp
RUN apt-get update && apt-get install -y curl ffmpeg && apt-get clean
ADD https://astral.sh/uv/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh
ENV PATH="/root/.local/bin/:$PATH"

View File

@@ -0,0 +1,84 @@
# Multitrack Pipeline Fix Summary
## Problem
Whisper timestamps were incorrect because it ignores leading silence in audio files. Daily.co tracks can have arbitrary amounts of silence before speech starts.
## Solution
**Pad tracks BEFORE transcription using stream metadata `start_time`**
This makes Whisper timestamps automatically correct relative to recording start.
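The timestamp arithmetic behind this, illustrated with the test recording's approximate numbers (the function name and signature are hypothetical):

```python
def to_recording_time(word_start: float, track_offset: float, padded: bool) -> float:
    """Map a Whisper timestamp to the recording timeline.

    If the track was padded with `track_offset` seconds of leading silence
    before transcription, Whisper already reports recording-relative times;
    otherwise the offset must be added back after transcription.
    """
    return word_start if padded else word_start + track_offset
```

A word spoken 2 seconds into a track that started 49 seconds late lands at 51 seconds either way; padding just moves the correction before Whisper runs instead of after.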
## Key Changes in `main_multitrack_pipeline_fixed.py`
### 1. Added `pad_track_for_transcription()` method (lines 55-172)
```python
async def pad_track_for_transcription(
self,
track_data: bytes,
track_idx: int,
storage,
) -> tuple[bytes, str]:
```
- Extracts stream metadata `start_time` using PyAV
- Creates PyAV filter graph with `adelay` filter to add padding
- Stores padded track to S3 and returns URL
- Uses same audio processing library (PyAV) already in the pipeline
### 2. Modified `process()` method
#### REMOVED (lines 255-302):
- Entire filename parsing for offsets - NOT NEEDED ANYMORE
- The complex regex parsing of Daily.co filenames
- Offset adjustment after transcription
#### ADDED (lines 371-382):
- Padding step BEFORE transcription:
```python
# PAD TRACKS BEFORE TRANSCRIPTION - THIS IS THE KEY FIX!
padded_track_urls: list[str] = []
for idx, data in enumerate(track_datas):
if not data:
padded_track_urls.append("")
continue
_, padded_url = await self.pad_track_for_transcription(
data, idx, storage
)
padded_track_urls.append(padded_url)
```
#### MODIFIED (lines 385-435):
- Transcribe PADDED tracks instead of raw tracks
- Removed all timestamp offset adjustment code
- Just set speaker ID - timestamps already correct!
```python
# NO OFFSET ADJUSTMENT NEEDED!
# Timestamps are already correct because we transcribed padded tracks
# Just set speaker ID
for w in t.words:
w.speaker = idx
```
## Why This Works
1. **Stream metadata is authoritative**: Daily.co sets `start_time` in the WebM container
2. **PyAV respects metadata**: `audio_stream.start_time * audio_stream.time_base` gives seconds
3. **Padding before transcription**: Whisper sees continuous audio from time 0
4. **Automatic alignment**: Word at 51s in padded track = 51s in recording
## Testing
Process the test recording (daily-20251020193458) and verify:
- Participant 0 words appear at ~2s
- Participant 1 words appear at ~51s
- No word interleaving
- Correct chronological order
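The last two checks can be automated with a small helper over the merged word list (assuming a word dict shape with `start` keys):

```python
def words_chronological(words: list[dict]) -> bool:
    """True if word start times are non-decreasing, i.e. no out-of-order interleaving."""
    starts = [w["start"] for w in words]
    return all(a <= b for a, b in zip(starts, starts[1:]))
```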
## Files
- **Original**: `main_multitrack_pipeline.py`
- **Fixed**: `main_multitrack_pipeline_fixed.py`
- **Test data**: `/Users/firfi/work/clients/monadical/reflector/1760988935484-*.webm`

View File

@@ -0,0 +1,510 @@
import asyncio
import io
from fractions import Fraction
import av
import boto3
import structlog
from av.audio.resampler import AudioResampler
from celery import chain, shared_task
from reflector.asynctask import asynctask
from reflector.db.transcripts import (
TranscriptStatus,
TranscriptText,
transcripts_controller,
)
from reflector.logger import logger
from reflector.pipelines.main_file_pipeline import task_send_webhook_if_needed
from reflector.pipelines.main_live_pipeline import (
PipelineMainBase,
task_cleanup_consent,
task_pipeline_post_to_zulip,
)
from reflector.processors import (
AudioFileWriterProcessor,
TranscriptFinalSummaryProcessor,
TranscriptFinalTitleProcessor,
TranscriptTopicDetectorProcessor,
)
from reflector.processors.file_transcript import FileTranscriptInput
from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
from reflector.processors.types import TitleSummary
from reflector.processors.types import (
Transcript as TranscriptType,
)
from reflector.settings import settings
from reflector.storage import get_transcripts_storage
class EmptyPipeline:
def __init__(self, logger: structlog.BoundLogger):
self.logger = logger
def get_pref(self, k, d=None):
return d
async def emit(self, event):
pass
class PipelineMainMultitrack(PipelineMainBase):
"""Process multiple participant tracks for a transcript without mixing audio."""
def __init__(self, transcript_id: str):
super().__init__(transcript_id=transcript_id)
self.logger = logger.bind(transcript_id=self.transcript_id)
self.empty_pipeline = EmptyPipeline(logger=self.logger)
async def mixdown_tracks(
self,
track_datas: list[bytes],
writer: AudioFileWriterProcessor,
offsets_seconds: list[float] | None = None,
) -> None:
"""
Minimal multi-track mixdown using a PyAV filter graph (amix), no resampling.
"""
# Discover target sample rate from first decodable frame
target_sample_rate: int | None = None
for data in track_datas:
if not data:
continue
try:
container = av.open(io.BytesIO(data))
try:
for frame in container.decode(audio=0):
target_sample_rate = frame.sample_rate
break
finally:
container.close()
except Exception:
continue
if target_sample_rate:
break
if not target_sample_rate:
self.logger.warning("Mixdown skipped - no decodable audio frames found")
return
# Build PyAV filter graph:
# N abuffer (s32/stereo)
# -> optional adelay per input (for alignment)
# -> amix (s32)
# -> aformat(s32)
# -> sink
graph = av.filter.Graph()
inputs = []
valid_track_datas = [d for d in track_datas if d]
# Align offsets list with the filtered inputs (skip empties)
input_offsets_seconds = None
if offsets_seconds is not None:
input_offsets_seconds = [
offsets_seconds[i] for i, d in enumerate(track_datas) if d
]
for idx, data in enumerate(valid_track_datas):
args = (
f"time_base=1/{target_sample_rate}:"
f"sample_rate={target_sample_rate}:"
f"sample_fmt=s32:"
f"channel_layout=stereo"
)
in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
inputs.append(in_ctx)
if not inputs:
self.logger.warning("Mixdown skipped - no valid inputs for graph")
return
mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")
fmt = graph.add(
"aformat",
args=(
f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}"
),
name="fmt",
)
sink = graph.add("abuffersink", name="out")
# Optional per-input delay before mixing
delays_ms: list[int] = []
if input_offsets_seconds is not None:
base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
delays_ms = [
max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds
]
else:
delays_ms = [0 for _ in inputs]
for idx, in_ctx in enumerate(inputs):
delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
if delay_ms > 0:
# adelay requires one value per channel; use same for stereo
adelay = graph.add(
"adelay",
args=f"delays={delay_ms}|{delay_ms}:all=1",
name=f"delay{idx}",
)
in_ctx.link_to(adelay)
adelay.link_to(mixer, 0, idx)
else:
in_ctx.link_to(mixer, 0, idx)
mixer.link_to(fmt)
fmt.link_to(sink)
graph.configure()
# Open containers for decoding
containers = []
for i, d in enumerate(valid_track_datas):
try:
c = av.open(io.BytesIO(d))
containers.append(c)
except Exception as e:
self.logger.warning(
"Mixdown: failed to open container", input=i, error=str(e)
)
containers.append(None)
# Filter out Nones for decoders
containers = [c for c in containers if c is not None]
decoders = [c.decode(audio=0) for c in containers]
active = [True] * len(decoders)
# Per-input resamplers to enforce s32/stereo at the same rate (no resample of rate)
resamplers = [
AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
for _ in decoders
]
try:
# Round-robin feed frames into graph, pull mixed frames as they become available
while any(active):
for i, (dec, is_active) in enumerate(zip(decoders, active)):
if not is_active:
continue
try:
frame = next(dec)
except StopIteration:
active[i] = False
continue
# Enforce same sample rate; convert format/layout to s32/stereo (no rate resample)
if frame.sample_rate != target_sample_rate:
# Skip frames with differing rate
continue
out_frames = resamplers[i].resample(frame) or []
for rf in out_frames:
rf.sample_rate = target_sample_rate
rf.time_base = Fraction(1, target_sample_rate)
inputs[i].push(rf)
# Drain available mixed frames
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
await writer.push(mixed)
# Signal EOF to inputs and drain remaining
for in_ctx in inputs:
in_ctx.push(None)
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
await writer.push(mixed)
finally:
for c in containers:
c.close()
async def set_status(self, transcript_id: str, status: TranscriptStatus):
async with self.lock_transaction():
return await transcripts_controller.set_status(transcript_id, status)
async def process(self, bucket_name: str, track_keys: list[str]):
transcript = await self.get_transcript()
s3 = boto3.client(
"s3",
region_name=settings.RECORDING_STORAGE_AWS_REGION,
aws_access_key_id=settings.RECORDING_STORAGE_AWS_ACCESS_KEY_ID,
aws_secret_access_key=settings.RECORDING_STORAGE_AWS_SECRET_ACCESS_KEY,
)
storage = get_transcripts_storage()
# Pre-download bytes for all tracks for mixing and transcription
track_datas: list[bytes] = []
for key in track_keys:
try:
obj = s3.get_object(Bucket=bucket_name, Key=key)
track_datas.append(obj["Body"].read())
except Exception as e:
self.logger.warning(
"Skipping track - cannot read S3 object", key=key, error=str(e)
)
track_datas.append(b"")
# Extract offsets from Daily.co filename timestamps
# Format: {rec_start_ts}-{uuid}-{media_type}-{track_start_ts}.{ext}
# Example: 1760988935484-uuid-cam-audio-1760988935922
import re
offsets_seconds: list[float] = []
recording_start_ts: int | None = None
for key in track_keys:
# Parse Daily.co raw-tracks filename pattern
match = re.search(r"(\d+)-([0-9a-f-]{36})-(cam-audio)-(\d+)", key)
if not match:
self.logger.warning(
"Track key doesn't match Daily.co pattern, using 0.0 offset",
key=key,
)
offsets_seconds.append(0.0)
continue
rec_start_ts = int(match.group(1))
track_start_ts = int(match.group(4))
# Validate all tracks belong to same recording
if recording_start_ts is None:
recording_start_ts = rec_start_ts
elif rec_start_ts != recording_start_ts:
self.logger.error(
"Track belongs to different recording",
key=key,
expected_start=recording_start_ts,
got_start=rec_start_ts,
)
offsets_seconds.append(0.0)
continue
# Calculate offset in seconds
offset_ms = track_start_ts - rec_start_ts
offset_s = offset_ms / 1000.0
self.logger.info(
"Parsed track offset from filename",
key=key,
recording_start=rec_start_ts,
track_start=track_start_ts,
offset_seconds=offset_s,
)
offsets_seconds.append(max(0.0, offset_s))
# Mixdown all available tracks into transcript.audio_mp3_filename, preserving sample rate
try:
mp3_writer = AudioFileWriterProcessor(
path=str(transcript.audio_mp3_filename)
)
await self.mixdown_tracks(track_datas, mp3_writer, offsets_seconds)
await mp3_writer.flush()
except Exception as e:
self.logger.error("Mixdown failed", error=str(e))
speaker_transcripts: list[TranscriptType] = []
for idx, key in enumerate(track_keys):
ext = ".mp4"
try:
obj = s3.get_object(Bucket=bucket_name, Key=key)
data = obj["Body"].read()
except Exception as e:
self.logger.error(
"Skipping track - cannot read S3 object", key=key, error=str(e)
)
continue
storage_path = f"file_pipeline/{transcript.id}/tracks/track_{idx}{ext}"
try:
await storage.put_file(storage_path, data)
audio_url = await storage.get_file_url(storage_path)
except Exception as e:
self.logger.error(
"Skipping track - cannot upload to storage", key=key, error=str(e)
)
continue
try:
t = await self.transcribe_file(audio_url, transcript.source_language)
except Exception as e:
self.logger.error(
"Transcription via default backend failed, trying local whisper",
key=key,
url=audio_url,
error=str(e),
)
try:
fallback = FileTranscriptAutoProcessor(name="whisper")
result = None
async def capture_result(r):
nonlocal result
result = r
fallback.on(capture_result)
await fallback.push(
FileTranscriptInput(
audio_url=audio_url, language=transcript.source_language
)
)
await fallback.flush()
if not result:
raise Exception("No transcript captured in fallback")
t = result
except Exception as e2:
self.logger.error(
"Skipping track - transcription failed after fallback",
key=key,
url=audio_url,
error=str(e2),
)
continue
if not t.words:
continue
# Shift word timestamps by the track's offset so all are relative to 00:00
track_offset = offsets_seconds[idx] if idx < len(offsets_seconds) else 0.0
for w in t.words:
try:
if hasattr(w, "start") and w.start is not None:
w.start = float(w.start) + track_offset
if hasattr(w, "end") and w.end is not None:
w.end = float(w.end) + track_offset
except Exception:
pass
w.speaker = idx
speaker_transcripts.append(t)
if not speaker_transcripts:
raise Exception("No valid track transcriptions")
merged_words = []
for t in speaker_transcripts:
merged_words.extend(t.words)
merged_words.sort(key=lambda w: w.start)
merged_transcript = TranscriptType(words=merged_words, translation=None)
await transcripts_controller.append_event(
transcript,
event="TRANSCRIPT",
data=TranscriptText(
text=merged_transcript.text, translation=merged_transcript.translation
),
)
topics = await self.detect_topics(merged_transcript, transcript.target_language)
await asyncio.gather(
self.generate_title(topics),
self.generate_summaries(topics),
return_exceptions=False,
)
await self.set_status(transcript.id, "ended")
async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType:
processor = FileTranscriptAutoProcessor()
input_data = FileTranscriptInput(audio_url=audio_url, language=language)
result: TranscriptType | None = None
async def capture_result(transcript):
nonlocal result
result = transcript
processor.on(capture_result)
await processor.push(input_data)
await processor.flush()
if not result:
raise ValueError("No transcript captured")
return result
async def detect_topics(
self, transcript: TranscriptType, target_language: str
) -> list[TitleSummary]:
chunk_size = 300
topics: list[TitleSummary] = []
async def on_topic(topic: TitleSummary):
topics.append(topic)
return await self.on_topic(topic)
topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic)
topic_detector.set_pipeline(self.empty_pipeline)
for i in range(0, len(transcript.words), chunk_size):
chunk_words = transcript.words[i : i + chunk_size]
if not chunk_words:
continue
chunk_transcript = TranscriptType(
words=chunk_words, translation=transcript.translation
)
await topic_detector.push(chunk_transcript)
await topic_detector.flush()
return topics
async def generate_title(self, topics: list[TitleSummary]):
if not topics:
self.logger.warning("No topics for title generation")
return
processor = TranscriptFinalTitleProcessor(callback=self.on_title)
processor.set_pipeline(self.empty_pipeline)
for topic in topics:
await processor.push(topic)
await processor.flush()
async def generate_summaries(self, topics: list[TitleSummary]):
if not topics:
self.logger.warning("No topics for summary generation")
return
transcript = await self.get_transcript()
processor = TranscriptFinalSummaryProcessor(
transcript=transcript,
callback=self.on_long_summary,
on_short_summary=self.on_short_summary,
)
processor.set_pipeline(self.empty_pipeline)
for topic in topics:
await processor.push(topic)
await processor.flush()
@shared_task
@asynctask
async def task_pipeline_multitrack_process(
*, transcript_id: str, bucket_name: str, track_keys: list[str]
):
pipeline = PipelineMainMultitrack(transcript_id=transcript_id)
try:
await pipeline.set_status(transcript_id, "processing")
await pipeline.process(bucket_name, track_keys)
except Exception:
await pipeline.set_status(transcript_id, "error")
raise
post_chain = chain(
task_cleanup_consent.si(transcript_id=transcript_id),
task_pipeline_post_to_zulip.si(transcript_id=transcript_id),
task_send_webhook_if_needed.si(transcript_id=transcript_id),
)
post_chain.delay()

View File

@@ -12,6 +12,7 @@ from reflector.asynctask import asynctask
from reflector.db.transcripts import (
TranscriptStatus,
TranscriptText,
TranscriptWaveform,
transcripts_controller,
)
from reflector.logger import logger
@@ -27,6 +28,7 @@ from reflector.processors import (
TranscriptFinalTitleProcessor,
TranscriptTopicDetectorProcessor,
)
from reflector.processors.audio_waveform_processor import AudioWaveformProcessor
from reflector.processors.file_transcript import FileTranscriptInput
from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
from reflector.processors.types import TitleSummary
@@ -56,6 +58,145 @@ class PipelineMainMultitrack(PipelineMainBase):
self.logger = logger.bind(transcript_id=self.transcript_id)
self.empty_pipeline = EmptyPipeline(logger=self.logger)
async def pad_track_for_transcription(
self,
track_data: bytes,
track_idx: int,
storage,
) -> tuple[bytes, str]:
"""
Pad a single track with silence based on stream metadata start_time.
This ensures Whisper timestamps will be relative to recording start.
Uses ffmpeg subprocess approach proven to work with python-raw-tracks-align.
Returns: (padded_data, storage_url)
"""
import json
import math
import subprocess
import tempfile
if not track_data:
return b"", ""
transcript = await self.get_transcript()
# Create temp files for ffmpeg processing
with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as input_file:
input_file.write(track_data)
input_file_path = input_file.name
output_file_path = input_file_path.replace(".webm", "_padded.webm")
try:
# Get stream metadata using ffprobe
ffprobe_cmd = [
"ffprobe",
"-v",
"error",
"-show_entries",
"stream=start_time",
"-of",
"json",
input_file_path,
]
result = subprocess.run(
ffprobe_cmd, capture_output=True, text=True, check=True
)
metadata = json.loads(result.stdout)
# Extract start_time from stream metadata
start_time_seconds = 0.0
if metadata.get("streams") and len(metadata["streams"]) > 0:
start_time_str = metadata["streams"][0].get("start_time", "0")
start_time_seconds = float(start_time_str)
self.logger.info(
f"Track {track_idx} stream metadata: start_time={start_time_seconds:.3f}s",
track_idx=track_idx,
)
# If no padding needed, use original
if start_time_seconds <= 0:
storage_path = f"file_pipeline/{transcript.id}/tracks/original_track_{track_idx}.webm"
await storage.put_file(storage_path, track_data)
url = await storage.get_file_url(storage_path)
return track_data, url
# Calculate delay in milliseconds
delay_ms = math.floor(start_time_seconds * 1000)
# Run ffmpeg to pad the audio while maintaining WebM/Opus format for Modal compatibility
# ffmpeg quirk: aresample needs to come before adelay in the filter chain
ffmpeg_cmd = [
"ffmpeg",
"-hide_banner",
"-loglevel",
"error",
"-y", # overwrite output
"-i",
input_file_path,
"-af",
f"aresample=async=1,adelay={delay_ms}:all=true",
"-c:a",
"libopus", # Keep Opus codec for Modal compatibility
"-b:a",
"128k", # Standard bitrate for Opus
output_file_path,
]
self.logger.info(
f"Padding track {track_idx} with {delay_ms}ms delay using ffmpeg",
track_idx=track_idx,
delay_ms=delay_ms,
command=" ".join(ffmpeg_cmd),
)
result = subprocess.run(ffmpeg_cmd, capture_output=True, text=True)
if result.returncode != 0:
self.logger.error(
f"ffmpeg padding failed for track {track_idx}",
track_idx=track_idx,
stderr=result.stderr,
returncode=result.returncode,
)
raise Exception(f"ffmpeg padding failed: {result.stderr}")
# Read the padded output
with open(output_file_path, "rb") as f:
padded_data = f.read()
# Store padded track
storage_path = (
f"file_pipeline/{transcript.id}/tracks/padded_track_{track_idx}.webm"
)
await storage.put_file(storage_path, padded_data)
padded_url = await storage.get_file_url(storage_path)
self.logger.info(
f"Successfully padded track {track_idx} with {start_time_seconds:.3f}s offset, stored at {storage_path}",
track_idx=track_idx,
delay_ms=delay_ms,
padded_url=padded_url,
padded_size=len(padded_data),
)
return padded_data, padded_url
finally:
# Clean up temp files regardless of success
import os
for temp_path in (input_file_path, output_file_path):
try:
os.unlink(temp_path)
except OSError:
pass
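The ffmpeg invocation above prepends silence while keeping the WebM/Opus encoding intact. A minimal sketch of the same argv construction (paths and function name are hypothetical, not part of the pipeline):

```python
# Sketch: build the ffmpeg argv used to pad a WebM/Opus track with leading
# silence. Note the quirk from the comment above: "aresample" must precede
# "adelay" in the filter chain.
import math

def build_pad_cmd(input_path: str, output_path: str, start_time_seconds: float) -> list[str]:
    delay_ms = math.floor(start_time_seconds * 1000)
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
        "-i", input_path,
        "-af", f"aresample=async=1,adelay={delay_ms}:all=true",
        "-c:a", "libopus",  # keep Opus so downstream consumers accept the file
        "-b:a", "128k",
        output_path,
    ]
```

A 2.5 s stream start time becomes a 2500 ms `adelay`, so the padded file's timeline matches the recording's.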
async def mixdown_tracks(
self,
track_datas: list[bytes],
@@ -228,6 +369,14 @@ class PipelineMainMultitrack(PipelineMainBase):
async with self.lock_transaction():
return await transcripts_controller.set_status(transcript_id, status)
async def on_waveform(self, data):
async with self.transaction():
waveform = TranscriptWaveform(waveform=data)
transcript = await self.get_transcript()
return await transcripts_controller.append_event(
transcript=transcript, event="WAVEFORM", data=waveform
)
async def process(self, bucket_name: str, track_keys: list[str]):
transcript = await self.get_transcript()
@@ -252,64 +401,90 @@ class PipelineMainMultitrack(PipelineMainBase):
)
track_datas.append(b"")
# Estimate offsets from first frame PTS, aligned to track_keys
offsets_seconds: list[float] = []
for data, key in zip(track_datas, track_keys):
off_s = 0.0
if data:
try:
c = av.open(io.BytesIO(data))
try:
for frame in c.decode(audio=0):
if frame.pts is not None and frame.time_base:
off_s = float(frame.pts * frame.time_base)
break
finally:
c.close()
except Exception:
pass
offsets_seconds.append(max(0.0, float(off_s)))
# PAD TRACKS FIRST - this creates full-length tracks with correct timeline
padded_track_datas: list[bytes] = []
padded_track_urls: list[str] = []
for idx, data in enumerate(track_datas):
if not data:
padded_track_datas.append(b"")
padded_track_urls.append("")
continue
# Mixdown all available tracks into transcript.audio_mp3_filename, preserving sample rate
padded_data, padded_url = await self.pad_track_for_transcription(
data, idx, storage
)
padded_track_datas.append(padded_data)
padded_track_urls.append(padded_url)
self.logger.info(f"Padded track {idx} for transcription: {padded_url}")
# Mixdown PADDED tracks (already aligned with timeline) into transcript.audio_mp3_filename
try:
# Ensure data directory exists
transcript.data_path.mkdir(parents=True, exist_ok=True)
mp3_writer = AudioFileWriterProcessor(
path=str(transcript.audio_mp3_filename)
)
await self.mixdown_tracks(track_datas, mp3_writer, offsets_seconds)
# Use PADDED tracks with NO additional offsets (already aligned by padding)
await self.mixdown_tracks(
padded_track_datas, mp3_writer, offsets_seconds=None
)
await mp3_writer.flush()
# Upload the mixed audio to S3 for web playback
if transcript.audio_mp3_filename.exists():
mp3_data = transcript.audio_mp3_filename.read_bytes()
storage_path = f"{transcript.id}/audio.mp3"
await storage.put_file(storage_path, mp3_data)
mp3_url = await storage.get_file_url(storage_path)
# Update transcript to indicate audio is in storage
await transcripts_controller.update(
transcript, {"audio_location": "storage"}
)
self.logger.info(
f"Uploaded mixed audio to storage",
storage_path=storage_path,
size=len(mp3_data),
url=mp3_url,
)
else:
self.logger.warning("Mixdown file does not exist after processing")
except Exception as e:
self.logger.error("Mixdown failed", error=str(e))
self.logger.error("Mixdown failed", error=str(e), exc_info=True)
# Generate waveform from the mixed audio file
if transcript.audio_mp3_filename.exists():
try:
self.logger.info("Generating waveform from mixed audio")
waveform_processor = AudioWaveformProcessor(
audio_path=transcript.audio_mp3_filename,
waveform_path=transcript.audio_waveform_filename,
on_waveform=self.on_waveform,
)
waveform_processor.set_pipeline(self.empty_pipeline)
await waveform_processor.flush()
self.logger.info("Waveform generated successfully")
except Exception as e:
self.logger.error(
"Waveform generation failed", error=str(e), exc_info=True
)
# Transcribe padded tracks; timestamps are already aligned to recording start
speaker_transcripts: list[TranscriptType] = []
for idx, key in enumerate(track_keys):
ext = ".mp4"
try:
obj = s3.get_object(Bucket=bucket_name, Key=key)
data = obj["Body"].read()
except Exception as e:
self.logger.error(
"Skipping track - cannot read S3 object", key=key, error=str(e)
)
continue
storage_path = f"file_pipeline/{transcript.id}/tracks/track_{idx}{ext}"
try:
await storage.put_file(storage_path, data)
audio_url = await storage.get_file_url(storage_path)
except Exception as e:
self.logger.error(
"Skipping track - cannot upload to storage", key=key, error=str(e)
)
for idx, padded_url in enumerate(padded_track_urls):
if not padded_url:
continue
try:
t = await self.transcribe_file(audio_url, transcript.source_language)
# Transcribe the PADDED track
t = await self.transcribe_file(padded_url, transcript.source_language)
except Exception as e:
self.logger.error(
"Transcription via default backend failed, trying local whisper",
key=key,
url=audio_url,
track_idx=idx,
url=padded_url,
error=str(e),
)
try:
@@ -323,7 +498,7 @@ class PipelineMainMultitrack(PipelineMainBase):
fallback.on(capture_result)
await fallback.push(
FileTranscriptInput(
audio_url=audio_url, language=transcript.source_language
audio_url=padded_url, language=transcript.source_language
)
)
await fallback.flush()
@@ -333,34 +508,37 @@ class PipelineMainMultitrack(PipelineMainBase):
except Exception as e2:
self.logger.error(
"Skipping track - transcription failed after fallback",
key=key,
url=audio_url,
track_idx=idx,
url=padded_url,
error=str(e2),
)
continue
if not t.words:
continue
# Shift word timestamps by the track's offset so all are relative to 00:00
track_offset = offsets_seconds[idx] if idx < len(offsets_seconds) else 0.0
# No offset adjustment needed: timestamps are already correct
# because the padded tracks were transcribed; just set the speaker ID
for w in t.words:
try:
if hasattr(w, "start") and w.start is not None:
w.start = float(w.start) + track_offset
if hasattr(w, "end") and w.end is not None:
w.end = float(w.end) + track_offset
except Exception:
pass
w.speaker = idx
speaker_transcripts.append(t)
self.logger.info(
f"Track {idx} transcribed successfully with {len(t.words)} words",
track_idx=idx,
)
if not speaker_transcripts:
raise Exception("No valid track transcriptions")
# Merge all words and sort by timestamp
merged_words = []
for t in speaker_transcripts:
merged_words.extend(t.words)
merged_words.sort(key=lambda w: w.start)
merged_words.sort(
key=lambda w: w.start if hasattr(w, "start") and w.start is not None else 0
)
merged_transcript = TranscriptType(words=merged_words, translation=None)

View File

@@ -0,0 +1,629 @@
import asyncio
import io
from fractions import Fraction
import av
import boto3
import structlog
from av.audio.resampler import AudioResampler
from celery import chain, shared_task
from reflector.asynctask import asynctask
from reflector.db.transcripts import (
TranscriptStatus,
TranscriptText,
transcripts_controller,
)
from reflector.logger import logger
from reflector.pipelines.main_file_pipeline import task_send_webhook_if_needed
from reflector.pipelines.main_live_pipeline import (
PipelineMainBase,
task_cleanup_consent,
task_pipeline_post_to_zulip,
)
from reflector.processors import (
AudioFileWriterProcessor,
TranscriptFinalSummaryProcessor,
TranscriptFinalTitleProcessor,
TranscriptTopicDetectorProcessor,
)
from reflector.processors.file_transcript import FileTranscriptInput
from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor
from reflector.processors.types import TitleSummary
from reflector.processors.types import (
Transcript as TranscriptType,
)
from reflector.settings import settings
from reflector.storage import get_transcripts_storage
class EmptyPipeline:
def __init__(self, logger: structlog.BoundLogger):
self.logger = logger
def get_pref(self, k, d=None):
return d
async def emit(self, event):
pass
class PipelineMainMultitrack(PipelineMainBase):
"""Process multiple participant tracks for a transcript without mixing audio."""
def __init__(self, transcript_id: str):
super().__init__(transcript_id=transcript_id)
self.logger = logger.bind(transcript_id=self.transcript_id)
self.empty_pipeline = EmptyPipeline(logger=self.logger)
async def pad_track_for_transcription(
self,
track_data: bytes,
track_idx: int,
storage,
) -> tuple[bytes, str]:
"""
Pad a single track with silence based on stream metadata start_time.
This ensures Whisper timestamps will be relative to recording start.
Returns: (padded_data, storage_url)
"""
if not track_data:
return b"", ""
transcript = await self.get_transcript()
# Get stream metadata start_time using PyAV
container = av.open(io.BytesIO(track_data))
try:
audio_stream = container.streams.audio[0]
# Extract start_time from stream metadata
if (
audio_stream.start_time is not None
and audio_stream.time_base is not None
):
start_time_seconds = float(
audio_stream.start_time * audio_stream.time_base
)
else:
start_time_seconds = 0.0
sample_rate = audio_stream.sample_rate
codec_name = audio_stream.codec.name
finally:
container.close()
self.logger.info(
f"Track {track_idx} stream metadata: start_time={start_time_seconds:.3f}s, sample_rate={sample_rate}",
track_idx=track_idx,
)
# If no padding needed, use original
if start_time_seconds <= 0:
storage_path = (
f"file_pipeline/{transcript.id}/tracks/original_track_{track_idx}.webm"
)
await storage.put_file(storage_path, track_data)
url = await storage.get_file_url(storage_path)
return track_data, url
# Create PyAV filter graph for padding
graph = av.filter.Graph()
# Input buffer
in_args = (
f"time_base=1/{sample_rate}:"
f"sample_rate={sample_rate}:"
f"sample_fmt=s16:"
f"channel_layout=stereo"
)
input_buffer = graph.add("abuffer", args=in_args, name="in")
# Add delay filter for padding
delay_ms = int(start_time_seconds * 1000)
delay_filter = graph.add(
"adelay", args=f"delays={delay_ms}|{delay_ms}:all=1", name="delay"
)
# Output sink
sink = graph.add("abuffersink", name="out")
# Link filters
input_buffer.link_to(delay_filter)
delay_filter.link_to(sink)
graph.configure()
# Process audio through filter
output_bytes = io.BytesIO()
output_container = av.open(output_bytes, "w", format="webm")
output_stream = output_container.add_stream("libopus", rate=sample_rate)
output_stream.channels = 2
# Reopen input for processing
input_container = av.open(io.BytesIO(track_data))
resampler = AudioResampler(format="s16", layout="stereo", rate=sample_rate)
try:
# Process frames
for frame in input_container.decode(audio=0):
# Resample to match filter requirements
resampled_frames = resampler.resample(frame)
for resampled_frame in resampled_frames:
resampled_frame.pts = frame.pts
resampled_frame.time_base = Fraction(1, sample_rate)
input_buffer.push(resampled_frame)
# Pull from filter and encode
while True:
try:
out_frame = sink.pull()
out_frame.pts = out_frame.pts if out_frame.pts is not None else 0
out_frame.time_base = Fraction(1, sample_rate)
for packet in output_stream.encode(out_frame):
output_container.mux(packet)
except av.BlockingIOError:
break
# Flush
input_buffer.push(None)
while True:
try:
out_frame = sink.pull()
for packet in output_stream.encode(out_frame):
output_container.mux(packet)
except (av.BlockingIOError, av.EOFError):
break
# Flush encoder
for packet in output_stream.encode(None):
output_container.mux(packet)
finally:
input_container.close()
output_container.close()
padded_data = output_bytes.getvalue()
# Store padded track
storage_path = (
f"file_pipeline/{transcript.id}/tracks/padded_track_{track_idx}.webm"
)
await storage.put_file(storage_path, padded_data)
padded_url = await storage.get_file_url(storage_path)
self.logger.info(
f"Padded track {track_idx} with {start_time_seconds:.3f}s offset, stored at {storage_path}",
track_idx=track_idx,
delay_ms=delay_ms,
padded_url=padded_url,
)
return padded_data, padded_url
async def mixdown_tracks(
self,
track_datas: list[bytes],
writer: AudioFileWriterProcessor,
offsets_seconds: list[float] | None = None,
) -> None:
"""
Minimal multi-track mixdown using a PyAV filter graph (amix), no resampling.
"""
# Discover target sample rate from first decodable frame
target_sample_rate: int | None = None
for data in track_datas:
if not data:
continue
try:
container = av.open(io.BytesIO(data))
try:
for frame in container.decode(audio=0):
target_sample_rate = frame.sample_rate
break
finally:
container.close()
except Exception:
continue
if target_sample_rate:
break
if not target_sample_rate:
self.logger.warning("Mixdown skipped - no decodable audio frames found")
return
# Build PyAV filter graph:
# N abuffer (s32/stereo)
# -> optional adelay per input (for alignment)
# -> amix (s32)
# -> aformat(s32)
# -> sink
graph = av.filter.Graph()
inputs = []
valid_track_datas = [d for d in track_datas if d]
# Align offsets list with the filtered inputs (skip empties)
input_offsets_seconds = None
if offsets_seconds is not None:
input_offsets_seconds = [
offsets_seconds[i] for i, d in enumerate(track_datas) if d
]
for idx, data in enumerate(valid_track_datas):
args = (
f"time_base=1/{target_sample_rate}:"
f"sample_rate={target_sample_rate}:"
f"sample_fmt=s32:"
f"channel_layout=stereo"
)
in_ctx = graph.add("abuffer", args=args, name=f"in{idx}")
inputs.append(in_ctx)
if not inputs:
self.logger.warning("Mixdown skipped - no valid inputs for graph")
return
mixer = graph.add("amix", args=f"inputs={len(inputs)}:normalize=0", name="mix")
fmt = graph.add(
"aformat",
args=(
f"sample_fmts=s32:channel_layouts=stereo:sample_rates={target_sample_rate}"
),
name="fmt",
)
sink = graph.add("abuffersink", name="out")
# Optional per-input delay before mixing
delays_ms: list[int] = []
if input_offsets_seconds is not None:
base = min(input_offsets_seconds) if input_offsets_seconds else 0.0
delays_ms = [
max(0, int(round((o - base) * 1000))) for o in input_offsets_seconds
]
else:
delays_ms = [0 for _ in inputs]
for idx, in_ctx in enumerate(inputs):
delay_ms = delays_ms[idx] if idx < len(delays_ms) else 0
if delay_ms > 0:
# adelay requires one value per channel; use same for stereo
adelay = graph.add(
"adelay",
args=f"delays={delay_ms}|{delay_ms}:all=1",
name=f"delay{idx}",
)
in_ctx.link_to(adelay)
adelay.link_to(mixer, 0, idx)
else:
in_ctx.link_to(mixer, 0, idx)
mixer.link_to(fmt)
fmt.link_to(sink)
graph.configure()
# Open containers for decoding
containers = []
for i, d in enumerate(valid_track_datas):
try:
c = av.open(io.BytesIO(d))
containers.append(c)
except Exception as e:
self.logger.warning(
"Mixdown: failed to open container", input=i, error=str(e)
)
containers.append(None)
# Filter out Nones for decoders
containers = [c for c in containers if c is not None]
decoders = [c.decode(audio=0) for c in containers]
active = [True] * len(decoders)
# Per-input resamplers to enforce s32/stereo at the same rate (no resample of rate)
resamplers = [
AudioResampler(format="s32", layout="stereo", rate=target_sample_rate)
for _ in decoders
]
try:
# Round-robin feed frames into graph, pull mixed frames as they become available
while any(active):
for i, (dec, is_active) in enumerate(zip(decoders, active)):
if not is_active:
continue
try:
frame = next(dec)
except StopIteration:
active[i] = False
continue
# Enforce same sample rate; convert format/layout to s32/stereo (no rate resample)
if frame.sample_rate != target_sample_rate:
# Skip frames with differing rate
continue
out_frames = resamplers[i].resample(frame) or []
for rf in out_frames:
rf.sample_rate = target_sample_rate
rf.time_base = Fraction(1, target_sample_rate)
inputs[i].push(rf)
# Drain available mixed frames
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
await writer.push(mixed)
# Signal EOF to inputs and drain remaining
for in_ctx in inputs:
in_ctx.push(None)
while True:
try:
mixed = sink.pull()
except Exception:
break
mixed.sample_rate = target_sample_rate
mixed.time_base = Fraction(1, target_sample_rate)
await writer.push(mixed)
finally:
for c in containers:
c.close()
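Before mixing, each input is delayed relative to the earliest track offset so late joiners line up on a shared timeline. The arithmetic behind the per-input `adelay` values can be sketched in isolation (function name is hypothetical):

```python
def relative_delays_ms(offsets_seconds: list[float]) -> list[int]:
    """Delay each input relative to the earliest offset, clamped at zero,
    mirroring the per-input adelay values computed before amix."""
    if not offsets_seconds:
        return []
    base = min(offsets_seconds)
    return [max(0, int(round((o - base) * 1000))) for o in offsets_seconds]
```

For example, tracks starting at 2.0 s and 51.5 s yield delays of 0 ms and 49500 ms, so the second participant's audio is placed 49.5 s into the mix.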
async def set_status(self, transcript_id: str, status: TranscriptStatus):
async with self.lock_transaction():
return await transcripts_controller.set_status(transcript_id, status)
async def process(self, bucket_name: str, track_keys: list[str]):
transcript = await self.get_transcript()
s3 = boto3.client(
"s3",
region_name=settings.RECORDING_STORAGE_AWS_REGION,
aws_access_key_id=settings.RECORDING_STORAGE_AWS_ACCESS_KEY_ID,
aws_secret_access_key=settings.RECORDING_STORAGE_AWS_SECRET_ACCESS_KEY,
)
storage = get_transcripts_storage()
# Pre-download bytes for all tracks for mixing and transcription
track_datas: list[bytes] = []
for key in track_keys:
try:
obj = s3.get_object(Bucket=bucket_name, Key=key)
track_datas.append(obj["Body"].read())
except Exception as e:
self.logger.warning(
"Skipping track - cannot read S3 object", key=key, error=str(e)
)
track_datas.append(b"")
# Offsets come from stream metadata start_time, not filename timestamps
# Get stream metadata start_times for mixing (still useful for mixdown)
stream_start_times: list[float] = []
for data in track_datas:
if not data:
stream_start_times.append(0.0)
continue
container = av.open(io.BytesIO(data))
try:
audio_stream = container.streams.audio[0]
if (
audio_stream.start_time is not None
and audio_stream.time_base is not None
):
start_time = float(audio_stream.start_time * audio_stream.time_base)
else:
start_time = 0.0
stream_start_times.append(start_time)
finally:
container.close()
# Mixdown all available tracks into transcript.audio_mp3_filename, using stream metadata offsets
try:
mp3_writer = AudioFileWriterProcessor(
path=str(transcript.audio_mp3_filename)
)
await self.mixdown_tracks(track_datas, mp3_writer, stream_start_times)
await mp3_writer.flush()
except Exception as e:
self.logger.error("Mixdown failed", error=str(e))
# Pad tracks before transcription so Whisper timestamps are relative to recording start
padded_track_urls: list[str] = []
for idx, data in enumerate(track_datas):
if not data:
padded_track_urls.append("")
continue
_, padded_url = await self.pad_track_for_transcription(data, idx, storage)
padded_track_urls.append(padded_url)
self.logger.info(f"Padded track {idx} for transcription: {padded_url}")
# Transcribe padded tracks; timestamps are already aligned to recording start
speaker_transcripts: list[TranscriptType] = []
for idx, padded_url in enumerate(padded_track_urls):
if not padded_url:
continue
try:
# Transcribe the PADDED track
t = await self.transcribe_file(padded_url, transcript.source_language)
except Exception as e:
self.logger.error(
"Transcription via default backend failed, trying local whisper",
track_idx=idx,
url=padded_url,
error=str(e),
)
try:
fallback = FileTranscriptAutoProcessor(name="whisper")
result = None
async def capture_result(r):
nonlocal result
result = r
fallback.on(capture_result)
await fallback.push(
FileTranscriptInput(
audio_url=padded_url, language=transcript.source_language
)
)
await fallback.flush()
if not result:
raise Exception("No transcript captured in fallback")
t = result
except Exception as e2:
self.logger.error(
"Skipping track - transcription failed after fallback",
track_idx=idx,
url=padded_url,
error=str(e2),
)
continue
if not t.words:
continue
# No offset adjustment needed: timestamps are already correct
# because the padded tracks were transcribed; just set the speaker ID
for w in t.words:
w.speaker = idx
speaker_transcripts.append(t)
self.logger.info(
f"Track {idx} transcribed successfully with {len(t.words)} words",
track_idx=idx,
)
if not speaker_transcripts:
raise Exception("No valid track transcriptions")
# Merge all words and sort by timestamp
merged_words = []
for t in speaker_transcripts:
merged_words.extend(t.words)
merged_words.sort(
key=lambda w: w.start if hasattr(w, "start") and w.start is not None else 0
)
merged_transcript = TranscriptType(words=merged_words, translation=None)
await transcripts_controller.append_event(
transcript,
event="TRANSCRIPT",
data=TranscriptText(
text=merged_transcript.text, translation=merged_transcript.translation
),
)
topics = await self.detect_topics(merged_transcript, transcript.target_language)
await asyncio.gather(
self.generate_title(topics),
self.generate_summaries(topics),
return_exceptions=False,
)
await self.set_status(transcript.id, "ended")
async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType:
processor = FileTranscriptAutoProcessor()
input_data = FileTranscriptInput(audio_url=audio_url, language=language)
result: TranscriptType | None = None
async def capture_result(transcript):
nonlocal result
result = transcript
processor.on(capture_result)
await processor.push(input_data)
await processor.flush()
if not result:
raise ValueError("No transcript captured")
return result
async def detect_topics(
self, transcript: TranscriptType, target_language: str
) -> list[TitleSummary]:
chunk_size = 300
topics: list[TitleSummary] = []
async def on_topic(topic: TitleSummary):
topics.append(topic)
return await self.on_topic(topic)
topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic)
topic_detector.set_pipeline(self.empty_pipeline)
for i in range(0, len(transcript.words), chunk_size):
chunk_words = transcript.words[i : i + chunk_size]
if not chunk_words:
continue
chunk_transcript = TranscriptType(
words=chunk_words, translation=transcript.translation
)
await topic_detector.push(chunk_transcript)
await topic_detector.flush()
return topics
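`detect_topics` feeds the transcript to the topic detector in fixed windows of 300 words. The windowing itself is just slicing, sketched here standalone (function name hypothetical):

```python
def chunk_words(words: list[str], chunk_size: int = 300) -> list[list[str]]:
    """Split a word list into fixed-size windows, as detect_topics does
    before pushing each chunk to the topic detector. The final chunk may
    be shorter than chunk_size."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
```

Each chunk is wrapped back into a `Transcript` and pushed to the detector, so topic boundaries can only fall on 300-word window edges.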
async def generate_title(self, topics: list[TitleSummary]):
if not topics:
self.logger.warning("No topics for title generation")
return
processor = TranscriptFinalTitleProcessor(callback=self.on_title)
processor.set_pipeline(self.empty_pipeline)
for topic in topics:
await processor.push(topic)
await processor.flush()
async def generate_summaries(self, topics: list[TitleSummary]):
if not topics:
self.logger.warning("No topics for summary generation")
return
transcript = await self.get_transcript()
processor = TranscriptFinalSummaryProcessor(
transcript=transcript,
callback=self.on_long_summary,
on_short_summary=self.on_short_summary,
)
processor.set_pipeline(self.empty_pipeline)
for topic in topics:
await processor.push(topic)
await processor.flush()
@shared_task
@asynctask
async def task_pipeline_multitrack_process(
*, transcript_id: str, bucket_name: str, track_keys: list[str]
):
pipeline = PipelineMainMultitrack(transcript_id=transcript_id)
try:
await pipeline.set_status(transcript_id, "processing")
await pipeline.process(bucket_name, track_keys)
except Exception:
await pipeline.set_status(transcript_id, "error")
raise
post_chain = chain(
task_cleanup_consent.si(transcript_id=transcript_id),
task_pipeline_post_to_zulip.si(transcript_id=transcript_id),
task_send_webhook_if_needed.si(transcript_id=transcript_id),
)
post_chain.delay()

View File

@@ -1,6 +1,6 @@
from textwrap import dedent
from pydantic import BaseModel, Field
from pydantic import BaseModel, ConfigDict, Field
from reflector.llm import LLM
from reflector.processors.base import Processor
@@ -34,8 +34,14 @@ TOPIC_PROMPT = dedent(
class TopicResponse(BaseModel):
"""Structured response for topic detection"""
title: str = Field(description="A descriptive title for the topic being discussed")
summary: str = Field(description="A concise 1-2 sentence summary of the discussion")
model_config = ConfigDict(populate_by_name=True)
title: str = Field(
description="A descriptive title for the topic being discussed", alias="Title"
)
summary: str = Field(
description="A concise 1-2 sentence summary of the discussion", alias="Summary"
)
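The `ConfigDict(populate_by_name=True)` plus field aliases lets the model validate both LLM-style capitalized keys ("Title"/"Summary") and the snake_case field names. A minimal sketch, assuming Pydantic v2 (the class name here is hypothetical):

```python
from pydantic import BaseModel, ConfigDict, Field

class TopicResponseSketch(BaseModel):
    # populate_by_name allows construction via field names
    # even though aliases are declared
    model_config = ConfigDict(populate_by_name=True)
    title: str = Field(alias="Title")
    summary: str = Field(alias="Summary")

# Both alias-keyed and field-name-keyed payloads validate:
by_alias = TopicResponseSketch.model_validate({"Title": "t", "Summary": "s"})
by_name = TopicResponseSketch(title="t", summary="s")
```

Without `populate_by_name=True`, only the aliased keys would be accepted, which is brittle when an LLM returns lowercase field names.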
class TranscriptTopicDetectorProcessor(Processor):

View File

@@ -0,0 +1,65 @@
#!/usr/bin/env python
"""
Reprocess the Daily.co multitrack recording to fix audio mixdown
"""
import asyncio
from reflector.pipelines.main_multitrack_pipeline import (
task_pipeline_multitrack_process,
)
async def reprocess():
"""Process the multitrack recording with fixed mixdown"""
bucket_name = "reflector-dailyco-local"
track_keys = [
"monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
"monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
]
# Create a new transcript with fixed mixdown
import uuid
from reflector.db import get_database
from reflector.db.transcripts import Transcript, transcripts
db = get_database()
await db.connect()
try:
transcript_id = str(uuid.uuid4())
transcript = Transcript(
id=transcript_id,
name="Daily Multitrack - With Audio Mixdown",
source_kind="file",
source_language="en",
target_language="en",
status="idle",
events=[],
title="",
)
query = transcripts.insert().values(**transcript.model_dump())
await db.execute(query)
print(f"Created transcript: {transcript_id}")
# Process with the fixed pipeline
await task_pipeline_multitrack_process(
transcript_id=transcript_id, bucket_name=bucket_name, track_keys=track_keys
)
print(
f"Processing complete! Check: http://localhost:3000/transcripts/{transcript_id}"
)
return transcript_id
finally:
await db.disconnect()
if __name__ == "__main__":
transcript_id = asyncio.run(reprocess())
print(f"\n✅ Reprocessing complete!")
print(f"📍 View at: http://localhost:3000/transcripts/{transcript_id}")

View File

@@ -0,0 +1,124 @@
#!/usr/bin/env python
"""
Test script to trigger multitrack recording processing with ffmpeg padding fix
"""
import asyncio
from reflector.pipelines.main_multitrack_pipeline import PipelineMainMultitrack
async def test_processing():
"""Manually trigger multitrack processing for the test recording"""
# Initialize database connection
from reflector.db import get_database
db = get_database()
await db.connect()
try:
# The test recording with known speaker timeline
bucket_name = "monadical"
track_keys = [
"daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm",
"daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823.webm",
]
# Create a new transcript ID
import uuid
transcript_id = str(uuid.uuid4())
# Create transcript directly with SQL
from reflector.db.transcripts import (
Transcript,
transcripts,
transcripts_controller,
)
pipeline = PipelineMainMultitrack(transcript_id=transcript_id)
# Create transcript model
transcript = Transcript(
id=transcript_id,
name="FFMPEG Test - Daily Multitrack Recording",
source_kind="file",
source_language="en",
target_language="en",
status="idle",
events=[],
title="",
)
# Insert into database
query = transcripts.insert().values(**transcript.model_dump())
await db.execute(query)
print(f"Created transcript: {transcript_id}")
# Process the tracks using the pipeline
print(f"Processing multitrack recording with ffmpeg padding...")
print(f"Track 0: ...935922.webm (expected to start at ~2s)")
print(f"Track 1: ...943823.webm (expected to start at ~51s)")
try:
await pipeline.set_status(transcript_id, "processing")
await pipeline.process(bucket_name, track_keys)
print(f"Processing complete!")
except Exception as e:
await pipeline.set_status(transcript_id, "error")
print(f"Error during processing: {e}")
import traceback
traceback.print_exc()
raise
# Check the results
final_transcript = await transcripts_controller.get(transcript_id)
print(f"\nTranscript status: {final_transcript.status}")
print(f"Transcript title: {final_transcript.title}")
# Extract timeline from events
if final_transcript.events:
for event in final_transcript.events:
if event.get("event") == "TRANSCRIPT":
text = event.get("data", {}).get("text", "")
# Show first 500 chars to check if speakers are properly separated
print(f"\nTranscript text (first 500 chars):")
print(text[:500])
# Show last 500 chars too to see if second speaker is at the end
print(f"\nTranscript text (last 500 chars):")
print(text[-500:])
# Count total words in the transcript
words = text.split()
print(f"\nTotal words in transcript: {len(words)}")
# Check if text has proper speaker separation
# Expected: First ~45% from speaker 0, then ~35% from speaker 1, then ~20% from speaker 0
first_third = " ".join(words[: len(words) // 3])
middle_third = " ".join(
words[len(words) // 3 : 2 * len(words) // 3]
)
last_third = " ".join(words[2 * len(words) // 3 :])
print(f"\nFirst third preview: {first_third[:100]}...")
print(f"Middle third preview: {middle_third[:100]}...")
print(f"Last third preview: {last_third[:100]}...")
break
return transcript_id
finally:
await db.disconnect()
if __name__ == "__main__":
transcript_id = asyncio.run(test_processing())
print(f"\n✅ Test complete! Transcript ID: {transcript_id}")
print(f"\nExpected timeline:")
print(f" Speaker 0: ~2s to ~49s (first participant speaks)")
print(f" Speaker 1: ~51s to ~70s (second participant speaks)")
print(f" Speaker 0: ~73s to end (first participant speaks again)")
print(
f"\nIf the text shows proper chronological order (not interleaved), the fix worked!"
)

View File

@@ -0,0 +1,162 @@
#!/usr/bin/env python
"""
Test script to trigger multitrack recording processing with ffmpeg padding fix
This version loads tracks from local filesystem instead of S3
"""
import asyncio
import os
from reflector.pipelines.main_multitrack_pipeline import PipelineMainMultitrack
async def test_processing():
"""Manually trigger multitrack processing for the test recording"""
# Initialize database connection
from reflector.db import get_database
db = get_database()
await db.connect()
try:
# Create a new transcript ID
import uuid
transcript_id = str(uuid.uuid4())
# Create transcript directly with SQL
from reflector.db.transcripts import (
Transcript,
transcripts,
transcripts_controller,
)
pipeline = PipelineMainMultitrack(transcript_id=transcript_id)
# Create transcript model
transcript = Transcript(
id=transcript_id,
name="FFMPEG Test - Daily Multitrack Recording",
source_kind="file",
source_language="en",
target_language="en",
status="idle",
events=[],
title="",
)
# Insert into database
query = transcripts.insert().values(**transcript.model_dump())
await db.execute(query)
print(f"Created transcript: {transcript_id}")
# Read track files from local filesystem (in the container they'll be at /app/)
tracks_dir = "/app"
track_files = [
"1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922.webm",
"1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823.webm",
]
    # Read track data
    track_datas = []
    for track_file in track_files:
        file_path = os.path.join(tracks_dir, track_file)
        if os.path.exists(file_path):
            with open(file_path, "rb") as f:
                track_datas.append(f.read())
            print(f"Loaded track: {track_file} ({len(track_datas[-1])} bytes)")
        else:
            print(f"Track file not found: {file_path}")
            track_datas.append(b"")

    # Process the tracks using the pipeline
    print("\nProcessing multitrack recording with ffmpeg padding...")
    print("Track 0: ...935922.webm (expected to start at ~2s)")
    print("Track 1: ...943823.webm (expected to start at ~51s)")

    # Call the process method directly with track data: save tracks to
    # temporary files, mock the S3 operations, and work with the data directly.
    try:
        await pipeline.set_status(transcript_id, "processing")

        # Create a mock bucket and keys setup
        bucket_name = "test-bucket"
        track_keys = ["track0.webm", "track1.webm"]

        # Mock S3 client to return our local data
        from unittest.mock import MagicMock, patch

        mock_s3 = MagicMock()

        def mock_get_object(Bucket, Key):
            idx = 0 if "track0" in Key else 1
            return {"Body": MagicMock(read=lambda: track_datas[idx])}

        mock_s3.get_object = mock_get_object

        # Patch boto3.client to return our mock
        with patch("boto3.client", return_value=mock_s3):
            await pipeline.process(bucket_name, track_keys)
        print("Processing complete!")
    except Exception as e:
        await pipeline.set_status(transcript_id, "error")
        print(f"Error during processing: {e}")
        import traceback

        traceback.print_exc()
        raise

    # Check the results
    final_transcript = await transcripts_controller.get(transcript_id)
    print(f"\nTranscript status: {final_transcript.status}")
    print(f"Transcript title: {final_transcript.title}")

    # Extract timeline from events
    if final_transcript.events:
        for event in final_transcript.events:
            if event.get("event") == "TRANSCRIPT":
                text = event.get("data", {}).get("text", "")

                # Show first 500 chars to check if speakers are properly separated
                print("\nTranscript text (first 500 chars):")
                print(text[:500])

                # Show last 500 chars too, to see if the second speaker is at the end
                print("\nTranscript text (last 500 chars):")
                print(text[-500:])

                # Count words per speaker
                words = text.split()
                print(f"\nTotal words in transcript: {len(words)}")

                # Check if the text has proper speaker separation.
                # Expected: first ~45% from speaker 0, then ~35% from speaker 1,
                # then ~20% from speaker 0.
                first_third = " ".join(words[: len(words) // 3])
                middle_third = " ".join(words[len(words) // 3 : 2 * len(words) // 3])
                last_third = " ".join(words[2 * len(words) // 3 :])
                print(f"\nFirst third preview: {first_third[:100]}...")
                print(f"Middle third preview: {middle_third[:100]}...")
                print(f"Last third preview: {last_third[:100]}...")
                break

    return transcript_id
finally:
    await db.disconnect()


if __name__ == "__main__":
    transcript_id = asyncio.run(test_processing())
    print(f"\n✅ Test complete! Transcript ID: {transcript_id}")
    print("\nExpected timeline:")
    print("  Speaker 0: ~2s to ~49s (first participant speaks)")
    print("  Speaker 1: ~51s to ~70s (second participant speaks)")
    print("  Speaker 0: ~73s to end (first participant speaks again)")
    print("\nIf the text shows proper chronological order (not interleaved), the fix worked!")
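The expected-timeline prints above assume the two tracks start at different wall-clock times. One way to sanity-check such offsets outside the pipeline is to compare the millisecond timestamps embedded in the Daily.co object key names. This is a sketch only, assuming (from the keys used in these tests) a basename of the form `<recording_start_ms>-<uuid>-cam-audio-<track_start_ms>`; the real pipeline aligns by in-container PTS, so the two figures can differ:

```python
def track_offset_ms(key: str) -> int:
    """Milliseconds between recording start and this track's start,
    derived purely from the key name (assumed layout, see above)."""
    basename = key.rsplit("/", 1)[-1]
    recording_start_ms = int(basename.split("-", 1)[0])
    track_start_ms = int(basename.rsplit("-", 1)[-1])
    return track_start_ms - recording_start_ms


# Hypothetical keys with the same timestamp layout as the test keys above
keys = [
    "room/rec/1760988935484-aaaa-cam-audio-1760988935922",
    "room/rec/1760988935484-bbbb-cam-audio-1760988943823",
]
print([track_offset_ms(k) for k in keys])  # → [438, 8339]
```

Note the name-derived offsets (≈0.4s and ≈8.3s here) need not match the ~2s/~51s speech timeline printed by the test, since participants can join well before they speak.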


@@ -0,0 +1,66 @@
#!/usr/bin/env python
"""
Test multitrack processing with correct S3 bucket configuration
"""
import asyncio
import uuid
from reflector.db import get_database
from reflector.db.transcripts import Transcript, transcripts
from reflector.pipelines.main_multitrack_pipeline import (
task_pipeline_multitrack_process,
)
async def create_and_process():
    """Create a new transcript and process it with the correct S3 bucket"""
    # Correct S3 configuration
    bucket_name = "reflector-dailyco-local"
    track_keys = [
        "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
        "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
    ]

    # Create a new transcript
    db = get_database()
    await db.connect()
    try:
        transcript_id = str(uuid.uuid4())
        transcript = Transcript(
            id=transcript_id,
            name="Daily Multitrack - Correct S3 Bucket Test",
            source_kind="file",
            source_language="en",
            target_language="en",
            status="idle",
            events=[],
            title="",
        )
        query = transcripts.insert().values(**transcript.model_dump())
        await db.execute(query)
        print(f"Created transcript: {transcript_id}")

        # Trigger processing with Celery
        result = task_pipeline_multitrack_process.delay(
            transcript_id=transcript_id, bucket_name=bucket_name, track_keys=track_keys
        )
        print(f"Task ID: {result.id}")
        print(f"Processing started! Check: http://localhost:3000/transcripts/{transcript_id}")
        print(f"API status: http://localhost:1250/v1/transcripts/{transcript_id}")
        return transcript_id
    finally:
        await db.disconnect()


if __name__ == "__main__":
    transcript_id = asyncio.run(create_and_process())
    print("\n✅ Task submitted successfully!")
    print(f"📍 Transcript ID: {transcript_id}")
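Since the Celery task runs asynchronously, checking the result means polling the transcript status the script prints a URL for. A minimal polling sketch follows; the status values ("idle", "processing", "error") are taken from what these scripts set, any other value is assumed to be terminal, and the fetcher is injected so the loop can be exercised without a running server:

```python
import time


def wait_for_transcript(fetch_status, timeout_s=600, interval_s=5.0, sleep=time.sleep):
    """Poll fetch_status() until a terminal status or timeout.

    fetch_status: zero-arg callable returning the current status string
    (e.g. an HTTP GET against the transcript API, stubbed in tests).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status not in ("idle", "processing"):
            return status  # terminal, e.g. "error" or a completed state
        sleep(interval_s)
    raise TimeoutError("transcript did not reach a terminal status")


# Stubbed usage: statuses as they might appear across successive polls
statuses = iter(["idle", "processing", "processing", "ended"])
print(wait_for_transcript(lambda: next(statuses), sleep=lambda _: None))  # → ended
```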


@@ -0,0 +1,23 @@
#!/usr/bin/env python
"""
Trigger reprocessing of Daily.co multitrack recording via Celery
"""
from reflector.pipelines.main_multitrack_pipeline import (
task_pipeline_multitrack_process,
)
# Trigger the Celery task
result = task_pipeline_multitrack_process.delay(
    transcript_id="32fad706-f8cf-434c-94c8-1ee69f7be081",  # The ID that was created
    bucket_name="reflector-dailyco-local",
    track_keys=[
        "monadical/daily-20251020193458/1760988935484-52f7f48b-fbab-431f-9a50-87b9abfc8255-cam-audio-1760988935922",
        "monadical/daily-20251020193458/1760988935484-a37c35e3-6f8e-4274-a482-e9d0f102a732-cam-audio-1760988943823",
    ],
)
print(f"Task ID: {result.id}")
print(
    "Processing started! Check: "
    "http://localhost:3000/transcripts/32fad706-f8cf-434c-94c8-1ee69f7be081"
)
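Because this script hardcodes the keys, a cheap guard before enqueueing is to check that all track keys belong to the same recording session. This is a sketch under the assumption (based on the keys in this script) that keys share a `<room>/<recording>/` prefix and a common `<recording_start_ms>-` basename prefix:

```python
def same_session(track_keys: list[str]) -> bool:
    """True if every key has the same directory prefix and the same
    recording-start timestamp prefix in its basename (assumed layout)."""
    prefixes = {k.rsplit("/", 1)[0] for k in track_keys}
    starts = {k.rsplit("/", 1)[-1].split("-", 1)[0] for k in track_keys}
    return len(prefixes) == 1 and len(starts) == 1


# Hypothetical keys with the same layout as the test keys above
keys = [
    "room/rec/1760988935484-aaaa-cam-audio-1760988935922",
    "room/rec/1760988935484-bbbb-cam-audio-1760988943823",
]
print(same_session(keys))  # → True
```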