diff --git a/server/docs/video-platforms/README.md b/server/docs/video-platforms/README.md
index 15734db3..eeb21b47 100644
--- a/server/docs/video-platforms/README.md
+++ b/server/docs/video-platforms/README.md
@@ -176,15 +176,17 @@ Parse timestamps:
 Track 0:
   Filename offset: 438ms
   PyAV metadata: 229ms
-  Difference: 209ms
+  Difference: ~200ms

 Track 1:
   Filename offset: 8339ms
   PyAV metadata: 8130ms
-  Difference: 209ms
+  Difference: ~200ms
 ```

-**Consistent 209ms delta** suggests network/encoding delay between file upload initiation (filename) and actual audio stream start (metadata).
+**Consistent ~200ms delta** suggests network/encoding delay between file upload initiation (filename) and actual audio stream start (metadata).
+
+**Note:** The ~200ms difference observed in this test recording is not crucial for timing accuracy. Either method (filename timestamps or PyAV metadata) works well for multi-track alignment. Filename timestamps are preferable as they are better officially documented by Daily.co.

 **Current implementation uses PyAV metadata** because:
 - More accurate (represents when audio actually started)
diff --git a/server/reflector/pipelines/main_multitrack_pipeline.py b/server/reflector/pipelines/main_multitrack_pipeline.py
index b3b18fe0..55b25566 100644
--- a/server/reflector/pipelines/main_multitrack_pipeline.py
+++ b/server/reflector/pipelines/main_multitrack_pipeline.py
@@ -91,12 +91,12 @@ class PipelineMainMultitrack(PipelineMainBase):
         - Track 0: (1760988935922 - 1760988935484) / 1000 = 0.438s
         - Track 1: (1760988943823 - 1760988935484) / 1000 = 8.339s

-        TIME DIFFERENCE: PyAV metadata vs filename timestamps differ by ~209ms:
-        - Track 0: filename=438ms, metadata=229ms (diff: 209ms)
-        - Track 1: filename=8339ms, metadata=8130ms (diff: 209ms)
+        TIME DIFFERENCE: PyAV metadata vs filename timestamps differ by ~200ms:
+        - Track 0: filename=438ms, metadata=229ms (diff: ~200ms)
+        - Track 1: filename=8339ms, metadata=8130ms (diff: ~200ms)

-        Consistent delta suggests network/encoding delay. PyAV metadata is ground truth
-        (represents when audio stream actually started vs when file upload initiated).
+        Note: The ~200ms difference isn't crucial - either method works for alignment.
+        Filename timestamps are preferable due to being better officially documented.

         Example with 2 participants:
         Track A: start_time=0.2s → Joined 200ms after recording began
diff --git a/server/reflector/utils/audio_padding.py b/server/reflector/utils/audio_padding.py
index 0fb6fabb..67143f54 100644
--- a/server/reflector/utils/audio_padding.py
+++ b/server/reflector/utils/audio_padding.py
@@ -25,7 +25,9 @@ def extract_stream_start_time_from_container(
     """Extract meeting-relative start time from WebM stream metadata.

     Uses PyAV to read stream.start_time from WebM container.
-    More accurate than filename timestamps by ~209ms due to network/encoding delays.
+    Note: Differs from filename timestamps by ~200ms in test recordings, but this difference
+    is not crucial - either method works. Filename timestamps are preferable due to being
+    better officially documented by Daily.co.

     Args:
         container: PyAV container opened from audio file/URL
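
For reviewers who want to check the arithmetic behind this change, here is a minimal Python sketch that recomputes the filename-based offsets and compares them with the metadata offsets quoted in the diff. The helper name `offset_ms`, the constant `RECORDING_START_MS`, and the dictionary layout are hypothetical and not taken from the repository. The metadata values are hard-coded from the figures above rather than read with PyAV.

```python
# Epoch-millisecond timestamps quoted in the pipeline docstring above.
RECORDING_START_MS = 1760988935484
TRACK_START_MS = {0: 1760988935922, 1: 1760988943823}

# Meeting-relative offsets reported from PyAV metadata in the README (ms).
# Hard-coded here for illustration; not read from a real container.
METADATA_OFFSET_MS = {0: 229, 1: 8130}


def offset_ms(track_start_ms: int, recording_start_ms: int) -> int:
    """Return a track's start offset relative to the recording start, in ms."""
    return track_start_ms - recording_start_ms


# Difference between the filename-derived and metadata-derived offsets per track.
deltas = {
    track: offset_ms(start, RECORDING_START_MS) - METADATA_OFFSET_MS[track]
    for track, start in TRACK_START_MS.items()
}

for track, delta in deltas.items():
    filename_offset = offset_ms(TRACK_START_MS[track], RECORDING_START_MS)
    print(f"Track {track}: filename={filename_offset}ms, "
          f"metadata={METADATA_OFFSET_MS[track]}ms, delta={delta}ms")
```

Working in integer milliseconds avoids floating-point rounding when subtracting the two offsets. For both tracks in this test recording the delta comes out to exactly 209 ms, which is the figure the diff rounds to "~200ms".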