feat: pipeline improvement with file processing, parakeet, silero-vad (#540)

* feat: improve pipeline threading and the transcriber (Parakeet and Silero VAD)

* refactor: remove whisperx, implement parakeet

* refactor: make audio_chunker smarter: wait for speech instead of using a fixed frame count

* refactor: make audio merge always downsample the audio to 16 kHz for transcription

* refactor: make the Modal audio transcription app accept batches

* refactor: improve type safety and remove prometheus metrics

- Add DiarizationSegment TypedDict for proper diarization typing
- Replace List/Optional with modern Python list/| None syntax
- Remove all Prometheus metrics from TranscriptDiarizationAssemblerProcessor
- Add comprehensive file processing pipeline with parallel execution
- Update processor imports and type annotations throughout
- Implement optimized file pipeline as default in process.py tool

* refactor: convert FileDiarizationProcessor I/O types to BaseModel

Update FileDiarizationInput and FileDiarizationOutput to inherit from
BaseModel instead of plain classes, following the standard pattern
used by other processors in the codebase.

* test: add tests for file transcript and diarization with pytest-recording

* build: add pytest-recording

* feat: add local pyannote for testing

* fix: replace PyAV AudioResampler with torchaudio for reliable audio processing

- Replace problematic PyAV AudioResampler that was causing ValueError: [Errno 22] Invalid argument
- Use torchaudio.functional.resample for robust sample rate conversion
- Optimize processing: skip conversion for already 16kHz mono audio
- Add direct WAV writing with Python wave module for better performance
- Consolidate duplicate downsample checks for cleaner code
- Maintain list[av.AudioFrame] input interface
- Required for Silero VAD which needs 16kHz mono audio
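The conversion path above (skip already-16 kHz mono audio, otherwise downmix and resample, then write WAV with the `wave` module) can be sketched with the standard library only. The function names are hypothetical. So that the example runs without PyTorch, `resample_linear` substitutes plain linear interpolation for `torchaudio.functional.resample`. That is a different and lower-quality technique with no anti-aliasing filter, and it is used here only to show the control flow.

```python
import array
import io
import sys
import wave

TARGET_RATE = 16_000  # Silero VAD expects 16 kHz mono input


def to_mono(samples: list[int], channels: int) -> list[int]:
    """Average interleaved int16 samples down to a single channel."""
    if channels == 1:
        return list(samples)
    return [
        sum(samples[i : i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]


def resample_linear(samples: list[int], src_rate: int, dst_rate: int) -> list[int]:
    """Linear interpolation; a stand-in for torchaudio.functional.resample.

    Unlike torchaudio's windowed-sinc resampler, this applies no low-pass
    filter, so downsampling can alias. Fine for illustration only.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(round(samples[i] * (1 - frac) + nxt * frac))
    return out


def to_16k_mono_wav(samples: list[int], sample_rate: int, channels: int) -> bytes:
    """Return WAV bytes at 16 kHz mono, skipping work if already in that format."""
    if not (sample_rate == TARGET_RATE and channels == 1):
        samples = resample_linear(to_mono(samples, channels), sample_rate, TARGET_RATE)
    pcm = array.array("h", samples)  # signed 16-bit samples
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit PCM
        wav.setframerate(TARGET_RATE)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()
```

For example, 200 stereo frames at 48 kHz become 66 mono frames at 16 kHz, while input that is already 16 kHz mono goes straight to the WAV writer untouched.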

* fix: replace PyAV AudioResampler with torchaudio solution

- Resolves ValueError: [Errno 22] Invalid argument in AudioMergeProcessor
- Replaces problematic PyAV AudioResampler with torchaudio.functional.resample
- Optimizes processing to skip unnecessary conversions when audio is already 16kHz mono
- Uses direct WAV writing with Python's wave module for better performance
- Fixes test_basic_process to disable diarization (pyannote dependency not installed)
- Updates test expectations to match actual processor behavior
- Removes unused pydub dependency from pyproject.toml
- Adds comprehensive TEST_ANALYSIS.md documenting test suite status

* feat: add parameterized test for both diarization modes

- Adds @pytest.mark.parametrize to test_basic_process with enable_diarization=[False, True]
- Test with diarization=False always passes (tests core AudioMergeProcessor functionality)
- Test with diarization=True gracefully skips when pyannote.audio is not installed
- Provides comprehensive test coverage for both pipeline configurations

* fix: resolve pipeline property naming conflict in AudioDiarizationPyannoteProcessor

- Renames 'pipeline' property to 'diarization_pipeline' to avoid conflict with base Processor.pipeline attribute
- Fixes AttributeError ("property 'pipeline' object has no setter") raised when set_pipeline() is called
- Updates property usage in _diarize method to use new name
- Now correctly supports pipeline initialization for diarization processing
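The naming conflict above can be reproduced in a few lines. This is a simplified sketch: the class names are hypothetical stand-ins for the base `Processor` and `AudioDiarizationPyannoteProcessor`, and the property bodies return a placeholder string instead of a real pyannote pipeline. Because a `property` without a setter is a data descriptor, it takes precedence over instance attributes, so the base class's `self.pipeline = ...` assignment fails.

```python
class Processor:
    """Simplified stand-in for a base processor that stores its pipeline."""

    def set_pipeline(self, pipeline) -> None:
        # The base class assigns a plain instance attribute named `pipeline`.
        self.pipeline = pipeline


class BrokenDiarizer(Processor):
    @property
    def pipeline(self):
        # A read-only property shadows the attribute the base class writes to.
        return "pyannote-model"


class FixedDiarizer(Processor):
    @property
    def diarization_pipeline(self):
        # Renamed: no longer collides with Processor.pipeline.
        return "pyannote-model"
```

`BrokenDiarizer().set_pipeline(...)` raises `AttributeError` (the exact message differs between Python versions), while `FixedDiarizer` keeps both the base attribute and its own model accessor.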

* fix: add local for pyannote

* test: add diarization test

* fix: resample on audio merge now working

* fix: correctly restore timestamp

* fix: display exceptions raised in a threaded processor

* Update pyproject.toml

* ci: remove option

* ci: update astral-sh/setup-uv

* test: add monadical url for pytest-recording

* refactor: remove previous version

* build: move faster whisper to local dep

* test: fix missing import

* refactor: improve main_file_pipeline organization and error handling

- Move all imports to the top of the file
- Create unified EmptyPipeline class to replace duplicate mock pipeline code
- Remove timeout and fallback logic - let processors handle their own retries
- Fix error handling to raise any exception from parallel tasks
- Add proper type hints and validation for captured results
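The "raise any exception from parallel tasks" rule can be illustrated with `asyncio.gather`. This is a hypothetical sketch: `transcribe`, `diarize`, `waveform` and `run_parallel` are placeholders that sleep briefly instead of calling Modal, and the real `PipelineMainFile` may schedule its tasks differently. The `diarizer` parameter exists only so a failing step can be injected.

```python
import asyncio


# Hypothetical stand-ins for the three independent file-processing steps.
async def transcribe(path: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for a remote transcription call
    return f"transcript of {path}"


async def diarize(path: str) -> list[tuple[float, float, int]]:
    await asyncio.sleep(0.01)  # placeholder for a remote diarization call
    return [(0.0, 1.0, 0)]


async def waveform(path: str) -> list[float]:
    await asyncio.sleep(0.01)  # placeholder for waveform generation
    return [0.0, 0.5, 1.0]


async def run_parallel(path: str, diarizer=diarize):
    # gather() runs the steps concurrently. With return_exceptions=False
    # (the default), the first exception from any step is re-raised here
    # instead of being logged and swallowed. Note that gather() does not
    # cancel the remaining steps; asyncio.TaskGroup (Python 3.11+) does.
    return await asyncio.gather(transcribe(path), diarizer(path), waveform(path))
```

Letting the exception escape means a failed diarization surfaces as a failed job rather than a transcript silently missing speaker labels.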

* fix: wrong function

* fix: remove task_done

* feat: add configurable file processing timeouts for modal processors

- Add TRANSCRIPT_FILE_TIMEOUT setting (default: 600s) for file transcription
- Add DIARIZATION_FILE_TIMEOUT setting (default: 600s) for file diarization
- Replace hardcoded timeout=600 with configurable settings in modal processors
- Allows customization of timeout values via environment variables
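A minimal sketch of environment-driven timeouts with the 600 s defaults described above. The project's actual `reflector.settings` object is not part of this diff, so the stdlib `FileTimeouts` dataclass and the `_env_seconds` helper are assumptions that only show the pattern: read the variable when the settings object is created, and fall back to the default when it is unset.

```python
import os
from dataclasses import dataclass, field


def _env_seconds(name: str, default: float) -> float:
    """Read a timeout in seconds from the environment, falling back to default."""
    raw = os.environ.get(name)
    return float(raw) if raw else default


@dataclass(frozen=True)
class FileTimeouts:
    # Both defaults are 600 s, as stated in the commit message.
    TRANSCRIPT_FILE_TIMEOUT: float = field(
        default_factory=lambda: _env_seconds("TRANSCRIPT_FILE_TIMEOUT", 600.0)
    )
    DIARIZATION_FILE_TIMEOUT: float = field(
        default_factory=lambda: _env_seconds("DIARIZATION_FILE_TIMEOUT", 600.0)
    )


# A processor would then pass e.g. timeout=FileTimeouts().TRANSCRIPT_FILE_TIMEOUT
# to its HTTP client instead of a hardcoded 600.
```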

* fix: use logger

* fix: worker process meetings now use file pipeline

* fix: topic not gathered

* refactor: remove prepare(); the pipeline now works

* refactor: implement review feedback from Igor

* test: add test for test_pipeline_main_file

* refactor: remove doc

* doc: add doc

* ci: update build to use native arm64 builder

* fix: merge fixes

* refactor: apply changes from Igor's review and add a test (not run by default) for the GPU Modal part

* ci: update to our own runner linux-amd64

* ci: try using suggested mode=min

* fix: update diarizer for latest modal, and use volume

* fix: modal file extension detection

* fix: run the diarizer on an A100 GPU
2025-08-20 20:07:19 -06:00
committed by GitHub
parent 009590c080
commit 3ea7f6b7b6
37 changed files with 5086 additions and 198 deletions


@@ -0,0 +1,40 @@
interactions:
- request:
    body: ''
    headers:
      accept:
      - '*/*'
      accept-encoding:
      - gzip, deflate
      authorization:
      - DUMMY_API_KEY
      connection:
      - keep-alive
      content-length:
      - '0'
      host:
      - monadical-sas--reflector-diarizer-web.modal.run
      user-agent:
      - python-httpx/0.27.2
    method: POST
    uri: https://monadical-sas--reflector-diarizer-web.modal.run/diarize?audio_file_url=https%3A%2F%2Freflector-github-pytest.s3.us-east-1.amazonaws.com%2Ftest_mathieu_hello.mp3&timestamp=0
  response:
    body:
      string: '{"diarization":[{"start":0.823,"end":1.91,"speaker":0},{"start":2.572,"end":6.409,"speaker":0},{"start":6.783,"end":10.62,"speaker":0},{"start":11.231,"end":14.168,"speaker":0},{"start":14.796,"end":19.295,"speaker":0}]}'
    headers:
      Alt-Svc:
      - h3=":443"; ma=2592000
      Content-Length:
      - '220'
      Content-Type:
      - application/json
      Date:
      - Wed, 13 Aug 2025 18:25:34 GMT
      Modal-Function-Call-Id:
      - fc-01K2JAVNEP6N7Y1Y7W3T98BCXK
      Vary:
      - accept-encoding
    status:
      code: 200
      message: OK
version: 1
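This cassette records the diarizer being called with an empty POST body and all parameters in the query string. Assuming vcrpy's default request matching (method, scheme, host, port, path and query), a replay only succeeds if the client rebuilds exactly this URI. The hypothetical `build_diarize_url` helper below shows that `urllib.parse.urlencode` produces the recorded query; it is an illustration, not the project's client code.

```python
from urllib.parse import urlencode

DIARIZATION_URL = "https://monadical-sas--reflector-diarizer-web.modal.run"


def build_diarize_url(audio_file_url: str, timestamp: float = 0) -> str:
    # urlencode() percent-encodes ':' and '/' inside the query value,
    # giving the same query string stored in the cassette. An int 0 encodes
    # as "0"; passing 0.0 would encode as "0.0" and no longer match.
    query = urlencode({"audio_file_url": audio_file_url, "timestamp": timestamp})
    return f"{DIARIZATION_URL}/diarize?{query}"
```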


@@ -0,0 +1,46 @@
interactions:
- request:
    body: '{"audio_file_url": "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3",
      "language": "en", "batch": true}'
    headers:
      accept:
      - '*/*'
      accept-encoding:
      - gzip, deflate
      authorization:
      - DUMMY_API_KEY
      connection:
      - keep-alive
      content-length:
      - '136'
      content-type:
      - application/json
      host:
      - monadical-sas--reflector-transcriber-parakeet-web.modal.run
      user-agent:
      - python-httpx/0.27.2
    method: POST
    uri: https://monadical-sas--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url
  response:
    body:
      string: '{"text":"Hi there everyone. Today I want to share my incredible experience
        with Reflector. a Q teenage product that revolutionizes audio processing.
        With reflector, I can easily convert any audio into accurate transcription.
        saving me hours of tedious manual work.","words":[{"word":"Hi","start":0.87,"end":1.19},{"word":"there","start":1.19,"end":1.35},{"word":"everyone.","start":1.51,"end":1.83},{"word":"Today","start":2.63,"end":2.87},{"word":"I","start":3.36,"end":3.52},{"word":"want","start":3.6,"end":3.76},{"word":"to","start":3.76,"end":3.92},{"word":"share","start":3.92,"end":4.16},{"word":"my","start":4.16,"end":4.4},{"word":"incredible","start":4.32,"end":4.96},{"word":"experience","start":4.96,"end":5.44},{"word":"with","start":5.44,"end":5.68},{"word":"Reflector.","start":5.68,"end":6.24},{"word":"a","start":6.93,"end":7.01},{"word":"Q","start":7.01,"end":7.17},{"word":"teenage","start":7.25,"end":7.65},{"word":"product","start":7.89,"end":8.29},{"word":"that","start":8.29,"end":8.61},{"word":"revolutionizes","start":8.61,"end":9.65},{"word":"audio","start":9.65,"end":10.05},{"word":"processing.","start":10.05,"end":10.53},{"word":"With","start":11.27,"end":11.43},{"word":"reflector,","start":11.51,"end":12.15},{"word":"I","start":12.31,"end":12.39},{"word":"can","start":12.39,"end":12.55},{"word":"easily","start":12.55,"end":12.95},{"word":"convert","start":12.95,"end":13.43},{"word":"any","start":13.43,"end":13.67},{"word":"audio","start":13.67,"end":13.99},{"word":"into","start":14.98,"end":15.06},{"word":"accurate","start":15.22,"end":15.54},{"word":"transcription.","start":15.7,"end":16.34},{"word":"saving","start":16.99,"end":17.15},{"word":"me","start":17.31,"end":17.47},{"word":"hours","start":17.47,"end":17.87},{"word":"of","start":17.87,"end":18.11},{"word":"tedious","start":18.11,"end":18.67},{"word":"manual","start":18.67,"end":19.07},{"word":"work.","start":19.07,"end":19.31}]}'
    headers:
      Alt-Svc:
      - h3=":443"; ma=2592000
      Content-Length:
      - '1933'
      Content-Type:
      - application/json
      Date:
      - Wed, 13 Aug 2025 18:26:59 GMT
      Modal-Function-Call-Id:
      - fc-01K2JAWC7GAMKX4DSJ21WV31NG
      Vary:
      - accept-encoding
    status:
      code: 200
      message: OK
version: 1


@@ -0,0 +1,84 @@
interactions:
- request:
    body: '{"audio_file_url": "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3",
      "language": "en", "batch": true}'
    headers:
      accept:
      - '*/*'
      accept-encoding:
      - gzip, deflate
      authorization:
      - DUMMY_API_KEY
      connection:
      - keep-alive
      content-length:
      - '136'
      content-type:
      - application/json
      host:
      - monadical-sas--reflector-transcriber-parakeet-web.modal.run
      user-agent:
      - python-httpx/0.27.2
    method: POST
    uri: https://monadical-sas--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url
  response:
    body:
      string: '{"text":"Hi there everyone. Today I want to share my incredible experience
        with Reflector. a Q teenage product that revolutionizes audio processing.
        With reflector, I can easily convert any audio into accurate transcription.
        saving me hours of tedious manual work.","words":[{"word":"Hi","start":0.87,"end":1.19},{"word":"there","start":1.19,"end":1.35},{"word":"everyone.","start":1.51,"end":1.83},{"word":"Today","start":2.63,"end":2.87},{"word":"I","start":3.36,"end":3.52},{"word":"want","start":3.6,"end":3.76},{"word":"to","start":3.76,"end":3.92},{"word":"share","start":3.92,"end":4.16},{"word":"my","start":4.16,"end":4.4},{"word":"incredible","start":4.32,"end":4.96},{"word":"experience","start":4.96,"end":5.44},{"word":"with","start":5.44,"end":5.68},{"word":"Reflector.","start":5.68,"end":6.24},{"word":"a","start":6.93,"end":7.01},{"word":"Q","start":7.01,"end":7.17},{"word":"teenage","start":7.25,"end":7.65},{"word":"product","start":7.89,"end":8.29},{"word":"that","start":8.29,"end":8.61},{"word":"revolutionizes","start":8.61,"end":9.65},{"word":"audio","start":9.65,"end":10.05},{"word":"processing.","start":10.05,"end":10.53},{"word":"With","start":11.27,"end":11.43},{"word":"reflector,","start":11.51,"end":12.15},{"word":"I","start":12.31,"end":12.39},{"word":"can","start":12.39,"end":12.55},{"word":"easily","start":12.55,"end":12.95},{"word":"convert","start":12.95,"end":13.43},{"word":"any","start":13.43,"end":13.67},{"word":"audio","start":13.67,"end":13.99},{"word":"into","start":14.98,"end":15.06},{"word":"accurate","start":15.22,"end":15.54},{"word":"transcription.","start":15.7,"end":16.34},{"word":"saving","start":16.99,"end":17.15},{"word":"me","start":17.31,"end":17.47},{"word":"hours","start":17.47,"end":17.87},{"word":"of","start":17.87,"end":18.11},{"word":"tedious","start":18.11,"end":18.67},{"word":"manual","start":18.67,"end":19.07},{"word":"work.","start":19.07,"end":19.31}]}'
    headers:
      Alt-Svc:
      - h3=":443"; ma=2592000
      Content-Length:
      - '1933'
      Content-Type:
      - application/json
      Date:
      - Wed, 13 Aug 2025 18:27:02 GMT
      Modal-Function-Call-Id:
      - fc-01K2JAYZ1AR2HE422VJVKBWX9Z
      Vary:
      - accept-encoding
    status:
      code: 200
      message: OK
- request:
    body: ''
    headers:
      accept:
      - '*/*'
      accept-encoding:
      - gzip, deflate
      authorization:
      - DUMMY_API_KEY
      connection:
      - keep-alive
      content-length:
      - '0'
      host:
      - monadical-sas--reflector-diarizer-web.modal.run
      user-agent:
      - python-httpx/0.27.2
    method: POST
    uri: https://monadical-sas--reflector-diarizer-web.modal.run/diarize?audio_file_url=https%3A%2F%2Freflector-github-pytest.s3.us-east-1.amazonaws.com%2Ftest_mathieu_hello.mp3&timestamp=0
  response:
    body:
      string: '{"diarization":[{"start":0.823,"end":1.91,"speaker":0},{"start":2.572,"end":6.409,"speaker":0},{"start":6.783,"end":10.62,"speaker":0},{"start":11.231,"end":14.168,"speaker":0},{"start":14.796,"end":19.295,"speaker":0}]}'
    headers:
      Alt-Svc:
      - h3=":443"; ma=2592000
      Content-Length:
      - '220'
      Content-Type:
      - application/json
      Date:
      - Wed, 13 Aug 2025 18:27:18 GMT
      Modal-Function-Call-Id:
      - fc-01K2JAZ1M34NQRJK03CCFK95D6
      Vary:
      - accept-encoding
    status:
      code: 200
      message: OK
version: 1


@@ -5,7 +5,29 @@ from unittest.mock import patch
import pytest


# Pytest-docker configuration
@pytest.fixture(scope="session", autouse=True)
def settings_configuration():
    # These settings are linked to Monadical for pytest-recording.
    # Forks have to provide their own URL when the cassettes need to be updated.
    # Modal API keys have to be defined by the user.
    from reflector.settings import settings

    settings.TRANSCRIPT_BACKEND = "modal"
    settings.TRANSCRIPT_URL = (
        "https://monadical-sas--reflector-transcriber-parakeet-web.modal.run"
    )
    settings.DIARIZATION_BACKEND = "modal"
    settings.DIARIZATION_URL = "https://monadical-sas--reflector-diarizer-web.modal.run"


@pytest.fixture(scope="module")
def vcr_config():
    """VCR configuration to filter sensitive headers"""
    return {
        "filter_headers": [("authorization", "DUMMY_API_KEY")],
    }


@pytest.fixture(scope="session")
def docker_compose_file(pytestconfig):
    return os.path.join(str(pytestconfig.rootdir), "tests", "docker-compose.test.yml")


@@ -1,7 +1,7 @@
-version: '3.8'
+version: "3.8"
 services:
   postgres_test:
-    image: postgres:15
+    image: postgres:17
     environment:
       POSTGRES_DB: reflector_test
       POSTGRES_USER: test_user
@@ -10,4 +10,4 @@ services:
       - "15432:5432"
     command: postgres -c fsync=off -c synchronous_commit=off -c full_page_writes=off
     tmpfs:
-      - /var/lib/postgresql/data:rw,noexec,nosuid,size=1g
+      - /var/lib/postgresql/data:rw,noexec,nosuid,size=1g


@@ -0,0 +1,330 @@
"""
Tests for GPU Modal transcription endpoints.

These tests are marked with the "gpu_modal" marker and will not run by default.
Run them with: pytest -m gpu_modal tests/test_gpu_modal_transcript_parakeet.py

Required environment variables:
- TRANSCRIPT_URL: URL to the Modal.com endpoint (required)
- TRANSCRIPT_MODAL_API_KEY: API key for authentication (optional)
- TRANSCRIPT_MODEL: Model name to use (optional, defaults to nvidia/parakeet-tdt-0.6b-v2)

Example with pytest (override default addopts to run ONLY gpu_modal tests):
    TRANSCRIPT_URL=https://monadical-sas--reflector-transcriber-parakeet-web-dev.modal.run \
    TRANSCRIPT_MODAL_API_KEY=your-api-key \
    uv run -m pytest -m gpu_modal --no-cov tests/test_gpu_modal_transcript.py

    # Or with completely clean options:
    uv run -m pytest -m gpu_modal -o addopts="" tests/

Running Modal locally for testing:
    modal serve gpu/modal_deployments/reflector_transcriber_parakeet.py
    # This will give you a local URL like https://xxxxx--reflector-transcriber-parakeet-web-dev.modal.run to test against
"""
import os
import tempfile
from pathlib import Path

import httpx
import pytest

# Public test audio file used by all tests below
TEST_AUDIO_URL = (
    "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3"
)


def get_modal_transcript_url():
    """Get and validate the Modal transcript URL from environment."""
    url = os.environ.get("TRANSCRIPT_URL")
    if not url:
        pytest.skip(
            "TRANSCRIPT_URL environment variable is required for GPU Modal tests"
        )
    return url


def get_auth_headers():
    """Get authentication headers if API key is available."""
    api_key = os.environ.get("TRANSCRIPT_MODAL_API_KEY")
    if api_key:
        return {"Authorization": f"Bearer {api_key}"}
    return {}


def get_model_name():
    """Get the model name from environment or use default."""
    return os.environ.get("TRANSCRIPT_MODEL", "nvidia/parakeet-tdt-0.6b-v2")
@pytest.mark.gpu_modal
class TestGPUModalTranscript:
    """Test suite for GPU Modal transcription endpoints."""

    def test_transcriptions_from_url(self):
        """Test the /v1/audio/transcriptions-from-url endpoint."""
        url = get_modal_transcript_url()
        headers = get_auth_headers()

        with httpx.Client(timeout=60.0) as client:
            response = client.post(
                f"{url}/v1/audio/transcriptions-from-url",
                json={
                    "audio_file_url": TEST_AUDIO_URL,
                    "model": get_model_name(),
                    "language": "en",
                    "timestamp_offset": 0.0,
                },
                headers=headers,
            )

        assert response.status_code == 200, f"Request failed: {response.text}"
        result = response.json()

        # Verify response structure
        assert "text" in result
        assert "words" in result
        assert isinstance(result["text"], str)
        assert isinstance(result["words"], list)

        # Verify content is meaningful
        assert len(result["text"]) > 0, "Transcript text should not be empty"
        assert len(result["words"]) > 0, "Words list must not be empty"

        # Verify word structure
        for word in result["words"]:
            assert "word" in word
            assert "start" in word
            assert "end" in word
            assert isinstance(word["start"], (int, float))
            assert isinstance(word["end"], (int, float))
            assert word["start"] <= word["end"]
    def test_transcriptions_single_file(self):
        """Test the /v1/audio/transcriptions endpoint with a single file."""
        url = get_modal_transcript_url()
        headers = get_auth_headers()

        # Download test audio file to upload
        with httpx.Client(timeout=60.0) as client:
            audio_response = client.get(TEST_AUDIO_URL)
            audio_response.raise_for_status()

            with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp_file:
                tmp_file.write(audio_response.content)
                tmp_file_path = tmp_file.name

            try:
                # Upload the file for transcription
                with open(tmp_file_path, "rb") as f:
                    files = {"file": ("test_audio.mp3", f, "audio/mpeg")}
                    data = {
                        "model": get_model_name(),
                        "language": "en",
                        "batch": "false",
                    }
                    response = client.post(
                        f"{url}/v1/audio/transcriptions",
                        files=files,
                        data=data,
                        headers=headers,
                    )

                assert response.status_code == 200, f"Request failed: {response.text}"
                result = response.json()

                # Verify response structure for single file
                assert "text" in result
                assert "words" in result
                assert "filename" in result
                assert isinstance(result["text"], str)
                assert isinstance(result["words"], list)

                # Verify content
                assert len(result["text"]) > 0, "Transcript text should not be empty"
            finally:
                Path(tmp_file_path).unlink(missing_ok=True)
    def test_transcriptions_multiple_files(self):
        """Test the /v1/audio/transcriptions endpoint with multiple files (non-batch mode)."""
        url = get_modal_transcript_url()
        headers = get_auth_headers()

        # Create multiple test files (we'll use the same audio content for simplicity)
        with httpx.Client(timeout=60.0) as client:
            audio_response = client.get(TEST_AUDIO_URL)
            audio_response.raise_for_status()
            audio_content = audio_response.content

            temp_files = []
            files = []
            try:
                # Create 3 temporary files
                for _ in range(3):
                    tmp_file = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
                    tmp_file.write(audio_content)
                    tmp_file.close()
                    temp_files.append(tmp_file.name)

                # Upload multiple files for transcription (non-batch)
                files = [
                    ("files", (f"test_audio_{i}.mp3", open(f, "rb"), "audio/mpeg"))
                    for i, f in enumerate(temp_files)
                ]
                data = {
                    "model": get_model_name(),
                    "language": "en",
                    "batch": "false",
                }
                response = client.post(
                    f"{url}/v1/audio/transcriptions",
                    files=files,
                    data=data,
                    headers=headers,
                )

                assert response.status_code == 200, f"Request failed: {response.text}"
                result = response.json()

                # Verify response structure for multiple files (non-batch)
                assert "results" in result
                assert isinstance(result["results"], list)
                assert len(result["results"]) == 3

                for file_result in result["results"]:
                    assert "text" in file_result
                    assert "words" in file_result
                    assert "filename" in file_result
                    assert isinstance(file_result["text"], str)
                    assert isinstance(file_result["words"], list)
                    assert len(file_result["text"]) > 0
            finally:
                # Close file handles even if the request fails, then clean up
                for _, file_tuple in files:
                    file_tuple[1].close()
                for f in temp_files:
                    Path(f).unlink(missing_ok=True)
    def test_transcriptions_multiple_files_batch(self):
        """Test the /v1/audio/transcriptions endpoint with multiple files in batch mode."""
        url = get_modal_transcript_url()
        headers = get_auth_headers()

        # Create multiple test files
        with httpx.Client(timeout=60.0) as client:
            audio_response = client.get(TEST_AUDIO_URL)
            audio_response.raise_for_status()
            audio_content = audio_response.content

            temp_files = []
            files = []
            try:
                # Create 3 temporary files
                for _ in range(3):
                    tmp_file = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
                    tmp_file.write(audio_content)
                    tmp_file.close()
                    temp_files.append(tmp_file.name)

                # Upload multiple files for batch transcription
                files = [
                    ("files", (f"test_audio_{i}.mp3", open(f, "rb"), "audio/mpeg"))
                    for i, f in enumerate(temp_files)
                ]
                data = {
                    "model": get_model_name(),
                    "language": "en",
                    "batch": "true",
                }
                response = client.post(
                    f"{url}/v1/audio/transcriptions",
                    files=files,
                    data=data,
                    headers=headers,
                )

                assert response.status_code == 200, f"Request failed: {response.text}"
                result = response.json()

                # Verify response structure for batch mode
                assert "results" in result
                assert isinstance(result["results"], list)
                assert len(result["results"]) == 3

                for batch_result in result["results"]:
                    assert "text" in batch_result
                    assert "words" in batch_result
                    assert "filename" in batch_result
                    assert isinstance(batch_result["text"], str)
                    assert isinstance(batch_result["words"], list)
                    assert len(batch_result["text"]) > 0
            finally:
                # Close file handles even if the request fails, then clean up
                for _, file_tuple in files:
                    file_tuple[1].close()
                for f in temp_files:
                    Path(f).unlink(missing_ok=True)
    def test_transcriptions_error_handling(self):
        """Test error handling for invalid requests."""
        url = get_modal_transcript_url()
        headers = get_auth_headers()

        with httpx.Client(timeout=60.0) as client:
            # Test with unsupported language
            response = client.post(
                f"{url}/v1/audio/transcriptions-from-url",
                json={
                    "audio_file_url": TEST_AUDIO_URL,
                    "model": get_model_name(),
                    "language": "fr",  # Parakeet only supports English
                    "timestamp_offset": 0.0,
                },
                headers=headers,
            )

        assert response.status_code == 400
        assert "only supports English" in response.text
    def test_transcriptions_with_timestamp_offset(self):
        """Test transcription with timestamp offset parameter."""
        url = get_modal_transcript_url()
        headers = get_auth_headers()

        with httpx.Client(timeout=60.0) as client:
            # Test with timestamp offset
            response = client.post(
                f"{url}/v1/audio/transcriptions-from-url",
                json={
                    "audio_file_url": TEST_AUDIO_URL,
                    "model": get_model_name(),
                    "language": "en",
                    "timestamp_offset": 10.0,  # Add 10 second offset
                },
                headers=headers,
            )

        assert response.status_code == 200, f"Request failed: {response.text}"
        result = response.json()

        # Verify response structure
        assert "text" in result
        assert "words" in result
        assert len(result["words"]) > 0, "Words list must not be empty"

        # Verify that timestamps have been offset
        for word in result["words"]:
            # All timestamps should be >= 10.0 due to offset
            assert (
                word["start"] >= 10.0
            ), f"Word start time {word['start']} should be >= 10.0"
            assert (
                word["end"] >= 10.0
            ), f"Word end time {word['end']} should be >= 10.0"
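The offset test above only asserts that every word starts at or after 10 s. A minimal sketch of the behavior it implies is shown below, assuming the endpoint simply adds `timestamp_offset` to each word's `start` and `end`. The server code is not part of this diff, and both helper names are hypothetical.

```python
def apply_timestamp_offset(words: list[dict], offset: float) -> list[dict]:
    """Shift every word's start/end by `offset` seconds (returns new dicts)."""
    return [
        {**w, "start": w["start"] + offset, "end": w["end"] + offset} for w in words
    ]


def words_respect_offset(words: list[dict], offset: float) -> bool:
    """The property the offset test checks: no word begins before `offset`."""
    return all(offset <= w["start"] <= w["end"] for w in words)
```

Offsetting on the server lets a caller transcribe a chunk cut from the middle of a recording and still get word times on the full recording's timeline.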


@@ -0,0 +1,633 @@
"""
Tests for PipelineMainFile - file-based processing pipeline

This test verifies the complete file processing pipeline with minimal mocking,
ensuring all processors are correctly invoked and the happy path works.
"""

from pathlib import Path
from unittest.mock import AsyncMock, MagicMock, patch
from uuid import uuid4

import pytest

from reflector.pipelines.main_file_pipeline import PipelineMainFile
from reflector.processors.file_diarization import FileDiarizationOutput
from reflector.processors.types import (
    DiarizationSegment,
    TitleSummary,
    Word,
)
from reflector.processors.types import (
    Transcript as TranscriptType,
)


@pytest.fixture
async def dummy_file_transcript():
    """Mock FileTranscriptAutoProcessor for file processing"""
    from reflector.processors.file_transcript import FileTranscriptProcessor

    class TestFileTranscriptProcessor(FileTranscriptProcessor):
        async def _transcript(self, data):
            return TranscriptType(
                text="Hello world. How are you today?",
                words=[
                    Word(start=0.0, end=0.5, text="Hello", speaker=0),
                    Word(start=0.5, end=0.6, text=" ", speaker=0),
                    Word(start=0.6, end=1.0, text="world", speaker=0),
                    Word(start=1.0, end=1.1, text=".", speaker=0),
                    Word(start=1.1, end=1.2, text=" ", speaker=0),
                    Word(start=1.2, end=1.5, text="How", speaker=0),
                    Word(start=1.5, end=1.6, text=" ", speaker=0),
                    Word(start=1.6, end=1.8, text="are", speaker=0),
                    Word(start=1.8, end=1.9, text=" ", speaker=0),
                    Word(start=1.9, end=2.1, text="you", speaker=0),
                    Word(start=2.1, end=2.2, text=" ", speaker=0),
                    Word(start=2.2, end=2.5, text="today", speaker=0),
                    Word(start=2.5, end=2.6, text="?", speaker=0),
                ],
            )

    with patch(
        "reflector.processors.file_transcript_auto.FileTranscriptAutoProcessor.__new__"
    ) as mock_auto:
        mock_auto.return_value = TestFileTranscriptProcessor()
        yield


@pytest.fixture
async def dummy_file_diarization():
    """Mock FileDiarizationAutoProcessor for file processing"""
    from reflector.processors.file_diarization import FileDiarizationProcessor

    class TestFileDiarizationProcessor(FileDiarizationProcessor):
        async def _diarize(self, data):
            return FileDiarizationOutput(
                diarization=[
                    DiarizationSegment(start=0.0, end=1.1, speaker=0),
                    DiarizationSegment(start=1.2, end=2.6, speaker=1),
                ]
            )

    with patch(
        "reflector.processors.file_diarization_auto.FileDiarizationAutoProcessor.__new__"
    ) as mock_auto:
        mock_auto.return_value = TestFileDiarizationProcessor()
        yield


@pytest.fixture
async def mock_transcript_in_db(tmpdir):
    """Provide a transcript object and patch the controllers to return it"""
    from reflector.db.transcripts import Transcript
    from reflector.settings import settings

    # Set the DATA_DIR to our tmpdir
    original_data_dir = settings.DATA_DIR
    settings.DATA_DIR = str(tmpdir)

    transcript_id = str(uuid4())
    data_path = Path(tmpdir) / transcript_id
    data_path.mkdir(parents=True, exist_ok=True)

    # Create mock transcript object
    transcript = Transcript(
        id=transcript_id,
        name="Test Transcript",
        status="processing",
        source_kind="file",
        source_language="en",
        target_language="en",
    )

    # Mock the controller to return our transcript
    try:
        with patch(
            "reflector.pipelines.main_file_pipeline.transcripts_controller.get_by_id"
        ) as mock_get:
            mock_get.return_value = transcript
            with patch(
                "reflector.pipelines.main_live_pipeline.transcripts_controller.get_by_id"
            ) as mock_get2:
                mock_get2.return_value = transcript
                with patch(
                    "reflector.pipelines.main_live_pipeline.transcripts_controller.update"
                ) as mock_update:
                    mock_update.return_value = None
                    yield transcript
    finally:
        # Restore original DATA_DIR
        settings.DATA_DIR = original_data_dir


@pytest.fixture
async def mock_storage():
    """Mock storage for file uploads"""
    from reflector.storage.base import Storage

    class TestStorage(Storage):
        async def _put_file(self, path, data):
            return None

        async def _get_file_url(self, path):
            return f"http://test-storage/{path}"

        async def _get_file(self, path):
            return b"test_audio_data"

        async def _delete_file(self, path):
            return None

    storage = TestStorage()
    # Add mock tracking for verification
    storage._put_file = AsyncMock(side_effect=storage._put_file)
    storage._get_file_url = AsyncMock(side_effect=storage._get_file_url)

    with patch(
        "reflector.pipelines.main_file_pipeline.get_transcripts_storage"
    ) as mock_get:
        mock_get.return_value = storage
        yield storage


@pytest.fixture
async def mock_audio_file_writer():
    """Mock AudioFileWriterProcessor to avoid actual file writing"""
    with patch(
        "reflector.pipelines.main_file_pipeline.AudioFileWriterProcessor"
    ) as mock_writer_class:
        mock_writer = AsyncMock()
        mock_writer.push = AsyncMock()
        mock_writer.flush = AsyncMock()
        mock_writer_class.return_value = mock_writer
        yield mock_writer


@pytest.fixture
async def mock_waveform_processor():
    """Mock AudioWaveformProcessor"""
    with patch(
        "reflector.pipelines.main_file_pipeline.AudioWaveformProcessor"
    ) as mock_waveform_class:
        mock_waveform = AsyncMock()
        mock_waveform.set_pipeline = MagicMock()
        mock_waveform.flush = AsyncMock()
        mock_waveform_class.return_value = mock_waveform
        yield mock_waveform


@pytest.fixture
async def mock_topic_detector():
    """Mock TranscriptTopicDetectorProcessor"""
    with patch(
        "reflector.pipelines.main_file_pipeline.TranscriptTopicDetectorProcessor"
    ) as mock_topic_class:
        mock_topic = AsyncMock()
        mock_topic.set_pipeline = MagicMock()
        mock_topic.push = AsyncMock()
        mock_topic.flush_called = False

        # When flush is called, simulate topic detection by calling the callback
        async def flush_with_callback():
            mock_topic.flush_called = True
            if hasattr(mock_topic, "_callback"):
                # Create a minimal transcript for the TitleSummary
                test_transcript = TranscriptType(words=[], text="test transcript")
                await mock_topic._callback(
                    TitleSummary(
                        title="Test Topic",
                        summary="Test topic summary",
                        timestamp=0.0,
                        duration=10.0,
                        transcript=test_transcript,
                    )
                )

        mock_topic.flush = flush_with_callback

        def init_with_callback(callback=None):
            mock_topic._callback = callback
            return mock_topic

        mock_topic_class.side_effect = init_with_callback
        yield mock_topic


@pytest.fixture
async def mock_title_processor():
    """Mock TranscriptFinalTitleProcessor"""
    with patch(
        "reflector.pipelines.main_file_pipeline.TranscriptFinalTitleProcessor"
    ) as mock_title_class:
        mock_title = AsyncMock()
        mock_title.set_pipeline = MagicMock()
        mock_title.push = AsyncMock()
        mock_title.flush_called = False

        # When flush is called, simulate title generation by calling the callback
        async def flush_with_callback():
            mock_title.flush_called = True
            if hasattr(mock_title, "_callback"):
                from reflector.processors.types import FinalTitle

                await mock_title._callback(FinalTitle(title="Test Title"))

        mock_title.flush = flush_with_callback

        def init_with_callback(callback=None):
            mock_title._callback = callback
            return mock_title

        mock_title_class.side_effect = init_with_callback
        yield mock_title


@pytest.fixture
async def mock_summary_processor():
    """Mock TranscriptFinalSummaryProcessor"""
    with patch(
        "reflector.pipelines.main_file_pipeline.TranscriptFinalSummaryProcessor"
    ) as mock_summary_class:
        mock_summary = AsyncMock()
        mock_summary.set_pipeline = MagicMock()
        mock_summary.push = AsyncMock()
        mock_summary.flush_called = False

        # When flush is called, simulate summary generation by calling the callbacks
        async def flush_with_callback():
            mock_summary.flush_called = True
            from reflector.processors.types import FinalLongSummary, FinalShortSummary

            if hasattr(mock_summary, "_callback"):
                await mock_summary._callback(
                    FinalLongSummary(long_summary="Test long summary", duration=10.0)
                )
            if hasattr(mock_summary, "_on_short_summary"):
                await mock_summary._on_short_summary(
                    FinalShortSummary(short_summary="Test short summary", duration=10.0)
                )

        mock_summary.flush = flush_with_callback

        def init_with_callback(transcript=None, callback=None, on_short_summary=None):
            mock_summary._callback = callback
            mock_summary._on_short_summary = on_short_summary
            return mock_summary

        mock_summary_class.side_effect = init_with_callback
        yield mock_summary
@pytest.mark.asyncio
async def test_pipeline_main_file_process(
    tmpdir,
    mock_transcript_in_db,
    dummy_file_transcript,
    dummy_file_diarization,
    mock_storage,
    mock_audio_file_writer,
    mock_waveform_processor,
    mock_topic_detector,
    mock_title_processor,
    mock_summary_processor,
):
    """
    Test the complete PipelineMainFile processing pipeline.

    This test verifies:
    1. Audio extraction and writing
    2. Audio upload to storage
    3. Parallel processing of transcription, diarization, and waveform
    4. Assembly of transcript with diarization
    5. Topic detection
    6. Title and summary generation
    """
    # Locate the bundled test audio file
    test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav"

    # Copy test audio to the transcript's data path as if it had been uploaded
    upload_path = mock_transcript_in_db.data_path / "upload.wav"
    upload_path.write_bytes(test_audio_path.read_bytes())

    # Also create the audio.mp3 file that AudioFileWriterProcessor would produce;
    # since AudioFileWriterProcessor is mocked, create it manually
    mp3_path = mock_transcript_in_db.data_path / "audio.mp3"
    mp3_path.write_bytes(b"mock_mp3_data")

    # Track callback invocations
    callback_marks = {
        "on_status": [],
        "on_duration": [],
        "on_waveform": [],
        "on_topic": [],
        "on_title": [],
        "on_long_summary": [],
        "on_short_summary": [],
    }

    # Create the pipeline; its callbacks are wrapped below
    pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id)

    # Override callbacks to track invocations
    async def track_callback(name, data):
        callback_marks[name].append(data)
        # Delegate to the real callback: getattr on the class returns the plain
        # function, so the pipeline instance is passed explicitly as self
        original = getattr(PipelineMainFile, name)
        return await original(pipeline, data)

    for callback_name in callback_marks:
        setattr(
            pipeline,
            callback_name,
            # Bind the name as a default argument to avoid late binding in the loop
            lambda data, n=callback_name: track_callback(n, data),
        )

    # Mock av.open for audio processing
    with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av:
        # Mock container for checking video streams
        mock_container = MagicMock()
        mock_container.streams.video = []  # No video streams (audio only)
        mock_container.close = MagicMock()

        # Mock container for decoding audio frames
        mock_decode_container = MagicMock()
        mock_decode_container.decode.return_value = iter(
            [MagicMock()]
        )  # One mock audio frame
        mock_decode_container.close = MagicMock()

        # Return different containers for successive calls
        mock_av.side_effect = [mock_container, mock_decode_container]

        # Run the pipeline
        await pipeline.process(upload_path)

        # Verify audio extraction and writing
        assert mock_audio_file_writer.push.called
        assert mock_audio_file_writer.flush.called

        # Verify storage upload
        assert mock_storage._put_file.called
        assert mock_storage._get_file_url.called

        # Verify waveform generation
        assert mock_waveform_processor.flush.called
        assert mock_waveform_processor.set_pipeline.called

        # Verify topic detection
        assert mock_topic_detector.push.called
        assert mock_topic_detector.flush_called

        # Verify title generation
        assert mock_title_processor.push.called
        assert mock_title_processor.flush_called

        # Verify summary generation
        assert mock_summary_processor.push.called
        assert mock_summary_processor.flush_called

        # Verify callbacks were invoked
        assert len(callback_marks["on_topic"]) > 0, "Topic callback should be invoked"
        assert len(callback_marks["on_title"]) > 0, "Title callback should be invoked"
        assert (
            len(callback_marks["on_long_summary"]) > 0
        ), "Long summary callback should be invoked"
        assert (
            len(callback_marks["on_short_summary"]) > 0
        ), "Short summary callback should be invoked"

        print(f"Callback marks: {callback_marks}")

        # Verify the pipeline completed successfully
        assert pipeline.logger is not None
        print("PipelineMainFile test completed successfully!")


@pytest.mark.asyncio
async def test_pipeline_main_file_with_video(
    tmpdir,
    mock_transcript_in_db,
    dummy_file_transcript,
    dummy_file_diarization,
    mock_storage,
    mock_audio_file_writer,
    mock_waveform_processor,
    mock_topic_detector,
    mock_title_processor,
    mock_summary_processor,
):
    """
    Test PipelineMainFile with video input (verifies audio extraction).
    """
    # Locate the bundled test audio file
    test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav"

    # Copy test audio to the transcript's data path as if it were a video upload;
    # the bytes are WAV, but av.open is mocked below, so the mocked video stream
    # is what drives the video code path
    upload_path = mock_transcript_in_db.data_path / "upload.mp4"
    upload_path.write_bytes(test_audio_path.read_bytes())

    # Also create the audio.mp3 file that AudioFileWriterProcessor would produce
    mp3_path = mock_transcript_in_db.data_path / "audio.mp3"
    mp3_path.write_bytes(b"mock_mp3_data")

    # Create pipeline
    pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id)

    # Mock av.open for video processing
    with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av:
        # Mock container for checking video streams
        mock_container = MagicMock()
        mock_container.streams.video = [MagicMock()]  # Has video streams
        mock_container.close = MagicMock()

        # Mock container for decoding audio frames
        mock_decode_container = MagicMock()
        mock_decode_container.decode.return_value = iter(
            [MagicMock()]
        )  # One mock audio frame
        mock_decode_container.close = MagicMock()

        # Return different containers for successive calls
        mock_av.side_effect = [mock_container, mock_decode_container]

        # Run the pipeline
        await pipeline.process(upload_path)

        # Verify audio extraction from video
        assert mock_audio_file_writer.push.called
        assert mock_audio_file_writer.flush.called

        # Verify the rest of the pipeline completed
        assert mock_storage._put_file.called
        assert mock_waveform_processor.flush.called
        assert mock_topic_detector.push.called
        assert mock_title_processor.push.called
        assert mock_summary_processor.push.called
        print("PipelineMainFile video test completed successfully!")


@pytest.mark.asyncio
async def test_pipeline_main_file_no_diarization(
    tmpdir,
    mock_transcript_in_db,
    dummy_file_transcript,
    mock_storage,
    mock_audio_file_writer,
    mock_waveform_processor,
    mock_topic_detector,
    mock_title_processor,
    mock_summary_processor,
):
    """
    Test PipelineMainFile with diarization disabled.
    """
    from reflector.settings import settings

    # Disable diarization
    with patch.object(settings, "DIARIZATION_BACKEND", None):
        # Locate the bundled test audio file
        test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav"

        # Copy test audio to the transcript's data path
        upload_path = mock_transcript_in_db.data_path / "upload.wav"
        upload_path.write_bytes(test_audio_path.read_bytes())

        # Also create the audio.mp3 file
        mp3_path = mock_transcript_in_db.data_path / "audio.mp3"
        mp3_path.write_bytes(b"mock_mp3_data")

        # Create pipeline
        pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id)

        # Mock av.open for audio processing
        with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av:
            # Mock container for checking video streams
            mock_container = MagicMock()
            mock_container.streams.video = []  # No video streams
            mock_container.close = MagicMock()

            # Mock container for decoding audio frames
            mock_decode_container = MagicMock()
            mock_decode_container.decode.return_value = iter([MagicMock()])
            mock_decode_container.close = MagicMock()

            # Return different containers for successive calls
            mock_av.side_effect = [mock_container, mock_decode_container]

            # Run the pipeline
            await pipeline.process(upload_path)

            # Verify the pipeline completed with diarization disabled
            assert mock_storage._put_file.called
            assert mock_waveform_processor.flush.called
            assert mock_topic_detector.push.called
            assert mock_title_processor.push.called
            assert mock_summary_processor.push.called
            print("PipelineMainFile no-diarization test completed successfully!")


@pytest.mark.asyncio
async def test_task_pipeline_file_process(
    tmpdir,
    mock_transcript_in_db,
    dummy_file_transcript,
    dummy_file_diarization,
    mock_storage,
    mock_audio_file_writer,
    mock_waveform_processor,
    mock_topic_detector,
    mock_title_processor,
    mock_summary_processor,
):
    """
    Test the Celery task entry point for file pipeline processing.

    The task is a thin wrapper around PipelineMainFile behind the asynctask
    decorator, so this test drives the pipeline directly.
    """
    # Copy the test audio file into the transcript's data path
    test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav"
    upload_path = mock_transcript_in_db.data_path / "upload.wav"
    upload_path.write_bytes(test_audio_path.read_bytes())

    # Also create the audio.mp3 file
    mp3_path = mock_transcript_in_db.data_path / "audio.mp3"
    mp3_path.write_bytes(b"mock_mp3_data")

    # Mock av.open for audio processing
    with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av:
        # Mock container for checking video streams
        mock_container = MagicMock()
        mock_container.streams.video = []  # No video streams
        mock_container.close = MagicMock()

        # Mock container for decoding audio frames
        mock_decode_container = MagicMock()
        mock_decode_container.decode.return_value = iter([MagicMock()])
        mock_decode_container.close = MagicMock()

        # Return different containers for successive calls
        mock_av.side_effect = [mock_container, mock_decode_container]

        # Call the pipeline directly instead of the decorated task
        from reflector.pipelines.main_file_pipeline import PipelineMainFile

        pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id)
        await pipeline.process(upload_path)

        # Verify the pipeline ran end to end
        assert mock_audio_file_writer.push.called
        assert mock_audio_file_writer.flush.called
        assert mock_storage._put_file.called
        assert mock_waveform_processor.flush.called
        assert mock_topic_detector.push.called
        assert mock_title_processor.push.called
        assert mock_summary_processor.push.called
        print("task_pipeline_file_process test completed successfully!")


@pytest.mark.asyncio
async def test_pipeline_file_process_no_transcript():
    """
    Test the pipeline with a non-existent transcript.
    """
    from reflector.pipelines.main_file_pipeline import PipelineMainFile

    # Mock the controller to return None (transcript not found)
    with patch(
        "reflector.pipelines.main_file_pipeline.transcripts_controller.get_by_id"
    ) as mock_get:
        mock_get.return_value = None
        pipeline = PipelineMainFile(transcript_id=str(uuid4()))

        # Should raise an exception for the missing transcript when get_transcript is called
        with pytest.raises(Exception, match="Transcript not found"):
            await pipeline.get_transcript()


@pytest.mark.asyncio
async def test_pipeline_file_process_no_audio_file(
    mock_transcript_in_db,
):
    """
    Test the pipeline when no audio file is found.
    """
    from reflector.pipelines.main_file_pipeline import PipelineMainFile

    # Don't create any audio files in the data path;
    # process() is expected to raise rather than succeed silently
    pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id)

    # Try to process a non-existent file
    non_existent_path = mock_transcript_in_db.data_path / "nonexistent.wav"

    # This should fail when av tries to open the file
    with pytest.raises(Exception):
        await pipeline.process(non_existent_path)
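
The processor fixtures above share one pattern: patch the processor class, give the class mock a `side_effect` that records the callbacks the pipeline passes to the constructor and returns one shared mock instance, then replace `flush` with a real coroutine that fires those callbacks. The self-contained sketch below reproduces that pattern outside reflector; `make_processor_mock`, `run_pipeline`, and the `"Test title"` payload are hypothetical names for illustration only, not part of the codebase.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_processor_mock(processor_class: MagicMock) -> AsyncMock:
    """Hypothetical helper: make each 'construction' capture its callback."""
    instance = AsyncMock()
    instance.flush_called = False

    def init_with_callback(callback=None):
        # side_effect receives the constructor arguments; its return value
        # becomes the result of calling the class mock
        instance._callback = callback
        return instance

    async def flush_with_callback():
        # A real coroutine replaces the auto-generated AsyncMock attribute,
        # so flushing actually delivers a result back to the caller
        instance.flush_called = True
        if instance._callback is not None:
            await instance._callback("Test title")

    processor_class.side_effect = init_with_callback
    instance.flush = flush_with_callback
    return instance


async def run_pipeline(processor_class, on_title) -> None:
    # Stand-in for the code under test: build a processor, then flush it
    processor = processor_class(callback=on_title)
    await processor.flush()


async def demo() -> tuple[list[str], bool]:
    titles: list[str] = []

    async def on_title(title: str) -> None:
        titles.append(title)

    processor_class = MagicMock()
    instance = make_processor_mock(processor_class)
    await run_pipeline(processor_class, on_title)
    return titles, instance.flush_called


print(asyncio.run(demo()))  # (['Test title'], True)
```

The `None` check inside `flush_with_callback` matters because `hasattr()` on a `Mock` is always true: the mock creates any missing attribute on first access.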

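`test_pipeline_main_file_process` wraps each callback with `lambda data, n=callback_name: ...`. The default argument matters: a lambda that closed over `callback_name` directly would look the name up when it is called, after the loop has finished, so every wrapper would record under the last key. Also, because `setattr` stores the wrapper on the instance, it is a plain function rather than a bound method, which is why `track_callback` fetches the original from the class and passes `pipeline` explicitly. The hypothetical `build_handlers` helper below contrasts the two lambda forms, reusing two callback names from the test.

```python
import asyncio


async def track(marks: dict[str, list], name: str, data: object) -> None:
    # Record the payload under the callback name it arrived through
    marks[name].append(data)


def build_handlers(
    names: list[str], marks: dict[str, list], bind_default: bool
) -> dict:
    handlers = {}
    for name in names:
        if bind_default:
            # n=name is evaluated now, so each lambda keeps its own name
            handlers[name] = lambda data, n=name: track(marks, n, data)
        else:
            # name is looked up at call time, i.e. after the loop has ended
            handlers[name] = lambda data: track(marks, name, data)
    return handlers


async def demo(bind_default: bool) -> dict[str, list]:
    names = ["on_topic", "on_title"]
    marks: dict[str, list] = {n: [] for n in names}
    handlers = build_handlers(names, marks, bind_default)
    await handlers["on_topic"]("topic-1")
    return marks


print(asyncio.run(demo(bind_default=True)))   # {'on_topic': ['topic-1'], 'on_title': []}
print(asyncio.run(demo(bind_default=False)))  # {'on_topic': [], 'on_title': ['topic-1']}
```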

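Two `unittest.mock` behaviours carry most of the `av.open` and `get_by_id` mocking above. A list assigned to `side_effect` makes successive calls return successive items, so, going by the tests' own comments, the first `av.open` call (the video-stream check) gets one container and the second (decoding) gets the other; a third call would raise `StopIteration`. And since Python 3.8, `patch` substitutes an `AsyncMock` when the target is an `async def`, so the configured `return_value` is what an awaited call produces. The sketch below shows both, assuming `get_by_id` is a coroutine function; the `Store` class and `load_transcript` are hypothetical stand-ins for `transcripts_controller` and the pipeline's lookup.

```python
import asyncio
from unittest.mock import MagicMock, patch


def demo_side_effect() -> list[str]:
    # Each call consumes the next item; an exhausted list raises StopIteration
    opener = MagicMock(side_effect=["probe-container", "decode-container"])
    results = [opener("upload.wav"), opener("upload.wav")]
    try:
        opener("upload.wav")
    except StopIteration:
        results.append("exhausted")
    return results


class Store:
    """Hypothetical stand-in for transcripts_controller."""

    async def get_by_id(self, transcript_id: str):
        return {"id": transcript_id}


store = Store()


async def load_transcript(transcript_id: str) -> dict:
    transcript = await store.get_by_id(transcript_id)
    if transcript is None:
        raise Exception("Transcript not found")
    return transcript


async def demo_async_patch() -> tuple[str, str]:
    # patch.object sees an async def and substitutes an AsyncMock, so the
    # configured return_value comes back when the call is awaited
    with patch.object(Store, "get_by_id") as mock_get:
        mock_get.return_value = None
        try:
            await load_transcript("missing")
        except Exception as exc:
            return type(mock_get).__name__, str(exc)
    return "not patched", ""


print(demo_side_effect())  # ['probe-container', 'decode-container', 'exhausted']
print(asyncio.run(demo_async_patch()))  # ('AsyncMock', 'Transcript not found')
```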
@@ -0,0 +1,265 @@
"""
Tests for Modal-based processors using pytest-recording for HTTP recording/playback

Note: these tests require a full Modal configuration to be able to record
VCR cassettes.

Configuration required for the first recording:
- TRANSCRIPT_BACKEND=modal
- TRANSCRIPT_URL=https://xxxxx--reflector-transcriber-parakeet-web.modal.run
- TRANSCRIPT_MODAL_API_KEY=xxxxx
- DIARIZATION_BACKEND=modal
- DIARIZATION_URL=https://xxxxx--reflector-diarizer-web.modal.run
- DIARIZATION_MODAL_API_KEY=xxxxx
"""

from unittest.mock import patch

import pytest

from reflector.processors.file_diarization import FileDiarizationInput
from reflector.processors.file_diarization_modal import FileDiarizationModalProcessor
from reflector.processors.file_transcript import FileTranscriptInput
from reflector.processors.file_transcript_modal import FileTranscriptModalProcessor
from reflector.processors.transcript_diarization_assembler import (
    TranscriptDiarizationAssemblerInput,
    TranscriptDiarizationAssemblerProcessor,
)
from reflector.processors.types import DiarizationSegment, Transcript, Word

# Public test audio file hosted on S3 specifically for reflector pytests
TEST_AUDIO_URL = (
    "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3"
)


@pytest.mark.asyncio
async def test_file_transcript_modal_processor_missing_url():
    with patch("reflector.processors.file_transcript_modal.settings") as mock_settings:
        mock_settings.TRANSCRIPT_URL = None
        with pytest.raises(Exception, match="TRANSCRIPT_URL required"):
            FileTranscriptModalProcessor(modal_api_key="test-api-key")


@pytest.mark.asyncio
async def test_file_diarization_modal_processor_missing_url():
    with patch("reflector.processors.file_diarization_modal.settings") as mock_settings:
        mock_settings.DIARIZATION_URL = None
        with pytest.raises(Exception, match="DIARIZATION_URL required"):
            FileDiarizationModalProcessor(modal_api_key="test-api-key")


@pytest.mark.vcr()
@pytest.mark.asyncio
async def test_file_diarization_modal_processor(vcr):
    """Test FileDiarizationModalProcessor using public audio URL and Modal API"""
    from reflector.settings import settings

    processor = FileDiarizationModalProcessor(
        modal_api_key=settings.DIARIZATION_MODAL_API_KEY
    )
    test_input = FileDiarizationInput(audio_url=TEST_AUDIO_URL)
    result = await processor._diarize(test_input)

    # Verify the result structure
    assert result is not None
    assert hasattr(result, "diarization")
    assert isinstance(result.diarization, list)

    # Check structure of each diarization segment
    for segment in result.diarization:
        assert "start" in segment
        assert "end" in segment
        assert "speaker" in segment
        assert isinstance(segment["start"], (int, float))
        assert isinstance(segment["end"], (int, float))
        assert isinstance(segment["speaker"], int)
        # Basic sanity check - start should be before end
        assert segment["start"] < segment["end"]


@pytest.mark.vcr()
@pytest.mark.asyncio
async def test_file_transcript_modal_processor():
    """Test FileTranscriptModalProcessor using public audio URL and Modal API"""
    from reflector.settings import settings

    processor = FileTranscriptModalProcessor(
        modal_api_key=settings.TRANSCRIPT_MODAL_API_KEY
    )
    test_input = FileTranscriptInput(
        audio_url=TEST_AUDIO_URL,
        language="en",
    )

    # This will record the HTTP interaction on first run, replay on subsequent runs
    result = await processor._transcript(test_input)

    # Verify the result structure
    assert result is not None
    assert hasattr(result, "words")
    assert isinstance(result.words, list)

    # Check structure of each word if present
    for word in result.words:
        assert hasattr(word, "text")
        assert hasattr(word, "start")
        assert hasattr(word, "end")
        assert isinstance(word.start, (int, float))
        assert isinstance(word.end, (int, float))
        assert isinstance(word.text, str)
        # Basic sanity check - start should be before or equal to end
        assert word.start <= word.end


@pytest.mark.asyncio
async def test_transcript_diarization_assembler_processor():
    """Test TranscriptDiarizationAssemblerProcessor without VCR (no HTTP requests)"""
    # Create test transcript with words
    words = [
        Word(text="Hello", start=0.0, end=1.0, speaker=0),
        Word(text=" ", start=1.0, end=1.1, speaker=0),
        Word(text="world", start=1.1, end=2.0, speaker=0),
        Word(text=".", start=2.0, end=2.1, speaker=0),
        Word(text=" ", start=2.1, end=2.2, speaker=0),
        Word(text="How", start=2.2, end=2.8, speaker=0),
        Word(text=" ", start=2.8, end=2.9, speaker=0),
        Word(text="are", start=2.9, end=3.2, speaker=0),
        Word(text=" ", start=3.2, end=3.3, speaker=0),
        Word(text="you", start=3.3, end=3.8, speaker=0),
        Word(text="?", start=3.8, end=3.9, speaker=0),
    ]
    transcript = Transcript(words=words)

    # Create test diarization segments
    diarization = [
        DiarizationSegment(start=0.0, end=2.1, speaker=0),
        DiarizationSegment(start=2.1, end=3.9, speaker=1),
    ]

    # Create processor and test input
    processor = TranscriptDiarizationAssemblerProcessor()
    test_input = TranscriptDiarizationAssemblerInput(
        transcript=transcript, diarization=diarization
    )

    # Track emitted results
    emitted_results = []

    async def capture_result(result):
        emitted_results.append(result)

    processor.on(capture_result)

    # Process the input
    await processor.push(test_input)

    # Verify result was emitted
    assert len(emitted_results) == 1
    result = emitted_results[0]

    # Verify result structure
    assert isinstance(result, Transcript)
    assert len(result.words) == len(words)

    # Verify speaker assignments were applied
    # Words 0-3 (indices) should be speaker 0 (time 0.0-2.1)
    # Words 4-10 (indices) should be speaker 1 (time 2.1-3.9)
    for i in range(4):  # First 4 words (Hello, space, world, .)
        assert (
            result.words[i].speaker == 0
        ), f"Word {i} '{result.words[i].text}' should be speaker 0, got {result.words[i].speaker}"
    for i in range(4, 11):  # Remaining words (space, How, space, are, space, you, ?)
        assert (
            result.words[i].speaker == 1
        ), f"Word {i} '{result.words[i].text}' should be speaker 1, got {result.words[i].speaker}"


@pytest.mark.asyncio
async def test_transcript_diarization_assembler_no_diarization():
    """Test TranscriptDiarizationAssemblerProcessor with no diarization data"""
    # Create test transcript
    words = [Word(text="Hello", start=0.0, end=1.0, speaker=0)]
    transcript = Transcript(words=words)

    # Create processor and test input with empty diarization
    processor = TranscriptDiarizationAssemblerProcessor()
    test_input = TranscriptDiarizationAssemblerInput(
        transcript=transcript, diarization=[]
    )

    # Track emitted results
    emitted_results = []

    async def capture_result(result):
        emitted_results.append(result)

    processor.on(capture_result)

    # Process the input
    await processor.push(test_input)

    # Verify original transcript was returned unchanged
    assert len(emitted_results) == 1
    result = emitted_results[0]
    assert result is transcript  # Should be the same object
    assert result.words[0].speaker == 0  # Original speaker unchanged


@pytest.mark.vcr()
@pytest.mark.asyncio
async def test_full_modal_pipeline_integration(vcr):
    """Integration test: Transcription -> Diarization -> Assembly

    This test demonstrates the full pipeline:
    1. Run transcription via Modal
    2. Run diarization via Modal
    3. Assemble transcript with diarization
    """
    from reflector.settings import settings

    # Step 1: Transcription
    transcript_processor = FileTranscriptModalProcessor(
        modal_api_key=settings.TRANSCRIPT_MODAL_API_KEY
    )
    transcript_input = FileTranscriptInput(audio_url=TEST_AUDIO_URL, language="en")
    transcript = await transcript_processor._transcript(transcript_input)

    # Step 2: Diarization
    diarization_processor = FileDiarizationModalProcessor(
        modal_api_key=settings.DIARIZATION_MODAL_API_KEY
    )
    diarization_input = FileDiarizationInput(audio_url=TEST_AUDIO_URL)
    diarization_result = await diarization_processor._diarize(diarization_input)

    # Step 3: Assembly
    assembler = TranscriptDiarizationAssemblerProcessor()
    assembly_input = TranscriptDiarizationAssemblerInput(
        transcript=transcript, diarization=diarization_result.diarization
    )

    # Track assembled result
    assembled_results = []

    async def capture_result(result):
        assembled_results.append(result)

    assembler.on(capture_result)
    await assembler.push(assembly_input)

    # Verify the full pipeline worked
    assert len(assembled_results) == 1
    final_transcript = assembled_results[0]

    # Verify the final transcript has the original words with updated speaker info
    assert isinstance(final_transcript, Transcript)
    assert len(final_transcript.words) == len(transcript.words)
    assert len(final_transcript.words) > 0

    # Verify some words have been assigned speakers from diarization
    speakers_found = {word.speaker for word in final_transcript.words}
    assert len(speakers_found) > 0  # At least some speaker assignments
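
`test_transcript_diarization_assembler_processor` pins down the expected behaviour: a word takes the speaker of the diarization segment it falls in (the `.` ending at 2.1 s stays with speaker 0, the space starting at 2.1 s moves to speaker 1), and `test_transcript_diarization_assembler_no_diarization` requires an empty diarization list to return the transcript untouched. The sketch below is one way to meet those expectations, matching each word's start time against half-open `[start, end)` segments. It is not the implementation of `TranscriptDiarizationAssemblerProcessor`; the simplified `Word` dataclass and the `assign_speakers` function are hypothetical, and `DiarizationSegment` only mirrors the field names the tests use.

```python
from dataclasses import dataclass, replace
from typing import TypedDict


class DiarizationSegment(TypedDict):
    # Field names mirror the tests: times in seconds, integer speaker id
    start: float
    end: float
    speaker: int


@dataclass(frozen=True)
class Word:
    # Simplified, hypothetical stand-in for reflector's Word type
    text: str
    start: float
    end: float
    speaker: int = 0


def assign_speakers(
    words: list[Word], diarization: list[DiarizationSegment]
) -> list[Word]:
    """Give each word the speaker of the segment containing its start time."""
    if not diarization:
        # Same contract as the no-diarization test: hand back the input as-is
        return words
    segments = sorted(diarization, key=lambda s: s["start"])
    result = []
    for word in words:
        speaker = word.speaker  # fallback: keep the existing label
        for seg in segments:
            # Half-open interval: a word starting exactly on a boundary
            # belongs to the later segment
            if seg["start"] <= word.start < seg["end"]:
                speaker = seg["speaker"]
                break
        result.append(replace(word, speaker=speaker))
    return result


words = [Word("Hi", 0.0, 0.5), Word("there", 2.5, 3.0)]
segments = [
    DiarizationSegment(start=0.0, end=2.0, speaker=0),
    DiarizationSegment(start=2.0, end=4.0, speaker=1),
]
print([w.speaker for w in assign_speakers(words, segments)])  # [0, 1]
```

Words outside every segment keep their previous label here; an assembler could instead pick the segment with the largest overlap or the nearest one, and this diff does not show which rule reflector applies.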

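The module docstring notes that recording cassettes needs real Modal API keys. pytest-recording reads a `vcr_config` fixture and passes it to VCR.py, whose `filter_headers` option removes the named request headers before a cassette is saved. A minimal `conftest.py` sketch, assuming the Modal endpoints receive the key in an `Authorization` header (the diff does not show how the key is sent) and that the project does not already define this fixture:

```python
# conftest.py (sketch)
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    # Assumption: the Modal API key travels in the Authorization header.
    # VCR.py drops the listed headers from requests before writing cassettes.
    return {"filter_headers": ["authorization"]}
```

The first recording can then be made by running the module with `pytest --record-mode=once`; later runs use pytest-recording's default `none` mode, which replays the cassette and rejects unrecorded requests.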

@@ -2,10 +2,13 @@ import pytest
@pytest.mark.asyncio
@pytest.mark.parametrize("enable_diarization", [False, True])
async def test_basic_process(
    dummy_transcript,
    dummy_llm,
    dummy_processors,
    enable_diarization,
    dummy_diarization,
):
    # goal is to start the server, and send rtc audio to it
    # validate the events received
@@ -28,12 +31,31 @@ async def test_basic_process(
    # invoke the process and capture events
    path = Path(__file__).parent / "records" / "test_mathieu_hello.wav"
    await process_audio_file(path.as_posix(), event_callback)
    print(marks)
    if enable_diarization:
        # Test with diarization - may fail if pyannote.audio is not installed
        try:
            await process_audio_file(
                path.as_posix(), event_callback, enable_diarization=True
            )
        except SystemExit:
            pytest.skip("pyannote.audio not installed - skipping diarization test")
    else:
        # Test without diarization - should always work
        await process_audio_file(
            path.as_posix(), event_callback, enable_diarization=False
        )
    print(f"Diarization: {enable_diarization}, Marks: {marks}")
    # validate the events
    assert marks["TranscriptLinerProcessor"] == 1
    assert marks["TranscriptTranslatorPassthroughProcessor"] == 1
    # Each processor should be called for each audio segment processed
    # The final processors (Topic, Title, Summary) should be called once at the end
    assert marks["TranscriptLinerProcessor"] > 0
    assert marks["TranscriptTranslatorPassthroughProcessor"] > 0
    assert marks["TranscriptTopicDetectorProcessor"] == 1
    assert marks["TranscriptFinalSummaryProcessor"] == 1
    assert marks["TranscriptFinalTitleProcessor"] == 1
    if enable_diarization:
        assert marks["TestAudioDiarizationProcessor"] == 1
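
`test_basic_process` turns a `SystemExit` from `process_audio_file` into a skip when `pyannote.audio` is missing. If a missing import is the only reason that exit can occur, pytest's `importorskip` states the same intent before any processing starts. This is an alternative technique, not what the PR does, and `require_module` is a hypothetical helper:

```python
import pytest


def require_module(name: str):
    """Return the imported module, or skip the calling test if it is missing."""
    # importorskip raises pytest.skip.Exception when the import fails, which
    # pytest reports as a skipped test rather than a failure
    return pytest.importorskip(name, reason=f"{name} not installed")
```

In the diarization branch this would be a single `require_module("pyannote.audio")` call ahead of `process_audio_file(..., enable_diarization=True)`. The trade-off: catching `SystemExit` also skips on exits raised for any other reason, whereas an import check only skips for the missing dependency itself.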