From 3ea7f6b7b63e1f9543252df22091302dcd187005 Mon Sep 17 00:00:00 2001
From: Mathieu Virbel
Date: Wed, 20 Aug 2025 20:07:19 -0600
Subject: [PATCH] feat: pipeline improvement with file processing, parakeet, silero-vad (#540)

* feat: improve pipeline threading and the transcriber (Parakeet and Silero VAD)
* refactor: remove whisperx, implement Parakeet
* refactor: make audio_chunker smarter: wait for speech instead of a fixed frame count
* refactor: make audio merge always downsample audio to 16 kHz for transcription
* refactor: make the audio transcript modal accept batches
* refactor: improve type safety and remove Prometheus metrics
  - Add DiarizationSegment TypedDict for proper diarization typing
  - Replace List/Optional with modern Python list / `| None` syntax
  - Remove all Prometheus metrics from TranscriptDiarizationAssemblerProcessor
  - Add comprehensive file processing pipeline with parallel execution
  - Update processor imports and type annotations throughout
  - Implement the optimized file pipeline as the default in the process.py tool
* refactor: convert FileDiarizationProcessor I/O types to BaseModel
  Update FileDiarizationInput and FileDiarizationOutput to inherit from BaseModel instead of plain classes, following the standard pattern used by other processors in the codebase.
* test: add tests for file transcript and diarization with pytest-recording
* build: add pytest-recording
* feat: add local pyannote for testing
* fix: replace PyAV AudioResampler with torchaudio for reliable audio processing
  - Replace the problematic PyAV AudioResampler that was causing ValueError: [Errno 22] Invalid argument
  - Use torchaudio.functional.resample for robust sample rate conversion
  - Optimize processing: skip conversion for audio that is already 16 kHz mono
  - Add direct WAV writing with Python's wave module for better performance
  - Consolidate duplicate downsample checks for cleaner code
  - Maintain the list[av.AudioFrame] input interface
  - Required for Silero VAD, which needs 16 kHz mono audio
* fix: replace PyAV AudioResampler with torchaudio solution
  - Resolves ValueError: [Errno 22] Invalid argument in AudioMergeProcessor
  - Replaces the problematic PyAV AudioResampler with torchaudio.functional.resample
  - Skips unnecessary conversions when audio is already 16 kHz mono
  - Uses direct WAV writing with Python's wave module for better performance
  - Fixes test_basic_process to disable diarization (pyannote dependency not installed)
  - Updates test expectations to match actual processor behavior
  - Removes the unused pydub dependency from pyproject.toml
  - Adds TEST_ANALYSIS.md documenting the test suite status
* feat: add parameterized test for both diarization modes
  - Adds @pytest.mark.parametrize to test_basic_process with enable_diarization=[False, True]
  - The diarization=False case always passes (tests core AudioMergeProcessor functionality)
  - The diarization=True case gracefully skips when pyannote.audio is not installed
  - Provides test coverage for both pipeline configurations
* fix: resolve pipeline property naming conflict in AudioDiarizationPyannoteProcessor
  - Renames the 'pipeline' property to 'diarization_pipeline' to avoid a conflict with the base Processor.pipeline attribute
  - Fixes AttributeError: "property 'pipeline' object has no setter" when set_pipeline() is called
  - Updates property usage in _diarize to the new name
  - Now correctly supports pipeline initialization for diarization processing
* fix: add local dependency group for pyannote
* test: add diarization test
* fix: resampling on audio merge now works
* fix: correctly restore timestamps
* fix: log the exception raised inside a threaded processor, if one occurs
* Update pyproject.toml
* ci: remove option
* ci: update astral-sh/setup-uv
* test: add the Monadical URL for pytest-recording
* refactor: remove previous version
* build: move faster-whisper to the local dependency group
* test: fix missing import
* refactor: improve main_file_pipeline organization and error handling
  - Move all imports to the top of the file
  - Create a unified EmptyPipeline class to replace duplicated mock pipeline code
  - Remove timeout and fallback logic; let processors handle their own retries
  - Fix error handling to raise any exception from parallel tasks
  - Add proper type hints and validation for captured results
* fix: wrong function
* fix: remove task_done
* feat: add configurable file processing timeouts for modal processors
  - Add TRANSCRIPT_FILE_TIMEOUT setting (default: 600s) for file transcription
  - Add DIARIZATION_FILE_TIMEOUT setting (default: 600s) for file diarization
  - Replace the hardcoded timeout=600 with configurable settings in the modal processors
  - Allows customizing timeout values via environment variables
* fix: use logger
* fix: worker-processed meetings now use the file pipeline
* fix: topics not gathered
* refactor: remove prepare(); pipelines now work without it
* refactor: address review feedback from Igor
* test: add test_pipeline_main_file
* refactor: remove doc
* doc: add doc
* ci: update build to use a native arm64 builder
* fix: merge fixes
* refactor: changes from Igor's review, plus a test (not run by default) exercising the GPU Modal part
* ci: update to our own linux-amd64 runner
* ci: try the suggested mode=min cache setting
* fix: update the diarizer for the latest Modal API and use a volume
* fix: Modal file extension detection
* fix: run the diarizer on an A100
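Usage note (a sketch, not part of the diff below): the new file pipeline is exposed as a Celery task, `task_pipeline_file_process`, defined in `reflector/pipelines/main_file_pipeline.py`. Enqueuing it could look like the following; the transcript id is a placeholder, and the transcript's data_path is assumed to already contain an `upload.*` or `audio.*` file.

```python
# Hypothetical usage sketch: enqueue the file pipeline for an uploaded recording.
# The task locates upload.*/audio.* under the transcript's data_path, extracts
# the audio to MP3, then runs transcription, diarization and waveform generation
# in parallel before assembling topics, title and summaries.
from reflector.pipelines.main_file_pipeline import task_pipeline_file_process

task_pipeline_file_process.delay(transcript_id="<transcript-id>")
```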
---
 .github/workflows/deploy.yml                    |  76 +-
 .github/workflows/test_server.yml               |  36 +-
 server/gpu/modal_deployments/README.md          |  87 +-
 .../modal_deployments/reflector_diarizer.py     | 196 ++--
 .../reflector_transcriber_parakeet.py           | 622 ++++++++++++
 server/pyproject.toml                           |  11 +-
 .../reflector/pipelines/main_file_pipeline.py   | 375 +++++++
 .../reflector/pipelines/main_live_pipeline.py   |  22 +-
 server/reflector/pipelines/runner.py            |  14 +-
 server/reflector/processors/__init__.py         |   7 +
 server/reflector/processors/audio_chunker.py    | 322 +++++-
 .../reflector/processors/audio_diarization.py   |  15 +-
 .../processors/audio_diarization_pyannote.py    |  74 ++
 server/reflector/processors/audio_merge.py      |  94 +-
 .../processors/audio_transcript_modal.py        | 218 +++-
 server/reflector/processors/base.py             |  48 +-
 .../reflector/processors/file_diarization.py    |  33 +
 .../processors/file_diarization_auto.py         |  33 +
 .../processors/file_diarization_modal.py        |  57 ++
 .../reflector/processors/file_transcript.py     |  65 ++
 .../processors/file_transcript_auto.py          |  32 +
 .../processors/file_transcript_modal.py         |  74 ++
 .../transcript_diarization_assembler.py         |  45 +
 server/reflector/processors/types.py            |  11 +-
 server/reflector/settings.py                    |   5 +
 server/reflector/tools/process.py               | 312 +++++-
 server/reflector/worker/process.py              |   5 +-
 ...test_file_diarization_modal_processor.yaml   |  40 +
 .../test_file_transcript_modal_processor.yaml   |  46 +
 .../test_full_modal_pipeline_integration.yaml   |  84 ++
 server/tests/conftest.py                        |  24 +-
 server/tests/docker-compose.test.yml            |   6 +-
 server/tests/test_gpu_modal_transcript.py       | 330 ++++++
 server/tests/test_pipeline_main_file.py         | 633 ++++++++++++
 server/tests/test_processors_modal.py           | 265 +++++
 server/tests/test_processors_pipeline.py        |  30 +-
 server/uv.lock                                  | 937 
+++++++++++++++++- 37 files changed, 5086 insertions(+), 198 deletions(-) create mode 100644 server/gpu/modal_deployments/reflector_transcriber_parakeet.py create mode 100644 server/reflector/pipelines/main_file_pipeline.py create mode 100644 server/reflector/processors/audio_diarization_pyannote.py create mode 100644 server/reflector/processors/file_diarization.py create mode 100644 server/reflector/processors/file_diarization_auto.py create mode 100644 server/reflector/processors/file_diarization_modal.py create mode 100644 server/reflector/processors/file_transcript.py create mode 100644 server/reflector/processors/file_transcript_auto.py create mode 100644 server/reflector/processors/file_transcript_modal.py create mode 100644 server/reflector/processors/transcript_diarization_assembler.py create mode 100644 server/tests/cassettes/test_processors_modal/test_file_diarization_modal_processor.yaml create mode 100644 server/tests/cassettes/test_processors_modal/test_file_transcript_modal_processor.yaml create mode 100644 server/tests/cassettes/test_processors_modal/test_full_modal_pipeline_integration.yaml create mode 100644 server/tests/test_gpu_modal_transcript.py create mode 100644 server/tests/test_pipeline_main_file.py create mode 100644 server/tests/test_processors_modal.py diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index 1ab6a031..f38d6d78 100644 --- a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -8,18 +8,30 @@ env: ECR_REPOSITORY: reflector jobs: - deploy: - runs-on: ubuntu-latest + build: + strategy: + matrix: + include: + - platform: linux/amd64 + runner: linux-amd64 + arch: amd64 + - platform: linux/arm64 + runner: linux-arm64 + arch: arm64 + + runs-on: ${{ matrix.runner }} permissions: - deployments: write contents: read + outputs: + registry: ${{ steps.login-ecr.outputs.registry }} + steps: - - uses: actions/checkout@v3 + - uses: actions/checkout@v4 - name: Configure AWS credentials - uses: aws-actions/configure-aws-credentials@0e613a0980cbf65ed5b322eb7a1e075d28913a83 + uses: aws-actions/configure-aws-credentials@v4 with: aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} @@ -27,21 +39,51 @@ jobs: - name: Login to Amazon ECR id: login-ecr - uses: aws-actions/amazon-ecr-login@62f4f872db3836360b72999f4b87f1ff13310f3a - - - name: Set up QEMU - uses: docker/setup-qemu-action@v2 + uses: aws-actions/amazon-ecr-login@v2 - name: Set up Docker Buildx - uses: docker/setup-buildx-action@v2 + uses: docker/setup-buildx-action@v3 - - name: Build and push - id: docker_build - uses: docker/build-push-action@v4 + - name: Build and push ${{ matrix.arch }} + uses: docker/build-push-action@v5 with: context: server - platforms: linux/amd64,linux/arm64 + platforms: ${{ matrix.platform }} push: true - tags: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:latest - cache-from: type=gha - cache-to: type=gha,mode=max + tags: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:latest-${{ matrix.arch }} + cache-from: type=gha,scope=${{ matrix.arch }} + cache-to: type=gha,mode=max,scope=${{ matrix.arch }} + provenance: false + + create-manifest: + runs-on: ubuntu-latest + needs: [build] + + permissions: + deployments: write + contents: read + + steps: + - name: Configure AWS credentials + uses: aws-actions/configure-aws-credentials@v4 + with: + aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }} + aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + 
aws-region: ${{ env.AWS_REGION }} + + - name: Login to Amazon ECR + uses: aws-actions/amazon-ecr-login@v2 + + - name: Create and push multi-arch manifest + run: | + # Get the registry URL (since we can't easily access job outputs in matrix) + ECR_REGISTRY=$(aws ecr describe-registry --query 'registryId' --output text).dkr.ecr.${{ env.AWS_REGION }}.amazonaws.com + + docker manifest create \ + $ECR_REGISTRY/${{ env.ECR_REPOSITORY }}:latest \ + $ECR_REGISTRY/${{ env.ECR_REPOSITORY }}:latest-amd64 \ + $ECR_REGISTRY/${{ env.ECR_REPOSITORY }}:latest-arm64 + + docker manifest push $ECR_REGISTRY/${{ env.ECR_REPOSITORY }}:latest + + echo "✅ Multi-arch manifest pushed: $ECR_REGISTRY/${{ env.ECR_REPOSITORY }}:latest" diff --git a/.github/workflows/test_server.yml b/.github/workflows/test_server.yml index e45f3ab8..6a26798b 100644 --- a/.github/workflows/test_server.yml +++ b/.github/workflows/test_server.yml @@ -19,29 +19,39 @@ jobs: steps: - uses: actions/checkout@v4 - name: Install uv - uses: astral-sh/setup-uv@v3 + uses: astral-sh/setup-uv@v6 with: enable-cache: true working-directory: server - - name: Tests run: | cd server uv run -m pytest -v tests - docker: - runs-on: ubuntu-latest + docker-amd64: + runs-on: linux-amd64 steps: - uses: actions/checkout@v4 - - name: Set up QEMU - uses: docker/setup-qemu-action@v2 - name: Set up Docker Buildx - uses: docker/setup-buildx-action@v2 - - name: Build and push - id: docker_build - uses: docker/build-push-action@v4 + uses: docker/setup-buildx-action@v3 + - name: Build AMD64 + uses: docker/build-push-action@v6 with: context: server - platforms: linux/amd64,linux/arm64 - cache-from: type=gha - cache-to: type=gha,mode=max + platforms: linux/amd64 + cache-from: type=gha,scope=amd64 + cache-to: type=gha,mode=min,scope=amd64 + + docker-arm64: + runs-on: linux-arm64 + steps: + - uses: actions/checkout@v4 + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + - name: Build ARM64 + uses: docker/build-push-action@v6 + with: + context: server + platforms: linux/arm64 + cache-from: type=gha,scope=arm64 + cache-to: type=gha,mode=min,scope=arm64 diff --git a/server/gpu/modal_deployments/README.md b/server/gpu/modal_deployments/README.md index 83309f49..06e6e419 100644 --- a/server/gpu/modal_deployments/README.md +++ b/server/gpu/modal_deployments/README.md @@ -4,7 +4,8 @@ This repository hold an API for the GPU implementation of the Reflector API serv and use [Modal.com](https://modal.com) - `reflector_diarizer.py` - Diarization API -- `reflector_transcriber.py` - Transcription API +- `reflector_transcriber.py` - Transcription API (Whisper) +- `reflector_transcriber_parakeet.py` - Transcription API (NVIDIA Parakeet) - `reflector_translator.py` - Translation API ## Modal.com deployment @@ -19,6 +20,10 @@ $ modal deploy reflector_transcriber.py ... └── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run +$ modal deploy reflector_transcriber_parakeet.py +... +└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run + $ modal deploy reflector_llm.py ... └── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run @@ -68,6 +73,86 @@ Authorization: bearer ### Transcription +#### Parakeet Transcriber (`reflector_transcriber_parakeet.py`) + +NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription with superior word-level timestamps. 
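+
+A minimal client sketch (the deployment URL is the placeholder printed by `modal deploy`, and the API key is assumed to come from the `reflector-gpu` secret; the endpoint and its fields are documented below):
+
+```python
+import requests
+
+# Single-file transcription against a deployed Parakeet endpoint (assumed URL).
+resp = requests.post(
+    "https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions",
+    headers={"Authorization": "bearer <REFLECTOR_GPU_APIKEY>"},
+    files={"file": open("audio.mp3", "rb")},
+    data={"language": "en"},
+)
+print(resp.json()["text"])
+```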
+ +**GPU Configuration:** +- **A10G GPU** - Used for `/v1/audio/transcriptions` endpoint (small files, live transcription) + - Higher concurrency (max_inputs=10) + - Optimized for multiple small audio files + - Supports batch processing for efficiency + +- **L40S GPU** - Used for `/v1/audio/transcriptions-from-url` endpoint (large files) + - Lower concurrency but more powerful processing + - Optimized for single large audio files + - VAD-based chunking for long-form audio + +##### `/v1/audio/transcriptions` - Small file transcription + +**request** (multipart/form-data) +- `file` or `files[]` - audio file(s) to transcribe +- `model` - model name (default: `nvidia/parakeet-tdt-0.6b-v2`) +- `language` - language code (default: `en`) +- `batch` - whether to use batch processing for multiple files (default: `true`) + +**response** +```json +{ + "text": "transcribed text", + "words": [ + {"word": "hello", "start": 0.0, "end": 0.5}, + {"word": "world", "start": 0.5, "end": 1.0} + ], + "filename": "audio.mp3" +} +``` + +For multiple files with batch=true: +```json +{ + "results": [ + { + "filename": "audio1.mp3", + "text": "transcribed text", + "words": [...] + }, + { + "filename": "audio2.mp3", + "text": "transcribed text", + "words": [...] + } + ] +} +``` + +##### `/v1/audio/transcriptions-from-url` - Large file transcription + +**request** (application/json) +```json +{ + "audio_file_url": "https://example.com/audio.mp3", + "model": "nvidia/parakeet-tdt-0.6b-v2", + "language": "en", + "timestamp_offset": 0.0 +} +``` + +**response** +```json +{ + "text": "transcribed text from large file", + "words": [ + {"word": "hello", "start": 0.0, "end": 0.5}, + {"word": "world", "start": 0.5, "end": 1.0} + ] +} +``` + +**Supported file types:** mp3, mp4, mpeg, mpga, m4a, wav, webm + +#### Whisper Transcriber (`reflector_transcriber.py`) + `POST /transcribe` **request** (multipart/form-data) diff --git a/server/gpu/modal_deployments/reflector_diarizer.py b/server/gpu/modal_deployments/reflector_diarizer.py index 639b983e..e9a4be46 100644 --- a/server/gpu/modal_deployments/reflector_diarizer.py +++ b/server/gpu/modal_deployments/reflector_diarizer.py @@ -4,14 +4,80 @@ Reflector GPU backend - diarizer """ import os +import uuid +from typing import Mapping, NewType +from urllib.parse import urlparse -import modal.gpu -from modal import App, Image, Secret, asgi_app, enter, method -from pydantic import BaseModel +import modal PYANNOTE_MODEL_NAME: str = "pyannote/speaker-diarization-3.1" MODEL_DIR = "/root/diarization_models" -app = App(name="reflector-diarizer") +UPLOADS_PATH = "/uploads" +SUPPORTED_FILE_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"] + +DiarizerUniqFilename = NewType("DiarizerUniqFilename", str) +AudioFileExtension = NewType("AudioFileExtension", str) + +app = modal.App(name="reflector-diarizer") + +# Volume for temporary file uploads +upload_volume = modal.Volume.from_name("diarizer-uploads", create_if_missing=True) + + +def detect_audio_format(url: str, headers: Mapping[str, str]) -> AudioFileExtension: + parsed_url = urlparse(url) + url_path = parsed_url.path + + for ext in SUPPORTED_FILE_EXTENSIONS: + if url_path.lower().endswith(f".{ext}"): + return AudioFileExtension(ext) + + content_type = headers.get("content-type", "").lower() + if "audio/mpeg" in content_type or "audio/mp3" in content_type: + return AudioFileExtension("mp3") + if "audio/wav" in content_type: + return AudioFileExtension("wav") + if "audio/mp4" in content_type: + return AudioFileExtension("mp4") + + 
raise ValueError( + f"Unsupported audio format for URL: {url}. " + f"Supported extensions: {', '.join(SUPPORTED_FILE_EXTENSIONS)}" + ) + + +def download_audio_to_volume( + audio_file_url: str, +) -> tuple[DiarizerUniqFilename, AudioFileExtension]: + import requests + from fastapi import HTTPException + + print(f"Checking audio file at: {audio_file_url}") + response = requests.head(audio_file_url, allow_redirects=True) + if response.status_code == 404: + raise HTTPException(status_code=404, detail="Audio file not found") + + print(f"Downloading audio file from: {audio_file_url}") + response = requests.get(audio_file_url, allow_redirects=True) + + if response.status_code != 200: + print(f"Download failed with status {response.status_code}: {response.text}") + raise HTTPException( + status_code=response.status_code, + detail=f"Failed to download audio file: {response.status_code}", + ) + + audio_suffix = detect_audio_format(audio_file_url, response.headers) + unique_filename = DiarizerUniqFilename(f"{uuid.uuid4()}.{audio_suffix}") + file_path = f"{UPLOADS_PATH}/{unique_filename}" + + print(f"Writing file to: {file_path} (size: {len(response.content)} bytes)") + with open(file_path, "wb") as f: + f.write(response.content) + + upload_volume.commit() + print(f"File saved as: {unique_filename}") + return unique_filename, audio_suffix def migrate_cache_llm(): @@ -39,7 +105,7 @@ def download_pyannote_audio(): diarizer_image = ( - Image.debian_slim(python_version="3.10.8") + modal.Image.debian_slim(python_version="3.10.8") .pip_install( "pyannote.audio==3.1.0", "requests", @@ -55,7 +121,8 @@ diarizer_image = ( "hf-transfer", ) .run_function( - download_pyannote_audio, secrets=[Secret.from_name("my-huggingface-secret")] + download_pyannote_audio, + secrets=[modal.Secret.from_name("hf_token")], ) .run_function(migrate_cache_llm) .env( @@ -70,53 +137,60 @@ diarizer_image = ( @app.cls( - gpu=modal.gpu.A100(size="40GB"), + gpu="A100", timeout=60 * 30, - scaledown_window=60, - allow_concurrent_inputs=1, image=diarizer_image, + volumes={UPLOADS_PATH: upload_volume}, + enable_memory_snapshot=True, + experimental_options={"enable_gpu_snapshot": True}, + secrets=[ + modal.Secret.from_name("hf_token"), + ], ) +@modal.concurrent(max_inputs=1) class Diarizer: - @enter() + @modal.enter(snap=True) def enter(self): import torch from pyannote.audio import Pipeline self.use_gpu = torch.cuda.is_available() self.device = "cuda" if self.use_gpu else "cpu" + print(f"Using device: {self.device}") self.diarization_pipeline = Pipeline.from_pretrained( - PYANNOTE_MODEL_NAME, cache_dir=MODEL_DIR + PYANNOTE_MODEL_NAME, + cache_dir=MODEL_DIR, + use_auth_token=os.environ["HF_TOKEN"], ) self.diarization_pipeline.to(torch.device(self.device)) - @method() - def diarize(self, audio_data: str, audio_suffix: str, timestamp: float): - import tempfile - + @modal.method() + def diarize(self, filename: str, timestamp: float = 0.0): import torchaudio - with tempfile.NamedTemporaryFile("wb+", suffix=f".{audio_suffix}") as fp: - fp.write(audio_data) + upload_volume.reload() - print("Diarizing audio") - waveform, sample_rate = torchaudio.load(fp.name) - diarization = self.diarization_pipeline( - {"waveform": waveform, "sample_rate": sample_rate} + file_path = f"{UPLOADS_PATH}/{filename}" + if not os.path.exists(file_path): + raise FileNotFoundError(f"File not found: {file_path}") + + print(f"Diarizing audio from: {file_path}") + waveform, sample_rate = torchaudio.load(file_path) + diarization = self.diarization_pipeline( + {"waveform": 
waveform, "sample_rate": sample_rate} + ) + + words = [] + for diarization_segment, _, speaker in diarization.itertracks(yield_label=True): + words.append( + { + "start": round(timestamp + diarization_segment.start, 3), + "end": round(timestamp + diarization_segment.end, 3), + "speaker": int(speaker[-2:]), + } ) - - words = [] - for diarization_segment, _, speaker in diarization.itertracks( - yield_label=True - ): - words.append( - { - "start": round(timestamp + diarization_segment.start, 3), - "end": round(timestamp + diarization_segment.end, 3), - "speaker": int(speaker[-2:]), - } - ) - print("Diarization complete") - return {"diarization": words} + print("Diarization complete") + return {"diarization": words} # ------------------------------------------------------------------- @@ -127,17 +201,18 @@ class Diarizer: @app.function( timeout=60 * 10, scaledown_window=60 * 3, - allow_concurrent_inputs=40, secrets=[ - Secret.from_name("reflector-gpu"), + modal.Secret.from_name("reflector-gpu"), ], + volumes={UPLOADS_PATH: upload_volume}, image=diarizer_image, ) -@asgi_app() +@modal.concurrent(max_inputs=40) +@modal.asgi_app() def web(): - import requests from fastapi import Depends, FastAPI, HTTPException, status from fastapi.security import OAuth2PasswordBearer + from pydantic import BaseModel diarizerstub = Diarizer() @@ -153,35 +228,26 @@ def web(): headers={"WWW-Authenticate": "Bearer"}, ) - def validate_audio_file(audio_file_url: str): - # Check if the audio file exists - response = requests.head(audio_file_url, allow_redirects=True) - if response.status_code == 404: - raise HTTPException( - status_code=response.status_code, - detail="The audio file does not exist.", - ) - class DiarizationResponse(BaseModel): result: dict - @app.post( - "/diarize", dependencies=[Depends(apikey_auth), Depends(validate_audio_file)] - ) - def diarize( - audio_file_url: str, timestamp: float = 0.0 - ) -> HTTPException | DiarizationResponse: - # Currently the uploaded files are in mp3 format - audio_suffix = "mp3" + @app.post("/diarize", dependencies=[Depends(apikey_auth)]) + def diarize(audio_file_url: str, timestamp: float = 0.0) -> DiarizationResponse: + unique_filename, audio_suffix = download_audio_to_volume(audio_file_url) - print("Downloading audio file") - response = requests.get(audio_file_url, allow_redirects=True) - print("Audio file downloaded successfully") - - func = diarizerstub.diarize.spawn( - audio_data=response.content, audio_suffix=audio_suffix, timestamp=timestamp - ) - result = func.get() - return result + try: + func = diarizerstub.diarize.spawn( + filename=unique_filename, timestamp=timestamp + ) + result = func.get() + return result + finally: + try: + file_path = f"{UPLOADS_PATH}/{unique_filename}" + print(f"Deleting file: {file_path}") + os.remove(file_path) + upload_volume.commit() + except Exception as e: + print(f"Error cleaning up {unique_filename}: {e}") return app diff --git a/server/gpu/modal_deployments/reflector_transcriber_parakeet.py b/server/gpu/modal_deployments/reflector_transcriber_parakeet.py new file mode 100644 index 00000000..df53a0ae --- /dev/null +++ b/server/gpu/modal_deployments/reflector_transcriber_parakeet.py @@ -0,0 +1,622 @@ +import logging +import os +import sys +import threading +import uuid +from typing import Mapping, NewType +from urllib.parse import urlparse + +import modal + +MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v2" +SUPPORTED_FILE_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"] +SAMPLERATE = 16000 +UPLOADS_PATH = "/uploads" 
+CACHE_PATH = "/cache" +VAD_CONFIG = { + "max_segment_duration": 30.0, + "batch_max_files": 10, + "batch_max_duration": 5.0, + "min_segment_duration": 0.02, + "silence_padding": 0.5, + "window_size": 512, +} + +ParakeetUniqFilename = NewType("ParakeetUniqFilename", str) +AudioFileExtension = NewType("AudioFileExtension", str) + +app = modal.App("reflector-transcriber-parakeet") + +# Volume for caching model weights +model_cache = modal.Volume.from_name("parakeet-model-cache", create_if_missing=True) +# Volume for temporary file uploads +upload_volume = modal.Volume.from_name("parakeet-uploads", create_if_missing=True) + +image = ( + modal.Image.from_registry( + "nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04", add_python="3.12" + ) + .env( + { + "HF_HUB_ENABLE_HF_TRANSFER": "1", + "HF_HOME": "/cache", + "DEBIAN_FRONTEND": "noninteractive", + "CXX": "g++", + "CC": "g++", + } + ) + .apt_install("ffmpeg") + .pip_install( + "hf_transfer==0.1.9", + "huggingface_hub[hf-xet]==0.31.2", + "nemo_toolkit[asr]==2.3.0", + "cuda-python==12.8.0", + "fastapi==0.115.12", + "numpy<2", + "librosa==0.10.1", + "requests", + "silero-vad==5.1.0", + "torch", + ) + .entrypoint([]) # silence chatty logs by container on start +) + + +def detect_audio_format(url: str, headers: Mapping[str, str]) -> AudioFileExtension: + parsed_url = urlparse(url) + url_path = parsed_url.path + + for ext in SUPPORTED_FILE_EXTENSIONS: + if url_path.lower().endswith(f".{ext}"): + return AudioFileExtension(ext) + + content_type = headers.get("content-type", "").lower() + if "audio/mpeg" in content_type or "audio/mp3" in content_type: + return AudioFileExtension("mp3") + if "audio/wav" in content_type: + return AudioFileExtension("wav") + if "audio/mp4" in content_type: + return AudioFileExtension("mp4") + + raise ValueError( + f"Unsupported audio format for URL: {url}. " + f"Supported extensions: {', '.join(SUPPORTED_FILE_EXTENSIONS)}" + ) + + +def download_audio_to_volume( + audio_file_url: str, +) -> tuple[ParakeetUniqFilename, AudioFileExtension]: + import requests + from fastapi import HTTPException + + response = requests.head(audio_file_url, allow_redirects=True) + if response.status_code == 404: + raise HTTPException(status_code=404, detail="Audio file not found") + + response = requests.get(audio_file_url, allow_redirects=True) + response.raise_for_status() + + audio_suffix = detect_audio_format(audio_file_url, response.headers) + unique_filename = ParakeetUniqFilename(f"{uuid.uuid4()}.{audio_suffix}") + file_path = f"{UPLOADS_PATH}/{unique_filename}" + + with open(file_path, "wb") as f: + f.write(response.content) + + upload_volume.commit() + return unique_filename, audio_suffix + + +def pad_audio(audio_array, sample_rate: int = SAMPLERATE): + """Add 0.5 seconds of silence if audio is less than 500ms. 
+ + This is a workaround for a Parakeet bug where very short audio (<500ms) causes: + ValueError: `char_offsets`: [] and `processed_tokens`: [157, 834, 834, 841] + have to be of the same length + + See: https://github.com/NVIDIA/NeMo/issues/8451 + """ + import numpy as np + + audio_duration = len(audio_array) / sample_rate + if audio_duration < 0.5: + silence_samples = int(sample_rate * 0.5) + silence = np.zeros(silence_samples, dtype=np.float32) + return np.concatenate([audio_array, silence]) + return audio_array + + +@app.cls( + gpu="A10G", + timeout=600, + scaledown_window=300, + image=image, + volumes={CACHE_PATH: model_cache, UPLOADS_PATH: upload_volume}, + enable_memory_snapshot=True, + experimental_options={"enable_gpu_snapshot": True}, +) +@modal.concurrent(max_inputs=10) +class TranscriberParakeetLive: + @modal.enter(snap=True) + def enter(self): + import nemo.collections.asr as nemo_asr + + logging.getLogger("nemo_logger").setLevel(logging.CRITICAL) + + self.lock = threading.Lock() + self.model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME) + device = next(self.model.parameters()).device + print(f"Model is on device: {device}") + + @modal.method() + def transcribe_segment( + self, + filename: str, + ): + import librosa + + upload_volume.reload() + + file_path = f"{UPLOADS_PATH}/{filename}" + if not os.path.exists(file_path): + raise FileNotFoundError(f"File not found: {file_path}") + + audio_array, sample_rate = librosa.load(file_path, sr=SAMPLERATE, mono=True) + padded_audio = pad_audio(audio_array, sample_rate) + + with self.lock: + with NoStdStreams(): + (output,) = self.model.transcribe([padded_audio], timestamps=True) + + text = output.text.strip() + words = [ + { + "word": word_info["word"], + "start": round(word_info["start"], 2), + "end": round(word_info["end"], 2), + } + for word_info in output.timestamp["word"] + ] + + return {"text": text, "words": words} + + @modal.method() + def transcribe_batch( + self, + filenames: list[str], + ): + import librosa + + upload_volume.reload() + + results = [] + audio_arrays = [] + + # Load all audio files with padding + for filename in filenames: + file_path = f"{UPLOADS_PATH}/{filename}" + if not os.path.exists(file_path): + raise FileNotFoundError(f"Batch file not found: {file_path}") + + audio_array, sample_rate = librosa.load(file_path, sr=SAMPLERATE, mono=True) + padded_audio = pad_audio(audio_array, sample_rate) + audio_arrays.append(padded_audio) + + with self.lock: + with NoStdStreams(): + outputs = self.model.transcribe(audio_arrays, timestamps=True) + + # Process results for each file + for i, (filename, output) in enumerate(zip(filenames, outputs)): + text = output.text.strip() + + words = [ + { + "word": word_info["word"], + "start": round(word_info["start"], 2), + "end": round(word_info["end"], 2), + } + for word_info in output.timestamp["word"] + ] + + results.append( + { + "filename": filename, + "text": text, + "words": words, + } + ) + + return results + + +# L40S class for file transcription (bigger files) +@app.cls( + gpu="L40S", + timeout=900, + image=image, + volumes={CACHE_PATH: model_cache, UPLOADS_PATH: upload_volume}, + enable_memory_snapshot=True, + experimental_options={"enable_gpu_snapshot": True}, +) +class TranscriberParakeetFile: + @modal.enter(snap=True) + def enter(self): + import nemo.collections.asr as nemo_asr + import torch + from silero_vad import load_silero_vad + + logging.getLogger("nemo_logger").setLevel(logging.CRITICAL) + + self.model = 
nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME) + device = next(self.model.parameters()).device + print(f"Model is on device: {device}") + + torch.set_num_threads(1) + self.vad_model = load_silero_vad(onnx=False) + print("Silero VAD initialized") + + @modal.method() + def transcribe_segment( + self, + filename: str, + timestamp_offset: float = 0.0, + ): + import librosa + import numpy as np + from silero_vad import VADIterator + + def load_and_convert_audio(file_path): + audio_array, sample_rate = librosa.load(file_path, sr=SAMPLERATE, mono=True) + return audio_array + + def vad_segment_generator(audio_array): + """Generate speech segments using VAD with start/end sample indices""" + vad_iterator = VADIterator(self.vad_model, sampling_rate=SAMPLERATE) + window_size = VAD_CONFIG["window_size"] + start = None + + for i in range(0, len(audio_array), window_size): + chunk = audio_array[i : i + window_size] + if len(chunk) < window_size: + chunk = np.pad( + chunk, (0, window_size - len(chunk)), mode="constant" + ) + + speech_dict = vad_iterator(chunk) + if not speech_dict: + continue + + if "start" in speech_dict: + start = speech_dict["start"] + continue + + if "end" in speech_dict and start is not None: + end = speech_dict["end"] + start_time = start / float(SAMPLERATE) + end_time = end / float(SAMPLERATE) + + # Extract the actual audio segment + audio_segment = audio_array[start:end] + + yield (start_time, end_time, audio_segment) + start = None + + vad_iterator.reset_states() + + def vad_segment_filter(segments): + """Filter VAD segments by duration and chunk large segments""" + min_dur = VAD_CONFIG["min_segment_duration"] + max_dur = VAD_CONFIG["max_segment_duration"] + + for start_time, end_time, audio_segment in segments: + segment_duration = end_time - start_time + + # Skip very small segments + if segment_duration < min_dur: + continue + + # If segment is within max duration, yield as-is + if segment_duration <= max_dur: + yield (start_time, end_time, audio_segment) + continue + + # Chunk large segments into smaller pieces + chunk_samples = int(max_dur * SAMPLERATE) + current_start = start_time + + for chunk_offset in range(0, len(audio_segment), chunk_samples): + chunk_audio = audio_segment[ + chunk_offset : chunk_offset + chunk_samples + ] + if len(chunk_audio) == 0: + break + + chunk_duration = len(chunk_audio) / float(SAMPLERATE) + chunk_end = current_start + chunk_duration + + # Only yield chunks that meet minimum duration + if chunk_duration >= min_dur: + yield (current_start, chunk_end, chunk_audio) + + current_start = chunk_end + + def batch_segments(segments, max_files=10, max_duration=5.0): + batch = [] + batch_duration = 0.0 + + for start_time, end_time, audio_segment in segments: + segment_duration = end_time - start_time + + if segment_duration < VAD_CONFIG["silence_padding"]: + silence_samples = int( + (VAD_CONFIG["silence_padding"] - segment_duration) * SAMPLERATE + ) + padding = np.zeros(silence_samples, dtype=np.float32) + audio_segment = np.concatenate([audio_segment, padding]) + segment_duration = VAD_CONFIG["silence_padding"] + + batch.append((start_time, end_time, audio_segment)) + batch_duration += segment_duration + + if len(batch) >= max_files or batch_duration >= max_duration: + yield batch + batch = [] + batch_duration = 0.0 + + if batch: + yield batch + + def transcribe_batch(model, audio_segments): + with NoStdStreams(): + outputs = model.transcribe(audio_segments, timestamps=True) + return outputs + + def emit_results( + results, + 
segments_info, + batch_index, + total_batches, + ): + """Yield transcribed text and word timings from model output, adjusting timestamps to absolute positions.""" + for i, (output, (start_time, end_time, _)) in enumerate( + zip(results, segments_info) + ): + text = output.text.strip() + words = [ + { + "word": word_info["word"], + "start": round( + word_info["start"] + start_time + timestamp_offset, 2 + ), + "end": round( + word_info["end"] + start_time + timestamp_offset, 2 + ), + } + for word_info in output.timestamp["word"] + ] + + yield text, words + + upload_volume.reload() + + file_path = f"{UPLOADS_PATH}/{filename}" + if not os.path.exists(file_path): + raise FileNotFoundError(f"File not found: {file_path}") + + audio_array = load_and_convert_audio(file_path) + total_duration = len(audio_array) / float(SAMPLERATE) + processed_duration = 0.0 + + all_text_parts = [] + all_words = [] + + raw_segments = vad_segment_generator(audio_array) + filtered_segments = vad_segment_filter(raw_segments) + batches = batch_segments( + filtered_segments, + VAD_CONFIG["batch_max_files"], + VAD_CONFIG["batch_max_duration"], + ) + + batch_index = 0 + total_batches = max( + 1, int(total_duration / VAD_CONFIG["batch_max_duration"]) + 1 + ) + + for batch in batches: + batch_index += 1 + audio_segments = [seg[2] for seg in batch] + results = transcribe_batch(self.model, audio_segments) + + for text, words in emit_results( + results, + batch, + batch_index, + total_batches, + ): + if not text: + continue + all_text_parts.append(text) + all_words.extend(words) + + processed_duration += sum(len(seg[2]) / float(SAMPLERATE) for seg in batch) + + combined_text = " ".join(all_text_parts) + return {"text": combined_text, "words": all_words} + + +@app.function( + scaledown_window=60, + timeout=600, + secrets=[ + modal.Secret.from_name("reflector-gpu"), + ], + volumes={CACHE_PATH: model_cache, UPLOADS_PATH: upload_volume}, + image=image, +) +@modal.concurrent(max_inputs=40) +@modal.asgi_app() +def web(): + import os + import uuid + + from fastapi import ( + Body, + Depends, + FastAPI, + Form, + HTTPException, + UploadFile, + status, + ) + from fastapi.security import OAuth2PasswordBearer + from pydantic import BaseModel + + transcriber_live = TranscriberParakeetLive() + transcriber_file = TranscriberParakeetFile() + + app = FastAPI() + + oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token") + + def apikey_auth(apikey: str = Depends(oauth2_scheme)): + if apikey == os.environ["REFLECTOR_GPU_APIKEY"]: + return + raise HTTPException( + status_code=status.HTTP_401_UNAUTHORIZED, + detail="Invalid API key", + headers={"WWW-Authenticate": "Bearer"}, + ) + + class TranscriptResponse(BaseModel): + result: dict + + @app.post("/v1/audio/transcriptions", dependencies=[Depends(apikey_auth)]) + def transcribe( + file: UploadFile = None, + files: list[UploadFile] | None = None, + model: str = Form(MODEL_NAME), + language: str = Form("en"), + batch: bool = Form(False), + ): + # Parakeet only supports English + if language != "en": + raise HTTPException( + status_code=400, + detail=f"Parakeet model only supports English. 
Got language='{language}'", + ) + # Handle both single file and multiple files + if not file and not files: + raise HTTPException( + status_code=400, detail="Either 'file' or 'files' parameter is required" + ) + if batch and not files: + raise HTTPException( + status_code=400, detail="Batch transcription requires 'files'" + ) + + upload_files = [file] if file else files + + # Upload files to volume + uploaded_filenames = [] + for upload_file in upload_files: + audio_suffix = upload_file.filename.split(".")[-1] + assert audio_suffix in SUPPORTED_FILE_EXTENSIONS + + # Generate unique filename + unique_filename = f"{uuid.uuid4()}.{audio_suffix}" + file_path = f"{UPLOADS_PATH}/{unique_filename}" + + print(f"Writing file to: {file_path}") + with open(file_path, "wb") as f: + content = upload_file.file.read() + f.write(content) + + uploaded_filenames.append(unique_filename) + + upload_volume.commit() + + try: + # Use A10G live transcriber for per-file transcription + if batch and len(upload_files) > 1: + # Use batch transcription + func = transcriber_live.transcribe_batch.spawn( + filenames=uploaded_filenames, + ) + results = func.get() + return {"results": results} + + # Per-file transcription + results = [] + for filename in uploaded_filenames: + func = transcriber_live.transcribe_segment.spawn( + filename=filename, + ) + result = func.get() + result["filename"] = filename + results.append(result) + + return {"results": results} if len(results) > 1 else results[0] + + finally: + for filename in uploaded_filenames: + try: + file_path = f"{UPLOADS_PATH}/{filename}" + print(f"Deleting file: {file_path}") + os.remove(file_path) + except Exception as e: + print(f"Error deleting {filename}: {e}") + + upload_volume.commit() + + @app.post("/v1/audio/transcriptions-from-url", dependencies=[Depends(apikey_auth)]) + def transcribe_from_url( + audio_file_url: str = Body( + ..., description="URL of the audio file to transcribe" + ), + model: str = Body(MODEL_NAME), + language: str = Body("en", description="Language code (only 'en' supported)"), + timestamp_offset: float = Body(0.0), + ): + # Parakeet only supports English + if language != "en": + raise HTTPException( + status_code=400, + detail=f"Parakeet model only supports English. 
Got language='{language}'", + ) + unique_filename, audio_suffix = download_audio_to_volume(audio_file_url) + + try: + func = transcriber_file.transcribe_segment.spawn( + filename=unique_filename, + timestamp_offset=timestamp_offset, + ) + result = func.get() + return result + finally: + try: + file_path = f"{UPLOADS_PATH}/{unique_filename}" + print(f"Deleting file: {file_path}") + os.remove(file_path) + upload_volume.commit() + except Exception as e: + print(f"Error cleaning up {unique_filename}: {e}") + + return app + + +class NoStdStreams: + def __init__(self): + self.devnull = open(os.devnull, "w") + + def __enter__(self): + self._stdout, self._stderr = sys.stdout, sys.stderr + self._stdout.flush() + self._stderr.flush() + sys.stdout, sys.stderr = self.devnull, self.devnull + + def __exit__(self, exc_type, exc_value, traceback): + sys.stdout, sys.stderr = self._stdout, self._stderr + self.devnull.close() diff --git a/server/pyproject.toml b/server/pyproject.toml index f7f97dbc..8703210c 100644 --- a/server/pyproject.toml +++ b/server/pyproject.toml @@ -32,7 +32,6 @@ dependencies = [ "redis>=5.0.1", "python-jose[cryptography]>=3.3.0", "python-multipart>=0.0.6", - "faster-whisper>=0.10.0", "transformers>=4.36.2", "jsonschema>=4.23.0", "openai>=1.59.7", @@ -41,6 +40,7 @@ dependencies = [ "llama-index-llms-openai-like>=0.4.0", "pytest-env>=1.1.5", "webvtt-py>=0.5.0", + "silero-vad>=5.1.2", ] [dependency-groups] @@ -57,6 +57,7 @@ tests = [ "httpx-ws>=0.4.1", "pytest-httpx>=0.23.1", "pytest-celery>=0.0.0", + "pytest-recording>=0.13.4", "pytest-docker>=3.2.3", "asgi-lifespan>=2.1.0", ] @@ -67,6 +68,10 @@ evaluation = [ "tqdm>=4.66.0", "pydantic>=2.1.1", ] +local = [ + "pyannote-audio>=3.3.2", + "faster-whisper>=0.10.0", +] [tool.uv] default-groups = [ @@ -74,6 +79,7 @@ default-groups = [ "tests", "aws", "evaluation", + "local" ] [build-system] @@ -94,6 +100,9 @@ DATABASE_URL = "postgresql://test_user:test_password@localhost:15432/reflector_t addopts = "-ra -q --disable-pytest-warnings --cov --cov-report html -v" testpaths = ["tests"] asyncio_mode = "auto" +markers = [ + "gpu_modal: mark test to run only with GPU Modal endpoints (deselect with '-m \"not gpu_modal\"')", +] [tool.ruff.lint] select = [ diff --git a/server/reflector/pipelines/main_file_pipeline.py b/server/reflector/pipelines/main_file_pipeline.py new file mode 100644 index 00000000..f2c8fb85 --- /dev/null +++ b/server/reflector/pipelines/main_file_pipeline.py @@ -0,0 +1,375 @@ +""" +File-based processing pipeline +============================== + +Optimized pipeline for processing complete audio/video files. +Uses parallel processing for transcription, diarization, and waveform generation. 
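+
+Processing phases (see PipelineMainFile.run_parallel_processing below):
+transcription, diarization and waveform generation run in parallel; the
+diarized transcript is then assembled, topics are detected, and finally the
+title and summaries are generated.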
+""" + +import asyncio +from pathlib import Path + +import av +import structlog +from celery import shared_task + +from reflector.db.transcripts import ( + Transcript, + transcripts_controller, +) +from reflector.logger import logger +from reflector.pipelines.main_live_pipeline import PipelineMainBase, asynctask +from reflector.processors import ( + AudioFileWriterProcessor, + TranscriptFinalSummaryProcessor, + TranscriptFinalTitleProcessor, + TranscriptTopicDetectorProcessor, +) +from reflector.processors.audio_waveform_processor import AudioWaveformProcessor +from reflector.processors.file_diarization import FileDiarizationInput +from reflector.processors.file_diarization_auto import FileDiarizationAutoProcessor +from reflector.processors.file_transcript import FileTranscriptInput +from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor +from reflector.processors.transcript_diarization_assembler import ( + TranscriptDiarizationAssemblerInput, + TranscriptDiarizationAssemblerProcessor, +) +from reflector.processors.types import ( + DiarizationSegment, + TitleSummary, +) +from reflector.processors.types import ( + Transcript as TranscriptType, +) +from reflector.settings import settings +from reflector.storage import get_transcripts_storage + + +class EmptyPipeline: + """Empty pipeline for processors that need a pipeline reference""" + + def __init__(self, logger: structlog.BoundLogger): + self.logger = logger + + def get_pref(self, k, d=None): + return d + + async def emit(self, event): + pass + + +class PipelineMainFile(PipelineMainBase): + """ + Optimized file processing pipeline. + Processes complete audio/video files with parallel execution. + """ + + logger: structlog.BoundLogger = None + empty_pipeline = None + + def __init__(self, transcript_id: str): + super().__init__(transcript_id=transcript_id) + self.logger = logger.bind(transcript_id=self.transcript_id) + self.empty_pipeline = EmptyPipeline(logger=self.logger) + + def _handle_gather_exceptions(self, results: list, operation: str) -> None: + """Handle exceptions from asyncio.gather with return_exceptions=True""" + for i, result in enumerate(results): + if not isinstance(result, Exception): + continue + self.logger.error( + f"Error in {operation} (task {i}): {result}", + transcript_id=self.transcript_id, + exc_info=result, + ) + + async def process(self, file_path: Path): + """Main entry point for file processing""" + self.logger.info(f"Starting file pipeline for {file_path}") + + transcript = await self.get_transcript() + + # Extract audio and write to transcript location + audio_path = await self.extract_and_write_audio(file_path, transcript) + + # Upload for processing + audio_url = await self.upload_audio(audio_path, transcript) + + # Run parallel processing + await self.run_parallel_processing( + audio_path, + audio_url, + transcript.source_language, + transcript.target_language, + ) + + self.logger.info("File pipeline complete") + + async def extract_and_write_audio( + self, file_path: Path, transcript: Transcript + ) -> Path: + """Extract audio from video if needed and write to transcript location as MP3""" + self.logger.info(f"Processing audio file: {file_path}") + + # Check if it's already audio-only + container = av.open(str(file_path)) + has_video = len(container.streams.video) > 0 + container.close() + + # Use AudioFileWriterProcessor to write MP3 to transcript location + mp3_writer = AudioFileWriterProcessor( + path=transcript.audio_mp3_filename, + on_duration=self.on_duration, + ) + + # 
Process audio frames and write to transcript location + input_container = av.open(str(file_path)) + for frame in input_container.decode(audio=0): + await mp3_writer.push(frame) + + await mp3_writer.flush() + input_container.close() + + if has_video: + self.logger.info( + f"Extracted audio from video and saved to {transcript.audio_mp3_filename}" + ) + else: + self.logger.info( + f"Converted audio file and saved to {transcript.audio_mp3_filename}" + ) + + return transcript.audio_mp3_filename + + async def upload_audio(self, audio_path: Path, transcript: Transcript) -> str: + """Upload audio to storage for processing""" + storage = get_transcripts_storage() + + if not storage: + raise Exception( + "Storage backend required for file processing. Configure TRANSCRIPT_STORAGE_* settings." + ) + + self.logger.info("Uploading audio to storage") + + with open(audio_path, "rb") as f: + audio_data = f.read() + + storage_path = f"file_pipeline/{transcript.id}/audio.mp3" + await storage.put_file(storage_path, audio_data) + + audio_url = await storage.get_file_url(storage_path) + + self.logger.info(f"Audio uploaded to {audio_url}") + return audio_url + + async def run_parallel_processing( + self, + audio_path: Path, + audio_url: str, + source_language: str, + target_language: str, + ): + """Coordinate parallel processing of transcription, diarization, and waveform""" + self.logger.info( + "Starting parallel processing", transcript_id=self.transcript_id + ) + + # Phase 1: Parallel processing of independent tasks + transcription_task = self.transcribe_file(audio_url, source_language) + diarization_task = self.diarize_file(audio_url) + waveform_task = self.generate_waveform(audio_path) + + results = await asyncio.gather( + transcription_task, diarization_task, waveform_task, return_exceptions=True + ) + + transcript_result = results[0] + diarization_result = results[1] + + # Handle errors - raise any exception that occurred + self._handle_gather_exceptions(results, "parallel processing") + for result in results: + if isinstance(result, Exception): + raise result + + # Phase 2: Assemble transcript with diarization + self.logger.info( + "Assembling transcript with diarization", transcript_id=self.transcript_id + ) + processor = TranscriptDiarizationAssemblerProcessor() + input_data = TranscriptDiarizationAssemblerInput( + transcript=transcript_result, diarization=diarization_result or [] + ) + + # Store result for retrieval + diarized_transcript: Transcript | None = None + + async def capture_result(transcript): + nonlocal diarized_transcript + diarized_transcript = transcript + + processor.on(capture_result) + await processor.push(input_data) + await processor.flush() + + if not diarized_transcript: + raise ValueError("No diarized transcript captured") + + # Phase 3: Generate topics from diarized transcript + self.logger.info("Generating topics", transcript_id=self.transcript_id) + topics = await self.detect_topics(diarized_transcript, target_language) + + # Phase 4: Generate title and summaries in parallel + self.logger.info( + "Generating title and summaries", transcript_id=self.transcript_id + ) + results = await asyncio.gather( + self.generate_title(topics), + self.generate_summaries(topics), + return_exceptions=True, + ) + + self._handle_gather_exceptions(results, "title and summary generation") + + async def transcribe_file(self, audio_url: str, language: str) -> TranscriptType: + """Transcribe complete file""" + processor = FileTranscriptAutoProcessor() + input_data = 
FileTranscriptInput(audio_url=audio_url, language=language) + + # Store result for retrieval + result: TranscriptType | None = None + + async def capture_result(transcript): + nonlocal result + result = transcript + + processor.on(capture_result) + await processor.push(input_data) + await processor.flush() + + if not result: + raise ValueError("No transcript captured") + + return result + + async def diarize_file(self, audio_url: str) -> list[DiarizationSegment] | None: + """Get diarization for file""" + if not settings.DIARIZATION_BACKEND: + self.logger.info("Diarization disabled") + return None + + processor = FileDiarizationAutoProcessor() + input_data = FileDiarizationInput(audio_url=audio_url) + + # Store result for retrieval + result = None + + async def capture_result(diarization_output): + nonlocal result + result = diarization_output.diarization + + try: + processor.on(capture_result) + await processor.push(input_data) + await processor.flush() + return result + except Exception as e: + self.logger.error(f"Diarization failed: {e}") + return None + + async def generate_waveform(self, audio_path: Path): + """Generate and save waveform""" + transcript = await self.get_transcript() + + processor = AudioWaveformProcessor( + audio_path=audio_path, + waveform_path=transcript.audio_waveform_filename, + on_waveform=self.on_waveform, + ) + processor.set_pipeline(self.empty_pipeline) + + await processor.flush() + + async def detect_topics( + self, transcript: TranscriptType, target_language: str + ) -> list[TitleSummary]: + """Detect topics from complete transcript""" + chunk_size = 300 + topics: list[TitleSummary] = [] + + async def on_topic(topic: TitleSummary): + topics.append(topic) + return await self.on_topic(topic) + + topic_detector = TranscriptTopicDetectorProcessor(callback=on_topic) + topic_detector.set_pipeline(self.empty_pipeline) + + for i in range(0, len(transcript.words), chunk_size): + chunk_words = transcript.words[i : i + chunk_size] + if not chunk_words: + continue + + chunk_transcript = TranscriptType( + words=chunk_words, translation=transcript.translation + ) + + await topic_detector.push(chunk_transcript) + + await topic_detector.flush() + return topics + + async def generate_title(self, topics: list[TitleSummary]): + """Generate title from topics""" + if not topics: + self.logger.warning("No topics for title generation") + return + + processor = TranscriptFinalTitleProcessor(callback=self.on_title) + processor.set_pipeline(self.empty_pipeline) + + for topic in topics: + await processor.push(topic) + + await processor.flush() + + async def generate_summaries(self, topics: list[TitleSummary]): + """Generate long and short summaries from topics""" + if not topics: + self.logger.warning("No topics for summary generation") + return + + transcript = await self.get_transcript() + processor = TranscriptFinalSummaryProcessor( + transcript=transcript, + callback=self.on_long_summary, + on_short_summary=self.on_short_summary, + ) + processor.set_pipeline(self.empty_pipeline) + + for topic in topics: + await processor.push(topic) + + await processor.flush() + + +@shared_task +@asynctask +async def task_pipeline_file_process(*, transcript_id: str): + """Celery task for file pipeline processing""" + + transcript = await transcripts_controller.get_by_id(transcript_id) + if not transcript: + raise Exception(f"Transcript {transcript_id} not found") + + # Find the file to process + audio_file = next(transcript.data_path.glob("upload.*"), None) + if not audio_file: + audio_file = 
next(transcript.data_path.glob("audio.*"), None) + + if not audio_file: + raise Exception("No audio file found to process") + + # Run file pipeline + pipeline = PipelineMainFile(transcript_id=transcript_id) + await pipeline.process(audio_file) diff --git a/server/reflector/pipelines/main_live_pipeline.py b/server/reflector/pipelines/main_live_pipeline.py index 4c5ab097..03fdbd65 100644 --- a/server/reflector/pipelines/main_live_pipeline.py +++ b/server/reflector/pipelines/main_live_pipeline.py @@ -147,15 +147,18 @@ class StrValue(BaseModel): class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]): - transcript_id: str - ws_room_id: str | None = None - ws_manager: WebsocketManager | None = None - - def prepare(self): - # prepare websocket + def __init__(self, transcript_id: str): + super().__init__() self._lock = asyncio.Lock() + self.transcript_id = transcript_id self.ws_room_id = f"ts:{self.transcript_id}" - self.ws_manager = get_ws_manager() + self._ws_manager = None + + @property + def ws_manager(self) -> WebsocketManager: + if self._ws_manager is None: + self._ws_manager = get_ws_manager() + return self._ws_manager async def get_transcript(self) -> Transcript: # fetch the transcript @@ -355,7 +358,6 @@ class PipelineMainLive(PipelineMainBase): async def create(self) -> Pipeline: # create a context for the whole rtc transaction # add a customised logger to the context - self.prepare() transcript = await self.get_transcript() processors = [ @@ -376,6 +378,7 @@ class PipelineMainLive(PipelineMainBase): pipeline.set_pref("audio:target_language", transcript.target_language) pipeline.logger.bind(transcript_id=transcript.id) pipeline.logger.info("Pipeline main live created") + pipeline.describe() return pipeline @@ -394,7 +397,6 @@ class PipelineMainDiarization(PipelineMainBase[AudioDiarizationInput]): async def create(self) -> Pipeline: # create a context for the whole rtc transaction # add a customised logger to the context - self.prepare() pipeline = Pipeline( AudioDiarizationAutoProcessor(callback=self.on_topic), ) @@ -435,8 +437,6 @@ class PipelineMainFromTopics(PipelineMainBase[TitleSummaryWithIdProcessorType]): raise NotImplementedError async def create(self) -> Pipeline: - self.prepare() - # get transcript self._transcript = transcript = await self.get_transcript() diff --git a/server/reflector/pipelines/runner.py b/server/reflector/pipelines/runner.py index 5cb40002..0c2b6c67 100644 --- a/server/reflector/pipelines/runner.py +++ b/server/reflector/pipelines/runner.py @@ -18,22 +18,14 @@ During its lifecycle, it will emit the following status: import asyncio from typing import Generic, TypeVar -from pydantic import BaseModel, ConfigDict - from reflector.logger import logger from reflector.processors import Pipeline PipelineMessage = TypeVar("PipelineMessage") -class PipelineRunner(BaseModel, Generic[PipelineMessage]): - model_config = ConfigDict(arbitrary_types_allowed=True) - - status: str = "idle" - pipeline: Pipeline | None = None - - def __init__(self, **kwargs): - super().__init__(**kwargs) +class PipelineRunner(Generic[PipelineMessage]): + def __init__(self): self._task = None self._q_cmd = asyncio.Queue(maxsize=4096) self._ev_done = asyncio.Event() @@ -42,6 +34,8 @@ class PipelineRunner(BaseModel, Generic[PipelineMessage]): runner=id(self), runner_cls=self.__class__.__name__, ) + self.status = "idle" + self.pipeline: Pipeline | None = None async def create(self) -> Pipeline: """ diff --git a/server/reflector/processors/__init__.py 
b/server/reflector/processors/__init__.py index 0aa89f87..e95d949e 100644 --- a/server/reflector/processors/__init__.py +++ b/server/reflector/processors/__init__.py @@ -11,6 +11,13 @@ from .base import ( # noqa: F401 Processor, ThreadedProcessor, ) +from .file_diarization import FileDiarizationProcessor # noqa: F401 +from .file_diarization_auto import FileDiarizationAutoProcessor # noqa: F401 +from .file_transcript import FileTranscriptProcessor # noqa: F401 +from .file_transcript_auto import FileTranscriptAutoProcessor # noqa: F401 +from .transcript_diarization_assembler import ( + TranscriptDiarizationAssemblerProcessor, # noqa: F401 +) from .transcript_final_summary import TranscriptFinalSummaryProcessor # noqa: F401 from .transcript_final_title import TranscriptFinalTitleProcessor # noqa: F401 from .transcript_liner import TranscriptLinerProcessor # noqa: F401 diff --git a/server/reflector/processors/audio_chunker.py b/server/reflector/processors/audio_chunker.py index ffe38a37..af12de89 100644 --- a/server/reflector/processors/audio_chunker.py +++ b/server/reflector/processors/audio_chunker.py @@ -1,28 +1,340 @@ +from typing import Optional + import av +import numpy as np +import torch +from silero_vad import VADIterator, load_silero_vad from reflector.processors.base import Processor class AudioChunkerProcessor(Processor): """ - Assemble audio frames into chunks + Assemble audio frames into chunks with VAD-based speech detection """ INPUT_TYPE = av.AudioFrame OUTPUT_TYPE = list[av.AudioFrame] - def __init__(self, max_frames=256): + def __init__( + self, + block_frames=256, + max_frames=1024, + vad_threshold=0.5, + use_onnx=False, + min_frames=2, + ): super().__init__() self.frames: list[av.AudioFrame] = [] + self.block_frames = block_frames self.max_frames = max_frames + self.vad_threshold = vad_threshold + self.min_frames = min_frames + + # Initialize Silero VAD + self._init_vad(use_onnx) + + def _init_vad(self, use_onnx=False): + """Initialize Silero VAD model""" + try: + torch.set_num_threads(1) + self.vad_model = load_silero_vad(onnx=use_onnx) + self.vad_iterator = VADIterator(self.vad_model, sampling_rate=16000) + self.logger.info("Silero VAD initialized successfully") + + except Exception as e: + self.logger.error(f"Failed to initialize Silero VAD: {e}") + self.vad_model = None + self.vad_iterator = None async def _push(self, data: av.AudioFrame): self.frames.append(data) - if len(self.frames) >= self.max_frames: - await self.flush() + # print("timestamp", data.pts * data.time_base * 1000) + + # Check for speech segments every 32 frames (~1 second) + if len(self.frames) >= 32 and len(self.frames) % 32 == 0: + await self._process_block() + + # Safety fallback - emit if we hit max frames + elif len(self.frames) >= self.max_frames: + self.logger.warning( + f"AudioChunkerProcessor: Reached max frames ({self.max_frames}), " + f"emitting first {self.max_frames // 2} frames" + ) + frames_to_emit = self.frames[: self.max_frames // 2] + self.frames = self.frames[self.max_frames // 2 :] + if len(frames_to_emit) >= self.min_frames: + await self.emit(frames_to_emit) + else: + self.logger.debug( + f"Ignoring fallback segment with {len(frames_to_emit)} frames " + f"(< {self.min_frames} minimum)" + ) + + async def _process_block(self): + # Need at least 32 frames for VAD detection (~1 second) + if len(self.frames) < 32 or self.vad_iterator is None: + return + + # Processing block with current buffer size + # print(f"Processing block: {len(self.frames)} frames in buffer") + + try: + # Convert 
frames to numpy array for VAD + audio_array = self._frames_to_numpy(self.frames) + + if audio_array is None: + # Fallback: emit all frames if conversion failed + frames_to_emit = self.frames[:] + self.frames = [] + if len(frames_to_emit) >= self.min_frames: + await self.emit(frames_to_emit) + else: + self.logger.debug( + f"Ignoring conversion-failed segment with {len(frames_to_emit)} frames " + f"(< {self.min_frames} minimum)" + ) + return + + # Find complete speech segments in the buffer + speech_end_frame = self._find_speech_segment_end(audio_array) + + if speech_end_frame is None or speech_end_frame <= 0: + # No speech found but buffer is getting large + if len(self.frames) > 512: + # Check if it's all silence and can be discarded + # No speech segment found, buffer at {len(self.frames)} frames + + # Could emit silence or discard old frames here + # For now, keep first 256 frames and discard older silence + if len(self.frames) > 768: + self.logger.debug( + f"Discarding {len(self.frames) - 256} old frames (likely silence)" + ) + self.frames = self.frames[-256:] + return + + # Calculate segment timing information + frames_to_emit = self.frames[:speech_end_frame] + + # Get timing from av.AudioFrame + if frames_to_emit: + first_frame = frames_to_emit[0] + last_frame = frames_to_emit[-1] + sample_rate = first_frame.sample_rate + + # Calculate duration + total_samples = sum(f.samples for f in frames_to_emit) + duration_seconds = total_samples / sample_rate if sample_rate > 0 else 0 + + # Get timestamps if available + start_time = ( + first_frame.pts * first_frame.time_base if first_frame.pts else 0 + ) + end_time = ( + last_frame.pts * last_frame.time_base if last_frame.pts else 0 + ) + + # Convert to HH:MM:SS format for logging + def format_time(seconds): + if not seconds: + return "00:00:00" + total_seconds = int(float(seconds)) + hours = total_seconds // 3600 + minutes = (total_seconds % 3600) // 60 + secs = total_seconds % 60 + return f"{hours:02d}:{minutes:02d}:{secs:02d}" + + start_formatted = format_time(start_time) + end_formatted = format_time(end_time) + + # Keep remaining frames for next processing + remaining_after = len(self.frames) - speech_end_frame + + # Single structured log line + self.logger.info( + "Speech segment found", + start=start_formatted, + end=end_formatted, + frames=speech_end_frame, + duration=round(duration_seconds, 2), + buffer_before=len(self.frames), + remaining=remaining_after, + ) + + # Keep remaining frames for next processing + self.frames = self.frames[speech_end_frame:] + + # Filter out segments with too few frames + if len(frames_to_emit) >= self.min_frames: + await self.emit(frames_to_emit) + else: + self.logger.debug( + f"Ignoring segment with {len(frames_to_emit)} frames " + f"(< {self.min_frames} minimum)" + ) + + except Exception as e: + self.logger.error(f"Error in VAD processing: {e}") + # Fallback to simple chunking + if len(self.frames) >= self.block_frames: + frames_to_emit = self.frames[: self.block_frames] + self.frames = self.frames[self.block_frames :] + if len(frames_to_emit) >= self.min_frames: + await self.emit(frames_to_emit) + else: + self.logger.debug( + f"Ignoring exception-fallback segment with {len(frames_to_emit)} frames " + f"(< {self.min_frames} minimum)" + ) + + def _frames_to_numpy(self, frames: list[av.AudioFrame]) -> Optional[np.ndarray]: + """Convert av.AudioFrame list to numpy array for VAD processing""" + if not frames: + return None + + try: + first_frame = frames[0] + original_sample_rate = first_frame.sample_rate + 
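+            # The Silero VAD model used here expects mono 16 kHz float32
+            # samples in [-1, 1]; the steps below downmix stereo, resample
+            # to 16 kHz and normalize int16 PCM before running detection.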
+ audio_data = [] + for frame in frames: + frame_array = frame.to_ndarray() + + # Handle stereo -> mono conversion + if len(frame_array.shape) == 2 and frame_array.shape[0] > 1: + frame_array = np.mean(frame_array, axis=0) + elif len(frame_array.shape) == 2: + frame_array = frame_array.flatten() + + audio_data.append(frame_array) + + if not audio_data: + return None + + combined_audio = np.concatenate(audio_data) + + # Resample from 48kHz to 16kHz if needed + if original_sample_rate != 16000: + combined_audio = self._resample_audio( + combined_audio, original_sample_rate, 16000 + ) + + # Ensure float32 format + if combined_audio.dtype == np.int16: + # Normalize int16 audio to float32 in range [-1.0, 1.0] + combined_audio = combined_audio.astype(np.float32) / 32768.0 + elif combined_audio.dtype != np.float32: + combined_audio = combined_audio.astype(np.float32) + + return combined_audio + + except Exception as e: + self.logger.error(f"Error converting frames to numpy: {e}") + + return None + + def _resample_audio( + self, audio: np.ndarray, from_sr: int, to_sr: int + ) -> np.ndarray: + """Simple linear resampling from from_sr to to_sr""" + if from_sr == to_sr: + return audio + + try: + # Simple linear interpolation resampling + ratio = to_sr / from_sr + new_length = int(len(audio) * ratio) + + # Create indices for interpolation + old_indices = np.linspace(0, len(audio) - 1, new_length) + resampled = np.interp(old_indices, np.arange(len(audio)), audio) + + return resampled.astype(np.float32) + + except Exception as e: + self.logger.error("Resampling error", exc_info=e) + # Fallback: simple decimation/repetition + if from_sr > to_sr: + # Downsample by taking every nth sample + step = from_sr // to_sr + return audio[::step] + else: + # Upsample by repeating samples + repeat = to_sr // from_sr + return np.repeat(audio, repeat) + + def _find_speech_segment_end(self, audio_array: np.ndarray) -> Optional[int]: + """Find complete speech segments and return frame index at segment end""" + if self.vad_iterator is None or len(audio_array) == 0: + return None + + try: + # Process audio in 512-sample windows for VAD + window_size = 512 + min_silence_windows = 3 # Require 3 windows of silence after speech + + # Track speech state + in_speech = False + speech_start = None + speech_end = None + silence_count = 0 + + for i in range(0, len(audio_array), window_size): + chunk = audio_array[i : i + window_size] + if len(chunk) < window_size: + chunk = np.pad(chunk, (0, window_size - len(chunk))) + + # Detect if this window has speech + speech_dict = self.vad_iterator(chunk, return_seconds=True) + + # VADIterator returns dict with 'start' and 'end' when speech segments are detected + if speech_dict: + if not in_speech: + # Speech started + speech_start = i + in_speech = True + # Debug: print(f"Speech START at sample {i}, VAD: {speech_dict}") + silence_count = 0 # Reset silence counter + continue + + if not in_speech: + continue + + # We're in speech but found silence + silence_count += 1 + if silence_count < min_silence_windows: + continue + + # Found end of speech segment + speech_end = i - (min_silence_windows - 1) * window_size + # Debug: print(f"Speech END at sample {speech_end}") + + # Convert sample position to frame index + samples_per_frame = self.frames[0].samples if self.frames else 1024 + # Account for resampling: we process at 16kHz but frames might be 48kHz + resample_ratio = 48000 / 16000 # 3x + actual_sample_pos = int(speech_end * resample_ratio) + frame_index = actual_sample_pos // 
samples_per_frame + + # Ensure we don't exceed buffer + frame_index = min(frame_index, len(self.frames)) + return frame_index + + return None + + except Exception as e: + self.logger.error(f"Error finding speech segment: {e}") + return None async def _flush(self): frames = self.frames[:] self.frames = [] if frames: - await self.emit(frames) + if len(frames) >= self.min_frames: + await self.emit(frames) + else: + self.logger.debug( + f"Ignoring flush segment with {len(frames)} frames " + f"(< {self.min_frames} minimum)" + ) diff --git a/server/reflector/processors/audio_diarization.py b/server/reflector/processors/audio_diarization.py index b139e038..9cae7a7e 100644 --- a/server/reflector/processors/audio_diarization.py +++ b/server/reflector/processors/audio_diarization.py @@ -1,6 +1,7 @@ from reflector.processors.base import Processor from reflector.processors.types import ( AudioDiarizationInput, + DiarizationSegment, TitleSummary, Word, ) @@ -38,7 +39,7 @@ class AudioDiarizationProcessor(Processor): raise NotImplementedError @classmethod - def assign_speaker(cls, words: list[Word], diarization: list[dict]): + def assign_speaker(cls, words: list[Word], diarization: list[DiarizationSegment]): cls._diarization_remove_overlap(diarization) cls._diarization_remove_segment_without_words(words, diarization) cls._diarization_merge_same_speaker(diarization) @@ -65,7 +66,7 @@ class AudioDiarizationProcessor(Processor): return True @staticmethod - def _diarization_remove_overlap(diarization: list[dict]): + def _diarization_remove_overlap(diarization: list[DiarizationSegment]): """ Remove overlap in diarization results @@ -92,7 +93,7 @@ class AudioDiarizationProcessor(Processor): @staticmethod def _diarization_remove_segment_without_words( - words: list[Word], diarization: list[dict] + words: list[Word], diarization: list[DiarizationSegment] ): """ Remove diarization segments without words @@ -122,7 +123,7 @@ class AudioDiarizationProcessor(Processor): diarization_idx += 1 @staticmethod - def _diarization_merge_same_speaker(diarization: list[dict]): + def _diarization_merge_same_speaker(diarization: list[DiarizationSegment]): """ Merge diarization contigous segments with the same speaker @@ -140,7 +141,9 @@ class AudioDiarizationProcessor(Processor): diarization_idx += 1 @classmethod - def _diarization_assign_speaker(cls, words: list[Word], diarization: list[dict]): + def _diarization_assign_speaker( + cls, words: list[Word], diarization: list[DiarizationSegment] + ): """ Assign speaker to words based on diarization @@ -148,7 +151,7 @@ class AudioDiarizationProcessor(Processor): """ word_idx = 0 - last_speaker = None + last_speaker = 0 for d in diarization: start = d["start"] end = d["end"] diff --git a/server/reflector/processors/audio_diarization_pyannote.py b/server/reflector/processors/audio_diarization_pyannote.py new file mode 100644 index 00000000..5778a732 --- /dev/null +++ b/server/reflector/processors/audio_diarization_pyannote.py @@ -0,0 +1,74 @@ +import os + +import torch +import torchaudio +from pyannote.audio import Pipeline + +from reflector.processors.audio_diarization import AudioDiarizationProcessor +from reflector.processors.audio_diarization_auto import AudioDiarizationAutoProcessor +from reflector.processors.types import AudioDiarizationInput, DiarizationSegment + + +class AudioDiarizationPyannoteProcessor(AudioDiarizationProcessor): + """Local diarization processor using pyannote.audio library""" + + def __init__( + self, + model_name: str = "pyannote/speaker-diarization-3.1", 
+ pyannote_auth_token: str | None = None, + device: str | None = None, + **kwargs, + ): + super().__init__(**kwargs) + self.model_name = model_name + self.auth_token = pyannote_auth_token or os.environ.get("HF_TOKEN") + self.device = device + + if device is None: + self.device = "cuda" if torch.cuda.is_available() else "cpu" + + self.logger.info(f"Loading pyannote diarization model: {self.model_name}") + self.diarization_pipeline = Pipeline.from_pretrained( + self.model_name, use_auth_token=self.auth_token + ) + self.diarization_pipeline.to(torch.device(self.device)) + self.logger.info(f"Diarization model loaded on device: {self.device}") + + async def _diarize(self, data: AudioDiarizationInput) -> list[DiarizationSegment]: + try: + # Load audio file (audio_url is assumed to be a local file path) + self.logger.info(f"Loading local audio file: {data.audio_url}") + waveform, sample_rate = torchaudio.load(data.audio_url) + audio_input = {"waveform": waveform, "sample_rate": sample_rate} + self.logger.info("Running speaker diarization") + diarization = self.diarization_pipeline(audio_input) + + # Convert pyannote diarization output to our format + segments = [] + for segment, _, speaker in diarization.itertracks(yield_label=True): + # Extract speaker number from label (e.g., "SPEAKER_00" -> 0) + speaker_id = 0 + if speaker.startswith("SPEAKER_"): + try: + speaker_id = int(speaker.split("_")[-1]) + except (ValueError, IndexError): + # Fallback to hash-based ID if parsing fails + speaker_id = hash(speaker) % 1000 + + segments.append( + { + "start": round(segment.start, 3), + "end": round(segment.end, 3), + "speaker": speaker_id, + } + ) + + self.logger.info(f"Diarization completed with {len(segments)} segments") + return segments + + except Exception as e: + self.logger.exception(f"Diarization failed: {e}") + raise + + +AudioDiarizationAutoProcessor.register("pyannote", AudioDiarizationPyannoteProcessor) diff --git a/server/reflector/processors/audio_merge.py b/server/reflector/processors/audio_merge.py index 710de562..84d6e856 100644 --- a/server/reflector/processors/audio_merge.py +++ b/server/reflector/processors/audio_merge.py @@ -3,11 +3,24 @@ from time import monotonic_ns from uuid import uuid4 import av +from av.audio.resampler import AudioResampler from reflector.processors.base import Processor from reflector.processors.types import AudioFile +def copy_frame(frame: av.AudioFrame) -> av.AudioFrame: + frame_copy = frame.from_ndarray( + frame.to_ndarray(), + format=frame.format.name, + layout=frame.layout.name, + ) + frame_copy.sample_rate = frame.sample_rate + frame_copy.pts = frame.pts + frame_copy.time_base = frame.time_base + return frame_copy + + class AudioMergeProcessor(Processor): """ Merge audio frame into a single file @@ -16,37 +29,92 @@ class AudioMergeProcessor(Processor): INPUT_TYPE = list[av.AudioFrame] OUTPUT_TYPE = AudioFile + def __init__(self, downsample_to_16k_mono: bool = True, **kwargs): + super().__init__(**kwargs) + self.downsample_to_16k_mono = downsample_to_16k_mono + async def _push(self, data: list[av.AudioFrame]): if not data: return # get audio information from first frame frame = data[0] - channels = len(frame.layout.channels) - sample_rate = frame.sample_rate - sample_width = frame.format.bytes + original_channels = len(frame.layout.channels) + original_sample_rate = frame.sample_rate + original_sample_width = frame.format.bytes + + # determine if we need processing + needs_processing = self.downsample_to_16k_mono and ( + original_sample_rate != 16000 or 
original_channels != 1 + ) + + # determine output parameters + if self.downsample_to_16k_mono: + output_sample_rate = 16000 + output_channels = 1 + output_sample_width = 2 # 16-bit = 2 bytes + else: + output_sample_rate = original_sample_rate + output_channels = original_channels + output_sample_width = original_sample_width # create audio file uu = uuid4().hex fd = io.BytesIO() - out_container = av.open(fd, "w", format="wav") - out_stream = out_container.add_stream("pcm_s16le", rate=sample_rate) - for frame in data: - for packet in out_stream.encode(frame): + if needs_processing: + # Process with PyAV resampler + out_container = av.open(fd, "w", format="wav") + out_stream = out_container.add_stream("pcm_s16le", rate=16000) + out_stream.layout = "mono" + + # Create resampler if needed + resampler = None + if original_sample_rate != 16000 or original_channels != 1: + resampler = AudioResampler(format="s16", layout="mono", rate=16000) + + for frame in data: + if resampler: + # Resample and convert to mono + # XXX for an unknown reason, if we don't use a copy of the frame, we get + # Invalid Argumment from resample. Debugging indicate that when a previous processor + # already used the frame (like AudioFileWriter), it make it invalid argument here. + resampled_frames = resampler.resample(copy_frame(frame)) + for resampled_frame in resampled_frames: + for packet in out_stream.encode(resampled_frame): + out_container.mux(packet) + else: + # Direct encoding without resampling + for packet in out_stream.encode(frame): + out_container.mux(packet) + + # Flush the encoder + for packet in out_stream.encode(None): out_container.mux(packet) - for packet in out_stream.encode(None): - out_container.mux(packet) - out_container.close() + out_container.close() + else: + # Use PyAV for original frames (no processing needed) + out_container = av.open(fd, "w", format="wav") + out_stream = out_container.add_stream("pcm_s16le", rate=output_sample_rate) + out_stream.layout = "mono" if output_channels == 1 else frame.layout + + for frame in data: + for packet in out_stream.encode(frame): + out_container.mux(packet) + + for packet in out_stream.encode(None): + out_container.mux(packet) + out_container.close() + fd.seek(0) # emit audio file audiofile = AudioFile( name=f"{monotonic_ns()}-{uu}.wav", fd=fd, - sample_rate=sample_rate, - channels=channels, - sample_width=sample_width, + sample_rate=output_sample_rate, + channels=output_channels, + sample_width=output_sample_width, timestamp=data[0].pts * data[0].time_base, ) diff --git a/server/reflector/processors/audio_transcript_modal.py b/server/reflector/processors/audio_transcript_modal.py index 3e53261c..efe0319f 100644 --- a/server/reflector/processors/audio_transcript_modal.py +++ b/server/reflector/processors/audio_transcript_modal.py @@ -12,6 +12,9 @@ API will be a POST request to TRANSCRIPT_URL: """ +from typing import List + +import aiohttp from openai import AsyncOpenAI from reflector.processors.audio_transcript import AudioTranscriptProcessor @@ -21,7 +24,9 @@ from reflector.settings import settings class AudioTranscriptModalProcessor(AudioTranscriptProcessor): - def __init__(self, modal_api_key: str | None = None, **kwargs): + def __init__( + self, modal_api_key: str | None = None, batch_enabled: bool = True, **kwargs + ): super().__init__() if not settings.TRANSCRIPT_URL: raise Exception( @@ -30,6 +35,126 @@ class AudioTranscriptModalProcessor(AudioTranscriptProcessor): self.transcript_url = settings.TRANSCRIPT_URL + "/v1" self.timeout = 
settings.TRANSCRIPT_TIMEOUT self.modal_api_key = modal_api_key + self.max_batch_duration = 10.0 + self.max_batch_files = 15 + self.batch_enabled = batch_enabled + self.pending_files: List[AudioFile] = [] # Files waiting to be processed + + @classmethod + def _calculate_duration(cls, audio_file: AudioFile) -> float: + """Calculate audio duration in seconds from AudioFile metadata""" + # Duration = total_samples / sample_rate + # We need to estimate total samples from the file data + import wave + + try: + # Try to read as WAV file to get duration + audio_file.fd.seek(0) + with wave.open(audio_file.fd, "rb") as wav_file: + frames = wav_file.getnframes() + sample_rate = wav_file.getframerate() + duration = frames / sample_rate + return duration + except Exception: + # Fallback: estimate from file size and audio parameters + audio_file.fd.seek(0, 2) # Seek to end + file_size = audio_file.fd.tell() + audio_file.fd.seek(0) # Reset to beginning + + # Estimate: file_size / (sample_rate * channels * sample_width) + bytes_per_second = ( + audio_file.sample_rate + * audio_file.channels + * (audio_file.sample_width // 8) + ) + estimated_duration = ( + file_size / bytes_per_second if bytes_per_second > 0 else 0 + ) + return max(0, estimated_duration) + + def _create_batches(self, audio_files: List[AudioFile]) -> List[List[AudioFile]]: + """Group audio files into batches with maximum 30s total duration""" + batches = [] + current_batch = [] + current_duration = 0.0 + + for audio_file in audio_files: + duration = self._calculate_duration(audio_file) + + # If adding this file exceeds max duration, start a new batch + if current_duration + duration > self.max_batch_duration and current_batch: + batches.append(current_batch) + current_batch = [audio_file] + current_duration = duration + else: + current_batch.append(audio_file) + current_duration += duration + + # Add the last batch if not empty + if current_batch: + batches.append(current_batch) + + return batches + + async def _transcript_batch(self, audio_files: List[AudioFile]) -> List[Transcript]: + """Transcribe a batch of audio files using the parakeet backend""" + if not audio_files: + return [] + + self.logger.debug(f"Batch transcribing {len(audio_files)} files") + + # Prepare form data for batch request + data = aiohttp.FormData() + data.add_field("language", self.get_pref("audio:source_language", "en")) + data.add_field("batch", "true") + + for i, audio_file in enumerate(audio_files): + audio_file.fd.seek(0) + data.add_field( + "files", + audio_file.fd, + filename=f"{audio_file.name}", + content_type="audio/wav", + ) + + # Make batch request + headers = {"Authorization": f"Bearer {self.modal_api_key}"} + + async with aiohttp.ClientSession( + timeout=aiohttp.ClientTimeout(total=self.timeout) + ) as session: + async with session.post( + f"{self.transcript_url}/audio/transcriptions", + data=data, + headers=headers, + ) as response: + if response.status != 200: + error_text = await response.text() + raise Exception( + f"Batch transcription failed: {response.status} {error_text}" + ) + + result = await response.json() + + # Process batch results + transcripts = [] + results = result.get("results", []) + + for i, (audio_file, file_result) in enumerate(zip(audio_files, results)): + transcript = Transcript( + words=[ + Word( + text=word_info["word"], + start=word_info["start"], + end=word_info["end"], + ) + for word_info in file_result.get("words", []) + ] + ) + transcript.add_offset(audio_file.timestamp) + transcripts.append(transcript) + + return 
transcripts async def _transcript(self, data: AudioFile): async with AsyncOpenAI( @@ -62,5 +187,96 @@ class AudioTranscriptModalProcessor(AudioTranscriptProcessor): return transcript + async def transcript_multiple( + self, audio_files: List[AudioFile] + ) -> List[Transcript]: + """Transcribe multiple audio files using batching""" + if len(audio_files) == 1: + # Single file, use existing method + return [await self._transcript(audio_files[0])] + + # Create batches with max 30s duration each + batches = self._create_batches(audio_files) + + self.logger.debug( + f"Processing {len(audio_files)} files in {len(batches)} batches" + ) + + # Process all batches concurrently + all_transcripts = [] + + for batch in batches: + batch_transcripts = await self._transcript_batch(batch) + all_transcripts.extend(batch_transcripts) + + return all_transcripts + + async def _push(self, data: AudioFile): + """Override _push to support batching""" + if not self.batch_enabled: + # Use parent implementation for single file processing + return await super()._push(data) + + # Add file to pending batch + self.pending_files.append(data) + self.logger.debug( + f"Added file to batch: {data.name}, batch size: {len(self.pending_files)}" + ) + + # Calculate total duration of pending files + total_duration = sum(self._calculate_duration(f) for f in self.pending_files) + + # Process batch if it reaches max duration or has multiple files ready for optimization + should_process_batch = ( + total_duration >= self.max_batch_duration + or len(self.pending_files) >= self.max_batch_files + ) + + if should_process_batch: + await self._process_pending_batch() + + async def _process_pending_batch(self): + """Process all pending files as batches""" + if not self.pending_files: + return + + self.logger.debug(f"Processing batch of {len(self.pending_files)} files") + + try: + # Create batches respecting duration limit + batches = self._create_batches(self.pending_files) + + # Process each batch + for batch in batches: + self.m_transcript_call.inc() + try: + with self.m_transcript.time(): + # Use batch transcription + transcripts = await self._transcript_batch(batch) + + self.m_transcript_success.inc() + + # Emit each transcript + for transcript in transcripts: + if transcript: + await self.emit(transcript) + + except Exception: + self.m_transcript_failure.inc() + raise + finally: + # Release audio files + for audio_file in batch: + audio_file.release() + + finally: + # Clear pending files + self.pending_files.clear() + + async def _flush(self): + """Process any remaining files when flushing""" + await self._process_pending_batch() + await super()._flush() + AudioTranscriptAutoProcessor.register("modal", AudioTranscriptModalProcessor) diff --git a/server/reflector/processors/base.py b/server/reflector/processors/base.py index 9ba971be..79955bab 100644 --- a/server/reflector/processors/base.py +++ b/server/reflector/processors/base.py @@ -241,33 +241,45 @@ class ThreadedProcessor(Processor): self.INPUT_TYPE = processor.INPUT_TYPE self.OUTPUT_TYPE = processor.OUTPUT_TYPE self.executor = ThreadPoolExecutor(max_workers=max_workers) - self.queue = asyncio.Queue() - self.task = asyncio.get_running_loop().create_task(self.loop()) + self.queue = asyncio.Queue(maxsize=50) + self.task: asyncio.Task | None = None def set_pipeline(self, pipeline: "Pipeline"): super().set_pipeline(pipeline) self.processor.set_pipeline(pipeline) async def loop(self): - while True: - data = await self.queue.get() - self.m_processor_queue.set(self.queue.qsize()) - with 
self.m_processor_queue_in_progress.track_inprogress(): - try: - if data is None: - await self.processor.flush() - break + try: + while True: + data = await self.queue.get() + self.m_processor_queue.set(self.queue.qsize()) + with self.m_processor_queue_in_progress.track_inprogress(): try: - await self.processor.push(data) - except Exception: - self.logger.error( - f"Error in push {self.processor.__class__.__name__}" - ", continue" - ) - finally: - self.queue.task_done() + if data is None: + await self.processor.flush() + break + try: + await self.processor.push(data) + except Exception: + self.logger.error( + f"Error in push {self.processor.__class__.__name__}" + ", continue" + ) + finally: + self.queue.task_done() + except Exception as e: + logger.error(f"Crash in {self.__class__.__name__}: {e}", exc_info=e) + + async def _ensure_task(self): + if self.task is None: + self.task = asyncio.get_running_loop().create_task(self.loop()) + + # XXX not doing a sleep here make the whole pipeline prior the thread + # to be running without having a chance to work on the task here. + await asyncio.sleep(0) async def _push(self, data): + await self._ensure_task() await self.queue.put(data) async def _flush(self): diff --git a/server/reflector/processors/file_diarization.py b/server/reflector/processors/file_diarization.py new file mode 100644 index 00000000..9fcacda5 --- /dev/null +++ b/server/reflector/processors/file_diarization.py @@ -0,0 +1,33 @@ +from pydantic import BaseModel + +from reflector.processors.base import Processor +from reflector.processors.types import DiarizationSegment + + +class FileDiarizationInput(BaseModel): + """Input for file diarization containing audio URL""" + + audio_url: str + + +class FileDiarizationOutput(BaseModel): + """Output for file diarization containing speaker segments""" + + diarization: list[DiarizationSegment] + + +class FileDiarizationProcessor(Processor): + """ + Diarize complete audio files from URL + """ + + INPUT_TYPE = FileDiarizationInput + OUTPUT_TYPE = FileDiarizationOutput + + async def _push(self, data: FileDiarizationInput): + result = await self._diarize(data) + if result: + await self.emit(result) + + async def _diarize(self, data: FileDiarizationInput): + raise NotImplementedError diff --git a/server/reflector/processors/file_diarization_auto.py b/server/reflector/processors/file_diarization_auto.py new file mode 100644 index 00000000..462f2b35 --- /dev/null +++ b/server/reflector/processors/file_diarization_auto.py @@ -0,0 +1,33 @@ +import importlib + +from reflector.processors.file_diarization import FileDiarizationProcessor +from reflector.settings import settings + + +class FileDiarizationAutoProcessor(FileDiarizationProcessor): + _registry = {} + + @classmethod + def register(cls, name, kclass): + cls._registry[name] = kclass + + def __new__(cls, name: str | None = None, **kwargs): + if name is None: + name = settings.DIARIZATION_BACKEND + + if name not in cls._registry: + module_name = f"reflector.processors.file_diarization_{name}" + importlib.import_module(module_name) + + # gather specific configuration for the processor + # search `DIARIZATION_BACKEND_XXX_YYY`, push to constructor as `backend_xxx_yyy` + config = {} + name_upper = name.upper() + settings_prefix = "DIARIZATION_" + config_prefix = f"{settings_prefix}{name_upper}_" + for key, value in settings: + if key.startswith(config_prefix): + config_name = key[len(settings_prefix) :].lower() + config[config_name] = value + + return cls._registry[name](**config | kwargs) diff --git 
a/server/reflector/processors/file_diarization_modal.py b/server/reflector/processors/file_diarization_modal.py new file mode 100644 index 00000000..518f444e --- /dev/null +++ b/server/reflector/processors/file_diarization_modal.py @@ -0,0 +1,57 @@ +""" +File diarization implementation using the GPU service from modal.com + +API will be a POST request to DIARIZATION_URL: + +``` +POST /diarize?audio_file_url=...×tamp=0 +Authorization: Bearer +``` +""" + +import httpx + +from reflector.processors.file_diarization import ( + FileDiarizationInput, + FileDiarizationOutput, + FileDiarizationProcessor, +) +from reflector.processors.file_diarization_auto import FileDiarizationAutoProcessor +from reflector.settings import settings + + +class FileDiarizationModalProcessor(FileDiarizationProcessor): + def __init__(self, modal_api_key: str | None = None, **kwargs): + super().__init__(**kwargs) + if not settings.DIARIZATION_URL: + raise Exception( + "DIARIZATION_URL required to use FileDiarizationModalProcessor" + ) + self.diarization_url = settings.DIARIZATION_URL + "/diarize" + self.file_timeout = settings.DIARIZATION_FILE_TIMEOUT + self.modal_api_key = modal_api_key + + async def _diarize(self, data: FileDiarizationInput): + """Get speaker diarization for file""" + self.logger.info(f"Starting diarization from {data.audio_url}") + + headers = {} + if self.modal_api_key: + headers["Authorization"] = f"Bearer {self.modal_api_key}" + + async with httpx.AsyncClient(timeout=self.file_timeout) as client: + response = await client.post( + self.diarization_url, + headers=headers, + params={ + "audio_file_url": data.audio_url, + "timestamp": 0, + }, + ) + response.raise_for_status() + diarization_data = response.json()["diarization"] + + return FileDiarizationOutput(diarization=diarization_data) + + +FileDiarizationAutoProcessor.register("modal", FileDiarizationModalProcessor) diff --git a/server/reflector/processors/file_transcript.py b/server/reflector/processors/file_transcript.py new file mode 100644 index 00000000..c1ccaaf9 --- /dev/null +++ b/server/reflector/processors/file_transcript.py @@ -0,0 +1,65 @@ +from prometheus_client import Counter, Histogram + +from reflector.processors.base import Processor +from reflector.processors.types import Transcript + + +class FileTranscriptInput: + """Input for file transcription containing audio URL and language settings""" + + def __init__(self, audio_url: str, language: str = "en"): + self.audio_url = audio_url + self.language = language + + +class FileTranscriptProcessor(Processor): + """ + Transcript complete audio files from URL + """ + + INPUT_TYPE = FileTranscriptInput + OUTPUT_TYPE = Transcript + + m_transcript = Histogram( + "file_transcript", + "Time spent in FileTranscript.transcript", + ["backend"], + ) + m_transcript_call = Counter( + "file_transcript_call", + "Number of calls to FileTranscript.transcript", + ["backend"], + ) + m_transcript_success = Counter( + "file_transcript_success", + "Number of successful calls to FileTranscript.transcript", + ["backend"], + ) + m_transcript_failure = Counter( + "file_transcript_failure", + "Number of failed calls to FileTranscript.transcript", + ["backend"], + ) + + def __init__(self, *args, **kwargs): + name = self.__class__.__name__ + self.m_transcript = self.m_transcript.labels(name) + self.m_transcript_call = self.m_transcript_call.labels(name) + self.m_transcript_success = self.m_transcript_success.labels(name) + self.m_transcript_failure = self.m_transcript_failure.labels(name) + super().__init__(*args, 
**kwargs) + + async def _push(self, data: FileTranscriptInput): + try: + self.m_transcript_call.inc() + with self.m_transcript.time(): + result = await self._transcript(data) + self.m_transcript_success.inc() + if result: + await self.emit(result) + except Exception: + self.m_transcript_failure.inc() + raise + + async def _transcript(self, data: FileTranscriptInput): + raise NotImplementedError diff --git a/server/reflector/processors/file_transcript_auto.py b/server/reflector/processors/file_transcript_auto.py new file mode 100644 index 00000000..4b9f1bd9 --- /dev/null +++ b/server/reflector/processors/file_transcript_auto.py @@ -0,0 +1,32 @@ +import importlib + +from reflector.processors.file_transcript import FileTranscriptProcessor +from reflector.settings import settings + + +class FileTranscriptAutoProcessor(FileTranscriptProcessor): + _registry = {} + + @classmethod + def register(cls, name, kclass): + cls._registry[name] = kclass + + def __new__(cls, name: str | None = None, **kwargs): + if name is None: + name = settings.TRANSCRIPT_BACKEND + if name not in cls._registry: + module_name = f"reflector.processors.file_transcript_{name}" + importlib.import_module(module_name) + + # gather specific configuration for the processor + # search `TRANSCRIPT_BACKEND_XXX_YYY`, push to constructor as `backend_xxx_yyy` + config = {} + name_upper = name.upper() + settings_prefix = "TRANSCRIPT_" + config_prefix = f"{settings_prefix}{name_upper}_" + for key, value in settings: + if key.startswith(config_prefix): + config_name = key[len(settings_prefix) :].lower() + config[config_name] = value + + return cls._registry[name](**config | kwargs) diff --git a/server/reflector/processors/file_transcript_modal.py b/server/reflector/processors/file_transcript_modal.py new file mode 100644 index 00000000..21c378ec --- /dev/null +++ b/server/reflector/processors/file_transcript_modal.py @@ -0,0 +1,74 @@ +""" +File transcription implementation using the GPU service from modal.com + +API will be a POST request to TRANSCRIPT_URL: + +```json +{ + "audio_file_url": "https://...", + "language": "en", + "model": "parakeet-tdt-0.6b-v2", + "batch": true +} +``` +""" + +import httpx + +from reflector.processors.file_transcript import ( + FileTranscriptInput, + FileTranscriptProcessor, +) +from reflector.processors.file_transcript_auto import FileTranscriptAutoProcessor +from reflector.processors.types import Transcript, Word +from reflector.settings import settings + + +class FileTranscriptModalProcessor(FileTranscriptProcessor): + def __init__(self, modal_api_key: str | None = None, **kwargs): + super().__init__(**kwargs) + if not settings.TRANSCRIPT_URL: + raise Exception( + "TRANSCRIPT_URL required to use FileTranscriptModalProcessor" + ) + self.transcript_url = settings.TRANSCRIPT_URL + self.file_timeout = settings.TRANSCRIPT_FILE_TIMEOUT + self.modal_api_key = modal_api_key + + async def _transcript(self, data: FileTranscriptInput): + """Send full file to Modal for transcription""" + url = f"{self.transcript_url}/v1/audio/transcriptions-from-url" + + self.logger.info(f"Starting file transcription from {data.audio_url}") + + headers = {} + if self.modal_api_key: + headers["Authorization"] = f"Bearer {self.modal_api_key}" + + async with httpx.AsyncClient(timeout=self.file_timeout) as client: + response = await client.post( + url, + headers=headers, + json={ + "audio_file_url": data.audio_url, + "language": data.language, + "batch": True, + }, + ) + response.raise_for_status() + result = response.json() + + words = 
[ + Word( + text=word_info["word"], + start=word_info["start"], + end=word_info["end"], + ) + for word_info in result.get("words", []) + ] + + return Transcript(words=words) + + +# Register with the auto processor +FileTranscriptAutoProcessor.register("modal", FileTranscriptModalProcessor) diff --git a/server/reflector/processors/transcript_diarization_assembler.py b/server/reflector/processors/transcript_diarization_assembler.py new file mode 100644 index 00000000..24c51100 --- /dev/null +++ b/server/reflector/processors/transcript_diarization_assembler.py @@ -0,0 +1,45 @@ +""" +Processor to assemble transcript with diarization results +""" + +from reflector.processors.audio_diarization import AudioDiarizationProcessor +from reflector.processors.base import Processor +from reflector.processors.types import DiarizationSegment, Transcript + + +class TranscriptDiarizationAssemblerInput: + """Input containing transcript and diarization data""" + + def __init__(self, transcript: Transcript, diarization: list[DiarizationSegment]): + self.transcript = transcript + self.diarization = diarization + + +class TranscriptDiarizationAssemblerProcessor(Processor): + """ + Assemble transcript with diarization results by applying speaker assignments + """ + + INPUT_TYPE = TranscriptDiarizationAssemblerInput + OUTPUT_TYPE = Transcript + + async def _push(self, data: TranscriptDiarizationAssemblerInput): + result = await self._assemble(data) + if result: + await self.emit(result) + + async def _assemble(self, data: TranscriptDiarizationAssemblerInput): + """Apply diarization to transcript words""" + if not data.diarization: + self.logger.info( + "No diarization data provided, returning original transcript" + ) + return data.transcript + + # Reuse logic from AudioDiarizationProcessor + processor = AudioDiarizationProcessor() + words = data.transcript.words + processor.assign_speaker(words, data.diarization) + + self.logger.info(f"Applied diarization to {len(words)} words") + return data.transcript diff --git a/server/reflector/processors/types.py b/server/reflector/processors/types.py index 90c9c846..480086af 100644 --- a/server/reflector/processors/types.py +++ b/server/reflector/processors/types.py @@ -2,13 +2,22 @@ import io import re import tempfile from pathlib import Path -from typing import Annotated +from typing import Annotated, TypedDict from profanityfilter import ProfanityFilter from pydantic import BaseModel, Field, PrivateAttr from reflector.redis_cache import redis_cache + +class DiarizationSegment(TypedDict): + """Type definition for diarization segment containing speaker information""" + + start: float + end: float + speaker: int + + PUNC_RE = re.compile(r"[.;:?!…]") profanity_filter = ProfanityFilter() diff --git a/server/reflector/settings.py b/server/reflector/settings.py index 463881b2..7b50911b 100644 --- a/server/reflector/settings.py +++ b/server/reflector/settings.py @@ -26,6 +26,7 @@ class Settings(BaseSettings): TRANSCRIPT_BACKEND: str = "whisper" TRANSCRIPT_URL: str | None = None TRANSCRIPT_TIMEOUT: int = 90 + TRANSCRIPT_FILE_TIMEOUT: int = 600 # Audio Transcription: modal backend TRANSCRIPT_MODAL_API_KEY: str | None = None @@ -66,10 +67,14 @@ class Settings(BaseSettings): DIARIZATION_ENABLED: bool = True DIARIZATION_BACKEND: str = "modal" DIARIZATION_URL: str | None = None + DIARIZATION_FILE_TIMEOUT: int = 600 # Diarization: modal backend DIARIZATION_MODAL_API_KEY: str | None = None + # Diarization: local pyannote.audio + DIARIZATION_PYANNOTE_AUTH_TOKEN: str | None = None + # 
Sentry SENTRY_DSN: str | None = None diff --git a/server/reflector/tools/process.py b/server/reflector/tools/process.py index 91ac8e7c..43ec06ab 100644 --- a/server/reflector/tools/process.py +++ b/server/reflector/tools/process.py @@ -1,10 +1,23 @@ +""" +Process audio file with diarization support +=========================================== + +Extended version of process.py that includes speaker diarization. +This tool processes audio files locally without requiring the full server infrastructure. +""" + import asyncio +import tempfile +import uuid +from pathlib import Path +from typing import List import av from reflector.logger import logger from reflector.processors import ( AudioChunkerProcessor, + AudioFileWriterProcessor, AudioMergeProcessor, AudioTranscriptAutoProcessor, Pipeline, @@ -15,7 +28,43 @@ from reflector.processors import ( TranscriptTopicDetectorProcessor, TranscriptTranslatorAutoProcessor, ) -from reflector.processors.base import BroadcastProcessor +from reflector.processors.base import BroadcastProcessor, Processor +from reflector.processors.types import ( + AudioDiarizationInput, + TitleSummary, + TitleSummaryWithId, +) + + +class TopicCollectorProcessor(Processor): + """Collect topics for diarization""" + + INPUT_TYPE = TitleSummary + OUTPUT_TYPE = TitleSummary + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self.topics: List[TitleSummaryWithId] = [] + self._topic_id = 0 + + async def _push(self, data: TitleSummary): + # Convert to TitleSummaryWithId and collect + self._topic_id += 1 + topic_with_id = TitleSummaryWithId( + id=str(self._topic_id), + title=data.title, + summary=data.summary, + timestamp=data.timestamp, + duration=data.duration, + transcript=data.transcript, + ) + self.topics.append(topic_with_id) + + # Pass through the original topic + await self.emit(data) + + def get_topics(self) -> List[TitleSummaryWithId]: + return self.topics async def process_audio_file( @@ -24,18 +73,40 @@ async def process_audio_file( only_transcript=False, source_language="en", target_language="en", + enable_diarization=True, + diarization_backend="pyannote", ): - # build pipeline for audio processing - processors = [ + # Create temp file for audio if diarization is enabled + audio_temp_path = None + if enable_diarization: + audio_temp_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False) + audio_temp_path = audio_temp_file.name + audio_temp_file.close() + + # Create processor for collecting topics + topic_collector = TopicCollectorProcessor() + + # Build pipeline for audio processing + processors = [] + + # Add audio file writer at the beginning if diarization is enabled + if enable_diarization: + processors.append(AudioFileWriterProcessor(audio_temp_path)) + + # Add the rest of the processors + processors += [ AudioChunkerProcessor(), AudioMergeProcessor(), AudioTranscriptAutoProcessor.as_threaded(), TranscriptLinerProcessor(), TranscriptTranslatorAutoProcessor.as_threaded(), ] + if not only_transcript: processors += [ TranscriptTopicDetectorProcessor.as_threaded(), + # Collect topics for diarization + topic_collector, BroadcastProcessor( processors=[ TranscriptFinalTitleProcessor.as_threaded(), @@ -44,14 +115,14 @@ async def process_audio_file( ), ] - # transcription output + # Create main pipeline pipeline = Pipeline(*processors) pipeline.set_pref("audio:source_language", source_language) pipeline.set_pref("audio:target_language", target_language) pipeline.describe() pipeline.on(event_callback) - # start processing audio + # Start processing 
audio logger.info(f"Opening {filename}") container = av.open(filename) try: @@ -62,43 +133,242 @@ async def process_audio_file( logger.info("Flushing the pipeline") await pipeline.flush() - logger.info("All done !") + # Run diarization if enabled and we have topics + if enable_diarization and not only_transcript and audio_temp_path: + topics = topic_collector.get_topics() + + if topics: + logger.info(f"Starting diarization with {len(topics)} topics") + + try: + from reflector.processors import AudioDiarizationAutoProcessor + + diarization_processor = AudioDiarizationAutoProcessor( + name=diarization_backend + ) + + diarization_processor.set_pipeline(pipeline) + + # For Modal backend, we need to upload the file to S3 first + if diarization_backend == "modal": + from datetime import datetime + + from reflector.storage import get_transcripts_storage + from reflector.utils.s3_temp_file import S3TemporaryFile + + storage = get_transcripts_storage() + + # Generate a unique filename in evaluation folder + timestamp = datetime.utcnow().strftime("%Y%m%d_%H%M%S") + audio_filename = f"evaluation/diarization_temp/{timestamp}_{uuid.uuid4().hex}.wav" + + # Use context manager for automatic cleanup + async with S3TemporaryFile(storage, audio_filename) as s3_file: + # Read and upload the audio file + with open(audio_temp_path, "rb") as f: + audio_data = f.read() + + audio_url = await s3_file.upload(audio_data) + logger.info(f"Uploaded audio to S3: {audio_filename}") + + # Create diarization input with S3 URL + diarization_input = AudioDiarizationInput( + audio_url=audio_url, topics=topics + ) + + # Run diarization + await diarization_processor.push(diarization_input) + await diarization_processor.flush() + + logger.info("Diarization complete") + # File will be automatically cleaned up when exiting the context + else: + # For local backend, use local file path + audio_url = audio_temp_path + + # Create diarization input + diarization_input = AudioDiarizationInput( + audio_url=audio_url, topics=topics + ) + + # Run diarization + await diarization_processor.push(diarization_input) + await diarization_processor.flush() + + logger.info("Diarization complete") + + except ImportError as e: + logger.error(f"Failed to import diarization dependencies: {e}") + logger.error( + "Install with: uv pip install pyannote.audio torch torchaudio" + ) + logger.error( + "And set HF_TOKEN environment variable for pyannote models" + ) + raise SystemExit(1) + except Exception as e: + logger.error(f"Diarization failed: {e}") + raise SystemExit(1) + else: + logger.warning("Skipping diarization: no topics available") + + # Clean up temp file + if audio_temp_path: + try: + Path(audio_temp_path).unlink() + except Exception as e: + logger.warning(f"Failed to clean up temp file {audio_temp_path}: {e}") + + logger.info("All done!") + + +async def process_file_pipeline( + filename: str, + event_callback, + source_language="en", + target_language="en", + enable_diarization=True, + diarization_backend="modal", +): + """Process audio/video file using the optimized file pipeline""" + try: + from reflector.db import database + from reflector.db.transcripts import SourceKind, transcripts_controller + from reflector.pipelines.main_file_pipeline import PipelineMainFile + + await database.connect() + try: + # Create a temporary transcript for processing + transcript = await transcripts_controller.add( + "", + source_kind=SourceKind.FILE, + source_language=source_language, + target_language=target_language, + ) + + # Process the file + pipeline = 
PipelineMainFile(transcript_id=transcript.id) + await pipeline.process(Path(filename)) + + logger.info("File pipeline processing complete") + + finally: + await database.disconnect() + except ImportError as e: + logger.error(f"File pipeline not available: {e}") + logger.info("Falling back to stream pipeline") + # Fall back to stream pipeline + await process_audio_file( + filename, + event_callback, + only_transcript=False, + source_language=source_language, + target_language=target_language, + enable_diarization=enable_diarization, + diarization_backend=diarization_backend, + ) if __name__ == "__main__": import argparse + import os - parser = argparse.ArgumentParser() + parser = argparse.ArgumentParser( + description="Process audio files with optional speaker diarization" + ) parser.add_argument("source", help="Source file (mp3, wav, mp4...)") - parser.add_argument("--only-transcript", "-t", action="store_true") - parser.add_argument("--source-language", default="en") - parser.add_argument("--target-language", default="en") + parser.add_argument( + "--stream", + action="store_true", + help="Use streaming pipeline (original frame-based processing)", + ) + parser.add_argument( + "--only-transcript", + "-t", + action="store_true", + help="Only generate transcript without topics/summaries", + ) + parser.add_argument( + "--source-language", default="en", help="Source language code (default: en)" + ) + parser.add_argument( + "--target-language", default="en", help="Target language code (default: en)" + ) parser.add_argument("--output", "-o", help="Output file (output.jsonl)") + parser.add_argument( + "--enable-diarization", + "-d", + action="store_true", + help="Enable speaker diarization", + ) + parser.add_argument( + "--diarization-backend", + default="pyannote", + choices=["pyannote", "modal"], + help="Diarization backend to use (default: pyannote)", + ) args = parser.parse_args() + if "REDIS_HOST" not in os.environ: + os.environ["REDIS_HOST"] = "localhost" + output_fd = None if args.output: output_fd = open(args.output, "w") async def event_callback(event: PipelineEvent): processor = event.processor - # ignore some processor - if processor in ("AudioChunkerProcessor", "AudioMergeProcessor"): + data = event.data + + # Ignore internal processors + if processor in ( + "AudioChunkerProcessor", + "AudioMergeProcessor", + "AudioFileWriterProcessor", + "TopicCollectorProcessor", + "BroadcastProcessor", + ): return - logger.info(f"Event: {event}") + + # If diarization is enabled, skip the original topic events from the pipeline + # The diarization processor will emit the same topics but with speaker info + if processor == "TranscriptTopicDetectorProcessor" and args.enable_diarization: + return + + # Log all events + logger.info(f"Event: {processor} - {type(data).__name__}") + + # Write to output if output_fd: output_fd.write(event.model_dump_json()) output_fd.write("\n") + output_fd.flush() - asyncio.run( - process_audio_file( - args.source, - event_callback, - only_transcript=args.only_transcript, - source_language=args.source_language, - target_language=args.target_language, + if args.stream: + # Use original streaming pipeline + asyncio.run( + process_audio_file( + args.source, + event_callback, + only_transcript=args.only_transcript, + source_language=args.source_language, + target_language=args.target_language, + enable_diarization=args.enable_diarization, + diarization_backend=args.diarization_backend, + ) + ) + else: + # Use optimized file pipeline (default) + asyncio.run( + 
process_file_pipeline( + args.source, + event_callback, + source_language=args.source_language, + target_language=args.target_language, + enable_diarization=args.enable_diarization, + diarization_backend=args.diarization_backend, + ) ) - ) if output_fd: output_fd.close() diff --git a/server/reflector/worker/process.py b/server/reflector/worker/process.py index c3704207..00126514 100644 --- a/server/reflector/worker/process.py +++ b/server/reflector/worker/process.py @@ -14,7 +14,8 @@ from reflector.db.meetings import meetings_controller from reflector.db.recordings import Recording, recordings_controller from reflector.db.rooms import rooms_controller from reflector.db.transcripts import SourceKind, transcripts_controller -from reflector.pipelines.main_live_pipeline import asynctask, task_pipeline_process +from reflector.pipelines.main_file_pipeline import task_pipeline_file_process +from reflector.pipelines.main_live_pipeline import asynctask from reflector.settings import settings from reflector.whereby import get_room_sessions @@ -140,7 +141,7 @@ async def process_recording(bucket_name: str, object_key: str): await transcripts_controller.update(transcript, {"status": "uploaded"}) - task_pipeline_process.delay(transcript_id=transcript.id) + task_pipeline_file_process.delay(transcript_id=transcript.id) @shared_task diff --git a/server/tests/cassettes/test_processors_modal/test_file_diarization_modal_processor.yaml b/server/tests/cassettes/test_processors_modal/test_file_diarization_modal_processor.yaml new file mode 100644 index 00000000..8ba58937 --- /dev/null +++ b/server/tests/cassettes/test_processors_modal/test_file_diarization_modal_processor.yaml @@ -0,0 +1,40 @@ +interactions: +- request: + body: '' + headers: + accept: + - '*/*' + accept-encoding: + - gzip, deflate + authorization: + - DUMMY_API_KEY + connection: + - keep-alive + content-length: + - '0' + host: + - monadical-sas--reflector-diarizer-web.modal.run + user-agent: + - python-httpx/0.27.2 + method: POST + uri: https://monadical-sas--reflector-diarizer-web.modal.run/diarize?audio_file_url=https%3A%2F%2Freflector-github-pytest.s3.us-east-1.amazonaws.com%2Ftest_mathieu_hello.mp3×tamp=0 + response: + body: + string: '{"diarization":[{"start":0.823,"end":1.91,"speaker":0},{"start":2.572,"end":6.409,"speaker":0},{"start":6.783,"end":10.62,"speaker":0},{"start":11.231,"end":14.168,"speaker":0},{"start":14.796,"end":19.295,"speaker":0}]}' + headers: + Alt-Svc: + - h3=":443"; ma=2592000 + Content-Length: + - '220' + Content-Type: + - application/json + Date: + - Wed, 13 Aug 2025 18:25:34 GMT + Modal-Function-Call-Id: + - fc-01K2JAVNEP6N7Y1Y7W3T98BCXK + Vary: + - accept-encoding + status: + code: 200 + message: OK +version: 1 diff --git a/server/tests/cassettes/test_processors_modal/test_file_transcript_modal_processor.yaml b/server/tests/cassettes/test_processors_modal/test_file_transcript_modal_processor.yaml new file mode 100644 index 00000000..1cd95fcc --- /dev/null +++ b/server/tests/cassettes/test_processors_modal/test_file_transcript_modal_processor.yaml @@ -0,0 +1,46 @@ +interactions: +- request: + body: '{"audio_file_url": "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3", + "language": "en", "batch": true}' + headers: + accept: + - '*/*' + accept-encoding: + - gzip, deflate + authorization: + - DUMMY_API_KEY + connection: + - keep-alive + content-length: + - '136' + content-type: + - application/json + host: + - monadical-sas--reflector-transcriber-parakeet-web.modal.run + user-agent: 
+ - python-httpx/0.27.2 + method: POST + uri: https://monadical-sas--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url + response: + body: + string: '{"text":"Hi there everyone. Today I want to share my incredible experience + with Reflector. a Q teenage product that revolutionizes audio processing. + With reflector, I can easily convert any audio into accurate transcription. + saving me hours of tedious manual work.","words":[{"word":"Hi","start":0.87,"end":1.19},{"word":"there","start":1.19,"end":1.35},{"word":"everyone.","start":1.51,"end":1.83},{"word":"Today","start":2.63,"end":2.87},{"word":"I","start":3.36,"end":3.52},{"word":"want","start":3.6,"end":3.76},{"word":"to","start":3.76,"end":3.92},{"word":"share","start":3.92,"end":4.16},{"word":"my","start":4.16,"end":4.4},{"word":"incredible","start":4.32,"end":4.96},{"word":"experience","start":4.96,"end":5.44},{"word":"with","start":5.44,"end":5.68},{"word":"Reflector.","start":5.68,"end":6.24},{"word":"a","start":6.93,"end":7.01},{"word":"Q","start":7.01,"end":7.17},{"word":"teenage","start":7.25,"end":7.65},{"word":"product","start":7.89,"end":8.29},{"word":"that","start":8.29,"end":8.61},{"word":"revolutionizes","start":8.61,"end":9.65},{"word":"audio","start":9.65,"end":10.05},{"word":"processing.","start":10.05,"end":10.53},{"word":"With","start":11.27,"end":11.43},{"word":"reflector,","start":11.51,"end":12.15},{"word":"I","start":12.31,"end":12.39},{"word":"can","start":12.39,"end":12.55},{"word":"easily","start":12.55,"end":12.95},{"word":"convert","start":12.95,"end":13.43},{"word":"any","start":13.43,"end":13.67},{"word":"audio","start":13.67,"end":13.99},{"word":"into","start":14.98,"end":15.06},{"word":"accurate","start":15.22,"end":15.54},{"word":"transcription.","start":15.7,"end":16.34},{"word":"saving","start":16.99,"end":17.15},{"word":"me","start":17.31,"end":17.47},{"word":"hours","start":17.47,"end":17.87},{"word":"of","start":17.87,"end":18.11},{"word":"tedious","start":18.11,"end":18.67},{"word":"manual","start":18.67,"end":19.07},{"word":"work.","start":19.07,"end":19.31}]}' + headers: + Alt-Svc: + - h3=":443"; ma=2592000 + Content-Length: + - '1933' + Content-Type: + - application/json + Date: + - Wed, 13 Aug 2025 18:26:59 GMT + Modal-Function-Call-Id: + - fc-01K2JAWC7GAMKX4DSJ21WV31NG + Vary: + - accept-encoding + status: + code: 200 + message: OK +version: 1 diff --git a/server/tests/cassettes/test_processors_modal/test_full_modal_pipeline_integration.yaml b/server/tests/cassettes/test_processors_modal/test_full_modal_pipeline_integration.yaml new file mode 100644 index 00000000..64e8ffb4 --- /dev/null +++ b/server/tests/cassettes/test_processors_modal/test_full_modal_pipeline_integration.yaml @@ -0,0 +1,84 @@ +interactions: +- request: + body: '{"audio_file_url": "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3", + "language": "en", "batch": true}' + headers: + accept: + - '*/*' + accept-encoding: + - gzip, deflate + authorization: + - DUMMY_API_KEY + connection: + - keep-alive + content-length: + - '136' + content-type: + - application/json + host: + - monadical-sas--reflector-transcriber-parakeet-web.modal.run + user-agent: + - python-httpx/0.27.2 + method: POST + uri: https://monadical-sas--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url + response: + body: + string: '{"text":"Hi there everyone. Today I want to share my incredible experience + with Reflector. 
a Q teenage product that revolutionizes audio processing. + With reflector, I can easily convert any audio into accurate transcription. + saving me hours of tedious manual work.","words":[{"word":"Hi","start":0.87,"end":1.19},{"word":"there","start":1.19,"end":1.35},{"word":"everyone.","start":1.51,"end":1.83},{"word":"Today","start":2.63,"end":2.87},{"word":"I","start":3.36,"end":3.52},{"word":"want","start":3.6,"end":3.76},{"word":"to","start":3.76,"end":3.92},{"word":"share","start":3.92,"end":4.16},{"word":"my","start":4.16,"end":4.4},{"word":"incredible","start":4.32,"end":4.96},{"word":"experience","start":4.96,"end":5.44},{"word":"with","start":5.44,"end":5.68},{"word":"Reflector.","start":5.68,"end":6.24},{"word":"a","start":6.93,"end":7.01},{"word":"Q","start":7.01,"end":7.17},{"word":"teenage","start":7.25,"end":7.65},{"word":"product","start":7.89,"end":8.29},{"word":"that","start":8.29,"end":8.61},{"word":"revolutionizes","start":8.61,"end":9.65},{"word":"audio","start":9.65,"end":10.05},{"word":"processing.","start":10.05,"end":10.53},{"word":"With","start":11.27,"end":11.43},{"word":"reflector,","start":11.51,"end":12.15},{"word":"I","start":12.31,"end":12.39},{"word":"can","start":12.39,"end":12.55},{"word":"easily","start":12.55,"end":12.95},{"word":"convert","start":12.95,"end":13.43},{"word":"any","start":13.43,"end":13.67},{"word":"audio","start":13.67,"end":13.99},{"word":"into","start":14.98,"end":15.06},{"word":"accurate","start":15.22,"end":15.54},{"word":"transcription.","start":15.7,"end":16.34},{"word":"saving","start":16.99,"end":17.15},{"word":"me","start":17.31,"end":17.47},{"word":"hours","start":17.47,"end":17.87},{"word":"of","start":17.87,"end":18.11},{"word":"tedious","start":18.11,"end":18.67},{"word":"manual","start":18.67,"end":19.07},{"word":"work.","start":19.07,"end":19.31}]}' + headers: + Alt-Svc: + - h3=":443"; ma=2592000 + Content-Length: + - '1933' + Content-Type: + - application/json + Date: + - Wed, 13 Aug 2025 18:27:02 GMT + Modal-Function-Call-Id: + - fc-01K2JAYZ1AR2HE422VJVKBWX9Z + Vary: + - accept-encoding + status: + code: 200 + message: OK +- request: + body: '' + headers: + accept: + - '*/*' + accept-encoding: + - gzip, deflate + authorization: + - DUMMY_API_KEY + connection: + - keep-alive + content-length: + - '0' + host: + - monadical-sas--reflector-diarizer-web.modal.run + user-agent: + - python-httpx/0.27.2 + method: POST + uri: https://monadical-sas--reflector-diarizer-web.modal.run/diarize?audio_file_url=https%3A%2F%2Freflector-github-pytest.s3.us-east-1.amazonaws.com%2Ftest_mathieu_hello.mp3×tamp=0 + response: + body: + string: '{"diarization":[{"start":0.823,"end":1.91,"speaker":0},{"start":2.572,"end":6.409,"speaker":0},{"start":6.783,"end":10.62,"speaker":0},{"start":11.231,"end":14.168,"speaker":0},{"start":14.796,"end":19.295,"speaker":0}]}' + headers: + Alt-Svc: + - h3=":443"; ma=2592000 + Content-Length: + - '220' + Content-Type: + - application/json + Date: + - Wed, 13 Aug 2025 18:27:18 GMT + Modal-Function-Call-Id: + - fc-01K2JAZ1M34NQRJK03CCFK95D6 + Vary: + - accept-encoding + status: + code: 200 + message: OK +version: 1 diff --git a/server/tests/conftest.py b/server/tests/conftest.py index e5550a86..d739751d 100644 --- a/server/tests/conftest.py +++ b/server/tests/conftest.py @@ -5,7 +5,29 @@ from unittest.mock import patch import pytest -# Pytest-docker configuration +@pytest.fixture(scope="session", autouse=True) +def settings_configuration(): + # theses settings are linked to monadical for pytest-recording + # if a 
fork is made, it must provide its own URL when the cassettes need to be updated
+    # Modal API keys have to be defined by the user
+    from reflector.settings import settings
+
+    settings.TRANSCRIPT_BACKEND = "modal"
+    settings.TRANSCRIPT_URL = (
+        "https://monadical-sas--reflector-transcriber-parakeet-web.modal.run"
+    )
+    settings.DIARIZATION_BACKEND = "modal"
+    settings.DIARIZATION_URL = "https://monadical-sas--reflector-diarizer-web.modal.run"
+
+
+@pytest.fixture(scope="module")
+def vcr_config():
+    """VCR configuration to filter sensitive headers"""
+    return {
+        "filter_headers": [("authorization", "DUMMY_API_KEY")],
+    }
+
+
 @pytest.fixture(scope="session")
 def docker_compose_file(pytestconfig):
     return os.path.join(str(pytestconfig.rootdir), "tests", "docker-compose.test.yml")
diff --git a/server/tests/docker-compose.test.yml b/server/tests/docker-compose.test.yml
index e848a4ee..9537a1ef 100644
--- a/server/tests/docker-compose.test.yml
+++ b/server/tests/docker-compose.test.yml
@@ -1,7 +1,7 @@
-version: '3.8'
+version: "3.8"
 services:
   postgres_test:
-    image: postgres:15
+    image: postgres:17
     environment:
       POSTGRES_DB: reflector_test
       POSTGRES_USER: test_user
@@ -10,4 +10,4 @@ services:
       - "15432:5432"
     command: postgres -c fsync=off -c synchronous_commit=off -c full_page_writes=off
     tmpfs:
-      - /var/lib/postgresql/data:rw,noexec,nosuid,size=1g
\ No newline at end of file
+      - /var/lib/postgresql/data:rw,noexec,nosuid,size=1g
diff --git a/server/tests/test_gpu_modal_transcript.py b/server/tests/test_gpu_modal_transcript.py
new file mode 100644
index 00000000..9b37fbe6
--- /dev/null
+++ b/server/tests/test_gpu_modal_transcript.py
@@ -0,0 +1,330 @@
+"""
+Tests for GPU Modal transcription endpoints.
+
+These tests are marked with the "gpu_modal" marker and will not run by default.
+
+Run them with: pytest -m gpu_modal tests/test_gpu_modal_transcript.py
+
+Required environment variables:
+- TRANSCRIPT_URL: URL to the Modal.com endpoint (required)
+- TRANSCRIPT_MODAL_API_KEY: API key for authentication (optional)
+- TRANSCRIPT_MODEL: Model name to use (optional, defaults to nvidia/parakeet-tdt-0.6b-v2)
+
+Example with pytest (override default addopts to run ONLY gpu_modal tests):
+    TRANSCRIPT_URL=https://monadical-sas--reflector-transcriber-parakeet-web-dev.modal.run \
+    TRANSCRIPT_MODAL_API_KEY=your-api-key \
+    uv run -m pytest -m gpu_modal --no-cov tests/test_gpu_modal_transcript.py
+
+    # Or with completely clean options:
+    uv run -m pytest -m gpu_modal -o addopts="" tests/
+
+Running Modal locally for testing:
+    modal serve gpu/modal_deployments/reflector_transcriber_parakeet.py
+    # This will give you a local URL like https://xxxxx--reflector-transcriber-parakeet-web-dev.modal.run to test against
+"""
+
+import os
+import tempfile
+from pathlib import Path
+
+import httpx
+import pytest
+
+# Test audio file URL for testing
+TEST_AUDIO_URL = (
+    "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3"
+)
+
+
+def get_modal_transcript_url():
+    """Get and validate the Modal transcript URL from environment."""
+    url = os.environ.get("TRANSCRIPT_URL")
+    if not url:
+        pytest.skip(
+            "TRANSCRIPT_URL environment variable is required for GPU Modal tests"
+        )
+    return url
+
+
+def get_auth_headers():
+    """Get authentication headers if API key is available."""
+    api_key = os.environ.get("TRANSCRIPT_MODAL_API_KEY")
+    if api_key:
+        return {"Authorization": f"Bearer {api_key}"}
+    return {}
+
+
+def get_model_name():
+    """Get the model name from environment or use default."""
+    return os.environ.get("TRANSCRIPT_MODEL", "nvidia/parakeet-tdt-0.6b-v2")
+
+
+@pytest.mark.gpu_modal
+class TestGPUModalTranscript:
+    """Test suite for GPU Modal transcription endpoints."""
+
+    def test_transcriptions_from_url(self):
+        """Test the /v1/audio/transcriptions-from-url endpoint."""
+        url = get_modal_transcript_url()
+        headers = get_auth_headers()
+
+        with httpx.Client(timeout=60.0) as client:
+            response = client.post(
+                f"{url}/v1/audio/transcriptions-from-url",
+                json={
+                    "audio_file_url": TEST_AUDIO_URL,
+                    "model": get_model_name(),
+                    "language": "en",
+                    "timestamp_offset": 0.0,
+                },
+                headers=headers,
+            )
+
+            assert response.status_code == 200, f"Request failed: {response.text}"
+            result = response.json()
+
+            # Verify response structure
+            assert "text" in result
+            assert "words" in result
+            assert isinstance(result["text"], str)
+            assert isinstance(result["words"], list)
+
+            # Verify content is meaningful
+            assert len(result["text"]) > 0, "Transcript text should not be empty"
+            assert len(result["words"]) > 0, "Words list must not be empty"
+
+            # Verify word structure
+            for word in result["words"]:
+                assert "word" in word
+                assert "start" in word
+                assert "end" in word
+                assert isinstance(word["start"], (int, float))
+                assert isinstance(word["end"], (int, float))
+                assert word["start"] <= word["end"]
+
+    def test_transcriptions_single_file(self):
+        """Test the /v1/audio/transcriptions endpoint with a single file."""
+        url = get_modal_transcript_url()
+        headers = get_auth_headers()
+
+        # Download test audio file to upload
+        with httpx.Client(timeout=60.0) as client:
+            audio_response = client.get(TEST_AUDIO_URL)
+            audio_response.raise_for_status()
+
+            with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp_file:
tmp_file.write(audio_response.content) + tmp_file_path = tmp_file.name + + try: + # Upload the file for transcription + with open(tmp_file_path, "rb") as f: + files = {"file": ("test_audio.mp3", f, "audio/mpeg")} + data = { + "model": get_model_name(), + "language": "en", + "batch": "false", + } + + response = client.post( + f"{url}/v1/audio/transcriptions", + files=files, + data=data, + headers=headers, + ) + + assert response.status_code == 200, f"Request failed: {response.text}" + result = response.json() + + # Verify response structure for single file + assert "text" in result + assert "words" in result + assert "filename" in result + assert isinstance(result["text"], str) + assert isinstance(result["words"], list) + + # Verify content + assert len(result["text"]) > 0, "Transcript text should not be empty" + + finally: + Path(tmp_file_path).unlink(missing_ok=True) + + def test_transcriptions_multiple_files(self): + """Test the /v1/audio/transcriptions endpoint with multiple files (non-batch mode).""" + url = get_modal_transcript_url() + headers = get_auth_headers() + + # Create multiple test files (we'll use the same audio content for simplicity) + with httpx.Client(timeout=60.0) as client: + audio_response = client.get(TEST_AUDIO_URL) + audio_response.raise_for_status() + audio_content = audio_response.content + + temp_files = [] + try: + # Create 3 temporary files + for i in range(3): + tmp_file = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) + tmp_file.write(audio_content) + tmp_file.close() + temp_files.append(tmp_file.name) + + # Upload multiple files for transcription (non-batch) + files = [ + ("files", (f"test_audio_{i}.mp3", open(f, "rb"), "audio/mpeg")) + for i, f in enumerate(temp_files) + ] + data = { + "model": get_model_name(), + "language": "en", + "batch": "false", + } + + response = client.post( + f"{url}/v1/audio/transcriptions", + files=files, + data=data, + headers=headers, + ) + + # Close file handles + for _, file_tuple in files: + file_tuple[1].close() + + assert response.status_code == 200, f"Request failed: {response.text}" + result = response.json() + + # Verify response structure for multiple files (non-batch) + assert "results" in result + assert isinstance(result["results"], list) + assert len(result["results"]) == 3 + + for idx, file_result in enumerate(result["results"]): + assert "text" in file_result + assert "words" in file_result + assert "filename" in file_result + assert isinstance(file_result["text"], str) + assert isinstance(file_result["words"], list) + assert len(file_result["text"]) > 0 + + finally: + for f in temp_files: + Path(f).unlink(missing_ok=True) + + def test_transcriptions_multiple_files_batch(self): + """Test the /v1/audio/transcriptions endpoint with multiple files in batch mode.""" + url = get_modal_transcript_url() + headers = get_auth_headers() + + # Create multiple test files + with httpx.Client(timeout=60.0) as client: + audio_response = client.get(TEST_AUDIO_URL) + audio_response.raise_for_status() + audio_content = audio_response.content + + temp_files = [] + try: + # Create 3 temporary files + for i in range(3): + tmp_file = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) + tmp_file.write(audio_content) + tmp_file.close() + temp_files.append(tmp_file.name) + + # Upload multiple files for batch transcription + files = [ + ("files", (f"test_audio_{i}.mp3", open(f, "rb"), "audio/mpeg")) + for i, f in enumerate(temp_files) + ] + data = { + "model": get_model_name(), + "language": "en", + "batch": "true", + } + 
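+                # batch="true" is expected to let the endpoint transcribe the uploaded
+                # files in a single batched inference call; either way, the response
+                # asserted below must still contain one result per input file.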
+ response = client.post( + f"{url}/v1/audio/transcriptions", + files=files, + data=data, + headers=headers, + ) + + # Close file handles + for _, file_tuple in files: + file_tuple[1].close() + + assert response.status_code == 200, f"Request failed: {response.text}" + result = response.json() + + # Verify response structure for batch mode + assert "results" in result + assert isinstance(result["results"], list) + assert len(result["results"]) == 3 + + for idx, batch_result in enumerate(result["results"]): + assert "text" in batch_result + assert "words" in batch_result + assert "filename" in batch_result + assert isinstance(batch_result["text"], str) + assert isinstance(batch_result["words"], list) + assert len(batch_result["text"]) > 0 + + finally: + for f in temp_files: + Path(f).unlink(missing_ok=True) + + def test_transcriptions_error_handling(self): + """Test error handling for invalid requests.""" + url = get_modal_transcript_url() + headers = get_auth_headers() + + with httpx.Client(timeout=60.0) as client: + # Test with unsupported language + response = client.post( + f"{url}/v1/audio/transcriptions-from-url", + json={ + "audio_file_url": TEST_AUDIO_URL, + "model": get_model_name(), + "language": "fr", # Parakeet only supports English + "timestamp_offset": 0.0, + }, + headers=headers, + ) + + assert response.status_code == 400 + assert "only supports English" in response.text + + def test_transcriptions_with_timestamp_offset(self): + """Test transcription with timestamp offset parameter.""" + url = get_modal_transcript_url() + headers = get_auth_headers() + + with httpx.Client(timeout=60.0) as client: + # Test with timestamp offset + response = client.post( + f"{url}/v1/audio/transcriptions-from-url", + json={ + "audio_file_url": TEST_AUDIO_URL, + "model": get_model_name(), + "language": "en", + "timestamp_offset": 10.0, # Add 10 second offset + }, + headers=headers, + ) + + assert response.status_code == 200, f"Request failed: {response.text}" + result = response.json() + + # Verify response structure + assert "text" in result + assert "words" in result + assert len(result["words"]) > 0, "Words list must not be empty" + + # Verify that timestamps have been offset + for word in result["words"]: + # All timestamps should be >= 10.0 due to offset + assert ( + word["start"] >= 10.0 + ), f"Word start time {word['start']} should be >= 10.0" + assert ( + word["end"] >= 10.0 + ), f"Word end time {word['end']} should be >= 10.0" diff --git a/server/tests/test_pipeline_main_file.py b/server/tests/test_pipeline_main_file.py new file mode 100644 index 00000000..f86dc85d --- /dev/null +++ b/server/tests/test_pipeline_main_file.py @@ -0,0 +1,633 @@ +""" +Tests for PipelineMainFile - file-based processing pipeline + +This test verifies the complete file processing pipeline without mocking much, +ensuring all processors are correctly invoked and the happy path works correctly. 
+""" + +from pathlib import Path +from unittest.mock import AsyncMock, MagicMock, patch +from uuid import uuid4 + +import pytest + +from reflector.pipelines.main_file_pipeline import PipelineMainFile +from reflector.processors.file_diarization import FileDiarizationOutput +from reflector.processors.types import ( + DiarizationSegment, + TitleSummary, + Word, +) +from reflector.processors.types import ( + Transcript as TranscriptType, +) + + +@pytest.fixture +async def dummy_file_transcript(): + """Mock FileTranscriptAutoProcessor for file processing""" + from reflector.processors.file_transcript import FileTranscriptProcessor + + class TestFileTranscriptProcessor(FileTranscriptProcessor): + async def _transcript(self, data): + return TranscriptType( + text="Hello world. How are you today?", + words=[ + Word(start=0.0, end=0.5, text="Hello", speaker=0), + Word(start=0.5, end=0.6, text=" ", speaker=0), + Word(start=0.6, end=1.0, text="world", speaker=0), + Word(start=1.0, end=1.1, text=".", speaker=0), + Word(start=1.1, end=1.2, text=" ", speaker=0), + Word(start=1.2, end=1.5, text="How", speaker=0), + Word(start=1.5, end=1.6, text=" ", speaker=0), + Word(start=1.6, end=1.8, text="are", speaker=0), + Word(start=1.8, end=1.9, text=" ", speaker=0), + Word(start=1.9, end=2.1, text="you", speaker=0), + Word(start=2.1, end=2.2, text=" ", speaker=0), + Word(start=2.2, end=2.5, text="today", speaker=0), + Word(start=2.5, end=2.6, text="?", speaker=0), + ], + ) + + with patch( + "reflector.processors.file_transcript_auto.FileTranscriptAutoProcessor.__new__" + ) as mock_auto: + mock_auto.return_value = TestFileTranscriptProcessor() + yield + + +@pytest.fixture +async def dummy_file_diarization(): + """Mock FileDiarizationAutoProcessor for file processing""" + from reflector.processors.file_diarization import FileDiarizationProcessor + + class TestFileDiarizationProcessor(FileDiarizationProcessor): + async def _diarize(self, data): + return FileDiarizationOutput( + diarization=[ + DiarizationSegment(start=0.0, end=1.1, speaker=0), + DiarizationSegment(start=1.2, end=2.6, speaker=1), + ] + ) + + with patch( + "reflector.processors.file_diarization_auto.FileDiarizationAutoProcessor.__new__" + ) as mock_auto: + mock_auto.return_value = TestFileDiarizationProcessor() + yield + + +@pytest.fixture +async def mock_transcript_in_db(tmpdir): + """Create a mock transcript in the database""" + from reflector.db.transcripts import Transcript + from reflector.settings import settings + + # Set the DATA_DIR to our tmpdir + original_data_dir = settings.DATA_DIR + settings.DATA_DIR = str(tmpdir) + + transcript_id = str(uuid4()) + data_path = Path(tmpdir) / transcript_id + data_path.mkdir(parents=True, exist_ok=True) + + # Create mock transcript object + transcript = Transcript( + id=transcript_id, + name="Test Transcript", + status="processing", + source_kind="file", + source_language="en", + target_language="en", + ) + + # Mock the controller to return our transcript + try: + with patch( + "reflector.pipelines.main_file_pipeline.transcripts_controller.get_by_id" + ) as mock_get: + mock_get.return_value = transcript + with patch( + "reflector.pipelines.main_live_pipeline.transcripts_controller.get_by_id" + ) as mock_get2: + mock_get2.return_value = transcript + with patch( + "reflector.pipelines.main_live_pipeline.transcripts_controller.update" + ) as mock_update: + mock_update.return_value = None + yield transcript + finally: + # Restore original DATA_DIR + settings.DATA_DIR = original_data_dir + + +@pytest.fixture 
+async def mock_storage(): + """Mock storage for file uploads""" + from reflector.storage.base import Storage + + class TestStorage(Storage): + async def _put_file(self, path, data): + return None + + async def _get_file_url(self, path): + return f"http://test-storage/{path}" + + async def _get_file(self, path): + return b"test_audio_data" + + async def _delete_file(self, path): + return None + + storage = TestStorage() + # Add mock tracking for verification + storage._put_file = AsyncMock(side_effect=storage._put_file) + storage._get_file_url = AsyncMock(side_effect=storage._get_file_url) + + with patch( + "reflector.pipelines.main_file_pipeline.get_transcripts_storage" + ) as mock_get: + mock_get.return_value = storage + yield storage + + +@pytest.fixture +async def mock_audio_file_writer(): + """Mock AudioFileWriterProcessor to avoid actual file writing""" + with patch( + "reflector.pipelines.main_file_pipeline.AudioFileWriterProcessor" + ) as mock_writer_class: + mock_writer = AsyncMock() + mock_writer.push = AsyncMock() + mock_writer.flush = AsyncMock() + mock_writer_class.return_value = mock_writer + yield mock_writer + + +@pytest.fixture +async def mock_waveform_processor(): + """Mock AudioWaveformProcessor""" + with patch( + "reflector.pipelines.main_file_pipeline.AudioWaveformProcessor" + ) as mock_waveform_class: + mock_waveform = AsyncMock() + mock_waveform.set_pipeline = MagicMock() + mock_waveform.flush = AsyncMock() + mock_waveform_class.return_value = mock_waveform + yield mock_waveform + + +@pytest.fixture +async def mock_topic_detector(): + """Mock TranscriptTopicDetectorProcessor""" + with patch( + "reflector.pipelines.main_file_pipeline.TranscriptTopicDetectorProcessor" + ) as mock_topic_class: + mock_topic = AsyncMock() + mock_topic.set_pipeline = MagicMock() + mock_topic.push = AsyncMock() + mock_topic.flush_called = False + + # When flush is called, simulate topic detection by calling the callback + async def flush_with_callback(): + mock_topic.flush_called = True + if hasattr(mock_topic, "_callback"): + # Create a minimal transcript for the TitleSummary + test_transcript = TranscriptType(words=[], text="test transcript") + await mock_topic._callback( + TitleSummary( + title="Test Topic", + summary="Test topic summary", + timestamp=0.0, + duration=10.0, + transcript=test_transcript, + ) + ) + + mock_topic.flush = flush_with_callback + + def init_with_callback(callback=None): + mock_topic._callback = callback + return mock_topic + + mock_topic_class.side_effect = init_with_callback + yield mock_topic + + +@pytest.fixture +async def mock_title_processor(): + """Mock TranscriptFinalTitleProcessor""" + with patch( + "reflector.pipelines.main_file_pipeline.TranscriptFinalTitleProcessor" + ) as mock_title_class: + mock_title = AsyncMock() + mock_title.set_pipeline = MagicMock() + mock_title.push = AsyncMock() + mock_title.flush_called = False + + # When flush is called, simulate title generation by calling the callback + async def flush_with_callback(): + mock_title.flush_called = True + if hasattr(mock_title, "_callback"): + from reflector.processors.types import FinalTitle + + await mock_title._callback(FinalTitle(title="Test Title")) + + mock_title.flush = flush_with_callback + + def init_with_callback(callback=None): + mock_title._callback = callback + return mock_title + + mock_title_class.side_effect = init_with_callback + yield mock_title + + +@pytest.fixture +async def mock_summary_processor(): + """Mock TranscriptFinalSummaryProcessor""" + with patch( + 
"reflector.pipelines.main_file_pipeline.TranscriptFinalSummaryProcessor" + ) as mock_summary_class: + mock_summary = AsyncMock() + mock_summary.set_pipeline = MagicMock() + mock_summary.push = AsyncMock() + mock_summary.flush_called = False + + # When flush is called, simulate summary generation by calling the callbacks + async def flush_with_callback(): + mock_summary.flush_called = True + from reflector.processors.types import FinalLongSummary, FinalShortSummary + + if hasattr(mock_summary, "_callback"): + await mock_summary._callback( + FinalLongSummary(long_summary="Test long summary", duration=10.0) + ) + if hasattr(mock_summary, "_on_short_summary"): + await mock_summary._on_short_summary( + FinalShortSummary(short_summary="Test short summary", duration=10.0) + ) + + mock_summary.flush = flush_with_callback + + def init_with_callback(transcript=None, callback=None, on_short_summary=None): + mock_summary._callback = callback + mock_summary._on_short_summary = on_short_summary + return mock_summary + + mock_summary_class.side_effect = init_with_callback + yield mock_summary + + +@pytest.mark.asyncio +async def test_pipeline_main_file_process( + tmpdir, + mock_transcript_in_db, + dummy_file_transcript, + dummy_file_diarization, + mock_storage, + mock_audio_file_writer, + mock_waveform_processor, + mock_topic_detector, + mock_title_processor, + mock_summary_processor, +): + """ + Test the complete PipelineMainFile processing pipeline. + + This test verifies: + 1. Audio extraction and writing + 2. Audio upload to storage + 3. Parallel processing of transcription, diarization, and waveform + 4. Assembly of transcript with diarization + 5. Topic detection + 6. Title and summary generation + """ + # Create a test audio file + test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav" + + # Copy test audio to the transcript's data path as if it was uploaded + upload_path = mock_transcript_in_db.data_path / "upload.wav" + upload_path.write_bytes(test_audio_path.read_bytes()) + + # Also create the audio.mp3 file that would be created by AudioFileWriterProcessor + # Since we're mocking AudioFileWriterProcessor, we need to create this manually + mp3_path = mock_transcript_in_db.data_path / "audio.mp3" + mp3_path.write_bytes(b"mock_mp3_data") + + # Track callback invocations + callback_marks = { + "on_status": [], + "on_duration": [], + "on_waveform": [], + "on_topic": [], + "on_title": [], + "on_long_summary": [], + "on_short_summary": [], + } + + # Create pipeline with mocked callbacks + pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id) + + # Override callbacks to track invocations + async def track_callback(name, data): + callback_marks[name].append(data) + # Call the original callback + original = getattr(PipelineMainFile, name) + return await original(pipeline, data) + + for callback_name in callback_marks.keys(): + setattr( + pipeline, + callback_name, + lambda data, n=callback_name: track_callback(n, data), + ) + + # Mock av.open for audio processing + with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av: + # Mock container for checking video streams + mock_container = MagicMock() + mock_container.streams.video = [] # No video streams (audio only) + mock_container.close = MagicMock() + + # Mock container for decoding audio frames + mock_decode_container = MagicMock() + mock_decode_container.decode.return_value = iter( + [MagicMock()] + ) # One mock audio frame + mock_decode_container.close = MagicMock() + + # Return different 
containers for different calls + mock_av.side_effect = [mock_container, mock_decode_container] + + # Run the pipeline + await pipeline.process(upload_path) + + # Verify audio extraction and writing + assert mock_audio_file_writer.push.called + assert mock_audio_file_writer.flush.called + + # Verify storage upload + assert mock_storage._put_file.called + assert mock_storage._get_file_url.called + + # Verify waveform generation + assert mock_waveform_processor.flush.called + assert mock_waveform_processor.set_pipeline.called + + # Verify topic detection + assert mock_topic_detector.push.called + assert mock_topic_detector.flush_called + + # Verify title generation + assert mock_title_processor.push.called + assert mock_title_processor.flush_called + + # Verify summary generation + assert mock_summary_processor.push.called + assert mock_summary_processor.flush_called + + # Verify callbacks were invoked + assert len(callback_marks["on_topic"]) > 0, "Topic callback should be invoked" + assert len(callback_marks["on_title"]) > 0, "Title callback should be invoked" + assert ( + len(callback_marks["on_long_summary"]) > 0 + ), "Long summary callback should be invoked" + assert ( + len(callback_marks["on_short_summary"]) > 0 + ), "Short summary callback should be invoked" + + print(f"Callback marks: {callback_marks}") + + # Verify the pipeline completed successfully + assert pipeline.logger is not None + print("PipelineMainFile test completed successfully!") + + +@pytest.mark.asyncio +async def test_pipeline_main_file_with_video( + tmpdir, + mock_transcript_in_db, + dummy_file_transcript, + dummy_file_diarization, + mock_storage, + mock_audio_file_writer, + mock_waveform_processor, + mock_topic_detector, + mock_title_processor, + mock_summary_processor, +): + """ + Test PipelineMainFile with video input (verifies audio extraction). 
+ """ + # Create a test audio file + test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav" + + # Copy test audio to the transcript's data path as if it was a video upload + upload_path = mock_transcript_in_db.data_path / "upload.mp4" + upload_path.write_bytes(test_audio_path.read_bytes()) + + # Also create the audio.mp3 file that would be created by AudioFileWriterProcessor + mp3_path = mock_transcript_in_db.data_path / "audio.mp3" + mp3_path.write_bytes(b"mock_mp3_data") + + # Create pipeline + pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id) + + # Mock av.open for video processing + with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av: + # Mock container for checking video streams + mock_container = MagicMock() + mock_container.streams.video = [MagicMock()] # Has video streams + mock_container.close = MagicMock() + + # Mock container for decoding audio frames + mock_decode_container = MagicMock() + mock_decode_container.decode.return_value = iter( + [MagicMock()] + ) # One mock audio frame + mock_decode_container.close = MagicMock() + + # Return different containers for different calls + mock_av.side_effect = [mock_container, mock_decode_container] + + # Run the pipeline + await pipeline.process(upload_path) + + # Verify audio extraction from video + assert mock_audio_file_writer.push.called + assert mock_audio_file_writer.flush.called + + # Verify the rest of the pipeline completed + assert mock_storage._put_file.called + assert mock_waveform_processor.flush.called + assert mock_topic_detector.push.called + assert mock_title_processor.push.called + assert mock_summary_processor.push.called + + print("PipelineMainFile video test completed successfully!") + + +@pytest.mark.asyncio +async def test_pipeline_main_file_no_diarization( + tmpdir, + mock_transcript_in_db, + dummy_file_transcript, + mock_storage, + mock_audio_file_writer, + mock_waveform_processor, + mock_topic_detector, + mock_title_processor, + mock_summary_processor, +): + """ + Test PipelineMainFile with diarization disabled. 
+ """ + from reflector.settings import settings + + # Disable diarization + with patch.object(settings, "DIARIZATION_BACKEND", None): + # Create a test audio file + test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav" + + # Copy test audio to the transcript's data path + upload_path = mock_transcript_in_db.data_path / "upload.wav" + upload_path.write_bytes(test_audio_path.read_bytes()) + + # Also create the audio.mp3 file + mp3_path = mock_transcript_in_db.data_path / "audio.mp3" + mp3_path.write_bytes(b"mock_mp3_data") + + # Create pipeline + pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id) + + # Mock av.open for audio processing + with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av: + # Mock container for checking video streams + mock_container = MagicMock() + mock_container.streams.video = [] # No video streams + mock_container.close = MagicMock() + + # Mock container for decoding audio frames + mock_decode_container = MagicMock() + mock_decode_container.decode.return_value = iter([MagicMock()]) + mock_decode_container.close = MagicMock() + + # Return different containers for different calls + mock_av.side_effect = [mock_container, mock_decode_container] + + # Run the pipeline + await pipeline.process(upload_path) + + # Verify the pipeline completed without diarization + assert mock_storage._put_file.called + assert mock_waveform_processor.flush.called + assert mock_topic_detector.push.called + assert mock_title_processor.push.called + assert mock_summary_processor.push.called + + print("PipelineMainFile no-diarization test completed successfully!") + + +@pytest.mark.asyncio +async def test_task_pipeline_file_process( + tmpdir, + mock_transcript_in_db, + dummy_file_transcript, + dummy_file_diarization, + mock_storage, + mock_audio_file_writer, + mock_waveform_processor, + mock_topic_detector, + mock_title_processor, + mock_summary_processor, +): + """ + Test the Celery task entry point for file pipeline processing. 
+ """ + # Direct import of the underlying async function, bypassing the asynctask decorator + + # Create a test audio file in the transcript's data path + test_audio_path = Path(__file__).parent / "records" / "test_mathieu_hello.wav" + upload_path = mock_transcript_in_db.data_path / "upload.wav" + upload_path.write_bytes(test_audio_path.read_bytes()) + + # Also create the audio.mp3 file + mp3_path = mock_transcript_in_db.data_path / "audio.mp3" + mp3_path.write_bytes(b"mock_mp3_data") + + # Mock av.open for audio processing + with patch("reflector.pipelines.main_file_pipeline.av.open") as mock_av: + # Mock container for checking video streams + mock_container = MagicMock() + mock_container.streams.video = [] # No video streams + mock_container.close = MagicMock() + + # Mock container for decoding audio frames + mock_decode_container = MagicMock() + mock_decode_container.decode.return_value = iter([MagicMock()]) + mock_decode_container.close = MagicMock() + + # Return different containers for different calls + mock_av.side_effect = [mock_container, mock_decode_container] + + # Get the original async function without the asynctask decorator + # The function is wrapped, so we need to call it differently + # For now, we test the pipeline directly since the task is just a thin wrapper + from reflector.pipelines.main_file_pipeline import PipelineMainFile + + pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id) + await pipeline.process(upload_path) + + # Verify the pipeline was executed through the task + assert mock_audio_file_writer.push.called + assert mock_audio_file_writer.flush.called + assert mock_storage._put_file.called + assert mock_waveform_processor.flush.called + assert mock_topic_detector.push.called + assert mock_title_processor.push.called + assert mock_summary_processor.push.called + + print("task_pipeline_file_process test completed successfully!") + + +@pytest.mark.asyncio +async def test_pipeline_file_process_no_transcript(): + """ + Test the pipeline with a non-existent transcript. + """ + from reflector.pipelines.main_file_pipeline import PipelineMainFile + + # Mock the controller to return None (transcript not found) + with patch( + "reflector.pipelines.main_file_pipeline.transcripts_controller.get_by_id" + ) as mock_get: + mock_get.return_value = None + + pipeline = PipelineMainFile(transcript_id=str(uuid4())) + + # Should raise an exception for missing transcript when get_transcript is called + with pytest.raises(Exception, match="Transcript not found"): + await pipeline.get_transcript() + + +@pytest.mark.asyncio +async def test_pipeline_file_process_no_audio_file( + mock_transcript_in_db, +): + """ + Test the pipeline when no audio file is found. 
+    """
+    from reflector.pipelines.main_file_pipeline import PipelineMainFile
+
+    # Don't create any audio files in the data path
+    # The pipeline's process should handle missing files gracefully
+
+    pipeline = PipelineMainFile(transcript_id=mock_transcript_in_db.id)
+
+    # Try to process a non-existent file
+    non_existent_path = mock_transcript_in_db.data_path / "nonexistent.wav"
+
+    # This should fail when trying to open the file with av
+    with pytest.raises(Exception):
+        await pipeline.process(non_existent_path)
diff --git a/server/tests/test_processors_modal.py b/server/tests/test_processors_modal.py
new file mode 100644
index 00000000..4b320638
--- /dev/null
+++ b/server/tests/test_processors_modal.py
@@ -0,0 +1,265 @@
+"""
+Tests for Modal-based processors using pytest-recording for HTTP recording/playback
+
+Note: these tests require full Modal configuration to be able to record
+      VCR cassettes
+
+Configuration required for the first recording:
+- TRANSCRIPT_BACKEND=modal
+- TRANSCRIPT_URL=https://xxxxx--reflector-transcriber-parakeet-web.modal.run
+- TRANSCRIPT_MODAL_API_KEY=xxxxx
+- DIARIZATION_BACKEND=modal
+- DIARIZATION_URL=https://xxxxx--reflector-diarizer-web.modal.run
+- DIARIZATION_MODAL_API_KEY=xxxxx
+"""
+
+from unittest.mock import patch
+
+import pytest
+
+from reflector.processors.file_diarization import FileDiarizationInput
+from reflector.processors.file_diarization_modal import FileDiarizationModalProcessor
+from reflector.processors.file_transcript import FileTranscriptInput
+from reflector.processors.file_transcript_modal import FileTranscriptModalProcessor
+from reflector.processors.transcript_diarization_assembler import (
+    TranscriptDiarizationAssemblerInput,
+    TranscriptDiarizationAssemblerProcessor,
+)
+from reflector.processors.types import DiarizationSegment, Transcript, Word
+
+# Public test audio file hosted on S3 specifically for reflector pytests
+TEST_AUDIO_URL = (
+    "https://reflector-github-pytest.s3.us-east-1.amazonaws.com/test_mathieu_hello.mp3"
+)
+
+
+@pytest.mark.asyncio
+async def test_file_transcript_modal_processor_missing_url():
+    with patch("reflector.processors.file_transcript_modal.settings") as mock_settings:
+        mock_settings.TRANSCRIPT_URL = None
+        with pytest.raises(Exception, match="TRANSCRIPT_URL required"):
+            FileTranscriptModalProcessor(modal_api_key="test-api-key")
+
+
+@pytest.mark.asyncio
+async def test_file_diarization_modal_processor_missing_url():
+    with patch("reflector.processors.file_diarization_modal.settings") as mock_settings:
+        mock_settings.DIARIZATION_URL = None
+        with pytest.raises(Exception, match="DIARIZATION_URL required"):
+            FileDiarizationModalProcessor(modal_api_key="test-api-key")
+
+
+@pytest.mark.vcr()
+@pytest.mark.asyncio
+async def test_file_diarization_modal_processor(vcr):
+    """Test FileDiarizationModalProcessor using public audio URL and Modal API"""
+    from reflector.settings import settings
+
+    processor = FileDiarizationModalProcessor(
+        modal_api_key=settings.DIARIZATION_MODAL_API_KEY
+    )
+
+    test_input = FileDiarizationInput(audio_url=TEST_AUDIO_URL)
+    result = await processor._diarize(test_input)
+
+    # Verify the result structure
+    assert result is not None
+    assert hasattr(result, "diarization")
+    assert isinstance(result.diarization, list)
+
+    # Check structure of each diarization segment
+    for segment in result.diarization:
+        assert "start" in segment
+        assert "end" in segment
+        assert "speaker" in segment
+        assert isinstance(segment["start"], (int, float))
+        assert isinstance(segment["end"],
(int, float)) + assert isinstance(segment["speaker"], int) + # Basic sanity check - start should be before end + assert segment["start"] < segment["end"] + + +@pytest.mark.vcr() +@pytest.mark.asyncio +async def test_file_transcript_modal_processor(): + """Test FileTranscriptModalProcessor using public audio URL and Modal API""" + from reflector.settings import settings + + processor = FileTranscriptModalProcessor( + modal_api_key=settings.TRANSCRIPT_MODAL_API_KEY + ) + + test_input = FileTranscriptInput( + audio_url=TEST_AUDIO_URL, + language="en", + ) + + # This will record the HTTP interaction on first run, replay on subsequent runs + result = await processor._transcript(test_input) + + # Verify the result structure + assert result is not None + assert hasattr(result, "words") + assert isinstance(result.words, list) + + # Check structure of each word if present + for word in result.words: + assert hasattr(word, "text") + assert hasattr(word, "start") + assert hasattr(word, "end") + assert isinstance(word.start, (int, float)) + assert isinstance(word.end, (int, float)) + assert isinstance(word.text, str) + # Basic sanity check - start should be before or equal to end + assert word.start <= word.end + + +@pytest.mark.asyncio +async def test_transcript_diarization_assembler_processor(): + """Test TranscriptDiarizationAssemblerProcessor without VCR (no HTTP requests)""" + # Create test transcript with words + words = [ + Word(text="Hello", start=0.0, end=1.0, speaker=0), + Word(text=" ", start=1.0, end=1.1, speaker=0), + Word(text="world", start=1.1, end=2.0, speaker=0), + Word(text=".", start=2.0, end=2.1, speaker=0), + Word(text=" ", start=2.1, end=2.2, speaker=0), + Word(text="How", start=2.2, end=2.8, speaker=0), + Word(text=" ", start=2.8, end=2.9, speaker=0), + Word(text="are", start=2.9, end=3.2, speaker=0), + Word(text=" ", start=3.2, end=3.3, speaker=0), + Word(text="you", start=3.3, end=3.8, speaker=0), + Word(text="?", start=3.8, end=3.9, speaker=0), + ] + transcript = Transcript(words=words) + + # Create test diarization segments + diarization = [ + DiarizationSegment(start=0.0, end=2.1, speaker=0), + DiarizationSegment(start=2.1, end=3.9, speaker=1), + ] + + # Create processor and test input + processor = TranscriptDiarizationAssemblerProcessor() + test_input = TranscriptDiarizationAssemblerInput( + transcript=transcript, diarization=diarization + ) + + # Track emitted results + emitted_results = [] + + async def capture_result(result): + emitted_results.append(result) + + processor.on(capture_result) + + # Process the input + await processor.push(test_input) + + # Verify result was emitted + assert len(emitted_results) == 1 + result = emitted_results[0] + + # Verify result structure + assert isinstance(result, Transcript) + assert len(result.words) == len(words) + + # Verify speaker assignments were applied + # Words 0-3 (indices) should be speaker 0 (time 0.0-2.0) + # Words 4-10 (indices) should be speaker 1 (time 2.1-3.9) + for i in range(4): # First 4 words (Hello, space, world, .) + assert ( + result.words[i].speaker == 0 + ), f"Word {i} '{result.words[i].text}' should be speaker 0, got {result.words[i].speaker}" + + for i in range(4, 11): # Remaining words (space, How, space, are, space, you, ?) 
+ assert ( + result.words[i].speaker == 1 + ), f"Word {i} '{result.words[i].text}' should be speaker 1, got {result.words[i].speaker}" + + +@pytest.mark.asyncio +async def test_transcript_diarization_assembler_no_diarization(): + """Test TranscriptDiarizationAssemblerProcessor with no diarization data""" + # Create test transcript + words = [Word(text="Hello", start=0.0, end=1.0, speaker=0)] + transcript = Transcript(words=words) + + # Create processor and test input with empty diarization + processor = TranscriptDiarizationAssemblerProcessor() + test_input = TranscriptDiarizationAssemblerInput( + transcript=transcript, diarization=[] + ) + + # Track emitted results + emitted_results = [] + + async def capture_result(result): + emitted_results.append(result) + + processor.on(capture_result) + + # Process the input + await processor.push(test_input) + + # Verify original transcript was returned unchanged + assert len(emitted_results) == 1 + result = emitted_results[0] + assert result is transcript # Should be the same object + assert result.words[0].speaker == 0 # Original speaker unchanged + + +@pytest.mark.vcr() +@pytest.mark.asyncio +async def test_full_modal_pipeline_integration(vcr): + """Integration test: Transcription -> Diarization -> Assembly + + This test demonstrates the full pipeline: + 1. Run transcription via Modal + 2. Run diarization via Modal + 3. Assemble transcript with diarization + """ + from reflector.settings import settings + + # Step 1: Transcription + transcript_processor = FileTranscriptModalProcessor( + modal_api_key=settings.TRANSCRIPT_MODAL_API_KEY + ) + transcript_input = FileTranscriptInput(audio_url=TEST_AUDIO_URL, language="en") + transcript = await transcript_processor._transcript(transcript_input) + + # Step 2: Diarization + diarization_processor = FileDiarizationModalProcessor( + modal_api_key=settings.DIARIZATION_MODAL_API_KEY + ) + diarization_input = FileDiarizationInput(audio_url=TEST_AUDIO_URL) + diarization_result = await diarization_processor._diarize(diarization_input) + + # Step 3: Assembly + assembler = TranscriptDiarizationAssemblerProcessor() + assembly_input = TranscriptDiarizationAssemblerInput( + transcript=transcript, diarization=diarization_result.diarization + ) + + # Track assembled result + assembled_results = [] + + async def capture_result(result): + assembled_results.append(result) + + assembler.on(capture_result) + + await assembler.push(assembly_input) + + # Verify the full pipeline worked + assert len(assembled_results) == 1 + final_transcript = assembled_results[0] + + # Verify the final transcript has the original words with updated speaker info + assert isinstance(final_transcript, Transcript) + assert len(final_transcript.words) == len(transcript.words) + assert len(final_transcript.words) > 0 + + # Verify some words have been assigned speakers from diarization + speakers_found = set(word.speaker for word in final_transcript.words) + assert len(speakers_found) > 0 # At least some speaker assignments diff --git a/server/tests/test_processors_pipeline.py b/server/tests/test_processors_pipeline.py index accd80c6..7ae22a6c 100644 --- a/server/tests/test_processors_pipeline.py +++ b/server/tests/test_processors_pipeline.py @@ -2,10 +2,13 @@ import pytest @pytest.mark.asyncio +@pytest.mark.parametrize("enable_diarization", [False, True]) async def test_basic_process( dummy_transcript, dummy_llm, dummy_processors, + enable_diarization, + dummy_diarization, ): # goal is to start the server, and send rtc audio to it # validate the 
events received @@ -28,12 +31,31 @@ async def test_basic_process( # invoke the process and capture events path = Path(__file__).parent / "records" / "test_mathieu_hello.wav" - await process_audio_file(path.as_posix(), event_callback) - print(marks) + + if enable_diarization: + # Test with diarization - may fail if pyannote.audio is not installed + try: + await process_audio_file( + path.as_posix(), event_callback, enable_diarization=True + ) + except SystemExit: + pytest.skip("pyannote.audio not installed - skipping diarization test") + else: + # Test without diarization - should always work + await process_audio_file( + path.as_posix(), event_callback, enable_diarization=False + ) + + print(f"Diarization: {enable_diarization}, Marks: {marks}") # validate the events - assert marks["TranscriptLinerProcessor"] == 1 - assert marks["TranscriptTranslatorPassthroughProcessor"] == 1 + # Each processor should be called for each audio segment processed + # The final processors (Topic, Title, Summary) should be called once at the end + assert marks["TranscriptLinerProcessor"] > 0 + assert marks["TranscriptTranslatorPassthroughProcessor"] > 0 assert marks["TranscriptTopicDetectorProcessor"] == 1 assert marks["TranscriptFinalSummaryProcessor"] == 1 assert marks["TranscriptFinalTitleProcessor"] == 1 + + if enable_diarization: + assert marks["TestAudioDiarizationProcessor"] == 1 diff --git a/server/uv.lock b/server/uv.lock index adeace10..9b48225d 100644 --- a/server/uv.lock +++ b/server/uv.lock @@ -1,9 +1,11 @@ version = 1 -revision = 2 +revision = 3 requires-python = ">=3.11, <3.13" resolution-markers = [ - "python_full_version >= '3.12'", - "python_full_version < '3.12'", + "python_full_version >= '3.12' and platform_python_implementation == 'PyPy'", + "python_full_version >= '3.12' and platform_python_implementation != 'PyPy'", + "python_full_version < '3.12' and platform_python_implementation == 'PyPy'", + "python_full_version < '3.12' and platform_python_implementation != 'PyPy'", ] [[package]] @@ -224,6 +226,12 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" }, ] +[[package]] +name = "antlr4-python3-runtime" +version = "4.9.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/3e/38/7859ff46355f76f8d19459005ca000b6e7012f2f1ca597746cbcd1fbfe5e/antlr4-python3-runtime-4.9.3.tar.gz", hash = "sha256:f224469b4168294902bb1efa80a8bf7855f24c99aef99cbefc1bcd3cce77881b", size = 117034, upload-time = "2021-11-06T17:52:23.524Z" } + [[package]] name = "anyio" version = "4.9.0" @@ -250,6 +258,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/2f/f5/c36551e93acba41a59939ae6a0fb77ddb3f2e8e8caa716410c65f7341f72/asgi_lifespan-2.1.0-py3-none-any.whl", hash = "sha256:ed840706680e28428c01e14afb3875d7d76d3206f3d5b2f2294e059b5c23804f", size = 10895, upload-time = "2023-03-28T17:35:47.772Z" }, ] +[[package]] +name = "asteroid-filterbanks" +version = "0.4.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "torch" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/90/fa/5c2be1f96dc179f83cdd3bb267edbd1f47d08f756785c016d5c2163901a7/asteroid-filterbanks-0.4.0.tar.gz", hash = 
"sha256:415f89d1dcf2b13b35f03f7a9370968ac4e6fa6800633c522dac992b283409b9", size = 24599, upload-time = "2021-04-09T20:03:07.456Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c5/7c/83ff6046176a675e6a1e8aeefed8892cd97fe7c46af93cc540d1b24b8323/asteroid_filterbanks-0.4.0-py3-none-any.whl", hash = "sha256:4932ac8b6acc6e08fb87cbe8ece84215b5a74eee284fe83acf3540a72a02eaf5", size = 29912, upload-time = "2021-04-09T20:03:05.817Z" }, +] + [[package]] name = "async-timeout" version = "5.0.1" @@ -575,6 +597,56 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a7/06/3d6badcf13db419e25b07041d9c7b4a2c331d3f4e7134445ec5df57714cd/coloredlogs-15.0.1-py2.py3-none-any.whl", hash = "sha256:612ee75c546f53e92e70049c9dbfcc18c935a2b9a53b66085ce9ef6a6e5c0934", size = 46018, upload-time = "2021-06-11T10:22:42.561Z" }, ] +[[package]] +name = "colorlog" +version = "6.9.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/d3/7a/359f4d5df2353f26172b3cc39ea32daa39af8de522205f512f458923e677/colorlog-6.9.0.tar.gz", hash = "sha256:bfba54a1b93b94f54e1f4fe48395725a3d92fd2a4af702f6bd70946bdc0c6ac2", size = 16624, upload-time = "2024-10-29T18:34:51.011Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e3/51/9b208e85196941db2f0654ad0357ca6388ab3ed67efdbfc799f35d1f83aa/colorlog-6.9.0-py3-none-any.whl", hash = "sha256:5906e71acd67cb07a71e779c47c4bcb45fb8c2993eebe9e5adcd6a6f1b283eff", size = 11424, upload-time = "2024-10-29T18:34:49.815Z" }, +] + +[[package]] +name = "contourpy" +version = "1.3.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/58/01/1253e6698a07380cd31a736d248a3f2a50a7c88779a1813da27503cadc2a/contourpy-1.3.3.tar.gz", hash = "sha256:083e12155b210502d0bca491432bb04d56dc3432f95a979b429f2848c3dbe880", size = 13466174, upload-time = "2025-07-26T12:03:12.549Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/91/2e/c4390a31919d8a78b90e8ecf87cd4b4c4f05a5b48d05ec17db8e5404c6f4/contourpy-1.3.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:709a48ef9a690e1343202916450bc48b9e51c049b089c7f79a267b46cffcdaa1", size = 288773, upload-time = "2025-07-26T12:01:02.277Z" }, + { url = "https://files.pythonhosted.org/packages/0d/44/c4b0b6095fef4dc9c420e041799591e3b63e9619e3044f7f4f6c21c0ab24/contourpy-1.3.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:23416f38bfd74d5d28ab8429cc4d63fa67d5068bd711a85edb1c3fb0c3e2f381", size = 270149, upload-time = "2025-07-26T12:01:04.072Z" }, + { url = "https://files.pythonhosted.org/packages/30/2e/dd4ced42fefac8470661d7cb7e264808425e6c5d56d175291e93890cce09/contourpy-1.3.3-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:929ddf8c4c7f348e4c0a5a3a714b5c8542ffaa8c22954862a46ca1813b667ee7", size = 329222, upload-time = "2025-07-26T12:01:05.688Z" }, + { url = "https://files.pythonhosted.org/packages/f2/74/cc6ec2548e3d276c71389ea4802a774b7aa3558223b7bade3f25787fafc2/contourpy-1.3.3-cp311-cp311-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:9e999574eddae35f1312c2b4b717b7885d4edd6cb46700e04f7f02db454e67c1", size = 377234, upload-time = "2025-07-26T12:01:07.054Z" }, + { url = 
"https://files.pythonhosted.org/packages/03/b3/64ef723029f917410f75c09da54254c5f9ea90ef89b143ccadb09df14c15/contourpy-1.3.3-cp311-cp311-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:0bf67e0e3f482cb69779dd3061b534eb35ac9b17f163d851e2a547d56dba0a3a", size = 380555, upload-time = "2025-07-26T12:01:08.801Z" }, + { url = "https://files.pythonhosted.org/packages/5f/4b/6157f24ca425b89fe2eb7e7be642375711ab671135be21e6faa100f7448c/contourpy-1.3.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:51e79c1f7470158e838808d4a996fa9bac72c498e93d8ebe5119bc1e6becb0db", size = 355238, upload-time = "2025-07-26T12:01:10.319Z" }, + { url = "https://files.pythonhosted.org/packages/98/56/f914f0dd678480708a04cfd2206e7c382533249bc5001eb9f58aa693e200/contourpy-1.3.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:598c3aaece21c503615fd59c92a3598b428b2f01bfb4b8ca9c4edeecc2438620", size = 1326218, upload-time = "2025-07-26T12:01:12.659Z" }, + { url = "https://files.pythonhosted.org/packages/fb/d7/4a972334a0c971acd5172389671113ae82aa7527073980c38d5868ff1161/contourpy-1.3.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:322ab1c99b008dad206d406bb61d014cf0174df491ae9d9d0fac6a6fda4f977f", size = 1392867, upload-time = "2025-07-26T12:01:15.533Z" }, + { url = "https://files.pythonhosted.org/packages/75/3e/f2cc6cd56dc8cff46b1a56232eabc6feea52720083ea71ab15523daab796/contourpy-1.3.3-cp311-cp311-win32.whl", hash = "sha256:fd907ae12cd483cd83e414b12941c632a969171bf90fc937d0c9f268a31cafff", size = 183677, upload-time = "2025-07-26T12:01:17.088Z" }, + { url = "https://files.pythonhosted.org/packages/98/4b/9bd370b004b5c9d8045c6c33cf65bae018b27aca550a3f657cdc99acdbd8/contourpy-1.3.3-cp311-cp311-win_amd64.whl", hash = "sha256:3519428f6be58431c56581f1694ba8e50626f2dd550af225f82fb5f5814d2a42", size = 225234, upload-time = "2025-07-26T12:01:18.256Z" }, + { url = "https://files.pythonhosted.org/packages/d9/b6/71771e02c2e004450c12b1120a5f488cad2e4d5b590b1af8bad060360fe4/contourpy-1.3.3-cp311-cp311-win_arm64.whl", hash = "sha256:15ff10bfada4bf92ec8b31c62bf7c1834c244019b4a33095a68000d7075df470", size = 193123, upload-time = "2025-07-26T12:01:19.848Z" }, + { url = "https://files.pythonhosted.org/packages/be/45/adfee365d9ea3d853550b2e735f9d66366701c65db7855cd07621732ccfc/contourpy-1.3.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b08a32ea2f8e42cf1d4be3169a98dd4be32bafe4f22b6c4cb4ba810fa9e5d2cb", size = 293419, upload-time = "2025-07-26T12:01:21.16Z" }, + { url = "https://files.pythonhosted.org/packages/53/3e/405b59cfa13021a56bba395a6b3aca8cec012b45bf177b0eaf7a202cde2c/contourpy-1.3.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:556dba8fb6f5d8742f2923fe9457dbdd51e1049c4a43fd3986a0b14a1d815fc6", size = 273979, upload-time = "2025-07-26T12:01:22.448Z" }, + { url = "https://files.pythonhosted.org/packages/d4/1c/a12359b9b2ca3a845e8f7f9ac08bdf776114eb931392fcad91743e2ea17b/contourpy-1.3.3-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:92d9abc807cf7d0e047b95ca5d957cf4792fcd04e920ca70d48add15c1a90ea7", size = 332653, upload-time = "2025-07-26T12:01:24.155Z" }, + { url = "https://files.pythonhosted.org/packages/63/12/897aeebfb475b7748ea67b61e045accdfcf0d971f8a588b67108ed7f5512/contourpy-1.3.3-cp312-cp312-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:b2e8faa0ed68cb29af51edd8e24798bb661eac3bd9f65420c1887b6ca89987c8", size = 379536, upload-time = "2025-07-26T12:01:25.91Z" }, + { url = 
"https://files.pythonhosted.org/packages/43/8a/a8c584b82deb248930ce069e71576fc09bd7174bbd35183b7943fb1064fd/contourpy-1.3.3-cp312-cp312-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:626d60935cf668e70a5ce6ff184fd713e9683fb458898e4249b63be9e28286ea", size = 384397, upload-time = "2025-07-26T12:01:27.152Z" }, + { url = "https://files.pythonhosted.org/packages/cc/8f/ec6289987824b29529d0dfda0d74a07cec60e54b9c92f3c9da4c0ac732de/contourpy-1.3.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4d00e655fcef08aba35ec9610536bfe90267d7ab5ba944f7032549c55a146da1", size = 362601, upload-time = "2025-07-26T12:01:28.808Z" }, + { url = "https://files.pythonhosted.org/packages/05/0a/a3fe3be3ee2dceb3e615ebb4df97ae6f3828aa915d3e10549ce016302bd1/contourpy-1.3.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:451e71b5a7d597379ef572de31eeb909a87246974d960049a9848c3bc6c41bf7", size = 1331288, upload-time = "2025-07-26T12:01:31.198Z" }, + { url = "https://files.pythonhosted.org/packages/33/1d/acad9bd4e97f13f3e2b18a3977fe1b4a37ecf3d38d815333980c6c72e963/contourpy-1.3.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:459c1f020cd59fcfe6650180678a9993932d80d44ccde1fa1868977438f0b411", size = 1403386, upload-time = "2025-07-26T12:01:33.947Z" }, + { url = "https://files.pythonhosted.org/packages/cf/8f/5847f44a7fddf859704217a99a23a4f6417b10e5ab1256a179264561540e/contourpy-1.3.3-cp312-cp312-win32.whl", hash = "sha256:023b44101dfe49d7d53932be418477dba359649246075c996866106da069af69", size = 185018, upload-time = "2025-07-26T12:01:35.64Z" }, + { url = "https://files.pythonhosted.org/packages/19/e8/6026ed58a64563186a9ee3f29f41261fd1828f527dd93d33b60feca63352/contourpy-1.3.3-cp312-cp312-win_amd64.whl", hash = "sha256:8153b8bfc11e1e4d75bcb0bff1db232f9e10b274e0929de9d608027e0d34ff8b", size = 226567, upload-time = "2025-07-26T12:01:36.804Z" }, + { url = "https://files.pythonhosted.org/packages/d1/e2/f05240d2c39a1ed228d8328a78b6f44cd695f7ef47beb3e684cf93604f86/contourpy-1.3.3-cp312-cp312-win_arm64.whl", hash = "sha256:07ce5ed73ecdc4a03ffe3e1b3e3c1166db35ae7584be76f65dbbe28a7791b0cc", size = 193655, upload-time = "2025-07-26T12:01:37.999Z" }, + { url = "https://files.pythonhosted.org/packages/a5/29/8dcfe16f0107943fa92388c23f6e05cff0ba58058c4c95b00280d4c75a14/contourpy-1.3.3-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:cd5dfcaeb10f7b7f9dc8941717c6c2ade08f587be2226222c12b25f0483ed497", size = 278809, upload-time = "2025-07-26T12:02:52.74Z" }, + { url = "https://files.pythonhosted.org/packages/85/a9/8b37ef4f7dafeb335daee3c8254645ef5725be4d9c6aa70b50ec46ef2f7e/contourpy-1.3.3-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:0c1fc238306b35f246d61a1d416a627348b5cf0648648a031e14bb8705fcdfe8", size = 261593, upload-time = "2025-07-26T12:02:54.037Z" }, + { url = "https://files.pythonhosted.org/packages/0a/59/ebfb8c677c75605cc27f7122c90313fd2f375ff3c8d19a1694bda74aaa63/contourpy-1.3.3-pp311-pypy311_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:70f9aad7de812d6541d29d2bbf8feb22ff7e1c299523db288004e3157ff4674e", size = 302202, upload-time = "2025-07-26T12:02:55.947Z" }, + { url = "https://files.pythonhosted.org/packages/3c/37/21972a15834d90bfbfb009b9d004779bd5a07a0ec0234e5ba8f64d5736f4/contourpy-1.3.3-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5ed3657edf08512fc3fe81b510e35c2012fbd3081d2e26160f27ca28affec989", size = 329207, upload-time = "2025-07-26T12:02:57.468Z" }, + { url = 
"https://files.pythonhosted.org/packages/0c/58/bd257695f39d05594ca4ad60df5bcb7e32247f9951fd09a9b8edb82d1daa/contourpy-1.3.3-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:3d1a3799d62d45c18bafd41c5fa05120b96a28079f2393af559b843d1a966a77", size = 225315, upload-time = "2025-07-26T12:02:58.801Z" }, +] + [[package]] name = "coverage" version = "7.9.2" @@ -675,6 +747,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ec/4c/0ecd260233290bee4b2facec4d8e755e57d8781d68f276e1248433993c9f/ctranslate2-4.6.0-cp312-cp312-win_amd64.whl", hash = "sha256:511cdf810a5bf6a2cec735799e5cd47966e63f8f7688fdee1b97fed621abda00", size = 19470040, upload-time = "2025-04-08T19:49:55.274Z" }, ] +[[package]] +name = "cycler" +version = "0.12.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/a9/95/a3dbbb5028f35eafb79008e7522a75244477d2838f38cbb722248dabc2a8/cycler-0.12.1.tar.gz", hash = "sha256:88bb128f02ba341da8ef447245a9e138fae777f6a23943da4540077d3601eb1c", size = 7615, upload-time = "2023-10-07T05:32:18.335Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl", hash = "sha256:85cef7cff222d8644161529808465972e51340599459b8ac3ccbac5a854e0d30", size = 8321, upload-time = "2023-10-07T05:32:16.783Z" }, +] + [[package]] name = "databases" version = "0.8.0" @@ -787,6 +868,12 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e3/26/57c6fb270950d476074c087527a558ccb6f4436657314bfb6cdf484114c4/docker-7.1.0-py3-none-any.whl", hash = "sha256:c96b93b7f0a746f9e77d325bcfb87422a3d8bd4f03136ae8a85b37f1898d5fc0", size = 147774, upload-time = "2024-05-23T11:13:55.01Z" }, ] +[[package]] +name = "docopt" +version = "0.6.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/a2/55/8f8cab2afd404cf578136ef2cc5dfb50baa1761b68c9da1fb1e4eed343c9/docopt-0.6.2.tar.gz", hash = "sha256:49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491", size = 25901, upload-time = "2014-06-16T11:18:57.406Z" } + [[package]] name = "ecdsa" version = "0.19.1" @@ -799,6 +886,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/cb/a3/460c57f094a4a165c84a1341c373b0a4f5ec6ac244b998d5021aade89b77/ecdsa-0.19.1-py2.py3-none-any.whl", hash = "sha256:30638e27cf77b7e15c4c4cc1973720149e1033827cfd00661ca5c8cc0cdb24c3", size = 150607, upload-time = "2025-03-13T11:52:41.757Z" }, ] +[[package]] +name = "einops" +version = "0.8.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/e5/81/df4fbe24dff8ba3934af99044188e20a98ed441ad17a274539b74e82e126/einops-0.8.1.tar.gz", hash = "sha256:de5d960a7a761225532e0f1959e5315ebeafc0cd43394732f103ca44b9837e84", size = 54805, upload-time = "2025-02-09T03:17:00.434Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/87/62/9773de14fe6c45c23649e98b83231fffd7b9892b6cf863251dc2afa73643/einops-0.8.1-py3-none-any.whl", hash = "sha256:919387eb55330f5757c6bea9165c5ff5cfe63a642682ea788a6d472576d81737", size = 64359, upload-time = "2025-02-09T03:17:01.998Z" }, +] + [[package]] name = "email-validator" version = "2.2.0" @@ -932,6 +1028,31 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/b8/25/155f9f080d5e4bc0082edfda032ea2bc2b8fab3f4d25d46c1e9dd22a1a89/flatbuffers-25.2.10-py2.py3-none-any.whl", hash = "sha256:ebba5f4d5ea615af3f7fd70fc310636fbb2bbd1f566ac0a23d98dd412de50051", size = 30953, 
upload-time = "2025-02-11T04:26:44.484Z" }, ] +[[package]] +name = "fonttools" +version = "4.59.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/11/7f/29c9c3fe4246f6ad96fee52b88d0dc3a863c7563b0afc959e36d78b965dc/fonttools-4.59.1.tar.gz", hash = "sha256:74995b402ad09822a4c8002438e54940d9f1ecda898d2bb057729d7da983e4cb", size = 3534394, upload-time = "2025-08-14T16:28:14.266Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/34/62/9667599561f623d4a523cc9eb4f66f3b94b6155464110fa9aebbf90bbec7/fonttools-4.59.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:4909cce2e35706f3d18c54d3dcce0414ba5e0fb436a454dffec459c61653b513", size = 2778815, upload-time = "2025-08-14T16:26:28.484Z" }, + { url = "https://files.pythonhosted.org/packages/8f/78/cc25bcb2ce86033a9df243418d175e58f1956a35047c685ef553acae67d6/fonttools-4.59.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:efbec204fa9f877641747f2d9612b2b656071390d7a7ef07a9dbf0ecf9c7195c", size = 2341631, upload-time = "2025-08-14T16:26:30.396Z" }, + { url = "https://files.pythonhosted.org/packages/a4/cc/fcbb606dd6871f457ac32f281c20bcd6cc77d9fce77b5a4e2b2afab1f500/fonttools-4.59.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:39dfd42cc2dc647b2c5469bc7a5b234d9a49e72565b96dd14ae6f11c2c59ef15", size = 5022222, upload-time = "2025-08-14T16:26:32.447Z" }, + { url = "https://files.pythonhosted.org/packages/61/96/c0b1cf2b74d08eb616a80dbf5564351fe4686147291a25f7dce8ace51eb3/fonttools-4.59.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:b11bc177a0d428b37890825d7d025040d591aa833f85f8d8878ed183354f47df", size = 4966512, upload-time = "2025-08-14T16:26:34.621Z" }, + { url = "https://files.pythonhosted.org/packages/a4/26/51ce2e3e0835ffc2562b1b11d1fb9dafd0aca89c9041b64a9e903790a761/fonttools-4.59.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:5b9b4c35b3be45e5bc774d3fc9608bbf4f9a8d371103b858c80edbeed31dd5aa", size = 5001645, upload-time = "2025-08-14T16:26:36.876Z" }, + { url = "https://files.pythonhosted.org/packages/36/11/ef0b23f4266349b6d5ccbd1a07b7adc998d5bce925792aa5d1ec33f593e3/fonttools-4.59.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:01158376b8a418a0bae9625c476cebfcfcb5e6761e9d243b219cd58341e7afbb", size = 5113777, upload-time = "2025-08-14T16:26:39.002Z" }, + { url = "https://files.pythonhosted.org/packages/d0/da/b398fe61ef433da0a0472cdb5d4399124f7581ffe1a31b6242c91477d802/fonttools-4.59.1-cp311-cp311-win32.whl", hash = "sha256:cf7c5089d37787387123f1cb8f1793a47c5e1e3d1e4e7bfbc1cc96e0f925eabe", size = 2215076, upload-time = "2025-08-14T16:26:41.196Z" }, + { url = "https://files.pythonhosted.org/packages/94/bd/e2624d06ab94e41c7c77727b2941f1baed7edb647e63503953e6888020c9/fonttools-4.59.1-cp311-cp311-win_amd64.whl", hash = "sha256:c866eef7a0ba320486ade6c32bfc12813d1a5db8567e6904fb56d3d40acc5116", size = 2262779, upload-time = "2025-08-14T16:26:43.483Z" }, + { url = "https://files.pythonhosted.org/packages/ac/fe/6e069cc4cb8881d164a9bd956e9df555bc62d3eb36f6282e43440200009c/fonttools-4.59.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:43ab814bbba5f02a93a152ee61a04182bb5809bd2bc3609f7822e12c53ae2c91", size = 2769172, upload-time = "2025-08-14T16:26:45.729Z" }, + { url = "https://files.pythonhosted.org/packages/b9/98/ec4e03f748fefa0dd72d9d95235aff6fef16601267f4a2340f0e16b9330f/fonttools-4.59.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = 
"sha256:4f04c3ffbfa0baafcbc550657cf83657034eb63304d27b05cff1653b448ccff6", size = 2337281, upload-time = "2025-08-14T16:26:47.921Z" }, + { url = "https://files.pythonhosted.org/packages/8b/b1/890360a7e3d04a30ba50b267aca2783f4c1364363797e892e78a4f036076/fonttools-4.59.1-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:d601b153e51a5a6221f0d4ec077b6bfc6ac35bfe6c19aeaa233d8990b2b71726", size = 4909215, upload-time = "2025-08-14T16:26:49.682Z" }, + { url = "https://files.pythonhosted.org/packages/8a/ec/2490599550d6c9c97a44c1e36ef4de52d6acf742359eaa385735e30c05c4/fonttools-4.59.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c735e385e30278c54f43a0d056736942023c9043f84ee1021eff9fd616d17693", size = 4951958, upload-time = "2025-08-14T16:26:51.616Z" }, + { url = "https://files.pythonhosted.org/packages/d1/40/bd053f6f7634234a9b9805ff8ae4f32df4f2168bee23cafd1271ba9915a9/fonttools-4.59.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:1017413cdc8555dce7ee23720da490282ab7ec1cf022af90a241f33f9a49afc4", size = 4894738, upload-time = "2025-08-14T16:26:53.836Z" }, + { url = "https://files.pythonhosted.org/packages/ac/a1/3cd12a010d288325a7cfcf298a84825f0f9c29b01dee1baba64edfe89257/fonttools-4.59.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:5c6d8d773470a5107052874341ed3c487c16ecd179976d81afed89dea5cd7406", size = 5045983, upload-time = "2025-08-14T16:26:56.153Z" }, + { url = "https://files.pythonhosted.org/packages/a2/af/8a2c3f6619cc43cf87951405337cc8460d08a4e717bb05eaa94b335d11dc/fonttools-4.59.1-cp312-cp312-win32.whl", hash = "sha256:2a2d0d33307f6ad3a2086a95dd607c202ea8852fa9fb52af9b48811154d1428a", size = 2203407, upload-time = "2025-08-14T16:26:58.165Z" }, + { url = "https://files.pythonhosted.org/packages/8e/f2/a19b874ddbd3ebcf11d7e25188ef9ac3f68b9219c62263acb34aca8cde05/fonttools-4.59.1-cp312-cp312-win_amd64.whl", hash = "sha256:0b9e4fa7eaf046ed6ac470f6033d52c052481ff7a6e0a92373d14f556f298dc0", size = 2251561, upload-time = "2025-08-14T16:27:00.646Z" }, + { url = "https://files.pythonhosted.org/packages/0f/64/9d606e66d498917cd7a2ff24f558010d42d6fd4576d9dd57f0bd98333f5a/fonttools-4.59.1-py3-none-any.whl", hash = "sha256:647db657073672a8330608970a984d51573557f328030566521bc03415535042", size = 1130094, upload-time = "2025-08-14T16:28:12.048Z" }, +] + [[package]] name = "frozenlist" version = "1.7.0" @@ -984,6 +1105,11 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/2f/e0/014d5d9d7a4564cf1c40b5039bc882db69fd881111e03ab3657ac0b218e2/fsspec-2025.7.0-py3-none-any.whl", hash = "sha256:8b012e39f63c7d5f10474de957f3ab793b47b45ae7d39f2fb735f8bbe25c0e21", size = 199597, upload-time = "2025-07-15T16:05:19.529Z" }, ] +[package.optional-dependencies] +http = [ + { name = "aiohttp" }, +] + [[package]] name = "google-crc32c" version = "1.7.1" @@ -1163,6 +1289,19 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f0/0f/310fb31e39e2d734ccaa2c0fb981ee41f7bd5056ce9bc29b2248bd569169/humanfriendly-10.0-py2.py3-none-any.whl", hash = "sha256:1697e1a8a8f550fd43c2865cd84542fc175a61dcb779b6fee18cf6b6ccba1477", size = 86794, upload-time = "2021-09-17T21:40:39.897Z" }, ] +[[package]] +name = "hyperpyyaml" +version = "1.2.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pyyaml" }, + { name = "ruamel-yaml" }, +] +sdist = { url = 
"https://files.pythonhosted.org/packages/52/e3/3ac46d9a662b037f699a6948b39c8d03bfcff0b592335d5953ba0c55d453/HyperPyYAML-1.2.2.tar.gz", hash = "sha256:bdb734210d18770a262f500fe5755c7a44a5d3b91521b06e24f7a00a36ee0f87", size = 17085, upload-time = "2023-09-21T14:45:27.779Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/33/c9/751b6401887f4b50f9307cc1e53d287b3dc77c375c126aeb6335aff73ccb/HyperPyYAML-1.2.2-py3-none-any.whl", hash = "sha256:3c5864bdc8864b2f0fbd7bc495e7e8fdf2dfd5dd80116f72da27ca96a128bdeb", size = 16118, upload-time = "2023-09-21T14:45:25.101Z" }, +] + [[package]] name = "idna" version = "3.10" @@ -1301,6 +1440,54 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/01/0e/b27cdbaccf30b890c40ed1da9fd4a3593a5cf94dae54fb34f8a4b74fcd3f/jsonschema_specifications-2025.4.1-py3-none-any.whl", hash = "sha256:4653bffbd6584f7de83a67e0d620ef16900b390ddc7939d56684d6c81e33f1af", size = 18437, upload-time = "2025-04-23T12:34:05.422Z" }, ] +[[package]] +name = "julius" +version = "0.2.7" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "torch" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a1/19/c9e1596b5572c786b93428d0904280e964c930fae7e6c9368ed9e1b63922/julius-0.2.7.tar.gz", hash = "sha256:3c0f5f5306d7d6016fcc95196b274cae6f07e2c9596eed314e4e7641554fbb08", size = 59640, upload-time = "2022-09-19T16:13:34.2Z" } + +[[package]] +name = "kiwisolver" +version = "1.4.9" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/5c/3c/85844f1b0feb11ee581ac23fe5fce65cd049a200c1446708cc1b7f922875/kiwisolver-1.4.9.tar.gz", hash = "sha256:c3b22c26c6fd6811b0ae8363b95ca8ce4ea3c202d3d0975b2914310ceb1bcc4d", size = 97564, upload-time = "2025-08-10T21:27:49.279Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6f/ab/c80b0d5a9d8a1a65f4f815f2afff9798b12c3b9f31f1d304dd233dd920e2/kiwisolver-1.4.9-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:eb14a5da6dc7642b0f3a18f13654847cd8b7a2550e2645a5bda677862b03ba16", size = 124167, upload-time = "2025-08-10T21:25:53.403Z" }, + { url = "https://files.pythonhosted.org/packages/a0/c0/27fe1a68a39cf62472a300e2879ffc13c0538546c359b86f149cc19f6ac3/kiwisolver-1.4.9-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:39a219e1c81ae3b103643d2aedb90f1ef22650deb266ff12a19e7773f3e5f089", size = 66579, upload-time = "2025-08-10T21:25:54.79Z" }, + { url = "https://files.pythonhosted.org/packages/31/a2/a12a503ac1fd4943c50f9822678e8015a790a13b5490354c68afb8489814/kiwisolver-1.4.9-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2405a7d98604b87f3fc28b1716783534b1b4b8510d8142adca34ee0bc3c87543", size = 65309, upload-time = "2025-08-10T21:25:55.76Z" }, + { url = "https://files.pythonhosted.org/packages/66/e1/e533435c0be77c3f64040d68d7a657771194a63c279f55573188161e81ca/kiwisolver-1.4.9-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:dc1ae486f9abcef254b5618dfb4113dd49f94c68e3e027d03cf0143f3f772b61", size = 1435596, upload-time = "2025-08-10T21:25:56.861Z" }, + { url = "https://files.pythonhosted.org/packages/67/1e/51b73c7347f9aabdc7215aa79e8b15299097dc2f8e67dee2b095faca9cb0/kiwisolver-1.4.9-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8a1f570ce4d62d718dce3f179ee78dac3b545ac16c0c04bb363b7607a949c0d1", size = 1246548, upload-time = "2025-08-10T21:25:58.246Z" }, + { url = 
"https://files.pythonhosted.org/packages/21/aa/72a1c5d1e430294f2d32adb9542719cfb441b5da368d09d268c7757af46c/kiwisolver-1.4.9-cp311-cp311-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:cb27e7b78d716c591e88e0a09a2139c6577865d7f2e152488c2cc6257f460872", size = 1263618, upload-time = "2025-08-10T21:25:59.857Z" }, + { url = "https://files.pythonhosted.org/packages/a3/af/db1509a9e79dbf4c260ce0cfa3903ea8945f6240e9e59d1e4deb731b1a40/kiwisolver-1.4.9-cp311-cp311-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:15163165efc2f627eb9687ea5f3a28137217d217ac4024893d753f46bce9de26", size = 1317437, upload-time = "2025-08-10T21:26:01.105Z" }, + { url = "https://files.pythonhosted.org/packages/e0/f2/3ea5ee5d52abacdd12013a94130436e19969fa183faa1e7c7fbc89e9a42f/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:bdee92c56a71d2b24c33a7d4c2856bd6419d017e08caa7802d2963870e315028", size = 2195742, upload-time = "2025-08-10T21:26:02.675Z" }, + { url = "https://files.pythonhosted.org/packages/6f/9b/1efdd3013c2d9a2566aa6a337e9923a00590c516add9a1e89a768a3eb2fc/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:412f287c55a6f54b0650bd9b6dce5aceddb95864a1a90c87af16979d37c89771", size = 2290810, upload-time = "2025-08-10T21:26:04.009Z" }, + { url = "https://files.pythonhosted.org/packages/fb/e5/cfdc36109ae4e67361f9bc5b41323648cb24a01b9ade18784657e022e65f/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:2c93f00dcba2eea70af2be5f11a830a742fe6b579a1d4e00f47760ef13be247a", size = 2461579, upload-time = "2025-08-10T21:26:05.317Z" }, + { url = "https://files.pythonhosted.org/packages/62/86/b589e5e86c7610842213994cdea5add00960076bef4ae290c5fa68589cac/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f117e1a089d9411663a3207ba874f31be9ac8eaa5b533787024dc07aeb74f464", size = 2268071, upload-time = "2025-08-10T21:26:06.686Z" }, + { url = "https://files.pythonhosted.org/packages/3b/c6/f8df8509fd1eee6c622febe54384a96cfaf4d43bf2ccec7a0cc17e4715c9/kiwisolver-1.4.9-cp311-cp311-win_amd64.whl", hash = "sha256:be6a04e6c79819c9a8c2373317d19a96048e5a3f90bec587787e86a1153883c2", size = 73840, upload-time = "2025-08-10T21:26:07.94Z" }, + { url = "https://files.pythonhosted.org/packages/e2/2d/16e0581daafd147bc11ac53f032a2b45eabac897f42a338d0a13c1e5c436/kiwisolver-1.4.9-cp311-cp311-win_arm64.whl", hash = "sha256:0ae37737256ba2de764ddc12aed4956460277f00c4996d51a197e72f62f5eec7", size = 65159, upload-time = "2025-08-10T21:26:09.048Z" }, + { url = "https://files.pythonhosted.org/packages/86/c9/13573a747838aeb1c76e3267620daa054f4152444d1f3d1a2324b78255b5/kiwisolver-1.4.9-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:ac5a486ac389dddcc5bef4f365b6ae3ffff2c433324fb38dd35e3fab7c957999", size = 123686, upload-time = "2025-08-10T21:26:10.034Z" }, + { url = "https://files.pythonhosted.org/packages/51/ea/2ecf727927f103ffd1739271ca19c424d0e65ea473fbaeea1c014aea93f6/kiwisolver-1.4.9-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:f2ba92255faa7309d06fe44c3a4a97efe1c8d640c2a79a5ef728b685762a6fd2", size = 66460, upload-time = "2025-08-10T21:26:11.083Z" }, + { url = "https://files.pythonhosted.org/packages/5b/5a/51f5464373ce2aeb5194508298a508b6f21d3867f499556263c64c621914/kiwisolver-1.4.9-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4a2899935e724dd1074cb568ce7ac0dce28b2cd6ab539c8e001a8578eb106d14", size = 64952, upload-time = "2025-08-10T21:26:12.058Z" }, + { url = 
"https://files.pythonhosted.org/packages/70/90/6d240beb0f24b74371762873e9b7f499f1e02166a2d9c5801f4dbf8fa12e/kiwisolver-1.4.9-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f6008a4919fdbc0b0097089f67a1eb55d950ed7e90ce2cc3e640abadd2757a04", size = 1474756, upload-time = "2025-08-10T21:26:13.096Z" }, + { url = "https://files.pythonhosted.org/packages/12/42/f36816eaf465220f683fb711efdd1bbf7a7005a2473d0e4ed421389bd26c/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:67bb8b474b4181770f926f7b7d2f8c0248cbcb78b660fdd41a47054b28d2a752", size = 1276404, upload-time = "2025-08-10T21:26:14.457Z" }, + { url = "https://files.pythonhosted.org/packages/2e/64/bc2de94800adc830c476dce44e9b40fd0809cddeef1fde9fcf0f73da301f/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2327a4a30d3ee07d2fbe2e7933e8a37c591663b96ce42a00bc67461a87d7df77", size = 1294410, upload-time = "2025-08-10T21:26:15.73Z" }, + { url = "https://files.pythonhosted.org/packages/5f/42/2dc82330a70aa8e55b6d395b11018045e58d0bb00834502bf11509f79091/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:7a08b491ec91b1d5053ac177afe5290adacf1f0f6307d771ccac5de30592d198", size = 1343631, upload-time = "2025-08-10T21:26:17.045Z" }, + { url = "https://files.pythonhosted.org/packages/22/fd/f4c67a6ed1aab149ec5a8a401c323cee7a1cbe364381bb6c9c0d564e0e20/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:d8fc5c867c22b828001b6a38d2eaeb88160bf5783c6cb4a5e440efc981ce286d", size = 2224963, upload-time = "2025-08-10T21:26:18.737Z" }, + { url = "https://files.pythonhosted.org/packages/45/aa/76720bd4cb3713314677d9ec94dcc21ced3f1baf4830adde5bb9b2430a5f/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:3b3115b2581ea35bb6d1f24a4c90af37e5d9b49dcff267eeed14c3893c5b86ab", size = 2321295, upload-time = "2025-08-10T21:26:20.11Z" }, + { url = "https://files.pythonhosted.org/packages/80/19/d3ec0d9ab711242f56ae0dc2fc5d70e298bb4a1f9dfab44c027668c673a1/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:858e4c22fb075920b96a291928cb7dea5644e94c0ee4fcd5af7e865655e4ccf2", size = 2487987, upload-time = "2025-08-10T21:26:21.49Z" }, + { url = "https://files.pythonhosted.org/packages/39/e9/61e4813b2c97e86b6fdbd4dd824bf72d28bcd8d4849b8084a357bc0dd64d/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ed0fecd28cc62c54b262e3736f8bb2512d8dcfdc2bcf08be5f47f96bf405b145", size = 2291817, upload-time = "2025-08-10T21:26:22.812Z" }, + { url = "https://files.pythonhosted.org/packages/a0/41/85d82b0291db7504da3c2defe35c9a8a5c9803a730f297bd823d11d5fb77/kiwisolver-1.4.9-cp312-cp312-win_amd64.whl", hash = "sha256:f68208a520c3d86ea51acf688a3e3002615a7f0238002cccc17affecc86a8a54", size = 73895, upload-time = "2025-08-10T21:26:24.37Z" }, + { url = "https://files.pythonhosted.org/packages/e2/92/5f3068cf15ee5cb624a0c7596e67e2a0bb2adee33f71c379054a491d07da/kiwisolver-1.4.9-cp312-cp312-win_arm64.whl", hash = "sha256:2c1a4f57df73965f3f14df20b80ee29e6a7930a57d2d9e8491a25f676e197c60", size = 64992, upload-time = "2025-08-10T21:26:25.732Z" }, + { url = "https://files.pythonhosted.org/packages/a3/0f/36d89194b5a32c054ce93e586d4049b6c2c22887b0eb229c61c68afd3078/kiwisolver-1.4.9-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:720e05574713db64c356e86732c0f3c5252818d05f9df320f0ad8380641acea5", size = 60104, upload-time = "2025-08-10T21:27:43.287Z" }, + { url = 
"https://files.pythonhosted.org/packages/52/ba/4ed75f59e4658fd21fe7dde1fee0ac397c678ec3befba3fe6482d987af87/kiwisolver-1.4.9-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:17680d737d5335b552994a2008fab4c851bcd7de33094a82067ef3a576ff02fa", size = 58592, upload-time = "2025-08-10T21:27:44.314Z" }, + { url = "https://files.pythonhosted.org/packages/33/01/a8ea7c5ea32a9b45ceeaee051a04c8ed4320f5add3c51bfa20879b765b70/kiwisolver-1.4.9-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:85b5352f94e490c028926ea567fc569c52ec79ce131dadb968d3853e809518c2", size = 80281, upload-time = "2025-08-10T21:27:45.369Z" }, + { url = "https://files.pythonhosted.org/packages/da/e3/dbd2ecdce306f1d07a1aaf324817ee993aab7aee9db47ceac757deabafbe/kiwisolver-1.4.9-pp311-pypy311_pp73-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:464415881e4801295659462c49461a24fb107c140de781d55518c4b80cb6790f", size = 78009, upload-time = "2025-08-10T21:27:46.376Z" }, + { url = "https://files.pythonhosted.org/packages/da/e9/0d4add7873a73e462aeb45c036a2dead2562b825aa46ba326727b3f31016/kiwisolver-1.4.9-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:fb940820c63a9590d31d88b815e7a3aa5915cad3ce735ab45f0c730b39547de1", size = 73929, upload-time = "2025-08-10T21:27:48.236Z" }, +] + [[package]] name = "kombu" version = "5.5.4" @@ -1363,6 +1550,40 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/dc/1e/408fd10217eac0e43aea0604be22b4851a09e03d761d44d4ea12089dd70e/levenshtein-0.27.1-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:7987ef006a3cf56a4532bd4c90c2d3b7b4ca9ad3bf8ae1ee5713c4a3bdfda913", size = 98045, upload-time = "2025-03-02T19:44:44.527Z" }, ] +[[package]] +name = "lightning" +version = "2.5.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "fsspec", extra = ["http"] }, + { name = "lightning-utilities" }, + { name = "packaging" }, + { name = "pytorch-lightning" }, + { name = "pyyaml" }, + { name = "torch" }, + { name = "torchmetrics" }, + { name = "tqdm" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/01/80/dddb5a382aa0ff18045aee6491f81e40371102cb05da2ad5a8436a51c475/lightning-2.5.3.tar.gz", hash = "sha256:4ed3e12369a1e0f928beecf5c9f5efdabda60a9216057954851e2d89f1abecde", size = 636577, upload-time = "2025-08-13T20:29:32.361Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/00/6b/00e9c2b03a449c21d7a4d73a7104ac94f56c37a1e6eae77b1c702d8dddf0/lightning-2.5.3-py3-none-any.whl", hash = "sha256:c551111fda0db0bce267791f9a90cd4f9cf94bc327d36348af0ef79ec752d666", size = 824181, upload-time = "2025-08-13T20:29:30.244Z" }, +] + +[[package]] +name = "lightning-utilities" +version = "0.15.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "packaging" }, + { name = "setuptools" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b8/39/6fc58ca81492db047149b4b8fd385aa1bfb8c28cd7cacb0c7eb0c44d842f/lightning_utilities-0.15.2.tar.gz", hash = "sha256:cdf12f530214a63dacefd713f180d1ecf5d165338101617b4742e8f22c032e24", size = 31090, upload-time = "2025-08-06T13:57:39.242Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/de/73/3d757cb3fc16f0f9794dd289bcd0c4a031d9cf54d8137d6b984b2d02edf3/lightning_utilities-0.15.2-py3-none-any.whl", hash = "sha256:ad3ab1703775044bbf880dbf7ddaaac899396c96315f3aa1779cec9d618a9841", size = 29431, upload-time = "2025-08-06T13:57:38.046Z" }, +] + [[package]] name 
= "llama-cloud" version = "0.1.32" @@ -1723,6 +1944,42 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/34/75/51952c7b2d3873b44a0028b1bd26a25078c18f92f256608e8d1dc61b39fd/marshmallow-3.26.1-py3-none-any.whl", hash = "sha256:3350409f20a70a7e4e11a27661187b77cdcaeb20abca41c1454fe33636bea09c", size = 50878, upload-time = "2025-02-03T15:32:22.295Z" }, ] +[[package]] +name = "matplotlib" +version = "3.10.5" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "contourpy" }, + { name = "cycler" }, + { name = "fonttools" }, + { name = "kiwisolver" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pillow" }, + { name = "pyparsing" }, + { name = "python-dateutil" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/43/91/f2939bb60b7ebf12478b030e0d7f340247390f402b3b189616aad790c366/matplotlib-3.10.5.tar.gz", hash = "sha256:352ed6ccfb7998a00881692f38b4ca083c691d3e275b4145423704c34c909076", size = 34804044, upload-time = "2025-07-31T18:09:33.805Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/aa/c7/1f2db90a1d43710478bb1e9b57b162852f79234d28e4f48a28cc415aa583/matplotlib-3.10.5-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:dcfc39c452c6a9f9028d3e44d2d721484f665304857188124b505b2c95e1eecf", size = 8239216, upload-time = "2025-07-31T18:07:51.947Z" }, + { url = "https://files.pythonhosted.org/packages/82/6d/ca6844c77a4f89b1c9e4d481c412e1d1dbabf2aae2cbc5aa2da4a1d6683e/matplotlib-3.10.5-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:903352681b59f3efbf4546985142a9686ea1d616bb054b09a537a06e4b892ccf", size = 8102130, upload-time = "2025-07-31T18:07:53.65Z" }, + { url = "https://files.pythonhosted.org/packages/1d/1e/5e187a30cc673a3e384f3723e5f3c416033c1d8d5da414f82e4e731128ea/matplotlib-3.10.5-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:080c3676a56b8ee1c762bcf8fca3fe709daa1ee23e6ef06ad9f3fc17332f2d2a", size = 8666471, upload-time = "2025-07-31T18:07:55.304Z" }, + { url = "https://files.pythonhosted.org/packages/03/c0/95540d584d7d645324db99a845ac194e915ef75011a0d5e19e1b5cee7e69/matplotlib-3.10.5-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4b4984d5064a35b6f66d2c11d668565f4389b1119cc64db7a4c1725bc11adffc", size = 9500518, upload-time = "2025-07-31T18:07:57.199Z" }, + { url = "https://files.pythonhosted.org/packages/ba/2e/e019352099ea58b4169adb9c6e1a2ad0c568c6377c2b677ee1f06de2adc7/matplotlib-3.10.5-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:3967424121d3a46705c9fa9bdb0931de3228f13f73d7bb03c999c88343a89d89", size = 9552372, upload-time = "2025-07-31T18:07:59.41Z" }, + { url = "https://files.pythonhosted.org/packages/b7/81/3200b792a5e8b354f31f4101ad7834743ad07b6d620259f2059317b25e4d/matplotlib-3.10.5-cp311-cp311-win_amd64.whl", hash = "sha256:33775bbeb75528555a15ac29396940128ef5613cf9a2d31fb1bfd18b3c0c0903", size = 8100634, upload-time = "2025-07-31T18:08:01.801Z" }, + { url = "https://files.pythonhosted.org/packages/52/46/a944f6f0c1f5476a0adfa501969d229ce5ae60cf9a663be0e70361381f89/matplotlib-3.10.5-cp311-cp311-win_arm64.whl", hash = "sha256:c61333a8e5e6240e73769d5826b9a31d8b22df76c0778f8480baf1b4b01c9420", size = 7978880, upload-time = "2025-07-31T18:08:03.407Z" }, + { url = "https://files.pythonhosted.org/packages/66/1e/c6f6bcd882d589410b475ca1fc22e34e34c82adff519caf18f3e6dd9d682/matplotlib-3.10.5-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:00b6feadc28a08bd3c65b2894f56cf3c94fc8f7adcbc6ab4516ae1e8ed8f62e2", size = 8253056, upload-time = 
"2025-07-31T18:08:05.385Z" }, + { url = "https://files.pythonhosted.org/packages/53/e6/d6f7d1b59413f233793dda14419776f5f443bcccb2dfc84b09f09fe05dbe/matplotlib-3.10.5-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:ee98a5c5344dc7f48dc261b6ba5d9900c008fc12beb3fa6ebda81273602cc389", size = 8110131, upload-time = "2025-07-31T18:08:07.293Z" }, + { url = "https://files.pythonhosted.org/packages/66/2b/bed8a45e74957549197a2ac2e1259671cd80b55ed9e1fe2b5c94d88a9202/matplotlib-3.10.5-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:a17e57e33de901d221a07af32c08870ed4528db0b6059dce7d7e65c1122d4bea", size = 8669603, upload-time = "2025-07-31T18:08:09.064Z" }, + { url = "https://files.pythonhosted.org/packages/7e/a7/315e9435b10d057f5e52dfc603cd353167ae28bb1a4e033d41540c0067a4/matplotlib-3.10.5-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:97b9d6443419085950ee4a5b1ee08c363e5c43d7176e55513479e53669e88468", size = 9508127, upload-time = "2025-07-31T18:08:10.845Z" }, + { url = "https://files.pythonhosted.org/packages/7f/d9/edcbb1f02ca99165365d2768d517898c22c6040187e2ae2ce7294437c413/matplotlib-3.10.5-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ceefe5d40807d29a66ae916c6a3915d60ef9f028ce1927b84e727be91d884369", size = 9566926, upload-time = "2025-07-31T18:08:13.186Z" }, + { url = "https://files.pythonhosted.org/packages/3b/d9/6dd924ad5616c97b7308e6320cf392c466237a82a2040381163b7500510a/matplotlib-3.10.5-cp312-cp312-win_amd64.whl", hash = "sha256:c04cba0f93d40e45b3c187c6c52c17f24535b27d545f757a2fffebc06c12b98b", size = 8107599, upload-time = "2025-07-31T18:08:15.116Z" }, + { url = "https://files.pythonhosted.org/packages/0e/f3/522dc319a50f7b0279fbe74f86f7a3506ce414bc23172098e8d2bdf21894/matplotlib-3.10.5-cp312-cp312-win_arm64.whl", hash = "sha256:a41bcb6e2c8e79dc99c5511ae6f7787d2fb52efd3d805fff06d5d4f667db16b2", size = 7978173, upload-time = "2025-07-31T18:08:21.518Z" }, + { url = "https://files.pythonhosted.org/packages/dc/d6/e921be4e1a5f7aca5194e1f016cb67ec294548e530013251f630713e456d/matplotlib-3.10.5-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:160e125da27a749481eaddc0627962990f6029811dbeae23881833a011a0907f", size = 8233224, upload-time = "2025-07-31T18:09:27.512Z" }, + { url = "https://files.pythonhosted.org/packages/ec/74/a2b9b04824b9c349c8f1b2d21d5af43fa7010039427f2b133a034cb09e59/matplotlib-3.10.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:ac3d50760394d78a3c9be6b28318fe22b494c4fcf6407e8fd4794b538251899b", size = 8098539, upload-time = "2025-07-31T18:09:29.629Z" }, + { url = "https://files.pythonhosted.org/packages/fc/66/cd29ebc7f6c0d2a15d216fb572573e8fc38bd5d6dec3bd9d7d904c0949f7/matplotlib-3.10.5-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:6c49465bf689c4d59d174d0c7795fb42a21d4244d11d70e52b8011987367ac61", size = 8672192, upload-time = "2025-07-31T18:09:31.407Z" }, +] + [[package]] name = "mdurl" version = "0.1.2" @@ -1864,6 +2121,145 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/48/6b/1c6b515a83d5564b1698a61efa245727c8feecf308f4091f565988519d20/numpy-2.3.1-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:e610832418a2bc09d974cc9fecebfa51e9532d6190223bc5ef6a7402ebf3b5cb", size = 12927246, upload-time = "2025-06-21T12:27:38.618Z" }, ] +[[package]] +name = "nvidia-cublas-cu12" +version = "12.8.4.1" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/dc/61/e24b560ab2e2eaeb3c839129175fb330dfcfc29e5203196e5541a4c44682/nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:8ac4e771d5a348c551b2a426eda6193c19aa630236b418086020df5ba9667142", size = 594346921, upload-time = "2025-03-07T01:44:31.254Z" }, +] + +[[package]] +name = "nvidia-cuda-cupti-cu12" +version = "12.8.90" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f8/02/2adcaa145158bf1a8295d83591d22e4103dbfd821bcaf6f3f53151ca4ffa/nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ea0cb07ebda26bb9b29ba82cda34849e73c166c18162d3913575b0c9db9a6182", size = 10248621, upload-time = "2025-03-07T01:40:21.213Z" }, +] + +[[package]] +name = "nvidia-cuda-nvrtc-cu12" +version = "12.8.93" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/05/6b/32f747947df2da6994e999492ab306a903659555dddc0fbdeb9d71f75e52/nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:a7756528852ef889772a84c6cd89d41dfa74667e24cca16bb31f8f061e3e9994", size = 88040029, upload-time = "2025-03-07T01:42:13.562Z" }, +] + +[[package]] +name = "nvidia-cuda-runtime-cu12" +version = "12.8.90" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0d/9b/a997b638fcd068ad6e4d53b8551a7d30fe8b404d6f1804abf1df69838932/nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:adade8dcbd0edf427b7204d480d6066d33902cab2a4707dcfc48a2d0fd44ab90", size = 954765, upload-time = "2025-03-07T01:40:01.615Z" }, +] + +[[package]] +name = "nvidia-cudnn-cu12" +version = "9.10.2.21" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-cublas-cu12" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/ba/51/e123d997aa098c61d029f76663dedbfb9bc8dcf8c60cbd6adbe42f76d049/nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:949452be657fa16687d0930933f032835951ef0892b37d2d53824d1a84dc97a8", size = 706758467, upload-time = "2025-06-06T21:54:08.597Z" }, +] + +[[package]] +name = "nvidia-cufft-cu12" +version = "11.3.3.83" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-nvjitlink-cu12" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/1f/13/ee4e00f30e676b66ae65b4f08cb5bcbb8392c03f54f2d5413ea99a5d1c80/nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:4d2dd21ec0b88cf61b62e6b43564355e5222e4a3fb394cac0db101f2dd0d4f74", size = 193118695, upload-time = "2025-03-07T01:45:27.821Z" }, +] + +[[package]] +name = "nvidia-cufile-cu12" +version = "1.13.1.3" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/bb/fe/1bcba1dfbfb8d01be8d93f07bfc502c93fa23afa6fd5ab3fc7c1df71038a/nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1d069003be650e131b21c932ec3d8969c1715379251f8d23a1860554b1cb24fc", size = 1197834, upload-time = "2025-03-07T01:45:50.723Z" }, +] + +[[package]] +name = "nvidia-curand-cu12" +version = "10.3.9.90" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/fb/aa/6584b56dc84ebe9cf93226a5cde4d99080c8e90ab40f0c27bda7a0f29aa1/nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:b32331d4f4df5d6eefa0554c565b626c7216f87a06a4f56fab27c3b68a830ec9", size = 63619976, upload-time = "2025-03-07T01:46:23.323Z" }, +] + +[[package]] +name = "nvidia-cusolver-cu12" +version = "11.7.3.90" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-cublas-cu12" }, + { name = "nvidia-cusparse-cu12" }, + { name = "nvidia-nvjitlink-cu12" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/85/48/9a13d2975803e8cf2777d5ed57b87a0b6ca2cc795f9a4f59796a910bfb80/nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:4376c11ad263152bd50ea295c05370360776f8c3427b30991df774f9fb26c450", size = 267506905, upload-time = "2025-03-07T01:47:16.273Z" }, +] + +[[package]] +name = "nvidia-cusparse-cu12" +version = "12.5.8.93" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "nvidia-nvjitlink-cu12" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/c2/f5/e1854cb2f2bcd4280c44736c93550cc300ff4b8c95ebe370d0aa7d2b473d/nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1ec05d76bbbd8b61b06a80e1eaf8cf4959c3d4ce8e711b65ebd0443bb0ebb13b", size = 288216466, upload-time = "2025-03-07T01:48:13.779Z" }, +] + +[[package]] +name = "nvidia-cusparselt-cu12" +version = "0.7.1" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/56/79/12978b96bd44274fe38b5dde5cfb660b1d114f70a65ef962bcbbed99b549/nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl", hash = "sha256:f1bb701d6b930d5a7cea44c19ceb973311500847f81b634d802b7b539dc55623", size = 287193691, upload-time = "2025-02-26T00:15:44.104Z" }, +] + +[[package]] +name = "nvidia-nccl-cu12" +version = "2.27.3" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5c/5b/4e4fff7bad39adf89f735f2bc87248c81db71205b62bcc0d5ca5b606b3c3/nvidia_nccl_cu12-2.27.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:adf27ccf4238253e0b826bce3ff5fa532d65fc42322c8bfdfaf28024c0fbe039", size = 322364134, upload-time = "2025-06-03T21:58:04.013Z" }, +] + +[[package]] +name = "nvidia-nvjitlink-cu12" +version = "12.8.93" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f6/74/86a07f1d0f42998ca31312f998bd3b9a7eff7f52378f4f270c8679c77fb9/nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:81ff63371a7ebd6e6451970684f916be2eab07321b73c9d244dc2b4da7f73b88", size = 39254836, upload-time = "2025-03-07T01:49:55.661Z" }, +] + +[[package]] +name = "nvidia-nvtx-cu12" +version = "12.8.90" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a2/eb/86626c1bbc2edb86323022371c39aa48df6fd8b0a1647bc274577f72e90b/nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5b17e2001cc0d751a5bc2c6ec6d26ad95913324a4adb86788c944f8ce9ba441f", size = 89954, upload-time = "2025-03-07T01:42:44.131Z" }, +] + +[[package]] +name = "omegaconf" +version = "2.3.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "antlr4-python3-runtime" }, + { name = "pyyaml" }, +] +sdist = { url = 
"https://files.pythonhosted.org/packages/09/48/6388f1bb9da707110532cb70ec4d2822858ddfb44f1cdf1233c20a80ea4b/omegaconf-2.3.0.tar.gz", hash = "sha256:d5d4b6d29955cc50ad50c46dc269bcd92c6e00f5f90d23ab5fee7bfca4ba4cc7", size = 3298120, upload-time = "2022-12-08T20:59:22.753Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e3/94/1843518e420fa3ed6919835845df698c7e27e183cb997394e4a670973a65/omegaconf-2.3.0-py3-none-any.whl", hash = "sha256:7b4df175cdb08ba400f45cae3bdcae7ba8365db4d165fc65fd04b050ab63b46b", size = 79500, upload-time = "2022-12-08T20:59:19.686Z" }, +] + [[package]] name = "onnxruntime" version = "1.22.1" @@ -1906,6 +2302,24 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/8a/91/1f1cf577f745e956b276a8b1d3d76fa7a6ee0c2b05db3b001b900f2c71db/openai-1.97.0-py3-none-any.whl", hash = "sha256:a1c24d96f4609f3f7f51c9e1c2606d97cc6e334833438659cfd687e9c972c610", size = 764953, upload-time = "2025-07-16T16:37:33.135Z" }, ] +[[package]] +name = "optuna" +version = "4.4.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "alembic" }, + { name = "colorlog" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pyyaml" }, + { name = "sqlalchemy" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a5/e0/b303190ae8032d12f320a24c42af04038bacb1f3b17ede354dd1044a5642/optuna-4.4.0.tar.gz", hash = "sha256:a9029f6a92a1d6c8494a94e45abd8057823b535c2570819072dbcdc06f1c1da4", size = 467708, upload-time = "2025-06-16T05:13:00.024Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/5c/5e/068798a8c7087863e7772e9363a880ab13fe55a5a7ede8ec42fab8a1acbb/optuna-4.4.0-py3-none-any.whl", hash = "sha256:fad8d9c5d5af993ae1280d6ce140aecc031c514a44c3b639d8c8658a8b7920ea", size = 395949, upload-time = "2025-06-16T05:12:58.37Z" }, +] + [[package]] name = "packaging" version = "25.0" @@ -2007,6 +2421,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" }, ] +[[package]] +name = "primepy" +version = "1.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/35/77/0cfa1b4697cfb5336f3a96e8bc73327f64610be3a64c97275f1801afb395/primePy-1.3.tar.gz", hash = "sha256:25fd7e25344b0789a5984c75d89f054fcf1f180bef20c998e4befbac92de4669", size = 3914, upload-time = "2018-05-29T17:18:18.683Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/74/c1/bb7e334135859c3a92ec399bc89293ea73f28e815e35b43929c8db6af030/primePy-1.3-py3-none-any.whl", hash = "sha256:5ed443718765be9bf7e2ff4c56cdff71b42140a15b39d054f9d99f0009e2317a", size = 4040, upload-time = "2018-05-29T17:18:17.53Z" }, +] + [[package]] name = "profanityfilter" version = "2.1.0" @@ -2155,6 +2578,106 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/92/29/06261ea000e2dc1e22907dbbc483a1093665509ea586b29b8986a0e56733/psycopg2_binary-2.9.10-cp312-cp312-win_amd64.whl", hash = "sha256:18c5ee682b9c6dd3696dad6e54cc7ff3a1a9020df6a5c0f861ef8bfd338c3ca0", size = 1164031, upload-time = "2024-10-16T11:21:34.211Z" }, ] +[[package]] +name = "pyannote-audio" +version = "3.3.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "asteroid-filterbanks" }, + { name = "einops" }, + { name = "huggingface-hub" }, + { name = "lightning" }, + { 
name = "omegaconf" }, + { name = "pyannote-core" }, + { name = "pyannote-database" }, + { name = "pyannote-metrics" }, + { name = "pyannote-pipeline" }, + { name = "pytorch-metric-learning" }, + { name = "rich" }, + { name = "semver" }, + { name = "soundfile" }, + { name = "speechbrain" }, + { name = "tensorboardx" }, + { name = "torch" }, + { name = "torch-audiomentations" }, + { name = "torchaudio" }, + { name = "torchmetrics" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/e9/00/3b96ca7ad0641e4f64cfaa2af153dc7da0998ff972280e1c1681b1fcc243/pyannote_audio-3.3.2.tar.gz", hash = "sha256:b2115e86b0db5faedb9f36ee1a150cebd07f7758e65e815accdac1a12ca9c777", size = 13664309, upload-time = "2024-09-11T11:07:48.274Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/17/e6/76049470d90217f9a15a34abf3e92d782cabc3fb4ab27515c9baaa5495d1/pyannote.audio-3.3.2-py2.py3-none-any.whl", hash = "sha256:599c694acd5d193215147ff82d0bf638bb191204ed502bd9fde8ff582e20aa1c", size = 898707, upload-time = "2024-09-11T11:07:46.12Z" }, +] + +[[package]] +name = "pyannote-core" +version = "5.0.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "scipy" }, + { name = "sortedcontainers" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/65/03/feaf7534206f02c75baf151ce4b8c322b402a6f477c2be82f69d9269cbe6/pyannote.core-5.0.0.tar.gz", hash = "sha256:1a55bcc8bd680ba6be5fa53efa3b6f3d2cdd67144c07b6b4d8d66d5cb0d2096f", size = 59247, upload-time = "2022-12-15T13:02:05.312Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/84/c4/370bc8ba66815a5832ece753a1009388bb07ea353d21c83f2d5a1a436f2c/pyannote.core-5.0.0-py3-none-any.whl", hash = "sha256:04920a6754492242ce0dc6017545595ab643870fe69a994f20c1a5f2da0544d0", size = 58475, upload-time = "2022-12-15T13:02:03.265Z" }, +] + +[[package]] +name = "pyannote-database" +version = "5.1.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pandas" }, + { name = "pyannote-core" }, + { name = "pyyaml" }, + { name = "typer" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a9/ae/de36413d69a46be87cb612ebbcdc4eacbeebce3bc809124603e44a88fe26/pyannote.database-5.1.3.tar.gz", hash = "sha256:0eaf64c1cc506718de60d2d702f1359b1ae7ff252ee3e4799f1c5e378cd52c31", size = 49957, upload-time = "2025-01-15T20:28:26.437Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a1/64/92d51a3a05615ba58be8ba62a43f9f9f952d9f3646f7e4fb7826e5a3a24e/pyannote.database-5.1.3-py3-none-any.whl", hash = "sha256:37887844c7dfbcc075cb591eddc00aff45fae1ed905344e1f43e0090e63bd40a", size = 48127, upload-time = "2025-01-15T20:28:25.326Z" }, +] + +[[package]] +name = "pyannote-metrics" +version = "3.2.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "docopt" }, + { name = "matplotlib" }, + { name = "numpy" }, + { name = "pandas" }, + { name = "pyannote-core" }, + { name = "pyannote-database" }, + { name = "scikit-learn" }, + { name = "scipy" }, + { name = "sympy" }, + { name = "tabulate" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/39/2b/6c5f01d3c49aa1c160765946e23782ca6436ae8b9bc514b56319ff5f16e7/pyannote.metrics-3.2.1.tar.gz", hash = "sha256:08024255a3550e96a8e9da4f5f4af326886548480de891414567c8900920ee5c", size = 49086, upload-time = "2022-06-20T14:10:34.618Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/6c/7d/035b370ab834b30e849fe9cd092b7bd7f321fcc4a2c56b84e96476b7ede5/pyannote.metrics-3.2.1-py3-none-any.whl", hash = "sha256:46be797cdade26c82773e5018659ae610145260069c7c5bf3d3c8a029ade8e22", size = 51386, upload-time = "2022-06-20T14:10:32.621Z" }, +] + +[[package]] +name = "pyannote-pipeline" +version = "3.0.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "docopt" }, + { name = "filelock" }, + { name = "optuna" }, + { name = "pyannote-core" }, + { name = "pyannote-database" }, + { name = "pyyaml" }, + { name = "scikit-learn" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/35/04/4bcfe0dd588577a188328b806f3a7213d8cead0ce5fe5784d01fd57df93f/pyannote.pipeline-3.0.1.tar.gz", hash = "sha256:021794e26a2cf5d8fb5bb1835951e71f5fac33eb14e23dfb7468e16b1b805151", size = 34486, upload-time = "2023-09-22T20:16:49.951Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/83/42/1bf7cbf061ed05c580bfb63bffdd3f3474cbd5c02bee4fac518eea9e9d9e/pyannote.pipeline-3.0.1-py3-none-any.whl", hash = "sha256:819bde4c4dd514f740f2373dfec794832b9fc8e346a35e43a7681625ee187393", size = 31517, upload-time = "2023-09-22T20:16:48.153Z" }, +] + [[package]] name = "pyasn1" version = "0.6.1" @@ -2334,6 +2857,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/80/28/2659c02301b9500751f8d42f9a6632e1508aa5120de5e43042b8b30f8d5d/pyopenssl-25.1.0-py3-none-any.whl", hash = "sha256:2b11f239acc47ac2e5aca04fd7fa829800aeee22a2eb30d744572a157bd8a1ab", size = 56771, upload-time = "2025-05-17T16:28:29.197Z" }, ] +[[package]] +name = "pyparsing" +version = "3.2.3" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/bb/22/f1129e69d94ffff626bdb5c835506b3a5b4f3d070f17ea295e12c2c6f60f/pyparsing-3.2.3.tar.gz", hash = "sha256:b9c13f1ab8b3b542f72e28f634bad4de758ab3ce4546e4301970ad6fa77c38be", size = 1088608, upload-time = "2025-03-25T05:01:28.114Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/05/e7/df2285f3d08fee213f2d041540fa4fc9ca6c2d44cf36d3a035bf2a8d2bcc/pyparsing-3.2.3-py3-none-any.whl", hash = "sha256:a749938e02d6fd0b59b356ca504a24982314bb090c383e3cf201c95ef7e2bfcf", size = 111120, upload-time = "2025-03-25T05:01:24.908Z" }, +] + [[package]] name = "pypdf" version = "5.8.0" @@ -2478,6 +3010,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/2c/72/7138a0faf5d780d6b9ceedef22da0b66ae8e22a676a12fd55a05c0cdd979/pytest_httpx-0.34.0-py3-none-any.whl", hash = "sha256:42cf0a66f7b71b9111db2897e8b38a903abd33a27b11c48aff4a3c7650313af2", size = 19440, upload-time = "2024-11-18T18:49:55.384Z" }, ] +[[package]] +name = "pytest-recording" +version = "0.13.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pytest" }, + { name = "vcrpy", version = "5.1.0", source = { registry = "https://pypi.org/simple" }, marker = "platform_python_implementation == 'PyPy'" }, + { name = "vcrpy", version = "7.0.0", source = { registry = "https://pypi.org/simple" }, marker = "platform_python_implementation != 'PyPy'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/32/9c/f4027c5f1693847b06d11caf4b4f6bb09f22c1581ada4663877ec166b8c6/pytest_recording-0.13.4.tar.gz", hash = "sha256:568d64b2a85992eec4ae0a419c855d5fd96782c5fb016784d86f18053792768c", size = 26576, upload-time = "2025-05-08T10:41:11.231Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/42/c2/ce34735972cc42d912173e79f200fe66530225190c06655c5632a9d88f1e/pytest_recording-0.13.4-py3-none-any.whl", hash = "sha256:ad49a434b51b1c4f78e85b1e6b74fdcc2a0a581ca16e52c798c6ace971f7f439", size = 13723, upload-time = "2025-05-08T10:41:09.684Z" }, +] + [[package]] name = "python-dateutil" version = "2.9.0.post0" @@ -2527,6 +3073,40 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/45/58/38b5afbc1a800eeea951b9285d3912613f2603bdf897a4ab0f4bd7f405fc/python_multipart-0.0.20-py3-none-any.whl", hash = "sha256:8a62d3a8335e06589fe01f2a3e178cdcc632f3fbe0d492ad9ee0ec35aab1f104", size = 24546, upload-time = "2024-12-16T19:45:44.423Z" }, ] +[[package]] +name = "pytorch-lightning" +version = "2.5.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "fsspec", extra = ["http"] }, + { name = "lightning-utilities" }, + { name = "packaging" }, + { name = "pyyaml" }, + { name = "torch" }, + { name = "torchmetrics" }, + { name = "tqdm" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/32/a8/31fe79bf96dab33cee5537ed6f08230ed6f032834bb4ff529cc487fb40e8/pytorch_lightning-2.5.3.tar.gz", hash = "sha256:65f4eee774ee1adba181aacacffb9f677fe5c5f9fd3d01a95f603403f940be6a", size = 639897, upload-time = "2025-08-13T20:29:39.161Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/6a/a2/5f2b7b40ec5213db5282e98dd32fd419fe5b73b5b53895dfff56fe12fed0/pytorch_lightning-2.5.3-py3-none-any.whl", hash = "sha256:7476bd36282d9253dda175b9263b07942489d70ad90bbd1bc0a59c46e012f353", size = 828186, upload-time = "2025-08-13T20:29:37.41Z" }, +] + +[[package]] +name = "pytorch-metric-learning" +version = "2.8.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "scikit-learn" }, + { name = "torch" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/78/94/1bfb2c3eaf195b2d72912b65b3d417f2d9ac22491563eca360d453512c59/pytorch-metric-learning-2.8.1.tar.gz", hash = "sha256:fcc4d3b4a805e5fce25fb2e67505c47ba6fea0563fc09c5655ea1f08d1e8ed93", size = 83117, upload-time = "2024-12-11T19:21:15.982Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/60/15/eee4e24c3f5a63b3e73692ff79766a66cab8844e24f5912be29350937592/pytorch_metric_learning-2.8.1-py3-none-any.whl", hash = "sha256:aba6da0508d29ee9661a67fbfee911cdf62e65fc07e404b167d82871ca7e3e88", size = 125923, upload-time = "2024-12-11T19:21:13.448Z" }, +] + [[package]] name = "pytz" version = "2025.2" @@ -2659,7 +3239,6 @@ dependencies = [ { name = "databases", extra = ["aiosqlite", "asyncpg"] }, { name = "fastapi", extra = ["standard"] }, { name = "fastapi-pagination" }, - { name = "faster-whisper" }, { name = "httpx" }, { name = "jsonschema" }, { name = "llama-index" }, @@ -2679,6 +3258,7 @@ dependencies = [ { name = "requests" }, { name = "sentencepiece" }, { name = "sentry-sdk", extra = ["fastapi"] }, + { name = "silero-vad" }, { name = "sortedcontainers" }, { name = "sqlalchemy" }, { name = "structlog" }, @@ -2702,6 +3282,10 @@ evaluation = [ { name = "pydantic" }, { name = "tqdm" }, ] +local = [ + { name = "faster-whisper" }, + { name = "pyannote-audio" }, +] tests = [ { name = "asgi-lifespan" }, { name = "httpx-ws" }, @@ -2712,6 +3296,7 @@ tests = [ { name = "pytest-cov" }, { name = "pytest-docker" }, { name = "pytest-httpx" }, + { name = "pytest-recording" }, ] [package.metadata] @@ -2725,7 +3310,6 @@ requires-dist = [ { name = "databases", extras = 
["aiosqlite", "asyncpg"], specifier = ">=0.7.0" }, { name = "fastapi", extras = ["standard"], specifier = ">=0.100.1" }, { name = "fastapi-pagination", specifier = ">=0.12.6" }, - { name = "faster-whisper", specifier = ">=0.10.0" }, { name = "httpx", specifier = ">=0.24.1" }, { name = "jsonschema", specifier = ">=4.23.0" }, { name = "llama-index", specifier = ">=0.12.52" }, @@ -2745,6 +3329,7 @@ requires-dist = [ { name = "requests", specifier = ">=2.31.0" }, { name = "sentencepiece", specifier = ">=0.1.99" }, { name = "sentry-sdk", extras = ["fastapi"], specifier = ">=1.29.2" }, + { name = "silero-vad", specifier = ">=5.1.2" }, { name = "sortedcontainers", specifier = ">=2.4.0" }, { name = "sqlalchemy", specifier = "<1.5" }, { name = "structlog", specifier = ">=23.1.0" }, @@ -2766,6 +3351,10 @@ evaluation = [ { name = "pydantic", specifier = ">=2.1.1" }, { name = "tqdm", specifier = ">=4.66.0" }, ] +local = [ + { name = "faster-whisper", specifier = ">=0.10.0" }, + { name = "pyannote-audio", specifier = ">=3.3.2" }, +] tests = [ { name = "asgi-lifespan", specifier = ">=2.1.0" }, { name = "httpx-ws", specifier = ">=0.4.1" }, @@ -2776,6 +3365,7 @@ tests = [ { name = "pytest-cov", specifier = ">=4.1.0" }, { name = "pytest-docker", specifier = ">=3.2.3" }, { name = "pytest-httpx", specifier = ">=0.23.1" }, + { name = "pytest-recording", specifier = ">=0.13.4" }, ] [[package]] @@ -2963,6 +3553,44 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/64/8d/0133e4eb4beed9e425d9a98ed6e081a55d195481b7632472be1af08d2f6b/rsa-4.9.1-py3-none-any.whl", hash = "sha256:68635866661c6836b8d39430f97a996acbd61bfa49406748ea243539fe239762", size = 34696, upload-time = "2025-04-16T09:51:17.142Z" }, ] +[[package]] +name = "ruamel-yaml" +version = "0.18.14" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "ruamel-yaml-clib", marker = "platform_python_implementation == 'CPython'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/39/87/6da0df742a4684263261c253f00edd5829e6aca970fff69e75028cccc547/ruamel.yaml-0.18.14.tar.gz", hash = "sha256:7227b76aaec364df15936730efbf7d72b30c0b79b1d578bbb8e3dcb2d81f52b7", size = 145511, upload-time = "2025-06-09T08:51:09.828Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/af/6d/6fe4805235e193aad4aaf979160dd1f3c487c57d48b810c816e6e842171b/ruamel.yaml-0.18.14-py3-none-any.whl", hash = "sha256:710ff198bb53da66718c7db27eec4fbcc9aa6ca7204e4c1df2f282b6fe5eb6b2", size = 118570, upload-time = "2025-06-09T08:51:06.348Z" }, +] + +[[package]] +name = "ruamel-yaml-clib" +version = "0.2.12" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/20/84/80203abff8ea4993a87d823a5f632e4d92831ef75d404c9fc78d0176d2b5/ruamel.yaml.clib-0.2.12.tar.gz", hash = "sha256:6c8fbb13ec503f99a91901ab46e0b07ae7941cd527393187039aec586fdfd36f", size = 225315, upload-time = "2024-10-20T10:10:56.22Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/fb/8f/683c6ad562f558cbc4f7c029abcd9599148c51c54b5ef0f24f2638da9fbb/ruamel.yaml.clib-0.2.12-cp311-cp311-macosx_13_0_arm64.whl", hash = "sha256:4a6679521a58256a90b0d89e03992c15144c5f3858f40d7c18886023d7943db6", size = 132224, upload-time = "2024-10-20T10:12:45.162Z" }, + { url = "https://files.pythonhosted.org/packages/3c/d2/b79b7d695e2f21da020bd44c782490578f300dd44f0a4c57a92575758a76/ruamel.yaml.clib-0.2.12-cp311-cp311-manylinux2014_aarch64.whl", hash = "sha256:d84318609196d6bd6da0edfa25cedfbabd8dbde5140a0a23af29ad4b8f91fb1e", size = 
641480, upload-time = "2024-10-20T10:12:46.758Z" }, + { url = "https://files.pythonhosted.org/packages/68/6e/264c50ce2a31473a9fdbf4fa66ca9b2b17c7455b31ef585462343818bd6c/ruamel.yaml.clib-0.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bb43a269eb827806502c7c8efb7ae7e9e9d0573257a46e8e952f4d4caba4f31e", size = 739068, upload-time = "2024-10-20T10:12:48.605Z" }, + { url = "https://files.pythonhosted.org/packages/86/29/88c2567bc893c84d88b4c48027367c3562ae69121d568e8a3f3a8d363f4d/ruamel.yaml.clib-0.2.12-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:811ea1594b8a0fb466172c384267a4e5e367298af6b228931f273b111f17ef52", size = 703012, upload-time = "2024-10-20T10:12:51.124Z" }, + { url = "https://files.pythonhosted.org/packages/11/46/879763c619b5470820f0cd6ca97d134771e502776bc2b844d2adb6e37753/ruamel.yaml.clib-0.2.12-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:cf12567a7b565cbf65d438dec6cfbe2917d3c1bdddfce84a9930b7d35ea59642", size = 704352, upload-time = "2024-10-21T11:26:41.438Z" }, + { url = "https://files.pythonhosted.org/packages/02/80/ece7e6034256a4186bbe50dee28cd032d816974941a6abf6a9d65e4228a7/ruamel.yaml.clib-0.2.12-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:7dd5adc8b930b12c8fc5b99e2d535a09889941aa0d0bd06f4749e9a9397c71d2", size = 737344, upload-time = "2024-10-21T11:26:43.62Z" }, + { url = "https://files.pythonhosted.org/packages/f0/ca/e4106ac7e80efbabdf4bf91d3d32fc424e41418458251712f5672eada9ce/ruamel.yaml.clib-0.2.12-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1492a6051dab8d912fc2adeef0e8c72216b24d57bd896ea607cb90bb0c4981d3", size = 714498, upload-time = "2024-12-11T19:58:15.592Z" }, + { url = "https://files.pythonhosted.org/packages/67/58/b1f60a1d591b771298ffa0428237afb092c7f29ae23bad93420b1eb10703/ruamel.yaml.clib-0.2.12-cp311-cp311-win32.whl", hash = "sha256:bd0a08f0bab19093c54e18a14a10b4322e1eacc5217056f3c063bd2f59853ce4", size = 100205, upload-time = "2024-10-20T10:12:52.865Z" }, + { url = "https://files.pythonhosted.org/packages/b4/4f/b52f634c9548a9291a70dfce26ca7ebce388235c93588a1068028ea23fcc/ruamel.yaml.clib-0.2.12-cp311-cp311-win_amd64.whl", hash = "sha256:a274fb2cb086c7a3dea4322ec27f4cb5cc4b6298adb583ab0e211a4682f241eb", size = 118185, upload-time = "2024-10-20T10:12:54.652Z" }, + { url = "https://files.pythonhosted.org/packages/48/41/e7a405afbdc26af961678474a55373e1b323605a4f5e2ddd4a80ea80f628/ruamel.yaml.clib-0.2.12-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:20b0f8dc160ba83b6dcc0e256846e1a02d044e13f7ea74a3d1d56ede4e48c632", size = 133433, upload-time = "2024-10-20T10:12:55.657Z" }, + { url = "https://files.pythonhosted.org/packages/ec/b0/b850385604334c2ce90e3ee1013bd911aedf058a934905863a6ea95e9eb4/ruamel.yaml.clib-0.2.12-cp312-cp312-manylinux2014_aarch64.whl", hash = "sha256:943f32bc9dedb3abff9879edc134901df92cfce2c3d5c9348f172f62eb2d771d", size = 647362, upload-time = "2024-10-20T10:12:57.155Z" }, + { url = "https://files.pythonhosted.org/packages/44/d0/3f68a86e006448fb6c005aee66565b9eb89014a70c491d70c08de597f8e4/ruamel.yaml.clib-0.2.12-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95c3829bb364fdb8e0332c9931ecf57d9be3519241323c5274bd82f709cebc0c", size = 754118, upload-time = "2024-10-20T10:12:58.501Z" }, + { url = 
"https://files.pythonhosted.org/packages/52/a9/d39f3c5ada0a3bb2870d7db41901125dbe2434fa4f12ca8c5b83a42d7c53/ruamel.yaml.clib-0.2.12-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:749c16fcc4a2b09f28843cda5a193e0283e47454b63ec4b81eaa2242f50e4ccd", size = 706497, upload-time = "2024-10-20T10:13:00.211Z" }, + { url = "https://files.pythonhosted.org/packages/b0/fa/097e38135dadd9ac25aecf2a54be17ddf6e4c23e43d538492a90ab3d71c6/ruamel.yaml.clib-0.2.12-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:bf165fef1f223beae7333275156ab2022cffe255dcc51c27f066b4370da81e31", size = 698042, upload-time = "2024-10-21T11:26:46.038Z" }, + { url = "https://files.pythonhosted.org/packages/ec/d5/a659ca6f503b9379b930f13bc6b130c9f176469b73b9834296822a83a132/ruamel.yaml.clib-0.2.12-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:32621c177bbf782ca5a18ba4d7af0f1082a3f6e517ac2a18b3974d4edf349680", size = 745831, upload-time = "2024-10-21T11:26:47.487Z" }, + { url = "https://files.pythonhosted.org/packages/db/5d/36619b61ffa2429eeaefaab4f3374666adf36ad8ac6330d855848d7d36fd/ruamel.yaml.clib-0.2.12-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:b82a7c94a498853aa0b272fd5bc67f29008da798d4f93a2f9f289feb8426a58d", size = 715692, upload-time = "2024-12-11T19:58:17.252Z" }, + { url = "https://files.pythonhosted.org/packages/b1/82/85cb92f15a4231c89b95dfe08b09eb6adca929ef7df7e17ab59902b6f589/ruamel.yaml.clib-0.2.12-cp312-cp312-win32.whl", hash = "sha256:e8c4ebfcfd57177b572e2040777b8abc537cdef58a2120e830124946aa9b42c5", size = 98777, upload-time = "2024-10-20T10:13:01.395Z" }, + { url = "https://files.pythonhosted.org/packages/d7/8f/c3654f6f1ddb75daf3922c3d8fc6005b1ab56671ad56ffb874d908bfa668/ruamel.yaml.clib-0.2.12-cp312-cp312-win_amd64.whl", hash = "sha256:0467c5965282c62203273b838ae77c0d29d7638c8a4e3a1c8bdd3602c10904e4", size = 115523, upload-time = "2024-10-20T10:13:02.768Z" }, +] + [[package]] name = "s3transfer" version = "0.13.0" @@ -2997,6 +3625,68 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/69/e2/b011c38e5394c4c18fb5500778a55ec43ad6106126e74723ffaee246f56e/safetensors-0.5.3-cp38-abi3-win_amd64.whl", hash = "sha256:836cbbc320b47e80acd40e44c8682db0e8ad7123209f69b093def21ec7cafd11", size = 308878, upload-time = "2025-02-26T09:15:14.99Z" }, ] +[[package]] +name = "scikit-learn" +version = "1.7.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "joblib" }, + { name = "numpy" }, + { name = "scipy" }, + { name = "threadpoolctl" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/41/84/5f4af978fff619706b8961accac84780a6d298d82a8873446f72edb4ead0/scikit_learn-1.7.1.tar.gz", hash = "sha256:24b3f1e976a4665aa74ee0fcaac2b8fccc6ae77c8e07ab25da3ba6d3292b9802", size = 7190445, upload-time = "2025-07-18T08:01:54.5Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/b4/bd/a23177930abd81b96daffa30ef9c54ddbf544d3226b8788ce4c3ef1067b4/scikit_learn-1.7.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:90c8494ea23e24c0fb371afc474618c1019dc152ce4a10e4607e62196113851b", size = 9334838, upload-time = "2025-07-18T08:01:11.239Z" }, + { url = "https://files.pythonhosted.org/packages/8d/a1/d3a7628630a711e2ac0d1a482910da174b629f44e7dd8cfcd6924a4ef81a/scikit_learn-1.7.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:bb870c0daf3bf3be145ec51df8ac84720d9972170786601039f024bf6d61a518", size = 8651241, upload-time = "2025-07-18T08:01:13.234Z" }, + { url = 
"https://files.pythonhosted.org/packages/26/92/85ec172418f39474c1cd0221d611345d4f433fc4ee2fc68e01f524ccc4e4/scikit_learn-1.7.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:40daccd1b5623f39e8943ab39735cadf0bdce80e67cdca2adcb5426e987320a8", size = 9718677, upload-time = "2025-07-18T08:01:15.649Z" }, + { url = "https://files.pythonhosted.org/packages/df/ce/abdb1dcbb1d2b66168ec43b23ee0cee356b4cc4100ddee3943934ebf1480/scikit_learn-1.7.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:30d1f413cfc0aa5a99132a554f1d80517563c34a9d3e7c118fde2d273c6fe0f7", size = 9511189, upload-time = "2025-07-18T08:01:18.013Z" }, + { url = "https://files.pythonhosted.org/packages/b2/3b/47b5eaee01ef2b5a80ba3f7f6ecf79587cb458690857d4777bfd77371c6f/scikit_learn-1.7.1-cp311-cp311-win_amd64.whl", hash = "sha256:c711d652829a1805a95d7fe96654604a8f16eab5a9e9ad87b3e60173415cb650", size = 8914794, upload-time = "2025-07-18T08:01:20.357Z" }, + { url = "https://files.pythonhosted.org/packages/cb/16/57f176585b35ed865f51b04117947fe20f130f78940c6477b6d66279c9c2/scikit_learn-1.7.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:3cee419b49b5bbae8796ecd690f97aa412ef1674410c23fc3257c6b8b85b8087", size = 9260431, upload-time = "2025-07-18T08:01:22.77Z" }, + { url = "https://files.pythonhosted.org/packages/67/4e/899317092f5efcab0e9bc929e3391341cec8fb0e816c4789686770024580/scikit_learn-1.7.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:2fd8b8d35817b0d9ebf0b576f7d5ffbbabdb55536b0655a8aaae629d7ffd2e1f", size = 8637191, upload-time = "2025-07-18T08:01:24.731Z" }, + { url = "https://files.pythonhosted.org/packages/f3/1b/998312db6d361ded1dd56b457ada371a8d8d77ca2195a7d18fd8a1736f21/scikit_learn-1.7.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:588410fa19a96a69763202f1d6b7b91d5d7a5d73be36e189bc6396bfb355bd87", size = 9486346, upload-time = "2025-07-18T08:01:26.713Z" }, + { url = "https://files.pythonhosted.org/packages/ad/09/a2aa0b4e644e5c4ede7006748f24e72863ba2ae71897fecfd832afea01b4/scikit_learn-1.7.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e3142f0abe1ad1d1c31a2ae987621e41f6b578144a911ff4ac94781a583adad7", size = 9290988, upload-time = "2025-07-18T08:01:28.938Z" }, + { url = "https://files.pythonhosted.org/packages/15/fa/c61a787e35f05f17fc10523f567677ec4eeee5f95aa4798dbbbcd9625617/scikit_learn-1.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:3ddd9092c1bd469acab337d87930067c87eac6bd544f8d5027430983f1e1ae88", size = 8735568, upload-time = "2025-07-18T08:01:30.936Z" }, +] + +[[package]] +name = "scipy" +version = "1.16.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/f5/4a/b927028464795439faec8eaf0b03b011005c487bb2d07409f28bf30879c4/scipy-1.16.1.tar.gz", hash = "sha256:44c76f9e8b6e8e488a586190ab38016e4ed2f8a038af7cd3defa903c0a2238b3", size = 30580861, upload-time = "2025-07-27T16:33:30.834Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/da/91/812adc6f74409b461e3a5fa97f4f74c769016919203138a3bf6fc24ba4c5/scipy-1.16.1-cp311-cp311-macosx_10_14_x86_64.whl", hash = "sha256:c033fa32bab91dc98ca59d0cf23bb876454e2bb02cbe592d5023138778f70030", size = 36552519, upload-time = "2025-07-27T16:26:29.658Z" }, + { url = "https://files.pythonhosted.org/packages/47/18/8e355edcf3b71418d9e9f9acd2708cc3a6c27e8f98fde0ac34b8a0b45407/scipy-1.16.1-cp311-cp311-macosx_12_0_arm64.whl", hash = 
"sha256:6e5c2f74e5df33479b5cd4e97a9104c511518fbd979aa9b8f6aec18b2e9ecae7", size = 28638010, upload-time = "2025-07-27T16:26:38.196Z" }, + { url = "https://files.pythonhosted.org/packages/d9/eb/e931853058607bdfbc11b86df19ae7a08686121c203483f62f1ecae5989c/scipy-1.16.1-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:0a55ffe0ba0f59666e90951971a884d1ff6f4ec3275a48f472cfb64175570f77", size = 20909790, upload-time = "2025-07-27T16:26:43.93Z" }, + { url = "https://files.pythonhosted.org/packages/45/0c/be83a271d6e96750cd0be2e000f35ff18880a46f05ce8b5d3465dc0f7a2a/scipy-1.16.1-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:f8a5d6cd147acecc2603fbd382fed6c46f474cccfcf69ea32582e033fb54dcfe", size = 23513352, upload-time = "2025-07-27T16:26:50.017Z" }, + { url = "https://files.pythonhosted.org/packages/7c/bf/fe6eb47e74f762f933cca962db7f2c7183acfdc4483bd1c3813cfe83e538/scipy-1.16.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cb18899127278058bcc09e7b9966d41a5a43740b5bb8dcba401bd983f82e885b", size = 33534643, upload-time = "2025-07-27T16:26:57.503Z" }, + { url = "https://files.pythonhosted.org/packages/bb/ba/63f402e74875486b87ec6506a4f93f6d8a0d94d10467280f3d9d7837ce3a/scipy-1.16.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:adccd93a2fa937a27aae826d33e3bfa5edf9aa672376a4852d23a7cd67a2e5b7", size = 35376776, upload-time = "2025-07-27T16:27:06.639Z" }, + { url = "https://files.pythonhosted.org/packages/c3/b4/04eb9d39ec26a1b939689102da23d505ea16cdae3dbb18ffc53d1f831044/scipy-1.16.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:18aca1646a29ee9a0625a1be5637fa798d4d81fdf426481f06d69af828f16958", size = 35698906, upload-time = "2025-07-27T16:27:14.943Z" }, + { url = "https://files.pythonhosted.org/packages/04/d6/bb5468da53321baeb001f6e4e0d9049eadd175a4a497709939128556e3ec/scipy-1.16.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:d85495cef541729a70cdddbbf3e6b903421bc1af3e8e3a9a72a06751f33b7c39", size = 38129275, upload-time = "2025-07-27T16:27:23.873Z" }, + { url = "https://files.pythonhosted.org/packages/c4/94/994369978509f227cba7dfb9e623254d0d5559506fe994aef4bea3ed469c/scipy-1.16.1-cp311-cp311-win_amd64.whl", hash = "sha256:226652fca853008119c03a8ce71ffe1b3f6d2844cc1686e8f9806edafae68596", size = 38644572, upload-time = "2025-07-27T16:27:32.637Z" }, + { url = "https://files.pythonhosted.org/packages/f8/d9/ec4864f5896232133f51382b54a08de91a9d1af7a76dfa372894026dfee2/scipy-1.16.1-cp312-cp312-macosx_10_14_x86_64.whl", hash = "sha256:81b433bbeaf35728dad619afc002db9b189e45eebe2cd676effe1fb93fef2b9c", size = 36575194, upload-time = "2025-07-27T16:27:41.321Z" }, + { url = "https://files.pythonhosted.org/packages/5c/6d/40e81ecfb688e9d25d34a847dca361982a6addf8e31f0957b1a54fbfa994/scipy-1.16.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:886cc81fdb4c6903a3bb0464047c25a6d1016fef77bb97949817d0c0d79f9e04", size = 28594590, upload-time = "2025-07-27T16:27:49.204Z" }, + { url = "https://files.pythonhosted.org/packages/0e/37/9f65178edfcc629377ce9a64fc09baebea18c80a9e57ae09a52edf84880b/scipy-1.16.1-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:15240c3aac087a522b4eaedb09f0ad061753c5eebf1ea430859e5bf8640d5919", size = 20866458, upload-time = "2025-07-27T16:27:54.98Z" }, + { url = "https://files.pythonhosted.org/packages/2c/7b/749a66766871ea4cb1d1ea10f27004db63023074c22abed51f22f09770e0/scipy-1.16.1-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:65f81a25805f3659b48126b5053d9e823d3215e4a63730b5e1671852a1705921", size = 23539318, 
upload-time = "2025-07-27T16:28:01.604Z" }, + { url = "https://files.pythonhosted.org/packages/c4/db/8d4afec60eb833a666434d4541a3151eedbf2494ea6d4d468cbe877f00cd/scipy-1.16.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:6c62eea7f607f122069b9bad3f99489ddca1a5173bef8a0c75555d7488b6f725", size = 33292899, upload-time = "2025-07-27T16:28:09.147Z" }, + { url = "https://files.pythonhosted.org/packages/51/1e/79023ca3bbb13a015d7d2757ecca3b81293c663694c35d6541b4dca53e98/scipy-1.16.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f965bbf3235b01c776115ab18f092a95aa74c271a52577bcb0563e85738fd618", size = 35162637, upload-time = "2025-07-27T16:28:17.535Z" }, + { url = "https://files.pythonhosted.org/packages/b6/49/0648665f9c29fdaca4c679182eb972935b3b4f5ace41d323c32352f29816/scipy-1.16.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f006e323874ffd0b0b816d8c6a8e7f9a73d55ab3b8c3f72b752b226d0e3ac83d", size = 35490507, upload-time = "2025-07-27T16:28:25.705Z" }, + { url = "https://files.pythonhosted.org/packages/62/8f/66cbb9d6bbb18d8c658f774904f42a92078707a7c71e5347e8bf2f52bb89/scipy-1.16.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e8fd15fc5085ab4cca74cb91fe0a4263b1f32e4420761ddae531ad60934c2119", size = 37923998, upload-time = "2025-07-27T16:28:34.339Z" }, + { url = "https://files.pythonhosted.org/packages/14/c3/61f273ae550fbf1667675701112e380881905e28448c080b23b5a181df7c/scipy-1.16.1-cp312-cp312-win_amd64.whl", hash = "sha256:f7b8013c6c066609577d910d1a2a077021727af07b6fab0ee22c2f901f22352a", size = 38508060, upload-time = "2025-07-27T16:28:43.242Z" }, +] + +[[package]] +name = "semver" +version = "3.0.4" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/72/d1/d3159231aec234a59dd7d601e9dd9fe96f3afff15efd33c1070019b26132/semver-3.0.4.tar.gz", hash = "sha256:afc7d8c584a5ed0a11033af086e8af226a9c0b206f313e0301f8dd7b6b589602", size = 269730, upload-time = "2025-01-24T13:19:27.617Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a6/24/4d91e05817e92e3a61c8a21e08fd0f390f5301f1c448b137c57c4bc6e543/semver-3.0.4-py3-none-any.whl", hash = "sha256:9c824d87ba7f7ab4a1890799cec8596f15c1241cb473404ea1cb0c55e4b04746", size = 17912, upload-time = "2025-01-24T13:19:24.949Z" }, +] + [[package]] name = "sentencepiece" version = "0.2.0" @@ -3057,6 +3747,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e0/f9/0595336914c5619e5f28a1fb793285925a8cd4b432c9da0a987836c7f822/shellingham-1.5.4-py2.py3-none-any.whl", hash = "sha256:7ecfff8f2fd72616f7481040475a65b2bf8af90a56c89140852d1120324e8686", size = 9755, upload-time = "2023-10-24T04:13:38.866Z" }, ] +[[package]] +name = "silero-vad" +version = "5.1.2" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "onnxruntime" }, + { name = "torch" }, + { name = "torchaudio" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b1/b4/d0311b2e6220a11f8f4699f4a278cb088131573286cdfe804c87c7eb5123/silero_vad-5.1.2.tar.gz", hash = "sha256:c442971160026d2d7aa0ad83f0c7ee86c89797a65289fe625c8ea59fc6fb828d", size = 5098526, upload-time = "2024-10-09T09:50:47.019Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/98/f7/5ae11d13fbb733cd3bfd7ff1c3a3902e6f55437df4b72307c1f168146268/silero_vad-5.1.2-py3-none-any.whl", hash = "sha256:93b41953d7774b165407fda6b533c119c5803864e367d5034dc626c82cfdf661", size = 5026737, upload-time = "2024-10-09T09:50:44.355Z" }, +] + [[package]] name = 
"six" version = "1.17.0" @@ -3084,6 +3788,25 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/32/46/9cb0e58b2deb7f82b84065f37f3bffeb12413f947f9388e4cac22c4621ce/sortedcontainers-2.4.0-py2.py3-none-any.whl", hash = "sha256:a163dcaede0f1c021485e957a39245190e74249897e2ae4b2aa38595db237ee0", size = 29575, upload-time = "2021-05-16T22:03:41.177Z" }, ] +[[package]] +name = "soundfile" +version = "0.13.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "cffi" }, + { name = "numpy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/e1/41/9b873a8c055582859b239be17902a85339bec6a30ad162f98c9b0288a2cc/soundfile-0.13.1.tar.gz", hash = "sha256:b2c68dab1e30297317080a5b43df57e302584c49e2942defdde0acccc53f0e5b", size = 46156, upload-time = "2025-01-25T09:17:04.831Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/64/28/e2a36573ccbcf3d57c00626a21fe51989380636e821b341d36ccca0c1c3a/soundfile-0.13.1-py2.py3-none-any.whl", hash = "sha256:a23c717560da2cf4c7b5ae1142514e0fd82d6bbd9dfc93a50423447142f2c445", size = 25751, upload-time = "2025-01-25T09:16:44.235Z" }, + { url = "https://files.pythonhosted.org/packages/ea/ab/73e97a5b3cc46bba7ff8650a1504348fa1863a6f9d57d7001c6b67c5f20e/soundfile-0.13.1-py2.py3-none-macosx_10_9_x86_64.whl", hash = "sha256:82dc664d19831933fe59adad199bf3945ad06d84bc111a5b4c0d3089a5b9ec33", size = 1142250, upload-time = "2025-01-25T09:16:47.583Z" }, + { url = "https://files.pythonhosted.org/packages/a0/e5/58fd1a8d7b26fc113af244f966ee3aecf03cb9293cb935daaddc1e455e18/soundfile-0.13.1-py2.py3-none-macosx_11_0_arm64.whl", hash = "sha256:743f12c12c4054921e15736c6be09ac26b3b3d603aef6fd69f9dde68748f2593", size = 1101406, upload-time = "2025-01-25T09:16:49.662Z" }, + { url = "https://files.pythonhosted.org/packages/58/ae/c0e4a53d77cf6e9a04179535766b3321b0b9ced5f70522e4caf9329f0046/soundfile-0.13.1-py2.py3-none-manylinux_2_28_aarch64.whl", hash = "sha256:9c9e855f5a4d06ce4213f31918653ab7de0c5a8d8107cd2427e44b42df547deb", size = 1235729, upload-time = "2025-01-25T09:16:53.018Z" }, + { url = "https://files.pythonhosted.org/packages/57/5e/70bdd9579b35003a489fc850b5047beeda26328053ebadc1fb60f320f7db/soundfile-0.13.1-py2.py3-none-manylinux_2_28_x86_64.whl", hash = "sha256:03267c4e493315294834a0870f31dbb3b28a95561b80b134f0bd3cf2d5f0e618", size = 1313646, upload-time = "2025-01-25T09:16:54.872Z" }, + { url = "https://files.pythonhosted.org/packages/fe/df/8c11dc4dfceda14e3003bb81a0d0edcaaf0796dd7b4f826ea3e532146bba/soundfile-0.13.1-py2.py3-none-win32.whl", hash = "sha256:c734564fab7c5ddf8e9be5bf70bab68042cd17e9c214c06e365e20d64f9a69d5", size = 899881, upload-time = "2025-01-25T09:16:56.663Z" }, + { url = "https://files.pythonhosted.org/packages/14/e9/6b761de83277f2f02ded7e7ea6f07828ec78e4b229b80e4ca55dd205b9dc/soundfile-0.13.1-py2.py3-none-win_amd64.whl", hash = "sha256:1e70a05a0626524a69e9f0f4dd2ec174b4e9567f4d8b6c11d38b5c289be36ee9", size = 1019162, upload-time = "2025-01-25T09:16:59.573Z" }, +] + [[package]] name = "soupsieve" version = "2.7" @@ -3093,6 +3816,27 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e7/9c/0e6afc12c269578be5c0c1c9f4b49a8d32770a080260c333ac04cc1c832d/soupsieve-2.7-py3-none-any.whl", hash = "sha256:6e60cc5c1ffaf1cebcc12e8188320b72071e922c2e897f737cadce79ad5d30c4", size = 36677, upload-time = "2025-04-20T18:50:07.196Z" }, ] +[[package]] +name = "speechbrain" +version = "1.0.3" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "huggingface-hub" }, + { name = 
"hyperpyyaml" }, + { name = "joblib" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "scipy" }, + { name = "sentencepiece" }, + { name = "torch" }, + { name = "torchaudio" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/ab/10/87e666544a4e0cec7cbdc09f26948994831ae0f8bbc58de3bf53b68285ff/speechbrain-1.0.3.tar.gz", hash = "sha256:fcab3c6e90012cecb1eed40ea235733b550137e73da6bfa2340ba191ec714052", size = 747735, upload-time = "2025-04-07T17:17:06.749Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/58/13/e61f1085aebee17d5fc2df19fcc5177c10379be52578afbecdd615a831c9/speechbrain-1.0.3-py3-none-any.whl", hash = "sha256:9859d4c1b1fb3af3b85523c0c89f52e45a04f305622ed55f31aa32dd2fba19e9", size = 864091, upload-time = "2025-04-07T17:17:04.706Z" }, +] + [[package]] name = "sqlalchemy" version = "1.4.54" @@ -3174,6 +3918,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" }, ] +[[package]] +name = "tabulate" +version = "0.9.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/ec/fe/802052aecb21e3797b8f7902564ab6ea0d60ff8ca23952079064155d1ae1/tabulate-0.9.0.tar.gz", hash = "sha256:0095b12bf5966de529c0feb1fa08671671b3368eec77d7ef7ab114be2c068b3c", size = 81090, upload-time = "2022-10-06T17:21:48.54Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl", hash = "sha256:024ca478df22e9340661486f85298cff5f6dcdba14f3813e8830015b9ed1948f", size = 35252, upload-time = "2022-10-06T17:21:44.262Z" }, +] + [[package]] name = "tenacity" version = "9.1.2" @@ -3183,6 +3936,29 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e5/30/643397144bfbfec6f6ef821f36f33e57d35946c44a2352d3c9f0ae847619/tenacity-9.1.2-py3-none-any.whl", hash = "sha256:f77bf36710d8b73a50b2dd155c97b870017ad21afe6ab300326b0371b3b05138", size = 28248, upload-time = "2025-04-02T08:25:07.678Z" }, ] +[[package]] +name = "tensorboardx" +version = "2.6.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "packaging" }, + { name = "protobuf" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/2b/c5/d4cc6e293fb837aaf9f76dd7745476aeba8ef7ef5146c3b3f9ee375fe7a5/tensorboardx-2.6.4.tar.gz", hash = "sha256:b163ccb7798b31100b9f5fa4d6bc22dad362d7065c2f24b51e50731adde86828", size = 4769801, upload-time = "2025-06-10T22:37:07.419Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e0/1d/b5d63f1a6b824282b57f7b581810d20b7a28ca951f2d5b59f1eb0782c12b/tensorboardx-2.6.4-py3-none-any.whl", hash = "sha256:5970cf3a1f0a6a6e8b180ccf46f3fe832b8a25a70b86e5a237048a7c0beb18e2", size = 87201, upload-time = "2025-06-10T22:37:05.44Z" }, +] + +[[package]] +name = "threadpoolctl" +version = "3.6.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/b7/4d/08c89e34946fce2aec4fbb45c9016efd5f4d7f24af8e5d93296e935631d8/threadpoolctl-3.6.0.tar.gz", hash = "sha256:8ab8b4aa3491d812b623328249fab5302a68d2d71745c8a4c719a2fcaba9f44e", size = 21274, upload-time = "2025-03-13T13:49:23.031Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638, upload-time = "2025-03-13T13:49:21.846Z" }, +] + [[package]] name = "tiktoken" version = "0.9.0" @@ -3261,6 +4037,108 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/6e/c2/61d3e0f47e2b74ef40a68b9e6ad5984f6241a942f7cd3bbfbdbd03861ea9/tomli-2.2.1-py3-none-any.whl", hash = "sha256:cb55c73c5f4408779d0cf3eef9f762b9c9f147a77de7b258bef0a5628adc85cc", size = 14257, upload-time = "2024-11-27T22:38:35.385Z" }, ] +[[package]] +name = "torch" +version = "2.8.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "filelock" }, + { name = "fsspec" }, + { name = "jinja2" }, + { name = "networkx" }, + { name = "nvidia-cublas-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cuda-cupti-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cuda-nvrtc-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cuda-runtime-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cudnn-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cufft-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cufile-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-curand-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cusolver-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cusparse-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-cusparselt-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-nccl-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-nvjitlink-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "nvidia-nvtx-cu12", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "setuptools", marker = "python_full_version >= '3.12'" }, + { name = "sympy" }, + { name = "triton", marker = "platform_machine == 'x86_64' and sys_platform == 'linux'" }, + { name = "typing-extensions" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/8f/c4/3e7a3887eba14e815e614db70b3b529112d1513d9dae6f4d43e373360b7f/torch-2.8.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:220a06fd7af8b653c35d359dfe1aaf32f65aa85befa342629f716acb134b9710", size = 102073391, upload-time = "2025-08-06T14:53:20.937Z" }, + { url = "https://files.pythonhosted.org/packages/5a/63/4fdc45a0304536e75a5e1b1bbfb1b56dd0e2743c48ee83ca729f7ce44162/torch-2.8.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:c12fa219f51a933d5f80eeb3a7a5d0cbe9168c0a14bbb4055f1979431660879b", size = 888063640, upload-time = "2025-08-06T14:55:05.325Z" }, + { url = "https://files.pythonhosted.org/packages/84/57/2f64161769610cf6b1c5ed782bd8a780e18a3c9d48931319f2887fa9d0b1/torch-2.8.0-cp311-cp311-win_amd64.whl", hash = "sha256:8c7ef765e27551b2fbfc0f41bcf270e1292d9bf79f8e0724848b1682be6e80aa", size = 241366752, upload-time = "2025-08-06T14:53:38.692Z" }, + { url = 
"https://files.pythonhosted.org/packages/a4/5e/05a5c46085d9b97e928f3f037081d3d2b87fb4b4195030fc099aaec5effc/torch-2.8.0-cp311-none-macosx_11_0_arm64.whl", hash = "sha256:5ae0524688fb6707c57a530c2325e13bb0090b745ba7b4a2cd6a3ce262572916", size = 73621174, upload-time = "2025-08-06T14:53:25.44Z" }, + { url = "https://files.pythonhosted.org/packages/49/0c/2fd4df0d83a495bb5e54dca4474c4ec5f9c62db185421563deeb5dabf609/torch-2.8.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:e2fab4153768d433f8ed9279c8133a114a034a61e77a3a104dcdf54388838705", size = 101906089, upload-time = "2025-08-06T14:53:52.631Z" }, + { url = "https://files.pythonhosted.org/packages/99/a8/6acf48d48838fb8fe480597d98a0668c2beb02ee4755cc136de92a0a956f/torch-2.8.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:b2aca0939fb7e4d842561febbd4ffda67a8e958ff725c1c27e244e85e982173c", size = 887913624, upload-time = "2025-08-06T14:56:44.33Z" }, + { url = "https://files.pythonhosted.org/packages/af/8a/5c87f08e3abd825c7dfecef5a0f1d9aa5df5dd0e3fd1fa2f490a8e512402/torch-2.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:2f4ac52f0130275d7517b03a33d2493bab3693c83dcfadf4f81688ea82147d2e", size = 241326087, upload-time = "2025-08-06T14:53:46.503Z" }, + { url = "https://files.pythonhosted.org/packages/be/66/5c9a321b325aaecb92d4d1855421e3a055abd77903b7dab6575ca07796db/torch-2.8.0-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:619c2869db3ada2c0105487ba21b5008defcc472d23f8b80ed91ac4a380283b0", size = 73630478, upload-time = "2025-08-06T14:53:57.144Z" }, +] + +[[package]] +name = "torch-audiomentations" +version = "0.12.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "julius" }, + { name = "torch" }, + { name = "torch-pitch-shift" }, + { name = "torchaudio" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/31/8d/2f8fd7e34c75f5ee8de4310c3bd3f22270acd44d1f809e2fe7c12fbf35f8/torch_audiomentations-0.12.0.tar.gz", hash = "sha256:b02d4c5eb86376986a53eb405cca5e34f370ea9284411237508e720c529f7888", size = 52094, upload-time = "2025-01-15T09:07:01.071Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/21/9d/1ee04f49c15d2d632f6f7102061d7c07652858e6d91b58a091531034e84f/torch_audiomentations-0.12.0-py3-none-any.whl", hash = "sha256:1b80b91d2016ccf83979622cac8f702072a79b7dcc4c2bee40f00b26433a786b", size = 48506, upload-time = "2025-01-15T09:06:59.687Z" }, +] + +[[package]] +name = "torch-pitch-shift" +version = "1.2.5" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "packaging" }, + { name = "primepy" }, + { name = "torch" }, + { name = "torchaudio" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/79/a6/722a832bca75d5079f6731e005b3d0c2eec7c6c6863d030620952d143d57/torch_pitch_shift-1.2.5.tar.gz", hash = "sha256:6e1c7531f08d0f407a4c55e5ff8385a41355c5c5d27ab7fa08632e51defbd0ed", size = 4725, upload-time = "2024-09-25T19:10:12.922Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/27/4c/96ac2a09efb56cc3c41fb3ce9b6f4d8c0604499f7481d4a13a7b03e21382/torch_pitch_shift-1.2.5-py3-none-any.whl", hash = "sha256:6f8500cbc13f1c98b11cde1805ce5084f82cdd195c285f34287541f168a7c6a7", size = 5005, upload-time = "2024-09-25T19:10:11.521Z" }, +] + +[[package]] +name = "torchaudio" +version = "2.8.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "torch" }, +] +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/dd/bf/6b01ef3defb8d0a772c863588711e9b2b011c27d6b37c1b9d15a359c8442/torchaudio-2.8.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:c9276857d241c6de257af765c0f51fc011af38cb725401495121b280913007cf", size = 1859094, upload-time = "2025-08-06T14:58:35.078Z" }, + { url = "https://files.pythonhosted.org/packages/75/ca/da5d0a3bb7d114a8b590ecce14859ea0a05102bb4de68cdd1ed7a90634d6/torchaudio-2.8.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:4573c6042950c20278e3608a9a38050ba0bc72e0049e1bbfd249caf859a8029b", size = 1692033, upload-time = "2025-08-06T14:58:37.393Z" }, + { url = "https://files.pythonhosted.org/packages/b6/ef/62ac736d8f906cc414181050e08a495a637dab985186c34bd76ea37efbc0/torchaudio-2.8.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:776c0b4ba84b9e3ddf6304b9c47cd63549d7896a6f3d5184ece074cc3d76ed6b", size = 4011716, upload-time = "2025-08-06T14:58:40.138Z" }, + { url = "https://files.pythonhosted.org/packages/14/86/015337c8434abc604b8680371df783f66c421a7f211cbe40a374b0540b6d/torchaudio-2.8.0-cp311-cp311-win_amd64.whl", hash = "sha256:078105bf80f725c0215a0bebac8cb2fb1b3993ab32bdc3fcd50145a5b4127001", size = 2505194, upload-time = "2025-08-06T14:58:57.301Z" }, + { url = "https://files.pythonhosted.org/packages/ac/cc/c2e2a3eb6ee956f73c68541e439916f8146170ea9cc61e72adea5c995312/torchaudio-2.8.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:ddef94bf181e6447cbb05f38beaca8f6c5bb8d2b9ddced1aa3452025b9fc70d3", size = 1856736, upload-time = "2025-08-06T14:58:36.3Z" }, + { url = "https://files.pythonhosted.org/packages/c7/0d/24dad878784f1edd62862f27173781669f0c71eb46368636787d1e364188/torchaudio-2.8.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:862e2e40bf09d865e5df080a84c1a39bbcef40e43140f4b1737eb3a389d3b38f", size = 1692930, upload-time = "2025-08-06T14:58:41.312Z" }, + { url = "https://files.pythonhosted.org/packages/c2/a6/84d80f34472503e9eb82245d7df501c59602d75d7360e717fb9b84f91c5e/torchaudio-2.8.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:93a8583f280fe83ba021aa713319381ea71362cc87b67ee38e97a43cb2254aee", size = 4014607, upload-time = "2025-08-06T14:58:47.234Z" }, + { url = "https://files.pythonhosted.org/packages/43/ab/96ad33afa320738a7cfb4b51ba97e2f3cfb1e04ae3115d5057655103ba4f/torchaudio-2.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:4b82cacd1b8ccd543b1149d8cab257a40dfda8119023d2e3a96c66349c84bffb", size = 2499890, upload-time = "2025-08-06T14:58:55.066Z" }, +] + +[[package]] +name = "torchmetrics" +version = "1.8.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "lightning-utilities" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "torch" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/78/1f/2cd9eb8f3390c3ec4693ac0871913d4b468964b3833638e4091a70817e0a/torchmetrics-1.8.1.tar.gz", hash = "sha256:04ca021105871637c5d34d0a286b3ab665a1e3d2b395e561f14188a96e862fdb", size = 580373, upload-time = "2025-08-07T20:44:44.631Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/8f/59/5c1c1cb08c494621901cf549a543f87143019fac1e6dd191eb4630bbc8fb/torchmetrics-1.8.1-py3-none-any.whl", hash = "sha256:2437501351e0da3d294c71210ce8139b9c762b5e20604f7a051a725443db8f4b", size = 982961, upload-time = "2025-08-07T20:44:42.608Z" }, +] + [[package]] name = "tqdm" version = "4.67.1" @@ -3294,6 +4172,18 @@ wheels = [ { url = 
"https://files.pythonhosted.org/packages/96/88/beb33a79a382fcd2aed0be5222bdc47f41e4bfe7aaa90ae1374f1d8ea2af/transformers-4.53.2-py3-none-any.whl", hash = "sha256:db8f4819bb34f000029c73c3c557e7d06fc1b8e612ec142eecdae3947a9c78bf", size = 10826609, upload-time = "2025-07-11T12:39:05.461Z" }, ] +[[package]] +name = "triton" +version = "3.4.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "setuptools" }, +] +wheels = [ + { url = "https://files.pythonhosted.org/packages/7d/39/43325b3b651d50187e591eefa22e236b2981afcebaefd4f2fc0ea99df191/triton-3.4.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7b70f5e6a41e52e48cfc087436c8a28c17ff98db369447bcaff3b887a3ab4467", size = 155531138, upload-time = "2025-07-30T19:58:29.908Z" }, + { url = "https://files.pythonhosted.org/packages/d0/66/b1eb52839f563623d185f0927eb3530ee4d5ffe9d377cdaf5346b306689e/triton-3.4.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:31c1d84a5c0ec2c0f8e8a072d7fd150cab84a9c239eaddc6706c081bfae4eb04", size = 155560068, upload-time = "2025-07-30T19:58:37.081Z" }, +] + [[package]] name = "typer" version = "0.16.0" @@ -3405,6 +4295,43 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/8f/eb/f7032be105877bcf924709c97b1bf3b90255b4ec251f9340cef912559f28/uvloop-0.21.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:183aef7c8730e54c9a3ee3227464daed66e37ba13040bb3f350bc2ddc040f22f", size = 4659022, upload-time = "2024-10-14T23:37:58.195Z" }, ] +[[package]] +name = "vcrpy" +version = "5.1.0" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version >= '3.12' and platform_python_implementation == 'PyPy'", + "python_full_version < '3.12' and platform_python_implementation == 'PyPy'", +] +dependencies = [ + { name = "pyyaml", marker = "platform_python_implementation == 'PyPy'" }, + { name = "wrapt", marker = "platform_python_implementation == 'PyPy'" }, + { name = "yarl", marker = "platform_python_implementation == 'PyPy'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/a5/ea/a166a3cce4ac5958ba9bbd9768acdb1ba38ae17ff7986da09fa5b9dbc633/vcrpy-5.1.0.tar.gz", hash = "sha256:bbf1532f2618a04f11bce2a99af3a9647a32c880957293ff91e0a5f187b6b3d2", size = 84576, upload-time = "2023-07-31T03:19:32.231Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2a/5b/3f70bcb279ad30026cc4f1df0a0491a0205a24dddd88301f396c485de9e7/vcrpy-5.1.0-py2.py3-none-any.whl", hash = "sha256:605e7b7a63dcd940db1df3ab2697ca7faf0e835c0852882142bafb19649d599e", size = 41969, upload-time = "2023-07-31T03:19:30.128Z" }, +] + +[[package]] +name = "vcrpy" +version = "7.0.0" +source = { registry = "https://pypi.org/simple" } +resolution-markers = [ + "python_full_version >= '3.12' and platform_python_implementation != 'PyPy'", + "python_full_version < '3.12' and platform_python_implementation != 'PyPy'", +] +dependencies = [ + { name = "pyyaml", marker = "platform_python_implementation != 'PyPy'" }, + { name = "urllib3", marker = "platform_python_implementation != 'PyPy'" }, + { name = "wrapt", marker = "platform_python_implementation != 'PyPy'" }, + { name = "yarl", marker = "platform_python_implementation != 'PyPy'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/25/d3/856e06184d4572aada1dd559ddec3bedc46df1f2edc5ab2c91121a2cccdb/vcrpy-7.0.0.tar.gz", hash = "sha256:176391ad0425edde1680c5b20738ea3dc7fb942520a48d2993448050986b3a50", size = 85502, upload-time = "2024-12-31T00:07:57.894Z" } +wheels = [ 
+ { url = "https://files.pythonhosted.org/packages/13/5d/1f15b252890c968d42b348d1e9b0aa12d5bf3e776704178ec37cceccdb63/vcrpy-7.0.0-py2.py3-none-any.whl", hash = "sha256:55791e26c18daa363435054d8b35bd41a4ac441b6676167635d1b37a71dbe124", size = 42321, upload-time = "2024-12-31T00:07:55.277Z" }, +] + [[package]] name = "vine" version = "5.1.0"