This commit restore the original behavior with frame cutting. While
silero is used on our gpu for files, look like it's not working great on
the live pipeline. To be investigated, but at the moment, what we keep
is:
- refactored to extract the downscale for further processing in the
pipeline
- remove any downscale implementation from audio_chunker and audio_merge
- removed batching from audio_merge too for now
* feat: improve pipeline threading, and transcriber (parakeet and silero vad)
* refactor: remove whisperx, implement parakeet
* refactor: make audio_chunker more smart and wait for speech, instead of fixed frame
* refactor: make audio merge to always downscale the audio to 16k for transcription
* refactor: make the audio transcript modal accepting batches
* refactor: improve type safety and remove prometheus metrics
- Add DiarizationSegment TypedDict for proper diarization typing
- Replace List/Optional with modern Python list/| None syntax
- Remove all Prometheus metrics from TranscriptDiarizationAssemblerProcessor
- Add comprehensive file processing pipeline with parallel execution
- Update processor imports and type annotations throughout
- Implement optimized file pipeline as default in process.py tool
* refactor: convert FileDiarizationProcessor I/O types to BaseModel
Update FileDiarizationInput and FileDiarizationOutput to inherit from
BaseModel instead of plain classes, following the standard pattern
used by other processors in the codebase.
* test: add tests for file transcript and diarization with pytest-recording
* build: add pytest-recording
* feat: add local pyannote for testing
* fix: replace PyAV AudioResampler with torchaudio for reliable audio processing
- Replace problematic PyAV AudioResampler that was causing ValueError: [Errno 22] Invalid argument
- Use torchaudio.functional.resample for robust sample rate conversion
- Optimize processing: skip conversion for already 16kHz mono audio
- Add direct WAV writing with Python wave module for better performance
- Consolidate duplicate downsample checks for cleaner code
- Maintain list[av.AudioFrame] input interface
- Required for Silero VAD which needs 16kHz mono audio
* fix: replace PyAV AudioResampler with torchaudio solution
- Resolves ValueError: [Errno 22] Invalid argument in AudioMergeProcessor
- Replaces problematic PyAV AudioResampler with torchaudio.functional.resample
- Optimizes processing to skip unnecessary conversions when audio is already 16kHz mono
- Uses direct WAV writing with Python's wave module for better performance
- Fixes test_basic_process to disable diarization (pyannote dependency not installed)
- Updates test expectations to match actual processor behavior
- Removes unused pydub dependency from pyproject.toml
- Adds comprehensive TEST_ANALYSIS.md documenting test suite status
* feat: add parameterized test for both diarization modes
- Adds @pytest.mark.parametrize to test_basic_process with enable_diarization=[False, True]
- Test with diarization=False always passes (tests core AudioMergeProcessor functionality)
- Test with diarization=True gracefully skips when pyannote.audio is not installed
- Provides comprehensive test coverage for both pipeline configurations
* fix: resolve pipeline property naming conflict in AudioDiarizationPyannoteProcessor
- Renames 'pipeline' property to 'diarization_pipeline' to avoid conflict with base Processor.pipeline attribute
- Fixes AttributeError: 'property 'pipeline' object has no setter' when set_pipeline() is called
- Updates property usage in _diarize method to use new name
- Now correctly supports pipeline initialization for diarization processing
* fix: add local for pyannote
* test: add diarization test
* fix: resample on audio merge now working
* fix: correctly restore timestamp
* fix: display exception in a threaded processor if that happen
* Update pyproject.toml
* ci: remove option
* ci: update astral-sh/setup-uv
* test: add monadical url for pytest-recording
* refactor: remove previous version
* build: move faster whisper to local dep
* test: fix missing import
* refactor: improve main_file_pipeline organization and error handling
- Move all imports to the top of the file
- Create unified EmptyPipeline class to replace duplicate mock pipeline code
- Remove timeout and fallback logic - let processors handle their own retries
- Fix error handling to raise any exception from parallel tasks
- Add proper type hints and validation for captured results
* fix: wrong function
* fix: remove task_done
* feat: add configurable file processing timeouts for modal processors
- Add TRANSCRIPT_FILE_TIMEOUT setting (default: 600s) for file transcription
- Add DIARIZATION_FILE_TIMEOUT setting (default: 600s) for file diarization
- Replace hardcoded timeout=600 with configurable settings in modal processors
- Allows customization of timeout values via environment variables
* fix: use logger
* fix: worker process meetings now use file pipeline
* fix: topic not gathered
* refactor: remove prepare(), pipeline now work
* refactor: implement many review from Igor
* test: add test for test_pipeline_main_file
* refactor: remove doc
* doc: add doc
* ci: update build to use native arm64 builder
* fix: merge fixes
* refactor: changes from Igor review + add test (not by default) to test gpu modal part
* ci: update to our own runner linux-amd64
* ci: try using suggested mode=min
* fix: update diarizer for latest modal, and use volume
* fix: modal file extension detection
* fix: put the diarizer as A100
* fix: refactor modal API key configuration for better separation of concerns
- Split generic MODAL_API_KEY into service-specific keys:
- TRANSCRIPT_API_KEY for transcription service
- DIARIZATION_API_KEY for diarization service
- TRANSLATE_API_KEY for translation service
- Remove deprecated *_MODAL_API_KEY settings
- Add proper validation to ensure URLs are set when using modal processors
- Update README with new configuration format
BREAKING CHANGE: Configuration keys have changed. Update your .env file:
- TRANSCRIPT_MODAL_API_KEY → TRANSCRIPT_API_KEY
- LLM_MODAL_API_KEY → (removed, use TRANSCRIPT_API_KEY)
- Add DIARIZATION_API_KEY and TRANSLATE_API_KEY if using those services
* fix: update Modal backend configuration to use service-specific API keys
- Changed from generic MODAL_API_KEY to service-specific keys:
- TRANSCRIPT_MODAL_API_KEY for transcription
- DIARIZATION_MODAL_API_KEY for diarization
- TRANSLATION_MODAL_API_KEY for translation
- Updated audio_transcript_modal.py and audio_diarization_modal.py to use modal_api_key parameter
- Updated documentation in README.md, CLAUDE.md, and env.example
* feat: implement auto/modal pattern for translation processor
- Created TranscriptTranslatorAutoProcessor following the same pattern as transcript/diarization
- Created TranscriptTranslatorModalProcessor with TRANSLATION_MODAL_API_KEY support
- Added TRANSLATION_BACKEND setting (defaults to "modal")
- Updated all imports to use TranscriptTranslatorAutoProcessor instead of TranscriptTranslatorProcessor
- Updated env.example with TRANSLATION_BACKEND and TRANSLATION_MODAL_API_KEY
- Updated test to expect TranscriptTranslatorModalProcessor name
- All tests passing
* refactor: simplify transcript_translator base class to match other processors
- Moved all implementation from base class to modal processor
- Base class now only defines abstract _translate method
- Follows the same minimal pattern as audio_diarization and audio_transcript base classes
- Updated test mock to use _translate instead of get_translation
- All tests passing
* chore: clean up settings and improve type annotations
- Remove deprecated generic API key variables from settings
- Add comments to group Modal-specific settings
- Improve type annotations for modal_api_key parameters
* fix: typing
* fix: passing key to openai
* test: fix rtc test failing due to change on transcript
It also correctly setup database from sqlite, in case our configuration
is setup to postgres.
* ci: deactivate translation backend by default
* test: fix modal->mock
* refactor: implementing igor review, mock to passthrough
This feature a new modal endpoint, and a complete new way to build the
summary.
## SummaryBuilder
The summary builder is based on conversational model, where an exchange
between the model and the user is made. This allow more context
inclusion and a better respect of the rules.
It requires an endpoint with OpenAI-like completions endpoint
(/v1/chat/completions)
## vLLM Hermes3
Unlike previous deployment, this one use vLLM, which gives OpenAI-like
completions endpoint out of the box. It could also handle guided JSON
generation, so jsonformer is not needed. But, the model is quite good to
follow JSON schema if asked in the prompt.
## Conversion of long/short into summary builder
The builder is identifying participants, find key subjects, get a
summary for each, then get a quick recap.
The quick recap is used as a short_summary, while the markdown including
the quick recap + key subjects + summaries are used for the
long_summary.
This is why the nextjs component has to be updated, to correctly style
h1 and keep the new line of the markdown.