Mathieu Virbel 3ea7f6b7b6 feat: pipeline improvement with file processing, parakeet, silero-vad (#540)
* feat: improve pipeline threading and transcriber (parakeet and silero-vad)

* refactor: remove whisperx, implement parakeet

* refactor: make audio_chunker smarter: wait for speech instead of a fixed frame count

* refactor: make audio merge always downsample the audio to 16k for transcription

* refactor: make the audio transcript modal accept batches

* refactor: improve type safety and remove prometheus metrics

- Add DiarizationSegment TypedDict for proper diarization typing
- Replace List/Optional with modern Python list/| None syntax
- Remove all Prometheus metrics from TranscriptDiarizationAssemblerProcessor
- Add comprehensive file processing pipeline with parallel execution
- Update processor imports and type annotations throughout
- Implement optimized file pipeline as default in process.py tool
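
A minimal sketch of the typing changes described above (field names are assumptions, not the actual schema):

from typing import TypedDict

class DiarizationSegment(TypedDict):
    # assumed fields: segment bounds in seconds plus the diarizer's speaker label
    start: float
    end: float
    speaker: str

# modern built-in generics and unions replacing List[...] / Optional[...]
def assemble_segments(segments: list[DiarizationSegment], language: str | None = None) -> str | None:
    ...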

* refactor: convert FileDiarizationProcessor I/O types to BaseModel

Update FileDiarizationInput and FileDiarizationOutput to inherit from
BaseModel instead of plain classes, following the standard pattern
used by other processors in the codebase.
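
A rough sketch of that pattern (the field names are assumed for illustration; the real models live next to the processor):

from pydantic import BaseModel

class FileDiarizationInput(BaseModel):
    audio_path: str  # assumed field

class FileDiarizationOutput(BaseModel):
    segments: list[dict]  # assumed field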

* test: add tests for file transcript and diarization with pytest-recording

* build: add pytest-recording
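
pytest-recording drives such tests from recorded HTTP cassettes via its vcr marker; the marker below is the library's real API, while the request itself is only a stand-in:

import pytest
import requests

@pytest.mark.vcr  # replay the cassette recorded for this test instead of hitting the network
def test_file_transcript():
    response = requests.get("https://example.com/transcript")  # placeholder request
    assert response.status_code == 200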

* feat: add local pyannote for testing

* fix: replace PyAV AudioResampler with torchaudio for reliable audio processing

- Replace problematic PyAV AudioResampler that was causing ValueError: [Errno 22] Invalid argument
- Use torchaudio.functional.resample for robust sample rate conversion
- Optimize processing: skip conversion for already 16kHz mono audio
- Add direct WAV writing with Python wave module for better performance
- Consolidate duplicate downsample checks for cleaner code
- Maintain list[av.AudioFrame] input interface
- Required for Silero VAD which needs 16kHz mono audio
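
A rough sketch of that approach, assuming planar float frames from PyAV (the actual AudioMergeProcessor handling may differ):

import wave

import numpy as np
import torch
import torchaudio

TARGET_RATE = 16000  # Silero VAD needs 16 kHz mono

def merge_frames_to_wav(frames, out_path: str) -> None:
    # frames: list[av.AudioFrame], assumed planar float in (channels, samples) shape
    waveform = torch.from_numpy(
        np.concatenate([f.to_ndarray() for f in frames], axis=-1)
    ).float()
    if waveform.shape[0] > 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    rate = frames[0].sample_rate
    if rate != TARGET_RATE:  # skip conversion when audio is already 16 kHz
        waveform = torchaudio.functional.resample(waveform, rate, TARGET_RATE)
    pcm16 = (waveform.clamp(-1.0, 1.0) * 32767).to(torch.int16).numpy()
    with wave.open(out_path, "wb") as wav:  # direct WAV writing with the stdlib wave module
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(TARGET_RATE)
        wav.writeframes(pcm16.tobytes())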

* fix: replace PyAV AudioResampler with torchaudio solution

- Resolves ValueError: [Errno 22] Invalid argument in AudioMergeProcessor
- Replaces problematic PyAV AudioResampler with torchaudio.functional.resample
- Optimizes processing to skip unnecessary conversions when audio is already 16kHz mono
- Uses direct WAV writing with Python's wave module for better performance
- Fixes test_basic_process to disable diarization (pyannote dependency not installed)
- Updates test expectations to match actual processor behavior
- Removes unused pydub dependency from pyproject.toml
- Adds comprehensive TEST_ANALYSIS.md documenting test suite status

* feat: add parameterized test for both diarization modes

- Adds @pytest.mark.parametrize to test_basic_process with enable_diarization=[False, True]
- Test with diarization=False always passes (tests core AudioMergeProcessor functionality)
- Test with diarization=True gracefully skips when pyannote.audio is not installed
- Provides comprehensive test coverage for both pipeline configurations
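
The shape of that test, roughly:

import pytest

@pytest.mark.parametrize("enable_diarization", [False, True])
def test_basic_process(enable_diarization):
    if enable_diarization:
        # one way to skip gracefully when the optional dependency is missing
        pytest.importorskip("pyannote.audio")
    ...  # run the pipeline with enable_diarization and assert on the output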

* fix: resolve pipeline property naming conflict in AudioDiarizationPyannoteProcessor

- Renames 'pipeline' property to 'diarization_pipeline' to avoid conflict with base Processor.pipeline attribute
- Fixes AttributeError: 'property 'pipeline' object has no setter' when set_pipeline() is called
- Updates property usage in _diarize method to use new name
- Now correctly supports pipeline initialization for diarization processing
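
The conflict in miniature: a read-only property named pipeline on the subclass shadows the attribute the base class assigns, so set_pipeline() raises. A stubbed sketch, not the actual class bodies:

class Processor:
    def set_pipeline(self, pipeline):
        # would raise AttributeError if a subclass defined a read-only 'pipeline' property
        self.pipeline = pipeline

class AudioDiarizationPyannoteProcessor(Processor):
    _diarization = None

    @property
    def diarization_pipeline(self):  # renamed from 'pipeline' to avoid the clash
        if self._diarization is None:
            self._diarization = object()  # stands in for loading the pyannote pipeline
        return self._diarization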

* fix: add local support for pyannote

* test: add diarization test

* fix: resample on audio merge now working

* fix: correctly restore timestamp

* fix: display the exception in a threaded processor if one happens

* Update pyproject.toml

* ci: remove option

* ci: update astral-sh/setup-uv

* test: add the Monadical URL for pytest-recording

* refactor: remove previous version

* build: move faster-whisper to a local dep

* test: fix missing import

* refactor: improve main_file_pipeline organization and error handling

- Move all imports to the top of the file
- Create unified EmptyPipeline class to replace duplicate mock pipeline code
- Remove timeout and fallback logic - let processors handle their own retries
- Fix error handling to raise any exception from parallel tasks
- Add proper type hints and validation for captured results
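
A sketch of those two pieces, with assumed names and interfaces:

import asyncio

class EmptyPipeline:
    # stands in for a live pipeline when processors run standalone
    def push(self, *args, **kwargs):
        pass

async def gather_or_raise(*coros):
    # run tasks in parallel and re-raise the first failure instead of hiding it
    results = await asyncio.gather(*coros, return_exceptions=True)
    for result in results:
        if isinstance(result, BaseException):
            raise result
    return results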

* fix: wrong function

* fix: remove task_done

* feat: add configurable file processing timeouts for modal processors

- Add TRANSCRIPT_FILE_TIMEOUT setting (default: 600s) for file transcription
- Add DIARIZATION_FILE_TIMEOUT setting (default: 600s) for file diarization
- Replace hardcoded timeout=600 with configurable settings in modal processors
- Allows customization of timeout values via environment variables
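
Assuming a pydantic-settings style Settings class (a guess about the codebase), the additions would look like:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # both overridable via environment variables of the same name
    TRANSCRIPT_FILE_TIMEOUT: int = 600  # file transcription timeout, seconds
    DIARIZATION_FILE_TIMEOUT: int = 600  # file diarization timeout, seconds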

* fix: use logger

* fix: worker-processed meetings now use the file pipeline

* fix: topic not gathered

* refactor: remove prepare(); pipeline now works

* refactor: implement many review suggestions from Igor

* test: add test for test_pipeline_main_file

* refactor: remove doc

* doc: add doc

* ci: update build to use native arm64 builder

* fix: merge fixes

* refactor: changes from Igor's review + add test (not run by default) for the GPU Modal part

* ci: update to our own runner linux-amd64

* ci: try using suggested mode=min

* fix: update the diarizer for the latest Modal, and use a volume

* fix: modal file extension detection

* fix: run the diarizer on an A100

Reflector

Reflector Audio Management and Analysis is a web application under development by Monadical. It uses AI to record meetings, providing a permanent record with transcripts, translations, and automated summaries.

License: MIT


Background

The project architecture consists of three primary components:

  • Front-End: NextJS React project hosted on Vercel, located in www/.
  • Back-End: Python server that offers an API and data persistence, found in server/.
  • GPU implementation: provides services such as speech-to-text transcription, topic generation, automated summaries, and translations. The most reliable option is a Modal deployment.

It also uses authentik for authentication (when enabled), and Vercel for deployment and configuration of the front-end.

Contribution Guidelines

All new contributions should be made in a separate branch and go through a Pull Request. Conventional Commits must be used for the PR title and commits.

Usage

To record both your voice and the meeting you're taking part in, you need:

  • For an in-person meeting, make sure your microphone is in range of all participants.
  • If using several microphones, make sure to merge the audio feeds into one with an external tool.
  • For an online meeting, if you do not use headphones, your microphone should be able to pick up both your voice and the audio feed of the meeting.
  • If you want to use headphones, you need to merge the audio feeds with an external tool.

Permissions:

You may have to grant the browser microphone access to record audio, under System Preferences -> Privacy & Security -> Microphone and System Preferences -> Privacy & Security -> Accessibility. You will be prompted to grant these when you try to connect.

How to Install Blackhole (Mac Only)

This is an external tool for merging audio feeds, as explained in the previous section. Note: we currently do not have instructions for Windows users.

  • Install Blackhole-2ch (2 channels is enough) via one of the two options listed.
  • Set up an "Aggregate Device" to route web audio and local microphone input.
  • Set up a Multi-Output Device.
  • Then go to System Preferences -> Sound and choose the devices created from the Output and Input tabs.
  • If everything is configured properly, the input from your local microphone and the browser-run meeting is aggregated into one virtual stream to listen to, and the output is fed back to your specified output devices.

Installation

Frontend

Start with cd www.

Installation

pnpm install
cp .env_template .env
cp config-template.ts config.ts

Then, fill in the environment variables in .env and the configuration in config.ts as needed. If you are unsure how to proceed, ask in Zulip.

Run in development mode

pnpm dev

Then (after completing server setup and starting it) open http://localhost:3000 to view it in the browser.

OpenAPI Code Generation

To generate the TypeScript files from the openapi.json file, make sure the python server is running, then run:

pnpm openapi

Backend

Start with cd server.

Run in development mode

docker compose up -d redis

# on the first run, or if the schemas changed
uv run alembic upgrade head

# start the worker
uv run celery -A reflector.worker.app worker --loglevel=info

# start the app
uv run -m reflector.app --reload

Then fill .env with the omitted values (ask in Zulip).

Crontab (optional)

For crontab (only the healthcheck for now), start celery beat (you don't need it in your local dev environment):

uv run celery -A reflector.worker.app beat

GPU models

Currently, Reflector heavily uses custom local models, deployed on Modal. All the microservices are available in server/gpu/.

To deploy LLM changes to Modal, you need to:

  • have a Modal account
  • set up the required secret in your Modal account (REFLECTOR_GPU_APIKEY)
  • install the Modal CLI
  • connect your Modal CLI to your account, if not done previously
  • run modal run path/to/required/llm

Using local files

You can manually process an audio file by calling the process tool:

uv run python -m reflector.tools.process path/to/audio.wav