Reflector GPU implementation - Transcription and LLM
This repository holds an API for the GPU implementation of the Reflector API service, and uses Modal.com.
- reflector_diarizer.py - Diarization API
- reflector_transcriber.py - Transcription API (Whisper)
- reflector_transcriber_parakeet.py - Transcription API (NVIDIA Parakeet)
- reflector_translator.py - Translation API
Modal.com deployment
Create a Modal secret and name it reflector-gpu.
It should contain a REFLECTOR_APIKEY environment variable with the API key as its value.
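For example, with the Modal CLI (the key value below is a placeholder):
$ modal secret create reflector-gpu REFLECTOR_APIKEY=your-api-key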
Each service is deployed with the Modal CLI:
$ modal deploy reflector_transcriber.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run
$ modal deploy reflector_transcriber_parakeet.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run
$ modal deploy reflector_llm.py
...
└── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run
Then, in your Reflector API configuration .env, set these keys (the API key values come from the Modal secret):
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://xxxx--reflector-transcriber-web.modal.run
TRANSCRIPT_MODAL_API_KEY=REFLECTOR_APIKEY
DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://xxxx--reflector-diarizer-web.modal.run
DIARIZATION_MODAL_API_KEY=REFLECTOR_APIKEY
TRANSLATION_BACKEND=modal
TRANSLATION_URL=https://xxxx--reflector-translator-web.modal.run
TRANSLATION_MODAL_API_KEY=REFLECTOR_APIKEY
API
The API key must be passed in the Authorization header, using the bearer scheme.
Authorization: bearer <REFLECTOR_APIKEY>
LLM
POST /llm
request
{
"prompt": "xxx"
}
response
{
"text": "xxx completed"
}
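For example, with curl (assuming the endpoint is served at the root of the deployment URL; replace xxxx and the key with your own values):
$ curl -X POST https://xxxx--reflector-llm-web.modal.run/llm \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "xxx"}'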
Transcription
Parakeet Transcriber (reflector_transcriber_parakeet.py)
NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription, with accurate word-level timestamps.
GPU Configuration:
- A10G GPU - Used for the /v1/audio/transcriptions endpoint (small files, live transcription)
  - Higher concurrency (max_inputs=10)
  - Optimized for multiple small audio files
  - Supports batch processing for efficiency
- L40S GPU - Used for the /v1/audio/transcriptions-from-url endpoint (large files)
  - Lower concurrency but more powerful processing
  - Optimized for single large audio files
  - VAD-based chunking for long-form audio
/v1/audio/transcriptions - Small file transcription
request (multipart/form-data)
- file or files[] - audio file(s) to transcribe
- model - model name (default: nvidia/parakeet-tdt-0.6b-v2)
- language - language code (default: en)
- batch - whether to use batch processing for multiple files (default: true)
response
{
"text": "transcribed text",
"words": [
{"word": "hello", "start": 0.0, "end": 0.5},
{"word": "world", "start": 0.5, "end": 1.0}
],
"filename": "audio.mp3"
}
For multiple files with batch=true:
{
"results": [
{
"filename": "audio1.mp3",
"text": "transcribed text",
"words": [...]
},
{
"filename": "audio2.mp3",
"text": "transcribed text",
"words": [...]
}
]
}
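For example, with curl (file names, key, and deployment URL are placeholders): first a single file, then two files with batch processing:
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "file=@audio.mp3" \
    -F "language=en"
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "files[]=@audio1.mp3" \
    -F "files[]=@audio2.mp3" \
    -F "batch=true"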
/v1/audio/transcriptions-from-url - Large file transcription
request (application/json)
{
"audio_file_url": "https://example.com/audio.mp3",
"model": "nvidia/parakeet-tdt-0.6b-v2",
"language": "en",
"timestamp_offset": 0.0
}
response
{
"text": "transcribed text from large file",
"words": [
{"word": "hello", "start": 0.0, "end": 0.5},
{"word": "world", "start": 0.5, "end": 1.0}
]
}
Supported file types: mp3, mp4, mpeg, mpga, m4a, wav, webm
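For example, with curl (key and deployment URL are placeholders):
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -H "Content-Type: application/json" \
    -d '{"audio_file_url": "https://example.com/audio.mp3", "language": "en"}'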
Whisper Transcriber (reflector_transcriber.py)
POST /transcribe
request (multipart/form-data)
- file - audio file
- language - language code (e.g. en)
response
{
"text": "xxx",
"words": [
{"text": "xxx", "start": 0.0, "end": 1.0}
]
}
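For example, with curl (file name, key, and deployment URL are placeholders):
$ curl -X POST https://xxxx--reflector-transcriber-web.modal.run/transcribe \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "file=@audio.mp3" \
    -F "language=en"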