Reflector GPU implementation - Transcription and LLM
This repository holds an API for the GPU implementation of the Reflector API service, and uses Modal.com.
- reflector_diarizer.py - Diarization API
- reflector_transcriber.py - Transcription API (Whisper)
- reflector_transcriber_parakeet.py - Transcription API (NVIDIA Parakeet)
- reflector_translator.py - Translation API
Modal.com deployment
Create a Modal secret and name it reflector-gpu.
It should contain a REFLECTOR_APIKEY environment variable with the API key as its value.
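For example, with the Modal CLI (the key value below is a placeholder):
$ modal secret create reflector-gpu REFLECTOR_APIKEY=your-api-key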
Each service is deployed with the Modal CLI:
$ modal deploy reflector_transcriber.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run
$ modal deploy reflector_transcriber_parakeet.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run
$ modal deploy reflector_llm.py
...
└── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run
Then, in your Reflector API configuration .env, set these keys (the API key values come from the Modal secret):
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://xxxx--reflector-transcriber-web.modal.run
TRANSCRIPT_MODAL_API_KEY=REFLECTOR_APIKEY
DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://xxxx--reflector-diarizer-web.modal.run
DIARIZATION_MODAL_API_KEY=REFLECTOR_APIKEY
TRANSLATION_BACKEND=modal
TRANSLATION_URL=https://xxxx--reflector-translator-web.modal.run
TRANSLATION_MODAL_API_KEY=REFLECTOR_APIKEY
API
The API key must be passed in the Authorization header, using the bearer scheme.
Authorization: bearer <REFLECTOR_APIKEY>
LLM
POST /llm
request
{
"prompt": "xxx"
}
response
{
"text": "xxx completed"
}
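For example, with curl (assuming the endpoint is served at the root of the deployment URL; replace xxxx and the key with your own values):
$ curl -X POST https://xxxx--reflector-llm-web.modal.run/llm \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "xxx"}'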
Transcription
Parakeet Transcriber (reflector_transcriber_parakeet.py)
NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription, with accurate word-level timestamps.
GPU Configuration:
- A10G GPU - Used for the /v1/audio/transcriptions endpoint (small files, live transcription)
  - Higher concurrency (max_inputs=10)
  - Optimized for multiple small audio files
  - Supports batch processing for efficiency
- L40S GPU - Used for the /v1/audio/transcriptions-from-url endpoint (large files)
  - Lower concurrency but more powerful processing
  - Optimized for single large audio files
  - VAD-based chunking for long-form audio
/v1/audio/transcriptions - Small file transcription
request (multipart/form-data)
- file or files[] - audio file(s) to transcribe
- model - model name (default: nvidia/parakeet-tdt-0.6b-v2)
- language - language code (default: en)
- batch - whether to use batch processing for multiple files (default: true)
response
{
"text": "transcribed text",
"words": [
{"word": "hello", "start": 0.0, "end": 0.5},
{"word": "world", "start": 0.5, "end": 1.0}
],
"filename": "audio.mp3"
}
For multiple files with batch=true:
{
"results": [
{
"filename": "audio1.mp3",
"text": "transcribed text",
"words": [...]
},
{
"filename": "audio2.mp3",
"text": "transcribed text",
"words": [...]
}
]
}
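For example, with curl (file names, key, and deployment URL are placeholders): first a single file, then two files with batch processing:
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "file=@audio.mp3" \
    -F "language=en"
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "files[]=@audio1.mp3" \
    -F "files[]=@audio2.mp3" \
    -F "batch=true"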
/v1/audio/transcriptions-from-url - Large file transcription
request (application/json)
{
"audio_file_url": "https://example.com/audio.mp3",
"model": "nvidia/parakeet-tdt-0.6b-v2",
"language": "en",
"timestamp_offset": 0.0
}
response
{
"text": "transcribed text from large file",
"words": [
{"word": "hello", "start": 0.0, "end": 0.5},
{"word": "world", "start": 0.5, "end": 1.0}
]
}
Supported file types: mp3, mp4, mpeg, mpga, m4a, wav, webm
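For example, with curl (key and deployment URL are placeholders):
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -H "Content-Type: application/json" \
    -d '{"audio_file_url": "https://example.com/audio.mp3", "language": "en"}'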
Whisper Transcriber (reflector_transcriber.py)
POST /transcribe
request (multipart/form-data)
- file - audio file
- language - language code (e.g. en)
response
{
"text": "xxx",
"words": [
{"text": "xxx", "start": 0.0, "end": 1.0}
]
}
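For example, with curl (file name, key, and deployment URL are placeholders):
$ curl -X POST https://xxxx--reflector-transcriber-web.modal.run/transcribe \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "file=@audio.mp3" \
    -F "language=en"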