
Reflector GPU implementation - Transcription and LLM

This repository holds the API for the GPU implementation of the Reflector API service, and uses Modal.com

  • reflector_diarizer.py - Diarization API
  • reflector_transcriber.py - Transcription API (Whisper)
  • reflector_transcriber_parakeet.py - Transcription API (NVIDIA Parakeet)
  • reflector_translator.py - Translation API

Modal.com deployment

Create a Modal secret named reflector-gpu. It should contain a REFLECTOR_APIKEY environment variable set to your API key.

The deployment is done using the Modal CLI:

$ modal deploy reflector_transcriber.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run

$ modal deploy reflector_transcriber_parakeet.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run

$ modal deploy reflector_llm.py
...
└── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run

Then, in your Reflector API configuration .env, set these keys:

TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://xxxx--reflector-transcriber-web.modal.run
TRANSCRIPT_MODAL_API_KEY=REFLECTOR_APIKEY

DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://xxxx--reflector-diarizer-web.modal.run
DIARIZATION_MODAL_API_KEY=REFLECTOR_APIKEY

TRANSLATION_BACKEND=modal
TRANSLATION_URL=https://xxxx--reflector-translator-web.modal.run
TRANSLATION_MODAL_API_KEY=REFLECTOR_APIKEY

API

Authentication is passed via the Authorization header, using the bearer scheme.

Authorization: bearer <REFLECTOR_APIKEY>

LLM

POST /llm

request

{
    "prompt": "xxx"
}

response

{
    "text": "xxx completed"
}
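As a sketch, a call to the LLM endpoint from Python's standard library might look like the following. The URL and API key are placeholders: substitute your own deployment URL from modal deploy and the value stored in the reflector-gpu secret.

```python
import json
import urllib.request

LLM_URL = "https://xxxx--reflector-llm-web.modal.run"  # placeholder deployment URL
API_KEY = "changeme"  # the REFLECTOR_APIKEY value from the reflector-gpu secret


def build_llm_request(prompt: str) -> urllib.request.Request:
    """Build an authenticated POST /llm request using the bearer scheme."""
    return urllib.request.Request(
        f"{LLM_URL}/llm",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# with urllib.request.urlopen(build_llm_request("Summarize ...")) as resp:
#     print(json.load(resp)["text"])
```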

Transcription

Parakeet Transcriber (reflector_transcriber_parakeet.py)

NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription with superior word-level timestamps.

GPU Configuration:

  • A10G GPU - Used for /v1/audio/transcriptions endpoint (small files, live transcription)

    • Higher concurrency (max_inputs=10)
    • Optimized for multiple small audio files
    • Supports batch processing for efficiency
  • L40S GPU - Used for /v1/audio/transcriptions-from-url endpoint (large files)

    • Lower concurrency but more powerful processing
    • Optimized for single large audio files
    • VAD-based chunking for long-form audio
/v1/audio/transcriptions - Small file transcription

request (multipart/form-data)

  • file or files[] - audio file(s) to transcribe
  • model - model name (default: nvidia/parakeet-tdt-0.6b-v2)
  • language - language code (default: en)
  • batch - whether to use batch processing for multiple files (default: true)

response

{
    "text": "transcribed text",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0}
    ],
    "filename": "audio.mp3"
}

For multiple files with batch=true:

{
    "results": [
        {
            "filename": "audio1.mp3",
            "text": "transcribed text",
            "words": [...]
        },
        {
            "filename": "audio2.mp3",
            "text": "transcribed text",
            "words": [...]
        }
    ]
}
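Since this endpoint takes multipart/form-data, a stdlib-only client needs to encode the form body itself. The sketch below shows one way to do that; the URL and key are placeholders, and the field names (files[], language, batch) follow the request description above.

```python
import json
import urllib.request
import uuid

TRANSCRIBER_URL = "https://xxxx--reflector-transcriber-parakeet-web.modal.run"  # placeholder
API_KEY = "changeme"  # the REFLECTOR_APIKEY value from the reflector-gpu secret


def encode_multipart(fields: dict, files: list) -> tuple:
    """Encode plain fields and (filename, bytes) pairs as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for filename, data in files:
        parts.append(
            (
                f'--{boundary}\r\nContent-Disposition: form-data; '
                f'name="files[]"; filename="{filename}"\r\n'
                f"Content-Type: application/octet-stream\r\n\r\n"
            ).encode()
            + data
            + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"language": "en", "batch": "true"},
    [("audio1.mp3", b"..."), ("audio2.mp3", b"...")],  # replace with real file bytes
)
request = urllib.request.Request(
    f"{TRANSCRIBER_URL}/v1/audio/transcriptions",
    data=body,
    headers={"Authorization": f"bearer {API_KEY}", "Content-Type": content_type},
    method="POST",
)
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["results"])
```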
/v1/audio/transcriptions-from-url - Large file transcription

request (application/json)

{
    "audio_file_url": "https://example.com/audio.mp3",
    "model": "nvidia/parakeet-tdt-0.6b-v2",
    "language": "en",
    "timestamp_offset": 0.0
}

response

{
    "text": "transcribed text from large file",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0}
    ]
}
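The large-file endpoint takes a plain JSON body, so the request is simpler. A minimal sketch (placeholder URL and key, defaults taken from the request description above):

```python
import json
import urllib.request

TRANSCRIBER_URL = "https://xxxx--reflector-transcriber-parakeet-web.modal.run"  # placeholder
API_KEY = "changeme"  # the REFLECTOR_APIKEY value from the reflector-gpu secret


def build_url_transcription_request(
    audio_file_url: str, timestamp_offset: float = 0.0
) -> urllib.request.Request:
    """Build the JSON POST for large-file transcription by URL."""
    payload = {
        "audio_file_url": audio_file_url,
        "model": "nvidia/parakeet-tdt-0.6b-v2",
        "language": "en",
        "timestamp_offset": timestamp_offset,
    }
    return urllib.request.Request(
        f"{TRANSCRIBER_URL}/v1/audio/transcriptions-from-url",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# with urllib.request.urlopen(
#     build_url_transcription_request("https://example.com/audio.mp3")
# ) as resp:
#     words = json.load(resp)["words"]
```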

Supported file types: mp3, mp4, mpeg, mpga, m4a, wav, webm

Whisper Transcriber (reflector_transcriber.py)

POST /transcribe

request (multipart/form-data)

  • file - audio file
  • language - language code (e.g. en)

response

{
    "text": "xxx",
    "words": [
        {"text": "xxx", "start": 0.0, "end": 1.0}
    ]
}
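A single-file call to the Whisper endpoint can be sketched the same way; again the URL and key are placeholders, and only the file and language fields documented above are sent.

```python
import json
import urllib.request
import uuid

WHISPER_URL = "https://xxxx--reflector-transcriber-web.modal.run"  # placeholder
API_KEY = "changeme"  # the REFLECTOR_APIKEY value from the reflector-gpu secret


def build_transcribe_request(
    filename: str, audio: bytes, language: str = "en"
) -> urllib.request.Request:
    """Build the multipart POST /transcribe request for a single audio file."""
    boundary = uuid.uuid4().hex
    body = (
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="language"\r\n\r\n{language}\r\n'
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{WHISPER_URL}/transcribe",
        data=body,
        headers={
            "Authorization": f"bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


# with open("clip.wav", "rb") as f:
#     req = build_transcribe_request("clip.wav", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text"])
```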