Reflector GPU implementation - Transcription and LLM

This repository holds the API for the GPU implementation of the Reflector API service, and uses Modal.com

  • reflector_diarizer.py - Diarization API
  • reflector_transcriber.py - Transcription API (Whisper)
  • reflector_transcriber_parakeet.py - Transcription API (NVIDIA Parakeet)
  • reflector_translator.py - Translation API

Modal.com deployment

Create a Modal secret named reflector-gpu. It should contain a REFLECTOR_APIKEY environment variable with the API key as its value.
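
For example, the secret can be created with the Modal CLI (or through the Modal dashboard):

$ modal secret create reflector-gpu REFLECTOR_APIKEY=<your-api-key>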

Deployment is done with the Modal CLI, one app per service:

$ modal deploy reflector_transcriber.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run

$ modal deploy reflector_transcriber_parakeet.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run

$ modal deploy reflector_llm.py
...
└── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run

Then, in your Reflector API configuration (.env), set these keys:

TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://xxxx--reflector-transcriber-web.modal.run
TRANSCRIPT_MODAL_API_KEY=REFLECTOR_APIKEY

DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://xxxx--reflector-diarizer-web.modal.run
DIARIZATION_MODAL_API_KEY=REFLECTOR_APIKEY

TRANSLATION_BACKEND=modal
TRANSLATION_URL=https://xxxx--reflector-translator-web.modal.run
TRANSLATION_MODAL_API_KEY=REFLECTOR_APIKEY

API

The API key must be passed in the Authorization header, using the bearer scheme.

Authorization: bearer <REFLECTOR_APIKEY>

LLM

POST /llm

request

{
    "prompt": "xxx"
}

response

{
    "text": "xxx completed"
}
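
As a minimal sketch, the endpoint can be called with Python's requests library; the URL is the one printed by modal deploy, and the key is the value stored in the reflector-gpu secret:

import requests

LLM_URL = "https://xxxx--reflector-llm-web.modal.run/llm"  # printed by `modal deploy reflector_llm.py`
API_KEY = "<REFLECTOR_APIKEY>"  # value stored in the reflector-gpu secret

resp = requests.post(
    LLM_URL,
    headers={"Authorization": f"bearer {API_KEY}"},
    json={"prompt": "xxx"},
)
resp.raise_for_status()
print(resp.json()["text"])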

Transcription

Parakeet Transcriber (reflector_transcriber_parakeet.py)

NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription with superior word-level timestamps.

GPU Configuration:

  • A10G GPU - Used for /v1/audio/transcriptions endpoint (small files, live transcription)

    • Higher concurrency (max_inputs=10)
    • Optimized for multiple small audio files
    • Supports batch processing for efficiency
  • L40S GPU - Used for /v1/audio/transcriptions-from-url endpoint (large files)

    • Lower concurrency but more powerful processing
    • Optimized for single large audio files
    • VAD-based chunking for long-form audio

/v1/audio/transcriptions - Small file transcription

request (multipart/form-data)

  • file or files[] - audio file(s) to transcribe
  • model - model name (default: nvidia/parakeet-tdt-0.6b-v2)
  • language - language code (default: en)
  • batch - whether to use batch processing for multiple files (default: true)

response

{
    "text": "transcribed text",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0}
    ],
    "filename": "audio.mp3"
}

For multiple files with batch=true:

{
    "results": [
        {
            "filename": "audio1.mp3",
            "text": "transcribed text",
            "words": [...]
        },
        {
            "filename": "audio2.mp3",
            "text": "transcribed text",
            "words": [...]
        }
    ]
}
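
As a sketch with Python's requests, using the field names described above (URL and key are placeholders):

import requests

URL = "https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions"
HEADERS = {"Authorization": "bearer <REFLECTOR_APIKEY>"}

# Single small file, sent under the "file" field.
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        URL,
        headers=HEADERS,
        files={"file": ("audio.mp3", f, "audio/mpeg")},
        data={"language": "en"},
    )
resp.raise_for_status()
print(resp.json()["text"])

# Multiple files, sent under "files[]" with batch processing enabled.
files = [
    ("files[]", ("audio1.mp3", open("audio1.mp3", "rb"), "audio/mpeg")),
    ("files[]", ("audio2.mp3", open("audio2.mp3", "rb"), "audio/mpeg")),
]
resp = requests.post(URL, headers=HEADERS, files=files, data={"language": "en", "batch": "true"})
resp.raise_for_status()
for result in resp.json()["results"]:
    print(result["filename"], result["text"])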

/v1/audio/transcriptions-from-url - Large file transcription

request (application/json)

{
    "audio_file_url": "https://example.com/audio.mp3",
    "model": "nvidia/parakeet-tdt-0.6b-v2",
    "language": "en",
    "timestamp_offset": 0.0
}

response

{
    "text": "transcribed text from large file",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0}
    ]
}

Supported file types: mp3, mp4, mpeg, mpga, m4a, wav, webm
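
A sketch of the URL-based endpoint, again with requests (the body mirrors the request example above):

import requests

URL = "https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url"

resp = requests.post(
    URL,
    headers={"Authorization": "bearer <REFLECTOR_APIKEY>"},
    json={
        "audio_file_url": "https://example.com/audio.mp3",
        "model": "nvidia/parakeet-tdt-0.6b-v2",
        "language": "en",
        "timestamp_offset": 0.0,
    },
)
resp.raise_for_status()
for word in resp.json()["words"]:
    print(f'{word["start"]:.2f}-{word["end"]:.2f} {word["word"]}')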

Whisper Transcriber (reflector_transcriber.py)

POST /transcribe

request (multipart/form-data)

  • file - audio file
  • language - language code (e.g. en)

response

{
    "text": "xxx",
    "words": [
        {"text": "xxx", "start": 0.0, "end": 1.0}
    ]
}
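
As with the Parakeet endpoints, a minimal sketch with requests (URL and key are placeholders):

import requests

URL = "https://xxxx--reflector-transcriber-web.modal.run/transcribe"

with open("audio.wav", "rb") as f:
    resp = requests.post(
        URL,
        headers={"Authorization": "bearer <REFLECTOR_APIKEY>"},
        files={"file": ("audio.wav", f, "audio/wav")},
        data={"language": "en"},
    )
resp.raise_for_status()
print(resp.json()["text"])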