
Reflector GPU implementation - Transcription, Diarization, Translation, and LLM

This repository holds the GPU implementation of the Reflector API service, deployed on Modal.com:

  • reflector_diarizer.py - Diarization API
  • reflector_llm.py - LLM API
  • reflector_transcriber.py - Transcription API (Whisper)
  • reflector_transcriber_parakeet.py - Transcription API (NVIDIA Parakeet)
  • reflector_translator.py - Translation API

Modal.com deployment

Create a Modal secret named reflector-gpu. It must contain a REFLECTOR_APIKEY environment variable whose value is the API key clients will use.
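For example, with the Modal CLI (the key value is a placeholder to replace with your own):

$ modal secret create reflector-gpu REFLECTOR_APIKEY=<your-api-key>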

Each service is then deployed with the Modal CLI:

$ modal deploy reflector_transcriber.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run

$ modal deploy reflector_transcriber_parakeet.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run

$ modal deploy reflector_llm.py
...
└── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run
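The diarizer and translator are deployed the same way, and their URLs follow the same pattern:

$ modal deploy reflector_diarizer.py
...
└── 🔨 Created web => https://xxxx--reflector-diarizer-web.modal.run

$ modal deploy reflector_translator.py
...
└── 🔨 Created web => https://xxxx--reflector-translator-web.modal.run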

Then, in your Reflector API configuration (.env), set these keys, using the same API key value you stored in the reflector-gpu secret:

TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://xxxx--reflector-transcriber-web.modal.run
TRANSCRIPT_MODAL_API_KEY=<REFLECTOR_APIKEY>

DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://xxxx--reflector-diarizer-web.modal.run
DIARIZATION_MODAL_API_KEY=<REFLECTOR_APIKEY>

TRANSLATION_BACKEND=modal
TRANSLATION_URL=https://xxxx--reflector-translator-web.modal.run
TRANSLATION_MODAL_API_KEY=<REFLECTOR_APIKEY>

API

Credentials must be passed in the Authorization header, using the bearer scheme:

Authorization: bearer <REFLECTOR_APIKEY>

LLM

POST /llm

request

{
    "prompt": "xxx"
}

response

{
    "text": "xxx completed"
}
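For illustration, a curl call might look like this, assuming the endpoint is served at the root of the deployment URL and your key is exported as REFLECTOR_APIKEY:

$ curl -X POST https://xxxx--reflector-llm-web.modal.run/llm \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "xxx"}'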

Transcription

Parakeet Transcriber (reflector_transcriber_parakeet.py)

NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription with accurate word-level timestamps.

GPU Configuration:

  • A10G GPU - Used for the /v1/audio/transcriptions endpoint (small files, live transcription)
    • Higher concurrency (max_inputs=10)
    • Optimized for multiple small audio files
    • Supports batch processing for efficiency
  • L40S GPU - Used for the /v1/audio/transcriptions-from-url endpoint (large files)
    • Lower concurrency, but more powerful processing
    • Optimized for single large audio files
    • VAD-based chunking for long-form audio

/v1/audio/transcriptions - Small file transcription

request (multipart/form-data)

  • file or files[] - audio file(s) to transcribe
  • model - model name (default: nvidia/parakeet-tdt-0.6b-v2)
  • language - language code (default: en)
  • batch - whether to use batch processing for multiple files (default: true)

response

{
    "text": "transcribed text",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0}
    ],
    "filename": "audio.mp3"
}

For multiple files with batch=true:

{
    "results": [
        {
            "filename": "audio1.mp3",
            "text": "transcribed text",
            "words": [...]
        },
        {
            "filename": "audio2.mp3",
            "text": "transcribed text",
            "words": [...]
        }
    ]
}
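As a sketch, a single-file request with curl might look like this (audio.mp3 is a placeholder; the key is assumed to be exported as REFLECTOR_APIKEY):

$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "file=@audio.mp3" \
    -F "language=en"
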
/v1/audio/transcriptions-from-url - Large file transcription

request (application/json)

{
    "audio_file_url": "https://example.com/audio.mp3",
    "model": "nvidia/parakeet-tdt-0.6b-v2",
    "language": "en",
    "timestamp_offset": 0.0
}

response

{
    "text": "transcribed text from large file",
    "words": [
        {"word": "hello", "start": 0.0, "end": 0.5},
        {"word": "world", "start": 0.5, "end": 1.0}
    ]
}

Supported file types: mp3, mp4, mpeg, mpga, m4a, wav, webm
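A curl sketch for this endpoint, under the same placeholder conventions:

$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -H "Content-Type: application/json" \
    -d '{"audio_file_url": "https://example.com/audio.mp3", "language": "en"}'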

Whisper Transcriber (reflector_transcriber.py)

POST /transcribe

request (multipart/form-data)

  • file - audio file
  • language - language code (e.g. en)

response

{
    "text": "xxx",
    "words": [
        {"text": "xxx", "start": 0.0, "end": 1.0}
    ]
}
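
A curl sketch, under the same placeholder conventions:

$ curl -X POST https://xxxx--reflector-transcriber-web.modal.run/transcribe \
    -H "Authorization: bearer $REFLECTOR_APIKEY" \
    -F "file=@audio.mp3" \
    -F "language=en"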