Reflector GPU implementation - Transcription and LLM
This repository holds the API for the GPU implementation of the Reflector API service, built on Modal.com.
reflector_diarizer.py - Diarization API
reflector_transcriber.py - Transcription API (Whisper)
reflector_transcriber_parakeet.py - Transcription API (NVIDIA Parakeet)
reflector_translator.py - Translation API
Modal.com deployment
Create a Modal secret and name it reflector-gpu.
It should contain a REFLECTOR_APIKEY environment variable set to the API key of your choice.
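As a sketch, assuming the Modal CLI is installed and you are logged in, the secret can be created from the command line (the key value is a placeholder):
$ modal secret create reflector-gpu REFLECTOR_APIKEY=<your-api-key>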
The deployment is done using the Modal.com service:
$ modal deploy reflector_transcriber.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run
$ modal deploy reflector_transcriber_parakeet.py
...
└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run
$ modal deploy reflector_llm.py
...
└── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run
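The diarization and translation services referenced below are deployed the same way (the generated hostnames will differ per deployment):
$ modal deploy reflector_diarizer.py
$ modal deploy reflector_translator.py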
Then, in your Reflector API configuration .env, you can set these keys:
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://xxxx--reflector-transcriber-web.modal.run
TRANSCRIPT_MODAL_API_KEY=REFLECTOR_APIKEY
DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://xxxx--reflector-diarizer-web.modal.run
DIARIZATION_MODAL_API_KEY=REFLECTOR_APIKEY
TRANSLATION_BACKEND=modal
TRANSLATION_URL=https://xxxx--reflector-translator-web.modal.run
TRANSLATION_MODAL_API_KEY=REFLECTOR_APIKEY
API
Authentication must be passed with the Authorization header, using the bearer scheme.
Authorization: bearer <REFLECTOR_APIKEY>
LLM
POST /llm
request
{
"prompt": "xxx"
}
response
{
"text": "xxx completed"
}
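For illustration, a request against the deployed LLM endpoint could look like the following sketch (the hostname is the one printed by modal deploy, and the prompt is a placeholder):
$ curl -X POST https://xxxx--reflector-llm-web.modal.run/llm \
    -H "Authorization: bearer <REFLECTOR_APIKEY>" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "xxx"}'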
Transcription
Parakeet Transcriber (reflector_transcriber_parakeet.py)
NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription with superior word-level timestamps.
GPU Configuration:
- A10G GPU - Used for the /v1/audio/transcriptions endpoint (small files, live transcription)
  - Higher concurrency (max_inputs=10)
  - Optimized for multiple small audio files
  - Supports batch processing for efficiency
- L40S GPU - Used for the /v1/audio/transcriptions-from-url endpoint (large files)
  - Lower concurrency but more powerful processing
  - Optimized for single large audio files
  - VAD-based chunking for long-form audio
/v1/audio/transcriptions - Small file transcription
request (multipart/form-data)
file or files[] - audio file(s) to transcribe
model - model name (default: nvidia/parakeet-tdt-0.6b-v2)
language - language code (default: en)
batch - whether to use batch processing for multiple files (default: true)
response
{
"text": "transcribed text",
"words": [
{"word": "hello", "start": 0.0, "end": 0.5},
{"word": "world", "start": 0.5, "end": 1.0}
],
"filename": "audio.mp3"
}
For multiple files with batch=true:
{
"results": [
{
"filename": "audio1.mp3",
"text": "transcribed text",
"words": [...]
},
{
"filename": "audio2.mp3",
"text": "transcribed text",
"words": [...]
}
]
}
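As a non-authoritative sketch, a single small file could be sent like this (assuming the Parakeet hostname printed by modal deploy, with audio.mp3 as a placeholder file):
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions \
    -H "Authorization: bearer <REFLECTOR_APIKEY>" \
    -F "file=@audio.mp3" \
    -F "language=en"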
/v1/audio/transcriptions-from-url - Large file transcription
request (application/json)
{
"audio_file_url": "https://example.com/audio.mp3",
"model": "nvidia/parakeet-tdt-0.6b-v2",
"language": "en",
"timestamp_offset": 0.0
}
response
{
"text": "transcribed text from large file",
"words": [
{"word": "hello", "start": 0.0, "end": 0.5},
{"word": "world", "start": 0.5, "end": 1.0}
]
}
Supported file types: mp3, mp4, mpeg, mpga, m4a, wav, webm
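A sketch of a request for this endpoint, assuming the same Parakeet hostname and a publicly reachable audio URL:
$ curl -X POST https://xxxx--reflector-transcriber-parakeet-web.modal.run/v1/audio/transcriptions-from-url \
    -H "Authorization: bearer <REFLECTOR_APIKEY>" \
    -H "Content-Type: application/json" \
    -d '{"audio_file_url": "https://example.com/audio.mp3", "language": "en"}'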
Whisper Transcriber (reflector_transcriber.py)
POST /transcribe
request (multipart/form-data)
file - audio file
language - language code (e.g. en)
response
{
"text": "xxx",
"words": [
{"text": "xxx", "start": 0.0, "end": 1.0}
]
}
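A minimal sketch of a Whisper transcription request, assuming the transcriber hostname printed by modal deploy and a placeholder audio.wav file:
$ curl -X POST https://xxxx--reflector-transcriber-web.modal.run/transcribe \
    -H "Authorization: bearer <REFLECTOR_APIKEY>" \
    -F "file=@audio.wav" \
    -F "language=en"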