mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2025-12-20 20:29:06 +00:00
* Self-hosted gpu api * Refactor self-hosted api * Rename model api tests * Use lifespan instead of startup event * Fix self hosted imports * Add newlines * Add response models * Move gpu dir to the root * Add project description * Refactor lifespan * Update env var names for model api tests * Preload diarizarion service * Refactor uploaded file paths
172 lines
4.1 KiB
Markdown
172 lines
4.1 KiB
Markdown
# Reflector GPU implementation - Transcription and LLM
|
|
|
|
This repository hold an API for the GPU implementation of the Reflector API service,
|
|
and use [Modal.com](https://modal.com)
|
|
|
|
- `reflector_diarizer.py` - Diarization API
|
|
- `reflector_transcriber.py` - Transcription API (Whisper)
|
|
- `reflector_transcriber_parakeet.py` - Transcription API (NVIDIA Parakeet)
|
|
- `reflector_translator.py` - Translation API
|
|
|
|
## Modal.com deployment
|
|
|
|
Create a modal secret, and name it `reflector-gpu`.
|
|
It should contain an `REFLECTOR_APIKEY` environment variable with a value.
|
|
|
|
The deployment is done using [Modal.com](https://modal.com) service.
|
|
|
|
```
|
|
$ modal deploy reflector_transcriber.py
|
|
...
|
|
└── 🔨 Created web => https://xxxx--reflector-transcriber-web.modal.run
|
|
|
|
$ modal deploy reflector_transcriber_parakeet.py
|
|
...
|
|
└── 🔨 Created web => https://xxxx--reflector-transcriber-parakeet-web.modal.run
|
|
|
|
$ modal deploy reflector_llm.py
|
|
...
|
|
└── 🔨 Created web => https://xxxx--reflector-llm-web.modal.run
|
|
```
|
|
|
|
Then in your reflector api configuration `.env`, you can set these keys:
|
|
|
|
```
|
|
TRANSCRIPT_BACKEND=modal
|
|
TRANSCRIPT_URL=https://xxxx--reflector-transcriber-web.modal.run
|
|
TRANSCRIPT_MODAL_API_KEY=REFLECTOR_APIKEY
|
|
|
|
DIARIZATION_BACKEND=modal
|
|
DIARIZATION_URL=https://xxxx--reflector-diarizer-web.modal.run
|
|
DIARIZATION_MODAL_API_KEY=REFLECTOR_APIKEY
|
|
|
|
TRANSLATION_BACKEND=modal
|
|
TRANSLATION_URL=https://xxxx--reflector-translator-web.modal.run
|
|
TRANSLATION_MODAL_API_KEY=REFLECTOR_APIKEY
|
|
```
|
|
|
|
## API
|
|
|
|
Authentication must be passed with the `Authorization` header, using the `bearer` scheme.
|
|
|
|
```
|
|
Authorization: bearer <REFLECTOR_APIKEY>
|
|
```
|
|
|
|
### LLM
|
|
|
|
`POST /llm`
|
|
|
|
**request**
|
|
```
|
|
{
|
|
"prompt": "xxx"
|
|
}
|
|
```
|
|
|
|
**response**
|
|
```
|
|
{
|
|
"text": "xxx completed"
|
|
}
|
|
```
|
|
|
|
### Transcription
|
|
|
|
#### Parakeet Transcriber (`reflector_transcriber_parakeet.py`)
|
|
|
|
NVIDIA Parakeet is a state-of-the-art ASR model optimized for real-time transcription with superior word-level timestamps.
|
|
|
|
**GPU Configuration:**
|
|
- **A10G GPU** - Used for `/v1/audio/transcriptions` endpoint (small files, live transcription)
|
|
- Higher concurrency (max_inputs=10)
|
|
- Optimized for multiple small audio files
|
|
- Supports batch processing for efficiency
|
|
|
|
- **L40S GPU** - Used for `/v1/audio/transcriptions-from-url` endpoint (large files)
|
|
- Lower concurrency but more powerful processing
|
|
- Optimized for single large audio files
|
|
- VAD-based chunking for long-form audio
|
|
|
|
##### `/v1/audio/transcriptions` - Small file transcription
|
|
|
|
**request** (multipart/form-data)
|
|
- `file` or `files[]` - audio file(s) to transcribe
|
|
- `model` - model name (default: `nvidia/parakeet-tdt-0.6b-v2`)
|
|
- `language` - language code (default: `en`)
|
|
- `batch` - whether to use batch processing for multiple files (default: `true`)
|
|
|
|
**response**
|
|
```json
|
|
{
|
|
"text": "transcribed text",
|
|
"words": [
|
|
{"word": "hello", "start": 0.0, "end": 0.5},
|
|
{"word": "world", "start": 0.5, "end": 1.0}
|
|
],
|
|
"filename": "audio.mp3"
|
|
}
|
|
```
|
|
|
|
For multiple files with batch=true:
|
|
```json
|
|
{
|
|
"results": [
|
|
{
|
|
"filename": "audio1.mp3",
|
|
"text": "transcribed text",
|
|
"words": [...]
|
|
},
|
|
{
|
|
"filename": "audio2.mp3",
|
|
"text": "transcribed text",
|
|
"words": [...]
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
##### `/v1/audio/transcriptions-from-url` - Large file transcription
|
|
|
|
**request** (application/json)
|
|
```json
|
|
{
|
|
"audio_file_url": "https://example.com/audio.mp3",
|
|
"model": "nvidia/parakeet-tdt-0.6b-v2",
|
|
"language": "en",
|
|
"timestamp_offset": 0.0
|
|
}
|
|
```
|
|
|
|
**response**
|
|
```json
|
|
{
|
|
"text": "transcribed text from large file",
|
|
"words": [
|
|
{"word": "hello", "start": 0.0, "end": 0.5},
|
|
{"word": "world", "start": 0.5, "end": 1.0}
|
|
]
|
|
}
|
|
```
|
|
|
|
**Supported file types:** mp3, mp4, mpeg, mpga, m4a, wav, webm
|
|
|
|
#### Whisper Transcriber (`reflector_transcriber.py`)
|
|
|
|
`POST /transcribe`
|
|
|
|
**request** (multipart/form-data)
|
|
|
|
- `file` - audio file
|
|
- `language` - language code (e.g. `en`)
|
|
|
|
**response**
|
|
```
|
|
{
|
|
"text": "xxx",
|
|
"words": [
|
|
{"text": "xxx", "start": 0.0, "end": 1.0}
|
|
]
|
|
}
|
|
```
|