selfhostyourtech/reflector

mirror of https://github.com/Monadical-SAS/reflector.git synced 2025-12-20 20:29:06 +00:00

Igor Loskutov 2b3f28993f gpu self hosted setup guide (no-mistakes)

2025-12-09 11:25:09 -05:00

Local Development GPU Setup

Run transcription and diarization locally for development/testing.

For production deployment, see the Self-Hosted GPU Setup Guide.

Prerequisites

Python 3.12+ and uv package manager
FFmpeg installed and on PATH
HuggingFace account with access to pyannote models
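
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes `python3` on PATH is the interpreter you intend to use (uv can also manage its own Python, in which case that line is only informational).

```shell
# Return 0 if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Return 0 if the given interpreter (default: python3) is version 3.12 or newer.
python_is_312_plus() {
  py="${1:-python3}"
  "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)' 2>/dev/null
}

# Print one line per prerequisite; return non-zero if anything is missing.
check_prereqs() {
  rc=0
  for tool in uv ffmpeg; do
    if have_cmd "$tool"; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      rc=1
    fi
  done
  if python_is_312_plus; then
    echo "ok: python 3.12+"
  else
    echo "missing: python 3.12+ (checked as python3)"
    rc=1
  fi
  return "$rc"
}
```

Run `check_prereqs` after pasting the functions into a shell. The HuggingFace account and license acceptance cannot be checked this way and still need to be confirmed by hand.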

Accept Pyannote Licenses (Required)

Before the first run, log in to HuggingFace and accept the licenses for the gated pyannote models that the service uses.

Quick Start

1. Install dependencies

cd gpu/self_hosted
uv sync

2. Start the GPU service

cd gpu/self_hosted
HF_TOKEN=<your-huggingface-token> \
REFLECTOR_GPU_APIKEY=dev-key-12345 \
.venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000

Note: The .env file is NOT auto-loaded. Pass the variables inline as shown above, or export them first:

export HF_TOKEN=<your-token>
export REFLECTOR_GPU_APIKEY=dev-key-12345
.venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000
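
If you would rather keep the values in a file, one option is to load it into the shell yourself. This is a sketch using the standard `set -a` idiom, not something the repository provides: `load_env_file` is a hypothetical helper, and it assumes the file contains only plain `KEY=value` lines that are valid shell syntax (values with spaces must be quoted).

```shell
# Export every variable assigned in the given file (default: .env).
load_env_file() {
  env_file="${1:-.env}"
  # A bare filename would be searched on PATH by ".", so force a path.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac
  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  set -a          # auto-export every variable assigned from here on
  . "$env_file"   # run the assignments in the current shell
  set +a          # stop auto-exporting
}
```

For example: `load_env_file .env && .venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000`.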

3. Configure Reflector to use local GPU

Edit server/.env:

# Transcription - local GPU service
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=http://host.docker.internal:8000
TRANSCRIPT_MODAL_API_KEY=dev-key-12345

# Diarization - local GPU service
DIARIZATION_BACKEND=modal
DIARIZATION_URL=http://host.docker.internal:8000
DIARIZATION_MODAL_API_KEY=dev-key-12345

Note: Use host.docker.internal because the Reflector server runs in Docker, where localhost refers to the container itself rather than the host machine.
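
A typo in a key name in server/.env is easy to miss. The sketch below lists which of the six keys shown above are absent from a file. `missing_keys` is a hypothetical helper; it only checks that each key is present at the start of a line and does not validate the values.

```shell
# Print each expected key that has no KEY= line in the given env file.
missing_keys() {
  env_file="$1"
  for key in \
    TRANSCRIPT_BACKEND TRANSCRIPT_URL TRANSCRIPT_MODAL_API_KEY \
    DIARIZATION_BACKEND DIARIZATION_URL DIARIZATION_MODAL_API_KEY
  do
    grep -q "^${key}=" "$env_file" || echo "$key"
  done
}
```

Running `missing_keys server/.env` from the repository root prints nothing when all six keys are set. The two `*_MODAL_API_KEY` values also need to match the `REFLECTOR_GPU_APIKEY` the GPU service was started with, as in the example values above.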

4. Restart Reflector server

cd server
docker compose restart server worker
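
After the restart, it can help to confirm from the host that the GPU service is answering on port 8000. This is a sketch under assumptions: `wait_for_service` is a hypothetical helper, and because this guide does not document a health endpoint, it treats any HTTP response from the base URL (even a 404) as "up".

```shell
# Poll a URL once per second until it returns any HTTP response.
# Usage: wait_for_service [url] [max_attempts]
wait_for_service() {
  url="${1:-http://localhost:8000/}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Without -f, curl exits 0 on any HTTP response, including 404.
    # --noproxy '*' keeps a configured proxy from answering for localhost.
    if curl -s -o /dev/null --max-time 2 --noproxy '*' "$url"; then
      echo "service is answering at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

This only shows that the port is reachable from the host. It does not prove that the Reflector containers can reach host.docker.internal.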

Testing

Test transcription

curl -s -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer dev-key-12345" \
  -F "file=@/path/to/audio.wav" \
  -F "language=en"

Test diarization

curl -s -X POST "http://localhost:8000/diarize?audio_file_url=<audio-url>" \
  -H "Authorization: Bearer dev-key-12345"
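
When one of these requests fails, `curl -s` prints little that helps. The sketch below prints the HTTP status code for the transcription call with a short hint. Both function names are hypothetical, and the hints rely on general HTTP conventions, not on documented behavior of this service.

```shell
# Map an HTTP status code to a short hint (generic HTTP semantics).
status_hint() {
  case "$1" in
    2??) echo "ok" ;;
    401|403) echo "auth rejected: check the Bearer key against REFLECTOR_GPU_APIKEY" ;;
    4??) echo "client error: check the path and request fields" ;;
    5??) echo "server error: check the uvicorn log" ;;
    000) echo "no response: is the service running on this port?" ;;
    *) echo "unexpected status" ;;
  esac
}

# POST an audio file to the transcription endpoint; print the status and hint.
# Usage: transcribe_status /path/to/audio.wav dev-key-12345
transcribe_status() {
  code="$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    "http://localhost:8000/v1/audio/transcriptions" \
    -H "Authorization: Bearer $2" \
    -F "file=@$1" \
    -F "language=en")"
  echo "$code $(status_hint "$code")"
}
```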

Platform Notes

macOS (ARM)

The Docker build fails because the CUDA packages are x86_64 only. Run the service with local Python instead:

uv sync
HF_TOKEN=xxx REFLECTOR_GPU_APIKEY=xxx .venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000

Linux with NVIDIA GPU

Docker works with CUDA acceleration:

docker compose up -d

CPU-only

Works on any platform, but more slowly: PyTorch detects that no GPU is available and falls back to CPU.
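
To see which case applies on your machine, you can ask PyTorch directly from the project's virtual environment. This is a sketch: `report_torch_device` is a hypothetical helper, and it only reports whether CUDA is visible to PyTorch. The service's own device selection logic is not shown in this guide and may differ.

```shell
# Print "cuda" or "cpu" according to PyTorch, or "unavailable" if the
# interpreter or the torch package cannot be loaded.
# Usage: report_torch_device [path-to-python]
report_torch_device() {
  py="${1:-.venv/bin/python}"
  "$py" -c 'import torch; print("cuda" if torch.cuda.is_available() else "cpu")' 2>/dev/null \
    || echo "unavailable"
}
```

Run it from gpu/self_hosted after `uv sync` so that the default `.venv/bin/python` path resolves.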

Switching Back to Modal.com