
---
sidebar_position: 5
title: Self-Hosted GPU Setup
---

This guide covers deploying Reflector's GPU processing on your own server instead of Modal.com. For the complete installation walkthrough, see the Deployment Guide.

## When to Use Self-Hosted GPU

Choose self-hosted GPU if you:

- Have GPU hardware available (NVIDIA required)
- Want full control over processing
- Prefer fixed infrastructure costs over pay-per-use
- Have privacy or data locality requirements
- Need to process audio without external API calls

Choose Modal.com instead if you:

- Don't have GPU hardware
- Want zero infrastructure management
- Prefer pay-per-use pricing
- Need instant scaling for variable workloads

See Modal.com Setup for cloud GPU deployment.

## What Gets Deployed

The self-hosted GPU service provides the same API endpoints as Modal:

- `POST /v1/audio/transcriptions` - Whisper transcription
- `POST /v1/audio/transcriptions-from-url` - Transcribe from a URL
- `POST /diarize` - Pyannote speaker diarization
- `POST /translate` - Audio translation

Your main Reflector server connects to this service exactly as it would to Modal; only the URL and API key change.
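Once the service is up (Step 5 below), a quick way to exercise it end to end is to post a short audio file to the transcription endpoint. This is a sketch: the multipart field name and bearer-style auth header are assumptions based on the OpenAI-compatible endpoint shape, so confirm the exact contract in the service's interactive docs at `/docs`.

```bash
# Hypothetical smoke test - confirm field names and auth header via /docs.
# Assumes the API key from .env and a short local sample.wav.
curl -X POST https://gpu.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $REFLECTOR_GPU_APIKEY" \
  -F "file=@sample.wav"
```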

## Prerequisites

### Hardware

- GPU: NVIDIA GPU with 8GB+ VRAM (tested on a Tesla T4 with 15GB)
- CPU: 4+ cores recommended
- RAM: 8GB minimum, 16GB recommended
- Disk: 40-50GB minimum

### Network

- Public IP address
- Domain name with a DNS A record pointing to the server

### Accounts

- Hugging Face account with a valid access token (`HF_TOKEN`); you must also accept the terms for the gated pyannote models used for diarization (see Troubleshooting below)

## Docker Deployment

### Step 1: Install NVIDIA Driver

```bash
sudo apt update
sudo apt install -y nvidia-driver-535
sudo reboot

# After reboot, verify installation
nvidia-smi
```

Expected output: GPU details with driver version and CUDA version.
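For a scriptable check, `nvidia-smi` also supports CSV queries (the exact fields available depend on the driver; `nvidia-smi --help-query-gpu` lists what yours supports):

```bash
# Print GPU name, driver version, and total VRAM as CSV
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```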

### Step 2: Install Docker

Follow the official Docker installation guide for your distribution.

After installation, add your user to the docker group:

```bash
sudo usermod -aG docker $USER

# Log out and back in for group changes
exit
# SSH back in
```
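After logging back in, you can confirm Docker works without `sudo` using Docker's standard hello-world image:

```bash
# Should print "Hello from Docker!" if the group change took effect
docker run --rm hello-world
```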

### Step 3: Install NVIDIA Container Toolkit

```bash
# Add NVIDIA repository and install toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
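Before building the Reflector image, it's worth confirming that containers can see the GPU. A common check is running `nvidia-smi` inside a CUDA base image (the tag below is just an example; any CUDA base image compatible with your driver works):

```bash
# Should print the same GPU table as nvidia-smi on the host
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```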

### Step 4: Clone Repository and Configure

```bash
git clone https://github.com/monadical-sas/reflector.git
cd reflector/gpu/self_hosted

# Create environment file
cat > .env << EOF
REFLECTOR_GPU_APIKEY=$(openssl rand -hex 16)
HF_TOKEN=your_huggingface_token_here
EOF

# Note the generated API key - you'll need it for main server config
cat .env
```
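Since `.env` contains the API key and your Hugging Face token, tightening its permissions is a sensible (optional) precaution:

```bash
# Make the secrets file readable only by your user
chmod 600 .env
```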

### Step 5: Build and Start

The repository includes a `compose.yml` file. Build and start:

```bash
# Build image (takes ~5 minutes, downloads ~10GB)
sudo docker compose build

# Start service
sudo docker compose up -d

# Wait for startup and verify
sleep 30
sudo docker compose logs
```

Look for: `INFO: Application startup complete. Uvicorn running on http://0.0.0.0:8000`
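You can also probe the service directly on the host; the FastAPI docs page (the same `/docs` path used for the HTTPS check later) should respond before you put a reverse proxy in front:

```bash
# Expect an HTTP 200 from the local service
curl -I http://localhost:8000/docs
```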

### Step 6: Verify GPU Access

```bash
# Check GPU is accessible from container
sudo docker exec $(sudo docker ps -q) nvidia-smi
```

Should show GPU with ~3GB VRAM used (models loaded).


## Configure HTTPS with Caddy

Caddy automatically provisions and renews TLS certificates from Let's Encrypt.

### Install Caddy

```bash
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl

curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | \
  sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg

curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | \
  sudo tee /etc/apt/sources.list.d/caddy-stable.list

sudo apt update
sudo apt install -y caddy
```

### Configure Reverse Proxy

Edit the Caddyfile with your domain:

```bash
sudo nano /etc/caddy/Caddyfile
```

Add (replace gpu.example.com with your domain):

```
gpu.example.com {
    reverse_proxy localhost:8000
}
```
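Optionally, validate the file before reloading; `caddy validate` parses the config without applying it:

```bash
# Catch Caddyfile syntax errors before the reload
caddy validate --config /etc/caddy/Caddyfile
```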

Reload Caddy (auto-provisions the SSL certificate):

```bash
sudo systemctl reload caddy
```

### Verify HTTPS

```bash
curl -I https://gpu.example.com/docs
# Should return HTTP/2 200
```

## Configure Main Reflector Server

On your main Reflector server, update `server/.env`:

```bash
# GPU Processing - Self-hosted
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://gpu.example.com
TRANSCRIPT_MODAL_API_KEY=<your-generated-api-key>

DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://gpu.example.com
DIARIZATION_MODAL_API_KEY=<your-generated-api-key>
```

Note: The backend type is `modal` because the self-hosted GPU service implements the same API contract as Modal.com. This lets you switch between cloud and self-hosted GPU processing by changing only the URL and API key.

Restart services to apply:

```bash
docker compose -f docker-compose.prod.yml restart server worker
```

## Service Management

All commands in this section assume you're in `~/reflector/gpu/self_hosted/`.

```bash
# View logs
sudo docker compose logs -f

# Restart service
sudo docker compose restart

# Stop service
sudo docker compose down

# Check status
sudo docker compose ps
```

### Monitor GPU

```bash
# Check GPU usage
nvidia-smi

# Watch in real-time
watch -n 1 nvidia-smi
```

Typical GPU memory usage:

- Idle (models loaded): ~3GB VRAM
- During transcription: ~4-5GB VRAM
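To log usage over time (for example, to confirm those numbers on your own hardware), `nvidia-smi` can emit CSV samples on a loop; `-l 5` repeats every 5 seconds:

```bash
# Append a sample every 5 seconds; stop with Ctrl+C
nvidia-smi --query-gpu=timestamp,memory.used,utilization.gpu \
  --format=csv -l 5 >> gpu-usage.csv
```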

## Troubleshooting

### `nvidia-smi` fails after driver install

```bash
# Manually load kernel modules
sudo modprobe nvidia
nvidia-smi
```

### Service fails with "Could not download pyannote pipeline"

1. Verify the `HF_TOKEN` in `.env` is valid: `grep HF_TOKEN .env`
2. Check model access at https://huggingface.co/pyannote/speaker-diarization-3.1
3. Update `.env` with the correct token
4. Restart the service: `sudo docker compose restart`

### Cannot connect to HTTPS endpoint

1. Verify DNS resolves: `dig +short gpu.example.com`
2. Check the firewall: `sudo ufw status` (ports 80 and 443 must be open)
3. Check Caddy: `sudo systemctl status caddy`
4. View Caddy logs: `sudo journalctl -u caddy -n 50`

### SSL certificate not provisioning

Requirements for Let's Encrypt:

- Ports 80 and 443 publicly accessible
- DNS resolves to the server's public IP
- Valid domain (not localhost or a private IP)

### Docker container won't start

```bash
# Check logs
sudo docker compose logs

# Common issues:
# - Port 8000 already in use
# - GPU not accessible (nvidia-ctk not configured)
# - Missing .env file
```
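To rule out the first two causes quickly, check whether something else holds port 8000 and whether Docker reports the NVIDIA runtime (both commands are standard on most distributions):

```bash
# Is port 8000 already taken on the host?
sudo ss -ltnp | grep ':8000'

# Does Docker list the nvidia runtime?
docker info | grep -i runtimes
```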

## Updating

```bash
cd ~/reflector/gpu/self_hosted
git pull
sudo docker compose build
sudo docker compose up -d
```