Self-Hosted GPU Setup

This guide covers deploying Reflector's GPU processing on your own server instead of Modal.com. For the complete deployment guide, see Deployment Guide.

When to Use Self-Hosted GPU

Choose self-hosted GPU if you:

  • Have GPU hardware available (NVIDIA required)
  • Want full control over processing
  • Prefer fixed infrastructure costs over pay-per-use
  • Have privacy or data locality requirements
  • Need to process audio without external API calls

Choose Modal.com instead if you:

  • Don't have GPU hardware
  • Want zero infrastructure management
  • Prefer pay-per-use pricing
  • Need instant scaling for variable workloads

See Modal.com Setup for cloud GPU deployment.

What Gets Deployed

The self-hosted GPU service provides the same API endpoints as Modal:

  • POST /v1/audio/transcriptions - Whisper transcription
  • POST /diarize - Pyannote speaker diarization

Your main Reflector server connects to this service exactly like it connects to Modal; only the URL and API key change.

Prerequisites

Hardware

  • GPU: NVIDIA GPU with 8GB+ VRAM (tested on Tesla T4 with 15GB)
  • CPU: 4+ cores recommended
  • RAM: 8GB minimum, 16GB recommended
  • Disk:
    • Docker method: 40-50GB minimum
    • Systemd method: 25-30GB minimum

Network

  • Public IP address
  • Domain name with a DNS A record pointing to the server
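
You can confirm the DNS record before starting:

dig +short gpu.example.com
# Should print your server's public IP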

Accounts

  • Hugging Face account with an access token (HF_TOKEN) and accepted terms for the gated pyannote/speaker-diarization-3.1 model

Choose Deployment Method

The GPU service can run as a Docker container or directly on the host under systemd. Docker isolates dependencies and is the simpler path for GPU access but needs more disk; systemd runs leaner but requires managing Python packages on the host. Both end with the same service listening on port 8000.

Docker Deployment

Step 1: Install NVIDIA Driver

sudo apt update
sudo apt install -y nvidia-driver-535

# Load kernel modules
sudo modprobe nvidia

# Verify installation
nvidia-smi

Expected output: GPU details with driver version and CUDA version.
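
If you prefer a compact check, nvidia-smi can print just the relevant fields:

nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv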

Step 2: Install Docker

curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER

# Log out and back in for group changes
exit
# SSH back in
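
After logging back in, confirm Docker runs for your user:

docker run --rm hello-world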

Step 3: Install NVIDIA Container Toolkit

# Add NVIDIA repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
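
To confirm Docker can reach the GPU before building anything, run nvidia-smi in a throwaway CUDA container (any available nvidia/cuda base image tag will do; the one below is an example):

sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi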

Step 4: Clone Repository and Configure

git clone https://github.com/monadical-sas/reflector.git
cd reflector/gpu/self_hosted

# Create environment file
cat > .env << EOF
REFLECTOR_GPU_APIKEY=$(openssl rand -hex 16)
HF_TOKEN=your_huggingface_token_here
EOF

# Note the generated API key - you'll need it for main server config
cat .env
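
To sanity-check the Hugging Face token before building, you can query the Hub's whoami endpoint with it:

source .env
curl -s -H "Authorization: Bearer $HF_TOKEN" https://huggingface.co/api/whoami-v2
# A JSON response with your username means the token is valid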

Step 5: Create Docker Compose File

cat > compose.yml << 'EOF'
services:
  reflector_gpu:
    build:
      context: .
    ports:
      - "8000:8000"
    env_file:
      - .env
    volumes:
      - ./cache:/root/.cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
EOF
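
The ./cache volume maps the container's /root/.cache, where model weights land on first startup, so rebuilds and restarts don't re-download several gigabytes of models.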

Step 6: Build and Start

# Build image (takes ~5 minutes, downloads ~10GB)
sudo docker compose build

# Start service
sudo docker compose up -d

# Wait for startup and verify
sleep 30
sudo docker compose logs

Look for: INFO: Application startup complete. Uvicorn running on http://0.0.0.0:8000
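
You can also confirm the API answers locally; /docs is the same endpoint used for the HTTPS check later:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/docs
# Should print 200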

Step 7: Verify GPU Access

# Check GPU is accessible from the container
# (docker ps -q assumes this is the only running container)
sudo docker exec $(sudo docker ps -q) nvidia-smi

Should show GPU with ~3GB VRAM used (models loaded).


Systemd Deployment

Step 1: Install NVIDIA Driver

sudo apt update
sudo apt install -y nvidia-driver-535

# Load kernel modules
sudo modprobe nvidia

# Verify installation
nvidia-smi

Step 2: Install Dependencies

# Install ffmpeg
sudo apt install -y ffmpeg

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.local/bin/env

# Clone repository
git clone https://github.com/monadical-sas/reflector.git
cd reflector/gpu/self_hosted
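
Verify both tools are on PATH before continuing:

ffmpeg -version | head -1
uv --version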

Step 3: Configure Environment

# Create environment file
cat > .env << EOF
REFLECTOR_GPU_APIKEY=$(openssl rand -hex 16)
HF_TOKEN=your_huggingface_token_here
EOF

# Note the generated API key
cat .env

Step 4: Install Python Packages

# Install dependencies (~3GB download)
uv sync
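
As a quick check that the CUDA-enabled stack installed correctly (this assumes PyTorch is among the project's dependencies, which Whisper and pyannote require):

uv run python -c "import torch; print(torch.cuda.is_available())"
# Should print True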

Step 5: Create Systemd Service

# Generate library paths for NVIDIA packages
# (adjust python3.12 if uv installed a different Python version)
export NVIDIA_LIBS=$(find ~/reflector/gpu/self_hosted/.venv/lib/python3.12/site-packages/nvidia -name lib -type d | tr '\n' ':')

# Load environment variables
source ~/reflector/gpu/self_hosted/.env

# Create service file
sudo tee /etc/systemd/system/reflector-gpu.service << EOFSVC
[Unit]
Description=Reflector GPU Service (Transcription & Diarization)
After=network.target

[Service]
Type=simple
User=$USER
WorkingDirectory=$HOME/reflector/gpu/self_hosted
Environment="PATH=$HOME/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="HF_TOKEN=${HF_TOKEN}"
Environment="REFLECTOR_GPU_APIKEY=${REFLECTOR_GPU_APIKEY}"
Environment="LD_LIBRARY_PATH=${NVIDIA_LIBS}"
ExecStart=$HOME/reflector/gpu/self_hosted/.venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOFSVC

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable reflector-gpu
sudo systemctl start reflector-gpu
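
Because the heredoc delimiter is unquoted, $USER, $HOME, and the sourced variables are expanded when the file is written, not at service start. You can inspect the rendered unit to confirm the values landed:

systemctl cat reflector-gpu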

Step 6: Verify Service

# Check status
sudo systemctl status reflector-gpu

# View logs
sudo journalctl -u reflector-gpu -f

Look for: INFO: Application startup complete.


Configure HTTPS with Caddy

Both deployment methods need HTTPS in production. Caddy obtains and renews TLS certificates automatically via Let's Encrypt.

Install Caddy

sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl

curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | \
  sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg

curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | \
  sudo tee /etc/apt/sources.list.d/caddy-stable.list

sudo apt update
sudo apt install -y caddy

Configure Reverse Proxy

sudo tee /etc/caddy/Caddyfile << 'EOF'
gpu.example.com {
    reverse_proxy localhost:8000
}
EOF

# Reload Caddy (auto-provisions SSL certificate)
sudo systemctl reload caddy

Replace gpu.example.com with your domain.

Verify HTTPS

curl -I https://gpu.example.com/docs
# Should return HTTP/2 200
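
With HTTPS working, you can exercise the transcription endpoint end to end. The bearer-style auth header and multipart field below assume the OpenAI-compatible contract implied by the /v1/audio/transcriptions path; adjust if your deployment differs:

curl -X POST https://gpu.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer <your-generated-api-key>" \
  -F "file=@sample.wav"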

Configure Main Reflector Server

On your main Reflector server, update server/.env:

# GPU Processing - Self-hosted
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://gpu.example.com
TRANSCRIPT_MODAL_API_KEY=<your-generated-api-key>

DIARIZATION_BACKEND=modal
DIARIZATION_URL=https://gpu.example.com
DIARIZATION_MODAL_API_KEY=<your-generated-api-key>

Note: The backend type is modal because the self-hosted GPU service implements the same API contract as Modal.com. This allows you to switch between cloud and self-hosted GPU processing by only changing the URL and API key.
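
Switching back to Modal later means changing only those two values; the URL below is a placeholder, not a real endpoint:

# GPU Processing - Modal.com (placeholder URL)
TRANSCRIPT_BACKEND=modal
TRANSCRIPT_URL=https://<your-modal-endpoint>
TRANSCRIPT_MODAL_API_KEY=<your-modal-api-key>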

Restart services to apply:

docker compose -f docker-compose.prod.yml restart server worker

Service Management

All commands in this section assume you're in ~/reflector/gpu/self_hosted/.

Docker

# View logs
sudo docker compose logs -f

# Restart service
sudo docker compose restart

# Stop service
sudo docker compose down

# Check status
sudo docker compose ps

Systemd

# View logs
sudo journalctl -u reflector-gpu -f

# Restart service
sudo systemctl restart reflector-gpu

# Stop service
sudo systemctl stop reflector-gpu

# Check status
sudo systemctl status reflector-gpu

Monitor GPU

# Check GPU usage
nvidia-smi

# Watch in real-time
watch -n 1 nvidia-smi

Typical GPU memory usage:

  • Idle (models loaded): ~3GB VRAM
  • During transcription: ~4-5GB VRAM
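
For scripting or alerting, nvidia-smi can emit just the memory figure:

nvidia-smi --query-gpu=memory.used --format=csv,noheader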

Troubleshooting

nvidia-smi fails after driver install

# Manually load kernel modules
sudo modprobe nvidia
nvidia-smi

Service fails with "Could not download pyannote pipeline"

  1. Verify HF_TOKEN is valid: echo $HF_TOKEN
  2. Check model access at https://huggingface.co/pyannote/speaker-diarization-3.1 (or query the Hub API as shown below)
  3. Correct the token: update .env (Docker) or regenerate the systemd unit, which bakes the token in at creation
  4. Restart the service
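
The model-access check in step 2 can also be run from the shell; this queries the Hub API for the gated model using your token:

curl -s -H "Authorization: Bearer $HF_TOKEN" \
  https://huggingface.co/api/models/pyannote/speaker-diarization-3.1
# An error body mentioning authorization means the token lacks access to the gated model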

cuDNN library loading errors (Systemd only)

Symptom: Unable to load libcudnn_cnn.so

Regenerate the systemd service file, re-running the NVIDIA_LIBS export from Step 5 first; LD_LIBRARY_PATH must include every NVIDIA package's lib directory.
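
Before regenerating, confirm the cuDNN libraries actually exist inside the venv's bundled NVIDIA packages:

find ~/reflector/gpu/self_hosted/.venv -name 'libcudnn*' -type f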

Cannot connect to HTTPS endpoint

  1. Verify DNS resolves: dig +short gpu.example.com
  2. Check firewall: sudo ufw status (ports 80, 443 must be open)
  3. Check Caddy: sudo systemctl status caddy
  4. View Caddy logs: sudo journalctl -u caddy -n 50
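
If ufw is the blocker, opening both ports looks like:

sudo ufw allow 80,443/tcp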

SSL certificate not provisioning

Requirements for Let's Encrypt:

  • Ports 80 and 443 publicly accessible
  • DNS resolves to server's public IP
  • Valid domain (not localhost or private IP)

Docker container won't start

# Check logs
sudo docker compose logs

# Common issues:
# - Port 8000 already in use
# - GPU not accessible (nvidia-ctk not configured)
# - Missing .env file
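
Quick diagnostics for those three causes:

sudo ss -tlnp | grep ':8000'          # something else on port 8000?
sudo docker info | grep -i runtimes   # nvidia runtime registered?
ls -la .env                           # .env file present?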

Updating

Docker

cd ~/reflector/gpu/self_hosted
git pull
sudo docker compose build
sudo docker compose up -d

Systemd

cd ~/reflector/gpu/self_hosted
git pull
uv sync
sudo systemctl restart reflector-gpu