mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2026-02-05 18:36:45 +00:00
docs: docs website + installation (#778)
* feat: WIP doc (vibe started and iterated) * install from scratch docs * caddyfile.example * gitignore * authentik script * authentik script * authentik script * llm doc * authentik ongoing * more daily setup logs * doc website * gpu self hosted setup guide (no-mistakes) * doc review round * doc review round * doc review round * update doc site sidebars * feat(docs): add mermaid diagram support * docs polishing * live pipeline doc * move pipeline dev docs to dev docs location * doc pr review iteration * dockerfile healthcheck * docs/pr-comments * remove jwt comment * llm suggestion * pr comments * pr comments * document auto migrations * cleanup docs --------- Co-authored-by: Mathieu Virbel <mat@meltingrocks.com> Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
This commit is contained in:
307
docs/docs/installation/self-hosted-gpu-setup.md
Normal file
307
docs/docs/installation/self-hosted-gpu-setup.md
Normal file
@@ -0,0 +1,307 @@
|
||||
---
|
||||
sidebar_position: 5
|
||||
title: Self-Hosted GPU Setup
|
||||
---
|
||||
|
||||
# Self-Hosted GPU Setup
|
||||
|
||||
This guide covers deploying Reflector's GPU processing on your own server instead of Modal.com. For the complete deployment guide, see [Deployment Guide](./overview).
|
||||
|
||||
## When to Use Self-Hosted GPU
|
||||
|
||||
**Choose self-hosted GPU if you:**
|
||||
- Have GPU hardware available (NVIDIA required)
|
||||
- Want full control over processing
|
||||
- Prefer fixed infrastructure costs over pay-per-use
|
||||
- Have privacy or data locality requirements
|
||||
- Need to process audio without external API calls
|
||||
|
||||
**Choose Modal.com instead if you:**
|
||||
- Don't have GPU hardware
|
||||
- Want zero infrastructure management
|
||||
- Prefer pay-per-use pricing
|
||||
- Need instant scaling for variable workloads
|
||||
|
||||
See [Modal.com Setup](./modal-setup) for cloud GPU deployment.
|
||||
|
||||
## What Gets Deployed
|
||||
|
||||
The self-hosted GPU service provides the same API endpoints as Modal:
|
||||
- `POST /v1/audio/transcriptions` - Whisper transcription
|
||||
- `POST /v1/audio/transcriptions-from-url` - Transcribe from URL
|
||||
- `POST /diarize` - Pyannote speaker diarization
|
||||
- `POST /translate` - Audio translation
|
||||
|
||||
Your main Reflector server connects to this service exactly like it connects to Modal - only the URL changes.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Hardware
|
||||
- **GPU**: NVIDIA GPU with 8GB+ VRAM (tested on Tesla T4 with 15GB)
|
||||
- **CPU**: 4+ cores recommended
|
||||
- **RAM**: 8GB minimum, 16GB recommended
|
||||
- **Disk**: 40-50GB minimum
|
||||
|
||||
### Software
|
||||
- Public IP address
|
||||
- Domain name with DNS A record pointing to server
|
||||
|
||||
### Accounts
|
||||
- **HuggingFace account** with accepted Pyannote licenses:
|
||||
- https://huggingface.co/pyannote/speaker-diarization-3.1
|
||||
- https://huggingface.co/pyannote/segmentation-3.0
|
||||
- **HuggingFace access token** from https://huggingface.co/settings/tokens
|
||||
|
||||
## Docker Deployment
|
||||
|
||||
### Step 1: Install NVIDIA Driver
|
||||
|
||||
```bash
|
||||
sudo apt update
|
||||
sudo apt install -y nvidia-driver-535
|
||||
sudo reboot
|
||||
|
||||
# After reboot, verify installation
|
||||
nvidia-smi
|
||||
```
|
||||
|
||||
Expected output: GPU details with driver version and CUDA version.
|
||||
|
||||
### Step 2: Install Docker
|
||||
|
||||
Follow the [official Docker installation guide](https://docs.docker.com/engine/install/ubuntu/) for your distribution.
|
||||
|
||||
After installation, add your user to the docker group:
|
||||
|
||||
```bash
|
||||
sudo usermod -aG docker $USER
|
||||
|
||||
# Log out and back in for group changes
|
||||
exit
|
||||
# SSH back in
|
||||
```
|
||||
|
||||
### Step 3: Install NVIDIA Container Toolkit
|
||||
|
||||
```bash
|
||||
# Add NVIDIA repository and install toolkit
|
||||
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
|
||||
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
|
||||
|
||||
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
|
||||
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
|
||||
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
|
||||
|
||||
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
|
||||
sudo nvidia-ctk runtime configure --runtime=docker
|
||||
sudo systemctl restart docker
|
||||
```
|
||||
|
||||
### Step 4: Clone Repository and Configure
|
||||
|
||||
```bash
|
||||
git clone https://github.com/monadical-sas/reflector.git
|
||||
cd reflector/gpu/self_hosted
|
||||
|
||||
# Create environment file
|
||||
cat > .env << EOF
|
||||
REFLECTOR_GPU_APIKEY=$(openssl rand -hex 16)
|
||||
HF_TOKEN=your_huggingface_token_here
|
||||
EOF
|
||||
|
||||
# Note the generated API key - you'll need it for main server config
|
||||
cat .env
|
||||
```
|
||||
|
||||
### Step 5: Build and Start
|
||||
|
||||
The repository includes a `compose.yml` file. Build and start:
|
||||
|
||||
|
||||
```bash
|
||||
# Build image (takes ~5 minutes, downloads ~10GB)
|
||||
sudo docker compose build
|
||||
|
||||
# Start service
|
||||
sudo docker compose up -d
|
||||
|
||||
# Wait for startup and verify
|
||||
sleep 30
|
||||
sudo docker compose logs
|
||||
```
|
||||
|
||||
Look for: `INFO: Application startup complete. Uvicorn running on http://0.0.0.0:8000`
|
||||
|
||||
### Step 7: Verify GPU Access
|
||||
|
||||
```bash
|
||||
# Check GPU is accessible from container
|
||||
sudo docker exec $(sudo docker ps -q) nvidia-smi
|
||||
```
|
||||
|
||||
Should show GPU with ~3GB VRAM used (models loaded).
|
||||
|
||||
---
|
||||
|
||||
## Configure HTTPS with Caddy
|
||||
|
||||
Caddy handles SSL automatically.
|
||||
|
||||
### Install Caddy
|
||||
|
||||
```bash
|
||||
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
|
||||
|
||||
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | \
|
||||
sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
|
||||
|
||||
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | \
|
||||
sudo tee /etc/apt/sources.list.d/caddy-stable.list
|
||||
|
||||
sudo apt update
|
||||
sudo apt install -y caddy
|
||||
```
|
||||
|
||||
### Configure Reverse Proxy
|
||||
|
||||
Edit the Caddyfile with your domain:
|
||||
|
||||
```bash
|
||||
sudo nano /etc/caddy/Caddyfile
|
||||
```
|
||||
|
||||
Add (replace `gpu.example.com` with your domain):
|
||||
|
||||
```
|
||||
gpu.example.com {
|
||||
reverse_proxy localhost:8000
|
||||
}
|
||||
```
|
||||
|
||||
Reload Caddy (auto-provisions SSL certificate):
|
||||
|
||||
```bash
|
||||
sudo systemctl reload caddy
|
||||
```
|
||||
|
||||
### Verify HTTPS
|
||||
|
||||
```bash
|
||||
curl -I https://gpu.example.com/docs
|
||||
# Should return HTTP/2 200
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configure Main Reflector Server
|
||||
|
||||
On your main Reflector server, update `server/.env`:
|
||||
|
||||
```env
|
||||
# GPU Processing - Self-hosted
|
||||
TRANSCRIPT_BACKEND=modal
|
||||
TRANSCRIPT_URL=https://gpu.example.com
|
||||
TRANSCRIPT_MODAL_API_KEY=<your-generated-api-key>
|
||||
|
||||
DIARIZATION_BACKEND=modal
|
||||
DIARIZATION_URL=https://gpu.example.com
|
||||
DIARIZATION_MODAL_API_KEY=<your-generated-api-key>
|
||||
```
|
||||
|
||||
**Note:** The backend type is `modal` because the self-hosted GPU service implements the same API contract as Modal.com. This allows you to switch between cloud and self-hosted GPU processing by only changing the URL and API key.
|
||||
|
||||
Restart services to apply:
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.prod.yml restart server worker
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Service Management
|
||||
|
||||
All commands in this section assume you're in `~/reflector/gpu/self_hosted/`.
|
||||
|
||||
```bash
|
||||
# View logs
|
||||
sudo docker compose logs -f
|
||||
|
||||
# Restart service
|
||||
sudo docker compose restart
|
||||
|
||||
# Stop service
|
||||
sudo docker compose down
|
||||
|
||||
# Check status
|
||||
sudo docker compose ps
|
||||
```
|
||||
|
||||
### Monitor GPU
|
||||
|
||||
```bash
|
||||
# Check GPU usage
|
||||
nvidia-smi
|
||||
|
||||
# Watch in real-time
|
||||
watch -n 1 nvidia-smi
|
||||
```
|
||||
|
||||
**Typical GPU memory usage:**
|
||||
- Idle (models loaded): ~3GB VRAM
|
||||
- During transcription: ~4-5GB VRAM
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### nvidia-smi fails after driver install
|
||||
|
||||
```bash
|
||||
# Manually load kernel modules
|
||||
sudo modprobe nvidia
|
||||
nvidia-smi
|
||||
```
|
||||
|
||||
### Service fails with "Could not download pyannote pipeline"
|
||||
|
||||
1. Verify HF_TOKEN is valid: `echo $HF_TOKEN`
|
||||
2. Check model access at https://huggingface.co/pyannote/speaker-diarization-3.1
|
||||
3. Update .env with correct token
|
||||
4. Restart service: `sudo docker compose restart`
|
||||
|
||||
### Cannot connect to HTTPS endpoint
|
||||
|
||||
1. Verify DNS resolves: `dig +short gpu.example.com`
|
||||
2. Check firewall: `sudo ufw status` (ports 80, 443 must be open)
|
||||
3. Check Caddy: `sudo systemctl status caddy`
|
||||
4. View Caddy logs: `sudo journalctl -u caddy -n 50`
|
||||
|
||||
### SSL certificate not provisioning
|
||||
|
||||
Requirements for Let's Encrypt:
|
||||
- Ports 80 and 443 publicly accessible
|
||||
- DNS resolves to server's public IP
|
||||
- Valid domain (not localhost or private IP)
|
||||
|
||||
### Docker container won't start
|
||||
|
||||
```bash
|
||||
# Check logs
|
||||
sudo docker compose logs
|
||||
|
||||
# Common issues:
|
||||
# - Port 8000 already in use
|
||||
# - GPU not accessible (nvidia-ctk not configured)
|
||||
# - Missing .env file
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Updating
|
||||
|
||||
```bash
|
||||
cd ~/reflector/gpu/self_hosted
|
||||
git pull
|
||||
sudo docker compose build
|
||||
sudo docker compose up -d
|
||||
```
|
||||
Reference in New Issue
Block a user