doc review round

This commit is contained in:
Igor Loskutov
2025-12-09 12:11:22 -05:00
parent 2b3f28993f
commit d890061056
3 changed files with 33 additions and 92 deletions


@@ -26,13 +26,15 @@ flowchart LR
Before starting, you need:

- **Production server** - 4+ cores, 8GB+ RAM, public IP
- **Two domain names** - e.g., `app.example.com` (frontend) and `api.example.com` (backend)
- **GPU processing** - Choose one:
  - Modal.com account, OR
  - GPU server with NVIDIA GPU (8GB+ VRAM)
- **HuggingFace account** - Free at https://huggingface.co
- **LLM API** - For summaries and topic detection. Choose one:
  - OpenAI API key at https://platform.openai.com/account/api-keys, OR
  - Any OpenAI-compatible endpoint (vLLM, LiteLLM, Ollama, etc.)

### Optional (for live meeting rooms)
@@ -41,52 +43,40 @@ Before starting, you need:
---

## Configure DNS

**Location: Your domain registrar / DNS provider**

Create A records pointing to your server:

```
Type: A    Name: app    Value: <your-server-ip>
Type: A    Name: api    Value: <your-server-ip>
```
Verify propagation (wait a few minutes):
```bash
dig app.example.com +short
dig api.example.com +short
# Both should return your server IP
```
---

## Deploy GPU Processing

Reflector requires GPU processing for transcription and speaker diarization. Choose one option:
| | **Modal.com (Cloud)** | **Self-Hosted GPU** |
|---|---|---|
| **Best for** | No GPU hardware, zero maintenance | Own GPU server, full control |
| **Pricing** | Pay-per-use | Fixed infrastructure cost |
| **Setup** | Run from laptop (browser auth) | Run on GPU server |
| **Scaling** | Automatic | Manual |
### Option A: Modal.com (Serverless Cloud GPU)
**Location: YOUR LOCAL COMPUTER (laptop/desktop)**
Modal requires browser authentication, so this runs locally - not on your server.
#### Accept HuggingFace Licenses

Visit both pages and click "Accept":

- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0

Generate a token at https://huggingface.co/settings/tokens

#### Deploy to Modal
An install script helps with this setup: it uses the Modal API to configure all the necessary moving parts. Alternatively, everything the script does can be performed manually in the Modal UI settings.
```bash
pip install modal
modal setup  # opens browser for authentication
@@ -96,7 +86,7 @@ cd reflector/gpu/modal_deployments
./deploy-all.sh --hf-token YOUR_HUGGINGFACE_TOKEN
```
**Save the output** - copy the configuration block, you'll need it soon.

See [Modal Setup](./modal-setup) for troubleshooting and details.
@@ -114,13 +104,13 @@ See [Self-Hosted GPU Setup](./self-hosted-gpu-setup) for complete instructions.
4. Start service (Docker compose or systemd)
5. Set up Caddy reverse proxy for HTTPS

**Save your API key and HTTPS URL** - you'll need them soon.
---

## Prepare Server

**Location: dedicated Reflector server**

### Install Docker
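The exact install commands are elided at this hunk boundary. A minimal sketch using Docker's convenience script - an assumption, since the guide may instead pin versions via the apt repository:

```bash
# Sketch only: install Docker Engine via the convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Optional: allow the current user to run docker without sudo
sudo usermod -aG docker $USER
```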
@@ -150,7 +140,7 @@ cd reflector
---

## Configure Environment

**Location: YOUR SERVER (via SSH, in the `reflector` directory)**
@@ -183,7 +173,7 @@ CORS_ALLOW_CREDENTIALS=true
# Secret key - generate with: openssl rand -hex 32
SECRET_KEY=<your-generated-secret>

# GPU Processing - choose ONE option:

# Option A: Modal.com (paste from deploy-all.sh output)
TRANSCRIPT_BACKEND=modal
@@ -208,7 +198,7 @@ TRANSCRIPT_STORAGE_BACKEND=local
LLM_API_KEY=sk-your-openai-api-key
LLM_MODEL=gpt-4o-mini

# Auth - disable for initial setup (see the dedicated authentication step below)
AUTH_BACKEND=none
```
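If you use an OpenAI-compatible endpoint instead of OpenAI, the LLM settings might look like the sketch below. The `LLM_BASE_URL` variable name is an assumption - check the project's environment reference for the actual name:

```bash
# Hypothetical sketch for a self-hosted endpoint (e.g., Ollama);
# LLM_BASE_URL is an assumed variable name, not confirmed by this guide
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=ollama        # many self-hosted endpoints accept any non-empty key
LLM_MODEL=llama3.1:8b
```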
@@ -237,7 +227,7 @@ FEATURE_REQUIRE_LOGIN=false
---

## Configure Caddy

**Location: YOUR SERVER (via SSH)**
@@ -260,7 +250,7 @@ Replace `example.com` with your domains. The `{$VAR:default}` syntax uses Caddy'
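A minimal sketch of the two-site Caddyfile this section describes - the upstream ports are assumptions, not the project's actual values:

```
app.example.com {
    reverse_proxy localhost:3000    # frontend (port is an assumption)
}

api.example.com {
    reverse_proxy localhost:8000    # backend API (port is an assumption)
}
```

Caddy obtains and renews TLS certificates for both domains automatically.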
---

## Start Services

**Location: YOUR SERVER (via SSH)**
@@ -280,7 +270,7 @@ docker compose -f docker-compose.prod.yml exec server uv run alembic upgrade hea
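The full sequence is elided here; based on the compose file and migration command visible in this diff, startup presumably looks like:

```bash
# Start all services in the background
docker compose -f docker-compose.prod.yml up -d

# Run database migrations (command shown in the hunk context above)
docker compose -f docker-compose.prod.yml exec server uv run alembic upgrade head
```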
---

## Verify Deployment

### Check services

```bash
@@ -307,9 +297,9 @@ curl https://api.example.com/health
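# The block's contents are elided at this hunk boundary; a sketch consistent
# with the health check visible in the context line above:
docker compose -f docker-compose.prod.yml ps    # all services should be "Up"
curl https://api.example.com/health
```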
---

## Enable Authentication (Required for Live Rooms)

By default, Reflector is open (no login required). **Authentication is required if you want to use Live Meeting Rooms.**

See [Authentication Setup](./auth-setup) for full Authentik OAuth configuration.
@@ -323,9 +313,9 @@ Quick summary:
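Conceptually, enabling auth means flipping the variables set earlier. The backend value below is an assumption - the real value comes from the auth-setup guide:

```bash
# Assumed values: this guide shows AUTH_BACKEND=none and FEATURE_REQUIRE_LOGIN=false
# as defaults; the actual backend identifier may differ (see auth-setup)
AUTH_BACKEND=authentik
FEATURE_REQUIRE_LOGIN=true
```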
---

## Enable Live Meeting Rooms

**Requires: the authentication step above**

Live rooms require Daily.co and AWS S3. See [Daily.co Setup](./daily-setup) for complete S3/IAM configuration instructions.


@@ -43,7 +43,6 @@ Your main Reflector server connects to this service exactly like it connects to
- Systemd method: 25-30GB minimum

### Software
- Ubuntu 22.04 or 24.04
- Public IP address
- Domain name with DNS A record pointing to server - Domain name with DNS A record pointing to server
@@ -55,34 +54,6 @@ Your main Reflector server connects to this service exactly like it connects to
## Choose Deployment Method
### Docker Deployment (Recommended)
**Pros:**
- Container isolation and reproducibility
- No manual library path configuration
- Easier to replicate across servers
- Built-in restart policies
- Simpler dependency management
**Cons:**
- Higher disk usage (~15GB for container)
- Requires 40-50GB disk minimum
**Best for:** Teams wanting reproducible deployments, multiple GPU servers
### Systemd Deployment
**Pros:**
- Lower disk usage (~8GB total)
- Direct GPU access (no container layer)
- Works on smaller disks (25-30GB)
**Cons:**
- Manual `LD_LIBRARY_PATH` configuration
- Less portable across systems
**Best for:** Single GPU server, limited disk space
---

## Docker Deployment
@@ -422,16 +393,6 @@ watch -n 1 nvidia-smi
---
## Performance Notes
**Tesla T4 benchmarks:**
- Transcription: ~2-3x real-time (10 min audio in 3-5 min)
- Diarization: ~1.5x real-time
- Max concurrent requests: 2-3 (depends on audio length)
- First request warmup: ~10 seconds (model loading)
---
## Troubleshooting

### nvidia-smi fails after driver install
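The fix steps are elided at this hunk boundary; a common first-response sketch, not necessarily the guide's exact steps:

```bash
# Check whether the kernel module loaded, then reboot - usually required
# after a fresh driver install
sudo dmesg | grep -i nvidia
sudo reboot
# After reboot, this should list the GPU
nvidia-smi
```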
@@ -483,16 +444,6 @@ sudo docker compose logs
---
## Security Considerations
1. **API Key**: Keep `REFLECTOR_GPU_APIKEY` secret, rotate periodically
2. **HuggingFace Token**: Treat as password, never commit to git
3. **Firewall**: Only expose ports 80 and 443 publicly
4. **Updates**: Regularly update system packages
5. **Monitoring**: Set up alerts for service failures
---
## Updating

### Docker


@@ -6,7 +6,7 @@ usage() {
echo "Usage: $0 [OPTIONS]" echo "Usage: $0 [OPTIONS]"
echo "" echo ""
echo "Options:" echo "Options:"
echo " --hf-token TOKEN HuggingFace token for Pyannote model" echo " --hf-token TOKEN HuggingFace token"
echo " --help Show this help message" echo " --help Show this help message"
echo "" echo ""
echo "Examples:" echo "Examples:"
@@ -88,7 +88,7 @@ if [[ ! "$HF_TOKEN" =~ ^hf_ ]]; then
fi
fi

# --- Auto-generate reflector<->GPU API Key ---
echo ""
echo "Generating API key for GPU services..."
API_KEY=$(openssl rand -hex 32)
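For reference, the script is invoked as shown earlier in this guide (the token value is a placeholder):

```bash
./deploy-all.sh --hf-token YOUR_HUGGINGFACE_TOKEN
```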