From d890061056e847c64876dda5978494a3265ab20c Mon Sep 17 00:00:00 2001
From: Igor Loskutov
Date: Tue, 9 Dec 2025 12:11:22 -0500
Subject: [PATCH] doc review round

---
 docs/docs/installation/overview.md            | 72 ++++++++-----------
 .../installation/self-hosted-gpu-setup.md     | 49 -------------
 gpu/modal_deployments/deploy-all.sh           |  4 +-
 3 files changed, 33 insertions(+), 92 deletions(-)

diff --git a/docs/docs/installation/overview.md b/docs/docs/installation/overview.md
index 6813e989..fa2c8a21 100644
--- a/docs/docs/installation/overview.md
+++ b/docs/docs/installation/overview.md
@@ -26,13 +26,15 @@ flowchart LR
 
 Before starting, you need:
 
-- [ ] **Production server** - Ubuntu 22.04+, 4+ cores, 8GB+ RAM, public IP
-- [ ] **Two domain names** - e.g., `app.example.com` (frontend) and `api.example.com` (backend)
-- [ ] **GPU processing** - Choose one:
-  - Modal.com account (free tier at https://modal.com), OR
+- **Production server** - 4+ cores, 8GB+ RAM, public IP
+- **Two domain names** - e.g., `app.example.com` (frontend) and `api.example.com` (backend)
+- **GPU processing** - Choose one:
+  - Modal.com account, OR
   - GPU server with NVIDIA GPU (8GB+ VRAM)
-- [ ] **HuggingFace account** - Free at https://huggingface.co
-- [ ] **OpenAI API key** - For summaries and topic detection at https://platform.openai.com/account/api-keys
+- **HuggingFace account** - Free at https://huggingface.co
+- **LLM API** - For summaries and topic detection. Choose one:
+  - OpenAI API key at https://platform.openai.com/account/api-keys, OR
+  - Any OpenAI-compatible endpoint (vLLM, LiteLLM, Ollama, etc.)
 
 ### Optional (for live meeting rooms)
 
@@ -41,52 +43,40 @@ Before starting, you need:
 
 ---
 
-## Step 1: Configure DNS
+## Configure DNS
 
-**Location: Your domain registrar / DNS provider**
-
-Create A records pointing to your server:
 ```
 Type: A    Name: app    Value: YOUR_SERVER_IP
 Type: A    Name: api    Value: YOUR_SERVER_IP
 ```
 
-Verify propagation (wait a few minutes):
-```bash
-dig app.example.com +short
-dig api.example.com +short
-# Both should return your server IP
-```
-
 ---
 
-## Step 2: Deploy GPU Processing
+## Deploy GPU Processing
 
-Reflector requires GPU processing for transcription (Whisper) and speaker diarization (Pyannote). Choose one option:
+Reflector requires GPU processing for transcription and speaker diarization. Choose one option:
 
 | | **Modal.com (Cloud)** | **Self-Hosted GPU** |
 |---|---|---|
 | **Best for** | No GPU hardware, zero maintenance | Own GPU server, full control |
-| **Pricing** | Pay-per-use (~$0.01-0.10/min audio) | Fixed infrastructure cost |
-| **Setup** | Run from laptop (browser auth) | Run on GPU server |
-| **Scaling** | Automatic | Manual |
+| **Pricing** | Pay-per-use | Fixed infrastructure cost |
 
 ### Option A: Modal.com (Serverless Cloud GPU)
 
-**Location: YOUR LOCAL COMPUTER (laptop/desktop)**
-
-Modal requires browser authentication, so this runs locally - not on your server.
-
 #### Accept HuggingFace Licenses
 
 Visit both pages and click "Accept":
 - https://huggingface.co/pyannote/speaker-diarization-3.1
 - https://huggingface.co/pyannote/segmentation-3.0
 
-Then generate a token at https://huggingface.co/settings/tokens
+Generate a token at https://huggingface.co/settings/tokens
 
 #### Deploy to Modal
 
+There's an install script to help with this setup. It uses the Modal API to set up all the necessary moving parts.
+
+As an alternative, everything the script does can be performed manually in the Modal UI settings.
+
 ```bash
 pip install modal
 modal setup  # opens browser for authentication
@@ -96,7 +86,7 @@
 cd reflector/gpu/modal_deployments
 ./deploy-all.sh --hf-token YOUR_HUGGINGFACE_TOKEN
 ```
 
-**Save the output** - copy the configuration block, you'll need it for Step 4.
+**Save the output** - copy the configuration block, you'll need it for the Configure Environment step.
 
 See [Modal Setup](./modal-setup) for troubleshooting and details.
 
@@ -114,13 +104,13 @@ See [Self-Hosted GPU Setup](./self-hosted-gpu-setup) for complete instructions.
 4. Start service (Docker compose or systemd)
 5. Set up Caddy reverse proxy for HTTPS
 
-**Save your API key and HTTPS URL** - you'll need them for Step 4.
+**Save your API key and HTTPS URL** - you'll need them for the Configure Environment step.
 
 ---
 
-## Step 3: Prepare Server
+## Prepare Server
 
-**Location: YOUR SERVER (via SSH)**
+**Location: dedicated reflector server**
 
 ### Install Docker
 
@@ -150,7 +140,7 @@ cd reflector
 
 ---
 
-## Step 4: Configure Environment
+## Configure Environment
 
 **Location: YOUR SERVER (via SSH, in the `reflector` directory)**
 
@@ -183,7 +173,7 @@ CORS_ALLOW_CREDENTIALS=true
 # Secret key - generate with: openssl rand -hex 32
 SECRET_KEY=
 
-# GPU Processing - choose ONE option from Step 2:
+# GPU Processing - choose ONE option:
 
 # Option A: Modal.com (paste from deploy-all.sh output)
 TRANSCRIPT_BACKEND=modal
 
@@ -208,7 +198,7 @@ TRANSCRIPT_STORAGE_BACKEND=local
 LLM_API_KEY=sk-your-openai-api-key
 LLM_MODEL=gpt-4o-mini
 
-# Auth - disable for initial setup (see Step 8 for authentication)
+# Auth - disable for initial setup (see the Enable Authentication section)
 AUTH_BACKEND=none
 ```
 
@@ -237,7 +227,7 @@ FEATURE_REQUIRE_LOGIN=false
 
 ---
 
-## Step 5: Configure Caddy
+## Configure Caddy
 
 **Location: YOUR SERVER (via SSH)**
 
@@ -260,7 +250,7 @@ Replace `example.com` with your domains.
 The `{$VAR:default}` syntax uses Caddy's environment-variable substitution, falling back to the default when the variable is unset.
 
 ---
 
-## Step 6: Start Services
+## Start Services
 
 **Location: YOUR SERVER (via SSH)**
 
@@ -280,7 +270,7 @@ docker compose -f docker-compose.prod.yml exec server uv run alembic upgrade hea
 
 ---
 
-## Step 7: Verify Deployment
+## Verify Deployment
 
 ### Check services
 ```bash
 
@@ -307,9 +297,9 @@ curl https://api.example.com/health
 
 ---
 
-## Step 8: Enable Authentication (Required for Live Rooms)
+## Enable Authentication (Required for Live Rooms)
 
-By default, Reflector is open (no login required). **Authentication is required if you want to use Live Meeting Rooms (Step 9).**
+By default, Reflector is open (no login required). **Authentication is required if you want to use Live Meeting Rooms.**
 
 See [Authentication Setup](./auth-setup) for full Authentik OAuth configuration.
 
 Quick summary:
 
@@ -323,9 +313,9 @@ Quick summary:
 
 ---
 
-## Step 9: Enable Live Meeting Rooms
+## Enable Live Meeting Rooms
 
-**Requires: Step 8 (Authentication)**
+**Requires: Authentication**
 
 Live rooms require Daily.co and AWS S3. See [Daily.co Setup](./daily-setup) for complete S3/IAM configuration instructions.
 
diff --git a/docs/docs/installation/self-hosted-gpu-setup.md b/docs/docs/installation/self-hosted-gpu-setup.md
index 288685e4..dd062ff5 100644
--- a/docs/docs/installation/self-hosted-gpu-setup.md
+++ b/docs/docs/installation/self-hosted-gpu-setup.md
@@ -43,7 +43,6 @@ Your main Reflector server connects to this service exactly like it connects to
 - Systemd method: 25-30GB minimum
 
 ### Software
-- Ubuntu 22.04 or 24.04
 - Public IP address
 - Domain name with DNS A record pointing to server
 
@@ -55,34 +54,6 @@ Your main Reflector server connects to this service exactly like it connects to
 
 ## Choose Deployment Method
 
-### Docker Deployment (Recommended)
-
-**Pros:**
-- Container isolation and reproducibility
-- No manual library path configuration
-- Easier to replicate across servers
-- Built-in restart policies
-- Simpler dependency management
-
-**Cons:**
-- Higher disk usage (~15GB for container)
-- Requires 40-50GB disk minimum
-
-**Best for:** Teams wanting reproducible deployments, multiple GPU servers
-
-### Systemd Deployment
-
-**Pros:**
-- Lower disk usage (~8GB total)
-- Direct GPU access (no container layer)
-- Works on smaller disks (25-30GB)
-
-**Cons:**
-- Manual `LD_LIBRARY_PATH` configuration
-- Less portable across systems
-
-**Best for:** Single GPU server, limited disk space
-
 ---
 
 ## Docker Deployment
 
@@ -422,16 +393,6 @@ watch -n 1 nvidia-smi
 
 ---
 
-## Performance Notes
-
-**Tesla T4 benchmarks:**
-- Transcription: ~2-3x real-time (10 min audio in 3-5 min)
-- Diarization: ~1.5x real-time
-- Max concurrent requests: 2-3 (depends on audio length)
-- First request warmup: ~10 seconds (model loading)
-
----
-
 ## Troubleshooting
 
 ### nvidia-smi fails after driver install
 
@@ -483,16 +444,6 @@ sudo docker compose logs
 
 ---
 
-## Security Considerations
-
-1. **API Key**: Keep `REFLECTOR_GPU_APIKEY` secret, rotate periodically
-2. **HuggingFace Token**: Treat as password, never commit to git
-3. **Firewall**: Only expose ports 80 and 443 publicly
-4. **Updates**: Regularly update system packages
-5. **Monitoring**: Set up alerts for service failures
-
----
-
 ## Updating
 
 ### Docker
 
diff --git a/gpu/modal_deployments/deploy-all.sh b/gpu/modal_deployments/deploy-all.sh
index f2eb60ef..42e589a2 100755
--- a/gpu/modal_deployments/deploy-all.sh
+++ b/gpu/modal_deployments/deploy-all.sh
@@ -6,7 +6,7 @@ usage() {
     echo "Usage: $0 [OPTIONS]"
     echo ""
     echo "Options:"
-    echo "  --hf-token TOKEN    HuggingFace token for Pyannote model"
+    echo "  --hf-token TOKEN    HuggingFace token"
     echo "  --help              Show this help message"
     echo ""
     echo "Examples:"
@@ -88,7 +88,7 @@ if [[ ! "$HF_TOKEN" =~ ^hf_ ]]; then
     fi
 fi
 
-# --- Auto-generate API Key ---
+# --- Auto-generate reflector<->GPU API Key ---
 echo ""
 echo "Generating API key for GPU services..."
 API_KEY=$(openssl rand -hex 32)
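For reviewers sanity-checking the `deploy-all.sh` hunk above: `openssl rand -hex 32` produces 32 random bytes hex-encoded, so the generated key is always exactly 64 lowercase hex characters. A minimal sketch of the same generation plus a format check (the `API_KEY` name mirrors the script; the validation itself is an illustrative addition, not part of the patch):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Generate the key the same way deploy-all.sh does: 32 random bytes, hex-encoded.
API_KEY=$(openssl rand -hex 32)

# 32 hex-encoded bytes = 64 lowercase hex characters; verify before wiring the
# key into the Reflector server environment.
if printf '%s' "$API_KEY" | grep -Eq '^[0-9a-f]{64}$'; then
  echo "API key format OK"
else
  echo "unexpected API key format" >&2
  exit 1
fi
```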
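The script's token check visible in the second hunk's context (`[[ ! "$HF_TOKEN" =~ ^hf_ ]]`) only validates the `hf_` prefix, not that the token actually works against the HuggingFace API. A standalone sketch of that prefix check, written with POSIX `case` instead of the bash regex (`hf_exampleToken123` is a hypothetical placeholder, not a real token):

```shell
#!/bin/sh

# Mirrors deploy-all.sh's prefix check: HuggingFace access tokens start with "hf_".
check_hf_token() {
  case "$1" in
    hf_*) return 0 ;;
    *)    return 1 ;;
  esac
}

check_hf_token "hf_exampleToken123" && echo "prefix OK"
check_hf_token "not-a-token" || echo "warning: token does not start with hf_"
```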