#!/bin/bash
# reflector/docs/create-docs.sh: generate the Reflector documentation tree
set -euo pipefail
# Create directory structure
mkdir -p docs/concepts
mkdir -p docs/installation
mkdir -p docs/pipelines
mkdir -p docs/reference/architecture
mkdir -p docs/reference/processors
mkdir -p docs/reference/api
# Create all documentation files with content
echo "Creating documentation files..."
# Concepts - Modes
cat > docs/concepts/modes.md << 'EOF'
---
sidebar_position: 2
title: Operating Modes
---
# Operating Modes
Reflector operates in two distinct modes to accommodate different use cases and security requirements.
## Public Mode
Public mode provides immediate access to core transcription features without requiring authentication.
### Features Available
- **File Upload**: Process audio files up to 2GB
- **Live Transcription**: Stream audio from microphone
- **Basic Processing**: Transcription and diarization
- **Temporary Storage**: Results available for 24 hours
### Limitations
- No persistent storage
- No meeting rooms
- Limited to single-user sessions
- No team collaboration features
### Use Cases
- Quick transcription needs
- Testing and evaluation
- Individual users
- Public demonstrations
## Private Mode
Private mode unlocks the full potential of Reflector with authentication and persistent storage.
### Additional Features
- **Virtual Meeting Rooms**: Whereby integration
- **Team Collaboration**: Share transcripts with team
- **Persistent Storage**: Long-term transcript archive
- **Advanced Analytics**: Meeting insights and trends
- **Custom Integration**: Webhooks and API access
- **User Management**: Role-based access control
### Authentication Options
#### Authentik Integration
Enterprise-grade SSO with support for:
- SAML 2.0
- OAuth 2.0 / OIDC
- LDAP / Active Directory
- Multi-factor authentication
#### JWT Authentication
Stateless token-based auth for:
- API access
- Service-to-service communication
- Mobile applications
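The stateless flow means the server can verify a token from its signature alone, with no session store. As a minimal illustration using only the standard library (the payload format, secret, and helper names here are hypothetical, not Reflector's actual token scheme, which uses standard JWTs and the configured signing secret):
```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # placeholder; a real deployment uses its configured secret

def sign(payload):
    """Encode a payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()

def verify(token):
    """Return the payload if the signature checks out, else None."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```
Because verification needs only the shared secret, any API server or service can validate tokens independently, which is what makes this suitable for service-to-service calls.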
### Room Management
Virtual rooms provide dedicated spaces for meetings:
- **Persistent URLs**: Same link for recurring meetings
- **Access Control**: Invite-only or open rooms
- **Recording Consent**: Automatic consent management
- **Custom Settings**: Per-room configuration
## Mode Selection
The mode is determined by your deployment configuration:
```env
# Public Mode (no authentication)
REFLECTOR_AUTH_BACKEND=none
# Private Mode (with authentication)
REFLECTOR_AUTH_BACKEND=jwt
# or
REFLECTOR_AUTH_BACKEND=authentik
```
## Feature Comparison
| Feature | Public Mode | Private Mode |
|---------|------------|--------------|
| File Upload | ✅ | ✅ |
| Live Transcription | ✅ | ✅ |
| Speaker Diarization | ✅ | ✅ |
| Translation | ✅ | ✅ |
| Summarization | ✅ | ✅ |
| Meeting Rooms | ❌ | ✅ |
| Persistent Storage | ❌ | ✅ |
| Team Collaboration | ❌ | ✅ |
| API Access | Limited | Full |
| User Management | ❌ | ✅ |
| Custom Branding | ❌ | ✅ |
| Analytics | ❌ | ✅ |
| Webhooks | ❌ | ✅ |
## Security Considerations
### Public Mode Security
- Rate limiting to prevent abuse
- File size restrictions
- Automatic cleanup of old data
- No PII storage
### Private Mode Security
- Encrypted data storage
- Audit logging
- Session management
- Access control lists
- Data retention policies
## Choosing the Right Mode
### Choose Public Mode if:
- You need quick, one-time transcriptions
- You're evaluating Reflector
- You don't need persistent storage
- You're processing non-sensitive content
### Choose Private Mode if:
- You need team collaboration
- You require persistent storage
- You're processing sensitive content
- You need meeting room functionality
- You want advanced analytics
EOF
# Concepts - Independence
cat > docs/concepts/independence.md << 'EOF'
---
sidebar_position: 3
title: Data Independence
---
# Data Independence & Privacy
Reflector is designed with privacy and data independence as core principles, giving you complete control over your data and processing.
## Privacy by Design
### No Third-Party Data Sharing
Your audio and transcripts are never shared with third parties:
- **Local Processing**: All ML models can run on your infrastructure
- **No Training on User Data**: Your content is never used to improve models
- **Isolated Processing**: Each transcript is processed in isolation
- **No Analytics Tracking**: No usage analytics sent to external services
### Data Ownership
You maintain complete ownership of all data:
- **Export Anytime**: Download all your transcripts and audio
- **Delete on Demand**: Permanent deletion with no recovery
- **API Access**: Full programmatic access to your data
- **No Vendor Lock-in**: Standard formats for easy migration
## Processing Transparency
### What Happens to Your Audio
1. **Upload/Stream**: Audio received by your server
2. **Temporary Storage**: Stored only for processing duration
3. **Processing**: ML models process audio locally or on Modal
4. **Results Storage**: Transcripts stored in your database
5. **Cleanup**: Original audio deleted (unless configured otherwise)
### Local vs Cloud Processing
#### Local Processing
When configured for local processing:
- All models run on your hardware
- No data leaves your infrastructure
- Complete air-gap capability
- Higher hardware requirements
#### Modal.com Processing
When using Modal for GPU acceleration:
- Audio chunks sent to Modal for processing
- Processed immediately and deleted
- No long-term storage on Modal
- Modal is SOC 2 Type II compliant
### Data Retention
Default retention policies:
- **Public Mode**: 24 hours then automatic deletion
- **Private Mode**: Configurable (default: indefinite)
- **Audio Files**: Deleted after processing (configurable)
- **Transcripts**: Retained based on policy
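A retention check against these policies reduces to comparing a record's age with the mode's time-to-live. A minimal sketch (the table and function names are illustrative, not Reflector's actual implementation):
```python
from datetime import datetime, timedelta, timezone

# None means indefinite retention (private mode default)
RETENTION = {"public": timedelta(hours=24), "private": None}

def is_expired(created_at, mode, now=None):
    """Return True if a record has outlived its mode's retention window."""
    ttl = RETENTION[mode]
    if ttl is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - created_at > ttl
```
A cleanup job would periodically delete records for which this returns True.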
## Compliance Features
### GDPR Compliance
- **Right to Access**: Export all user data
- **Right to Deletion**: Permanent data removal
- **Data Portability**: Standard export formats
- **Privacy by Default**: Minimal data collection
### HIPAA Considerations
For healthcare deployments:
- **Self-hosted Option**: Complete infrastructure control
- **Encryption**: At rest and in transit
- **Audit Logging**: Complete access trail
- **Access Controls**: Role-based permissions
### Industry Standards
- **TLS 1.3**: Modern encryption for data in transit
- **AES-256**: Encryption for data at rest
- **JWT Tokens**: Secure, stateless authentication
- **OWASP Guidelines**: Security best practices
## Self-Hosted Deployment
### Complete Independence
Self-hosting provides maximum control:
- **Your Infrastructure**: Run on your servers
- **Your Network**: No external connections required
- **Your Policies**: Implement custom retention
- **Your Compliance**: Meet specific requirements
### Air-Gap Capability
Reflector can run completely offline:
1. Download all models during setup
2. Configure for local processing only
3. Disable all external integrations
4. Run in isolated network environment
## Data Flow Control
### Configurable Processing
Control where each step happens:
```env
# All local processing
TRANSCRIPT_BACKEND=local
DIARIZATION_BACKEND=local
TRANSLATION_BACKEND=local
# Hybrid approach
TRANSCRIPT_BACKEND=modal # Fast GPU processing
DIARIZATION_BACKEND=local # Sensitive speaker data
TRANSLATION_BACKEND=modal # Non-sensitive translation
```
### Storage Options
Choose where data is stored:
- **Local Filesystem**: Complete control
- **PostgreSQL**: Self-hosted database
- **S3-Compatible**: MinIO or AWS with encryption
- **Hybrid**: Different storage for different data types
## Security Architecture
### Defense in Depth
Multiple layers of security:
1. **Network Security**: Firewalls and VPNs
2. **Application Security**: Input validation and sanitization
3. **Data Security**: Encryption and access controls
4. **Operational Security**: Logging and monitoring
### Zero Trust Principles
- **Verify Everything**: All requests authenticated
- **Least Privilege**: Minimal permissions granted
- **Assume Breach**: Design for compromise containment
- **Encrypt Everything**: No plaintext transmission
## Audit and Compliance
### Audit Logging
Comprehensive logging of:
- **Access Events**: Who accessed what and when
- **Processing Events**: What was processed and how
- **Configuration Changes**: System modifications
- **Security Events**: Failed authentication attempts
### Compliance Reporting
Generate reports for:
- **Data Processing**: What data was processed
- **Data Access**: Who accessed the data
- **Data Retention**: What was retained or deleted
- **Security Events**: Security-related incidents
## Best Practices
### For Maximum Privacy
1. **Self-host** all components
2. **Use local processing** for all models
3. **Implement short retention** periods
4. **Encrypt all storage** at rest
5. **Use VPN** for all connections
6. **Regular audits** of access logs
### For Balanced Approach
1. **Self-host core services** (database, API)
2. **Use Modal for processing** (faster, cost-effective)
3. **Implement encryption** everywhere
4. **Regular backups** with encryption
5. **Monitor access** patterns
EOF
# Concepts - Pipeline
cat > docs/concepts/pipeline.md << 'EOF'
---
sidebar_position: 4
title: Processing Pipeline
---
# Processing Pipeline
Reflector uses a sophisticated pipeline architecture to process audio efficiently and accurately.
## Pipeline Overview
The processing pipeline consists of modular components that can be combined and configured based on your needs:
```mermaid
graph LR
    A[Audio Input] --> B[Pre-processing]
    B --> C[Chunking]
    C --> D[Transcription]
    D --> E[Diarization]
    E --> F[Alignment]
    F --> G[Post-processing]
    G --> H[Output]
```
## Pipeline Components
### Audio Input
Accepts various input sources:
- **File Upload**: MP3, WAV, M4A, WebM, MP4
- **WebRTC Stream**: Live browser audio
- **Recording Integration**: Whereby recordings
- **API Upload**: Direct API submission
### Pre-processing
Prepares audio for optimal processing:
- **Format Conversion**: Convert to 16kHz mono WAV
- **Normalization**: Adjust volume to -23 LUFS
- **Noise Reduction**: Optional background noise removal
- **Validation**: Check duration and quality
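The conversion and normalization steps map naturally onto a single ffmpeg invocation. As a rough sketch, a hypothetical helper building that command line (Reflector's actual pre-processing command may differ):
```python
def preprocess_args(src, dst):
    """Build ffmpeg arguments converting to 16 kHz mono WAV,
    loudness-normalized to -23 LUFS via the loudnorm filter."""
    return [
        "ffmpeg", "-i", src,
        "-ac", "1",               # downmix to mono
        "-ar", "16000",           # resample to 16 kHz
        "-af", "loudnorm=I=-23",  # EBU R128 normalization to -23 LUFS
        dst,
    ]
```
The list would then be passed to `subprocess.run`; building it as a list avoids shell-quoting issues with arbitrary file names.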
### Chunking
Splits audio for parallel processing:
- **Fixed Size**: 30-second chunks by default
- **Overlap**: 1-second overlap for continuity
- **Smart Boundaries**: Attempt to split at silence
- **Metadata**: Track chunk positions
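With the defaults above (30-second chunks, 1-second overlap), the chunk boundaries can be sketched as follows; silence-aware boundary adjustment is omitted for brevity:
```python
def chunk_spans(duration_s, chunk_s=30.0, overlap_s=1.0):
    """Return (start, end) spans covering the audio, with each chunk
    overlapping the previous one by overlap_s seconds."""
    step = chunk_s - overlap_s
    spans = []
    start = 0.0
    while start < duration_s:
        spans.append((start, min(start + chunk_s, duration_s)))
        start += step
    return spans
```
The overlap gives the alignment stage duplicated words at each boundary, which it later deduplicates when assembling the final transcript.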
### Transcription
Converts speech to text:
- **Model Selection**: Whisper or Parakeet
- **Language Detection**: Automatic or specified
- **Timestamp Generation**: Word-level timing
- **Confidence Scores**: Quality indicators
### Diarization
Identifies different speakers:
- **Voice Activity Detection**: Find speech segments
- **Speaker Embedding**: Extract voice characteristics
- **Clustering**: Group similar voices
- **Label Assignment**: Assign speaker IDs
### Alignment
Merges all processing results:
- **Chunk Assembly**: Combine transcription chunks
- **Speaker Mapping**: Align speakers with text
- **Overlap Resolution**: Handle chunk boundaries
- **Timeline Creation**: Build unified timeline
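Speaker mapping essentially picks, for each word, the diarization segment with the greatest time overlap. A minimal sketch (illustrative, not Reflector's actual alignment code):
```python
def assign_speaker(word_start, word_end, segments):
    """Return the speaker whose diarization segment overlaps the word most.
    segments is a list of (start, end, speaker) tuples."""
    best, best_overlap = None, 0.0
    for seg_start, seg_end, speaker in segments:
        overlap = min(word_end, seg_end) - max(word_start, seg_start)
        if overlap > best_overlap:
            best, best_overlap = speaker, overlap
    return best
```
Words that overlap no segment at all (silence, music) come back as None and can be left unattributed.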
### Post-processing
Enhances the final output:
- **Formatting**: Apply punctuation and capitalization
- **Translation**: Convert to target languages
- **Summarization**: Generate concise summaries
- **Topic Extraction**: Identify key themes
- **Action Items**: Extract tasks and decisions
## Processing Modes
### Batch Processing
For uploaded files:
- Optimized for throughput
- Parallel chunk processing
- Higher accuracy models
- Complete file analysis
### Stream Processing
For live audio:
- Optimized for latency
- Sequential processing
- Real-time feedback
- Progressive results
### Hybrid Processing
For meetings:
- Stream during meeting
- Batch after completion
- Best of both modes
- Maximum accuracy
## Pipeline Configuration
### Model Selection
Choose models based on requirements:
```python
# High accuracy (slower)
config = {
    "transcription_model": "whisper-large-v3",
    "diarization_model": "pyannote-3.1",
    "translation_model": "seamless-m4t-large",
}

# Balanced (default)
config = {
    "transcription_model": "whisper-base",
    "diarization_model": "pyannote-3.1",
    "translation_model": "seamless-m4t-medium",
}

# Fast processing
config = {
    "transcription_model": "whisper-tiny",
    "diarization_model": "pyannote-3.1-fast",
    "translation_model": "seamless-m4t-small",
}
```
### Processing Options
Customize pipeline behavior:
```yaml
# Parallel processing
max_parallel_chunks: 10
chunk_size_seconds: 30
chunk_overlap_seconds: 1
# Quality settings
enable_noise_reduction: true
enable_normalization: true
min_speech_confidence: 0.5
# Post-processing
enable_translation: true
target_languages: ["es", "fr", "de"]
enable_summarization: true
summary_length: "medium"
```
## Performance Characteristics
### Processing Times
For 1 hour of audio:
| Pipeline Config | Processing Time | Accuracy |
|----------------|-----------------|----------|
| Fast | 2-3 minutes | 85-90% |
| Balanced | 5-8 minutes | 92-95% |
| High Accuracy | 15-20 minutes | 95-98% |
### Resource Usage
| Component | CPU Usage | Memory | GPU |
|-----------|-----------|---------|-----|
| Transcription | Medium | 2-4 GB | Required |
| Diarization | High | 4-8 GB | Required |
| Translation | Low | 2-3 GB | Optional |
| Post-processing | Low | 1-2 GB | Not needed |
## Pipeline Orchestration
### Celery Task Chain
The pipeline is orchestrated using Celery:
```python
from celery import group

# chunk the audio up front, then fan out transcription in parallel;
# the merged result flows through the remaining steps in sequence
chunks = chunk_audio(audio_id)
pipeline = (
    group(transcribe_chunk.s(chunk) for chunk in chunks) |
    merge_transcriptions.s() |
    diarize_audio.s() |
    align_speakers.s() |
    post_process.s()
)
```
### Error Handling
Robust error recovery:
- **Automatic Retry**: Failed tasks retry up to 3 times
- **Partial Recovery**: Continue with successful chunks
- **Fallback Models**: Use alternative models on failure
- **Error Reporting**: Detailed error messages
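Stripped of Celery specifics, the retry policy amounts to the following plain-Python sketch (the real pipeline uses Celery's built-in retry machinery, typically with backoff between attempts):
```python
def with_retries(fn, max_retries=3):
    """Call fn, retrying up to max_retries times; re-raise the last error."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
    raise last_error
```
A chunk whose task still fails after the final retry is reported, and the pipeline continues with the chunks that succeeded.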
### Progress Tracking
Real-time progress updates:
- **Chunk Progress**: Track individual chunk processing
- **Overall Progress**: Percentage completion
- **ETA Calculation**: Estimated completion time
- **WebSocket Updates**: Live progress to clients
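The ETA calculation is an extrapolation from per-chunk timing observed so far, along these lines (an illustrative sketch):
```python
def eta_seconds(elapsed_s, chunks_done, chunks_total):
    """Estimate remaining seconds from the average per-chunk time so far.
    Returns None before any chunk has completed."""
    if chunks_done == 0:
        return None
    per_chunk = elapsed_s / chunks_done
    return per_chunk * (chunks_total - chunks_done)
```
The estimate sharpens as more chunks finish, which is why early ETAs displayed to clients tend to fluctuate.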
## Optimization Strategies
### GPU Utilization
Maximize GPU efficiency:
- **Batch Processing**: Process multiple chunks together
- **Model Caching**: Keep models loaded in memory
- **Dynamic Batching**: Adjust batch size based on GPU memory
- **Multi-GPU Support**: Distribute across available GPUs
### Memory Management
Efficient memory usage:
- **Streaming Processing**: Process large files in chunks
- **Garbage Collection**: Clean up after each chunk
- **Memory Limits**: Prevent out-of-memory errors
- **Disk Caching**: Use disk for large intermediate results
### Network Optimization
Minimize network overhead:
- **Compression**: Compress audio before transfer
- **CDN Integration**: Use CDN for static assets
- **Connection Pooling**: Reuse network connections
- **Parallel Uploads**: Multiple concurrent uploads
## Quality Assurance
### Accuracy Metrics
Monitor processing quality:
- **Word Error Rate (WER)**: Transcription accuracy
- **Diarization Error Rate (DER)**: Speaker identification accuracy
- **Translation BLEU Score**: Translation quality
- **Summary Coherence**: Summary quality metrics
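WER is the word-level edit distance between reference and hypothesis, divided by the reference length. A compact reference implementation:
```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```
Note that WER can exceed 1.0 when the hypothesis contains many spurious insertions.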
### Validation Steps
Ensure output quality:
- **Confidence Thresholds**: Filter low-confidence segments
- **Consistency Checks**: Verify timeline consistency
- **Language Validation**: Ensure correct language detection
- **Format Validation**: Check output format compliance
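Confidence filtering is the simplest of these checks: segments below the threshold (0.5 by default, per the pipeline options above) are dropped. A minimal sketch; the segment field names are illustrative:
```python
def filter_segments(segments, min_confidence=0.5):
    """Drop segments whose confidence falls below the threshold."""
    return [s for s in segments if s["confidence"] >= min_confidence]
```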
## Advanced Features
### Custom Models
Use your own models:
- **Fine-tuned Whisper**: Domain-specific models
- **Custom Diarization**: Trained on your speakers
- **Specialized Post-processing**: Industry-specific formatting
### Pipeline Extensions
Add custom processing steps:
- **Sentiment Analysis**: Analyze emotional tone
- **Entity Extraction**: Identify people, places, organizations
- **Custom Metrics**: Calculate domain-specific metrics
- **Integration Hooks**: Call external services
EOF
# Create installation documentation
cat > docs/installation/overview.md << 'EOF'
---
sidebar_position: 1
title: Installation Overview
---
# Installation Overview
Reflector is designed for self-hosted deployment, giving you complete control over your infrastructure and data.
## Deployment Options
### Docker Deployment (Recommended)
The easiest way to deploy Reflector:
- Pre-configured containers
- Automated dependency management
- Consistent environment
- Easy updates
### Manual Installation
For custom deployments:
- Greater control over configuration
- Integration with existing infrastructure
- Custom optimization options
- Development environments
## Requirements
### System Requirements
**Minimum Requirements:**
- CPU: 4 cores
- RAM: 8 GB
- Storage: 50 GB
- OS: Ubuntu 20.04+ or similar Linux
**Recommended Requirements:**
- CPU: 8+ cores
- RAM: 16 GB
- Storage: 100 GB SSD
- GPU: NVIDIA GPU with 8GB+ VRAM (for local processing)
### Network Requirements
- Public IP address (for WebRTC)
- Ports: 80, 443, 8000, 3000
- Domain name (for SSL)
- SSL certificate (Let's Encrypt supported)
## Required Services
### Core Services
These services are required for basic operation:
1. **PostgreSQL** - Primary database
2. **Redis** - Message broker and cache
3. **Docker** - Container runtime
### GPU Processing
Choose one:
- **Modal.com** - Serverless GPU (recommended)
- **Local GPU** - Self-hosted GPU processing
### Optional Services
Enhance functionality with:
- **AWS S3** - Long-term storage
- **Whereby** - Video conferencing rooms
- **Authentik** - Enterprise authentication
- **Zulip** - Chat integration
## Quick Start
### Using Docker Compose
1. Clone the repository:
```bash
git clone https://github.com/monadical-sas/reflector.git
cd reflector
```
2. Navigate to docker directory:
```bash
cd docker
```
3. Copy and configure environment:
```bash
cp .env.example .env
# Edit .env with your settings
```
4. Start services:
```bash
docker compose up -d
```
5. Access Reflector:
- Frontend: https://your-domain.com
- API: https://your-domain.com/api
## Configuration Overview
### Essential Configuration
```env
# Database
DATABASE_URL=postgresql://user:pass@localhost/reflector
# Redis
REDIS_URL=redis://localhost:6379
# Modal.com (for GPU processing)
TRANSCRIPT_MODAL_API_KEY=your-key
DIARIZATION_MODAL_API_KEY=your-key
# Domain
DOMAIN=your-domain.com
```
### Security Configuration
```env
# Authentication
REFLECTOR_AUTH_BACKEND=jwt
NEXTAUTH_SECRET=generate-strong-secret
# SSL (handled by Caddy)
# Automatic with Let's Encrypt
```
## Service Architecture
```mermaid
graph TD
    A[Caddy Reverse Proxy] --> B[Frontend - Next.js]
    A --> C[Backend - FastAPI]
    C --> D[PostgreSQL]
    C --> E[Redis]
    C --> F[Celery Workers]
    F --> G[Modal.com GPU]
```
## Next Steps
1. **Review Requirements**: [System Requirements](./requirements)
2. **Docker Setup**: [Docker Deployment Guide](./docker-setup)
3. **Configure Services**:
- [Modal.com Setup](./modal-setup)
- [Whereby Setup](./whereby-setup)
- [AWS S3 Setup](./aws-setup)
4. **Optional Services**:
- [Authentik Setup](./authentik-setup)
- [Zulip Setup](./zulip-setup)
## Getting Help
- [Troubleshooting Guide](../reference/troubleshooting)
- [GitHub Issues](https://github.com/monadical-sas/reflector/issues)
- [Community Discord](#)
EOF
echo "Documentation files created under docs/."