#!/bin/bash # Create directory structure mkdir -p docs/concepts mkdir -p docs/installation mkdir -p docs/pipelines mkdir -p docs/reference/architecture mkdir -p docs/reference/processors mkdir -p docs/reference/api # Create all documentation files with content echo "Creating documentation files..." # Concepts - Modes cat > docs/concepts/modes.md << 'EOF' --- sidebar_position: 2 title: Operating Modes --- # Operating Modes Reflector operates in two distinct modes to accommodate different use cases and security requirements. ## Public Mode Public mode provides immediate access to core transcription features without requiring authentication. ### Features Available - **File Upload**: Process audio files up to 2GB - **Live Transcription**: Stream audio from microphone - **Basic Processing**: Transcription and diarization - **Temporary Storage**: Results available for 24 hours ### Limitations - No persistent storage - No meeting rooms - Limited to single-user sessions - No team collaboration features ### Use Cases - Quick transcription needs - Testing and evaluation - Individual users - Public demonstrations ## Private Mode Private mode unlocks the full potential of Reflector with authentication and persistent storage. ### Additional Features - **Virtual Meeting Rooms**: Whereby integration - **Team Collaboration**: Share transcripts with team - **Persistent Storage**: Long-term transcript archive - **Advanced Analytics**: Meeting insights and trends - **Custom Integration**: Webhooks and API access - **User Management**: Role-based access control ### Authentication Options #### Authentik Integration Enterprise-grade SSO with support for: - SAML 2.0 - OAuth 2.0 / OIDC - LDAP / Active Directory - Multi-factor authentication #### JWT Authentication Stateless token-based auth for: - API access - Service-to-service communication - Mobile applications ### Room Management Virtual rooms provide dedicated spaces for meetings: - **Persistent URLs**: Same link for recurring meetings - **Access Control**: Invite-only or open rooms - **Recording Consent**: Automatic consent management - **Custom Settings**: Per-room configuration ## Mode Selection The mode is determined by your deployment configuration: ```yaml # Public Mode (no authentication) REFLECTOR_AUTH_BACKEND=none # Private Mode (with authentication) REFLECTOR_AUTH_BACKEND=jwt # or REFLECTOR_AUTH_BACKEND=authentik ``` ## Feature Comparison | Feature | Public Mode | Private Mode | |---------|------------|--------------| | File Upload | ✅ | ✅ | | Live Transcription | ✅ | ✅ | | Speaker Diarization | ✅ | ✅ | | Translation | ✅ | ✅ | | Summarization | ✅ | ✅ | | Meeting Rooms | ❌ | ✅ | | Persistent Storage | ❌ | ✅ | | Team Collaboration | ❌ | ✅ | | API Access | Limited | Full | | User Management | ❌ | ✅ | | Custom Branding | ❌ | ✅ | | Analytics | ❌ | ✅ | | Webhooks | ❌ | ✅ | ## Security Considerations ### Public Mode Security - Rate limiting to prevent abuse - File size restrictions - Automatic cleanup of old data - No PII storage ### Private Mode Security - Encrypted data storage - Audit logging - Session management - Access control lists - Data retention policies ## Choosing the Right Mode ### Choose Public Mode if: - You need quick, one-time transcriptions - You're evaluating Reflector - You don't need persistent storage - You're processing non-sensitive content ### Choose Private Mode if: - You need team collaboration - You require persistent storage - You're processing sensitive content - You need meeting room functionality - You want advanced analytics EOF # Concepts - Independence cat > docs/concepts/independence.md << 'EOF' --- sidebar_position: 3 title: Data Independence --- # Data Independence & Privacy Reflector is designed with privacy and data independence as core principles, giving you complete control over your data and processing. ## Privacy by Design ### No Third-Party Data Sharing Your audio and transcripts are never shared with third parties: - **Local Processing**: All ML models can run on your infrastructure - **No Training on User Data**: Your content is never used to improve models - **Isolated Processing**: Each transcript is processed in isolation - **No Analytics Tracking**: No usage analytics sent to external services ### Data Ownership You maintain complete ownership of all data: - **Export Anytime**: Download all your transcripts and audio - **Delete on Demand**: Permanent deletion with no recovery - **API Access**: Full programmatic access to your data - **No Vendor Lock-in**: Standard formats for easy migration ## Processing Transparency ### What Happens to Your Audio 1. **Upload/Stream**: Audio received by your server 2. **Temporary Storage**: Stored only for processing duration 3. **Processing**: ML models process audio locally or on Modal 4. **Results Storage**: Transcripts stored in your database 5. **Cleanup**: Original audio deleted (unless configured otherwise) ### Local vs Cloud Processing #### Local Processing When configured for local processing: - All models run on your hardware - No data leaves your infrastructure - Complete air-gap capability - Higher hardware requirements #### Modal.com Processing When using Modal for GPU acceleration: - Audio chunks sent to Modal for processing - Processed immediately and deleted - No long-term storage on Modal - Modal's security: SOC 2 Type II compliant ### Data Retention Default retention policies: - **Public Mode**: 24 hours then automatic deletion - **Private Mode**: Configurable (default: indefinite) - **Audio Files**: Deleted after processing (configurable) - **Transcripts**: Retained based on policy ## Compliance Features ### GDPR Compliance - **Right to Access**: Export all user data - **Right to Deletion**: Permanent data removal - **Data Portability**: Standard export formats - **Privacy by Default**: Minimal data collection ### HIPAA Considerations For healthcare deployments: - **Self-hosted Option**: Complete infrastructure control - **Encryption**: At rest and in transit - **Audit Logging**: Complete access trail - **Access Controls**: Role-based permissions ### Industry Standards - **TLS 1.3**: Modern encryption for data in transit - **AES-256**: Encryption for data at rest - **JWT Tokens**: Secure, stateless authentication - **OWASP Guidelines**: Security best practices ## Self-Hosted Deployment ### Complete Independence Self-hosting provides maximum control: - **Your Infrastructure**: Run on your servers - **Your Network**: No external connections required - **Your Policies**: Implement custom retention - **Your Compliance**: Meet specific requirements ### Air-Gap Capability Reflector can run completely offline: 1. Download all models during setup 2. Configure for local processing only 3. Disable all external integrations 4. Run in isolated network environment ## Data Flow Control ### Configurable Processing Control where each step happens: ```yaml # All local processing TRANSCRIPT_BACKEND=local DIARIZATION_BACKEND=local TRANSLATION_BACKEND=local # Hybrid approach TRANSCRIPT_BACKEND=modal # Fast GPU processing DIARIZATION_BACKEND=local # Sensitive speaker data TRANSLATION_BACKEND=modal # Non-sensitive translation ``` ### Storage Options Choose where data is stored: - **Local Filesystem**: Complete control - **PostgreSQL**: Self-hosted database - **S3-Compatible**: MinIO or AWS with encryption - **Hybrid**: Different storage for different data types ## Security Architecture ### Defense in Depth Multiple layers of security: 1. **Network Security**: Firewalls and VPNs 2. **Application Security**: Input validation and sanitization 3. **Data Security**: Encryption and access controls 4. **Operational Security**: Logging and monitoring ### Zero Trust Principles - **Verify Everything**: All requests authenticated - **Least Privilege**: Minimal permissions granted - **Assume Breach**: Design for compromise containment - **Encrypt Everything**: No plaintext transmission ## Audit and Compliance ### Audit Logging Comprehensive logging of: - **Access Events**: Who accessed what and when - **Processing Events**: What was processed and how - **Configuration Changes**: System modifications - **Security Events**: Failed authentication attempts ### Compliance Reporting Generate reports for: - **Data Processing**: What data was processed - **Data Access**: Who accessed the data - **Data Retention**: What was retained or deleted - **Security Events**: Security-related incidents ## Best Practices ### For Maximum Privacy 1. **Self-host** all components 2. **Use local processing** for all models 3. **Implement short retention** periods 4. **Encrypt all storage** at rest 5. **Use VPN** for all connections 6. **Regular audits** of access logs ### For Balanced Approach 1. **Self-host core services** (database, API) 2. **Use Modal for processing** (faster, cost-effective) 3. **Implement encryption** everywhere 4. **Regular backups** with encryption 5. **Monitor access** patterns EOF # Concepts - Pipeline cat > docs/concepts/pipeline.md << 'EOF' --- sidebar_position: 4 title: Processing Pipeline --- # Processing Pipeline Reflector uses a sophisticated pipeline architecture to process audio efficiently and accurately. ## Pipeline Overview The processing pipeline consists of modular components that can be combined and configured based on your needs: ```mermaid graph LR A[Audio Input] --> B[Pre-processing] B --> C[Chunking] C --> D[Transcription] D --> E[Diarization] E --> F[Alignment] F --> G[Post-processing] G --> H[Output] ``` ## Pipeline Components ### Audio Input Accepts various input sources: - **File Upload**: MP3, WAV, M4A, WebM, MP4 - **WebRTC Stream**: Live browser audio - **Recording Integration**: Whereby recordings - **API Upload**: Direct API submission ### Pre-processing Prepares audio for optimal processing: - **Format Conversion**: Convert to 16kHz mono WAV - **Normalization**: Adjust volume to -23 LUFS - **Noise Reduction**: Optional background noise removal - **Validation**: Check duration and quality ### Chunking Splits audio for parallel processing: - **Fixed Size**: 30-second chunks by default - **Overlap**: 1-second overlap for continuity - **Smart Boundaries**: Attempt to split at silence - **Metadata**: Track chunk positions ### Transcription Converts speech to text: - **Model Selection**: Whisper or Parakeet - **Language Detection**: Automatic or specified - **Timestamp Generation**: Word-level timing - **Confidence Scores**: Quality indicators ### Diarization Identifies different speakers: - **Voice Activity Detection**: Find speech segments - **Speaker Embedding**: Extract voice characteristics - **Clustering**: Group similar voices - **Label Assignment**: Assign speaker IDs ### Alignment Merges all processing results: - **Chunk Assembly**: Combine transcription chunks - **Speaker Mapping**: Align speakers with text - **Overlap Resolution**: Handle chunk boundaries - **Timeline Creation**: Build unified timeline ### Post-processing Enhances the final output: - **Formatting**: Apply punctuation and capitalization - **Translation**: Convert to target languages - **Summarization**: Generate concise summaries - **Topic Extraction**: Identify key themes - **Action Items**: Extract tasks and decisions ## Processing Modes ### Batch Processing For uploaded files: - Optimized for throughput - Parallel chunk processing - Higher accuracy models - Complete file analysis ### Stream Processing For live audio: - Optimized for latency - Sequential processing - Real-time feedback - Progressive results ### Hybrid Processing For meetings: - Stream during meeting - Batch after completion - Best of both modes - Maximum accuracy ## Pipeline Configuration ### Model Selection Choose models based on requirements: ```python # High accuracy (slower) config = { "transcription_model": "whisper-large-v3", "diarization_model": "pyannote-3.1", "translation_model": "seamless-m4t-large" } # Balanced (default) config = { "transcription_model": "whisper-base", "diarization_model": "pyannote-3.1", "translation_model": "seamless-m4t-medium" } # Fast processing config = { "transcription_model": "whisper-tiny", "diarization_model": "pyannote-3.1-fast", "translation_model": "seamless-m4t-small" } ``` ### Processing Options Customize pipeline behavior: ```yaml # Parallel processing max_parallel_chunks: 10 chunk_size_seconds: 30 chunk_overlap_seconds: 1 # Quality settings enable_noise_reduction: true enable_normalization: true min_speech_confidence: 0.5 # Post-processing enable_translation: true target_languages: ["es", "fr", "de"] enable_summarization: true summary_length: "medium" ``` ## Performance Characteristics ### Processing Times For 1 hour of audio: | Pipeline Config | Processing Time | Accuracy | |----------------|-----------------|----------| | Fast | 2-3 minutes | 85-90% | | Balanced | 5-8 minutes | 92-95% | | High Accuracy | 15-20 minutes | 95-98% | ### Resource Usage | Component | CPU Usage | Memory | GPU | |-----------|-----------|---------|-----| | Transcription | Medium | 2-4 GB | Required | | Diarization | High | 4-8 GB | Required | | Translation | Low | 2-3 GB | Optional | | Post-processing | Low | 1-2 GB | Not needed | ## Pipeline Orchestration ### Celery Task Chain The pipeline is orchestrated using Celery: ```python chain = ( chunk_audio.s(audio_id) | group(transcribe_chunk.s(chunk) for chunk in chunks) | merge_transcriptions.s() | diarize_audio.s() | align_speakers.s() | post_process.s() ) ``` ### Error Handling Robust error recovery: - **Automatic Retry**: Failed tasks retry up to 3 times - **Partial Recovery**: Continue with successful chunks - **Fallback Models**: Use alternative models on failure - **Error Reporting**: Detailed error messages ### Progress Tracking Real-time progress updates: - **Chunk Progress**: Track individual chunk processing - **Overall Progress**: Percentage completion - **ETA Calculation**: Estimated completion time - **WebSocket Updates**: Live progress to clients ## Optimization Strategies ### GPU Utilization Maximize GPU efficiency: - **Batch Processing**: Process multiple chunks together - **Model Caching**: Keep models loaded in memory - **Dynamic Batching**: Adjust batch size based on GPU memory - **Multi-GPU Support**: Distribute across available GPUs ### Memory Management Efficient memory usage: - **Streaming Processing**: Process large files in chunks - **Garbage Collection**: Clean up after each chunk - **Memory Limits**: Prevent out-of-memory errors - **Disk Caching**: Use disk for large intermediate results ### Network Optimization Minimize network overhead: - **Compression**: Compress audio before transfer - **CDN Integration**: Use CDN for static assets - **Connection Pooling**: Reuse network connections - **Parallel Uploads**: Multiple concurrent uploads ## Quality Assurance ### Accuracy Metrics Monitor processing quality: - **Word Error Rate (WER)**: Transcription accuracy - **Diarization Error Rate (DER)**: Speaker identification accuracy - **Translation BLEU Score**: Translation quality - **Summary Coherence**: Summary quality metrics ### Validation Steps Ensure output quality: - **Confidence Thresholds**: Filter low-confidence segments - **Consistency Checks**: Verify timeline consistency - **Language Validation**: Ensure correct language detection - **Format Validation**: Check output format compliance ## Advanced Features ### Custom Models Use your own models: - **Fine-tuned Whisper**: Domain-specific models - **Custom Diarization**: Trained on your speakers - **Specialized Post-processing**: Industry-specific formatting ### Pipeline Extensions Add custom processing steps: - **Sentiment Analysis**: Analyze emotional tone - **Entity Extraction**: Identify people, places, organizations - **Custom Metrics**: Calculate domain-specific metrics - **Integration Hooks**: Call external services EOF # Create installation documentation cat > docs/installation/overview.md << 'EOF' --- sidebar_position: 1 title: Installation Overview --- # Installation Overview Reflector is designed for self-hosted deployment, giving you complete control over your infrastructure and data. ## Deployment Options ### Docker Deployment (Recommended) The easiest way to deploy Reflector: - Pre-configured containers - Automated dependency management - Consistent environment - Easy updates ### Manual Installation For custom deployments: - Greater control over configuration - Integration with existing infrastructure - Custom optimization options - Development environments ## Requirements ### System Requirements **Minimum Requirements:** - CPU: 4 cores - RAM: 8 GB - Storage: 50 GB - OS: Ubuntu 20.04+ or similar Linux **Recommended Requirements:** - CPU: 8+ cores - RAM: 16 GB - Storage: 100 GB SSD - GPU: NVIDIA GPU with 8GB+ VRAM (for local processing) ### Network Requirements - Public IP address (for WebRTC) - Ports: 80, 443, 8000, 3000 - Domain name (for SSL) - SSL certificate (Let's Encrypt supported) ## Required Services ### Core Services These services are required for basic operation: 1. **PostgreSQL** - Primary database 2. **Redis** - Message broker and cache 3. **Docker** - Container runtime ### GPU Processing Choose one: - **Modal.com** - Serverless GPU (recommended) - **Local GPU** - Self-hosted GPU processing ### Optional Services Enhance functionality with: - **AWS S3** - Long-term storage - **Whereby** - Video conferencing rooms - **Authentik** - Enterprise authentication - **Zulip** - Chat integration ## Quick Start ### Using Docker Compose 1. Clone the repository: ```bash git clone https://github.com/monadical-sas/reflector.git cd reflector ``` 2. Navigate to docker directory: ```bash cd docker ``` 3. Copy and configure environment: ```bash cp .env.example .env # Edit .env with your settings ``` 4. Start services: ```bash docker compose up -d ``` 5. Access Reflector: - Frontend: https://your-domain.com - API: https://your-domain.com/api ## Configuration Overview ### Essential Configuration ```env # Database DATABASE_URL=postgresql://user:pass@localhost/reflector # Redis REDIS_URL=redis://localhost:6379 # Modal.com (for GPU processing) TRANSCRIPT_MODAL_API_KEY=your-key DIARIZATION_MODAL_API_KEY=your-key # Domain DOMAIN=your-domain.com ``` ### Security Configuration ```env # Authentication REFLECTOR_AUTH_BACKEND=jwt NEXTAUTH_SECRET=generate-strong-secret # SSL (handled by Caddy) # Automatic with Let's Encrypt ``` ## Service Architecture ```mermaid graph TD A[Caddy Reverse Proxy] --> B[Frontend - Next.js] A --> C[Backend - FastAPI] C --> D[PostgreSQL] C --> E[Redis] C --> F[Celery Workers] F --> G[Modal.com GPU] ``` ## Next Steps 1. **Review Requirements**: [System Requirements](./requirements) 2. **Docker Setup**: [Docker Deployment Guide](./docker-setup) 3. **Configure Services**: - [Modal.com Setup](./modal-setup) - [Whereby Setup](./whereby-setup) - [AWS S3 Setup](./aws-setup) 4. **Optional Services**: - [Authentik Setup](./authentik-setup) - [Zulip Setup](./zulip-setup) ## Getting Help - [Troubleshooting Guide](../reference/troubleshooting) - [GitHub Issues](https://github.com/monadical-sas/reflector/issues) - [Community Discord](#) EOF chmod +x create-docs.sh echo "Documentation creation script ready. Run ./create-docs.sh to generate all docs."