feat: WIP doc (vibe started and iterated)

2025-11-24 20:39:22 -06:00
parent 37f0110892
commit 0ea7ffac89
61 changed files with 29834 additions and 0 deletions

docs/docs/concepts/modes.md Normal file

@@ -0,0 +1,127 @@
---
sidebar_position: 2
title: Operating Modes
---
# Operating Modes
Reflector operates in two distinct modes to accommodate different use cases and security requirements.
## Public Mode
Public mode provides immediate access to core transcription features without requiring authentication.
### Features Available
- **File Upload**: Process audio files up to 2GB
- **Live Transcription**: Stream audio from microphone
- **Basic Processing**: Transcription and diarization
- **Temporary Storage**: Results available for 24 hours
### Limitations
- No persistent storage
- No meeting rooms
- Limited to single-user sessions
- No team collaboration features
### Use Cases
- Quick transcription needs
- Testing and evaluation
- Individual users
- Public demonstrations
## Private Mode
Private mode unlocks the full potential of Reflector with authentication and persistent storage.
### Additional Features
- **Virtual Meeting Rooms**: Whereby integration
- **Team Collaboration**: Share transcripts with team
- **Persistent Storage**: Long-term transcript archive
- **Advanced Analytics**: Meeting insights and trends
- **Custom Integration**: Webhooks and API access
- **User Management**: Role-based access control
### Authentication Options
#### Authentik Integration
Enterprise-grade SSO with support for:
- SAML 2.0
- OAuth 2.0 / OIDC
- LDAP / Active Directory
- Multi-factor authentication
#### JWT Authentication
Stateless token-based auth (example request below) for:
- API access
- Service-to-service communication
- Mobile applications
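For example, an API client sends the token in an `Authorization` header on each request. A minimal sketch (the token value is a placeholder; the endpoint is the transcript listing documented in the API reference):
```python
import requests

BASE_URL = "http://localhost:8000/v1"
TOKEN = "your-jwt-here"  # placeholder: obtain from your identity provider

# List transcripts as an authenticated user
response = requests.get(
    f"{BASE_URL}/transcripts/",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
print(response.json())
```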
### Room Management
Virtual rooms provide dedicated spaces for meetings:
- **Persistent URLs**: Same link for recurring meetings
- **Access Control**: Invite-only or open rooms
- **Recording Consent**: Automatic consent management
- **Custom Settings**: Per-room configuration
## Mode Selection
The mode is determined by your deployment configuration:
```env
# Public Mode (no authentication)
REFLECTOR_AUTH_BACKEND=none
# Private Mode (with authentication)
REFLECTOR_AUTH_BACKEND=jwt
# or
REFLECTOR_AUTH_BACKEND=authentik
```
## Feature Comparison
| Feature | Public Mode | Private Mode |
|---------|------------|--------------|
| File Upload | ✅ | ✅ |
| Live Transcription | ✅ | ✅ |
| Speaker Diarization | ✅ | ✅ |
| Translation | ✅ | ✅ |
| Summarization | ✅ | ✅ |
| Meeting Rooms | ❌ | ✅ |
| Persistent Storage | ❌ | ✅ |
| Team Collaboration | ❌ | ✅ |
| API Access | Limited | Full |
| User Management | ❌ | ✅ |
| Custom Branding | ❌ | ✅ |
| Analytics | ❌ | ✅ |
| Webhooks | ❌ | ✅ |
## Security Considerations
### Public Mode Security
- Rate limiting to prevent abuse
- File size restrictions
- Automatic cleanup of old data
- No PII storage
### Private Mode Security
- Encrypted data storage
- Audit logging
- Session management
- Access control lists
- Data retention policies
## Choosing the Right Mode
### Choose Public Mode if:
- You need quick, one-time transcriptions
- You're evaluating Reflector
- You don't need persistent storage
- You're processing non-sensitive content
### Choose Private Mode if:
- You need team collaboration
- You require persistent storage
- You're processing sensitive content
- You need meeting room functionality
- You want advanced analytics


@@ -0,0 +1,194 @@
---
sidebar_position: 1
title: Architecture Overview
---
# Architecture Overview
Reflector is built as a modern, scalable, microservices-based application designed to handle audio processing workloads efficiently while maintaining data privacy and control.
## System Components
### Frontend Application
The user interface is built with **Next.js 14** using the App Router pattern, providing:
- Server-side rendering for optimal performance
- Real-time WebSocket connections for live transcription
- WebRTC support for audio streaming
- Responsive design with Chakra UI components
### Backend API Server
The core API is powered by **FastAPI**, a modern Python framework that provides:
- High-performance async request handling
- Automatic OpenAPI documentation generation
- Type safety with Pydantic models
- WebSocket support for real-time updates
### Processing Pipeline
Audio processing is handled through a modular pipeline architecture:
```
Audio Input → Chunking → Transcription → Diarization → Post-Processing → Storage
```
Each step can run independently and in parallel, allowing for:
- Scalable processing of large files
- Real-time streaming capabilities
- Fault tolerance and retry mechanisms
### Worker Architecture
Background tasks are managed by **Celery** workers with **Redis** as the message broker (see the sketch after this list):
- Distributed task processing
- Priority queues for time-sensitive operations
- Automatic retry on failure
- Progress tracking and notifications
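As a sketch of what such a worker task can look like (the task name, queue, and retry settings here are illustrative assumptions, not Reflector's actual definitions):
```python
from celery import Celery

app = Celery("reflector", broker="redis://localhost:6379/0")

@app.task(
    bind=True,
    queue="transcription",  # hypothetical priority queue
    max_retries=3,          # automatic retry on failure
    default_retry_delay=60,
)
def transcribe_chunk(self, chunk_id: str) -> dict:
    try:
        # ... run the transcription model on the chunk ...
        return {"chunk_id": chunk_id, "status": "done"}
    except Exception as exc:
        raise self.retry(exc=exc)
```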
### GPU Acceleration
ML models run on GPU-accelerated infrastructure:
- **Modal.com** for serverless GPU processing
- Support for local GPU deployment (coming soon)
- Automatic scaling based on demand
- Cost-effective pay-per-use model
## Data Flow
### File Processing Flow
1. **Upload**: User uploads audio file through web interface
2. **Storage**: File stored temporarily or in S3
3. **Queue**: Processing job added to Celery queue
4. **Chunking**: Audio split into 30-second segments
5. **Parallel Processing**: Chunks processed simultaneously
6. **Assembly**: Results merged and aligned
7. **Post-Processing**: Summary, topics, translation
8. **Delivery**: Results stored and user notified
### Live Streaming Flow
1. **WebRTC Connection**: Browser establishes peer connection
2. **Audio Capture**: Microphone audio streamed to server
3. **Buffering**: Audio buffered for processing
4. **VAD**: Voice activity detection segments speech
5. **Real-time Processing**: Segments transcribed immediately
6. **WebSocket Updates**: Results streamed back to client
7. **Continuous Assembly**: Full transcript built progressively
## Deployment Architecture
### Container-Based Deployment
All components are containerized for consistent deployment:
```yaml
services:
frontend: # Next.js application
backend: # FastAPI server
worker: # Celery workers
redis: # Message broker
postgres: # Database
caddy: # Reverse proxy
```
### Networking
- **Host Network Mode**: Required for WebRTC/ICE compatibility
- **Caddy Reverse Proxy**: Handles SSL termination and routing
- **WebSocket Upgrade**: Supports real-time connections
## Scalability Considerations
### Horizontal Scaling
- **Stateless Backend**: Multiple API server instances
- **Worker Pools**: Add workers based on queue depth
- **Database Pooling**: Connection management for concurrent access
### Vertical Scaling
- **GPU Workers**: Scale up for faster model inference
- **Memory Optimization**: Efficient audio buffering
- **CPU Optimization**: Multi-threaded processing where applicable
## Security Architecture
### Authentication & Authorization
- **JWT Tokens**: Stateless authentication
- **Authentik Integration**: Enterprise SSO support
- **Role-Based Access**: Granular permissions
### Data Protection
- **Encryption at Rest**: Database and S3 encryption
- **Encryption in Transit**: TLS for all connections
- **Temporary Storage**: Automatic cleanup of processed files
### Privacy by Design
- **Local Processing**: Option to process entirely on-premises
- **No Training on User Data**: Models are pre-trained
- **Data Isolation**: Multi-tenant data separation
## Integration Points
### External Services
- **Modal.com**: GPU processing
- **AWS S3**: Long-term storage
- **Whereby**: Video conferencing rooms
- **Zulip**: Chat integration (optional)
### APIs and Webhooks
- **RESTful API**: Standard CRUD operations
- **WebSocket API**: Real-time updates
- **Webhook Notifications**: Processing completion events
- **OpenAPI Specification**: Machine-readable API definition
## Performance Optimization
### Caching Strategy
- **Redis Cache**: Frequently accessed data
- **CDN**: Static asset delivery
- **Browser Cache**: Client-side optimization
### Database Optimization
- **Indexed Queries**: Fast search and retrieval
- **Connection Pooling**: Efficient resource usage
- **Query Optimization**: N+1 query prevention (see the sketch below)
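Eager loading is the usual fix for N+1 queries. A sketch with SQLAlchemy (the `Transcript`/`Segment` models are hypothetical stand-ins, not Reflector's schema):
```python
from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (DeclarativeBase, Mapped, Session,
                            mapped_column, relationship, selectinload)

class Base(DeclarativeBase):
    pass

class Transcript(Base):
    __tablename__ = "transcripts"
    id: Mapped[int] = mapped_column(primary_key=True)
    segments: Mapped[list["Segment"]] = relationship()

class Segment(Base):
    __tablename__ = "segments"
    id: Mapped[int] = mapped_column(primary_key=True)
    transcript_id: Mapped[int] = mapped_column(ForeignKey("transcripts.id"))

# One batched IN-query loads all segments, instead of one query per transcript
engine = create_engine("postgresql://localhost/reflector")
with Session(engine) as session:
    stmt = select(Transcript).options(selectinload(Transcript.segments)).limit(50)
    transcripts = session.execute(stmt).scalars().all()
```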
### Processing Optimization
- **Batch Processing**: Efficient GPU utilization
- **Parallel Execution**: Multi-core CPU usage
- **Stream Processing**: Reduced memory footprint
## Monitoring and Observability
### Metrics Collection
- **Application Metrics**: Request rates, response times
- **System Metrics**: CPU, memory, disk usage
- **Business Metrics**: Transcription accuracy, processing times
### Logging
- **Structured Logging**: JSON format for analysis
- **Log Aggregation**: Centralized log management
- **Error Tracking**: Sentry integration
### Health Checks
- **Liveness Probes**: Component availability
- **Readiness Probes**: Service readiness
- **Dependency Checks**: External service status
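A minimal sketch of liveness and readiness endpoints, assuming FastAPI (the route paths and dependency checks are illustrative, not Reflector's actual routes):
```python
from fastapi import FastAPI, Response

app = FastAPI()

async def check_database() -> bool:
    # Placeholder: replace with a real `SELECT 1` against PostgreSQL
    return True

async def check_redis() -> bool:
    # Placeholder: replace with a real PING against Redis
    return True

@app.get("/health/live")
async def liveness() -> dict:
    # Liveness: the process is up and able to serve requests
    return {"status": "ok"}

@app.get("/health/ready")
async def readiness(response: Response) -> dict:
    # Readiness: external dependencies are reachable
    checks = {"database": await check_database(), "redis": await check_redis()}
    if not all(checks.values()):
        response.status_code = 503  # tell the orchestrator to withhold traffic
    return {"checks": checks}
```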


@@ -0,0 +1,274 @@
---
sidebar_position: 4
title: Processing Pipeline
---
# Processing Pipeline
Reflector uses a modular pipeline architecture to process audio efficiently and accurately.
## Pipeline Overview
The processing pipeline consists of modular components that can be combined and configured based on your needs:
```mermaid
graph LR
A[Audio Input] --> B[Pre-processing]
B --> C[Chunking]
C --> D[Transcription]
D --> E[Diarization]
E --> F[Alignment]
F --> G[Post-processing]
G --> H[Output]
```
## Pipeline Components
### Audio Input
Accepts various input sources:
- **File Upload**: MP3, WAV, M4A, WebM, MP4
- **WebRTC Stream**: Live browser audio
- **Recording Integration**: Whereby recordings
- **API Upload**: Direct API submission
### Pre-processing
Prepares audio for optimal processing:
- **Format Conversion**: Convert to 16kHz mono WAV
- **Normalization**: Adjust volume to -23 LUFS
- **Noise Reduction**: Optional background noise removal
- **Validation**: Check duration and quality
### Chunking
Splits audio for parallel processing (boundary arithmetic sketched below):
- **Fixed Size**: 30-second chunks by default
- **Overlap**: 1-second overlap for continuity
- **Silence Detection**: Attempt to split at silence
- **Metadata**: Track chunk positions
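The boundary arithmetic is simple: each chunk starts `size - overlap` seconds after the previous one. A sketch (illustrative, not the actual implementation):
```python
def chunk_boundaries(duration: float, size: float = 30.0, overlap: float = 1.0):
    """Yield (start, end) times so consecutive chunks share `overlap` seconds."""
    step = size - overlap
    start = 0.0
    while start < duration:
        yield start, min(start + size, duration)
        start += step

# A 95-second file yields chunks at 0-30, 29-59, 58-88, 87-95
print(list(chunk_boundaries(95.0)))
```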
### Transcription
Converts speech to text:
- **Model Selection**: Whisper or Parakeet
- **Language Detection**: Automatic or specified
- **Timestamp Generation**: Word-level timing
- **Confidence Scores**: Quality indicators
### Diarization
Identifies different speakers:
- **Voice Activity Detection**: Find speech segments
- **Speaker Embedding**: Extract voice characteristics
- **Clustering**: Group similar voices
- **Label Assignment**: Assign speaker IDs
### Alignment
Merges all processing results:
- **Chunk Assembly**: Combine transcription chunks
- **Speaker Mapping**: Align speakers with text
- **Overlap Resolution**: Handle chunk boundaries
- **Timeline Creation**: Build unified timeline
### Post-processing
Enhances the final output:
- **Formatting**: Apply punctuation and capitalization
- **Translation**: Convert to target languages
- **Summarization**: Generate concise summaries
- **Topic Extraction**: Identify key themes
- **Action Items**: Extract tasks and decisions
## Processing Modes
### Batch Processing
For uploaded files:
- Optimized for throughput
- Parallel chunk processing
- Higher accuracy models
- Complete file analysis
### Stream Processing
For live audio:
- Optimized for latency
- Sequential processing
- Real-time feedback
- Progressive results
### Hybrid Processing
For meetings:
- Stream during meeting
- Batch after completion
- Best of both modes
- Maximum accuracy
## Pipeline Configuration
### Model Selection
Choose models based on requirements:
```python
# High accuracy (slower)
config = {
"transcription_model": "whisper-large-v3",
"diarization_model": "pyannote-3.1",
"translation_model": "seamless-m4t-large"
}
# Balanced (default)
config = {
"transcription_model": "whisper-base",
"diarization_model": "pyannote-3.1",
"translation_model": "seamless-m4t-medium"
}
# Fast processing
config = {
"transcription_model": "whisper-tiny",
"diarization_model": "pyannote-3.1-fast",
"translation_model": "seamless-m4t-small"
}
```
### Processing Options
Customize pipeline behavior:
```yaml
# Parallel processing
max_parallel_chunks: 10
chunk_size_seconds: 30
chunk_overlap_seconds: 1
# Quality settings
enable_noise_reduction: true
enable_normalization: true
min_speech_confidence: 0.5
# Post-processing
enable_translation: true
target_languages: ["es", "fr", "de"]
enable_summarization: true
summary_length: "medium"
```
## Performance Characteristics
### Processing Times
For 1 hour of audio:
| Pipeline Config | Processing Time | Accuracy |
|----------------|-----------------|----------|
| Fast | 2-3 minutes | 85-90% |
| Balanced | 5-8 minutes | 92-95% |
| High Accuracy | 15-20 minutes | 95-98% |
### Resource Usage
| Component | CPU Usage | Memory | GPU |
|-----------|-----------|---------|-----|
| Transcription | Medium | 2-4 GB | Required |
| Diarization | High | 4-8 GB | Required |
| Translation | Low | 2-3 GB | Optional |
| Post-processing | Low | 1-2 GB | Not needed |
## Pipeline Orchestration
### Celery Task Chain
The pipeline is orchestrated using Celery:
```python
chain = (
chunk_audio.s(audio_id) |
group(transcribe_chunk.s(chunk) for chunk in chunks) |
merge_transcriptions.s() |
diarize_audio.s() |
align_speakers.s() |
post_process.s()
)
```
### Error Handling
Error recovery:
- **Automatic Retry**: Failed tasks retry up to 3 times
- **Partial Recovery**: Continue with successful chunks
- **Fallback Models**: Use alternative models on failure
- **Error Reporting**: Detailed error messages
### Progress Tracking
Real-time progress updates (sketch after this list):
- **Chunk Progress**: Track individual chunk processing
- **Overall Progress**: Percentage completion
- **ETA Calculation**: Estimated completion time
- **WebSocket Updates**: Live progress to clients
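Within a Celery task, progress can be surfaced with `update_state` and relayed to clients (a sketch; the `meta` fields are illustrative assumptions):
```python
from celery import Celery

app = Celery("reflector", broker="redis://localhost:6379/0")

@app.task(bind=True)
def process_chunks(self, chunk_ids: list[str]) -> None:
    total = len(chunk_ids)
    for done, chunk_id in enumerate(chunk_ids, start=1):
        # ... process chunk_id ...
        self.update_state(
            state="PROGRESS",
            meta={"current": done, "total": total,
                  "percent": round(100 * done / total, 1)},
        )
```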
## Optimization Strategies
### GPU Utilization
Maximize GPU efficiency:
- **Batch Processing**: Process multiple chunks together
- **Model Caching**: Keep models loaded in memory
- **Dynamic Batching**: Adjust batch size based on GPU memory
- **Multi-GPU Support**: Distribute across available GPUs
### Memory Management
Efficient memory usage:
- **Streaming Processing**: Process large files in chunks
- **Garbage Collection**: Clean up after each chunk
- **Memory Limits**: Prevent out-of-memory errors
- **Disk Caching**: Use disk for large intermediate results
### Network Optimization
Minimize network overhead:
- **Compression**: Compress audio before transfer
- **CDN Integration**: Use CDN for static assets
- **Connection Pooling**: Reuse network connections
- **Parallel Uploads**: Multiple concurrent uploads
## Quality Assurance
### Accuracy Metrics
Monitor processing quality (WER sketch below):
- **Word Error Rate (WER)**: Transcription accuracy
- **Diarization Error Rate (DER)**: Speaker identification accuracy
- **Translation BLEU Score**: Translation quality
- **Summary Coherence**: Summary quality metrics
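WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words: (substitutions + deletions + insertions) / N. A minimal sketch:
```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat down", "the cat sit"))  # 0.5
```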
### Validation Steps
Ensure output quality:
- **Confidence Thresholds**: Filter low-confidence segments
- **Consistency Checks**: Verify timeline consistency
- **Language Validation**: Ensure correct language detection
- **Format Validation**: Check output format compliance
## Advanced Features
### Custom Models
Use your own models:
- **Fine-tuned Whisper**: Domain-specific models
- **Custom Diarization**: Trained on your speakers
- **Specialized Post-processing**: Industry-specific formatting
### Pipeline Extensions
Add custom processing steps:
- **Sentiment Analysis**: Analyze emotional tone
- **Entity Extraction**: Identify people, places, organizations
- **Custom Metrics**: Calculate domain-specific metrics
- **Integration Hooks**: Call external services


@@ -0,0 +1,7 @@
---
title: authentik setup
---
# authentik setup
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: aws setup
---
# aws setup
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,23 @@
---
sidebar_position: 3
title: Docker Deployment
---
# Docker Deployment
See the [Docker directory](https://github.com/monadical-sas/reflector/tree/main/docker) in the repository for the complete Docker deployment configuration.
## Quick Start
1. Clone the repository
2. Navigate to `/docker` directory
3. Copy `.env.example` to `.env`
4. Configure environment variables
5. Run `docker compose up -d`
## Configuration
Check the repository for:
- `docker-compose.yml` - Service definitions
- `.env.example` - Environment variables
- `Caddyfile` - Reverse proxy configuration


@@ -0,0 +1,7 @@
---
title: modal setup
---
# modal setup
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,162 @@
---
sidebar_position: 1
title: Installation Overview
---
# Installation Overview
Reflector is designed for self-hosted deployment, giving you complete control over your infrastructure and data.
## Deployment Options
### Docker Deployment (Recommended)
The easiest way to deploy Reflector:
- Pre-configured containers
- Automated dependency management
- Consistent environment
- Easy updates
### Manual Installation
For custom deployments:
- Greater control over configuration
- Integration with existing infrastructure
- Custom optimization options
- Development environments
## Requirements
### System Requirements
**Minimum Requirements:**
- CPU: 4 cores
- RAM: 8 GB
- Storage: 50 GB
- OS: Ubuntu 20.04+ or similar Linux
**Recommended Requirements:**
- CPU: 8+ cores
- RAM: 16 GB
- Storage: 100 GB SSD
- GPU: NVIDIA GPU with 8GB+ VRAM (for local processing)
### Network Requirements
- Public IP address (for WebRTC)
- Ports: 80, 443, 8000, 3000
- Domain name (for SSL)
- SSL certificate (Let's Encrypt supported)
## Required Services
### Core Services
These services are required for basic operation:
1. **PostgreSQL** - Primary database
2. **Redis** - Message broker and cache
3. **Docker** - Container runtime
### GPU Processing
Choose one:
- **Modal.com** - Serverless GPU (recommended)
- **Local GPU** - Self-hosted GPU processing
### Optional Services
Enhance functionality with:
- **AWS S3** - Long-term storage
- **Whereby** - Video conferencing rooms
- **Authentik** - Enterprise authentication
- **Zulip** - Chat integration
## Quick Start
### Using Docker Compose
1. Clone the repository:
```bash
git clone https://github.com/monadical-sas/reflector.git
cd reflector
```
2. Navigate to docker directory:
```bash
cd docker
```
3. Copy and configure environment:
```bash
cp .env.example .env
# Edit .env with your settings
```
4. Start services:
```bash
docker compose up -d
```
5. Access Reflector:
- Frontend: https://your-domain.com
- API: https://your-domain.com/api
## Configuration Overview
### Essential Configuration
```env
# Database
DATABASE_URL=postgresql://user:pass@localhost/reflector
# Redis
REDIS_URL=redis://localhost:6379
# Modal.com (for GPU processing)
TRANSCRIPT_MODAL_API_KEY=your-key
DIARIZATION_MODAL_API_KEY=your-key
# Domain
DOMAIN=your-domain.com
```
### Security Configuration
```env
# Authentication
REFLECTOR_AUTH_BACKEND=jwt
NEXTAUTH_SECRET=generate-strong-secret
# SSL (handled by Caddy)
# Automatic with Let's Encrypt
```
## Service Architecture
```mermaid
graph TD
A[Caddy Reverse Proxy] --> B[Frontend - Next.js]
A --> C[Backend - FastAPI]
C --> D[PostgreSQL]
C --> E[Redis]
C --> F[Celery Workers]
F --> G[Modal.com GPU]
```
## Next Steps
1. **Review Requirements**: [System Requirements](./requirements)
2. **Docker Setup**: [Docker Deployment Guide](./docker-setup)
3. **Configure Services**:
- [Modal.com Setup](./modal-setup)
- [Whereby Setup](./whereby-setup)
- [AWS S3 Setup](./aws-setup)
4. **Optional Services**:
- [Authentik Setup](./authentik-setup)
- [Zulip Setup](./zulip-setup)
## Getting Help
- [GitHub Issues](https://github.com/monadical-sas/reflector/issues)
- [Community Discord](#)


@@ -0,0 +1,29 @@
---
sidebar_position: 2
title: System Requirements
---
# System Requirements
## Minimum Requirements
- **CPU**: 4 cores
- **RAM**: 8 GB
- **Storage**: 50 GB SSD
- **OS**: Ubuntu 20.04+ or compatible Linux
- **Network**: Public IP address
## Recommended Requirements
- **CPU**: 8+ cores
- **RAM**: 16 GB
- **Storage**: 100 GB SSD
- **GPU**: NVIDIA GPU with 8GB+ VRAM (for local processing)
- **Network**: 1 Gbps connection
## Software Requirements
- Docker Engine 20.10+
- Docker Compose 2.0+
- Node.js 18+ (for frontend development)
- Python 3.11+ (for backend development)


@@ -0,0 +1,7 @@
---
title: whereby setup
---
# whereby setup
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: zulip setup
---
# zulip setup
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

docs/docs/intro.md Normal file

@@ -0,0 +1,61 @@
---
sidebar_position: 1
title: Introduction
---
# Welcome to Reflector
Reflector is a privacy-focused, self-hosted, AI-powered audio transcription and meeting analysis platform. It provides real-time transcription, speaker diarization, translation, and summarization for audio content and live meetings, with complete control over your data and infrastructure. Running models on your own hardware is on the roadmap; GPU processing currently runs on Modal.com.
## What is Reflector?
Reflector is a web application that utilizes AI to process audio content, providing:
- **Real-time Transcription**: Convert speech to text using [Whisper](https://github.com/openai/whisper) (multi-language) or [Parakeet](https://github.com/NVIDIA/NeMo) (English) models
- **Speaker Diarization**: Identify and label different speakers using [Pyannote](https://github.com/pyannote/pyannote-audio) 3.1
- **Live Translation**: Translate audio content in real-time to 100+ languages with [Facebook Seamless-M4T](https://github.com/facebookresearch/seamless_communication)
- **Topic Detection & Summarization**: Extract key topics and generate concise summaries using LLMs
- **Meeting Recording**: Create permanent records of meetings with searchable transcripts
## Features
| Feature | Public Mode | Private Mode |
|---------|------------|--------------|
| **Authentication** | None required | Required |
| **Audio Upload** | ✅ | ✅ |
| **Live Microphone Streaming** | ✅ | ✅ |
| **Transcription** | ✅ | ✅ |
| **Speaker Diarization** | ✅ | ✅ |
| **Translation** | ✅ | ✅ |
| **Topic Detection** | ✅ | ✅ |
| **Summarization** | ✅ | ✅ |
| **Virtual Meeting Rooms (Whereby)** | ❌ | ✅ |
| **Browse Transcripts Page** | ❌ | ✅ |
| **Search Functionality** | ❌ | ✅ |
| **Persistent Storage** | ❌ | ✅ |
## Architecture Overview
Reflector consists of three main components:
- **Frontend**: React application built with Next.js 14
- **Backend**: Python server using FastAPI
- **Processing**: Scalable GPU workers for ML inference (Modal.com or local)
## Getting Started
Ready to deploy Reflector? Head over to our [Installation Guide](./installation/overview) to set up your own instance.
For a quick overview of how Reflector processes audio, check out our [Pipeline Documentation](./pipelines/overview).
## Open Source
Reflector is open source software developed by [Monadical](https://monadical.com) and licensed under the **MIT License**. We welcome contributions from the community!
- [GitHub Repository](https://github.com/monadical-sas/reflector)
- [Issue Tracker](https://github.com/monadical-sas/reflector/issues)
- [Pull Requests](https://github.com/monadical-sas/reflector/pulls)
## Support
Need help? Reach out to the community through GitHub Discussions.


@@ -0,0 +1,348 @@
---
sidebar_position: 2
title: File Processing Pipeline
---
# File Processing Pipeline
The file processing pipeline handles uploaded audio files, optimizing for accuracy and throughput.
## Pipeline Stages
### 1. Input Stage
**Accepted Formats:**
- MP3 (most common)
- WAV (uncompressed)
- M4A (Apple format)
- WebM (browser recordings)
- MP4 (video with audio track)
**File Validation:**
- Maximum size: 2GB (configurable)
- Minimum duration: 5 seconds
- Maximum duration: 6 hours
- Sample rate: Any (will be resampled)
### 2. Pre-processing
**Audio Normalization:**
Audio is converted to a standard format:
- Sample rate: 16kHz (Whisper requirement)
- Channels: Mono
- Bit depth: 16-bit
- Format: WAV
**Volume Normalization:**
- Target: -23 LUFS (broadcast standard)
- Prevents clipping
- Improves transcription accuracy
**Noise Reduction (Optional):**
- Background noise removal
- Echo cancellation
- High-pass filter for rumble
### 3. Chunking Strategy
**Default Configuration:**
```yaml
chunk_size: 30 # seconds
overlap: 1 # seconds
max_parallel: 10
silence_detection: true
```
**Chunking with Silence Detection:**
- Detects silence periods
- Attempts to break at natural pauses
- Maintains context with overlap
- Preserves sentence boundaries
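One way to break at natural pauses is to look for the lowest-energy frame near the nominal boundary. A sketch (illustrative; the real pipeline's voice activity detection is more sophisticated):
```python
import numpy as np

def best_split_sample(audio: np.ndarray, target: int,
                      window: int = 16000, frame: int = 320) -> int:
    """Pick the quietest frame within `window` samples of `target` (16kHz mono)."""
    lo = max(0, target - window)
    hi = min(len(audio) - frame, target + window)
    # RMS energy per frame; the minimum is the most silence-like spot
    rms = [np.sqrt(np.mean(audio[i:i + frame] ** 2)) for i in range(lo, hi, frame)]
    return lo + int(np.argmin(rms)) * frame

# Example: split a 60-second clip near the 30-second mark
audio = np.random.randn(16000 * 60).astype(np.float32)
print(best_split_sample(audio, target=16000 * 30))
```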
**Chunk Metadata:**
```json
{
"chunk_id": "chunk_001",
"start_time": 0.0,
"end_time": 30.0,
"duration": 30.0,
"has_speech": true,
"audio_hash": "sha256:..."
}
```
### 4. Transcription Processing
**Whisper Models:**
| Model | Size | Speed | Accuracy | Use Case |
|-------|------|-------|----------|----------|
| tiny | 39M | Very Fast | 85% | Quick drafts |
| base | 74M | Fast | 89% | Good balance |
| small | 244M | Medium | 91% | Better accuracy |
| medium | 769M | Slow | 93% | High quality |
| large-v3 | 1550M | Very Slow | 96% | Best quality |
**Processing Configuration:**
```python
transcription_config = {
"model": "whisper-base",
"language": "auto", # or specify: "en", "es", etc.
"task": "transcribe", # or "translate"
"temperature": 0, # deterministic
"compression_ratio_threshold": 2.4,
"no_speech_threshold": 0.6,
"condition_on_previous_text": True,
"initial_prompt": None, # optional context
}
```
**Parallel Processing:**
- Each chunk processed independently
- GPU batching for efficiency
- Automatic load balancing
- Failure isolation
### 5. Diarization (Speaker Identification)
**Pyannote 3.1 Pipeline:**
1. **Voice Activity Detection (VAD)**
- Identifies speech segments
- Filters out silence and noise
- Precision: 95%+
2. **Speaker Embedding**
- Extracts voice characteristics
- 256-dimensional vectors
- Speaker-invariant features
3. **Clustering**
- Groups similar voice embeddings
- Agglomerative clustering
- Automatic speaker count detection
4. **Segmentation**
- Assigns speaker labels to time segments
- Handles overlapping speech
- Minimum segment duration: 0.5s
**Configuration:**
```python
diarization_config = {
"min_speakers": 1,
"max_speakers": 10,
"min_duration": 0.5,
"clustering": "AgglomerativeClustering",
"embedding_model": "speechbrain/spkrec-ecapa-voxceleb",
}
```
### 6. Alignment & Merging
**Chunk Assembly:**
```python
# Merge overlapping segments into a single timeline
merged_transcript = []
previous = None
for chunk in chunks:
    # Drop text duplicated in the chunk-overlap region
    if previous is not None and chunk.start < previous.end:
        chunk.text = resolve_overlap(previous, chunk)
    merged_transcript.append(chunk)
    previous = chunk
```
**Speaker Alignment:**
- Map diarization timeline to transcript
- Resolve speaker changes mid-sentence
- Handle multiple speakers per segment
**Quality Checks:**
- Timeline consistency
- No gaps in transcript
- Speaker label continuity
- Confidence score validation
### 7. Post-processing Chain
**Text Formatting:**
- Sentence capitalization
- Punctuation restoration
- Number formatting
- Acronym detection
**Translation (Optional):**
```python
translation_config = {
"model": "facebook/seamless-m4t-medium",
"source_lang": "auto",
"target_langs": ["es", "fr", "de"],
"preserve_formatting": True
}
```
**Topic Detection:**
- LLM-based analysis
- Extract 3-5 key topics
- Keyword extraction
- Entity recognition
**Summarization:**
```python
summary_config = {
"model": "openai-compatible",
"max_length": 500,
"style": "bullets", # or "paragraph"
"include_action_items": True,
"include_decisions": True
}
```
### 8. Storage & Delivery
**Database Storage:**
```sql
-- Main transcript record
INSERT INTO transcripts (
id, title, duration, language,
transcript_text, transcript_json,
speakers, topics, summary,
created_at, processing_time
) VALUES (...);
-- Processing metadata
INSERT INTO processing_metadata (
transcript_id, model_versions,
chunk_count, total_chunks,
error_count, warnings
) VALUES (...);
```
**File Storage:**
- Original audio: S3 (optional)
- Processed chunks: Temporary (24h)
- Transcript exports: JSON, SRT, VTT, TXT
**Notification:**
```json
{
"type": "webhook",
"url": "https://your-app.com/webhook",
"payload": {
"transcript_id": "...",
"status": "completed",
"duration": 3600,
"processing_time": 180
}
}
```
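On the receiving end, a minimal handler might look like this (a sketch assuming FastAPI; field names follow the payload above, and signature verification is omitted):
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CompletionEvent(BaseModel):
    transcript_id: str
    status: str
    duration: int          # audio length in seconds
    processing_time: int   # wall-clock processing seconds

@app.post("/webhook")
async def on_transcript_completed(event: CompletionEvent) -> dict:
    if event.status == "completed":
        # e.g. fetch the transcript and notify your team
        print(f"Transcript {event.transcript_id} ready in {event.processing_time}s")
    return {"received": True}
```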
## Processing Times
**Estimated times for 1 hour of audio:**
| Component | Fast Mode | Balanced | High Quality |
|-----------|-----------|----------|--------------|
| Pre-processing | 10s | 10s | 10s |
| Transcription | 60s | 180s | 600s |
| Diarization | 30s | 60s | 120s |
| Post-processing | 20s | 30s | 60s |
| **Total** | **2 min** | **5 min** | **13 min** |
## Error Handling
### Retry Strategy
```python
@celery.task(
    bind=True,
    max_retries=3,
    default_retry_delay=60,
    retry_backoff=True,
)
def process_chunk(self, chunk_id):
    try:
        # Process the chunk and return the result
        return transcribe(chunk_id)
    except Exception as exc:
        # Retry with exponential backoff, up to max_retries
        raise self.retry(exc=exc)
```
### Partial Recovery
- Continue with successful chunks
- Mark failed chunks in output
- Provide partial transcript
- Report processing issues
### Fallback Options
1. **Model Fallback:**
- If large model fails, try medium
- If GPU fails, try CPU
- If Modal fails, try local
2. **Quality Degradation:**
- Reduce chunk size
- Disable post-processing
- Skip diarization if needed
## Optimization Tips
### For Speed
1. Use smaller models (tiny/base)
2. Increase parallel chunks
3. Disable diarization
4. Skip post-processing
5. Use GPU acceleration
### For Accuracy
1. Use larger models (medium/large)
2. Enable all pre-processing
3. Reduce chunk size
4. Enable silence detection
5. Multiple pass processing
### For Cost
1. Use Modal spot instances
2. Batch multiple files
3. Cache common phrases
4. Optimize chunk size
5. Selective post-processing
## Monitoring
### Metrics to Track
```python
# Metric name → instrument type (e.g. Prometheus)
metrics = {
    "processing_time": "histogram",
    "chunk_success_rate": "gauge",
    "model_accuracy": "histogram",
    "queue_depth": "gauge",
    "gpu_utilization": "gauge",
    "cost_per_hour": "counter",
}
```
### Quality Metrics
- Word Error Rate (WER)
- Diarization Error Rate (DER)
- Confidence scores
- Processing speed
- User feedback
### Alerts
- Processing time > 30 minutes
- Error rate > 5%
- Queue depth > 100
- GPU memory > 90%
- Cost spike detected


@@ -0,0 +1,7 @@
---
title: live pipeline
---
# live pipeline
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: overview
---
# overview
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

docs/docs/reference/api.md Normal file

@@ -0,0 +1,448 @@
---
title: API Reference
---
# API Reference
The Reflector API provides a comprehensive RESTful interface for audio transcription, meeting management, and real-time streaming capabilities.
## Base URL
```
http://localhost:8000/v1
```
All API endpoints are prefixed with `/v1/` for versioning.
## Authentication
Reflector supports multiple authentication modes:
- **No Authentication** (Public Mode): Basic transcription and upload functionality
- **JWT Authentication** (Private Mode): Full feature access including meeting rooms and persistent storage
- **OAuth/OIDC via Authentik**: Enterprise single sign-on integration
## Core Endpoints
### Transcripts
Manage audio transcriptions and their associated metadata.
#### List Transcripts
```http
GET /v1/transcripts/
```
Returns a paginated list of transcripts with filtering options.
#### Create Transcript
```http
POST /v1/transcripts/
```
Create a new transcript from uploaded audio or initialize for streaming.
#### Get Transcript
```http
GET /v1/transcripts/{transcript_id}
```
Retrieve detailed information about a specific transcript.
#### Update Transcript
```http
PATCH /v1/transcripts/{transcript_id}
```
Update transcript metadata, summary, or processing status.
#### Delete Transcript
```http
DELETE /v1/transcripts/{transcript_id}
```
Remove a transcript and its associated data.
### Audio Processing
#### Upload Audio
```http
POST /v1/transcripts_audio/{transcript_id}/upload
```
Upload an audio file for transcription processing.
**Supported formats:**
- WAV, MP3, M4A, FLAC, OGG
- Maximum file size: 500MB
- Sample rates: 8kHz - 48kHz
#### Download Audio
```http
GET /v1/transcripts_audio/{transcript_id}/download
```
Download the original or processed audio file.
#### Stream Audio
```http
GET /v1/transcripts_audio/{transcript_id}/stream
```
Stream audio content with range support for progressive playback.
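Range support lets a client fetch a byte slice, e.g. for seeking in an audio player. A quick sketch with `requests` (the transcript ID is a placeholder):
```python
import requests

transcript_id = "your-transcript-id"
url = f"http://localhost:8000/v1/transcripts_audio/{transcript_id}/stream"

# Fetch only the first 64 KiB of the audio stream
resp = requests.get(url, headers={"Range": "bytes=0-65535"})
print(resp.status_code)                    # 206 Partial Content when honored
print(resp.headers.get("Content-Range"))   # e.g. "bytes 0-65535/1048576"
```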
### WebRTC Streaming
Real-time audio streaming via WebRTC for live transcription.
#### Initialize WebRTC Session
```http
POST /v1/transcripts_webrtc/{transcript_id}/offer
```
Create a WebRTC offer for establishing a peer connection.
#### Complete WebRTC Handshake
```http
POST /v1/transcripts_webrtc/{transcript_id}/answer
```
Submit the WebRTC answer to complete connection setup.
### WebSocket Streaming
Real-time updates and live transcription via WebSocket.
#### WebSocket Endpoint
```ws
ws://localhost:8000/v1/transcripts_websocket/{transcript_id}
```
Receive real-time transcription updates, speaker changes, and processing status.
**Message Types:**
- `transcription`: New transcribed text segments
- `diarization`: Speaker identification updates
- `status`: Processing status changes
- `error`: Error notifications
### Meetings
Manage virtual meeting rooms and recordings.
#### List Meetings
```http
GET /v1/meetings/
```
Get all meetings for the authenticated user.
#### Create Meeting
```http
POST /v1/meetings/
```
Initialize a new meeting room with Whereby integration.
#### Join Meeting
```http
POST /v1/meetings/{meeting_id}/join
```
Join an existing meeting and start recording.
#### End Meeting
```http
POST /v1/meetings/{meeting_id}/end
```
End the meeting and finalize the recording.
### Rooms
Virtual meeting room configuration and management.
#### List Rooms
```http
GET /v1/rooms/
```
Get available meeting rooms.
#### Create Room
```http
POST /v1/rooms/
```
Create a new persistent meeting room.
#### Update Room Settings
```http
PATCH /v1/rooms/{room_id}
```
Modify room configuration and permissions.
## Response Formats
### Success Response
```json
{
"id": "uuid",
"created_at": "2025-01-20T10:00:00Z",
"updated_at": "2025-01-20T10:30:00Z",
"data": {...}
}
```
### Error Response
```json
{
"error": {
"code": "ERROR_CODE",
"message": "Human-readable error message",
"details": {...}
}
}
```
### Status Codes
- `200 OK`: Successful request
- `201 Created`: Resource created successfully
- `204 No Content`: Successful deletion
- `400 Bad Request`: Invalid request parameters
- `401 Unauthorized`: Authentication required
- `403 Forbidden`: Insufficient permissions
- `404 Not Found`: Resource not found
- `409 Conflict`: Resource conflict
- `422 Unprocessable Entity`: Validation error
- `429 Too Many Requests`: Rate limit exceeded
- `500 Internal Server Error`: Server error
## Rate Limiting
- **Anonymous users**: 100 requests per minute
- **Authenticated users**: 1000 requests per minute
- **WebSocket connections**: 10 concurrent per user
- **File uploads**: 10 per hour for anonymous, 100 per hour for authenticated
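Clients hitting these limits receive `429 Too Many Requests` and should back off before retrying. A simple sketch (whether the server sets a `Retry-After` header is an assumption; the fallback is exponential backoff):
```python
import time
import requests

def get_with_backoff(url: str, max_attempts: int = 5) -> requests.Response:
    for attempt in range(max_attempts):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After if present, otherwise back off exponentially
        time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
    return resp

resp = get_with_backoff("http://localhost:8000/v1/transcripts/")
print(resp.status_code)
```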
## WebSocket Protocol
The WebSocket connection provides real-time updates during transcription processing. The server sends structured messages to communicate different events and data updates.
### Connection
```javascript
const ws = new WebSocket('ws://localhost:8000/v1/transcripts_websocket/{transcript_id}');
```
### Message Types and Formats
#### Transcription Update
Sent when new text is transcribed from the audio stream.
```json
{
"type": "transcription",
"data": {
"text": "The transcribed text segment",
"speaker": "Speaker 1",
"timestamp": 1705745623.456,
"confidence": 0.95,
"segment_id": "seg_001",
"is_final": true
}
}
```
#### Diarization Update
Sent when speaker changes are detected or speaker labels are updated.
```json
{
"type": "diarization",
"data": {
"speaker": "Speaker 2",
"speaker_id": "spk_002",
"start_time": 1705745620.123,
"end_time": 1705745625.456,
"confidence": 0.87
}
}
```
#### Processing Status
Sent to indicate changes in the processing pipeline status.
```json
{
"type": "status",
"data": {
"status": "processing",
"stage": "transcription",
"progress": 45.5,
"message": "Processing audio chunk 12 of 26"
}
}
```
Status values:
- `initializing`: Setting up processing pipeline
- `processing`: Active transcription/diarization
- `completed`: Processing finished successfully
- `failed`: Processing encountered an error
- `paused`: Processing temporarily suspended
#### Summary Update
Sent when AI-generated summaries or topics are available.
```json
{
"type": "summary",
"data": {
"summary": "Brief summary of the conversation",
"topics": ["topic1", "topic2", "topic3"],
"action_items": ["action 1", "action 2"],
"key_points": ["point 1", "point 2"]
}
}
```
#### Error Messages
Sent when errors occur during processing.
```json
{
"type": "error",
"data": {
"code": "AUDIO_FORMAT_ERROR",
"message": "Unsupported audio format",
"details": {
"format": "unknown",
"sample_rate": 0
},
"recoverable": false
}
}
```
#### Heartbeat/Keepalive
Sent periodically to maintain the connection.
```json
{
"type": "ping",
"data": {
"timestamp": 1705745630.000
}
}
```
### Client-to-Server Messages
Clients can send control messages to the server:
#### Start/Resume Processing
```json
{
"action": "start",
"params": {}
}
```
#### Pause Processing
```json
{
"action": "pause",
"params": {}
}
```
#### Request Status
```json
{
"action": "get_status",
"params": {}
}
```
## OpenAPI Specification
The complete OpenAPI 3.0 specification is available at:
```
http://localhost:8000/v1/openapi.json
```
You can import this specification into tools like:
- Postman
- Insomnia
- Swagger UI
- OpenAPI Generator (for client SDK generation)
## SDK Support
While Reflector doesn't provide official SDKs, you can generate client libraries using the OpenAPI specification with tools like:
- **Python**: `openapi-python-client`
- **TypeScript**: `openapi-typescript-codegen`
- **Go**: `oapi-codegen`
- **Java**: `openapi-generator`
## Example Usage
### Python Example
```python
import requests

BASE = 'http://localhost:8000/v1'

# Create a transcript record
transcript_id = requests.post(f'{BASE}/transcripts/').json()['id']

# Upload the audio file for processing
with open('meeting.mp3', 'rb') as f:
    requests.post(
        f'{BASE}/transcripts_audio/{transcript_id}/upload',
        files={'file': f},
    )

# Check transcription status
status = requests.get(f'{BASE}/transcripts/{transcript_id}').json()
print(f"Transcription status: {status['status']}")
```
### JavaScript WebSocket Example
```javascript
// Connect to WebSocket for real-time transcription updates
const ws = new WebSocket(`ws://localhost:8000/v1/transcripts_websocket/${transcriptId}`);
ws.onopen = () => {
console.log('Connected to transcription WebSocket');
};
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
switch(message.type) {
case 'transcription':
console.log(`[${message.data.speaker}]: ${message.data.text}`);
break;
case 'diarization':
console.log(`Speaker change: ${message.data.speaker}`);
break;
case 'status':
console.log(`Status: ${message.data.status}`);
break;
case 'error':
console.error(`Error: ${message.data.message}`);
break;
}
};
ws.onerror = (error) => {
console.error('WebSocket error:', error);
};
ws.onclose = () => {
console.log('WebSocket connection closed');
};
```
## Need Help?
- Review [example implementations](https://github.com/monadical-sas/reflector/tree/main/examples)
- Open an issue on [GitHub](https://github.com/monadical-sas/reflector/issues)


@@ -0,0 +1,7 @@
---
title: overview
---
# overview
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: backend
---
# backend
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: database
---
# database
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: frontend
---
# frontend
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: overview
---
# overview
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: workers
---
# workers
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: configuration
---
# configuration
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: analysis
---
# analysis
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: diarization
---
# diarization
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: transcription
---
# transcription
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.


@@ -0,0 +1,7 @@
---
title: translation
---
# translation
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

docs/docs/roadmap.md Normal file

@@ -0,0 +1,139 @@
---
sidebar_position: 100
title: Roadmap
---
# Product Roadmap
Our development roadmap for Reflector, focusing on expanding capabilities while maintaining privacy and performance.
## Planned Features
### 🌍 Multi-Language Support Enhancement
**Current State:**
- Whisper supports 99+ languages for transcription
- Parakeet supports English only with high accuracy
- Translation available to 100+ languages
**Planned Improvements:**
- Default language selection per room/user
- Automatic language detection improvements
- Multi-language diarization support
- RTL (Right-to-Left) language UI support
- Language-specific post-processing rules
### 🏠 Self-Hosted Room Providers
**Jitsi Integration**
Moving beyond Whereby to support self-hosted video conferencing:
- No API keys required
- Complete control over video infrastructure
- Custom branding and configuration
- Lower operational costs
- Enhanced privacy with self-hosted video
**Implementation Plan:**
- WebRTC bridge for Jitsi Meet
- Room management API integration
- Recording synchronization
- Participant tracking
### 📅 Calendar Integration
**Planned Capabilities:**
- Google Calendar synchronization
- Microsoft Outlook integration
- Automatic meeting room creation
- Pre-meeting document preparation
- Post-meeting transcript delivery
- Recurring meeting support
**Features:**
- Auto-join scheduled meetings
- Calendar-based access control
- Meeting agenda import
- Action item export to calendar
### 🖥️ Self-Hosted GPU Service
**For organizations with dedicated GPU hardware (H100, A100, RTX 4090):**
**Docker GPU Worker Image:**
- Self-contained processing service
- CUDA 11/12 support
- Pre-loaded models:
- Whisper (all sizes)
- Pyannote diarization
- Seamless-M4T translation
- Automatic model management
**Deployment Options:**
- Kubernetes GPU operators
- Docker Compose with nvidia-docker
- Bare metal installation
- Hybrid cloud/on-premise
**Benefits:**
- No Modal.com dependency
- Complete data isolation
- Predictable costs
- Maximum performance
- Custom model support
## Future Considerations
### Enhanced Analytics
- Meeting insights dashboard
- Speaker participation metrics
- Topic trends over time
- Team collaboration patterns
### Advanced AI Features
- Real-time sentiment analysis
- Emotion detection
- Meeting quality scores
- Automated coaching suggestions
### Integration Ecosystem
- Slack/Teams notifications
- CRM integration (Salesforce, HubSpot)
- Project management tools (Jira, Asana)
- Knowledge bases (Notion, Confluence)
### Performance Improvements
- WebAssembly for client-side processing
- Edge computing support
- 5G network optimization
- Blockchain for transcript verification
## Contributing
We welcome community contributions! Areas where you can help:
1. **Language Support**: Add support for your language
2. **Integrations**: Connect with your favorite tools
3. **Models**: Fine-tune models for specific domains
4. **Documentation**: Improve guides and examples
See our [Contributing Guide](https://github.com/monadical-sas/reflector/blob/main/CONTRIBUTING.md) for details.
## Timeline
We don't provide specific dates as development depends on community contributions and priorities. Features are generally released when they're ready and properly tested.
## Feature Requests
Have an idea for Reflector? We'd love to hear it!
- [Open a GitHub Issue](https://github.com/monadical-sas/reflector/issues/new)
- [Join our Discord](#)
- [Email us](mailto:reflector@monadical.com)
## Stay Updated
- Watch our [GitHub repository](https://github.com/monadical-sas/reflector)
- Follow our [blog](#)
- Subscribe to our [newsletter](#)