mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2025-12-23 05:39:05 +00:00
feat: WIP doc (vibe started and iterated)
127
docs/docs/concepts/modes.md
Normal file
@@ -0,0 +1,127 @@
|
||||
---
|
||||
sidebar_position: 2
|
||||
title: Operating Modes
|
||||
---
|
||||
|
||||
# Operating Modes
|
||||
|
||||
Reflector operates in two distinct modes to accommodate different use cases and security requirements.
|
||||
|
||||
## Public Mode
|
||||
|
||||
Public mode provides immediate access to core transcription features without requiring authentication.
|
||||
|
||||
### Features Available
|
||||
- **File Upload**: Process audio files up to 2GB
|
||||
- **Live Transcription**: Stream audio from microphone
|
||||
- **Basic Processing**: Transcription and diarization
|
||||
- **Temporary Storage**: Results available for 24 hours
|
||||
|
||||
### Limitations
|
||||
- No persistent storage
|
||||
- No meeting rooms
|
||||
- Limited to single-user sessions
|
||||
- No team collaboration features
|
||||
|
||||
### Use Cases
|
||||
- Quick transcription needs
|
||||
- Testing and evaluation
|
||||
- Individual users
|
||||
- Public demonstrations
|
||||
|
||||
## Private Mode
|
||||
|
||||
Private mode unlocks the full potential of Reflector with authentication and persistent storage.
|
||||
|
||||
### Additional Features
|
||||
- **Virtual Meeting Rooms**: Whereby integration
|
||||
- **Team Collaboration**: Share transcripts with team
|
||||
- **Persistent Storage**: Long-term transcript archive
|
||||
- **Advanced Analytics**: Meeting insights and trends
|
||||
- **Custom Integration**: Webhooks and API access
|
||||
- **User Management**: Role-based access control
|
||||
|
||||
### Authentication Options
|
||||
|
||||
#### Authentik Integration
|
||||
Enterprise-grade SSO with support for:
|
||||
- SAML 2.0
|
||||
- OAuth 2.0 / OIDC
|
||||
- LDAP / Active Directory
|
||||
- Multi-factor authentication
|
||||
|
||||
#### JWT Authentication
|
||||
Stateless token-based auth for:
|
||||
- API access
|
||||
- Service-to-service communication
|
||||
- Mobile applications
|
||||
|
||||
### Room Management
|
||||
|
||||
Virtual rooms provide dedicated spaces for meetings:
|
||||
- **Persistent URLs**: Same link for recurring meetings
|
||||
- **Access Control**: Invite-only or open rooms
|
||||
- **Recording Consent**: Automatic consent management
|
||||
- **Custom Settings**: Per-room configuration
|
||||
|
||||
## Mode Selection
|
||||
|
||||
The mode is determined by your deployment configuration:
|
||||
|
||||
```env
# Public Mode (no authentication)
REFLECTOR_AUTH_BACKEND=none

# Private Mode (with authentication)
REFLECTOR_AUTH_BACKEND=jwt
# or
REFLECTOR_AUTH_BACKEND=authentik
```
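
A minimal sketch of how an application might map this setting to a mode. Only the variable name and the three backend values come from the configuration above; the `resolve_mode` helper and the choice to treat an unset variable as `none` are assumptions for illustration, not Reflector's actual code.

```python
import os

# Backend values taken from the configuration above.
PRIVATE_BACKENDS = {"jwt", "authentik"}


def resolve_mode(env=None):
    """Return "public" or "private" based on REFLECTOR_AUTH_BACKEND.

    Hypothetical helper: an unset variable is treated as "none" (assumption).
    """
    env = os.environ if env is None else env
    backend = env.get("REFLECTOR_AUTH_BACKEND", "none").strip().lower()
    if backend == "none":
        return "public"
    if backend in PRIVATE_BACKENDS:
        return "private"
    # Fail loudly on typos instead of silently falling back to public mode.
    raise ValueError(f"Unknown REFLECTOR_AUTH_BACKEND: {backend!r}")
```

Rejecting unknown values is a deliberate choice in this sketch: a misspelled backend name should not quietly disable authentication.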
|
||||
|
||||
## Feature Comparison
|
||||
|
||||
| Feature | Public Mode | Private Mode |
|---------|-------------|--------------|
| File Upload | ✅ | ✅ |
| Live Transcription | ✅ | ✅ |
| Speaker Diarization | ✅ | ✅ |
| Translation | ✅ | ✅ |
| Summarization | ✅ | ✅ |
| Meeting Rooms | ❌ | ✅ |
| Persistent Storage | ❌ | ✅ |
| Team Collaboration | ❌ | ✅ |
| API Access | Limited | Full |
| User Management | ❌ | ✅ |
| Custom Branding | ❌ | ✅ |
| Analytics | ❌ | ✅ |
| Webhooks | ❌ | ✅ |
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Public Mode Security
|
||||
- Rate limiting to prevent abuse
|
||||
- File size restrictions
|
||||
- Automatic cleanup of old data
|
||||
- No PII storage
|
||||
|
||||
### Private Mode Security
|
||||
- Encrypted data storage
|
||||
- Audit logging
|
||||
- Session management
|
||||
- Access control lists
|
||||
- Data retention policies
|
||||
|
||||
## Choosing the Right Mode
|
||||
|
||||
### Choose Public Mode if:
|
||||
- You need quick, one-time transcriptions
|
||||
- You're evaluating Reflector
|
||||
- You don't need persistent storage
|
||||
- You're processing non-sensitive content
|
||||
|
||||
### Choose Private Mode if:
|
||||
- You need team collaboration
|
||||
- You require persistent storage
|
||||
- You're processing sensitive content
|
||||
- You need meeting room functionality
|
||||
- You want advanced analytics
|
||||
194
docs/docs/concepts/overview.md
Normal file
@@ -0,0 +1,194 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
title: Architecture Overview
|
||||
---
|
||||
|
||||
# Architecture Overview
|
||||
|
||||
Reflector is built as a modern, scalable, microservices-based application designed to handle audio processing workloads efficiently while maintaining data privacy and control.
|
||||
|
||||
## System Components
|
||||
|
||||
### Frontend Application
|
||||
|
||||
The user interface is built with **Next.js 14** using the App Router pattern, providing:
|
||||
|
||||
- Server-side rendering for optimal performance
|
||||
- Real-time WebSocket connections for live transcription
|
||||
- WebRTC support for audio streaming
|
||||
- Responsive design with Chakra UI components
|
||||
|
||||
### Backend API Server
|
||||
|
||||
The core API is powered by **FastAPI**, a modern Python framework that provides:
|
||||
|
||||
- High-performance async request handling
|
||||
- Automatic OpenAPI documentation generation
|
||||
- Type safety with Pydantic models
|
||||
- WebSocket support for real-time updates
|
||||
|
||||
### Processing Pipeline
|
||||
|
||||
Audio processing is handled through a modular pipeline architecture:
|
||||
|
||||
```
Audio Input → Chunking → Transcription → Diarization → Post-Processing → Storage
```
|
||||
|
||||
The stages are decoupled from one another, and chunks can be processed in parallel, allowing for:
|
||||
- Scalable processing of large files
|
||||
- Real-time streaming capabilities
|
||||
- Fault tolerance and retry mechanisms
|
||||
|
||||
### Worker Architecture
|
||||
|
||||
Background tasks are managed by **Celery** workers with **Redis** as the message broker:
|
||||
|
||||
- Distributed task processing
|
||||
- Priority queues for time-sensitive operations
|
||||
- Automatic retry on failure
|
||||
- Progress tracking and notifications
|
||||
|
||||
### GPU Acceleration
|
||||
|
||||
ML models run on GPU-accelerated infrastructure:
|
||||
|
||||
- **Modal.com** for serverless GPU processing
|
||||
- Support for local GPU deployment (coming soon)
|
||||
- Automatic scaling based on demand
|
||||
- Cost-effective pay-per-use model
|
||||
|
||||
## Data Flow
|
||||
|
||||
### File Processing Flow
|
||||
|
||||
1. **Upload**: User uploads audio file through web interface
|
||||
2. **Storage**: File stored temporarily or in S3
|
||||
3. **Queue**: Processing job added to Celery queue
|
||||
4. **Chunking**: Audio split into 30-second segments
|
||||
5. **Parallel Processing**: Chunks processed simultaneously
|
||||
6. **Assembly**: Results merged and aligned
|
||||
7. **Post-Processing**: Summary, topics, translation
|
||||
8. **Delivery**: Results stored and user notified
|
||||
|
||||
### Live Streaming Flow
|
||||
|
||||
1. **WebRTC Connection**: Browser establishes peer connection
|
||||
2. **Audio Capture**: Microphone audio streamed to server
|
||||
3. **Buffering**: Audio buffered for processing
|
||||
4. **VAD**: Voice activity detection segments speech
|
||||
5. **Real-time Processing**: Segments transcribed immediately
|
||||
6. **WebSocket Updates**: Results streamed back to client
|
||||
7. **Continuous Assembly**: Full transcript built progressively
|
||||
|
||||
## Deployment Architecture
|
||||
|
||||
### Container-Based Deployment
|
||||
|
||||
All components are containerized for consistent deployment:
|
||||
|
||||
```yaml
services:
  frontend:  # Next.js application
  backend:   # FastAPI server
  worker:    # Celery workers
  redis:     # Message broker
  postgres:  # Database
  caddy:     # Reverse proxy
```
|
||||
|
||||
### Networking
|
||||
|
||||
- **Host Network Mode**: Required for WebRTC/ICE compatibility
|
||||
- **Caddy Reverse Proxy**: Handles SSL termination and routing
|
||||
- **WebSocket Upgrade**: Supports real-time connections
|
||||
|
||||
## Scalability Considerations
|
||||
|
||||
### Horizontal Scaling
|
||||
|
||||
- **Stateless Backend**: Multiple API server instances
|
||||
- **Worker Pools**: Add workers based on queue depth
|
||||
- **Database Pooling**: Connection management for concurrent access
|
||||
|
||||
### Vertical Scaling
|
||||
|
||||
- **GPU Workers**: Scale up for faster model inference
|
||||
- **Memory Optimization**: Efficient audio buffering
|
||||
- **CPU Optimization**: Multi-threaded processing where applicable
|
||||
|
||||
## Security Architecture
|
||||
|
||||
### Authentication & Authorization
|
||||
|
||||
- **JWT Tokens**: Stateless authentication
|
||||
- **Authentik Integration**: Enterprise SSO support
|
||||
- **Role-Based Access**: Granular permissions
|
||||
|
||||
### Data Protection
|
||||
|
||||
- **Encryption at Rest**: Database and S3 encryption
|
||||
- **Encryption in Transit**: TLS for all connections
|
||||
- **Temporary Storage**: Automatic cleanup of processed files
|
||||
|
||||
### Privacy by Design
|
||||
|
||||
- **Local Processing**: Option to process entirely on-premises
|
||||
- **No Training on User Data**: Models are pre-trained
|
||||
- **Data Isolation**: Multi-tenant data separation
|
||||
|
||||
## Integration Points
|
||||
|
||||
### External Services
|
||||
|
||||
- **Modal.com**: GPU processing
|
||||
- **AWS S3**: Long-term storage
|
||||
- **Whereby**: Video conferencing rooms
|
||||
- **Zulip**: Chat integration (optional)
|
||||
|
||||
### APIs and Webhooks
|
||||
|
||||
- **RESTful API**: Standard CRUD operations
|
||||
- **WebSocket API**: Real-time updates
|
||||
- **Webhook Notifications**: Processing completion events
|
||||
- **OpenAPI Specification**: Machine-readable API definition
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
### Caching Strategy
|
||||
|
||||
- **Redis Cache**: Frequently accessed data
|
||||
- **CDN**: Static asset delivery
|
||||
- **Browser Cache**: Client-side optimization
|
||||
|
||||
### Database Optimization
|
||||
|
||||
- **Indexed Queries**: Fast search and retrieval
|
||||
- **Connection Pooling**: Efficient resource usage
|
||||
- **Query Optimization**: N+1 query prevention
|
||||
|
||||
### Processing Optimization
|
||||
|
||||
- **Batch Processing**: Efficient GPU utilization
|
||||
- **Parallel Execution**: Multi-core CPU usage
|
||||
- **Stream Processing**: Reduced memory footprint
|
||||
|
||||
## Monitoring and Observability
|
||||
|
||||
### Metrics Collection
|
||||
|
||||
- **Application Metrics**: Request rates, response times
|
||||
- **System Metrics**: CPU, memory, disk usage
|
||||
- **Business Metrics**: Transcription accuracy, processing times
|
||||
|
||||
### Logging
|
||||
|
||||
- **Structured Logging**: JSON format for analysis
|
||||
- **Log Aggregation**: Centralized log management
|
||||
- **Error Tracking**: Sentry integration
|
||||
|
||||
### Health Checks
|
||||
|
||||
- **Liveness Probes**: Component availability
|
||||
- **Readiness Probes**: Service readiness
|
||||
- **Dependency Checks**: External service status
|
||||
274
docs/docs/concepts/pipeline.md
Normal file
@@ -0,0 +1,274 @@
|
||||
---
|
||||
sidebar_position: 4
|
||||
title: Processing Pipeline
|
||||
---
|
||||
|
||||
# Processing Pipeline
|
||||
|
||||
Reflector uses a modular pipeline architecture to process audio efficiently and accurately.
|
||||
|
||||
## Pipeline Overview
|
||||
|
||||
The processing pipeline consists of modular components that can be combined and configured based on your needs:
|
||||
|
||||
```mermaid
graph LR
    A[Audio Input] --> B[Pre-processing]
    B --> C[Chunking]
    C --> D[Transcription]
    D --> E[Diarization]
    E --> F[Alignment]
    F --> G[Post-processing]
    G --> H[Output]
```
|
||||
|
||||
## Pipeline Components
|
||||
|
||||
### Audio Input
|
||||
|
||||
Accepts various input sources:
|
||||
- **File Upload**: MP3, WAV, M4A, WebM, MP4
|
||||
- **WebRTC Stream**: Live browser audio
|
||||
- **Recording Integration**: Whereby recordings
|
||||
- **API Upload**: Direct API submission
|
||||
|
||||
### Pre-processing
|
||||
|
||||
Prepares audio for optimal processing:
|
||||
- **Format Conversion**: Convert to 16kHz mono WAV
|
||||
- **Normalization**: Adjust volume to -23 LUFS
|
||||
- **Noise Reduction**: Optional background noise removal
|
||||
- **Validation**: Check duration and quality
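
The -23 LUFS target can be illustrated with a little arithmetic. The sketch below assumes the integrated loudness of the file has already been measured by a loudness meter (not shown); `gain_to_target` is a hypothetical helper, not part of Reflector. Because one loudness unit corresponds to one decibel, the required gain is the difference between target and measurement.

```python
def gain_to_target(measured_lufs, target_lufs=-23.0):
    """Return (gain in dB, linear amplitude factor) needed to reach the target.

    Hypothetical helper: measured_lufs is assumed to come from an external
    loudness measurement.
    """
    gain_db = target_lufs - measured_lufs
    # Convert a decibel gain to an amplitude multiplier.
    linear = 10 ** (gain_db / 20)
    return gain_db, linear


gain_db, factor = gain_to_target(-29.0)
print(f"apply {gain_db:+.1f} dB (x{factor:.3f} amplitude)")
```

A file measured at -29 LUFS needs +6 dB, which is roughly a doubling of amplitude. A real implementation would also guard against clipping after applying the gain.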
|
||||
|
||||
### Chunking
|
||||
|
||||
Splits audio for parallel processing:
|
||||
- **Fixed Size**: 30-second chunks by default
|
||||
- **Overlap**: 1-second overlap for continuity
|
||||
- **Silence Detection**: Attempt to split at silence
|
||||
- **Metadata**: Track chunk positions
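
As a rough illustration of fixed-size chunking with overlap, the sketch below computes chunk boundaries in seconds. It assumes that "1-second overlap" means each chunk starts one second before the previous one ends, and it leaves out silence detection entirely; `chunk_bounds` is a hypothetical name.

```python
def chunk_bounds(duration, chunk_size=30.0, overlap=1.0):
    """Return a list of (start, end) times covering [0, duration]."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_size, duration)
        bounds.append((start, end))
        if end >= duration:
            break  # the last chunk reached the end of the audio
        start += step
    return bounds


for start, end in chunk_bounds(65):
    print(f"{start:5.1f} -> {end:5.1f}")
```

For a 65-second file this yields three chunks, with the last one shorter than 30 seconds. The recorded start positions are what the metadata step would need to place each chunk back on the timeline.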
|
||||
|
||||
### Transcription
|
||||
|
||||
Converts speech to text:
|
||||
- **Model Selection**: Whisper or Parakeet
|
||||
- **Language Detection**: Automatic or specified
|
||||
- **Timestamp Generation**: Word-level timing
|
||||
- **Confidence Scores**: Quality indicators
|
||||
|
||||
### Diarization
|
||||
|
||||
Identifies different speakers:
|
||||
- **Voice Activity Detection**: Find speech segments
|
||||
- **Speaker Embedding**: Extract voice characteristics
|
||||
- **Clustering**: Group similar voices
|
||||
- **Label Assignment**: Assign speaker IDs
|
||||
|
||||
### Alignment
|
||||
|
||||
Merges all processing results:
|
||||
- **Chunk Assembly**: Combine transcription chunks
|
||||
- **Speaker Mapping**: Align speakers with text
|
||||
- **Overlap Resolution**: Handle chunk boundaries
|
||||
- **Timeline Creation**: Build unified timeline
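
One common way to map speakers onto text is to give each word the speaker whose turn overlaps it the most. The sketch below shows that idea under simple assumptions: words and speaker turns are plain tuples with times in seconds, and `assign_speakers` is a hypothetical helper rather than Reflector's actual alignment code.

```python
def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it most.

    words: list of (text, start, end)
    turns: list of (speaker, start, end)
    Words that overlap no turn are labeled None.
    """
    labeled = []
    for text, w_start, w_end in words:
        best_speaker, best_overlap = None, 0.0
        for speaker, t_start, t_end in turns:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((text, best_speaker))
    return labeled


words = [("hello", 0.0, 0.5), ("there", 0.6, 1.0), ("hi", 1.4, 1.6)]
turns = [("S0", 0.0, 1.2), ("S1", 1.2, 2.0)]
print(assign_speakers(words, turns))
```

The nested loop is quadratic, which is fine for a sketch; sorted inputs would allow a single linear pass.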
|
||||
|
||||
### Post-processing
|
||||
|
||||
Enhances the final output:
|
||||
- **Formatting**: Apply punctuation and capitalization
|
||||
- **Translation**: Convert to target languages
|
||||
- **Summarization**: Generate concise summaries
|
||||
- **Topic Extraction**: Identify key themes
|
||||
- **Action Items**: Extract tasks and decisions
|
||||
|
||||
## Processing Modes
|
||||
|
||||
### Batch Processing
|
||||
|
||||
For uploaded files:
|
||||
- Optimized for throughput
|
||||
- Parallel chunk processing
|
||||
- Higher accuracy models
|
||||
- Complete file analysis
|
||||
|
||||
### Stream Processing
|
||||
|
||||
For live audio:
|
||||
- Optimized for latency
|
||||
- Sequential processing
|
||||
- Real-time feedback
|
||||
- Progressive results
|
||||
|
||||
### Hybrid Processing
|
||||
|
||||
For meetings:
|
||||
- Stream during meeting
|
||||
- Batch after completion
|
||||
- Best of both modes
|
||||
- Maximum accuracy
|
||||
|
||||
## Pipeline Configuration
|
||||
|
||||
### Model Selection
|
||||
|
||||
Choose models based on requirements:
|
||||
|
||||
```python
# High accuracy (slower)
config = {
    "transcription_model": "whisper-large-v3",
    "diarization_model": "pyannote-3.1",
    "translation_model": "seamless-m4t-large"
}

# Balanced (default)
config = {
    "transcription_model": "whisper-base",
    "diarization_model": "pyannote-3.1",
    "translation_model": "seamless-m4t-medium"
}

# Fast processing
config = {
    "transcription_model": "whisper-tiny",
    "diarization_model": "pyannote-3.1-fast",
    "translation_model": "seamless-m4t-small"
}
```
|
||||
|
||||
### Processing Options
|
||||
|
||||
Customize pipeline behavior:
|
||||
|
||||
```yaml
# Parallel processing
max_parallel_chunks: 10
chunk_size_seconds: 30
chunk_overlap_seconds: 1

# Quality settings
enable_noise_reduction: true
enable_normalization: true
min_speech_confidence: 0.5

# Post-processing
enable_translation: true
target_languages: ["es", "fr", "de"]
enable_summarization: true
summary_length: "medium"
```
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Processing Times
|
||||
|
||||
For 1 hour of audio:
|
||||
|
||||
| Pipeline Config | Processing Time | Accuracy |
|-----------------|-----------------|----------|
| Fast | 2-3 minutes | 85-90% |
| Balanced | 5-8 minutes | 92-95% |
| High Accuracy | 15-20 minutes | 95-98% |
|
||||
|
||||
### Resource Usage
|
||||
|
||||
| Component | CPU Usage | Memory | GPU |
|-----------|-----------|--------|-----|
| Transcription | Medium | 2-4 GB | Required |
| Diarization | High | 4-8 GB | Required |
| Translation | Low | 2-3 GB | Optional |
| Post-processing | Low | 1-2 GB | Not needed |
|
||||
|
||||
## Pipeline Orchestration
|
||||
|
||||
### Celery Task Chain
|
||||
|
||||
The pipeline is orchestrated using Celery:
|
||||
|
||||
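```python
from celery import group

# Illustrative only: the tasks are assumed to be defined elsewhere as Celery
# tasks, and `chunks` must already be known when the chain is built.
# A group followed by another task is upgraded to a chord by Celery, so
# merge_transcriptions runs once every chunk has been transcribed.
chain = (
    chunk_audio.s(audio_id)
    | group(transcribe_chunk.s(chunk) for chunk in chunks)
    | merge_transcriptions.s()
    | diarize_audio.s()
    | align_speakers.s()
    | post_process.s()
)
```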
|
||||
|
||||
### Error Handling
|
||||
|
||||
Error recovery:
|
||||
- **Automatic Retry**: Failed tasks retry up to 3 times
|
||||
- **Partial Recovery**: Continue with successful chunks
|
||||
- **Fallback Models**: Use alternative models on failure
|
||||
- **Error Reporting**: Detailed error messages
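
The retry behavior can be sketched without any task queue. The example below assumes that "retry up to 3 times" means one initial attempt plus at most three retries, and it adds exponential backoff between attempts, which the text above does not specify. `run_with_retries` is a hypothetical helper.

```python
import time


def run_with_retries(task, max_retries=3, base_delay=1.0):
    """Call task(); on failure retry up to max_retries times, then re-raise.

    The delay doubles after each failed attempt (an assumption of this sketch).
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the original error
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1
```

In a Celery deployment the same policy would normally be expressed through the task's own retry settings rather than a wrapper like this one.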
|
||||
|
||||
### Progress Tracking
|
||||
|
||||
Real-time progress updates:
|
||||
- **Chunk Progress**: Track individual chunk processing
|
||||
- **Overall Progress**: Percentage completion
|
||||
- **ETA Calculation**: Estimated completion time
|
||||
- **WebSocket Updates**: Live progress to clients
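
A simple way to derive the percentage and ETA from chunk counts is linear extrapolation: assume the remaining chunks take as long, on average, as the ones already finished. This is a sketch under that assumption; `progress` is a hypothetical helper.

```python
def progress(done_chunks, total_chunks, elapsed_seconds):
    """Return (percent complete, estimated seconds remaining or None)."""
    if total_chunks <= 0:
        raise ValueError("total_chunks must be positive")
    percent = 100.0 * done_chunks / total_chunks
    if done_chunks == 0:
        return percent, None  # no data yet to extrapolate from
    seconds_per_chunk = elapsed_seconds / done_chunks
    eta = seconds_per_chunk * (total_chunks - done_chunks)
    return percent, eta


print(progress(3, 12, 60.0))
```

With 3 of 12 chunks done after 60 seconds, the estimate is 25% complete with 180 seconds remaining. Parallel workers and uneven chunk lengths make real estimates noisier than this.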
|
||||
|
||||
## Optimization Strategies
|
||||
|
||||
### GPU Utilization
|
||||
|
||||
Maximize GPU efficiency:
|
||||
- **Batch Processing**: Process multiple chunks together
|
||||
- **Model Caching**: Keep models loaded in memory
|
||||
- **Dynamic Batching**: Adjust batch size based on GPU memory
|
||||
- **Multi-GPU Support**: Distribute across available GPUs
|
||||
|
||||
### Memory Management
|
||||
|
||||
Efficient memory usage:
|
||||
- **Streaming Processing**: Process large files in chunks
|
||||
- **Garbage Collection**: Clean up after each chunk
|
||||
- **Memory Limits**: Prevent out-of-memory errors
|
||||
- **Disk Caching**: Use disk for large intermediate results
|
||||
|
||||
### Network Optimization
|
||||
|
||||
Minimize network overhead:
|
||||
- **Compression**: Compress audio before transfer
|
||||
- **CDN Integration**: Use CDN for static assets
|
||||
- **Connection Pooling**: Reuse network connections
|
||||
- **Parallel Uploads**: Multiple concurrent uploads
|
||||
|
||||
## Quality Assurance
|
||||
|
||||
### Accuracy Metrics
|
||||
|
||||
Monitor processing quality:
|
||||
- **Word Error Rate (WER)**: Transcription accuracy
|
||||
- **Diarization Error Rate (DER)**: Speaker identification accuracy
|
||||
- **Translation BLEU Score**: Translation quality
|
||||
- **Summary Coherence**: Summary quality metrics
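
Word Error Rate is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below computes it with the standard dynamic-programming edit distance; it splits on whitespace only and applies no text normalization (casing, punctuation), which a real evaluation would usually add.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words.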
|
||||
|
||||
### Validation Steps
|
||||
|
||||
Ensure output quality:
|
||||
- **Confidence Thresholds**: Filter low-confidence segments
|
||||
- **Consistency Checks**: Verify timeline consistency
|
||||
- **Language Validation**: Ensure correct language detection
|
||||
- **Format Validation**: Check output format compliance
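
The first two checks can be sketched in a few lines. The example assumes each segment is a dictionary with `start`, `end`, and `confidence` keys, which is an illustrative shape rather than Reflector's actual data model, and it reuses the 0.5 threshold from the processing options shown earlier.

```python
def filter_and_check(segments, min_confidence=0.5):
    """Drop low-confidence segments, then report timeline inconsistencies.

    Returns (kept segments, list of human-readable problems).
    """
    kept = [s for s in segments if s["confidence"] >= min_confidence]
    problems = []
    previous_start = None
    for index, seg in enumerate(kept):
        if seg["end"] < seg["start"]:
            problems.append(f"segment {index}: end before start")
        if previous_start is not None and seg["start"] < previous_start:
            problems.append(f"segment {index}: starts before previous segment")
        previous_start = seg["start"]
    return kept, problems
```

Returning the problems instead of raising lets the caller decide whether an inconsistent timeline should fail the job or only be logged.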
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Custom Models
|
||||
|
||||
Use your own models:
|
||||
- **Fine-tuned Whisper**: Domain-specific models
|
||||
- **Custom Diarization**: Trained on your speakers
|
||||
- **Specialized Post-processing**: Industry-specific formatting
|
||||
|
||||
### Pipeline Extensions
|
||||
|
||||
Add custom processing steps:
|
||||
- **Sentiment Analysis**: Analyze emotional tone
|
||||
- **Entity Extraction**: Identify people, places, organizations
|
||||
- **Custom Metrics**: Calculate domain-specific metrics
|
||||
- **Integration Hooks**: Call external services
|
||||
7
docs/docs/installation/authentik-setup.md
Normal file
@@ -0,0 +1,7 @@
|
||||
---
|
||||
title: authentik setup
|
||||
---
|
||||
|
||||
# authentik setup
|
||||
|
||||
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.
|
||||
7
docs/docs/installation/aws-setup.md
Normal file
@@ -0,0 +1,7 @@
|
||||
---
|
||||
title: aws setup
|
||||
---
|
||||
|
||||
# aws setup
|
||||
|
||||
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.
|
||||
23
docs/docs/installation/docker-setup.md
Normal file
@@ -0,0 +1,23 @@
|
||||
---
|
||||
sidebar_position: 3
|
||||
title: Docker Deployment
|
||||
---
|
||||
|
||||
# Docker Deployment
|
||||
|
||||
See the [Docker directory](https://github.com/monadical-sas/reflector/tree/main/docker) in the repository for the complete Docker deployment configuration.
|
||||
|
||||
## Quick Start
|
||||
|
||||
1. Clone the repository
|
||||
2. Navigate to the `docker/` directory
|
||||
3. Copy `.env.example` to `.env`
|
||||
4. Configure environment variables
|
||||
5. Run `docker compose up -d`
|
||||
|
||||
## Configuration
|
||||
|
||||
Check the repository for:
|
||||
- `docker-compose.yml` - Service definitions
|
||||
- `.env.example` - Environment variables
|
||||
- `Caddyfile` - Reverse proxy configuration
|
||||
7
docs/docs/installation/modal-setup.md
Normal file
@@ -0,0 +1,7 @@
|
||||
---
|
||||
title: modal setup
|
||||
---
|
||||
|
||||
# modal setup
|
||||
|
||||
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.
|
||||
162
docs/docs/installation/overview.md
Normal file
@@ -0,0 +1,162 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
title: Installation Overview
|
||||
---
|
||||
|
||||
# Installation Overview
|
||||
|
||||
Reflector is designed for self-hosted deployment, giving you complete control over your infrastructure and data.
|
||||
|
||||
## Deployment Options
|
||||
|
||||
### Docker Deployment (Recommended)
|
||||
|
||||
The easiest way to deploy Reflector:
|
||||
- Pre-configured containers
|
||||
- Automated dependency management
|
||||
- Consistent environment
|
||||
- Easy updates
|
||||
|
||||
### Manual Installation
|
||||
|
||||
For custom deployments:
|
||||
- Greater control over configuration
|
||||
- Integration with existing infrastructure
|
||||
- Custom optimization options
|
||||
- Development environments
|
||||
|
||||
## Requirements
|
||||
|
||||
### System Requirements
|
||||
|
||||
**Minimum Requirements:**
|
||||
- CPU: 4 cores
|
||||
- RAM: 8 GB
|
||||
- Storage: 50 GB
|
||||
- OS: Ubuntu 20.04+ or similar Linux
|
||||
|
||||
**Recommended Requirements:**
|
||||
- CPU: 8+ cores
|
||||
- RAM: 16 GB
|
||||
- Storage: 100 GB SSD
|
||||
- GPU: NVIDIA GPU with 8GB+ VRAM (for local processing)
|
||||
|
||||
### Network Requirements
|
||||
|
||||
- Public IP address (for WebRTC)
|
||||
- Ports: 80, 443, 8000, 3000
|
||||
- Domain name (for SSL)
|
||||
- SSL certificate (Let's Encrypt supported)
|
||||
|
||||
## Required Services
|
||||
|
||||
### Core Services
|
||||
|
||||
These services are required for basic operation:
|
||||
|
||||
1. **PostgreSQL** - Primary database
|
||||
2. **Redis** - Message broker and cache
|
||||
3. **Docker** - Container runtime
|
||||
|
||||
### GPU Processing
|
||||
|
||||
Choose one:
|
||||
- **Modal.com** - Serverless GPU (recommended)
|
||||
- **Local GPU** - Self-hosted GPU processing
|
||||
|
||||
### Optional Services
|
||||
|
||||
Enhance functionality with:
|
||||
- **AWS S3** - Long-term storage
|
||||
- **Whereby** - Video conferencing rooms
|
||||
- **Authentik** - Enterprise authentication
|
||||
- **Zulip** - Chat integration
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Using Docker Compose
|
||||
|
||||
1. Clone the repository:

   ```bash
   git clone https://github.com/monadical-sas/reflector.git
   cd reflector
   ```

2. Navigate to the docker directory:

   ```bash
   cd docker
   ```

3. Copy and configure the environment:

   ```bash
   cp .env.example .env
   # Edit .env with your settings
   ```

4. Start the services:

   ```bash
   docker compose up -d
   ```

5. Access Reflector:
   - Frontend: https://your-domain.com
   - API: https://your-domain.com/api
|
||||
|
||||
## Configuration Overview
|
||||
|
||||
### Essential Configuration
|
||||
|
||||
```env
# Database
DATABASE_URL=postgresql://user:pass@localhost/reflector

# Redis
REDIS_URL=redis://localhost:6379

# Modal.com (for GPU processing)
TRANSCRIPT_MODAL_API_KEY=your-key
DIARIZATION_MODAL_API_KEY=your-key

# Domain
DOMAIN=your-domain.com
```
|
||||
|
||||
### Security Configuration
|
||||
|
||||
```env
# Authentication
REFLECTOR_AUTH_BACKEND=jwt
NEXTAUTH_SECRET=generate-strong-secret

# SSL (handled by Caddy)
# Automatic with Let's Encrypt
```
|
||||
|
||||
## Service Architecture
|
||||
|
||||
```mermaid
graph TD
    A[Caddy Reverse Proxy] --> B[Frontend - Next.js]
    A --> C[Backend - FastAPI]
    C --> D[PostgreSQL]
    C --> E[Redis]
    C --> F[Celery Workers]
    F --> G[Modal.com GPU]
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Review Requirements**: [System Requirements](./requirements)
|
||||
2. **Docker Setup**: [Docker Deployment Guide](./docker-setup)
|
||||
3. **Configure Services**:
|
||||
- [Modal.com Setup](./modal-setup)
|
||||
- [Whereby Setup](./whereby-setup)
|
||||
- [AWS S3 Setup](./aws-setup)
|
||||
4. **Optional Services**:
|
||||
- [Authentik Setup](./authentik-setup)
|
||||
- [Zulip Setup](./zulip-setup)
|
||||
|
||||
## Getting Help
|
||||
|
||||
- [GitHub Issues](https://github.com/monadical-sas/reflector/issues)
|
||||
- [Community Discord](#)
|
||||
29
docs/docs/installation/requirements.md
Normal file
@@ -0,0 +1,29 @@
|
||||
---
|
||||
sidebar_position: 2
|
||||
title: System Requirements
|
||||
---
|
||||
|
||||
# System Requirements
|
||||
|
||||
## Minimum Requirements
|
||||
|
||||
- **CPU**: 4 cores
|
||||
- **RAM**: 8 GB
|
||||
- **Storage**: 50 GB SSD
|
||||
- **OS**: Ubuntu 20.04+ or compatible Linux
|
||||
- **Network**: Public IP address
|
||||
|
||||
## Recommended Requirements
|
||||
|
||||
- **CPU**: 8+ cores
|
||||
- **RAM**: 16 GB
|
||||
- **Storage**: 100 GB SSD
|
||||
- **GPU**: NVIDIA GPU with 8GB+ VRAM (for local processing)
|
||||
- **Network**: 1 Gbps connection
|
||||
|
||||
## Software Requirements
|
||||
|
||||
- Docker Engine 20.10+
|
||||
- Docker Compose 2.0+
|
||||
- Node.js 18+ (for frontend development)
|
||||
- Python 3.11+ (for backend development)
|
||||
7
docs/docs/installation/whereby-setup.md
Normal file
@@ -0,0 +1,7 @@
|
||||
---
|
||||
title: whereby setup
|
||||
---
|
||||
|
||||
# whereby setup
|
||||
|
||||
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.
|
||||
7
docs/docs/installation/zulip-setup.md
Normal file
@@ -0,0 +1,7 @@
|
||||
---
|
||||
title: zulip setup
|
||||
---
|
||||
|
||||
# zulip setup
|
||||
|
||||
Documentation coming soon. See [TODO.md](/docs/TODO) for required information.
|
||||
61
docs/docs/intro.md
Normal file
@@ -0,0 +1,61 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
title: Introduction
|
||||
---
|
||||
|
||||
# Welcome to Reflector
|
||||
|
||||
Reflector is a privacy-focused, self-hosted, AI-powered platform for audio transcription and meeting analysis. It provides real-time transcription, speaker diarization, translation, and summarization for audio content and live meetings, while keeping your data and infrastructure under your control. GPU processing currently runs on Modal.com; running models on your own hardware is on the roadmap.
|
||||
|
||||
## What is Reflector?
|
||||
|
||||
Reflector is a web application that utilizes AI to process audio content, providing:
|
||||
|
||||
- **Real-time Transcription**: Convert speech to text using [Whisper](https://github.com/openai/whisper) (multi-language) or [Parakeet](https://github.com/NVIDIA/NeMo) (English) models
|
||||
- **Speaker Diarization**: Identify and label different speakers using [Pyannote](https://github.com/pyannote/pyannote-audio) 3.1
|
||||
- **Live Translation**: Translate audio content in real-time to 100+ languages with [Facebook Seamless-M4T](https://github.com/facebookresearch/seamless_communication)
|
||||
- **Topic Detection & Summarization**: Extract key topics and generate concise summaries using LLMs
|
||||
- **Meeting Recording**: Create permanent records of meetings with searchable transcripts
|
||||
|
||||
## Features
|
||||
|
||||
| Feature | Public Mode | Private Mode |
|---------|-------------|--------------|
| **Authentication** | None required | Required |
| **Audio Upload** | ✅ | ✅ |
| **Live Microphone Streaming** | ✅ | ✅ |
| **Transcription** | ✅ | ✅ |
| **Speaker Diarization** | ✅ | ✅ |
| **Translation** | ✅ | ✅ |
| **Topic Detection** | ✅ | ✅ |
| **Summarization** | ✅ | ✅ |
| **Virtual Meeting Rooms (Whereby)** | ❌ | ✅ |
| **Browse Transcripts Page** | ❌ | ✅ |
| **Search Functionality** | ❌ | ✅ |
| **Persistent Storage** | ❌ | ✅ |
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
Reflector consists of three main components:
|
||||
|
||||
- **Frontend**: React application built with Next.js 14
|
||||
- **Backend**: Python server using FastAPI
|
||||
- **Processing**: Scalable GPU workers for ML inference (Modal.com or local)
|
||||
|
||||
## Getting Started
|
||||
|
||||
Ready to deploy Reflector? Head over to our [Installation Guide](./installation/overview) to set up your own instance.
|
||||
|
||||
For a quick overview of how Reflector processes audio, check out our [Pipeline Documentation](./pipelines/overview).
|
||||
|
||||
## Open Source
|
||||
|
||||
Reflector is open source software developed by [Monadical](https://monadical.com) and licensed under the **MIT License**. We welcome contributions from the community!
|
||||
|
||||
- [GitHub Repository](https://github.com/monadical-sas/reflector)
|
||||
- [Issue Tracker](https://github.com/monadical-sas/reflector/issues)
|
||||
- [Pull Requests](https://github.com/monadical-sas/reflector/pulls)
|
||||
|
||||
## Support
|
||||
|
||||
Need help? Reach out to the community through GitHub Discussions.
|
||||
348
docs/docs/pipelines/file-pipeline.md
Normal file
@@ -0,0 +1,348 @@
|
||||
---
|
||||
sidebar_position: 2
|
||||
title: File Processing Pipeline
|
||||
---
|
||||
|
||||
# File Processing Pipeline
|
||||
|
||||
The file processing pipeline handles uploaded audio files, optimizing for accuracy and throughput.
|
||||
|
||||
## Pipeline Stages
|
||||
|
||||
### 1. Input Stage
|
||||
|
||||
**Accepted Formats:**
|
||||
- MP3 (most common)
|
||||
- WAV (uncompressed)
|
||||
- M4A (Apple format)
|
||||
- WebM (browser recordings)
|
||||
- MP4 (video with audio track)
|
||||
|
||||
**File Validation:**
|
||||
- Maximum size: 2GB (configurable)
|
||||
- Minimum duration: 5 seconds
|
||||
- Maximum duration: 6 hours
|
||||
- Sample rate: Any (will be resampled)
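
The limits above translate directly into a validation step. The sketch below assumes "2GB" means 2 × 1024³ bytes and that the duration has already been read from the file by some other tool; `validate_upload` and the constant names are hypothetical.

```python
MAX_BYTES = 2 * 1024**3   # "2GB", interpreted here as 2 GiB (assumption)
MIN_SECONDS = 5
MAX_SECONDS = 6 * 60 * 60  # 6 hours


def validate_upload(size_bytes, duration_seconds, max_bytes=MAX_BYTES):
    """Return a list of validation errors; an empty list means the file is accepted."""
    errors = []
    if size_bytes > max_bytes:
        errors.append("file exceeds maximum size")
    if duration_seconds < MIN_SECONDS:
        errors.append("audio shorter than minimum duration")
    if duration_seconds > MAX_SECONDS:
        errors.append("audio longer than maximum duration")
    return errors
```

The size limit is a parameter because the text describes it as configurable. Collecting all errors at once gives the user a complete picture instead of one rejection at a time.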
|
||||
|
||||
### 2. Pre-processing

**Audio Normalization:**

Convert to a standard format:
- Sample rate: 16kHz (Whisper requirement)
- Channels: Mono
- Bit depth: 16-bit
- Format: WAV

**Volume Normalization:**
- Target: -23 LUFS (broadcast standard)
- Prevents clipping
- Improves transcription accuracy

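
One way to reach the format and loudness targets above is a single `ffmpeg` invocation. The sketch below only builds the argument list; whether Reflector shells out to `ffmpeg` at all is an assumption, and the function name `build_ffmpeg_command` and the file names are hypothetical. The flags themselves (`-ar`, `-ac`, `-c:a pcm_s16le`, and the `loudnorm` filter with its `I` target) are standard `ffmpeg` options.

```python
import shlex


def build_ffmpeg_command(source: str, target: str, target_lufs: float = -23.0) -> list[str]:
    """Build ffmpeg arguments for 16 kHz mono 16-bit WAV at the target loudness."""
    return [
        "ffmpeg",
        "-i", source,
        "-af", f"loudnorm=I={target_lufs}",  # EBU R128 loudness normalization
        "-ar", "16000",                      # 16 kHz sample rate
        "-ac", "1",                          # mono
        "-c:a", "pcm_s16le",                 # 16-bit PCM samples
        target,                              # .wav extension selects the WAV container
    ]


command = build_ffmpeg_command("upload.mp3", "normalized.wav")
print(shlex.join(command))
```

The list form can be passed straight to `subprocess.run(command, check=True)`, which avoids shell quoting problems with user-supplied file names.
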
**Noise Reduction (Optional):**
- Background noise removal
- Echo cancellation
- High-pass filter for rumble

### 3. Chunking Strategy

**Default Configuration:**
```yaml
chunk_size: 30  # seconds
overlap: 1  # seconds
max_parallel: 10
silence_detection: true
```

**Chunking with Silence Detection:**
- Detects silence periods
- Attempts to break at natural pauses
- Maintains context with overlap
- Preserves sentence boundaries

**Chunk Metadata:**
```json
{
  "chunk_id": "chunk_001",
  "start_time": 0.0,
  "end_time": 30.0,
  "duration": 30.0,
  "has_speech": true,
  "audio_hash": "sha256:..."
}
```

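
With the default configuration (30-second chunks, 1 second of overlap), chunk boundaries follow a fixed step of `chunk_size - overlap`. The sketch below shows that arithmetic only; it does not model silence detection, and the function name `plan_chunks` is an assumption rather than part of Reflector.

```python
def plan_chunks(duration: float, chunk_size: float = 30.0, overlap: float = 1.0) -> list[dict]:
    """Split [0, duration] into fixed-size chunks that overlap their neighbours."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    start = 0.0
    index = 1
    while start < duration:
        end = min(start + chunk_size, duration)
        chunks.append({
            "chunk_id": f"chunk_{index:03d}",
            "start_time": start,
            "end_time": end,
            "duration": end - start,
        })
        if end >= duration:
            break  # the last chunk reached the end of the audio
        start += step
        index += 1
    return chunks


chunks = plan_chunks(70.0)
for chunk in chunks:
    print(chunk["chunk_id"], chunk["start_time"], chunk["end_time"])
```

For 70 seconds of audio this yields three chunks starting at 0, 29, and 58 seconds, so each neighbouring pair shares one second of audio.
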
### 4. Transcription Processing

**Whisper Models:**

| Model | Parameters | Speed | Accuracy | Use Case |
|-------|------------|-------|----------|----------|
| tiny | 39M | Very Fast | 85% | Quick drafts |
| base | 74M | Fast | 89% | Good balance |
| small | 244M | Medium | 91% | Better accuracy |
| medium | 769M | Slow | 93% | High quality |
| large-v3 | 1550M | Very Slow | 96% | Best quality |

**Processing Configuration:**
```python
transcription_config = {
    "model": "whisper-base",
    "language": "auto",  # or specify: "en", "es", etc.
    "task": "transcribe",  # or "translate"
    "temperature": 0,  # deterministic
    "compression_ratio_threshold": 2.4,
    "no_speech_threshold": 0.6,
    "condition_on_previous_text": True,
    "initial_prompt": None,  # optional context
}
```

**Parallel Processing:**
- Each chunk processed independently
- GPU batching for efficiency
- Automatic load balancing
- Failure isolation

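
Independent chunk processing with failure isolation can be sketched with a thread pool, where one failing chunk is recorded without stopping the others. This is an illustration under assumptions: `transcribe_all` and `fake_transcribe` are hypothetical names, and a real deployment would dispatch to GPU workers rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transcribe_all(chunk_ids, transcribe, max_parallel=10):
    """Run `transcribe` on every chunk; collect results and failures separately."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(transcribe, chunk_id): chunk_id for chunk_id in chunk_ids}
        for future in as_completed(futures):
            chunk_id = futures[future]
            try:
                results[chunk_id] = future.result()
            except Exception as exc:
                # A failure only affects this chunk; the others keep going.
                failures[chunk_id] = str(exc)
    return results, failures


def fake_transcribe(chunk_id):
    """Stand-in for a real transcription call; fails on one chunk on purpose."""
    if chunk_id == "chunk_002":
        raise RuntimeError("gpu out of memory")
    return f"text for {chunk_id}"


results, failures = transcribe_all(
    ["chunk_001", "chunk_002", "chunk_003"], fake_transcribe, max_parallel=2
)
print(results, failures)
```
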
### 5. Diarization (Speaker Identification)

**Pyannote 3.1 Pipeline:**

1. **Voice Activity Detection (VAD)**
   - Identifies speech segments
   - Filters out silence and noise
   - Precision: 95%+

2. **Speaker Embedding**
   - Extracts voice characteristics
   - 256-dimensional vectors
   - Speaker-discriminative features

3. **Clustering**
   - Groups similar voice embeddings
   - Agglomerative clustering
   - Automatic speaker count detection

4. **Segmentation**
   - Assigns speaker labels to time segments
   - Handles overlapping speech
   - Minimum segment duration: 0.5s

**Configuration:**
```python
diarization_config = {
    "min_speakers": 1,
    "max_speakers": 10,
    "min_duration": 0.5,
    "clustering": "AgglomerativeClustering",
    "embedding_model": "speechbrain/spkrec-ecapa-voxceleb",
}
```

### 6. Alignment & Merging

**Chunk Assembly:**
```python
# Merge chunk transcripts, de-duplicating the overlap between neighbours
merged_transcript = []
previous = None
for chunk in chunks:
    # Remove text duplicated in the overlap with the previous chunk;
    # resolve_overlap stands in for that de-duplication logic
    if previous is not None and chunk.start < previous.end:
        chunk.text = resolve_overlap(previous, chunk)

    # Maintain timeline
    merged_transcript.append(chunk)
    previous = chunk
```

**Speaker Alignment:**
- Map diarization timeline to transcript
- Resolve speaker changes mid-sentence
- Handle multiple speakers per segment

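
Mapping the diarization timeline onto the transcript can be sketched as a maximum-overlap assignment: each transcript segment takes the speaker whose turn overlaps it the longest. The dictionary shapes and the names `overlap_seconds` and `assign_speakers` are assumptions for illustration, and the sketch does not split a segment when the speaker changes mid-sentence.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker that overlaps it most."""
    labeled = []
    for segment in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap_seconds(segment["start"], segment["end"],
                                     turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**segment, "speaker": best_speaker})
    return labeled


segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome everyone."},
    {"start": 4.0, "end": 9.0, "text": "Thanks, glad to be here."},
]
turns = [
    {"start": 0.0, "end": 5.0, "speaker": "Speaker 1"},
    {"start": 5.0, "end": 10.0, "speaker": "Speaker 2"},
]
labeled = assign_speakers(segments, turns)
print([item["speaker"] for item in labeled])
```

A segment that overlaps no diarization turn keeps `speaker` set to `None`, so it can be flagged by a later quality check instead of being silently mislabeled.
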
**Quality Checks:**
- Timeline consistency
- No gaps in transcript
- Speaker label continuity
- Confidence score validation

### 7. Post-processing Chain

**Text Formatting:**
- Sentence capitalization
- Punctuation restoration
- Number formatting
- Acronym detection

**Translation (Optional):**
```python
translation_config = {
    "model": "facebook/seamless-m4t-medium",
    "source_lang": "auto",
    "target_langs": ["es", "fr", "de"],
    "preserve_formatting": True
}
```

**Topic Detection:**
- LLM-based analysis
- Extract 3-5 key topics
- Keyword extraction
- Entity recognition

**Summarization:**
```python
summary_config = {
    "model": "openai-compatible",
    "max_length": 500,
    "style": "bullets",  # or "paragraph"
    "include_action_items": True,
    "include_decisions": True
}
```

### 8. Storage & Delivery

**Database Storage:**
```sql
-- Main transcript record
INSERT INTO transcripts (
    id, title, duration, language,
    transcript_text, transcript_json,
    speakers, topics, summary,
    created_at, processing_time
) VALUES (...);

-- Processing metadata
INSERT INTO processing_metadata (
    transcript_id, model_versions,
    chunk_count, total_chunks,
    error_count, warnings
) VALUES (...);
```

**File Storage:**
- Original audio: S3 (optional)
- Processed chunks: Temporary (24h)
- Transcript exports: JSON, SRT, VTT, TXT

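
As an example of one export format listed above, the sketch below renders transcript segments as SRT. The segment dictionary shape and the names `srt_timestamp` and `to_srt` are assumptions for illustration; only the SRT layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as an SRT document."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}\n"
            f"{segment['text']}\n"
        )
    return "\n".join(blocks)


print(to_srt([
    {"start": 0.0, "end": 1.5, "text": "Hello"},
    {"start": 1.5, "end": 4.0, "text": "Welcome to the meeting"},
]))
```
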
**Notification:**
```json
{
  "type": "webhook",
  "url": "https://your-app.com/webhook",
  "payload": {
    "transcript_id": "...",
    "status": "completed",
    "duration": 3600,
    "processing_time": 180
  }
}
```

## Processing Times

**Estimated times for 1 hour of audio:**

| Component | Fast Mode | Balanced | High Quality |
|-----------|-----------|----------|--------------|
| Pre-processing | 10s | 10s | 10s |
| Transcription | 60s | 180s | 600s |
| Diarization | 30s | 60s | 120s |
| Post-processing | 20s | 30s | 60s |
| **Total** | **2 min** | **~5 min** | **~13 min** |

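
The totals in the table are the sums of the per-stage estimates, and dividing the audio length by a total gives a speed-up over real time. The short sketch below reproduces that arithmetic; the profile keys and the function name `summarize` are assumptions made for the example.

```python
AUDIO_SECONDS = 3600  # one hour of audio, as in the table

# Per-stage estimates in seconds, copied from the table above.
STAGE_SECONDS = {
    "fast": {"pre": 10, "transcription": 60, "diarization": 30, "post": 20},
    "balanced": {"pre": 10, "transcription": 180, "diarization": 60, "post": 30},
    "high_quality": {"pre": 10, "transcription": 600, "diarization": 120, "post": 60},
}


def summarize(profile: str) -> tuple[int, float]:
    """Return (total seconds, speed-up relative to real time) for a profile."""
    total = sum(STAGE_SECONDS[profile].values())
    return total, AUDIO_SECONDS / total


for name in STAGE_SECONDS:
    total, speedup = summarize(name)
    print(f"{name}: {total}s total, {speedup:.1f}x real time")
```
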
## Error Handling

### Retry Strategy

```python
# `celery` is the Celery app instance; `transcribe` is the chunk transcription helper
@celery.task(
    bind=True,
    max_retries=3,
    default_retry_delay=60,
)
def process_chunk(self, chunk_id):
    try:
        # Process chunk
        return transcribe(chunk_id)
    except Exception as exc:
        # Exponential backoff: 60s, 120s, 240s. Celery's retry_backoff option
        # only applies to autoretry_for, so the delay is computed explicitly.
        raise self.retry(exc=exc, countdown=60 * 2 ** self.request.retries)
```

### Partial Recovery

- Continue with successful chunks
- Mark failed chunks in output
- Provide partial transcript
- Report processing issues

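
Partial recovery can be sketched as assembling the transcript in chunk order, inserting a visible marker for each chunk that failed, and returning the issues alongside the text. The function name `build_partial_transcript`, the marker format, and the input shapes are assumptions for illustration.

```python
def build_partial_transcript(chunk_order, results, failures):
    """Join successful chunks in order; mark failed ones and report them."""
    lines, issues = [], []
    for chunk_id in chunk_order:
        if chunk_id in results:
            lines.append(results[chunk_id])
        else:
            reason = failures.get(chunk_id, "unknown error")
            lines.append(f"[missing: {chunk_id}]")  # visible gap marker
            issues.append(f"{chunk_id}: {reason}")
    return "\n".join(lines), issues


text, issues = build_partial_transcript(
    ["chunk_001", "chunk_002", "chunk_003"],
    {"chunk_001": "Good morning.", "chunk_003": "Let's begin."},
    {"chunk_002": "gpu out of memory"},
)
print(text)
print(issues)
```
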
### Fallback Options

1. **Model Fallback:**
   - If large model fails, try medium
   - If GPU fails, try CPU
   - If Modal fails, try local

2. **Quality Degradation:**
   - Reduce chunk size
   - Disable post-processing
   - Skip diarization if needed

## Optimization Tips

### For Speed

1. Use smaller models (tiny/base)
2. Increase parallel chunks
3. Disable diarization
4. Skip post-processing
5. Use GPU acceleration

### For Accuracy

1. Use larger models (medium/large)
2. Enable all pre-processing
3. Reduce chunk size
4. Enable silence detection
5. Multiple pass processing

### For Cost

1. Use Modal spot instances
2. Batch multiple files
3. Cache common phrases
4. Optimize chunk size
5. Selective post-processing

## Monitoring

### Metrics to Track

```python
# Metric name -> metric type
metrics = {
    "processing_time": "histogram",
    "chunk_success_rate": "gauge",
    "model_accuracy": "histogram",
    "queue_depth": "gauge",
    "gpu_utilization": "gauge",
    "cost_per_hour": "counter"
}
```

### Quality Metrics

- Word Error Rate (WER)
- Diarization Error Rate (DER)
- Confidence scores
- Processing speed
- User feedback

### Alerts

- Processing time > 30 minutes
- Error rate > 5%
- Queue depth > 100
- GPU memory > 90%
- Cost spike detected

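
The numeric alert thresholds above can be expressed as a small rule table checked against a metrics snapshot. This is a sketch only: the rule names, the snapshot shape, and the function `triggered_alerts` are assumptions, and the "cost spike" alert is left out because the section does not define a threshold for it.

```python
# Threshold per metric; an alert fires when the value is strictly greater.
ALERT_RULES = {
    "processing_minutes": 30,
    "error_rate": 0.05,
    "queue_depth": 100,
    "gpu_memory_fraction": 0.90,
}


def triggered_alerts(snapshot: dict) -> list[str]:
    """Return the names of all rules whose threshold the snapshot exceeds."""
    return [
        name
        for name, limit in ALERT_RULES.items()
        if snapshot.get(name, 0) > limit
    ]


snapshot = {
    "processing_minutes": 42,
    "error_rate": 0.01,
    "queue_depth": 100,
    "gpu_memory_fraction": 0.95,
}
print(triggered_alerts(snapshot))
```
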
7
docs/docs/pipelines/live-pipeline.md
Normal file
@@ -0,0 +1,7 @@
---
title: live pipeline
---

# live pipeline

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/pipelines/overview.md
Normal file
@@ -0,0 +1,7 @@
---
title: overview
---

# overview

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

448
docs/docs/reference/api.md
Normal file
@@ -0,0 +1,448 @@
---
title: API Reference
---

# API Reference

The Reflector API provides a comprehensive RESTful interface for audio transcription, meeting management, and real-time streaming capabilities.

## Base URL

```
http://localhost:8000/v1
```

All API endpoints are prefixed with `/v1/` for versioning.

## Authentication

Reflector supports multiple authentication modes:

- **No Authentication** (Public Mode): Basic transcription and upload functionality
- **JWT Authentication** (Private Mode): Full feature access including meeting rooms and persistent storage
- **OAuth/OIDC via Authentik**: Enterprise single sign-on integration

## Core Endpoints

### Transcripts

Manage audio transcriptions and their associated metadata.

#### List Transcripts
```http
GET /v1/transcripts/
```

Returns a paginated list of transcripts with filtering options.

#### Create Transcript
```http
POST /v1/transcripts/
```

Create a new transcript from uploaded audio or initialize for streaming.

#### Get Transcript
```http
GET /v1/transcripts/{transcript_id}
```

Retrieve detailed information about a specific transcript.

#### Update Transcript
```http
PATCH /v1/transcripts/{transcript_id}
```

Update transcript metadata, summary, or processing status.

#### Delete Transcript
```http
DELETE /v1/transcripts/{transcript_id}
```

Remove a transcript and its associated data.

### Audio Processing

#### Upload Audio
```http
POST /v1/transcripts_audio/{transcript_id}/upload
```

Upload an audio file for transcription processing.

**Supported formats:**
- WAV, MP3, M4A, FLAC, OGG
- Maximum file size: 500MB
- Sample rates: 8kHz - 48kHz

#### Download Audio
```http
GET /v1/transcripts_audio/{transcript_id}/download
```

Download the original or processed audio file.

#### Stream Audio
```http
GET /v1/transcripts_audio/{transcript_id}/stream
```

Stream audio content with range support for progressive playback.

### WebRTC Streaming

Real-time audio streaming via WebRTC for live transcription.

#### Initialize WebRTC Session
```http
POST /v1/transcripts_webrtc/{transcript_id}/offer
```

Create a WebRTC offer for establishing a peer connection.

#### Complete WebRTC Handshake
```http
POST /v1/transcripts_webrtc/{transcript_id}/answer
```

Submit the WebRTC answer to complete connection setup.

### WebSocket Streaming

Real-time updates and live transcription via WebSocket.

#### WebSocket Endpoint
```ws
ws://localhost:8000/v1/transcripts_websocket/{transcript_id}
```

Receive real-time transcription updates, speaker changes, and processing status.

**Message Types:**
- `transcription`: New transcribed text segments
- `diarization`: Speaker identification updates
- `status`: Processing status changes
- `error`: Error notifications

### Meetings

Manage virtual meeting rooms and recordings.

#### List Meetings
```http
GET /v1/meetings/
```

Get all meetings for the authenticated user.

#### Create Meeting
```http
POST /v1/meetings/
```

Initialize a new meeting room with Whereby integration.

#### Join Meeting
```http
POST /v1/meetings/{meeting_id}/join
```

Join an existing meeting and start recording.

#### End Meeting
```http
POST /v1/meetings/{meeting_id}/end
```

End the meeting and finalize the recording.

### Rooms

Virtual meeting room configuration and management.

#### List Rooms
```http
GET /v1/rooms/
```

Get available meeting rooms.

#### Create Room
```http
POST /v1/rooms/
```

Create a new persistent meeting room.

#### Update Room Settings
```http
PATCH /v1/rooms/{room_id}
```

Modify room configuration and permissions.

## Response Formats

### Success Response
```json
{
  "id": "uuid",
  "created_at": "2025-01-20T10:00:00Z",
  "updated_at": "2025-01-20T10:30:00Z",
  "data": {...}
}
```

### Error Response
```json
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable error message",
    "details": {...}
  }
}
```

### Status Codes

- `200 OK`: Successful request
- `201 Created`: Resource created successfully
- `204 No Content`: Successful deletion
- `400 Bad Request`: Invalid request parameters
- `401 Unauthorized`: Authentication required
- `403 Forbidden`: Insufficient permissions
- `404 Not Found`: Resource not found
- `409 Conflict`: Resource conflict
- `422 Unprocessable Entity`: Validation error
- `429 Too Many Requests`: Rate limit exceeded
- `500 Internal Server Error`: Server error

## Rate Limiting

- **Anonymous users**: 100 requests per minute
- **Authenticated users**: 1000 requests per minute
- **WebSocket connections**: 10 concurrent per user
- **File uploads**: 10 per hour for anonymous, 100 per hour for authenticated

## WebSocket Protocol

The WebSocket connection provides real-time updates during transcription processing. The server sends structured messages to communicate different events and data updates.

### Connection
```javascript
const ws = new WebSocket('ws://localhost:8000/v1/transcripts_websocket/{transcript_id}');
```

### Message Types and Formats

#### Transcription Update
Sent when new text is transcribed from the audio stream.
```json
{
  "type": "transcription",
  "data": {
    "text": "The transcribed text segment",
    "speaker": "Speaker 1",
    "timestamp": 1705745623.456,
    "confidence": 0.95,
    "segment_id": "seg_001",
    "is_final": true
  }
}
```

#### Diarization Update
Sent when speaker changes are detected or speaker labels are updated.
```json
{
  "type": "diarization",
  "data": {
    "speaker": "Speaker 2",
    "speaker_id": "spk_002",
    "start_time": 1705745620.123,
    "end_time": 1705745625.456,
    "confidence": 0.87
  }
}
```

#### Processing Status
Sent to indicate changes in the processing pipeline status.
```json
{
  "type": "status",
  "data": {
    "status": "processing",
    "stage": "transcription",
    "progress": 45.5,
    "message": "Processing audio chunk 12 of 26"
  }
}
```

Status values:
- `initializing`: Setting up processing pipeline
- `processing`: Active transcription/diarization
- `completed`: Processing finished successfully
- `failed`: Processing encountered an error
- `paused`: Processing temporarily suspended

#### Summary Update
Sent when AI-generated summaries or topics are available.
```json
{
  "type": "summary",
  "data": {
    "summary": "Brief summary of the conversation",
    "topics": ["topic1", "topic2", "topic3"],
    "action_items": ["action 1", "action 2"],
    "key_points": ["point 1", "point 2"]
  }
}
```

#### Error Messages
Sent when errors occur during processing.
```json
{
  "type": "error",
  "data": {
    "code": "AUDIO_FORMAT_ERROR",
    "message": "Unsupported audio format",
    "details": {
      "format": "unknown",
      "sample_rate": 0
    },
    "recoverable": false
  }
}
```

#### Heartbeat/Keepalive
Sent periodically to maintain the connection.
```json
{
  "type": "ping",
  "data": {
    "timestamp": 1705745630.000
  }
}
```

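
A client written in Python could dispatch on the `type` field of each server message, as in the sketch below. It parses only the message shapes documented in this section; the function name `describe_message` is an assumption, and no network connection is made.

```python
import json


def describe_message(raw: str) -> str:
    """Turn one server message (a JSON string) into a short log line."""
    message = json.loads(raw)
    kind = message.get("type")
    data = message.get("data", {})
    if kind == "transcription":
        return f"[{data['speaker']}] {data['text']}"
    if kind == "diarization":
        return f"speaker change: {data['speaker']}"
    if kind == "status":
        return f"status: {data['status']} ({data.get('progress', 0)}%)"
    if kind == "summary":
        return f"summary: {data['summary']}"
    if kind == "error":
        return f"error {data['code']}: {data['message']}"
    if kind == "ping":
        return "ping"
    # Unknown types are reported rather than raised, so new message
    # types added by the server do not break older clients.
    return f"unknown message type: {kind}"


print(describe_message(
    '{"type": "status", "data": {"status": "processing", "progress": 45.5}}'
))
```
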
### Client-to-Server Messages

Clients can send control messages to the server:

#### Start/Resume Processing
```json
{
  "action": "start",
  "params": {}
}
```

#### Pause Processing
```json
{
  "action": "pause",
  "params": {}
}
```

#### Request Status
```json
{
  "action": "get_status",
  "params": {}
}
```

## OpenAPI Specification

The complete OpenAPI 3.0 specification is available at:

```
http://localhost:8000/v1/openapi.json
```

You can import this specification into tools like:
- Postman
- Insomnia
- Swagger UI
- OpenAPI Generator (for client SDK generation)

## SDK Support

While Reflector doesn't provide official SDKs, you can generate client libraries using the OpenAPI specification with tools like:

- **Python**: `openapi-python-client`
- **TypeScript**: `openapi-typescript-codegen`
- **Go**: `oapi-codegen`
- **Java**: `openapi-generator`

## Example Usage

### Python Example
```python
import requests

# Upload and transcribe audio
with open('meeting.mp3', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/v1/transcripts/',
        files={'file': f}
    )
transcript_id = response.json()['id']

# Check transcription status
status = requests.get(
    f'http://localhost:8000/v1/transcripts/{transcript_id}'
).json()

print(f"Transcription status: {status['status']}")
```

### JavaScript WebSocket Example
```javascript
// Connect to WebSocket for real-time transcription updates
const ws = new WebSocket(`ws://localhost:8000/v1/transcripts_websocket/${transcriptId}`);

ws.onopen = () => {
  console.log('Connected to transcription WebSocket');
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch(message.type) {
    case 'transcription':
      console.log(`[${message.data.speaker}]: ${message.data.text}`);
      break;
    case 'diarization':
      console.log(`Speaker change: ${message.data.speaker}`);
      break;
    case 'status':
      console.log(`Status: ${message.data.status}`);
      break;
    case 'error':
      console.error(`Error: ${message.data.message}`);
      break;
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('WebSocket connection closed');
};
```

## Need Help?

- Review [example implementations](https://github.com/monadical-sas/reflector/tree/main/examples)
- Open an issue on [GitHub](https://github.com/monadical-sas/reflector/issues)

7
docs/docs/reference/api/overview.md
Normal file
@@ -0,0 +1,7 @@
---
title: overview
---

# overview

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/architecture/backend.md
Normal file
@@ -0,0 +1,7 @@
---
title: backend
---

# backend

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/architecture/database.md
Normal file
@@ -0,0 +1,7 @@
---
title: database
---

# database

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/architecture/frontend.md
Normal file
@@ -0,0 +1,7 @@
---
title: frontend
---

# frontend

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/architecture/overview.md
Normal file
@@ -0,0 +1,7 @@
---
title: overview
---

# overview

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/architecture/workers.md
Normal file
@@ -0,0 +1,7 @@
---
title: workers
---

# workers

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/configuration.md
Normal file
@@ -0,0 +1,7 @@
---
title: configuration
---

# configuration

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/processors/analysis.md
Normal file
@@ -0,0 +1,7 @@
---
title: analysis
---

# analysis

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/processors/diarization.md
Normal file
@@ -0,0 +1,7 @@
---
title: diarization
---

# diarization

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/processors/transcription.md
Normal file
@@ -0,0 +1,7 @@
---
title: transcription
---

# transcription

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

7
docs/docs/reference/processors/translation.md
Normal file
@@ -0,0 +1,7 @@
---
title: translation
---

# translation

Documentation coming soon. See [TODO.md](/docs/TODO) for required information.

139
docs/docs/roadmap.md
Normal file
@@ -0,0 +1,139 @@
---
sidebar_position: 100
title: Roadmap
---

# Product Roadmap

Our development roadmap for Reflector, focusing on expanding capabilities while maintaining privacy and performance.

## Planned Features

### 🌍 Multi-Language Support Enhancement

**Current State:**
- Whisper supports 99+ languages for transcription
- Parakeet supports English only with high accuracy
- Translation available to 100+ languages

**Planned Improvements:**
- Default language selection per room/user
- Automatic language detection improvements
- Multi-language diarization support
- RTL (Right-to-Left) language UI support
- Language-specific post-processing rules

### 🏠 Self-Hosted Room Providers

**Jitsi Integration**

Moving beyond Whereby to support self-hosted video conferencing:

- No API keys required
- Complete control over video infrastructure
- Custom branding and configuration
- Lower operational costs
- Enhanced privacy with self-hosted video

**Implementation Plan:**
- WebRTC bridge for Jitsi Meet
- Room management API integration
- Recording synchronization
- Participant tracking

### 📅 Calendar Integration

**Planned Capabilities:**
- Google Calendar synchronization
- Microsoft Outlook integration
- Automatic meeting room creation
- Pre-meeting document preparation
- Post-meeting transcript delivery
- Recurring meeting support

**Features:**
- Auto-join scheduled meetings
- Calendar-based access control
- Meeting agenda import
- Action item export to calendar

### 🖥️ Self-Hosted GPU Service

**For organizations with dedicated GPU hardware (H100, A100, RTX 4090):**

**Docker GPU Worker Image:**
- Self-contained processing service
- CUDA 11/12 support
- Pre-loaded models:
  - Whisper (all sizes)
  - Pyannote diarization
  - Seamless-M4T translation
- Automatic model management

**Deployment Options:**
- Kubernetes GPU operators
- Docker Compose with nvidia-docker
- Bare metal installation
- Hybrid cloud/on-premise

**Benefits:**
- No Modal.com dependency
- Complete data isolation
- Predictable costs
- Maximum performance
- Custom model support

## Future Considerations

### Enhanced Analytics
- Meeting insights dashboard
- Speaker participation metrics
- Topic trends over time
- Team collaboration patterns

### Advanced AI Features
- Real-time sentiment analysis
- Emotion detection
- Meeting quality scores
- Automated coaching suggestions

### Integration Ecosystem
- Slack/Teams notifications
- CRM integration (Salesforce, HubSpot)
- Project management tools (Jira, Asana)
- Knowledge bases (Notion, Confluence)

### Performance Improvements
- WebAssembly for client-side processing
- Edge computing support
- 5G network optimization
- Blockchain for transcript verification

## Contributing

We welcome community contributions! Areas where you can help:

1. **Language Support**: Add support for your language
2. **Integrations**: Connect with your favorite tools
3. **Models**: Fine-tune models for specific domains
4. **Documentation**: Improve guides and examples

See our [Contributing Guide](https://github.com/monadical-sas/reflector/blob/main/CONTRIBUTING.md) for details.

## Timeline

We don't provide specific dates as development depends on community contributions and priorities. Features are generally released when they're ready and properly tested.

## Feature Requests

Have an idea for Reflector? We'd love to hear it!

- [Open a GitHub Issue](https://github.com/monadical-sas/reflector/issues/new)
- [Join our Discord](#)
- [Email us](mailto:reflector@monadical.com)

## Stay Updated

- Watch our [GitHub repository](https://github.com/monadical-sas/reflector)
- Follow our [blog](#)
- Subscribe to our [newsletter](#)