---
sidebar_position: 1
title: Architecture Overview
---

# Architecture Overview

Reflector is built as a modern, scalable, microservices-based application designed to handle audio processing workloads efficiently while maintaining data privacy and control.

## System Components

### Frontend Application

The user interface is built with **Next.js 15** using the App Router pattern, providing:

- Server-side rendering for optimal performance
- Real-time WebSocket connections for live transcription
- WebRTC support for audio streaming and live meetings (via Daily.co or Whereby)
- Responsive design with Chakra UI components

### Backend API Server

The core API is powered by **FastAPI**, a modern Python framework that provides:

- High-performance async request handling
- Automatic OpenAPI documentation generation
- Type safety with Pydantic models
- WebSocket support for real-time updates

### Processing Pipeline

Audio processing is handled through a modular pipeline architecture:

```
Audio Input → Chunking → Transcription → Diarization → Post-Processing → Storage
```

Each step can run independently and in parallel, allowing for:

- Scalable processing of large files
- Real-time streaming capabilities
- Fault tolerance and retry mechanisms
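
The modular idea above can be sketched as a chain of plain functions, each consuming the previous stage's output. This is only an illustration: the names `chunk`, `transcribe`, `join_text`, and `run_pipeline` are hypothetical, and the transcription stage is a placeholder, not Reflector's actual processors.

```python
from typing import Any, Callable, List, Sequence

# A stage is any callable that turns one intermediate value into the next.
Stage = Callable[[Any], Any]


def chunk(samples: List[int], size: int = 4) -> List[List[int]]:
    """Split a flat list of samples into fixed-size chunks."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def transcribe(chunks: List[List[int]]) -> List[str]:
    """Placeholder for a speech-to-text call: emits one fake token per chunk."""
    return [f"chunk{i}({len(c)})" for i, c in enumerate(chunks)]


def join_text(tokens: List[str]) -> str:
    """Stand-in for post-processing: assemble the tokens into one string."""
    return " ".join(tokens)


def run_pipeline(data: Any, stages: Sequence[Stage]) -> Any:
    """Feed the data through each stage in order."""
    for stage in stages:
        data = stage(data)
    return data


result = run_pipeline(list(range(10)), [chunk, transcribe, join_text])
```

Because every stage shares the same shape, a stage can be swapped or retried without touching its neighbours.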

### Worker Architecture

Background tasks are managed by **Celery** workers with **Redis** as the message broker:

- Distributed task processing
- Priority queues for time-sensitive operations
- Automatic retry on failure
- Progress tracking and notifications
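
To illustrate how priority ordering and retry-on-failure interact, here is a small in-memory sketch built on the standard library. It deliberately does not use Celery's API; the `TaskQueue` class, its `max_retries` parameter, and the task names are assumptions made for the example.

```python
import heapq
import itertools
from typing import Callable, List


class TaskQueue:
    """In-memory stand-in for a broker-backed queue (not Celery's API)."""

    def __init__(self, max_retries: int = 2) -> None:
        self._heap: list = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._counter = itertools.count()
        self.max_retries = max_retries

    def submit(self, priority: int, name: str,
               func: Callable[[], None], attempt: int = 0) -> None:
        """Queue a task; lower priority numbers run first."""
        heapq.heappush(
            self._heap, (priority, next(self._counter), name, func, attempt)
        )

    def run(self) -> List[str]:
        """Drain the queue, re-queueing failed tasks up to max_retries times."""
        log: List[str] = []
        while self._heap:
            priority, _, name, func, attempt = heapq.heappop(self._heap)
            try:
                func()
                log.append(f"{name}:ok")
            except Exception:
                if attempt < self.max_retries:
                    log.append(f"{name}:retry")
                    self.submit(priority, name, func, attempt + 1)
                else:
                    log.append(f"{name}:failed")
        return log


calls = {"n": 0}


def flaky() -> None:
    """Fails on the first call only, simulating a transient error."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient error")


queue = TaskQueue()
queue.submit(5, "summary", lambda: None)
queue.submit(1, "live-transcript", flaky)
log = queue.run()
```

The time-sensitive task runs first, fails once, is retried at the same priority, and only then does the lower-priority task run.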

### GPU Acceleration

ML models run on GPU-accelerated infrastructure:

- **Modal.com** for serverless GPU processing
- **Self-hosted GPU** with Docker deployment
- Automatic scaling based on demand
- Cost-effective pay-per-use model

## Data Flow

### Daily.co Meeting Recording Flow

1. **Recording**: Daily.co captures separate audio tracks per participant
2. **Webhook**: Daily.co notifies Reflector when recording is ready
3. **Track Download**: Individual participant tracks fetched from S3
4. **Padding**: Tracks padded with silence based on join time for synchronization
5. **Transcription**: Each track transcribed independently (speaker = track index)
6. **Merge**: Transcriptions sorted by timestamp and combined
7. **Mixdown**: Tracks mixed to single MP3 for playback
8. **Post-Processing**: Topics, title, and summaries generated via LLM
9. **Delivery**: Results stored and user notified via WebSocket
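
The padding and merge steps (4 to 6) can be sketched as follows. This is a simplified model under stated assumptions: audio is a plain list of float samples, each per-track transcript is a list of `(start_seconds, text)` pairs already expressed in meeting time, and the names `Segment`, `pad_with_silence`, and `merge_tracks` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float   # seconds from the start of the meeting
    speaker: int   # index of the track the text came from
    text: str


def pad_with_silence(samples: List[float], join_offset_s: float,
                     sample_rate: int) -> List[float]:
    """Prepend zero samples so the track lines up with meeting time 0."""
    silence = [0.0] * int(round(join_offset_s * sample_rate))
    return silence + samples


def merge_tracks(tracks: List[List[Tuple[float, str]]]) -> List[Segment]:
    """Label each segment with its track index, then sort by start time."""
    segments = [
        Segment(start, track_index, text)
        for track_index, track in enumerate(tracks)
        for start, text in track
    ]
    return sorted(segments, key=lambda segment: segment.start)


# A participant who joined 0.5 s late, at a toy sample rate of 4 Hz.
padded = pad_with_silence([0.5, 0.5], join_offset_s=0.5, sample_rate=4)

merged = merge_tracks([
    [(0.0, "hi"), (4.0, "bye")],   # track 0
    [(2.0, "hello")],              # track 1
])
```

Since each track belongs to one participant, the speaker label falls out of the track index and no ML-based diarization is needed in this flow.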

### File Upload Flow

1. **Upload**: User uploads audio file through web interface
2. **Storage**: File stored temporarily
3. **Transcription**: Full file transcribed via Whisper
4. **Diarization**: ML-based speaker identification (Pyannote)
5. **Post-Processing**: Topics, title, summaries
6. **Delivery**: Results stored and user notified
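
Steps 3 and 4 produce two separate outputs, timed words and timed speaker turns, which then have to be combined. One common way to do that is to give each word the speaker whose turn overlaps it the most. The sketch below assumes that approach and a simple tuple layout; `assign_speakers` is a hypothetical helper, and Reflector's actual combination logic may differ.

```python
from typing import List, Optional, Tuple

Word = Tuple[float, float, str]   # (start, end, text)
Turn = Tuple[float, float, str]   # (start, end, speaker label)


def assign_speakers(words: List[Word],
                    turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each word with the speaker turn that overlaps it the most."""
    labeled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled


labeled = assign_speakers(
    words=[(0.0, 0.5, "hello"), (0.9, 1.3, "there"), (5.0, 5.2, "orphan")],
    turns=[(0.0, 1.0, "A"), (1.0, 2.0, "B")],
)
```

Words that fall outside every turn keep a `None` speaker, so they can be handled explicitly later.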

### Live Streaming Flow

1. **WebRTC Connection**: Browser establishes peer connection via Daily.co or Whereby
2. **Audio Capture**: Microphone audio streamed to server
3. **Buffering**: Audio buffered for processing
4. **Real-time Processing**: Segments transcribed as they arrive
5. **WebSocket Updates**: Results streamed back to client
6. **Continuous Assembly**: Full transcript built progressively
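
The buffering and progressive assembly steps can be illustrated with a small accumulator that emits a segment whenever enough bytes have arrived. The `SegmentBuffer` class, the fixed `segment_size`, and the `fake_transcribe` stand-in are assumptions for this sketch; a real system would more likely cut segments by duration or silence.

```python
from typing import List


class SegmentBuffer:
    """Accumulates incoming audio bytes and emits fixed-size segments."""

    def __init__(self, segment_size: int) -> None:
        self.segment_size = segment_size
        self._pending = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add a packet; return every complete segment now available."""
        self._pending.extend(data)
        segments = []
        while len(self._pending) >= self.segment_size:
            segments.append(bytes(self._pending[:self.segment_size]))
            del self._pending[:self.segment_size]
        return segments


def fake_transcribe(segment: bytes) -> str:
    """Placeholder for the speech-to-text call."""
    return f"<{len(segment)} bytes>"


buffer = SegmentBuffer(segment_size=4)
transcript: List[str] = []

# Packets arrive in uneven sizes, as they would over a network.
for packet in (b"ab", b"cdef", b"gh", b"i"):
    for segment in buffer.feed(packet):
        # In the flow above, this is where a WebSocket update would be sent.
        transcript.append(fake_transcribe(segment))
```

The transcript list grows as segments complete, while leftover bytes stay in the buffer until the next packet arrives.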

## Deployment Architecture

### Container-Based Deployment

All components are containerized for consistent deployment:

```yaml
services:
  web: # Next.js application
  server: # FastAPI server
  worker: # Celery workers
  redis: # Message broker
  postgres: # Database
  caddy: # Reverse proxy
```

### Networking

- **Host Network Mode**: Required for WebRTC/ICE compatibility
- **Caddy Reverse Proxy**: Handles SSL termination and routing
- **WebSocket Upgrade**: Supports real-time connections

## Scalability Considerations

### Horizontal Scaling

- **Stateless Backend**: Multiple API server instances
- **Worker Pools**: Add workers based on queue depth
- **Database Pooling**: Connection management for concurrent access
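
As a rough illustration of sizing a worker pool from queue depth, the function below applies a simple ceiling rule with lower and upper bounds. The function name, the `tasks_per_worker` ratio, and the bounds are all assumptions chosen for the example, not values taken from Reflector.

```python
import math


def desired_workers(queue_depth: int, tasks_per_worker: int = 10,
                    min_workers: int = 1, max_workers: int = 8) -> int:
    """Return how many workers to run for the given number of queued tasks."""
    needed = math.ceil(queue_depth / tasks_per_worker)
    # Clamp so the pool never drops to zero or grows without limit.
    return max(min_workers, min(max_workers, needed))
```

An orchestrator or a periodic job could call such a function with the current broker queue length and scale the worker service to the returned count.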

### Vertical Scaling

- **GPU Workers**: Scale up for faster model inference
- **Memory Optimization**: Efficient audio buffering

## Security Architecture

### Authentication & Authorization

- **JWT Tokens**: Stateless authentication
- **Authentik Integration**: Enterprise SSO support
- **Role-Based Access**: Granular permissions

### Data Protection

- **Encryption in Transit**: TLS for all connections
- **Temporary Storage**: Automatic cleanup of processed files

### Privacy by Design

- **Local Processing**: Option to process entirely on-premises
- **No Training on User Data**: Models are pre-trained
- **Data Isolation**: Multi-tenant data separation

## Integration Points

### External Services

- **Modal.com**: GPU processing
- **AWS S3**: Long-term storage
- **Whereby**: Video conferencing rooms
- **Zulip**: Chat integration (optional)

### APIs and Webhooks

- **RESTful API**: Standard CRUD operations
- **WebSocket API**: Real-time updates
- **Webhook Notifications**: Processing completion events
- **OpenAPI Specification**: Machine-readable API definition

## Performance Optimization

### Caching Strategy

- **Redis Cache**: Frequently accessed data
- **CDN**: Static asset delivery
- **Browser Cache**: Client-side optimization

### Database Optimization

- **Indexed Queries**: Fast search and retrieval
- **Connection Pooling**: Efficient resource usage
- **Query Optimization**: N+1 query prevention

### Processing Optimization

- **Batch Processing**: Efficient GPU utilization
- **Parallel Execution**: Multi-core CPU usage
- **Stream Processing**: Reduced memory footprint
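
Batch and stream processing combine naturally in a generator that groups items lazily: the GPU sees full batches, while only one batch is held in memory at a time. The helper name `make_batches` is hypothetical and the batch size is arbitrary.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def make_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of up to batch_size items without loading everything."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


batches = list(make_batches(range(5), batch_size=2))
```

Here the input is a small range, but the same function works on any iterator, such as audio chunks read from disk one at a time.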

## Monitoring and Observability

### Metrics Collection

- **Application Metrics**: Request rates, response times
- **System Metrics**: CPU, memory, disk usage
- **Business Metrics**: Transcription accuracy, processing times

### Logging

- **Structured Logging**: JSON format for analysis
- **Log Aggregation**: Centralized log management
- **Error Tracking**: Sentry integration
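
A minimal sketch of JSON-formatted logging using only Python's standard `logging` module is shown below. The `JsonFormatter` class, the logger name `reflector.example`, and the `transcript_id` field are assumptions for illustration; the project may well use a dedicated structured-logging library instead.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical context field, passed via the `extra` argument.
        transcript_id = getattr(record, "transcript_id", None)
        if transcript_id is not None:
            payload["transcript_id"] = transcript_id
        return json.dumps(payload)


# Log to an in-memory stream so the output can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("reflector.example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output out of the root logger

logger.info("transcription finished", extra={"transcript_id": "abc123"})
entry = json.loads(stream.getvalue())
```

One JSON object per line is easy for a log aggregator to parse and filter by field.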

### Health Checks

- **Liveness Probes**: Component availability
- **Readiness Probes**: Service readiness
- **Dependency Checks**: External service status
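
A readiness probe typically aggregates several dependency checks into one answer. The sketch below shows that aggregation in isolation; the `readiness` function and the check names are hypothetical, and the lambdas stand in for real database or broker pings.

```python
from typing import Callable, Dict


def readiness(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run every dependency check and report overall readiness."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    return {"ready": all(results.values()), "checks": results}


def broken() -> bool:
    """Simulates a dependency that cannot be reached."""
    raise ConnectionError("cannot reach service")


report = readiness({"database": lambda: True, "redis": broken})
```

A liveness probe, by contrast, would usually skip the dependency checks and only confirm that the process itself responds.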