Compare commits

..

27 Commits

Author SHA1 Message Date
e91979abbc feat: use jitsi file system 2025-09-17 15:16:03 -06:00
95e8011975 Merge main into jisti-integration branch
- Resolved conflicts in server/reflector/views/rooms.py to keep platform-agnostic approach
- Resolved conflicts in www/app/[roomName]/page.tsx to keep VideoPlatformEmbed approach
- Accepted main's version of generated API files (schemas.gen.ts, services.gen.ts, types.gen.ts)
- Removed config-template.ts as per main branch changes
2025-09-15 12:53:49 -06:00
293f7d4f1f feat: implement frontend video platform configuration and abstraction
- Add NEXT_PUBLIC_VIDEO_PLATFORM environment variable support
- Create video platform abstraction layer with factory pattern
- Implement Whereby and Jitsi platform providers
- Update room meeting page to use platform-agnostic component
- Add platform display in room management (cards and table views)
- Support single platform per deployment configuration
- Maintain backward compatibility with existing Whereby integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-04 12:21:51 -06:00
41224a424c docs: move platform-jitsi.md to docs/ directory 2025-09-02 18:28:50 -06:00
dd0089906f fix: replace datetime.utcnow() with datetime.now(tz=timezone.utc) in Jitsi health check 2025-09-02 18:25:55 -06:00
fa559b1970 feat: update and expand video platform tests
- Update existing tests for StrEnum instead of string literals
- Add comprehensive WherebyClient tests with HTTP mocking
- Add webhook event storage tests for participant and recording events
- Add typing overload tests for create_platform_client factory
- Update webhook test paths to new video_platforms router locations
- Fix mock ordering and parameter issues in async tests
- Test all platform client functionality including signature verification
- Verify webhook event storage with proper timestamp handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:16:41 -06:00
c26ce65083 feat: update Jitsi documentation with webhook events storage system
- Add comprehensive webhook event storage documentation
- Document event structure and JSON storage in meetings table
- Add practical webhook testing examples with proper signature generation
- Include detailed troubleshooting for webhook signature verification issues
- Add webhook event payload examples for all supported event types
- Document event storage verification and database querying methods
- Enhance existing webhook configuration with real-world examples

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:09:44 -06:00
52eff2acc0 feat: clean up legacy code and remove excessive documentation
- Remove excessive inline comments from meeting creation flow
- Remove verbose docstrings from simple property methods and basic functions
- Clean up obvious comments like 'Generate JWT tokens', 'Build room URLs'
- Remove unnecessary explanatory comments in platform clients
- Keep only essential documentation for complex logic
- Simplify race condition handling comments
- Remove excessive method documentation for simple operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:08:12 -06:00
7875ec3432 feat: move platform routers to video_platforms folders
- Move Jitsi router from views/jitsi.py to video_platforms/jitsi/router.py
- Move Whereby router from views/whereby.py to video_platforms/whereby/router.py
- Update __init__.py files to export routers from platform packages
- Update app.py imports to use video_platforms instead of views
- Remove old view files after successful migration
- Maintain exact same API endpoint paths (/v1/jitsi, /v1/whereby)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:05:04 -06:00
398be06fad feat: add typing overloads and clean up platform client factory
- Add typing overloads to get_platform_client for JitsiClient and WherebyClient return types
- Add overloads to create_platform_client in factory for better IDE support
- Remove PyJWT fallback imports from views/rooms.py
- Remove platform defaults from CreateRoom and UpdateRoom models
- Clean up legacy whereby fallback code in meeting creation
- Use direct platform client access instead of conditional fallbacks

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:02:43 -06:00
da700069d9 Add webhook events storage to meetings model
- Add events column as JSON type to meetings table with default empty array
- Add events: List[Dict[str, Any]] field to Meeting model
- Create migration 2890b5104577 for events column and apply successfully
- Add MeetingController helper methods for event storage:
  - add_event() for generic event storage with timestamps
  - participant_joined(), participant_left() for participant tracking
  - recording_started(), recording_stopped() for recording events
  - get_events() for event retrieval
- Update Jitsi webhook endpoints to store events:
  - Store participant join/leave events with data and timestamps
  - Store recording start/stop events from Prosody webhooks
  - Store recording completion events from Jibri finalize script
- Events stored with type, timestamp, and data for webhook history tracking
- Fix linting and formatting issues

Addresses PR feedback point 12: save webhook events in meetings events field
2025-09-02 17:53:35 -06:00
51229a1790 Fix Jitsi client issues and create typed meeting data
- Remove 'transcription': True from JWT features in _generate_jwt
- Replace int(time.time()) with generate_uuid4() for room naming to avoid conflicts
- Replace datetime.utcnow() with datetime.now(tz=timezone.utc) for proper timezone handling
- Create JitsiMeetingData(MeetingData) class with typed extra_data properties
- Update PLATFORM_NAME = VideoPlatform.JITSI to use enum
- Update create_meeting to return JitsiMeetingData instance with proper typing
- Fix get_room_sessions mock to use timezone-aware datetime
- Export JitsiMeetingData from jitsi module

Addresses PR feedback points 4, 5, 6, 10: remove transcription features, use UUID,
proper datetime handling, and typed meeting data
2025-09-02 17:44:04 -06:00
2d2c23f7cc Create video_platforms/whereby structure and WherebyClient
- Create video_platforms/whereby/ directory with __init__.py, client.py, tasks.py
- Implement WherebyClient inheriting from VideoPlatformClient interface
- Move all functions from whereby.py into WherebyClient methods
- Use VideoPlatform.WHEREBY enum for PLATFORM_NAME
- Register WherebyClient in platform registry
- Update factory.py to include S3 bucket config for whereby
- Update worker process to use platform abstraction for get_room_sessions
- Preserve exact API behavior for meeting activity detection
- Maintain AWS S3 configuration handling in WherebyClient
- Fix linting and formatting issues

Addresses PR feedback point 7: implement video_platforms/whereby structure
Note: whereby.py kept for legacy fallback until task 7 cleanup
2025-09-02 17:40:32 -06:00
0acb9cac79 Replace Literal with VideoPlatform StrEnum for platform field
- Create VideoPlatform StrEnum with WHEREBY and JITSI values
- Update rooms.py and meetings.py to use VideoPlatform enum
- Update views/rooms.py and video_platforms/factory.py to use enum values
- Generate new migration with proper server_default='whereby'
- Apply migration successfully with backward compatibility
- Fix linting and formatting issues

Addresses PR feedback point 1: use StrEnum instead of Literal[]
2025-09-02 17:36:14 -06:00
d861d92cc2 docs: add comprehensive Jitsi Meet integration user guide
- Complete end-user configuration guide for self-hosted Jitsi Meet
- Covers installation, JWT authentication, and Prosody configuration
- Webhook event handling with mod_event_sync setup
- Jibri recording service configuration and finalize script
- Room creation, JWT token management, and security best practices
- Comprehensive troubleshooting with debug commands and solutions
- Performance optimization and scaling considerations
- Migration guidance from Whereby platform

🤖 Generated with Claude Code
2025-09-02 17:07:09 -06:00
24ff83a2ec docs: add comprehensive Whereby integration user guide
- Complete end-user configuration guide for Whereby video platform
- Covers account setup, API key generation, and webhook configuration
- AWS S3 storage setup with IAM permissions and security best practices
- Room creation, recording options, and meeting feature configuration
- Troubleshooting guide with common issues and debug commands
- Security considerations and performance optimization tips
- Migration guidance from other platforms

🤖 Generated with Claude Code
2025-09-02 17:05:40 -06:00
249234238c feat: add comprehensive video platform test suite
- Created complete test coverage for video platform abstraction
- Tests for base classes, JitsiClient implementation, and platform registry
- JWT generation tests with proper mocking and error scenarios
- Webhook signature verification tests (valid/invalid/missing secret)
- Platform factory tests for Jitsi and Whereby configuration
- Registry tests for platform registration and client creation
- Webhook endpoint tests with signature verification and error cases
- Integration tests for rooms endpoint with platform abstraction
- 24 comprehensive test cases covering all video platform functionality
- All tests passing with proper mocking and isolation

🤖 Generated with Claude Code
2025-09-02 16:54:58 -06:00
42a603d5c3 feat: add PyJWT dependency and finalize Jitsi integration
- Added PyJWT>=2.8.0 to pyproject.toml dependencies
- Installed dependency via uv sync successfully
- Verified JWT generation functionality works correctly
- Confirmed platform factory creates JitsiClient instances
- Validated database migrations applied (platform fields available)
- Tested webhook endpoints are registered and functional
- Verified FastAPI app starts without errors with full integration
- All integration tests pass - Jitsi platform fully functional

🤖 Generated with Claude Code
2025-09-02 16:28:44 -06:00
6d2092f950 feat: create comprehensive Jitsi integration documentation
- Added complete end-user configuration guide at server/platform-jitsi.md
- Covers prerequisites, environment setup, and Jitsi Meet configuration
- Includes JWT authentication, Jibri recording, and Prosody event-sync setup
- Provides troubleshooting guide with common issues and solutions
- Documents security best practices and performance optimization
- Includes testing procedures and migration guidance from Whereby
- Ready for production deployment with step-by-step instructions
- Uses environment variable placeholders for security

🤖 Generated with Claude Code
2025-09-02 16:24:47 -06:00
f2bb6aaecb feat: update rooms.py to use video platform abstraction
- Added platform field to Room, CreateRoom, and UpdateRoom models
- Updated rooms_create function to pass platform parameter
- Rewrote rooms_create_meeting to use platform factory pattern
- Added graceful fallback to legacy whereby implementation
- Maintained API compatibility and error handling patterns
- Prepared for multi-platform support (Whereby/Jitsi)

🤖 Generated with Claude Code
2025-09-02 16:21:58 -06:00
2b136ac7b0 feat: create Jitsi webhook endpoints for event handling
- Added comprehensive Jitsi webhook endpoint in views/jitsi.py
- Handles Prosody event-sync events (muc-occupant-joined/left)
- Implements participant counting following whereby.py pattern
- Added Jibri recording completion webhook endpoint
- Includes signature verification with fallback when platform client unavailable
- Registered router in app.py for /v1/jitsi endpoints
- Added health check endpoint for webhook configuration

🤖 Generated with Claude Code
2025-09-02 16:19:54 -06:00
3f4fc26483 feat: register Jitsi platform in video platforms factory and registry
- Added JitsiClient registration to platform registry
- Enables dynamic platform selection through factory pattern
- Factory configuration already supports Jitsi settings
- Platform abstraction layer now supports beide Whereby and Jitsi

🤖 Generated with Claude Code
2025-09-02 16:17:32 -06:00
8e5ef5bca6 feat: implement JitsiClient with JWT authentication
Complete implementation of JitsiClient following VideoPlatformClient interface
with JWT-based room access control and webhook signature verification.

- Add JWT token generation with proper payload structure
- Implement unique room name generation with timestamp
- Create separate user/host JWT tokens with moderator permissions
- Build secure room URLs with embedded JWT parameters
- Add HMAC-SHA256 webhook signature verification for Prosody events
- Implement all abstract methods with Jitsi-specific behavior
- Include comprehensive typing and error handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:15:49 -06:00
d49fdcb38d feat: create video platforms architecture with Jitsi directory structure
Create complete video platforms abstraction layer following daily.co branch
pattern with Jitsi-specific directory structure.

- Add video_platforms base module with abstract classes
- Create VideoPlatformClient, MeetingData, VideoPlatformConfig interfaces
- Add platform registry system for client management
- Create factory pattern for platform client creation
- Add Jitsi directory structure with __init__.py, tasks.py, client.py
- Configure Jitsi platform in factory with JWT-based authentication

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:14:42 -06:00
d42380abf1 feat: add Jitsi configuration settings
Add comprehensive Jitsi Meet configuration settings to settings.py
following the same pattern as WHEREBY settings.

- Add JITSI_DOMAIN with meet.jit.si default
- Add JITSI_JWT_SECRET for JWT token signing
- Add JITSI_WEBHOOK_SECRET for webhook validation
- Add JITSI_APP_ID, JITSI_JWT_ISSUER, JITSI_JWT_AUDIENCE for JWT configuration
- Follow consistent naming and typing patterns

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:11:00 -06:00
cf64e1a3d9 feat: add database migration for platform field
Generate Alembic migration to add platform column to rooms and meetings
tables enabling multi-platform video conferencing support.

- Add platform column to meeting table with whereby default
- Add platform column to room table with whereby default
- Migration tested successfully with alembic upgrade head

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:10:11 -06:00
ea53ca7000 feat: add platform field to Room and Meeting models
Add platform column to rooms and meetings database tables with Literal typing
to support multiple video conferencing platforms (whereby, jitsi).

- Add platform column to rooms/meetings SQLAlchemy tables with whereby default
- Update Room/Meeting Pydantic models with platform field and Literal typing
- Modify RoomController.add() to accept platform parameter
- Update MeetingController.create() to copy platform from room

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:08:38 -06:00
168 changed files with 9937 additions and 13970 deletions

View File

@@ -1,57 +1,5 @@
# Changelog
## [0.13.1](https://github.com/Monadical-SAS/reflector/compare/v0.13.0...v0.13.1) (2025-09-22)
### Bug Fixes
* TypeError on not all arguments converted during string formatting in logger ([#667](https://github.com/Monadical-SAS/reflector/issues/667)) ([565a629](https://github.com/Monadical-SAS/reflector/commit/565a62900f5a02fc946b68f9269a42190ed70ab6))
## [0.13.0](https://github.com/Monadical-SAS/reflector/compare/v0.12.1...v0.13.0) (2025-09-19)
### Features
* room form edit with enter ([#662](https://github.com/Monadical-SAS/reflector/issues/662)) ([47716f6](https://github.com/Monadical-SAS/reflector/commit/47716f6e5ddee952609d2fa0ffabdfa865286796))
### Bug Fixes
* invalid cleanup call ([#660](https://github.com/Monadical-SAS/reflector/issues/660)) ([0abcebf](https://github.com/Monadical-SAS/reflector/commit/0abcebfc9491f87f605f21faa3e53996fafedd9a))
## [0.12.1](https://github.com/Monadical-SAS/reflector/compare/v0.12.0...v0.12.1) (2025-09-17)
### Bug Fixes
* production blocked because having existing meeting with room_id null ([#657](https://github.com/Monadical-SAS/reflector/issues/657)) ([870e860](https://github.com/Monadical-SAS/reflector/commit/870e8605171a27155a9cbee215eeccb9a8d6c0a2))
## [0.12.0](https://github.com/Monadical-SAS/reflector/compare/v0.11.0...v0.12.0) (2025-09-17)
### Features
* calendar integration ([#608](https://github.com/Monadical-SAS/reflector/issues/608)) ([6f680b5](https://github.com/Monadical-SAS/reflector/commit/6f680b57954c688882c4ed49f40f161c52a00a24))
* self-hosted gpu api ([#636](https://github.com/Monadical-SAS/reflector/issues/636)) ([ab859d6](https://github.com/Monadical-SAS/reflector/commit/ab859d65a6bded904133a163a081a651b3938d42))
### Bug Fixes
* ignore player hotkeys for text inputs ([#646](https://github.com/Monadical-SAS/reflector/issues/646)) ([fa049e8](https://github.com/Monadical-SAS/reflector/commit/fa049e8d068190ce7ea015fd9fcccb8543f54a3f))
## [0.11.0](https://github.com/Monadical-SAS/reflector/compare/v0.10.0...v0.11.0) (2025-09-16)
### Features
* remove profanity filter that was there for conference ([#652](https://github.com/Monadical-SAS/reflector/issues/652)) ([b42f7cf](https://github.com/Monadical-SAS/reflector/commit/b42f7cfc606783afcee792590efcc78b507468ab))
### Bug Fixes
* zulip and consent handler on the file pipeline ([#645](https://github.com/Monadical-SAS/reflector/issues/645)) ([5f143fe](https://github.com/Monadical-SAS/reflector/commit/5f143fe3640875dcb56c26694254a93189281d17))
* zulip stream and topic selection in share dialog ([#644](https://github.com/Monadical-SAS/reflector/issues/644)) ([c546e69](https://github.com/Monadical-SAS/reflector/commit/c546e69739e68bb74fbc877eb62609928e5b8de6))
## [0.10.0](https://github.com/Monadical-SAS/reflector/compare/v0.9.0...v0.10.0) (2025-09-11)

369
docs/jitsi.md Normal file
View File

@@ -0,0 +1,369 @@
# Jitsi Integration for Reflector
This document contains research and planning notes for integrating Jitsi Meet as a replacement for Whereby in Reflector.
## Overview
Jitsi Meet is an open-source video conferencing solution that can replace Whereby in Reflector, providing:
- Cost reduction (no per-minute charges)
- Direct recording access via Jibri
- Real-time event webhooks
- Full customization and control
## Current Whereby Integration Analysis
### Architecture
1. **Room Creation**: User creates a "room" template in Reflector DB with settings
2. **Meeting Creation**: `/rooms/{room_name}/meeting` endpoint calls Whereby API to create meeting
3. **Recording**: Whereby handles recording automatically to S3 bucket
4. **Webhooks**: Whereby sends events for participant tracking
### Database Structure
```python
# Room = Template/Configuration
class Room:
id, name, user_id
recording_type, recording_trigger # cloud, automatic-2nd-participant
webhook_url, webhook_secret
# Meeting = Actual Whereby Meeting Instance
class Meeting:
id # Whereby meetingId
room_name # Generated by Whereby
room_url, host_room_url # Whereby URLs
num_clients # Updated via webhooks
```
## Jitsi Components
### Core Architecture
- **Jitsi Meet**: Web frontend (Next.js + React)
- **Prosody**: XMPP server for messaging/rooms
- **Jicofo**: Conference focus (orchestration)
- **JVB**: Videobridge (media routing)
- **Jibri**: Recording service
- **Jigasi**: SIP gateway (optional, for phone dial-in)
### Exposure Requirements
- **Web service**: 443/80 (frontend)
- **JVB**: 10000/UDP (media streams) - **MUST EXPOSE**
- **Prosody**: 5280 (BOSH/WebSocket) - can proxy via web
- **Jicofo, Jibri, Jigasi**: Internal only
## Recording with Jibri
### How Jibri Works
- Each Jibri instance handles **one recording at a time**
- Records mixed audio/video to MP4 format
- Uses Chrome headless + ffmpeg for capture
- Supports finalize scripts for post-processing
### Jibri Pool for Scaling
- Multiple Jibri instances join "jibribrewery" MUC
- Jicofo distributes recording requests to available instances
- Automatic load balancing and failover
```yaml
# Multiple Jibri instances
jibri1:
environment:
- JIBRI_INSTANCE_ID=jibri1
- JIBRI_BREWERY_MUC=jibribrewery
jibri2:
environment:
- JIBRI_INSTANCE_ID=jibri2
- JIBRI_BREWERY_MUC=jibribrewery
```
### Recording Automation Options
1. **Environment Variables**: `ENABLE_RECORDING=1`, `AUTO_RECORDING=1`
2. **URL Parameters**: `?config.autoRecord=true`
3. **JWT Token**: Include recording permissions in JWT
4. **API Control**: `api.executeCommand('startRecording')`
### Post-Processing Integration
```bash
#!/bin/bash
# finalize.sh - runs after recording completion
RECORDING_FILE=$1
MEETING_METADATA=$2
ROOM_NAME=$3
# Copy to Reflector-accessible location
cp "$RECORDING_FILE" /shared/reflector-uploads/
# Trigger Reflector processing
curl -X POST "http://reflector-api:8000/v1/transcripts/process" \
-H "Content-Type: application/json" \
-d "{
\"file_path\": \"/shared/reflector-uploads/$(basename $RECORDING_FILE)\",
\"room_name\": \"$ROOM_NAME\",
\"source\": \"jitsi\"
}"
```
## React Integration
### Official React SDK
```bash
npm i @jitsi/react-sdk
```
```jsx
import { JitsiMeeting } from '@jitsi/react-sdk'
<JitsiMeeting
room="meeting-room"
serverURL="https://your-jitsi.domain"
jwt="your-jwt-token"
config={{
startWithAudioMuted: true,
fileRecordingsEnabled: true,
autoRecord: true
}}
onParticipantJoined={(participant) => {
// Track participant events
}}
onRecordingStatusChanged={(status) => {
// Handle recording events
}}
/>
```
## Authentication & Room Control
### JWT-Based Access Control
```python
def generate_jitsi_jwt(payload):
return jwt.encode({
"aud": "jitsi",
"iss": "reflector",
"sub": "reflector-user",
"room": payload["room"],
"exp": int(payload["exp"].timestamp()),
"context": {
"user": {
"name": payload["user_name"],
"moderator": payload.get("moderator", False)
},
"features": {
"recording": payload.get("recording", True)
}
}
}, JITSI_JWT_SECRET)
```
### Prevent Anonymous Room Creation
```bash
# Environment configuration
ENABLE_AUTH=1
ENABLE_GUESTS=0
AUTH_TYPE=jwt
JWT_APP_ID=reflector
JWT_APP_SECRET=your-secret-key
```
## Webhook Integration
### Real-time Events via Prosody
Custom event-sync module can send webhooks for:
- Participant join/leave
- Recording start/stop
- Room creation/destruction
- Mute/unmute events
```lua
-- mod_event_sync.lua
module:hook("muc-occupant-joined", function(event)
send_event({
type = "participant_joined",
room = event.room.jid,
participant = {
nick = event.occupant.nick,
jid = event.occupant.jid,
},
timestamp = os.time(),
});
end);
```
### Jibri Recording Webhooks
```bash
# Environment variable
JIBRI_WEBHOOK_SUBSCRIBERS=https://your-reflector.com/webhooks/jibri
```
## Proposed Reflector Integration
### Modified Database Schema
```python
class Meeting(BaseModel):
id: str # Our generated meeting ID
room_name: str # Generated: reflector-{room.name}-{timestamp}
room_url: str # https://jitsi.domain/room_name?jwt=token
host_room_url: str # Same but with moderator JWT
# Add Jitsi-specific fields
jitsi_jwt: str # JWT token
jitsi_room_id: str # Internal room identifier
recording_status: str # pending, recording, completed
recording_file_path: Optional[str]
```
### API Replacement
```python
# Replace whereby.py with jitsi.py
async def create_meeting(room_name_prefix: str, end_date: datetime, room: Room):
# Generate unique room name
jitsi_room = f"reflector-{room.name}-{int(time.time())}"
# Generate JWT tokens
user_jwt = generate_jwt(room=jitsi_room, moderator=False, exp=end_date)
host_jwt = generate_jwt(room=jitsi_room, moderator=True, exp=end_date)
return {
"meetingId": generate_uuid4(), # Our ID
"roomName": jitsi_room,
"roomUrl": f"https://jitsi.domain/{jitsi_room}?jwt={user_jwt}",
"hostRoomUrl": f"https://jitsi.domain/{jitsi_room}?jwt={host_jwt}",
"startDate": datetime.now().isoformat(),
"endDate": end_date.isoformat(),
}
```
### Webhook Endpoints
```python
# Replace whereby webhook with jitsi webhooks
@router.post("/jitsi/events")
async def jitsi_events_webhook(event_data: dict):
event_type = event_data.get("event")
room_name = event_data.get("room", "").split("@")[0]
meeting = await Meeting.get_by_room(room_name)
if event_type == "muc-occupant-joined":
# Update participant count
meeting.num_clients += 1
elif event_type == "jibri-recording-on":
meeting.recording_status = "recording"
elif event_type == "jibri-recording-off":
meeting.recording_status = "processing"
await process_meeting_recording.delay(meeting.id)
@router.post("/jibri/recording-complete")
async def recording_complete(data: dict):
# Handle finalize script webhook
room_name = data.get("room_name")
file_path = data.get("file_path")
meeting = await Meeting.get_by_room(room_name)
meeting.recording_file_path = file_path
meeting.recording_status = "completed"
# Start Reflector processing
await process_recording_for_transcription(meeting.id, file_path)
```
## Deployment with Docker
### Official docker-jitsi-meet
```bash
# Download official release
wget $(wget -q -O - https://api.github.com/repos/jitsi/docker-jitsi-meet/releases/latest | grep zip | cut -d\" -f4)
# Setup
mkdir -p ~/.jitsi-meet-cfg/{web,transcripts,prosody/config,prosody/prosody-plugins-custom,jicofo,jvb,jigasi,jibri}
./gen-passwords.sh # Generate secure passwords
docker compose up -d
```
### Coolify Integration
```yaml
services:
web:
ports: ["80:80", "443:443"]
jvb:
ports: ["10000:10000/udp"] # Must expose for media
jibri1:
environment:
- JIBRI_INSTANCE_ID=jibri1
- JIBRI_FINALIZE_RECORDING_SCRIPT_PATH=/config/finalize.sh
jibri2:
environment:
- JIBRI_INSTANCE_ID=jibri2
```
## Benefits vs Whereby
### Cost & Control
**No per-minute charges** - significant cost savings
**Full recording control** - direct file access
**Custom branding** - complete UI control
**Self-hosted** - no vendor lock-in
### Technical Advantages
**Real-time events** - immediate webhook notifications
**Rich participant metadata** - detailed tracking
**JWT security** - token-based access with expiration
**Multiple recording formats** - audio-only options
**Scalable architecture** - horizontal Jibri scaling
### Integration Benefits
**Same API surface** - minimal changes to existing code
**React SDK** - better frontend integration
**Direct processing** - no S3 download delays
**Event-driven architecture** - better real-time capabilities
## Implementation Plan
1. **Deploy Jitsi Stack** - Set up docker-jitsi-meet with multiple Jibri instances
2. **Create jitsi.py** - Replace whereby.py with Jitsi API functions
3. **Update Database** - Add Jitsi-specific fields to Meeting model
4. **Webhook Integration** - Replace Whereby webhooks with Jitsi events
5. **Frontend Updates** - Replace Whereby embed with Jitsi React SDK
6. **Testing & Migration** - Gradual rollout with fallback to Whereby
## Recording Limitations & Considerations
### Current Limitations
- **Mixed audio only** - Jibri doesn't separate participant tracks natively
- **One recording per Jibri** - requires multiple instances for concurrent recordings
- **Chrome dependency** - Jibri uses headless Chrome for recording
### Metadata Capabilities
**Participant join/leave timestamps** - via webhooks
**Speaking time tracking** - via audio level events
**Meeting duration** - precise timing
**Room-specific data** - custom metadata in JWT
### Alternative Recording Methods
- **Local recording** - browser-based, per-participant
- **Custom recording** - lib-jitsi-meet for individual streams
- **Third-party solutions** - Recall.ai, Otter.ai integrations
## Security Considerations
### JWT Configuration
- **Room-specific tokens** - limit access to specific rooms
- **Time-based expiration** - automatic cleanup
- **Feature permissions** - control recording, moderation rights
- **User identification** - embed user metadata in tokens
### Access Control
- **No anonymous rooms** - all rooms require valid JWT
- **API-only creation** - prevent direct room access
- **Webhook verification** - HMAC signature validation
## Next Steps
1. **Deploy test Jitsi instance** - validate recording pipeline
2. **Prototype jitsi.py** - create equivalent API functions
3. **Test webhook integration** - ensure event delivery works
4. **Performance testing** - validate multiple concurrent recordings
5. **Migration strategy** - plan gradual transition from Whereby
---
*This document serves as the comprehensive planning and research notes for Jitsi integration in Reflector. It should be updated as implementation progresses and new insights are discovered.*

720
docs/video-jitsi.md Normal file
View File

@@ -0,0 +1,720 @@
# Jitsi Meet Integration Configuration Guide
This guide explains how to configure Reflector to use your self-hosted Jitsi Meet installation for video meetings, recording, and participant tracking.
## Overview
Jitsi Meet is an open-source video conferencing platform that can be self-hosted. Reflector integrates with Jitsi Meet to:
- Create secure meeting rooms with JWT authentication
- Track participant join/leave events via Prosody webhooks
- Record meetings using Jibri recording service
- Process recordings for transcription and analysis
## Requirements
### Self-Hosted Jitsi Meet
You need a complete Jitsi Meet installation including:
1. **Jitsi Meet Web Interface** - The main meeting interface
2. **Prosody XMPP Server** - Handles room management and authentication
3. **Jicofo (JItsi COnference FOcus)** - Manages media sessions
4. **Jitsi Videobridge (JVB)** - Handles WebRTC media routing
5. **Jibri Recording Service** - Records meetings (optional but recommended)
### System Requirements
- **Domain with SSL Certificate** - Required for WebRTC functionality
- **Prosody mod_event_sync** - For webhook event handling
- **JWT Authentication** - For secure room access control
- **Storage Solution** - For recording files (local or cloud)
## Configuration Variables
Add the following environment variables to your Reflector `.env` file:
### Required Variables
```bash
# Jitsi Meet Domain (without https://)
JITSI_DOMAIN=meet.example.com
# JWT Secret for room authentication (generate with: openssl rand -hex 32)
JITSI_JWT_SECRET=your-64-character-hex-secret-here
# Webhook secret for event handling (generate with: openssl rand -hex 16)
JITSI_WEBHOOK_SECRET=your-32-character-hex-secret-here
```
### Optional Variables
```bash
# Application identifier (should match Jitsi configuration)
JITSI_APP_ID=reflector
# JWT issuer and audience (should match Jitsi configuration)
JITSI_JWT_ISSUER=reflector
JITSI_JWT_AUDIENCE=jitsi
```
## Installation Steps
### 1. Jitsi Meet Server Installation
#### Quick Installation (Ubuntu/Debian)
```bash
# Add Jitsi repository
curl -fsSL https://download.jitsi.org/jitsi-key.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/jitsi-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/jitsi-keyring.gpg] https://download.jitsi.org stable/" | sudo tee /etc/apt/sources.list.d/jitsi-stable.list
# Install Jitsi Meet
sudo apt update
sudo apt install jitsi-meet
# Configure SSL certificate
sudo /usr/share/jitsi-meet/scripts/install-letsencrypt-cert.sh
```
#### Docker Installation
```bash
# Clone Jitsi Docker repository
git clone https://github.com/jitsi/docker-jitsi-meet
cd docker-jitsi-meet
# Copy environment template
cp env.example .env
# Edit configuration
nano .env
# Start services
docker-compose up -d
```
### 2. JWT Authentication Setup
#### Update Prosody Configuration
Edit `/etc/prosody/conf.d/your-domain.cfg.lua`:
```lua
VirtualHost "meet.example.com"
authentication = "token"
app_id = "reflector"
app_secret = "your-jwt-secret-here"
-- Allow anonymous access for guests
c2s_require_encryption = false
admins = { "focusUser@auth.meet.example.com" }
modules_enabled = {
"bosh";
"pubsub";
"ping";
"roster";
"saslauth";
"tls";
"dialback";
"disco";
"carbons";
"pep";
"private";
"blocklist";
"vcard";
"version";
"uptime";
"time";
"ping";
"register";
"admin_adhoc";
"token_verification";
"event_sync"; -- Required for webhooks
}
```
#### Configure Jitsi Meet Interface
Edit `/etc/jitsi/meet/your-domain-config.js`:
```javascript
var config = {
hosts: {
domain: 'meet.example.com',
muc: 'conference.meet.example.com'
},
// Enable JWT authentication
enableUserRolesBasedOnToken: true,
// Recording configuration
fileRecordingsEnabled: true,
liveStreamingEnabled: false,
// Reflector integration settings
prejoinPageEnabled: true,
requireDisplayName: true
};
```
### 3. Webhook Event Configuration
#### Install Event Sync Module
```bash
# Download the module
cd /usr/share/jitsi-meet/prosody-plugins/
wget https://raw.githubusercontent.com/jitsi-contrib/prosody-plugins/main/mod_event_sync.lua
```
#### Configure Event Sync
Add to your Prosody configuration:
```lua
Component "conference.meet.example.com" "muc"
storage = "memory"
modules_enabled = {
"muc_meeting_id";
"muc_domain_mapper";
"polls";
"event_sync"; -- Enable event sync
}
-- Event sync webhook configuration
event_sync_url = "https://your-reflector-domain.com/v1/jitsi/events"
event_sync_secret = "your-webhook-secret-here"
-- Events to track
event_sync_events = {
"muc-occupant-joined",
"muc-occupant-left",
"jibri-recording-on",
"jibri-recording-off"
}
#### Webhook Event Payload Examples
**Participant Joined Event:**
```json
{
"event": "muc-occupant-joined",
"room": "reflector-my-room-uuid123",
"timestamp": "2025-01-15T10:30:00.000Z",
"data": {
"occupant_id": "participant-456",
"nick": "John Doe",
"role": "participant",
"affiliation": "none"
}
}
```
**Recording Started Event:**
```json
{
"event": "jibri-recording-on",
"room": "reflector-my-room-uuid123",
"timestamp": "2025-01-15T10:32:00.000Z",
"data": {
"recording_id": "rec-789",
"initiator": "moderator-123"
}
}
```
**Recording Completed Event:**
```json
{
"room_name": "reflector-my-room-uuid123",
"recording_file": "/var/recordings/rec-789.mp4",
"recording_status": "completed",
"timestamp": "2025-01-15T11:15:00.000Z"
}
```
### 4. Jibri Recording Setup (Optional)
#### Install Jibri
```bash
# Install Jibri package
sudo apt install jibri
# Create recording directory
sudo mkdir -p /var/recordings
sudo chown jibri:jibri /var/recordings
```
#### Configure Jibri
Edit `/etc/jitsi/jibri/jibri.conf`:
```hocon
jibri {
recording {
recordings-directory = "/var/recordings"
finalize-script = "/opt/jitsi/jibri/finalize.sh"
}
api {
xmpp {
environments = [{
name = "prod environment"
xmpp-server-hosts = ["meet.example.com"]
xmpp-domain = "meet.example.com"
control-muc {
domain = "internal.auth.meet.example.com"
room-name = "JibriBrewery"
nickname = "jibri-nickname"
}
control-login {
domain = "auth.meet.example.com"
username = "jibri"
password = "jibri-password"
}
}]
}
}
}
```
#### Create Finalize Script
Create `/opt/jitsi/jibri/finalize.sh`:
```bash
#!/bin/bash
# Jibri finalize script for Reflector integration
RECORDING_FILE="$1"
ROOM_NAME="$2"
REFLECTOR_API_URL="${REFLECTOR_API_URL:-http://localhost:1250}"
# Prepare webhook payload
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%3NZ)
PAYLOAD=$(cat <<EOF
{
"room_name": "$ROOM_NAME",
"recording_file": "$RECORDING_FILE",
"recording_status": "completed",
"timestamp": "$TIMESTAMP"
}
EOF
)
# Generate signature
SIGNATURE=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$JITSI_WEBHOOK_SECRET" | cut -d' ' -f2)
# Send webhook to Reflector
curl -X POST "$REFLECTOR_API_URL/v1/jibri/recording-complete" \
-H "Content-Type: application/json" \
-H "X-Jitsi-Signature: $SIGNATURE" \
-d "$PAYLOAD"
echo "Recording finalization webhook sent for room: $ROOM_NAME"
```
Make executable:
```bash
sudo chmod +x /opt/jitsi/jibri/finalize.sh
```
### 5. Restart Services
After configuration changes:
```bash
sudo systemctl restart prosody
sudo systemctl restart jicofo
sudo systemctl restart jitsi-videobridge2
sudo systemctl restart jibri
sudo systemctl restart nginx
```
## Room Configuration
### Creating Jitsi Rooms
Create rooms with Jitsi platform in Reflector:
```bash
curl -X POST "https://your-reflector-domain.com/v1/rooms" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-d '{
"name": "my-jitsi-room",
"platform": "jitsi",
"recording_type": "cloud",
"recording_trigger": "automatic-2nd-participant",
"is_locked": false,
"room_mode": "normal"
}'
```
### Meeting Creation
Meetings automatically use JWT authentication:
```bash
curl -X POST "https://your-reflector-domain.com/v1/rooms/my-jitsi-room/meeting" \
-H "Authorization: Bearer $AUTH_TOKEN"
```
Response includes JWT-authenticated URLs:
```json
{
"id": "meeting-uuid",
"room_name": "reflector-my-jitsi-room-123456",
"room_url": "https://meet.example.com/room?jwt=user-token",
"host_room_url": "https://meet.example.com/room?jwt=moderator-token"
}
```
## Features and Capabilities
### JWT Authentication
Reflector automatically generates JWT tokens with:
- **Room Access Control** - Secure room entry
- **User Roles** - Moderator vs participant permissions
- **Expiration** - Configurable token lifetime (default 8 hours)
- **Custom Claims** - Room-specific metadata
### Recording Options
**Recording Types:**
- `"none"` - No recording
- `"local"` - Local Jibri recording
- `"cloud"` - Cloud recording (requires external storage)
**Recording Triggers:**
- `"none"` - Manual recording only
- `"prompt"` - Prompt users to start
- `"automatic"` - Start immediately
- `"automatic-2nd-participant"` - Start when 2nd person joins
### Event Tracking and Storage
Reflector automatically stores all webhook events in the `meetings` table for comprehensive meeting analytics:
**Supported Event Types:**
- `muc-occupant-joined` - Participant joined the meeting
- `muc-occupant-left` - Participant left the meeting
- `jibri-recording-on` - Recording started
- `jibri-recording-off` - Recording stopped
- `recording_completed` - Recording file ready for processing
**Event Storage Structure:**
Each webhook event is stored as a JSON object in the `meetings.events` column:
```json
{
"type": "muc-occupant-joined",
"timestamp": "2025-01-15T10:30:00.123456Z",
"data": {
"timestamp": "2025-01-15T10:30:00Z",
"user_id": "participant-123",
"display_name": "John Doe"
}
}
```
**Querying Stored Events:**
```sql
-- Get all events for a meeting
SELECT events FROM meeting WHERE id = 'meeting-uuid';
-- Count participant joins
SELECT json_array_length(
json_extract(events, '$[*] ? (@.type == "muc-occupant-joined")')
) as total_joins FROM meeting WHERE id = 'meeting-uuid';
```
## Testing and Verification
### Health Check
Test Jitsi webhook integration:
```bash
curl "https://your-reflector-domain.com/v1/jitsi/health"
```
Expected response:
```json
{
"status": "ok",
"service": "jitsi-webhooks",
"timestamp": "2025-01-15T10:30:00.000Z",
"webhook_secret_configured": true
}
```
### JWT Token Testing
Verify JWT generation works:
```bash
# Create a test meeting
MEETING=$(curl -X POST "https://your-reflector-domain.com/v1/rooms/test-room/meeting" \
-H "Authorization: Bearer $AUTH_TOKEN" | jq -r '.room_url')
echo "Test meeting URL: $MEETING"
```
### Webhook Testing
#### Manual Webhook Event Testing
Test participant join event:
```bash
# Generate proper signature
PAYLOAD='{"event":"muc-occupant-joined","room":"reflector-test-room-uuid","timestamp":"2025-01-15T10:30:00.000Z","data":{"user_id":"test-user","display_name":"Test User"}}'
SIGNATURE=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$JITSI_WEBHOOK_SECRET" | cut -d' ' -f2)
curl -X POST "https://your-reflector-domain.com/v1/jitsi/events" \
-H "Content-Type: application/json" \
-H "X-Jitsi-Signature: $SIGNATURE" \
-d "$PAYLOAD"
```
Expected response:
```json
{
"status": "ok",
"event": "muc-occupant-joined",
"room": "reflector-test-room-uuid"
}
```
#### Recording Webhook Testing
Test recording completion event:
```bash
PAYLOAD='{"room_name":"reflector-test-room-uuid","recording_file":"/recordings/test.mp4","recording_status":"completed","timestamp":"2025-01-15T10:30:00.000Z"}'
SIGNATURE=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$JITSI_WEBHOOK_SECRET" | cut -d' ' -f2)
curl -X POST "https://your-reflector-domain.com/v1/jibri/recording-complete" \
-H "Content-Type: application/json" \
-H "X-Jitsi-Signature: $SIGNATURE" \
-d "$PAYLOAD"
```
#### Event Storage Verification
Verify events were stored:
```bash
# Check meeting events via API (requires authentication)
curl -H "Authorization: Bearer $AUTH_TOKEN" \
"https://your-reflector-domain.com/v1/meetings/{meeting-id}"
```
## Troubleshooting
### Common Issues
#### JWT Authentication Failures
**Symptoms**: Users cannot join rooms, "Authentication failed" errors
**Solutions**:
1. Verify `JITSI_JWT_SECRET` matches Prosody configuration
2. Check JWT token hasn't expired (default 8 hours)
3. Ensure system clocks are synchronized between servers
4. Validate JWT issuer/audience configuration matches
**Debug JWT tokens**:
```bash
# Decode JWT payload
echo "JWT_TOKEN_HERE" | cut -d'.' -f2 | base64 -d | jq
```
#### Webhook Events Not Received
**Symptoms**: Participant counts not updating, no recording events
**Solutions**:
1. Verify `mod_event_sync` is loaded in Prosody
2. Check webhook URL is accessible from Jitsi server
3. Validate webhook signature generation
4. Review Prosody and Reflector logs
**Debug webhook connectivity**:
```bash
# Test from Jitsi server
curl -v "https://your-reflector-domain.com/v1/jitsi/health"
# Check Prosody logs
sudo tail -f /var/log/prosody/prosody.log
```
#### Webhook Signature Verification Issues
**Symptoms**: HTTP 401 "Invalid webhook signature" errors
**Solutions**:
1. Verify webhook secret matches between Jitsi and Reflector
2. Check payload encoding (no extra whitespace)
3. Ensure proper HMAC-SHA256 signature generation
**Debug signature generation**:
```bash
# Test signature manually
PAYLOAD='{"event":"test","room":"test","timestamp":"2025-01-15T10:30:00.000Z","data":{}}'
SECRET="your-webhook-secret-here"
# Generate signature (should match X-Jitsi-Signature header)
echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | cut -d' ' -f2
# Test with curl
curl -X POST "https://your-reflector-domain.com/v1/jitsi/events" \
-H "Content-Type: application/json" \
-H "X-Jitsi-Signature: $(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | cut -d' ' -f2)" \
-d "$PAYLOAD" -v
```
#### Event Storage Problems
**Symptoms**: Events received but not stored in database
**Solutions**:
1. Check database connectivity and permissions
2. Verify meeting exists before event processing
3. Review Reflector application logs
4. Ensure JSON column support in database
**Debug event storage**:
```bash
# Check meeting exists
curl -H "Authorization: Bearer $TOKEN" \
"https://your-reflector-domain.com/v1/meetings/{meeting-id}"
# Monitor database queries (if using PostgreSQL)
sudo -u postgres psql -c "SELECT * FROM pg_stat_activity WHERE query LIKE '%meeting%';"
# Check Reflector logs for event processing
sudo journalctl -u reflector -f | grep -E "(event|webhook|jitsi)"
```
#### Recording Issues
**Symptoms**: Recordings not starting, finalize script errors
**Solutions**:
1. Verify Jibri service status: `sudo systemctl status jibri`
2. Check recording directory permissions: `/var/recordings`
3. Validate finalize script execution permissions
4. Monitor Jibri logs: `sudo journalctl -u jibri -f`
**Test finalize script**:
```bash
sudo -u jibri /opt/jitsi/jibri/finalize.sh "/test/recording.mp4" "test-room"
```
#### Meeting Creation Failures
**Symptoms**: HTTP 500 errors when creating meetings
**Solutions**:
1. Check Reflector logs for JWT generation errors
2. Verify all required environment variables are set
3. Ensure Jitsi domain is accessible from Reflector
4. Test JWT secret configuration
### Debug Commands
```bash
# Verify Prosody configuration
sudo prosodyctl check config
# Check Jitsi services status
sudo systemctl status prosody jicofo jitsi-videobridge2
# Test JWT generation
curl -X POST "https://your-reflector-domain.com/v1/rooms/test/meeting" \
-H "Authorization: Bearer $TOKEN" -v
# Monitor webhook events
sudo tail -f /var/log/reflector/app.log | grep jitsi
# Check SSL certificates
sudo certbot certificates
```
### Performance Optimization
#### Scaling Considerations
**Single Server Limits:**
- ~50 concurrent participants per JVB instance
- ~10 concurrent Jibri recordings
- CPU and bandwidth become bottlenecks
**Multi-Server Setup:**
- Multiple JVB instances for scaling
- Dedicated Jibri recording servers
- Load balancing for high availability
#### Resource Monitoring
```bash
# Monitor JVB performance
sudo systemctl status jitsi-videobridge2
sudo journalctl -u jitsi-videobridge2 -f
# Check Prosody connections
sudo prosodyctl mod_admin_telnet
> c2s:show()
> muc:rooms()
```
## Security Best Practices
### JWT Security
- Use strong, unique secrets (32+ characters)
- Rotate JWT secrets regularly
- Implement proper token expiration
- Never log or expose JWT tokens
### Network Security
- Use HTTPS/WSS for all communications
- Implement proper firewall rules
- Consider VPN for server-to-server communication
- Monitor for unauthorized access attempts
### Recording Security
- Encrypt recordings at rest
- Implement access controls for recording files
- Regular security audits of file permissions
- Comply with data protection regulations
## Migration from Whereby
If migrating from Whereby to Jitsi:
1. **Parallel Setup** - Configure Jitsi alongside existing Whereby
2. **Room Migration** - Update room platform field to "jitsi"
3. **Test Integration** - Verify meeting creation and webhooks
4. **User Training** - Different UI and feature set
5. **Monitor Performance** - Watch for issues during transition
6. **Cleanup** - Remove Whereby configuration when stable
## Support and Resources
### Jitsi Community Resources
- **Documentation**: [jitsi.github.io/handbook](https://jitsi.github.io/handbook/)
- **Community Forum**: [community.jitsi.org](https://community.jitsi.org/)
- **GitHub Issues**: [github.com/jitsi/jitsi-meet](https://github.com/jitsi/jitsi-meet)
### Professional Support
- **8x8 Commercial Support** - Professional Jitsi hosting and support
- **Community Consulting** - Third-party Jitsi implementation services
### Monitoring and Maintenance
- Monitor system resources (CPU, memory, bandwidth)
- Regular security updates for all components
- Backup configuration files and certificates
- Test disaster recovery procedures

276
docs/video-whereby.md Normal file
View File

@@ -0,0 +1,276 @@
# Whereby Integration Configuration Guide
This guide explains how to configure Reflector to use Whereby as your video meeting platform for room creation, recording, and participant tracking.
## Overview
Whereby is a browser-based video meeting platform that provides hosted meeting rooms with recording capabilities. Reflector integrates with Whereby's API to:
- Create secure meeting rooms with custom branding
- Handle participant join/leave events via webhooks
- Automatically record meetings to AWS S3 storage
- Track meeting sessions and participant counts
## Requirements
### Whereby Account Setup
1. **Whereby Account**: Sign up for a Whereby business account at [whereby.com](https://whereby.com/business)
2. **API Access**: Request API access from Whereby support (required for programmatic room creation)
3. **Webhook Configuration**: Configure webhooks in your Whereby dashboard to point to your Reflector instance
### AWS S3 Storage
Whereby requires AWS S3 for recording storage. You need:
- AWS account with S3 access
- Dedicated S3 bucket for Whereby recordings
- AWS IAM credentials with S3 write permissions
## Configuration Variables
Add the following environment variables to your Reflector `.env` file:
### Required Variables
```bash
# Whereby API Configuration
WHEREBY_API_KEY=your-whereby-jwt-api-key
WHEREBY_WEBHOOK_SECRET=your-webhook-secret-from-whereby
# AWS S3 Storage for Recordings
AWS_WHEREBY_ACCESS_KEY_ID=your-aws-access-key
AWS_WHEREBY_ACCESS_KEY_SECRET=your-aws-secret-key
RECORDING_STORAGE_AWS_BUCKET_NAME=your-s3-bucket-name
```
### Optional Variables
```bash
# Whereby API URL (defaults to production)
WHEREBY_API_URL=https://api.whereby.dev/v1
# SQS Configuration (for recording processing)
AWS_PROCESS_RECORDING_QUEUE_URL=https://sqs.region.amazonaws.com/account/queue
SQS_POLLING_TIMEOUT_SECONDS=60
```
## Configuration Steps
### 1. Whereby API Key Setup
1. **Contact Whereby Support** to request API access for your account
2. **Generate JWT Token** in your Whereby dashboard under API settings
3. **Copy the JWT token** and set it as `WHEREBY_API_KEY` in your environment
The API key is a JWT token that looks like:
```
eyJ[...truncated JWT token...]
```
### 2. Webhook Configuration
1. **Access Whereby Dashboard** and navigate to webhook settings
2. **Set Webhook URL** to your Reflector instance:
```
https://your-reflector-domain.com/v1/whereby
```
3. **Configure Events** to send the following event types:
- `room.client.joined` - When participants join
- `room.client.left` - When participants leave
4. **Generate Webhook Secret** and set it as `WHEREBY_WEBHOOK_SECRET`
5. **Save Configuration** in your Whereby dashboard
### 3. AWS S3 Storage Setup
1. **Create S3 Bucket** dedicated for Whereby recordings
2. **Create IAM User** with programmatic access
3. **Attach S3 Policy** with the following permissions:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:PutObjectAcl",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::your-bucket-name/*"
}
]
}
```
4. **Configure Environment Variables** with the IAM credentials
### 4. Room Configuration
When creating rooms in Reflector, set the platform to use Whereby:
```bash
curl -X POST "https://your-reflector-domain.com/v1/rooms" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-d '{
"name": "my-whereby-room",
"platform": "whereby",
"recording_type": "cloud",
"recording_trigger": "automatic-2nd-participant",
"is_locked": false,
"room_mode": "normal"
}'
```
## Meeting Features
### Recording Options
Whereby supports three recording types:
- **`none`**: No recording
- **`local`**: Local recording (not recommended for production)
- **`cloud`**: Cloud recording to S3 (recommended)
### Recording Triggers
Control when recordings start:
- **`none`**: No automatic recording
- **`prompt`**: Prompt users to start recording
- **`automatic`**: Start immediately when meeting begins
- **`automatic-2nd-participant`**: Start when second participant joins
### Room Modes
- **`normal`**: Standard meeting room
- **`group`**: Group meeting with advanced features
## Webhook Event Handling
Reflector automatically handles these Whereby webhook events:
### Participant Tracking
```json
{
"type": "room.client.joined",
"data": {
"meetingId": "room-uuid",
"numClients": 2
}
}
```
### Recording Events
Whereby sends recording completion events that trigger Reflector's processing pipeline:
- Audio transcription
- Speaker diarization
- Summary generation
## Troubleshooting
### Common Issues
#### API Authentication Errors
**Symptoms**: 401 Unauthorized errors when creating meetings
**Solutions**:
1. Verify your `WHEREBY_API_KEY` is correct and not expired
2. Ensure you have API access enabled on your Whereby account
3. Contact Whereby support if API access is not available
#### Webhook Signature Validation Failed
**Symptoms**: Webhook events rejected with 401 errors
**Solutions**:
1. Verify `WHEREBY_WEBHOOK_SECRET` matches your Whereby dashboard configuration
2. Check webhook URL is correctly configured in Whereby dashboard
3. Ensure webhook endpoint is accessible from Whereby servers
#### Recording Upload Failures
**Symptoms**: Recordings not appearing in S3 bucket
**Solutions**:
1. Verify AWS credentials have S3 write permissions
2. Check S3 bucket name is correct and accessible
3. Ensure AWS region settings match your bucket location
4. Review AWS CloudTrail logs for permission issues
#### Participant Count Not Updating
**Symptoms**: Meeting participant counts remain at 0
**Solutions**:
1. Verify webhook events are being received at `/v1/whereby`
2. Check webhook signature validation is passing
3. Ensure meeting IDs match between Whereby and Reflector database
### Debug Commands
```bash
# Test Whereby API connectivity
curl -H "Authorization: Bearer $WHEREBY_API_KEY" \
https://api.whereby.dev/v1/meetings
# Check webhook endpoint health
curl https://your-reflector-domain.com/v1/whereby/health
# Verify S3 bucket access
aws s3 ls s3://your-bucket-name --profile whereby-user
```
## Security Considerations
### API Key Security
- Store API keys securely using environment variables
- Rotate API keys regularly
- Never commit API keys to version control
- Use separate keys for development and production
### Webhook Security
- Always validate webhook signatures using HMAC-SHA256
- Use HTTPS for all webhook endpoints
- Implement rate limiting on webhook endpoints
- Monitor webhook events for suspicious activity
### Recording Privacy
- Ensure S3 bucket access is restricted to authorized users
- Consider encryption at rest for sensitive recordings
- Implement retention policies for recorded content
- Comply with data protection regulations (GDPR, etc.)
## Performance Optimization
### Meeting Scaling
- Monitor concurrent meeting limits on your Whereby plan
- Implement meeting cleanup for expired sessions
- Use appropriate room modes for different use cases
### Recording Processing
- Configure SQS for asynchronous recording processing
- Monitor S3 storage usage and costs
- Implement automatic cleanup of processed recordings
### Webhook Reliability
- Implement webhook retry mechanisms
- Monitor webhook delivery success rates
- Log webhook events for debugging and auditing
## Migration from Other Platforms
If migrating from another video platform:
1. **Update Room Configuration**: Change existing rooms to use `"platform": "whereby"`
2. **Configure Webhooks**: Set up Whereby webhook endpoints
3. **Test Integration**: Verify meeting creation and event handling
4. **Monitor Performance**: Watch for any issues during transition
5. **Update Documentation**: Inform users of any workflow changes
## Support
For Whereby-specific issues:
- **Whereby Support**: [whereby.com/support](https://whereby.com/support)
- **API Documentation**: [whereby.dev](https://whereby.dev)
- **Status Page**: [status.whereby.com](https://status.whereby.com)
For Reflector integration issues:
- Check application logs for error details
- Verify environment variable configuration
- Test webhook connectivity and authentication
- Review AWS permissions and S3 access

474
docs/video_platforms.md Normal file
View File

@@ -0,0 +1,474 @@
# Video Platforms Architecture (PR #529 Analysis)
This document analyzes the video platforms refactoring implemented in PR #529 for daily.co integration, providing a blueprint for extending support to Jitsi and other video conferencing platforms.
## Overview
The video platforms refactoring introduces a clean abstraction layer that allows Reflector to support multiple video conferencing providers (Whereby, Daily.co, etc.) without changing core application logic. This architecture enables:
- Seamless switching between video platforms
- Platform-specific feature support
- Isolated platform code organization
- Consistent API surface across platforms
- Feature flags for gradual migration
## Architecture Components
### 1. **Directory Structure**
```
server/reflector/video_platforms/
├── __init__.py # Public API exports
├── base.py # Abstract base classes
├── factory.py # Platform client factory
├── registry.py # Platform registration system
├── whereby.py # Whereby implementation
├── daily.py # Daily.co implementation
└── mock.py # Testing implementation
```
### 2. **Core Abstract Classes**
#### `VideoPlatformClient` (base.py)
Abstract base class defining the interface all platforms must implement:
```python
class VideoPlatformClient(ABC):
PLATFORM_NAME: str = ""
@abstractmethod
async def create_meeting(self, room_name_prefix: str, end_date: datetime, room: Room) -> MeetingData
@abstractmethod
async def get_room_sessions(self, room_name: str) -> Dict[str, Any]
@abstractmethod
async def delete_room(self, room_name: str) -> bool
@abstractmethod
async def upload_logo(self, room_name: str, logo_path: str) -> bool
@abstractmethod
def verify_webhook_signature(self, body: bytes, signature: str, timestamp: Optional[str] = None) -> bool
```
#### `MeetingData` (base.py)
Standardized meeting data structure returned by all platforms:
```python
class MeetingData(BaseModel):
meeting_id: str
room_name: str
room_url: str
host_room_url: str
platform: str
extra_data: Dict[str, Any] = {} # Platform-specific data
```
#### `VideoPlatformConfig` (base.py)
Unified configuration structure for all platforms:
```python
class VideoPlatformConfig(BaseModel):
api_key: str
webhook_secret: str
api_url: Optional[str] = None
subdomain: Optional[str] = None
s3_bucket: Optional[str] = None
s3_region: Optional[str] = None
aws_role_arn: Optional[str] = None
aws_access_key_id: Optional[str] = None
aws_access_key_secret: Optional[str] = None
```
### 3. **Platform Registration System**
#### Registry Pattern (registry.py)
- Automatic registration of built-in platforms
- Runtime platform discovery
- Type-safe client instantiation
```python
# Auto-registration of platforms
_PLATFORMS: Dict[str, Type[VideoPlatformClient]] = {}
def register_platform(name: str, client_class: Type[VideoPlatformClient])
def get_platform_client(platform: str, config: VideoPlatformConfig) -> VideoPlatformClient
```
#### Factory System (factory.py)
- Configuration management per platform
- Platform selection logic
- Feature flag integration
```python
def get_platform_for_room(room_id: Optional[str] = None) -> str:
"""Determine which platform to use based on feature flags."""
if not settings.DAILY_MIGRATION_ENABLED:
return "whereby"
if room_id and room_id in settings.DAILY_MIGRATION_ROOM_IDS:
return "daily"
return settings.DEFAULT_VIDEO_PLATFORM
```
### 4. **Database Schema Changes**
#### Room Model Updates
Added `platform` field to track which video platform each room uses:
```python
# Database Schema
platform_column = sqlalchemy.Column(
"platform",
sqlalchemy.String,
nullable=False,
server_default="whereby"
)
# Pydantic Model
class Room(BaseModel):
platform: Literal["whereby", "daily"] = "whereby"
```
#### Meeting Model Updates
Added `platform` field to meetings for tracking and debugging:
```python
# Database Schema
platform_column = sqlalchemy.Column(
"platform",
sqlalchemy.String,
nullable=False,
server_default="whereby"
)
# Pydantic Model
class Meeting(BaseModel):
platform: Literal["whereby", "daily"] = "whereby"
```
**Key Decision**: No platform-specific fields were added to models. Instead, the `extra_data` field in `MeetingData` handles platform-specific information, following the user's rule of using generic `provider_data` as JSON if needed.
### 5. **Settings Configuration**
#### Feature Flags
```python
# Migration control
DAILY_MIGRATION_ENABLED: bool = True
DAILY_MIGRATION_ROOM_IDS: list[str] = []
DEFAULT_VIDEO_PLATFORM: str = "daily"
# Daily.co specific settings
DAILY_API_KEY: str | None = None
DAILY_WEBHOOK_SECRET: str | None = None
DAILY_SUBDOMAIN: str | None = None
AWS_DAILY_S3_BUCKET: str | None = None
AWS_DAILY_S3_REGION: str = "us-west-2"
AWS_DAILY_ROLE_ARN: str | None = None
```
#### Configuration Pattern
Each platform gets its own configuration namespace while sharing common patterns:
```python
def get_platform_config(platform: str) -> VideoPlatformConfig:
if platform == "whereby":
return VideoPlatformConfig(
api_key=settings.WHEREBY_API_KEY or "",
webhook_secret=settings.WHEREBY_WEBHOOK_SECRET or "",
# ... whereby-specific config
)
elif platform == "daily":
return VideoPlatformConfig(
api_key=settings.DAILY_API_KEY or "",
webhook_secret=settings.DAILY_WEBHOOK_SECRET or "",
# ... daily-specific config
)
```
### 6. **API Integration Updates**
#### Room Creation (views/rooms.py)
Updated to use platform factory instead of direct Whereby calls:
```python
@router.post("/rooms/{room_name}/meeting")
async def rooms_create_meeting(room_name: str, user: UserInfo):
# OLD: Direct Whereby integration
# whereby_meeting = await create_meeting("", end_date=end_date, room=room)
# NEW: Platform abstraction
platform = get_platform_for_room(room.id)
client = create_platform_client(platform)
meeting_data = await client.create_meeting(
room_name_prefix=room.name, end_date=end_date, room=room
)
await client.upload_logo(meeting_data.room_name, "./images/logo.png")
```
### 7. **Webhook Handling**
#### Separate Webhook Endpoints
Each platform gets its own webhook endpoint with platform-specific signature verification:
```python
# views/daily.py
@router.post("/daily_webhook")
async def daily_webhook(event: DailyWebhookEvent, request: Request):
# Verify Daily.co signature
body = await request.body()
signature = request.headers.get("X-Daily-Signature", "")
if not verify_daily_webhook_signature(body, signature):
raise HTTPException(status_code=401)
# Handle platform-specific events
if event.type == "participant.joined":
await _handle_participant_joined(event)
```
#### Consistent Event Handling
Despite different event formats, the core business logic remains the same:
```python
async def _handle_participant_joined(event):
room_name = event.data.get("room", {}).get("name") # Daily.co format
meeting = await meetings_controller.get_by_room_name(room_name)
if meeting:
current_count = getattr(meeting, "num_clients", 0)
await meetings_controller.update_meeting(
meeting.id, num_clients=current_count + 1
)
```
### 8. **Worker Task Integration**
#### New Task for Daily.co Recording Processing
Added platform-specific recording processing while maintaining the same pipeline:
```python
@shared_task
@asynctask
async def process_recording_from_url(recording_url: str, meeting_id: str, recording_id: str):
"""Process recording from Direct URL (Daily.co webhook)."""
logger.info("Processing recording from URL for meeting: %s", meeting_id)
# Uses same processing pipeline as Whereby S3 recordings
```
**Key Decision**: Worker tasks remain in main worker module but could be moved to platform-specific folders as suggested by the user.
### 9. **Testing Infrastructure**
#### Comprehensive Test Suite
- Unit tests for each platform client
- Integration tests for platform switching
- Mock platform for testing without external dependencies
- Webhook signature verification tests
```python
class TestPlatformIntegration:
"""Integration tests for platform switching."""
async def test_platform_switching_preserves_interface(self):
"""Test that different platforms provide consistent interface."""
# Test both Mock and Daily platforms return MeetingData objects
# with consistent fields
```
## Implementation Patterns for Jitsi Integration
Based on the daily.co implementation, here's how Jitsi should be integrated:
### 1. **Jitsi Client Implementation**
```python
# video_platforms/jitsi.py
class JitsiClient(VideoPlatformClient):
PLATFORM_NAME = "jitsi"
async def create_meeting(self, room_name_prefix: str, end_date: datetime, room: Room) -> MeetingData:
# Generate unique room name
jitsi_room = f"reflector-{room.name}-{int(time.time())}"
# Generate JWT tokens
user_jwt = self._generate_jwt(room=jitsi_room, moderator=False, exp=end_date)
host_jwt = self._generate_jwt(room=jitsi_room, moderator=True, exp=end_date)
return MeetingData(
meeting_id=generate_uuid4(),
room_name=jitsi_room,
room_url=f"https://jitsi.domain/{jitsi_room}?jwt={user_jwt}",
host_room_url=f"https://jitsi.domain/{jitsi_room}?jwt={host_jwt}",
platform=self.PLATFORM_NAME,
extra_data={"user_jwt": user_jwt, "host_jwt": host_jwt}
)
```
### 2. **Settings Integration**
```python
# settings.py
JITSI_DOMAIN: str = "meet.jit.si"
JITSI_JWT_SECRET: str | None = None
JITSI_WEBHOOK_SECRET: str | None = None
JITSI_API_URL: str | None = None # If using Jitsi API
```
### 3. **Factory Registration**
```python
# registry.py
def _register_builtin_platforms():
from .jitsi import JitsiClient
register_platform("jitsi", JitsiClient)
# factory.py
def get_platform_config(platform: str) -> VideoPlatformConfig:
elif platform == "jitsi":
return VideoPlatformConfig(
api_key="", # Jitsi may not need API key
webhook_secret=settings.JITSI_WEBHOOK_SECRET or "",
api_url=settings.JITSI_API_URL,
)
```
### 4. **Webhook Integration**
```python
# views/jitsi.py
@router.post("/jitsi/events")
async def jitsi_events_webhook(event_data: dict):
# Handle Prosody event-sync webhook format
event_type = event_data.get("event")
room_name = event_data.get("room", "").split("@")[0]
if event_type == "muc-occupant-joined":
# Same participant handling logic as other platforms
```
## Key Benefits of This Architecture
### 1. **Isolation and Organization**
- Platform-specific code contained in separate modules
- No platform logic leaking into core application
- Easy to add/remove platforms without affecting others
### 2. **Consistent Interface**
- All platforms implement the same abstract methods
- Standardized `MeetingData` structure
- Uniform error handling and logging
### 3. **Gradual Migration Support**
- Feature flags for controlled rollouts
- Room-specific platform selection
- Fallback mechanisms for platform failures
### 4. **Configuration Management**
- Centralized settings per platform
- Consistent naming patterns
- Environment-based configuration
### 5. **Testing and Quality**
- Mock platform for testing
- Comprehensive test coverage
- Platform-specific test utilities
## Migration Strategy Applied
The daily.co implementation demonstrates a careful migration approach:
### 1. **Backward Compatibility**
- Default platform remains "whereby"
- Existing rooms continue using Whereby unless explicitly migrated
- Same API endpoints and response formats
### 2. **Feature Flag Control**
```python
# Gradual rollout control
DAILY_MIGRATION_ENABLED: bool = True
DAILY_MIGRATION_ROOM_IDS: list[str] = [] # Specific rooms to migrate
DEFAULT_VIDEO_PLATFORM: str = "daily" # New rooms default
```
### 3. **Data Integrity**
- Platform field tracks which service each room/meeting uses
- No data loss during migration
- Platform-specific data preserved in `extra_data`
### 4. **Monitoring and Rollback**
- Comprehensive logging of platform selection
- Easy rollback by changing feature flags
- Platform-specific error tracking
## Recommendations for Jitsi Integration
Based on this analysis and the user's requirements:
### 1. **Follow the Pattern**
- Create `video_platforms/jitsi/` directory with:
- `client.py` - Main JitsiClient implementation
- `tasks.py` - Jitsi-specific worker tasks
- `__init__.py` - Module exports
### 2. **Settings Organization**
- Use `JITSI_*` prefix for all Jitsi settings
- Follow the same configuration pattern as Daily.co
- Support both environment variables and config files
### 3. **Generic Database Fields**
- Avoid platform-specific columns in database
- Use `provider_data` JSON field if platform-specific data needed
- Keep `platform` field as simple string identifier
### 4. **Worker Task Migration**
According to user requirements, migrate platform-specific tasks:
```
video_platforms/
├── whereby/
│ ├── client.py (moved from whereby.py)
│ └── tasks.py (moved from worker/whereby_tasks.py)
├── daily/
│ ├── client.py (moved from daily.py)
│ └── tasks.py (moved from worker/daily_tasks.py)
└── jitsi/
├── client.py (new JitsiClient)
└── tasks.py (new Jitsi recording tasks)
```
### 5. **Webhook Architecture**
- Create `views/jitsi.py` for Jitsi-specific webhooks
- Follow the same signature verification pattern
- Reuse existing participant tracking logic
## Implementation Checklist for Jitsi
- [ ] Create `video_platforms/jitsi/` directory structure
- [ ] Implement `JitsiClient` following the abstract interface
- [ ] Add Jitsi settings to configuration
- [ ] Register Jitsi platform in factory/registry
- [ ] Create Jitsi webhook endpoint
- [ ] Implement JWT token generation for room access
- [ ] Add Jitsi recording processing tasks
- [ ] Create comprehensive test suite
- [ ] Update database migrations for platform field
- [ ] Document Jitsi-specific configuration
## Conclusion
The video platforms refactoring in PR #529 provides an excellent foundation for adding Jitsi support. The architecture is well-designed with clear separation of concerns, consistent interfaces, and excellent extensibility. The daily.co implementation demonstrates how to add a new platform while maintaining backward compatibility and providing gradual migration capabilities.
The pattern should be directly applicable to Jitsi integration, with the main differences being:
- JWT-based authentication instead of API keys
- Different webhook event formats
- Jibri recording pipeline integration
- Self-hosted deployment considerations
This architecture successfully achieves the user's goals of:
1. Settings-based configuration
2. Generic database fields (no provider-specific columns)
3. Platform isolation in separate directories
4. Worker task organization within platform folders

View File

@@ -1,33 +0,0 @@
# OS / Editor
.DS_Store
.vscode/
.idea/
# Python
__pycache__/
*.py[cod]
*$py.class
# Logs
*.log
# Env and secrets
.env
.env.*
*.env
*.secret
# Build / dist
build/
dist/
.eggs/
*.egg-info/
# Coverage / test
.pytest_cache/
.coverage*
htmlcov/
# Modal local state (if any)
modal_mounts/
.modal_cache/

View File

@@ -1,2 +0,0 @@
REFLECTOR_GPU_APIKEY=
HF_TOKEN=

View File

@@ -1,38 +0,0 @@
cache/
# OS / Editor
.DS_Store
.vscode/
.idea/
# Python
__pycache__/
*.py[cod]
*$py.class
# Env and secrets
.env
*.env
*.secret
HF_TOKEN
REFLECTOR_GPU_APIKEY
# Virtual env / uv
.venv/
venv/
ENV/
uv/
# Build / dist
build/
dist/
.eggs/
*.egg-info/
# Coverage / test
.pytest_cache/
.coverage*
htmlcov/
# Logs
*.log

View File

@@ -1,46 +0,0 @@
FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1 \
UV_LINK_MODE=copy \
UV_NO_CACHE=1
WORKDIR /tmp
RUN apt-get update \
&& apt-get install -y \
ffmpeg \
curl \
ca-certificates \
gnupg \
wget \
&& apt-get clean
# Add NVIDIA CUDA repo for Debian 12 (bookworm) and install cuDNN 9 for CUDA 12
ADD https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb /cuda-keyring.deb
RUN dpkg -i /cuda-keyring.deb \
&& rm /cuda-keyring.deb \
&& apt-get update \
&& apt-get install -y --no-install-recommends \
cuda-cudart-12-6 \
libcublas-12-6 \
libcudnn9-cuda-12 \
libcudnn9-dev-cuda-12 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
ADD https://astral.sh/uv/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh
ENV PATH="/root/.local/bin/:$PATH"
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"
RUN mkdir -p /app
WORKDIR /app
COPY pyproject.toml uv.lock /app/
COPY ./app /app/app
COPY ./main.py /app/
COPY ./runserver.sh /app/
EXPOSE 8000
CMD ["sh", "/app/runserver.sh"]

View File

@@ -1,73 +0,0 @@
# Self-hosted Model API
Run transcription, translation, and diarization services compatible with Reflector's GPU Model API. Works on CPU or GPU.
Environment variables
- REFLECTOR_GPU_APIKEY: Optional Bearer token. If unset, auth is disabled.
- HF_TOKEN: Optional. Required for diarization to download pyannote pipelines
Requirements
- FFmpeg must be installed and on PATH (used for URL-based and segmented transcription)
- Python 3.12+
- NVIDIA GPU optional. If available, it will be used automatically
Local run
Set env vars in self_hosted/.env file
uv sync
uv run uvicorn main:app --host 0.0.0.0 --port 8000
Authentication
- If REFLECTOR_GPU_APIKEY is set, include header: Authorization: Bearer <key>
Endpoints
- POST /v1/audio/transcriptions
- multipart/form-data
- fields: file (single file) OR files[] (multiple files), language, batch (true/false)
- response: single { text, words, filename } or { results: [ ... ] }
- POST /v1/audio/transcriptions-from-url
- application/json
- body: { audio_file_url, language, timestamp_offset }
- response: { text, words }
- POST /translate
- text: query parameter
- body (application/json): { source_language, target_language }
- response: { text: { <src>: original, <tgt>: translated } }
- POST /diarize
- query parameters: audio_file_url, timestamp (optional)
- requires HF_TOKEN to be set (for pyannote)
- response: { diarization: [ { start, end, speaker } ] }
OpenAPI docs
- Visit /docs when the server is running
Docker
- Not yet provided in this directory. A Dockerfile will be added later. For now, use Local run above
Conformance tests
# From this directory
TRANSCRIPT_URL=http://localhost:8000 \
TRANSCRIPT_API_KEY=dev-key \
uv run -m pytest -m model_api --no-cov ../../server/tests/test_model_api_transcript.py
TRANSLATION_URL=http://localhost:8000 \
TRANSLATION_API_KEY=dev-key \
uv run -m pytest -m model_api --no-cov ../../server/tests/test_model_api_translation.py
DIARIZATION_URL=http://localhost:8000 \
DIARIZATION_API_KEY=dev-key \
uv run -m pytest -m model_api --no-cov ../../server/tests/test_model_api_diarization.py

View File

@@ -1,19 +0,0 @@
import os
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
def apikey_auth(apikey: str = Depends(oauth2_scheme)):
required_key = os.environ.get("REFLECTOR_GPU_APIKEY")
if not required_key:
return
if apikey == required_key:
return
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid API key",
headers={"WWW-Authenticate": "Bearer"},
)

View File

@@ -1,12 +0,0 @@
from pathlib import Path
SUPPORTED_FILE_EXTENSIONS = ["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"]
SAMPLE_RATE = 16000
VAD_CONFIG = {
"batch_max_duration": 30.0,
"silence_padding": 0.5,
"window_size": 512,
}
# App-level paths
UPLOADS_PATH = Path("/tmp/whisper-uploads")

View File

@@ -1,30 +0,0 @@
from contextlib import asynccontextmanager
from fastapi import FastAPI
from .routers.diarization import router as diarization_router
from .routers.transcription import router as transcription_router
from .routers.translation import router as translation_router
from .services.transcriber import WhisperService
from .services.diarizer import PyannoteDiarizationService
from .utils import ensure_dirs
@asynccontextmanager
async def lifespan(app: FastAPI):
ensure_dirs()
whisper_service = WhisperService()
whisper_service.load()
app.state.whisper = whisper_service
diarization_service = PyannoteDiarizationService()
diarization_service.load()
app.state.diarizer = diarization_service
yield
def create_app() -> FastAPI:
app = FastAPI(lifespan=lifespan)
app.include_router(transcription_router)
app.include_router(translation_router)
app.include_router(diarization_router)
return app

View File

@@ -1,30 +0,0 @@
from typing import List
from fastapi import APIRouter, Depends, Request
from pydantic import BaseModel
from ..auth import apikey_auth
from ..services.diarizer import PyannoteDiarizationService
from ..utils import download_audio_file
router = APIRouter(tags=["diarization"])
class DiarizationSegment(BaseModel):
start: float
end: float
speaker: int
class DiarizationResponse(BaseModel):
diarization: List[DiarizationSegment]
@router.post(
"/diarize", dependencies=[Depends(apikey_auth)], response_model=DiarizationResponse
)
def diarize(request: Request, audio_file_url: str, timestamp: float = 0.0):
with download_audio_file(audio_file_url) as (file_path, _ext):
file_path = str(file_path)
diarizer: PyannoteDiarizationService = request.app.state.diarizer
return diarizer.diarize_file(file_path, timestamp=timestamp)

View File

@@ -1,109 +0,0 @@
import uuid
from typing import Optional, Union
from fastapi import APIRouter, Body, Depends, Form, HTTPException, Request, UploadFile
from pydantic import BaseModel
from pathlib import Path
from ..auth import apikey_auth
from ..config import SUPPORTED_FILE_EXTENSIONS, UPLOADS_PATH
from ..services.transcriber import MODEL_NAME
from ..utils import cleanup_uploaded_files, download_audio_file
router = APIRouter(prefix="/v1/audio", tags=["transcription"])
class WordTiming(BaseModel):
word: str
start: float
end: float
class TranscriptResult(BaseModel):
text: str
words: list[WordTiming]
filename: Optional[str] = None
class TranscriptBatchResponse(BaseModel):
results: list[TranscriptResult]
@router.post(
"/transcriptions",
dependencies=[Depends(apikey_auth)],
response_model=Union[TranscriptResult, TranscriptBatchResponse],
)
def transcribe(
request: Request,
file: UploadFile = None,
files: list[UploadFile] | None = None,
model: str = Form(MODEL_NAME),
language: str = Form("en"),
batch: bool = Form(False),
):
service = request.app.state.whisper
if not file and not files:
raise HTTPException(
status_code=400, detail="Either 'file' or 'files' parameter is required"
)
if batch and not files:
raise HTTPException(
status_code=400, detail="Batch transcription requires 'files'"
)
upload_files = [file] if file else files
uploaded_paths: list[Path] = []
with cleanup_uploaded_files(uploaded_paths):
for upload_file in upload_files:
audio_suffix = upload_file.filename.split(".")[-1].lower()
if audio_suffix not in SUPPORTED_FILE_EXTENSIONS:
raise HTTPException(
status_code=400,
detail=(
f"Unsupported audio format. Supported extensions: {', '.join(SUPPORTED_FILE_EXTENSIONS)}"
),
)
unique_filename = f"{uuid.uuid4()}.{audio_suffix}"
file_path = UPLOADS_PATH / unique_filename
with open(file_path, "wb") as f:
content = upload_file.file.read()
f.write(content)
uploaded_paths.append(file_path)
if batch and len(upload_files) > 1:
results = []
for path in uploaded_paths:
result = service.transcribe_file(str(path), language=language)
result["filename"] = path.name
results.append(result)
return {"results": results}
results = []
for path in uploaded_paths:
result = service.transcribe_file(str(path), language=language)
result["filename"] = path.name
results.append(result)
return {"results": results} if len(results) > 1 else results[0]
@router.post(
"/transcriptions-from-url",
dependencies=[Depends(apikey_auth)],
response_model=TranscriptResult,
)
def transcribe_from_url(
request: Request,
audio_file_url: str = Body(..., description="URL of the audio file to transcribe"),
model: str = Body(MODEL_NAME),
language: str = Body("en"),
timestamp_offset: float = Body(0.0),
):
service = request.app.state.whisper
with download_audio_file(audio_file_url) as (file_path, _ext):
file_path = str(file_path)
result = service.transcribe_vad_url_segment(
file_path=file_path, timestamp_offset=timestamp_offset, language=language
)
return result

View File

@@ -1,28 +0,0 @@
from typing import Dict
from fastapi import APIRouter, Body, Depends
from pydantic import BaseModel
from ..auth import apikey_auth
from ..services.translator import TextTranslatorService
router = APIRouter(tags=["translation"])
translator = TextTranslatorService()
class TranslationResponse(BaseModel):
text: Dict[str, str]
@router.post(
"/translate",
dependencies=[Depends(apikey_auth)],
response_model=TranslationResponse,
)
def translate(
text: str,
source_language: str = Body("en"),
target_language: str = Body("fr"),
):
return translator.translate(text, source_language, target_language)

View File

@@ -1,42 +0,0 @@
import os
import threading
import torch
import torchaudio
from pyannote.audio import Pipeline
class PyannoteDiarizationService:
def __init__(self):
self._pipeline = None
self._device = "cpu"
self._lock = threading.Lock()
def load(self):
self._device = "cuda" if torch.cuda.is_available() else "cpu"
self._pipeline = Pipeline.from_pretrained(
"pyannote/speaker-diarization-3.1",
use_auth_token=os.environ.get("HF_TOKEN"),
)
self._pipeline.to(torch.device(self._device))
def diarize_file(self, file_path: str, timestamp: float = 0.0) -> dict:
if self._pipeline is None:
self.load()
waveform, sample_rate = torchaudio.load(file_path)
with self._lock:
diarization = self._pipeline(
{"waveform": waveform, "sample_rate": sample_rate}
)
words = []
for diarization_segment, _, speaker in diarization.itertracks(yield_label=True):
words.append(
{
"start": round(timestamp + diarization_segment.start, 3),
"end": round(timestamp + diarization_segment.end, 3),
"speaker": int(speaker[-2:])
if speaker and speaker[-2:].isdigit()
else 0,
}
)
return {"diarization": words}

View File

@@ -1,208 +0,0 @@
import os
import shutil
import subprocess
import threading
from typing import Generator
import faster_whisper
import librosa
import numpy as np
import torch
from fastapi import HTTPException
from silero_vad import VADIterator, load_silero_vad
from ..config import SAMPLE_RATE, VAD_CONFIG
# Whisper configuration (service-local defaults)
MODEL_NAME = "large-v2"
# None delegates compute type to runtime: float16 on CUDA, int8 on CPU
MODEL_COMPUTE_TYPE = None
MODEL_NUM_WORKERS = 1
CACHE_PATH = os.path.join(os.path.expanduser("~"), ".cache", "reflector-whisper")
from ..utils import NoStdStreams
class WhisperService:
def __init__(self):
self.model = None
self.device = "cpu"
self.lock = threading.Lock()
def load(self):
self.device = "cuda" if torch.cuda.is_available() else "cpu"
compute_type = MODEL_COMPUTE_TYPE or (
"float16" if self.device == "cuda" else "int8"
)
self.model = faster_whisper.WhisperModel(
MODEL_NAME,
device=self.device,
compute_type=compute_type,
num_workers=MODEL_NUM_WORKERS,
download_root=CACHE_PATH,
)
def pad_audio(self, audio_array, sample_rate: int = SAMPLE_RATE):
audio_duration = len(audio_array) / sample_rate
if audio_duration < VAD_CONFIG["silence_padding"]:
silence_samples = int(sample_rate * VAD_CONFIG["silence_padding"])
silence = np.zeros(silence_samples, dtype=np.float32)
return np.concatenate([audio_array, silence])
return audio_array
def enforce_word_timing_constraints(self, words: list[dict]) -> list[dict]:
if len(words) <= 1:
return words
enforced: list[dict] = []
for i, word in enumerate(words):
current = dict(word)
if i < len(words) - 1:
next_start = words[i + 1]["start"]
if current["end"] > next_start:
current["end"] = next_start
enforced.append(current)
return enforced
def transcribe_file(self, file_path: str, language: str = "en") -> dict:
input_for_model: str | "object" = file_path
try:
audio_array, _sample_rate = librosa.load(
file_path, sr=SAMPLE_RATE, mono=True
)
if len(audio_array) / float(SAMPLE_RATE) < VAD_CONFIG["silence_padding"]:
input_for_model = self.pad_audio(audio_array, SAMPLE_RATE)
except Exception:
pass
with self.lock:
with NoStdStreams():
segments, _ = self.model.transcribe(
input_for_model,
language=language,
beam_size=5,
word_timestamps=True,
vad_filter=True,
vad_parameters={"min_silence_duration_ms": 500},
)
segments = list(segments)
text = "".join(segment.text for segment in segments).strip()
words = [
{
"word": word.word,
"start": round(float(word.start), 2),
"end": round(float(word.end), 2),
}
for segment in segments
for word in segment.words
]
words = self.enforce_word_timing_constraints(words)
return {"text": text, "words": words}
def transcribe_vad_url_segment(
self, file_path: str, timestamp_offset: float = 0.0, language: str = "en"
) -> dict:
def load_audio_via_ffmpeg(input_path: str, sample_rate: int) -> np.ndarray:
ffmpeg_bin = shutil.which("ffmpeg") or "ffmpeg"
cmd = [
ffmpeg_bin,
"-nostdin",
"-threads",
"1",
"-i",
input_path,
"-f",
"f32le",
"-acodec",
"pcm_f32le",
"-ac",
"1",
"-ar",
str(sample_rate),
"pipe:1",
]
try:
proc = subprocess.run(
cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True
)
except Exception as e:
raise HTTPException(status_code=400, detail=f"ffmpeg failed: {e}")
audio = np.frombuffer(proc.stdout, dtype=np.float32)
return audio
def vad_segments(
audio_array,
sample_rate: int = SAMPLE_RATE,
window_size: int = VAD_CONFIG["window_size"],
) -> Generator[tuple[float, float], None, None]:
vad_model = load_silero_vad(onnx=False)
iterator = VADIterator(vad_model, sampling_rate=sample_rate)
start = None
for i in range(0, len(audio_array), window_size):
chunk = audio_array[i : i + window_size]
if len(chunk) < window_size:
chunk = np.pad(
chunk, (0, window_size - len(chunk)), mode="constant"
)
speech = iterator(chunk)
if not speech:
continue
if "start" in speech:
start = speech["start"]
continue
if "end" in speech and start is not None:
end = speech["end"]
yield (start / float(SAMPLE_RATE), end / float(SAMPLE_RATE))
start = None
iterator.reset_states()
audio_array = load_audio_via_ffmpeg(file_path, SAMPLE_RATE)
merged_batches: list[tuple[float, float]] = []
batch_start = None
batch_end = None
max_duration = VAD_CONFIG["batch_max_duration"]
for seg_start, seg_end in vad_segments(audio_array):
if batch_start is None:
batch_start, batch_end = seg_start, seg_end
continue
if seg_end - batch_start <= max_duration:
batch_end = seg_end
else:
merged_batches.append((batch_start, batch_end))
batch_start, batch_end = seg_start, seg_end
if batch_start is not None and batch_end is not None:
merged_batches.append((batch_start, batch_end))
all_text = []
all_words = []
for start_time, end_time in merged_batches:
s_idx = int(start_time * SAMPLE_RATE)
e_idx = int(end_time * SAMPLE_RATE)
segment = audio_array[s_idx:e_idx]
segment = self.pad_audio(segment, SAMPLE_RATE)
with self.lock:
segments, _ = self.model.transcribe(
segment,
language=language,
beam_size=5,
word_timestamps=True,
vad_filter=True,
vad_parameters={"min_silence_duration_ms": 500},
)
segments = list(segments)
text = "".join(seg.text for seg in segments).strip()
words = [
{
"word": w.word,
"start": round(float(w.start) + start_time + timestamp_offset, 2),
"end": round(float(w.end) + start_time + timestamp_offset, 2),
}
for seg in segments
for w in seg.words
]
if text:
all_text.append(text)
all_words.extend(words)
all_words = self.enforce_word_timing_constraints(all_words)
return {"text": " ".join(all_text), "words": all_words}

View File

@@ -1,44 +0,0 @@
import threading
from transformers import MarianMTModel, MarianTokenizer, pipeline
class TextTranslatorService:
"""Simple text-to-text translator using HuggingFace MarianMT models.
This mirrors the modal translator API shape but uses text translation only.
"""
def __init__(self):
self._pipeline = None
self._lock = threading.Lock()
def load(self, source_language: str = "en", target_language: str = "fr"):
# Pick a default MarianMT model pair if available; fall back to Helsinki-NLP en->fr
model_name = self._resolve_model_name(source_language, target_language)
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
self._pipeline = pipeline("translation", model=model, tokenizer=tokenizer)
def _resolve_model_name(self, src: str, tgt: str) -> str:
# Minimal mapping; extend as needed
pair = (src.lower(), tgt.lower())
mapping = {
("en", "fr"): "Helsinki-NLP/opus-mt-en-fr",
("fr", "en"): "Helsinki-NLP/opus-mt-fr-en",
("en", "es"): "Helsinki-NLP/opus-mt-en-es",
("es", "en"): "Helsinki-NLP/opus-mt-es-en",
("en", "de"): "Helsinki-NLP/opus-mt-en-de",
("de", "en"): "Helsinki-NLP/opus-mt-de-en",
}
return mapping.get(pair, "Helsinki-NLP/opus-mt-en-fr")
def translate(self, text: str, source_language: str, target_language: str) -> dict:
if self._pipeline is None:
self.load(source_language, target_language)
with self._lock:
results = self._pipeline(
text, src_lang=source_language, tgt_lang=target_language
)
translated = results[0]["translation_text"] if results else ""
return {"text": {source_language: text, target_language: translated}}

View File

@@ -1,107 +0,0 @@
import logging
import os
import sys
import uuid
from contextlib import contextmanager
from typing import Mapping
from urllib.parse import urlparse
from pathlib import Path
import requests
from fastapi import HTTPException
from .config import SUPPORTED_FILE_EXTENSIONS, UPLOADS_PATH
logger = logging.getLogger(__name__)
class NoStdStreams:
def __init__(self):
self.devnull = open(os.devnull, "w")
def __enter__(self):
self._stdout, self._stderr = sys.stdout, sys.stderr
self._stdout.flush()
self._stderr.flush()
sys.stdout, sys.stderr = self.devnull, self.devnull
def __exit__(self, exc_type, exc_value, traceback):
sys.stdout, sys.stderr = self._stdout, self._stderr
self.devnull.close()
def ensure_dirs():
UPLOADS_PATH.mkdir(parents=True, exist_ok=True)
def detect_audio_format(url: str, headers: Mapping[str, str]) -> str:
url_path = urlparse(url).path
for ext in SUPPORTED_FILE_EXTENSIONS:
if url_path.lower().endswith(f".{ext}"):
return ext
content_type = headers.get("content-type", "").lower()
if "audio/mpeg" in content_type or "audio/mp3" in content_type:
return "mp3"
if "audio/wav" in content_type:
return "wav"
if "audio/mp4" in content_type:
return "mp4"
raise HTTPException(
status_code=400,
detail=(
f"Unsupported audio format for URL. Supported extensions: {', '.join(SUPPORTED_FILE_EXTENSIONS)}"
),
)
def download_audio_to_uploads(audio_file_url: str) -> tuple[Path, str]:
response = requests.head(audio_file_url, allow_redirects=True)
if response.status_code == 404:
raise HTTPException(status_code=404, detail="Audio file not found")
response = requests.get(audio_file_url, allow_redirects=True)
response.raise_for_status()
audio_suffix = detect_audio_format(audio_file_url, response.headers)
unique_filename = f"{uuid.uuid4()}.{audio_suffix}"
file_path: Path = UPLOADS_PATH / unique_filename
with open(file_path, "wb") as f:
f.write(response.content)
return file_path, audio_suffix
@contextmanager
def download_audio_file(audio_file_url: str):
"""Download an audio file to UPLOADS_PATH and remove it after use.
Yields (file_path: Path, audio_suffix: str).
"""
file_path, audio_suffix = download_audio_to_uploads(audio_file_url)
try:
yield file_path, audio_suffix
finally:
try:
file_path.unlink(missing_ok=True)
except Exception as e:
logger.error("Error deleting temporary file %s: %s", file_path, e)
@contextmanager
def cleanup_uploaded_files(file_paths: list[Path]):
"""Ensure provided file paths are removed after use.
The provided list can be populated inside the context; all present entries
at exit will be deleted.
"""
try:
yield file_paths
finally:
for path in list(file_paths):
try:
path.unlink(missing_ok=True)
except Exception as e:
logger.error("Error deleting temporary file %s: %s", path, e)

View File

@@ -1,10 +0,0 @@
services:
reflector_gpu:
build:
context: .
ports:
- "8000:8000"
env_file:
- .env
volumes:
- ./cache:/root/.cache

View File

@@ -1,3 +0,0 @@
from app.factory import create_app
app = create_app()

View File

@@ -1,19 +0,0 @@
[project]
name = "reflector-gpu"
version = "0.1.0"
description = "Self-hosted GPU service for speech transcription, diarization, and translation via FastAPI."
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"fastapi[standard]>=0.116.1",
"uvicorn[standard]>=0.30.0",
"torch>=2.3.0",
"faster-whisper>=1.1.0",
"librosa==0.10.1",
"numpy<2",
"silero-vad==5.1.0",
"transformers>=4.35.0",
"sentencepiece",
"pyannote.audio==3.1.0",
"torchaudio>=2.3.0",
]

View File

@@ -1,17 +0,0 @@
#!/bin/sh
set -e
export PATH="/root/.local/bin:$PATH"
cd /app
# Install Python dependencies at runtime (first run or when FORCE_SYNC=1)
if [ ! -d "/app/.venv" ] || [ "$FORCE_SYNC" = "1" ]; then
echo "[startup] Installing Python dependencies with uv..."
uv sync --compile-bytecode --locked
else
echo "[startup] Using existing virtual environment at /app/.venv"
fi
exec uv run uvicorn main:app --host 0.0.0.0 --port 8000

3013
gpu/self_hosted/uv.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,118 +0,0 @@
# AsyncIO Event Loop Analysis for test_attendee_parsing_bug.py
## Problem Summary
The test passes but encounters an error during teardown where asyncpg tries to use a different/closed event loop, resulting in:
- `RuntimeError: Task got Future attached to a different loop`
- `RuntimeError: Event loop is closed`
## Root Cause Analysis
### 1. Multiple Event Loop Creation Points
The test environment creates event loops at different scopes:
1. **Session-scoped loop** (conftest.py:27-34):
- Created once per test session
- Used by session-scoped fixtures
- Closed after all tests complete
2. **Function-scoped loop** (pytest-asyncio default):
- Created for each async test function
- This is the loop that runs the actual test
- Closed immediately after test completes
3. **AsyncPG internal loop**:
- AsyncPG connections store a reference to the loop they were created with
- Used for connection lifecycle management
### 2. Event Loop Lifecycle Mismatch
The issue occurs because:
1. **Session fixture creates database connection** on session-scoped loop
2. **Test runs** on function-scoped loop (different from session loop)
3. **During teardown**, the session fixture tries to rollback/close using the original session loop
4. **AsyncPG connection** still references the function-scoped loop which is now closed
5. **Conflict**: SQLAlchemy tries to use session loop, but asyncpg Future is attached to the closed function loop
### 3. Configuration Issues
Current pytest configuration:
- `asyncio_mode = "auto"` in pyproject.toml
- `asyncio_default_fixture_loop_scope=session` (shown in test output)
- `asyncio_default_test_loop_scope=function` (shown in test output)
This mismatch between fixture loop scope (session) and test loop scope (function) causes the problem.
## Solutions
### Option 1: Align Loop Scopes (Recommended)
Change pytest-asyncio configuration to use consistent loop scopes:
```python
# pyproject.toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function" # Change from session to function
```
### Option 2: Use Function-Scoped Database Fixture
Change the `session` fixture scope from session to function:
```python
@pytest_asyncio.fixture # Remove scope="session"
async def session(setup_database):
# ... existing code ...
```
### Option 3: Explicit Loop Management
Ensure all async operations use the same loop:
```python
@pytest_asyncio.fixture
async def session(setup_database, event_loop):
# Force using the current event loop
engine = create_async_engine(
settings.DATABASE_URL,
echo=False,
poolclass=NullPool,
connect_args={"loop": event_loop} # Pass explicit loop
)
# ... rest of fixture ...
```
### Option 4: Upgrade pytest-asyncio
The current version (1.1.0) has known issues with loop management. Consider upgrading to the latest version which has better loop scope handling.
## Immediate Workaround
For the test to run cleanly without the teardown error, you can:
1. Add explicit cleanup in the test:
```python
@pytest.mark.asyncio
async def test_attendee_parsing_bug(session):
# ... existing test code ...
# Explicit cleanup before fixture teardown
await session.commit() # or await session.close()
```
2. Or suppress the teardown error (not recommended for production):
```python
@pytest.fixture
async def session(setup_database):
# ... existing setup ...
try:
yield session
await session.rollback()
except RuntimeError as e:
if "Event loop is closed" not in str(e):
raise
finally:
await session.close()
```
## Recommendation
The cleanest solution is to align the loop scopes by setting both fixture and test loop scopes to "function" scope. This ensures each test gets its own clean event loop and avoids cross-contamination between tests.

View File

@@ -0,0 +1,212 @@
# Event Logger for Docker-Jitsi-Meet
A Prosody module that logs Jitsi meeting events to JSONL files alongside recordings, enabling complete participant tracking and speaker statistics.
## Prerequisites
- Running docker-jitsi-meet installation
- Jibri configured for recording
## Installation
### Step 1: Copy the Module
Copy the Prosody module to your custom plugins directory:
```bash
# Create the directory if it doesn't exist
mkdir -p ~/.jitsi-meet-cfg/prosody/prosody-plugins-custom
# Copy the module
cp mod_event_logger.lua ~/.jitsi-meet-cfg/prosody/prosody-plugins-custom/
```
### Step 2: Update Your .env File
Add or modify these variables in your `.env` file:
```bash
# If XMPP_MUC_MODULES already exists, append event_logger
# Example: XMPP_MUC_MODULES=existing_module,event_logger
XMPP_MUC_MODULES=event_logger
# Optional: Configure the module (these are defaults)
JIBRI_RECORDINGS_PATH=/config/recordings
JIBRI_LOG_SPEAKER_STATS=true
JIBRI_SPEAKER_STATS_INTERVAL=10
```
**Important**: If you already have `XMPP_MUC_MODULES` defined, add `event_logger` to the comma-separated list:
```bash
# Existing modules + our module
XMPP_MUC_MODULES=mod_info,mod_alert,event_logger
```
### Step 3: Modify docker-compose.yml
Add a shared recordings volume so Prosody can write events alongside Jibri recordings:
```yaml
services:
prosody:
# ... existing configuration ...
volumes:
- ${CONFIG}/prosody/config:/config:Z
- ${CONFIG}/prosody/prosody-plugins-custom:/prosody-plugins-custom:Z
- ${CONFIG}/recordings:/config/recordings:Z # Add this line
environment:
# Add if not using .env file
- XMPP_MUC_MODULES=${XMPP_MUC_MODULES:-event_logger}
- JIBRI_RECORDINGS_PATH=/config/recordings
jibri:
# ... existing configuration ...
volumes:
- ${CONFIG}/jibri:/config:Z
- ${CONFIG}/recordings:/config/recordings:Z # Add this line
environment:
# For Reflector webhook integration (optional)
- REFLECTOR_WEBHOOK_URL=${REFLECTOR_WEBHOOK_URL:-}
- JIBRI_FINALIZE_RECORDING_SCRIPT_PATH=/config/finalize.sh
```
### Step 4: Add Finalize Script (Optional - For Reflector Integration)
If you want to notify Reflector when recordings complete:
```bash
# Copy the finalize script
cp finalize.sh ~/.jitsi-meet-cfg/jibri/finalize.sh
chmod +x ~/.jitsi-meet-cfg/jibri/finalize.sh
# Add to .env
REFLECTOR_WEBHOOK_URL=http://your-reflector-api:8000
```
### Step 5: Restart Services
```bash
docker-compose down
docker-compose up -d
```
## What Gets Created
After a recording, you'll find in `~/.jitsi-meet-cfg/recordings/{session-id}/`:
- `recording.mp4` - The video recording (created by Jibri)
- `metadata.json` - Basic metadata (created by Jibri)
- `events.jsonl` - Complete participant timeline (created by this module)
## Event Format
Each line in `events.jsonl` is a JSON object:
```json
{"type":"room_created","timestamp":1234567890,"room_name":"TestRoom","room_jid":"testroom@conference.meet.jitsi","meeting_url":"https://meet.jitsi/TestRoom"}
{"type":"recording_started","timestamp":1234567891,"room_name":"TestRoom","session_id":"20240115120000_TestRoom","jibri_jid":"jibri@recorder.meet.jitsi"}
{"type":"participant_joined","timestamp":1234567892,"room_name":"TestRoom","participant":{"jid":"user1@meet.jitsi/web","nick":"John Doe","id":"user1@meet.jitsi","is_moderator":false}}
{"type":"speaker_active","timestamp":1234567895,"room_name":"TestRoom","speaker_jid":"user1@meet.jitsi","speaker_nick":"John Doe","duration":10}
{"type":"participant_left","timestamp":1234567920,"room_name":"TestRoom","participant":{"jid":"user1@meet.jitsi/web","nick":"John Doe","duration_seconds":28}}
{"type":"recording_stopped","timestamp":1234567950,"room_name":"TestRoom","session_id":"20240115120000_TestRoom","meeting_url":"https://meet.jitsi/TestRoom"}
```
## Configuration Options
All configuration can be done via environment variables:
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `JIBRI_RECORDINGS_PATH` | `/config/recordings` | Path where recordings are stored |
| `JIBRI_LOG_SPEAKER_STATS` | `true` | Enable speaker statistics logging |
| `JIBRI_SPEAKER_STATS_INTERVAL` | `10` | Seconds between speaker stats updates |
## Verifying Installation
Check that the module is loaded:
```bash
docker-compose logs prosody | grep "Event Logger"
# Should see: "Event Logger loaded - writing to /config/recordings"
```
Check for events after a recording:
```bash
ls -la ~/.jitsi-meet-cfg/recordings/*/events.jsonl
cat ~/.jitsi-meet-cfg/recordings/*/events.jsonl | jq .
```
## Troubleshooting
### No events.jsonl file created
1. **Check module is enabled**:
```bash
docker-compose exec prosody grep -r "event_logger" /config
```
2. **Verify volume permissions**:
```bash
docker-compose exec prosody ls -la /config/recordings
```
3. **Check Prosody logs for errors**:
```bash
docker-compose logs prosody | grep -i error
```
### Module not loading
1. **Verify file exists in container**:
```bash
docker-compose exec prosody ls -la /prosody-plugins-custom/
```
2. **Check XMPP_MUC_MODULES format** (must be comma-separated, no spaces):
- ✅ Correct: `XMPP_MUC_MODULES=mod1,mod2,event_logger`
- ❌ Wrong: `XMPP_MUC_MODULES=mod1, mod2, event_logger`
## Common docker-compose.yml Patterns
### Minimal Addition (if you trust defaults)
```yaml
services:
prosody:
volumes:
- ${CONFIG}/recordings:/config/recordings:Z # Just add this
```
### Full Configuration
```yaml
services:
prosody:
volumes:
- ${CONFIG}/prosody/config:/config:Z
- ${CONFIG}/prosody/prosody-plugins-custom:/prosody-plugins-custom:Z
- ${CONFIG}/recordings:/config/recordings:Z
environment:
- XMPP_MUC_MODULES=event_logger
- JIBRI_RECORDINGS_PATH=/config/recordings
- JIBRI_LOG_SPEAKER_STATS=true
- JIBRI_SPEAKER_STATS_INTERVAL=10
jibri:
volumes:
- ${CONFIG}/jibri:/config:Z
- ${CONFIG}/recordings:/config/recordings:Z
environment:
- JIBRI_RECORDING_DIR=/config/recordings
- JIBRI_FINALIZE_RECORDING_SCRIPT_PATH=/config/finalize.sh
```
## Integration with Reflector
The finalize.sh script will automatically notify Reflector when a recording completes if `REFLECTOR_WEBHOOK_URL` is set. Reflector will receive:
```json
{
"session_id": "20240115120000_TestRoom",
"path": "20240115120000_TestRoom",
"meeting_url": "https://meet.jitsi/TestRoom"
}
```
Reflector then processes the recording along with the complete participant timeline from `events.jsonl`.

View File

@@ -0,0 +1,49 @@
#!/bin/bash
# Jibri finalize script to notify Reflector when recording is complete
# This script is called by Jibri with the recording directory as argument
RECORDING_PATH="$1"
SESSION_ID=$(basename "$RECORDING_PATH")
METADATA_FILE="$RECORDING_PATH/metadata.json"
# Extract meeting URL from Jibri's metadata
MEETING_URL=""
if [ -f "$METADATA_FILE" ]; then
MEETING_URL=$(jq -r '.meeting_url' "$METADATA_FILE" 2>/dev/null || echo "")
fi
echo "[$(date)] Recording finalized: $RECORDING_PATH"
echo "[$(date)] Session ID: $SESSION_ID"
echo "[$(date)] Meeting URL: $MEETING_URL"
# Check if events.jsonl was created by our Prosody module
if [ -f "$RECORDING_PATH/events.jsonl" ]; then
EVENT_COUNT=$(wc -l < "$RECORDING_PATH/events.jsonl")
echo "[$(date)] Found events.jsonl with $EVENT_COUNT events"
else
echo "[$(date)] Warning: No events.jsonl found"
fi
# Notify Reflector if webhook URL is configured
if [ -n "$REFLECTOR_WEBHOOK_URL" ]; then
echo "[$(date)] Notifying Reflector at: $REFLECTOR_WEBHOOK_URL"
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "$REFLECTOR_WEBHOOK_URL/api/v1/jibri/recording-ready" \
-H "Content-Type: application/json" \
-d "{\"session_id\":\"$SESSION_ID\",\"path\":\"$SESSION_ID\",\"meeting_url\":\"$MEETING_URL\"}")
HTTP_CODE=$(echo "$RESPONSE" | tail -n1)
BODY=$(echo "$RESPONSE" | sed '$d')
if [ "$HTTP_CODE" = "200" ]; then
echo "[$(date)] Reflector notified successfully"
echo "[$(date)] Response: $BODY"
else
echo "[$(date)] Failed to notify Reflector. HTTP code: $HTTP_CODE"
echo "[$(date)] Response: $BODY"
fi
else
echo "[$(date)] No REFLECTOR_WEBHOOK_URL configured, skipping notification"
fi
echo "[$(date)] Finalize script completed"

View File

@@ -0,0 +1,372 @@
local json = require "util.json"
local st = require "util.stanza"
local jid_bare = require "util.jid".bare
local recordings_path = os.getenv("JIBRI_RECORDINGS_PATH") or
module:get_option_string("jibri_recordings_path", "/recordings")
-- room_jid -> { session_id, participants = {jid -> info} }
local active_recordings = {}
-- room_jid -> { participants = {jid -> info}, created_at }
local room_states = {}
local function get_timestamp()
return os.time()
end
local function write_event(session_id, event)
if not session_id then
module:log("warn", "No session_id for event: %s", event.type)
return
end
local session_dir = string.format("%s/%s", recordings_path, session_id)
local event_file = string.format("%s/events.jsonl", session_dir)
module:log("info", "Writing event %s to %s", event.type, event_file)
-- Create directory
local mkdir_cmd = string.format("mkdir -p '%s' 2>&1", session_dir)
local mkdir_result = os.execute(mkdir_cmd)
module:log("debug", "mkdir result: %s", tostring(mkdir_result))
local file, err = io.open(event_file, "a")
if file then
local json_str = json.encode(event)
file:write(json_str .. "\n")
file:close()
module:log("info", "Successfully wrote event %s", event.type)
else
module:log("error", "Failed to write event to %s: %s", event_file, err)
end
end
local function extract_participant_info(occupant)
local info = {
jid = occupant.jid,
bare_jid = occupant.bare_jid,
nick = occupant.nick,
display_name = nil,
role = occupant.role
}
local presence = occupant:get_presence()
if presence then
local nick_element = presence:get_child("nick", "http://jabber.org/protocol/nick")
if nick_element then
info.display_name = nick_element:get_text()
end
local identity = presence:get_child("identity")
if identity then
local user = identity:get_child("user")
if user then
local name = user:get_child("name")
if name then
info.display_name = name:get_text()
end
local id_element = user:get_child("id")
if id_element then
info.id = id_element:get_text()
end
end
end
if not info.display_name and occupant.nick then
local _, _, resource = occupant.nick:match("([^@]+)@([^/]+)/(.+)")
if resource then
info.display_name = resource
end
end
end
return info
end
local function get_room_participant_count(room)
local count = 0
for _ in room:each_occupant() do
count = count + 1
end
return count
end
local function snapshot_room_participants(room)
local participants = {}
local total = 0
local skipped = 0
module:log("info", "Snapshotting room participants")
for _, occupant in room:each_occupant() do
total = total + 1
-- Skip recorders (Jibri)
if occupant.bare_jid and (occupant.bare_jid:match("^recorder@") or
occupant.bare_jid:match("^jibri@")) then
skipped = skipped + 1
else
local info = extract_participant_info(occupant)
participants[occupant.jid] = info
module:log("debug", "Added participant: %s", info.display_name or info.bare_jid)
end
end
module:log("info", "Snapshot: %d total, %d participants", total, total - skipped)
return participants
end
-- Import utility functions if available
local util = module:require "util";
local get_room_from_jid = util.get_room_from_jid;
local room_jid_match_rewrite = util.room_jid_match_rewrite;
-- Main IQ handler for Jibri stanzas
module:hook("pre-iq/full", function(event)
local stanza = event.stanza
if stanza.name ~= "iq" then
return
end
local jibri = stanza:get_child('jibri', 'http://jitsi.org/protocol/jibri')
if not jibri then
return
end
module:log("info", "=== Jibri IQ intercepted ===")
local action = jibri.attr.action
local session_id = jibri.attr.session_id
local room_jid = jibri.attr.room
local recording_mode = jibri.attr.recording_mode
local app_data = jibri.attr.app_data
module:log("info", "Jibri %s - session: %s, room: %s, mode: %s",
action or "?", session_id or "?", room_jid or "?", recording_mode or "?")
if not room_jid or not session_id then
module:log("warn", "Missing room_jid or session_id")
return
end
-- Get the room using util function
local room = get_room_from_jid(room_jid_match_rewrite(jid_bare(stanza.attr.to)))
if not room then
-- Try with the room_jid directly
room = get_room_from_jid(room_jid)
end
if not room then
module:log("error", "Room not found for jid: %s", room_jid)
return
end
module:log("info", "Room found: %s", room:get_name() or room_jid)
if action == "start" then
module:log("info", "Recording START for session %s", session_id)
-- Count and snapshot participants
local participant_count = 0
for _ in room:each_occupant() do
participant_count = participant_count + 1
end
local participants = snapshot_room_participants(room)
local participant_list = {}
for jid, info in pairs(participants) do
table.insert(participant_list, info)
end
active_recordings[room_jid] = {
session_id = session_id,
participants = participants,
started_at = get_timestamp()
}
write_event(session_id, {
type = "recording_started",
timestamp = get_timestamp(),
room_jid = room_jid,
room_name = room:get_name(),
session_id = session_id,
recording_mode = recording_mode,
app_data = app_data,
participant_count = participant_count,
participants_at_start = participant_list
})
elseif action == "stop" then
module:log("info", "Recording STOP for session %s", session_id)
local recording = active_recordings[room_jid]
if recording and recording.session_id == session_id then
write_event(session_id, {
type = "recording_stopped",
timestamp = get_timestamp(),
room_jid = room_jid,
room_name = room:get_name(),
session_id = session_id,
duration = get_timestamp() - recording.started_at,
participant_count = get_room_participant_count(room)
})
active_recordings[room_jid] = nil
else
module:log("warn", "No active recording found for room %s", room_jid)
end
end
end);
-- Room and participant event hooks
local function setup_room_hooks(host_module)
module:log("info", "Setting up room hooks on %s", host_module.host or "unknown")
-- Room created
host_module:hook("muc-room-created", function(event)
local room = event.room
local room_jid = room.jid
room_states[room_jid] = {
participants = {},
created_at = get_timestamp()
}
module:log("info", "Room created: %s", room_jid)
end)
-- Room destroyed
host_module:hook("muc-room-destroyed", function(event)
local room = event.room
local room_jid = room.jid
room_states[room_jid] = nil
active_recordings[room_jid] = nil
module:log("info", "Room destroyed: %s", room_jid)
end)
-- Occupant joined
host_module:hook("muc-occupant-joined", function(event)
local room = event.room
local occupant = event.occupant
local room_jid = room.jid
-- Skip recorders
if occupant.bare_jid and (occupant.bare_jid:match("^recorder@") or
occupant.bare_jid:match("^jibri@")) then
return
end
local participant_info = extract_participant_info(occupant)
-- Update room state
if room_states[room_jid] then
room_states[room_jid].participants[occupant.jid] = participant_info
end
-- Log to active recording if exists
local recording = active_recordings[room_jid]
if recording then
recording.participants[occupant.jid] = participant_info
write_event(recording.session_id, {
type = "participant_joined",
timestamp = get_timestamp(),
room_jid = room_jid,
room_name = room:get_name(),
participant = participant_info,
participant_count = get_room_participant_count(room)
})
end
module:log("info", "Participant joined %s: %s (%d total)",
room:get_name() or room_jid,
participant_info.display_name or participant_info.bare_jid,
get_room_participant_count(room))
end)
-- Occupant left
host_module:hook("muc-occupant-left", function(event)
local room = event.room
local occupant = event.occupant
local room_jid = room.jid
-- Skip recorders
if occupant.bare_jid and (occupant.bare_jid:match("^recorder@") or
occupant.bare_jid:match("^jibri@")) then
return
end
local participant_info = extract_participant_info(occupant)
-- Update room state
if room_states[room_jid] then
room_states[room_jid].participants[occupant.jid] = nil
end
-- Log to active recording if exists
local recording = active_recordings[room_jid]
if recording then
if recording.participants[occupant.jid] then
recording.participants[occupant.jid] = nil
end
write_event(recording.session_id, {
type = "participant_left",
timestamp = get_timestamp(),
room_jid = room_jid,
room_name = room:get_name(),
participant = participant_info,
participant_count = get_room_participant_count(room)
})
end
module:log("info", "Participant left %s: %s (%d remaining)",
room:get_name() or room_jid,
participant_info.display_name or participant_info.bare_jid,
get_room_participant_count(room))
end)
end
-- Module initialization
local current_host = module:get_host()
local host_type = module:get_host_type()
module:log("info", "Event Logger loading on %s (type: %s)", current_host, host_type or "unknown")
module:log("info", "Recording path: %s", recordings_path)
-- Setup room hooks based on host type
if host_type == "component" and current_host:match("^[^.]+%.") then
setup_room_hooks(module)
else
-- Try to find and hook to MUC component
local process_host_module = util.process_host_module
local muc_component_host = module:get_option_string("muc_component") or
module:get_option_string("main_muc")
if not muc_component_host then
local possible_hosts = {
"muc." .. current_host,
"conference." .. current_host,
"rooms." .. current_host
}
for _, host in ipairs(possible_hosts) do
if prosody.hosts[host] then
muc_component_host = host
module:log("info", "Auto-detected MUC component: %s", muc_component_host)
break
end
end
end
if muc_component_host then
process_host_module(muc_component_host, function(host_module, host)
module:log("info", "Hooking to MUC events on %s", host)
setup_room_hooks(host_module)
end)
else
module:log("error", "Could not find MUC component")
end
end

View File

@@ -190,5 +190,5 @@ Use the pytest-based conformance tests to validate any new implementation (inclu
```
TRANSCRIPT_URL=https://<your-deployment-base> \
TRANSCRIPT_MODAL_API_KEY=your-api-key \
uv run -m pytest -m model_api --no-cov server/tests/test_model_api_transcript.py
uv run -m pytest -m gpu_modal --no-cov server/tests/test_gpu_modal_transcript.py
```

View File

@@ -0,0 +1,493 @@
# Jitsi Integration Configuration Guide
This guide provides step-by-step instructions for configuring Reflector to work with a self-hosted Jitsi Meet installation for video meetings and recording.
## Prerequisites
Before configuring Jitsi integration, ensure you have:
- **Self-hosted Jitsi Meet installation** (version 2.0.8922 or later recommended)
- **Jibri recording service** configured and running
- **Prosody XMPP server** with mod_event_sync module installed
- **Docker or system deployment** of Reflector with access to environment variables
- **SSL certificates** for secure communication between services
## Environment Configuration
Add the following environment variables to your Reflector deployment:
### Required Settings
```bash
# Jitsi Meet domain (without https://)
JITSI_DOMAIN=meet.example.com
# JWT secret for room authentication (generate with: openssl rand -hex 32)
JITSI_JWT_SECRET=your-64-character-hex-secret-here
# Webhook secret for secure event handling (generate with: openssl rand -hex 16)
JITSI_WEBHOOK_SECRET=your-32-character-hex-secret-here
# Application identifier (should match Jitsi configuration)
JITSI_APP_ID=reflector
# JWT issuer and audience (should match Jitsi configuration)
JITSI_JWT_ISSUER=reflector
JITSI_JWT_AUDIENCE=jitsi
```
### Example .env Configuration
```bash
# Add to your server/.env file
JITSI_DOMAIN=meet.mycompany.com
JITSI_JWT_SECRET=$(openssl rand -hex 32)
JITSI_WEBHOOK_SECRET=$(openssl rand -hex 16)
JITSI_APP_ID=reflector
JITSI_JWT_ISSUER=reflector
JITSI_JWT_AUDIENCE=jitsi
```
## Jitsi Meet Server Configuration
### 1. JWT Authentication Setup
Edit `/etc/prosody/conf.d/[YOUR_DOMAIN].cfg.lua`:
```lua
VirtualHost "meet.example.com"
authentication = "token"
app_id = "reflector"
app_secret = "your-jwt-secret-here"
-- Allow anonymous access for non-authenticated users
c2s_require_encryption = false
admins = { "focusUser@auth.meet.example.com" }
modules_enabled = {
"bosh";
"pubsub";
"ping";
"roster";
"saslauth";
"tls";
"dialback";
"disco";
"carbons";
"pep";
"private";
"blocklist";
"vcard";
"version";
"uptime";
"time";
"ping";
"register";
"admin_adhoc";
"token_verification";
"event_sync"; -- Required for webhook events
}
```
### 2. Room Access Control
Edit `/etc/jitsi/meet/meet.example.com-config.js`:
```javascript
var config = {
hosts: {
domain: 'meet.example.com',
muc: 'conference.meet.example.com'
},
// Enable JWT authentication
enableUserRolesBasedOnToken: true,
// Recording configuration
fileRecordingsEnabled: true,
liveStreamingEnabled: false,
// Reflector-specific settings
prejoinPageEnabled: true,
requireDisplayName: true,
};
```
### 3. Interface Configuration
Edit `/usr/share/jitsi-meet/interface_config.js`:
```javascript
var interfaceConfig = {
// Customize for Reflector branding
APP_NAME: 'Reflector Meeting',
DEFAULT_WELCOME_PAGE_LOGO_URL: 'https://your-domain.com/logo.png',
// Hide unnecessary buttons
TOOLBAR_BUTTONS: [
'microphone', 'camera', 'closedcaptions', 'desktop',
'fullscreen', 'fodeviceselection', 'hangup',
'chat', 'recording', 'livestreaming', 'etherpad',
'sharedvideo', 'settings', 'raisehand', 'videoquality',
'filmstrip', 'invite', 'feedback', 'stats', 'shortcuts',
'tileview', 'videobackgroundblur', 'download', 'help',
'mute-everyone'
]
};
```
## Jibri Configuration
### 1. Recording Service Setup
Edit `/etc/jitsi/jibri/jibri.conf`:
```hocon
jibri {
recording {
recordings-directory = "/var/recordings"
finalize-script = "/opt/jitsi/jibri/finalize.sh"
}
api {
xmpp {
environments = [{
name = "prod environment"
xmpp-server-hosts = ["meet.example.com"]
xmpp-domain = "meet.example.com"
control-muc {
domain = "internal.auth.meet.example.com"
room-name = "JibriBrewery"
nickname = "jibri-nickname"
}
control-login {
domain = "auth.meet.example.com"
username = "jibri"
password = "jibri-password"
}
}]
}
}
}
```
### 2. Finalize Script Setup
Create `/opt/jitsi/jibri/finalize.sh`:
```bash
#!/bin/bash
# Jibri finalize script for Reflector integration
RECORDING_FILE="$1"
ROOM_NAME="$2"
REFLECTOR_API_URL="${REFLECTOR_API_URL:-http://localhost:1250}"
WEBHOOK_SECRET="${JITSI_WEBHOOK_SECRET}"
# Generate webhook signature
generate_signature() {
local payload="$1"
echo -n "$payload" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" | cut -d' ' -f2
}
# Prepare webhook payload
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%3NZ)
PAYLOAD=$(cat <<EOF
{
"room_name": "$ROOM_NAME",
"recording_file": "$RECORDING_FILE",
"recording_status": "completed",
"timestamp": "$TIMESTAMP"
}
EOF
)
# Generate signature
SIGNATURE=$(generate_signature "$PAYLOAD")
# Send webhook to Reflector
curl -X POST "$REFLECTOR_API_URL/v1/jibri/recording-complete" \
-H "Content-Type: application/json" \
-H "X-Jitsi-Signature: $SIGNATURE" \
-d "$PAYLOAD" \
--max-time 30
echo "Recording finalization webhook sent for room: $ROOM_NAME"
```
Make the script executable:
```bash
chmod +x /opt/jitsi/jibri/finalize.sh
```
## Prosody Event Configuration
### 1. Event-Sync Module Installation
Install the mod_event_sync module:
```bash
# Download the module
cd /usr/share/jitsi-meet/prosody-plugins/
wget https://raw.githubusercontent.com/jitsi-contrib/prosody-plugins/main/mod_event_sync.lua
# Or if using git
git clone https://github.com/jitsi-contrib/prosody-plugins.git
cp prosody-plugins/mod_event_sync.lua /usr/share/jitsi-meet/prosody-plugins/
```
### 2. Webhook Configuration
Add to `/etc/prosody/conf.d/[YOUR_DOMAIN].cfg.lua`:
```lua
Component "conference.meet.example.com" "muc"
storage = "memory"
modules_enabled = {
"muc_meeting_id";
"muc_domain_mapper";
"polls";
"event_sync"; -- Enable event sync
}
-- Event sync webhook configuration
event_sync_url = "https://your-reflector-domain.com/v1/jitsi/events"
event_sync_secret = "your-webhook-secret-here"
-- Events to track
event_sync_events = {
"muc-occupant-joined",
"muc-occupant-left",
"jibri-recording-on",
"jibri-recording-off"
}
```
### 3. Restart Services
After configuration changes, restart all services:
```bash
systemctl restart prosody
systemctl restart jicofo
systemctl restart jitsi-videobridge2
systemctl restart jibri
systemctl restart nginx
```
## Reflector Room Configuration
### 1. Create Jitsi Room
When creating rooms in Reflector, set the platform field:
```bash
curl -X POST "https://your-reflector-domain.com/v1/rooms" \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-jitsi-room",
"platform": "jitsi",
"recording_type": "cloud",
"recording_trigger": "automatic-2nd-participant",
"is_locked": false,
"room_mode": "normal"
}'
```
### 2. Meeting Creation
Meetings will automatically use Jitsi when the room platform is set to "jitsi":
```bash
curl -X POST "https://your-reflector-domain.com/v1/rooms/my-jitsi-room/meeting" \
-H "Authorization: Bearer $AUTH_TOKEN"
```
## Testing the Integration
### 1. Health Check
Verify Jitsi webhook configuration:
```bash
curl "https://your-reflector-domain.com/v1/jitsi/health"
```
Expected response:
```json
{
"status": "ok",
"service": "jitsi-webhooks",
"timestamp": "2025-01-15T10:30:00.000Z",
"webhook_secret_configured": true
}
```
### 2. Room Creation Test
1. Create a Jitsi room via Reflector API
2. Start a meeting - should generate Jitsi Meet URL with JWT token
3. Join with multiple participants - should trigger participant events
4. Start recording - should trigger Jibri recording workflow
### 3. Webhook Event Test
Monitor Reflector logs for incoming webhook events:
```bash
# Check for participant events
curl -X POST "https://your-reflector-domain.com/v1/jitsi/events" \
-H "Content-Type: application/json" \
-H "X-Jitsi-Signature: test-signature" \
-d '{
"event": "muc-occupant-joined",
"room": "test-room-name",
"timestamp": "2025-01-15T10:30:00.000Z",
"data": {}
}'
```
## Troubleshooting
### Common Issues
#### JWT Authentication Failures
**Symptoms:** Users can't join rooms, "Authentication failed" errors
**Solutions:**
1. Verify JWT secret matches between Jitsi and Reflector
2. Check JWT token expiration (default 8 hours)
3. Ensure system clocks are synchronized
4. Validate JWT issuer/audience configuration
```bash
# Debug JWT tokens
echo "JWT_TOKEN_HERE" | cut -d'.' -f2 | base64 -d | jq
```
#### Webhook Events Not Received
**Symptoms:** Participant counts not updating, recording events missing
**Solutions:**
1. Verify event_sync module is loaded in Prosody
2. Check webhook URL accessibility from Jitsi server
3. Validate webhook signature generation
4. Review Prosody and Reflector logs
```bash
# Test webhook connectivity
curl -v "https://your-reflector-domain.com/v1/jitsi/health"
# Check Prosody logs
tail -f /var/log/prosody/prosody.log
# Check Reflector logs
docker logs your-reflector-container
```
#### Recording Issues
**Symptoms:** Recordings not starting, finalize script errors
**Solutions:**
1. Verify Jibri service status and configuration
2. Check recording directory permissions
3. Validate finalize script execution permissions
4. Monitor Jibri logs for errors
```bash
# Check Jibri status
systemctl status jibri
# Test finalize script
sudo -u jibri /opt/jitsi/jibri/finalize.sh "/test/recording.mp4" "test-room"
# Check Jibri logs
journalctl -u jibri -f
```
### Debug Commands
```bash
# Verify Jitsi configuration
prosodyctl check config
# Test JWT generation
curl -X POST "https://your-reflector-domain.com/v1/rooms/test/meeting" \
-H "Authorization: Bearer $TOKEN" -v
# Monitor webhook events
tail -f /var/log/reflector/app.log | grep jitsi
# Check room participant counts
curl "https://your-reflector-domain.com/v1/rooms" \
-H "Authorization: Bearer $TOKEN" | jq '.data[].num_clients'
```
### Performance Optimization
#### For High-Concurrent Usage
1. **Jitsi Videobridge Tuning:**
```bash
# /etc/jitsi/videobridge/sip-communicator.properties
org.jitsi.videobridge.STATISTICS_INTERVAL=5000
org.jitsi.videobridge.load.INITIAL_STREAM_LIMIT=50
```
2. **Database Connection Pooling:**
```python
# In your Reflector settings
DATABASE_POOL_SIZE=20
DATABASE_MAX_OVERFLOW=30
```
3. **Redis Configuration:**
```bash
# For webhook event caching
REDIS_URL=redis://localhost:6379/1
WEBHOOK_EVENT_TTL=3600
```
## Security Considerations
### Network Security
- Use HTTPS/WSS for all communications
- Implement proper firewall rules
- Consider VPN for server-to-server communication
### Authentication Security
- Rotate JWT secrets regularly
- Use strong webhook secrets (32+ characters)
- Implement rate limiting on webhook endpoints
### Recording Security
- Encrypt recordings at rest
- Implement access controls for recording files
- Regular security audits of file permissions
## Support
For additional support:
1. **Reflector Issues:** Check GitHub issues or create new ones
2. **Jitsi Community:** [Community Forum](https://community.jitsi.org/)
3. **Documentation:** [Jitsi Developer Guide](https://jitsi.github.io/handbook/)
## Migration from Whereby
If migrating from Whereby integration:
1. Update existing rooms to use "jitsi" platform
2. Verify webhook configurations are updated
3. Test recording workflows thoroughly
4. Monitor participant event accuracy
5. Update any custom integrations using meeting APIs
The platform abstraction layer ensures smooth migration with minimal API changes.

View File

@@ -3,7 +3,7 @@ from logging.config import fileConfig
from alembic import context
from sqlalchemy import engine_from_config, pool
from reflector.db.base import metadata
from reflector.db import metadata
from reflector.settings import settings
# this is the Alembic Config object, which provides

View File

@@ -0,0 +1,38 @@
"""Add events column to meetings table
Revision ID: 2890b5104577
Revises: 6e6ea8e607c5
Create Date: 2025-09-02 17:51:41.620777
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "2890b5104577"
down_revision: Union[str, None] = "6e6ea8e607c5"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.add_column(
sa.Column(
"events", sa.JSON(), server_default=sa.text("'[]'"), nullable=False
)
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.drop_column("events")
# ### end Alembic commands ###

View File

@@ -28,7 +28,7 @@ def upgrade() -> None:
transcript = table("transcript", column("id", sa.String), column("topics", sa.JSON))
# Select all rows from the transcript table
results = bind.execute(select(transcript.c.id, transcript.c.topics))
results = bind.execute(select([transcript.c.id, transcript.c.topics]))
for row in results:
transcript_id = row["id"]
@@ -58,7 +58,7 @@ def downgrade() -> None:
transcript = table("transcript", column("id", sa.String), column("topics", sa.JSON))
# Select all rows from the transcript table
results = bind.execute(select(transcript.c.id, transcript.c.topics))
results = bind.execute(select([transcript.c.id, transcript.c.topics]))
for row in results:
transcript_id = row["id"]

View File

@@ -36,7 +36,9 @@ def upgrade() -> None:
# select only the one with duration = 0
results = bind.execute(
select(transcript.c.id, transcript.c.duration).where(transcript.c.duration == 0)
select([transcript.c.id, transcript.c.duration]).where(
transcript.c.duration == 0
)
)
data_dir = Path(settings.DATA_DIR)

View File

@@ -1,53 +0,0 @@
"""remove_one_active_meeting_per_room_constraint
Revision ID: 6025e9b2bef2
Revises: 2ae3db106d4e
Create Date: 2025-08-18 18:45:44.418392
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "6025e9b2bef2"
down_revision: Union[str, None] = "2ae3db106d4e"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# Remove the unique constraint that prevents multiple active meetings per room
# This is needed to support calendar integration with overlapping meetings
# Check if index exists before trying to drop it
from alembic import context
if context.get_context().dialect.name == "postgresql":
conn = op.get_bind()
result = conn.execute(
sa.text(
"SELECT 1 FROM pg_indexes WHERE indexname = 'idx_one_active_meeting_per_room'"
)
)
if result.fetchone():
op.drop_index("idx_one_active_meeting_per_room", table_name="meeting")
else:
# For SQLite, just try to drop it
try:
op.drop_index("idx_one_active_meeting_per_room", table_name="meeting")
except:
pass
def downgrade() -> None:
# Restore the unique constraint
op.create_index(
"idx_one_active_meeting_per_room",
"meeting",
["room_id"],
unique=True,
postgresql_where=sa.text("is_active = true"),
sqlite_where=sa.text("is_active = 1"),
)

View File

@@ -8,6 +8,7 @@ Create Date: 2025-09-10 10:47:06.006819
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
@@ -20,6 +21,7 @@ depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.alter_column("room_id", existing_type=sa.VARCHAR(), nullable=False)
batch_op.create_foreign_key(
None, "room", ["room_id"], ["id"], ondelete="CASCADE"
)
@@ -31,5 +33,6 @@ def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.drop_constraint("meeting_room_id_fkey", type_="foreignkey")
batch_op.alter_column("room_id", existing_type=sa.VARCHAR(), nullable=True)
# ### end Alembic commands ###

View File

@@ -0,0 +1,44 @@
"""Add VideoPlatform enum for rooms and meetings
Revision ID: 6e6ea8e607c5
Revises: 61882a919591
Create Date: 2025-09-02 17:33:21.022214
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "6e6ea8e607c5"
down_revision: Union[str, None] = "61882a919591"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.add_column(
sa.Column("platform", sa.String(), server_default="whereby", nullable=False)
)
with op.batch_alter_table("room", schema=None) as batch_op:
batch_op.add_column(
sa.Column("platform", sa.String(), server_default="whereby", nullable=False)
)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("room", schema=None) as batch_op:
batch_op.drop_column("platform")
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.drop_column("platform")
# ### end Alembic commands ###

View File

@@ -28,7 +28,7 @@ def upgrade() -> None:
transcript = table("transcript", column("id", sa.String), column("topics", sa.JSON))
# Select all rows from the transcript table
results = bind.execute(select(transcript.c.id, transcript.c.topics))
results = bind.execute(select([transcript.c.id, transcript.c.topics]))
for row in results:
transcript_id = row["id"]
@@ -58,7 +58,7 @@ def downgrade() -> None:
transcript = table("transcript", column("id", sa.String), column("topics", sa.JSON))
# Select all rows from the transcript table
results = bind.execute(select(transcript.c.id, transcript.c.topics))
results = bind.execute(select([transcript.c.id, transcript.c.topics]))
for row in results:
transcript_id = row["id"]

View File

@@ -1,34 +0,0 @@
"""add_grace_period_fields_to_meeting
Revision ID: d4a1c446458c
Revises: 6025e9b2bef2
Create Date: 2025-08-18 18:50:37.768052
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "d4a1c446458c"
down_revision: Union[str, None] = "6025e9b2bef2"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# Add fields to track when participants left for grace period logic
op.add_column(
"meeting", sa.Column("last_participant_left_at", sa.DateTime(timezone=True))
)
op.add_column(
"meeting",
sa.Column("grace_period_minutes", sa.Integer, server_default=sa.text("15")),
)
def downgrade() -> None:
op.drop_column("meeting", "grace_period_minutes")
op.drop_column("meeting", "last_participant_left_at")

View File

@@ -1,129 +0,0 @@
"""add calendar
Revision ID: d8e204bbf615
Revises: d4a1c446458c
Create Date: 2025-09-10 19:56:22.295756
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql
# revision identifiers, used by Alembic.
revision: str = "d8e204bbf615"
down_revision: Union[str, None] = "d4a1c446458c"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.create_table(
"calendar_event",
sa.Column("id", sa.String(), nullable=False),
sa.Column("room_id", sa.String(), nullable=False),
sa.Column("ics_uid", sa.Text(), nullable=False),
sa.Column("title", sa.Text(), nullable=True),
sa.Column("description", sa.Text(), nullable=True),
sa.Column("start_time", sa.DateTime(timezone=True), nullable=False),
sa.Column("end_time", sa.DateTime(timezone=True), nullable=False),
sa.Column("attendees", postgresql.JSONB(astext_type=sa.Text()), nullable=True),
sa.Column("location", sa.Text(), nullable=True),
sa.Column("ics_raw_data", sa.Text(), nullable=True),
sa.Column("last_synced", sa.DateTime(timezone=True), nullable=False),
sa.Column(
"is_deleted", sa.Boolean(), server_default=sa.text("false"), nullable=False
),
sa.Column("created_at", sa.DateTime(timezone=True), nullable=False),
sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False),
sa.ForeignKeyConstraint(
["room_id"],
["room.id"],
name="fk_calendar_event_room_id",
ondelete="CASCADE",
),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("room_id", "ics_uid", name="uq_room_calendar_event"),
)
with op.batch_alter_table("calendar_event", schema=None) as batch_op:
batch_op.create_index(
"idx_calendar_event_deleted",
["is_deleted"],
unique=False,
postgresql_where=sa.text("NOT is_deleted"),
)
batch_op.create_index(
"idx_calendar_event_room_start", ["room_id", "start_time"], unique=False
)
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.add_column(sa.Column("calendar_event_id", sa.String(), nullable=True))
batch_op.add_column(
sa.Column(
"calendar_metadata",
postgresql.JSONB(astext_type=sa.Text()),
nullable=True,
)
)
batch_op.create_index(
"idx_meeting_calendar_event", ["calendar_event_id"], unique=False
)
batch_op.create_foreign_key(
"fk_meeting_calendar_event_id",
"calendar_event",
["calendar_event_id"],
["id"],
ondelete="SET NULL",
)
with op.batch_alter_table("room", schema=None) as batch_op:
batch_op.add_column(sa.Column("ics_url", sa.Text(), nullable=True))
batch_op.add_column(
sa.Column(
"ics_fetch_interval", sa.Integer(), server_default="300", nullable=True
)
)
batch_op.add_column(
sa.Column(
"ics_enabled",
sa.Boolean(),
server_default=sa.text("false"),
nullable=False,
)
)
batch_op.add_column(
sa.Column("ics_last_sync", sa.DateTime(timezone=True), nullable=True)
)
batch_op.add_column(sa.Column("ics_last_etag", sa.Text(), nullable=True))
batch_op.create_index("idx_room_ics_enabled", ["ics_enabled"], unique=False)
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table("room", schema=None) as batch_op:
batch_op.drop_index("idx_room_ics_enabled")
batch_op.drop_column("ics_last_etag")
batch_op.drop_column("ics_last_sync")
batch_op.drop_column("ics_enabled")
batch_op.drop_column("ics_fetch_interval")
batch_op.drop_column("ics_url")
with op.batch_alter_table("meeting", schema=None) as batch_op:
batch_op.drop_constraint("fk_meeting_calendar_event_id", type_="foreignkey")
batch_op.drop_index("idx_meeting_calendar_event")
batch_op.drop_column("calendar_metadata")
batch_op.drop_column("calendar_event_id")
with op.batch_alter_table("calendar_event", schema=None) as batch_op:
batch_op.drop_index("idx_calendar_event_room_start")
batch_op.drop_index(
"idx_calendar_event_deleted", postgresql_where=sa.text("NOT is_deleted")
)
op.drop_table("calendar_event")
# ### end Alembic commands ###

View File

@@ -1,43 +0,0 @@
"""remove_grace_period_fields
Revision ID: dc035ff72fd5
Revises: d8e204bbf615
Create Date: 2025-09-11 10:36:45.197588
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = "dc035ff72fd5"
down_revision: Union[str, None] = "d8e204bbf615"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# Remove grace period columns from meeting table
op.drop_column("meeting", "last_participant_left_at")
op.drop_column("meeting", "grace_period_minutes")
def downgrade() -> None:
# Add back grace period columns to meeting table
op.add_column(
"meeting",
sa.Column(
"last_participant_left_at", sa.DateTime(timezone=True), nullable=True
),
)
op.add_column(
"meeting",
sa.Column(
"grace_period_minutes",
sa.Integer(),
server_default=sa.text("15"),
nullable=True,
),
)

View File

@@ -12,6 +12,7 @@ dependencies = [
"requests>=2.31.0",
"aiortc>=1.5.0",
"sortedcontainers>=2.4.0",
"loguru>=0.7.0",
"pydantic-settings>=2.0.2",
"structlog>=23.1.0",
"uvicorn[standard]>=0.23.1",
@@ -19,13 +20,14 @@ dependencies = [
"sentry-sdk[fastapi]>=1.29.2",
"httpx>=0.24.1",
"fastapi-pagination>=0.12.6",
"sqlalchemy>=2.0.0",
"asyncpg>=0.29.0",
"databases[aiosqlite, asyncpg]>=0.7.0",
"sqlalchemy<1.5",
"alembic>=1.11.3",
"nltk>=3.8.1",
"prometheus-fastapi-instrumentator>=6.1.0",
"sentencepiece>=0.1.99",
"protobuf>=4.24.3",
"profanityfilter>=2.0.6",
"celery>=5.3.4",
"redis>=5.0.1",
"python-jose[cryptography]>=3.3.0",
@@ -38,7 +40,7 @@ dependencies = [
"llama-index-llms-openai-like>=0.4.0",
"pytest-env>=1.1.5",
"webvtt-py>=0.5.0",
"icalendar>=6.0.0",
"PyJWT>=2.8.0",
]
[dependency-groups]
@@ -46,7 +48,6 @@ dev = [
"black>=24.1.1",
"stamina>=23.1.0",
"pyinstrument>=4.6.1",
"pytest-async-sqlalchemy>=0.2.0",
]
tests = [
"pytest-cov>=4.1.0",
@@ -112,17 +113,14 @@ source = ["reflector"]
[tool.pytest_env]
ENVIRONMENT = "pytest"
DATABASE_URL = "postgresql+asyncpg://test_user:test_password@localhost:15432/reflector_test"
DATABASE_URL = "postgresql://test_user:test_password@localhost:15432/reflector_test"
[tool.pytest.ini_options]
addopts = "-ra -q --disable-pytest-warnings --cov --cov-report html -v"
testpaths = ["tests"]
asyncio_mode = "auto"
asyncio_debug = true
asyncio_default_fixture_loop_scope = "session"
asyncio_default_test_loop_scope = "session"
markers = [
"model_api: tests for the unified model-serving HTTP API (backend- and hardware-agnostic)",
"gpu_modal: mark test to run only with GPU Modal endpoints (deselect with '-m \"not gpu_modal\"')",
]
[tool.ruff.lint]
@@ -134,7 +132,7 @@ select = [
[tool.ruff.lint.per-file-ignores]
"reflector/processors/summary/summary_builder.py" = ["E501"]
"gpu/modal_deployments/**.py" = ["PLC0415"]
"gpu/**.py" = ["PLC0415"]
"reflector/tools/**.py" = ["PLC0415"]
"migrations/versions/**.py" = ["PLC0415"]
"tests/**.py" = ["PLC0415"]

View File

@@ -12,6 +12,9 @@ from reflector.events import subscribers_shutdown, subscribers_startup
from reflector.logger import logger
from reflector.metrics import metrics_init
from reflector.settings import settings
from reflector.video_platforms.jitsi import router as jitsi_router
from reflector.video_platforms.whereby import router as whereby_router
from reflector.views.jibri_webhook import router as jibri_webhook_router
from reflector.views.meetings import router as meetings_router
from reflector.views.rooms import router as rooms_router
from reflector.views.rtc_offer import router as rtc_offer_router
@@ -26,7 +29,6 @@ from reflector.views.transcripts_upload import router as transcripts_upload_rout
from reflector.views.transcripts_webrtc import router as transcripts_webrtc_router
from reflector.views.transcripts_websocket import router as transcripts_websocket_router
from reflector.views.user import router as user_router
from reflector.views.whereby import router as whereby_router
from reflector.views.zulip import router as zulip_router
try:
@@ -86,6 +88,8 @@ app.include_router(transcripts_process_router, prefix="/v1")
app.include_router(user_router, prefix="/v1")
app.include_router(zulip_router, prefix="/v1")
app.include_router(whereby_router, prefix="/v1")
app.include_router(jitsi_router, prefix="/v1")
app.include_router(jibri_webhook_router) # No /v1 prefix, uses /api/v1/jibri
add_pagination(app)
# prepare celery

View File

@@ -1,14 +1,21 @@
import asyncio
import functools
from reflector.db import get_database
def asynctask(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
async def run_async():
return await f(*args, **kwargs)
async def run_with_db():
database = get_database()
await database.connect()
try:
return await f(*args, **kwargs)
finally:
await database.disconnect()
coro = run_async()
coro = run_with_db()
try:
loop = asyncio.get_running_loop()
except RuntimeError:

View File

@@ -67,8 +67,7 @@ def current_user(
try:
payload = jwtauth.verify_token(token)
sub = payload["sub"]
email = payload["email"]
return UserInfo(sub=sub, email=email)
return UserInfo(sub=sub)
except JWTError as e:
logger.error(f"JWT error: {e}")
raise HTTPException(status_code=401, detail="Invalid authentication")

View File

@@ -1,69 +1,47 @@
from typing import AsyncGenerator
import contextvars
from typing import Optional
from sqlalchemy.ext.asyncio import (
AsyncEngine,
AsyncSession,
async_sessionmaker,
create_async_engine,
)
import databases
import sqlalchemy
from reflector.db.base import Base as Base
from reflector.db.base import metadata as metadata
from reflector.events import subscribers_shutdown, subscribers_startup
from reflector.settings import settings
_engine: AsyncEngine | None = None
_session_factory: async_sessionmaker[AsyncSession] | None = None
metadata = sqlalchemy.MetaData()
_database_context: contextvars.ContextVar[Optional[databases.Database]] = (
contextvars.ContextVar("database", default=None)
)
def get_engine() -> AsyncEngine:
global _engine
if _engine is None:
_engine = create_async_engine(
settings.DATABASE_URL,
echo=False,
pool_pre_ping=True,
)
return _engine
def get_database() -> databases.Database:
"""Get database instance for current asyncio context"""
db = _database_context.get()
if db is None:
db = databases.Database(settings.DATABASE_URL)
_database_context.set(db)
return db
def get_session_factory() -> async_sessionmaker[AsyncSession]:
global _session_factory
if _session_factory is None:
_session_factory = async_sessionmaker(
get_engine(),
class_=AsyncSession,
expire_on_commit=False,
)
return _session_factory
async def _get_session() -> AsyncGenerator[AsyncSession, None]:
# necessary implementation to ease mocking on pytest
async with get_session_factory()() as session:
yield session
async def get_session() -> AsyncGenerator[AsyncSession, None]:
async for session in _get_session():
yield session
import reflector.db.calendar_events # noqa
# import models
import reflector.db.meetings # noqa
import reflector.db.recordings # noqa
import reflector.db.rooms # noqa
import reflector.db.transcripts # noqa
kwargs = {}
if "postgres" not in settings.DATABASE_URL:
raise Exception("Only postgres database is supported in reflector")
engine = sqlalchemy.create_engine(settings.DATABASE_URL, **kwargs)
@subscribers_startup.append
async def database_connect(_):
get_engine()
database = get_database()
await database.connect()
@subscribers_shutdown.append
async def database_disconnect(_):
global _engine
if _engine:
await _engine.dispose()
_engine = None
database = get_database()
await database.disconnect()

View File

@@ -1,237 +0,0 @@
from datetime import datetime
from typing import Optional
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB, TSVECTOR
from sqlalchemy.ext.asyncio import AsyncAttrs
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(AsyncAttrs, DeclarativeBase):
pass
class TranscriptModel(Base):
__tablename__ = "transcript"
id: Mapped[str] = mapped_column(sa.String, primary_key=True)
name: Mapped[Optional[str]] = mapped_column(sa.String)
status: Mapped[Optional[str]] = mapped_column(sa.String)
locked: Mapped[Optional[bool]] = mapped_column(sa.Boolean)
duration: Mapped[Optional[float]] = mapped_column(sa.Float)
created_at: Mapped[Optional[datetime]] = mapped_column(sa.DateTime(timezone=True))
title: Mapped[Optional[str]] = mapped_column(sa.String)
short_summary: Mapped[Optional[str]] = mapped_column(sa.String)
long_summary: Mapped[Optional[str]] = mapped_column(sa.String)
topics: Mapped[Optional[list]] = mapped_column(sa.JSON)
events: Mapped[Optional[list]] = mapped_column(sa.JSON)
participants: Mapped[Optional[list]] = mapped_column(sa.JSON)
source_language: Mapped[Optional[str]] = mapped_column(sa.String)
target_language: Mapped[Optional[str]] = mapped_column(sa.String)
reviewed: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("false")
)
audio_location: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="local"
)
user_id: Mapped[Optional[str]] = mapped_column(sa.String)
share_mode: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="private"
)
meeting_id: Mapped[Optional[str]] = mapped_column(sa.String)
recording_id: Mapped[Optional[str]] = mapped_column(sa.String)
zulip_message_id: Mapped[Optional[int]] = mapped_column(sa.Integer)
source_kind: Mapped[str] = mapped_column(
sa.String, nullable=False
) # Enum will be handled separately
audio_deleted: Mapped[Optional[bool]] = mapped_column(sa.Boolean)
room_id: Mapped[Optional[str]] = mapped_column(sa.String)
webvtt: Mapped[Optional[str]] = mapped_column(sa.Text)
__table_args__ = (
sa.Index("idx_transcript_recording_id", "recording_id"),
sa.Index("idx_transcript_user_id", "user_id"),
sa.Index("idx_transcript_created_at", "created_at"),
sa.Index("idx_transcript_user_id_recording_id", "user_id", "recording_id"),
sa.Index("idx_transcript_room_id", "room_id"),
sa.Index("idx_transcript_source_kind", "source_kind"),
sa.Index("idx_transcript_room_id_created_at", "room_id", "created_at"),
)
TranscriptModel.search_vector_en = sa.Column(
"search_vector_en",
TSVECTOR,
sa.Computed(
"setweight(to_tsvector('english', coalesce(title, '')), 'A') || "
"setweight(to_tsvector('english', coalesce(long_summary, '')), 'B') || "
"setweight(to_tsvector('english', coalesce(webvtt, '')), 'C')",
persisted=True,
),
)
class RoomModel(Base):
__tablename__ = "room"
id: Mapped[str] = mapped_column(sa.String, primary_key=True)
name: Mapped[str] = mapped_column(sa.String, nullable=False, unique=True)
user_id: Mapped[str] = mapped_column(sa.String, nullable=False)
created_at: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
zulip_auto_post: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("false")
)
zulip_stream: Mapped[Optional[str]] = mapped_column(sa.String)
zulip_topic: Mapped[Optional[str]] = mapped_column(sa.String)
is_locked: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("false")
)
room_mode: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="normal"
)
recording_type: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="cloud"
)
recording_trigger: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="automatic-2nd-participant"
)
is_shared: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("false")
)
webhook_url: Mapped[Optional[str]] = mapped_column(sa.String)
webhook_secret: Mapped[Optional[str]] = mapped_column(sa.String)
ics_url: Mapped[Optional[str]] = mapped_column(sa.Text)
ics_fetch_interval: Mapped[Optional[int]] = mapped_column(
sa.Integer, server_default=sa.text("300")
)
ics_enabled: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("false")
)
ics_last_sync: Mapped[Optional[datetime]] = mapped_column(
sa.DateTime(timezone=True)
)
ics_last_etag: Mapped[Optional[str]] = mapped_column(sa.Text)
__table_args__ = (
sa.Index("idx_room_is_shared", "is_shared"),
sa.Index("idx_room_ics_enabled", "ics_enabled"),
)
class MeetingModel(Base):
__tablename__ = "meeting"
id: Mapped[str] = mapped_column(sa.String, primary_key=True)
room_name: Mapped[Optional[str]] = mapped_column(sa.String)
room_url: Mapped[Optional[str]] = mapped_column(sa.String)
host_room_url: Mapped[Optional[str]] = mapped_column(sa.String)
start_date: Mapped[Optional[datetime]] = mapped_column(sa.DateTime(timezone=True))
end_date: Mapped[Optional[datetime]] = mapped_column(sa.DateTime(timezone=True))
room_id: Mapped[Optional[str]] = mapped_column(
sa.String, sa.ForeignKey("room.id", ondelete="CASCADE")
)
is_locked: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("false")
)
room_mode: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="normal"
)
recording_type: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="cloud"
)
recording_trigger: Mapped[str] = mapped_column(
sa.String, nullable=False, server_default="automatic-2nd-participant"
)
num_clients: Mapped[int] = mapped_column(
sa.Integer, nullable=False, server_default=sa.text("0")
)
is_active: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("true")
)
calendar_event_id: Mapped[Optional[str]] = mapped_column(
sa.String,
sa.ForeignKey(
"calendar_event.id",
ondelete="SET NULL",
name="fk_meeting_calendar_event_id",
),
)
calendar_metadata: Mapped[Optional[dict]] = mapped_column(JSONB)
__table_args__ = (
sa.Index("idx_meeting_room_id", "room_id"),
sa.Index("idx_meeting_calendar_event", "calendar_event_id"),
)
class MeetingConsentModel(Base):
__tablename__ = "meeting_consent"
id: Mapped[str] = mapped_column(sa.String, primary_key=True)
meeting_id: Mapped[str] = mapped_column(
sa.String, sa.ForeignKey("meeting.id", ondelete="CASCADE"), nullable=False
)
user_id: Mapped[Optional[str]] = mapped_column(sa.String)
consent_given: Mapped[bool] = mapped_column(sa.Boolean, nullable=False)
consent_timestamp: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
class RecordingModel(Base):
__tablename__ = "recording"
id: Mapped[str] = mapped_column(sa.String, primary_key=True)
meeting_id: Mapped[str] = mapped_column(
sa.String, sa.ForeignKey("meeting.id", ondelete="CASCADE"), nullable=False
)
url: Mapped[str] = mapped_column(sa.String, nullable=False)
object_key: Mapped[str] = mapped_column(sa.String, nullable=False)
duration: Mapped[Optional[float]] = mapped_column(sa.Float)
created_at: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
__table_args__ = (sa.Index("idx_recording_meeting_id", "meeting_id"),)
class CalendarEventModel(Base):
__tablename__ = "calendar_event"
id: Mapped[str] = mapped_column(sa.String, primary_key=True)
room_id: Mapped[str] = mapped_column(
sa.String, sa.ForeignKey("room.id", ondelete="CASCADE"), nullable=False
)
ics_uid: Mapped[str] = mapped_column(sa.Text, nullable=False)
title: Mapped[Optional[str]] = mapped_column(sa.Text)
description: Mapped[Optional[str]] = mapped_column(sa.Text)
start_time: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
end_time: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
attendees: Mapped[Optional[dict]] = mapped_column(JSONB)
location: Mapped[Optional[str]] = mapped_column(sa.Text)
ics_raw_data: Mapped[Optional[str]] = mapped_column(sa.Text)
last_synced: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
is_deleted: Mapped[bool] = mapped_column(
sa.Boolean, nullable=False, server_default=sa.text("false")
)
created_at: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
sa.DateTime(timezone=True), nullable=False
)
__table_args__ = (
sa.Index("idx_calendar_event_room_start", "room_id", "start_time"),
)
metadata = Base.metadata

View File

@@ -1,187 +0,0 @@
from datetime import datetime, timedelta, timezone
from typing import Any
import sqlalchemy as sa
from pydantic import BaseModel, ConfigDict, Field
from sqlalchemy import delete, select, update
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.db.base import CalendarEventModel
from reflector.utils import generate_uuid4
class CalendarEvent(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: str = Field(default_factory=generate_uuid4)
room_id: str
ics_uid: str
title: str | None = None
description: str | None = None
start_time: datetime
end_time: datetime
attendees: list[dict[str, Any]] | None = None
location: str | None = None
ics_raw_data: str | None = None
last_synced: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
is_deleted: bool = False
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
class CalendarEventController:
async def get_upcoming_events(
self,
session: AsyncSession,
room_id: str,
current_time: datetime,
buffer_minutes: int = 15,
) -> list[CalendarEvent]:
buffer_time = current_time + timedelta(minutes=buffer_minutes)
query = (
select(CalendarEventModel)
.where(
sa.and_(
CalendarEventModel.room_id == room_id,
CalendarEventModel.start_time <= buffer_time,
CalendarEventModel.end_time > current_time,
)
)
.order_by(CalendarEventModel.start_time)
)
result = await session.execute(query)
return [CalendarEvent.model_validate(row) for row in result.scalars().all()]
async def get_by_id(
self, session: AsyncSession, event_id: str
) -> CalendarEvent | None:
query = select(CalendarEventModel).where(CalendarEventModel.id == event_id)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
return None
return CalendarEvent.model_validate(row)
async def get_by_ics_uid(
self, session: AsyncSession, room_id: str, ics_uid: str
) -> CalendarEvent | None:
query = select(CalendarEventModel).where(
sa.and_(
CalendarEventModel.room_id == room_id,
CalendarEventModel.ics_uid == ics_uid,
)
)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
return None
return CalendarEvent.model_validate(row)
async def upsert(
self, session: AsyncSession, event: CalendarEvent
) -> CalendarEvent:
existing = await self.get_by_ics_uid(session, event.room_id, event.ics_uid)
if existing:
event.updated_at = datetime.now(timezone.utc)
query = (
update(CalendarEventModel)
.where(CalendarEventModel.id == existing.id)
.values(**event.model_dump(exclude={"id"}))
)
await session.execute(query)
await session.commit()
return event
else:
new_event = CalendarEventModel(**event.model_dump())
session.add(new_event)
await session.commit()
return event
async def delete_old_events(
self, session: AsyncSession, room_id: str, cutoff_date: datetime
) -> int:
query = delete(CalendarEventModel).where(
sa.and_(
CalendarEventModel.room_id == room_id,
CalendarEventModel.end_time < cutoff_date,
)
)
result = await session.execute(query)
await session.commit()
return result.rowcount
async def delete_events_not_in_list(
self, session: AsyncSession, room_id: str, keep_ics_uids: list[str]
) -> int:
if not keep_ics_uids:
query = delete(CalendarEventModel).where(
CalendarEventModel.room_id == room_id
)
else:
query = delete(CalendarEventModel).where(
sa.and_(
CalendarEventModel.room_id == room_id,
CalendarEventModel.ics_uid.notin_(keep_ics_uids),
)
)
result = await session.execute(query)
await session.commit()
return result.rowcount
async def get_by_room(
self, session: AsyncSession, room_id: str, include_deleted: bool = True
) -> list[CalendarEvent]:
query = select(CalendarEventModel).where(CalendarEventModel.room_id == room_id)
if not include_deleted:
query = query.where(CalendarEventModel.is_deleted == False)
result = await session.execute(query)
return [CalendarEvent.model_validate(row) for row in result.scalars().all()]
async def get_upcoming(
self, session: AsyncSession, room_id: str, minutes_ahead: int = 120
) -> list[CalendarEvent]:
now = datetime.now(timezone.utc)
buffer_time = now + timedelta(minutes=minutes_ahead)
query = (
select(CalendarEventModel)
.where(
sa.and_(
CalendarEventModel.room_id == room_id,
CalendarEventModel.start_time <= buffer_time,
CalendarEventModel.end_time > now,
CalendarEventModel.is_deleted == False,
)
)
.order_by(CalendarEventModel.start_time)
)
result = await session.execute(query)
return [CalendarEvent.model_validate(row) for row in result.scalars().all()]
async def soft_delete_missing(
self, session: AsyncSession, room_id: str, current_ics_uids: list[str]
) -> int:
query = (
update(CalendarEventModel)
.where(
sa.and_(
CalendarEventModel.room_id == room_id,
CalendarEventModel.ics_uid.notin_(current_ics_uids)
if current_ics_uids
else True,
CalendarEventModel.end_time > datetime.now(timezone.utc),
)
)
.values(is_deleted=True)
)
result = await session.execute(query)
await session.commit()
return result.rowcount
calendar_events_controller = CalendarEventController()

View File

@@ -1,19 +1,77 @@
from datetime import datetime
from typing import Any, Literal
from datetime import datetime, timezone
from typing import Any, Dict, List, Literal
import sqlalchemy as sa
from pydantic import BaseModel, ConfigDict, Field
from sqlalchemy import select, update
from sqlalchemy.ext.asyncio import AsyncSession
from pydantic import BaseModel, Field
from reflector.db.base import MeetingConsentModel, MeetingModel
from reflector.db.rooms import Room
from reflector.db import get_database, metadata
from reflector.db.rooms import Room, VideoPlatform
from reflector.utils import generate_uuid4
meetings = sa.Table(
"meeting",
metadata,
sa.Column("id", sa.String, primary_key=True),
sa.Column("room_name", sa.String),
sa.Column("room_url", sa.String),
sa.Column("host_room_url", sa.String),
sa.Column("start_date", sa.DateTime(timezone=True)),
sa.Column("end_date", sa.DateTime(timezone=True)),
sa.Column(
"room_id",
sa.String,
sa.ForeignKey("room.id", ondelete="CASCADE"),
nullable=True,
),
sa.Column("is_locked", sa.Boolean, nullable=False, server_default=sa.false()),
sa.Column("room_mode", sa.String, nullable=False, server_default="normal"),
sa.Column("recording_type", sa.String, nullable=False, server_default="cloud"),
sa.Column(
"recording_trigger",
sa.String,
nullable=False,
server_default="automatic-2nd-participant",
),
sa.Column(
"num_clients",
sa.Integer,
nullable=False,
server_default=sa.text("0"),
),
sa.Column(
"is_active",
sa.Boolean,
nullable=False,
server_default=sa.true(),
),
sa.Column("platform", sa.String, nullable=False, server_default="whereby"),
sa.Column("events", sa.JSON, nullable=False, server_default=sa.text("'[]'")),
sa.Index("idx_meeting_room_id", "room_id"),
sa.Index(
"idx_one_active_meeting_per_room",
"room_id",
unique=True,
postgresql_where=sa.text("is_active = true"),
),
)
meeting_consent = sa.Table(
"meeting_consent",
metadata,
sa.Column("id", sa.String, primary_key=True),
sa.Column(
"meeting_id",
sa.String,
sa.ForeignKey("meeting.id", ondelete="CASCADE"),
nullable=False,
),
sa.Column("user_id", sa.String),
sa.Column("consent_given", sa.Boolean, nullable=False),
sa.Column("consent_timestamp", sa.DateTime(timezone=True), nullable=False),
)
class MeetingConsent(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: str = Field(default_factory=generate_uuid4)
meeting_id: str
user_id: str | None = None
@@ -22,8 +80,6 @@ class MeetingConsent(BaseModel):
class Meeting(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: str
room_name: str
room_url: str
@@ -38,15 +94,13 @@ class Meeting(BaseModel):
"none", "prompt", "automatic", "automatic-2nd-participant"
] = "automatic-2nd-participant"
num_clients: int = 0
is_active: bool = True
calendar_event_id: str | None = None
calendar_metadata: dict[str, Any] | None = None
platform: VideoPlatform = VideoPlatform.WHEREBY
events: List[Dict[str, Any]] = Field(default_factory=list)
class MeetingController:
async def create(
self,
session: AsyncSession,
id: str,
room_name: str,
room_url: str,
@@ -54,8 +108,6 @@ class MeetingController:
start_date: datetime,
end_date: datetime,
room: Room,
calendar_event_id: str | None = None,
calendar_metadata: dict[str, Any] | None = None,
):
meeting = Meeting(
id=id,
@@ -69,201 +121,176 @@ class MeetingController:
room_mode=room.room_mode,
recording_type=room.recording_type,
recording_trigger=room.recording_trigger,
calendar_event_id=calendar_event_id,
calendar_metadata=calendar_metadata,
platform=room.platform,
)
new_meeting = MeetingModel(**meeting.model_dump())
session.add(new_meeting)
await session.commit()
query = meetings.insert().values(**meeting.model_dump())
await get_database().execute(query)
return meeting
async def get_all_active(self, session: AsyncSession) -> list[Meeting]:
query = select(MeetingModel).where(MeetingModel.is_active)
result = await session.execute(query)
return [Meeting.model_validate(row) for row in result.scalars().all()]
async def get_all_active(self) -> list[Meeting]:
query = meetings.select().where(meetings.c.is_active)
return await get_database().fetch_all(query)
async def get_by_room_name(
self,
session: AsyncSession,
room_name: str,
) -> Meeting | None:
"""
Get a meeting by room name.
For backward compatibility, returns the most recent meeting.
"""
query = (
select(MeetingModel)
.where(MeetingModel.room_name == room_name)
.order_by(MeetingModel.end_date.desc())
)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
query = meetings.select().where(meetings.c.room_name == room_name)
result = await get_database().fetch_one(query)
if not result:
return None
return Meeting.model_validate(row)
async def get_active(
self, session: AsyncSession, room: Room, current_time: datetime
) -> Meeting | None:
"""
Get latest active meeting for a room.
For backward compatibility, returns the most recent active meeting.
"""
return Meeting(**result)
async def get_active(self, room: Room, current_time: datetime) -> Meeting | None:
end_date = getattr(meetings.c, "end_date")
query = (
select(MeetingModel)
meetings.select()
.where(
sa.and_(
MeetingModel.room_id == room.id,
MeetingModel.end_date > current_time,
MeetingModel.is_active,
meetings.c.room_id == room.id,
meetings.c.end_date > current_time,
meetings.c.is_active,
)
)
.order_by(MeetingModel.end_date.desc())
.order_by(end_date.desc())
)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
result = await get_database().fetch_one(query)
if not result:
return None
return Meeting.model_validate(row)
async def get_all_active_for_room(
self, session: AsyncSession, room: Room, current_time: datetime
) -> list[Meeting]:
query = (
select(MeetingModel)
.where(
sa.and_(
MeetingModel.room_id == room.id,
MeetingModel.end_date > current_time,
MeetingModel.is_active,
)
)
.order_by(MeetingModel.end_date.desc())
)
result = await session.execute(query)
return [Meeting.model_validate(row) for row in result.scalars().all()]
return Meeting(**result)
async def get_active_by_calendar_event(
self,
session: AsyncSession,
room: Room,
calendar_event_id: str,
current_time: datetime,
) -> Meeting | None:
"""
Get active meeting for a specific calendar event.
"""
query = select(MeetingModel).where(
sa.and_(
MeetingModel.room_id == room.id,
MeetingModel.calendar_event_id == calendar_event_id,
MeetingModel.end_date > current_time,
MeetingModel.is_active,
)
)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
async def get_by_id(self, meeting_id: str, **kwargs) -> Meeting | None:
query = meetings.select().where(meetings.c.id == meeting_id)
result = await get_database().fetch_one(query)
if not result:
return None
return Meeting.model_validate(row)
return Meeting(**result)
async def get_by_id(
self, session: AsyncSession, meeting_id: str, **kwargs
) -> Meeting | None:
query = select(MeetingModel).where(MeetingModel.id == meeting_id)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
return None
return Meeting.model_validate(row)
async def update_meeting(self, meeting_id: str, **kwargs):
query = meetings.update().where(meetings.c.id == meeting_id).values(**kwargs)
await get_database().execute(query)
async def get_by_calendar_event(
self, session: AsyncSession, calendar_event_id: str
) -> Meeting | None:
query = select(MeetingModel).where(
MeetingModel.calendar_event_id == calendar_event_id
async def add_event(
self, meeting_id: str, event_type: str, event_data: Dict[str, Any] = None
):
"""Add an event to a meeting's events list."""
if event_data is None:
event_data = {}
event = {
"type": event_type,
"timestamp": datetime.now(tz=timezone.utc).isoformat(),
"data": event_data,
}
# Get current events
query = meetings.select().where(meetings.c.id == meeting_id)
result = await get_database().fetch_one(query)
if not result:
return
current_events = result["events"] or []
current_events.append(event)
# Update with new events list
update_query = (
meetings.update()
.where(meetings.c.id == meeting_id)
.values(events=current_events)
)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
return None
return Meeting.model_validate(row)
await get_database().execute(update_query)
async def update_meeting(self, session: AsyncSession, meeting_id: str, **kwargs):
query = (
update(MeetingModel).where(MeetingModel.id == meeting_id).values(**kwargs)
)
await session.execute(query)
await session.commit()
async def participant_joined(
self, meeting_id: str, participant_data: Dict[str, Any] = None
):
"""Record a participant joined event."""
await self.add_event(meeting_id, "participant_joined", participant_data)
async def participant_left(
self, meeting_id: str, participant_data: Dict[str, Any] = None
):
"""Record a participant left event."""
await self.add_event(meeting_id, "participant_left", participant_data)
async def recording_started(
self, meeting_id: str, recording_data: Dict[str, Any] = None
):
"""Record a recording started event."""
await self.add_event(meeting_id, "recording_started", recording_data)
async def recording_stopped(
self, meeting_id: str, recording_data: Dict[str, Any] = None
):
"""Record a recording stopped event."""
await self.add_event(meeting_id, "recording_stopped", recording_data)
async def get_events(self, meeting_id: str) -> List[Dict[str, Any]]:
"""Get all events for a meeting."""
query = meetings.select().where(meetings.c.id == meeting_id)
result = await get_database().fetch_one(query)
if not result:
return []
return result["events"] or []
class MeetingConsentController:
async def get_by_meeting_id(
self, session: AsyncSession, meeting_id: str
) -> list[MeetingConsent]:
query = select(MeetingConsentModel).where(
MeetingConsentModel.meeting_id == meeting_id
async def get_by_meeting_id(self, meeting_id: str) -> list[MeetingConsent]:
query = meeting_consent.select().where(
meeting_consent.c.meeting_id == meeting_id
)
result = await session.execute(query)
return [MeetingConsent.model_validate(row) for row in result.scalars().all()]
results = await get_database().fetch_all(query)
return [MeetingConsent(**result) for result in results]
async def get_by_meeting_and_user(
self, session: AsyncSession, meeting_id: str, user_id: str
self, meeting_id: str, user_id: str
) -> MeetingConsent | None:
"""Get existing consent for a specific user and meeting"""
query = select(MeetingConsentModel).where(
sa.and_(
MeetingConsentModel.meeting_id == meeting_id,
MeetingConsentModel.user_id == user_id,
)
query = meeting_consent.select().where(
meeting_consent.c.meeting_id == meeting_id,
meeting_consent.c.user_id == user_id,
)
result = await session.execute(query)
row = result.scalar_one_or_none()
if row is None:
result = await get_database().fetch_one(query)
if result is None:
return None
return MeetingConsent.model_validate(row)
return MeetingConsent(**result)
async def upsert(
self, session: AsyncSession, consent: MeetingConsent
) -> MeetingConsent:
async def upsert(self, consent: MeetingConsent) -> MeetingConsent:
"""Create new consent or update existing one for authenticated users"""
if consent.user_id:
# For authenticated users, check if consent already exists
# not transactional but we're ok with that; the consents ain't deleted anyways
existing = await self.get_by_meeting_and_user(
session, consent.meeting_id, consent.user_id
consent.meeting_id, consent.user_id
)
if existing:
query = (
update(MeetingConsentModel)
.where(MeetingConsentModel.id == existing.id)
meeting_consent.update()
.where(meeting_consent.c.id == existing.id)
.values(
consent_given=consent.consent_given,
consent_timestamp=consent.consent_timestamp,
)
)
await session.execute(query)
await session.commit()
await get_database().execute(query)
existing.consent_given = consent.consent_given
existing.consent_timestamp = consent.consent_timestamp
return existing
existing.consent_given = consent.consent_given
existing.consent_timestamp = consent.consent_timestamp
return existing
new_consent = MeetingConsentModel(**consent.model_dump())
session.add(new_consent)
await session.commit()
query = meeting_consent.insert().values(**consent.model_dump())
await get_database().execute(query)
return consent
async def has_any_denial(self, session: AsyncSession, meeting_id: str) -> bool:
async def has_any_denial(self, meeting_id: str) -> bool:
"""Check if any participant denied consent for this meeting"""
query = select(MeetingConsentModel).where(
sa.and_(
MeetingConsentModel.meeting_id == meeting_id,
MeetingConsentModel.consent_given.is_(False),
)
query = meeting_consent.select().where(
meeting_consent.c.meeting_id == meeting_id,
meeting_consent.c.consent_given.is_(False),
)
result = await session.execute(query)
row = result.scalar_one_or_none()
return row is not None
result = await get_database().fetch_one(query)
return result is not None
meetings_controller = MeetingController()

View File

@@ -1,79 +1,61 @@
from datetime import datetime, timezone
from datetime import datetime
from typing import Literal
from pydantic import BaseModel, ConfigDict, Field
from sqlalchemy import delete, select
from sqlalchemy.ext.asyncio import AsyncSession
import sqlalchemy as sa
from pydantic import BaseModel, Field
from reflector.db.base import RecordingModel
from reflector.db import get_database, metadata
from reflector.utils import generate_uuid4
recordings = sa.Table(
"recording",
metadata,
sa.Column("id", sa.String, primary_key=True),
sa.Column("bucket_name", sa.String, nullable=False),
sa.Column("object_key", sa.String, nullable=False),
sa.Column("recorded_at", sa.DateTime(timezone=True), nullable=False),
sa.Column(
"status",
sa.String,
nullable=False,
server_default="pending",
),
sa.Column("meeting_id", sa.String),
sa.Index("idx_recording_meeting_id", "meeting_id"),
)
class Recording(BaseModel):
model_config = ConfigDict(from_attributes=True)
id: str = Field(default_factory=generate_uuid4)
meeting_id: str
url: str
bucket_name: str
object_key: str
duration: float | None = None
created_at: datetime
recorded_at: datetime
status: Literal["pending", "processing", "completed", "failed"] = "pending"
meeting_id: str | None = None
class RecordingController:
async def create(
self,
session: AsyncSession,
meeting_id: str,
url: str,
object_key: str,
duration: float | None = None,
created_at: datetime | None = None,
):
if created_at is None:
created_at = datetime.now(timezone.utc)
recording = Recording(
meeting_id=meeting_id,
url=url,
object_key=object_key,
duration=duration,
created_at=created_at,
)
new_recording = RecordingModel(**recording.model_dump())
session.add(new_recording)
await session.commit()
async def create(self, recording: Recording):
query = recordings.insert().values(**recording.model_dump())
await get_database().execute(query)
return recording
async def get_by_id(
self, session: AsyncSession, recording_id: str
) -> Recording | None:
"""
Get a recording by id
"""
query = select(RecordingModel).where(RecordingModel.id == recording_id)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
return None
return Recording.model_validate(row)
async def get_by_id(self, id: str) -> Recording:
query = recordings.select().where(recordings.c.id == id)
result = await get_database().fetch_one(query)
return Recording(**result) if result else None
async def get_by_meeting_id(
self, session: AsyncSession, meeting_id: str
) -> list[Recording]:
"""
Get all recordings for a meeting
"""
query = select(RecordingModel).where(RecordingModel.meeting_id == meeting_id)
result = await session.execute(query)
return [Recording.model_validate(row) for row in result.scalars().all()]
async def get_by_object_key(self, bucket_name: str, object_key: str) -> Recording:
query = recordings.select().where(
recordings.c.bucket_name == bucket_name,
recordings.c.object_key == object_key,
)
result = await get_database().fetch_one(query)
return Recording(**result) if result else None
async def remove_by_id(self, session: AsyncSession, recording_id: str) -> None:
"""
Remove a recording by id
"""
query = delete(RecordingModel).where(RecordingModel.id == recording_id)
await session.execute(query)
await session.commit()
async def remove_by_id(self, id: str) -> None:
query = recordings.delete().where(recordings.c.id == id)
await get_database().execute(query)
recordings_controller = RecordingController()

View File

@@ -1,21 +1,63 @@
import secrets
from datetime import datetime, timezone
from enum import StrEnum
from sqlite3 import IntegrityError
from typing import Literal
import sqlalchemy
from fastapi import HTTPException
from pydantic import BaseModel, ConfigDict, Field
from sqlalchemy import delete, select, update
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.sql import or_
from pydantic import BaseModel, Field
from sqlalchemy.sql import false, or_
from reflector.db.base import RoomModel
from reflector.db import get_database, metadata
from reflector.utils import generate_uuid4
class Room(BaseModel):
model_config = ConfigDict(from_attributes=True)
class VideoPlatform(StrEnum):
WHEREBY = "whereby"
JITSI = "jitsi"
rooms = sqlalchemy.Table(
"room",
metadata,
sqlalchemy.Column("id", sqlalchemy.String, primary_key=True),
sqlalchemy.Column("name", sqlalchemy.String, nullable=False, unique=True),
sqlalchemy.Column("user_id", sqlalchemy.String, nullable=False),
sqlalchemy.Column("created_at", sqlalchemy.DateTime(timezone=True), nullable=False),
sqlalchemy.Column(
"zulip_auto_post", sqlalchemy.Boolean, nullable=False, server_default=false()
),
sqlalchemy.Column("zulip_stream", sqlalchemy.String),
sqlalchemy.Column("zulip_topic", sqlalchemy.String),
sqlalchemy.Column(
"is_locked", sqlalchemy.Boolean, nullable=False, server_default=false()
),
sqlalchemy.Column(
"room_mode", sqlalchemy.String, nullable=False, server_default="normal"
),
sqlalchemy.Column(
"recording_type", sqlalchemy.String, nullable=False, server_default="cloud"
),
sqlalchemy.Column(
"recording_trigger",
sqlalchemy.String,
nullable=False,
server_default="automatic-2nd-participant",
),
sqlalchemy.Column(
"is_shared", sqlalchemy.Boolean, nullable=False, server_default=false()
),
sqlalchemy.Column("webhook_url", sqlalchemy.String, nullable=True),
sqlalchemy.Column("webhook_secret", sqlalchemy.String, nullable=True),
sqlalchemy.Column(
"platform", sqlalchemy.String, nullable=False, server_default="whereby"
),
sqlalchemy.Index("idx_room_is_shared", "is_shared"),
)
class Room(BaseModel):
id: str = Field(default_factory=generate_uuid4)
name: str
user_id: str
@@ -32,17 +74,12 @@ class Room(BaseModel):
is_shared: bool = False
webhook_url: str | None = None
webhook_secret: str | None = None
ics_url: str | None = None
ics_fetch_interval: int = 300
ics_enabled: bool = False
ics_last_sync: datetime | None = None
ics_last_etag: str | None = None
platform: VideoPlatform = VideoPlatform.WHEREBY
class RoomController:
async def get_all(
self,
session: AsyncSession,
user_id: str | None = None,
order_by: str | None = None,
return_query: bool = False,
@@ -56,14 +93,14 @@ class RoomController:
Parameters:
- `order_by`: field to order by, e.g. "-created_at"
"""
query = select(RoomModel)
query = rooms.select()
if user_id is not None:
query = query.where(or_(RoomModel.user_id == user_id, RoomModel.is_shared))
query = query.where(or_(rooms.c.user_id == user_id, rooms.c.is_shared))
else:
query = query.where(RoomModel.is_shared)
query = query.where(rooms.c.is_shared)
if order_by is not None:
field = getattr(RoomModel, order_by[1:])
field = getattr(rooms.c, order_by[1:])
if order_by.startswith("-"):
field = field.desc()
query = query.order_by(field)
@@ -71,12 +108,11 @@ class RoomController:
if return_query:
return query
result = await session.execute(query)
return [Room.model_validate(row) for row in result.scalars().all()]
results = await get_database().fetch_all(query)
return results
async def add(
self,
session: AsyncSession,
name: str,
user_id: str,
zulip_auto_post: bool,
@@ -89,9 +125,7 @@ class RoomController:
is_shared: bool,
webhook_url: str = "",
webhook_secret: str = "",
ics_url: str | None = None,
ics_fetch_interval: int = 300,
ics_enabled: bool = False,
platform: str = "whereby",
):
"""
Add a new room
@@ -112,31 +146,25 @@ class RoomController:
is_shared=is_shared,
webhook_url=webhook_url,
webhook_secret=webhook_secret,
ics_url=ics_url,
ics_fetch_interval=ics_fetch_interval,
ics_enabled=ics_enabled,
platform=platform,
)
new_room = RoomModel(**room.model_dump())
session.add(new_room)
query = rooms.insert().values(**room.model_dump())
try:
await session.flush()
await get_database().execute(query)
except IntegrityError:
raise HTTPException(status_code=400, detail="Room name is not unique")
return room
async def update(
self, session: AsyncSession, room: Room, values: dict, mutate=True
):
async def update(self, room: Room, values: dict, mutate=True):
"""
Update a room fields with key/values in values
"""
if values.get("webhook_url") and not values.get("webhook_secret"):
values["webhook_secret"] = secrets.token_urlsafe(32)
query = update(RoomModel).where(RoomModel.id == room.id).values(**values)
query = rooms.update().where(rooms.c.id == room.id).values(**values)
try:
await session.execute(query)
await session.flush()
await get_database().execute(query)
except IntegrityError:
raise HTTPException(status_code=400, detail="Room name is not unique")
@@ -144,79 +172,60 @@ class RoomController:
for key, value in values.items():
setattr(room, key, value)
async def get_by_id(
self, session: AsyncSession, room_id: str, **kwargs
) -> Room | None:
async def get_by_id(self, room_id: str, **kwargs) -> Room | None:
"""
Get a room by id
"""
query = select(RoomModel).where(RoomModel.id == room_id)
query = rooms.select().where(rooms.c.id == room_id)
if "user_id" in kwargs:
query = query.where(RoomModel.user_id == kwargs["user_id"])
result = await session.execute(query)
row = result.scalars().first()
if not row:
query = query.where(rooms.c.user_id == kwargs["user_id"])
result = await get_database().fetch_one(query)
if not result:
return None
return Room.model_validate(row)
return Room(**result)
async def get_by_name(
self, session: AsyncSession, room_name: str, **kwargs
) -> Room | None:
async def get_by_name(self, room_name: str, **kwargs) -> Room | None:
"""
Get a room by name
"""
query = select(RoomModel).where(RoomModel.name == room_name)
query = rooms.select().where(rooms.c.name == room_name)
if "user_id" in kwargs:
query = query.where(RoomModel.user_id == kwargs["user_id"])
result = await session.execute(query)
row = result.scalars().first()
if not row:
query = query.where(rooms.c.user_id == kwargs["user_id"])
result = await get_database().fetch_one(query)
if not result:
return None
return Room.model_validate(row)
return Room(**result)
async def get_by_id_for_http(
self, session: AsyncSession, meeting_id: str, user_id: str | None
) -> Room:
async def get_by_id_for_http(self, meeting_id: str, user_id: str | None) -> Room:
"""
Get a room by ID for HTTP request.
If not found, it will raise a 404 error.
"""
query = select(RoomModel).where(RoomModel.id == meeting_id)
result = await session.execute(query)
row = result.scalars().first()
if not row:
query = rooms.select().where(rooms.c.id == meeting_id)
result = await get_database().fetch_one(query)
if not result:
raise HTTPException(status_code=404, detail="Room not found")
room = Room.model_validate(row)
room = Room(**result)
return room
async def get_ics_enabled(self, session: AsyncSession) -> list[Room]:
query = select(RoomModel).where(
RoomModel.ics_enabled == True, RoomModel.ics_url != None
)
result = await session.execute(query)
results = result.scalars().all()
return [Room(**row.__dict__) for row in results]
async def remove_by_id(
self,
session: AsyncSession,
room_id: str,
user_id: str | None = None,
) -> None:
"""
Remove a room by id
"""
room = await self.get_by_id(session, room_id, user_id=user_id)
room = await self.get_by_id(room_id, user_id=user_id)
if not room:
return
if user_id is not None and room.user_id != user_id:
return
query = delete(RoomModel).where(RoomModel.id == room_id)
await session.execute(query)
await session.flush()
query = rooms.delete().where(rooms.c.id == room_id)
await get_database().execute(query)
rooms_controller = RoomController()

View File

@@ -8,6 +8,7 @@ from typing import Annotated, Any, Dict, Iterator
import sqlalchemy
import webvtt
from databases.interfaces import Record as DbRecord
from fastapi import HTTPException
from pydantic import (
BaseModel,
@@ -19,10 +20,11 @@ from pydantic import (
constr,
field_serializer,
)
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.db.base import RoomModel, TranscriptModel
from reflector.db.transcripts import SourceKind, TranscriptStatus
from reflector.db import get_database
from reflector.db.rooms import rooms
from reflector.db.transcripts import SourceKind, TranscriptStatus, transcripts
from reflector.db.utils import is_postgresql
from reflector.logger import logger
from reflector.utils.string import NonEmptyString, try_parse_non_empty_string
@@ -329,30 +331,36 @@ class SearchController:
@classmethod
async def search_transcripts(
cls, session: AsyncSession, params: SearchParameters
cls, params: SearchParameters
) -> tuple[list[SearchResult], int]:
"""
Full-text search for transcripts using PostgreSQL tsvector.
Returns (results, total_count).
"""
if not is_postgresql():
logger.warning(
"Full-text search requires PostgreSQL. Returning empty results."
)
return [], 0
base_columns = [
TranscriptModel.id,
TranscriptModel.title,
TranscriptModel.created_at,
TranscriptModel.duration,
TranscriptModel.status,
TranscriptModel.user_id,
TranscriptModel.room_id,
TranscriptModel.source_kind,
TranscriptModel.webvtt,
TranscriptModel.long_summary,
transcripts.c.id,
transcripts.c.title,
transcripts.c.created_at,
transcripts.c.duration,
transcripts.c.status,
transcripts.c.user_id,
transcripts.c.room_id,
transcripts.c.source_kind,
transcripts.c.webvtt,
transcripts.c.long_summary,
sqlalchemy.case(
(
TranscriptModel.room_id.isnot(None) & RoomModel.id.is_(None),
transcripts.c.room_id.isnot(None) & rooms.c.id.is_(None),
"Deleted Room",
),
else_=RoomModel.name,
else_=rooms.c.name,
).label("room_name"),
]
search_query = None
@@ -361,7 +369,7 @@ class SearchController:
"english", params.query_text
)
rank_column = sqlalchemy.func.ts_rank(
TranscriptModel.search_vector_en,
transcripts.c.search_vector_en,
search_query,
32, # normalization flag: rank/(rank+1) for 0-1 range
).label("rank")
@@ -369,51 +377,47 @@ class SearchController:
rank_column = sqlalchemy.cast(1.0, sqlalchemy.Float).label("rank")
columns = base_columns + [rank_column]
base_query = (
sqlalchemy.select(*columns)
.select_from(TranscriptModel)
.outerjoin(RoomModel, TranscriptModel.room_id == RoomModel.id)
base_query = sqlalchemy.select(columns).select_from(
transcripts.join(rooms, transcripts.c.room_id == rooms.c.id, isouter=True)
)
if params.query_text is not None:
# because already initialized based on params.query_text presence above
assert search_query is not None
base_query = base_query.where(
TranscriptModel.search_vector_en.op("@@")(search_query)
transcripts.c.search_vector_en.op("@@")(search_query)
)
if params.user_id:
base_query = base_query.where(
sqlalchemy.or_(
TranscriptModel.user_id == params.user_id, RoomModel.is_shared
transcripts.c.user_id == params.user_id, rooms.c.is_shared
)
)
else:
base_query = base_query.where(RoomModel.is_shared)
base_query = base_query.where(rooms.c.is_shared)
if params.room_id:
base_query = base_query.where(TranscriptModel.room_id == params.room_id)
base_query = base_query.where(transcripts.c.room_id == params.room_id)
if params.source_kind:
base_query = base_query.where(
TranscriptModel.source_kind == params.source_kind
transcripts.c.source_kind == params.source_kind
)
if params.query_text is not None:
order_by = sqlalchemy.desc(sqlalchemy.text("rank"))
else:
order_by = sqlalchemy.desc(TranscriptModel.created_at)
order_by = sqlalchemy.desc(transcripts.c.created_at)
query = base_query.order_by(order_by).limit(params.limit).offset(params.offset)
result = await session.execute(query)
rs = result.mappings().all()
rs = await get_database().fetch_all(query)
count_query = sqlalchemy.select(sqlalchemy.func.count()).select_from(
count_query = sqlalchemy.select([sqlalchemy.func.count()]).select_from(
base_query.alias("search_results")
)
count_result = await session.execute(count_query)
total = count_result.scalar()
total = await get_database().fetch_val(count_query)
def _process_result(r: dict) -> SearchResult:
def _process_result(r: DbRecord) -> SearchResult:
r_dict: Dict[str, Any] = dict(r)
webvtt_raw: str | None = r_dict.pop("webvtt", None)

View File

@@ -7,14 +7,17 @@ from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Literal
import sqlalchemy
from fastapi import HTTPException
from pydantic import BaseModel, ConfigDict, Field, field_serializer
from sqlalchemy import delete, insert, select, update
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.sql import or_
from sqlalchemy import Enum
from sqlalchemy.dialects.postgresql import TSVECTOR
from sqlalchemy.sql import false, or_
from reflector.db.base import RoomModel, TranscriptModel
from reflector.db import get_database, metadata
from reflector.db.recordings import recordings_controller
from reflector.db.rooms import rooms
from reflector.db.utils import is_postgresql
from reflector.logger import logger
from reflector.processors.types import Word as ProcessorWord
from reflector.settings import settings
@@ -29,6 +32,91 @@ class SourceKind(enum.StrEnum):
FILE = enum.auto()
transcripts = sqlalchemy.Table(
"transcript",
metadata,
sqlalchemy.Column("id", sqlalchemy.String, primary_key=True),
sqlalchemy.Column("name", sqlalchemy.String),
sqlalchemy.Column("status", sqlalchemy.String),
sqlalchemy.Column("locked", sqlalchemy.Boolean),
sqlalchemy.Column("duration", sqlalchemy.Float),
sqlalchemy.Column("created_at", sqlalchemy.DateTime(timezone=True)),
sqlalchemy.Column("title", sqlalchemy.String),
sqlalchemy.Column("short_summary", sqlalchemy.String),
sqlalchemy.Column("long_summary", sqlalchemy.String),
sqlalchemy.Column("topics", sqlalchemy.JSON),
sqlalchemy.Column("events", sqlalchemy.JSON),
sqlalchemy.Column("participants", sqlalchemy.JSON),
sqlalchemy.Column("source_language", sqlalchemy.String),
sqlalchemy.Column("target_language", sqlalchemy.String),
sqlalchemy.Column(
"reviewed", sqlalchemy.Boolean, nullable=False, server_default=false()
),
sqlalchemy.Column(
"audio_location",
sqlalchemy.String,
nullable=False,
server_default="local",
),
# with user attached, optional
sqlalchemy.Column("user_id", sqlalchemy.String),
sqlalchemy.Column(
"share_mode",
sqlalchemy.String,
nullable=False,
server_default="private",
),
sqlalchemy.Column(
"meeting_id",
sqlalchemy.String,
),
sqlalchemy.Column("recording_id", sqlalchemy.String),
sqlalchemy.Column("zulip_message_id", sqlalchemy.Integer),
sqlalchemy.Column(
"source_kind",
Enum(SourceKind, values_callable=lambda obj: [e.value for e in obj]),
nullable=False,
),
# indicative field: whether associated audio is deleted
# the main "audio deleted" is the presence of the audio itself / consents not-given
# same field could've been in recording/meeting, and it's maybe even ok to dupe it at need
sqlalchemy.Column("audio_deleted", sqlalchemy.Boolean),
sqlalchemy.Column("room_id", sqlalchemy.String),
sqlalchemy.Column("webvtt", sqlalchemy.Text),
sqlalchemy.Index("idx_transcript_recording_id", "recording_id"),
sqlalchemy.Index("idx_transcript_user_id", "user_id"),
sqlalchemy.Index("idx_transcript_created_at", "created_at"),
sqlalchemy.Index("idx_transcript_user_id_recording_id", "user_id", "recording_id"),
sqlalchemy.Index("idx_transcript_room_id", "room_id"),
sqlalchemy.Index("idx_transcript_source_kind", "source_kind"),
sqlalchemy.Index("idx_transcript_room_id_created_at", "room_id", "created_at"),
)
# Add PostgreSQL-specific full-text search column
# This matches the migration in migrations/versions/116b2f287eab_add_full_text_search.py
if is_postgresql():
transcripts.append_column(
sqlalchemy.Column(
"search_vector_en",
TSVECTOR,
sqlalchemy.Computed(
"setweight(to_tsvector('english', coalesce(title, '')), 'A') || "
"setweight(to_tsvector('english', coalesce(long_summary, '')), 'B') || "
"setweight(to_tsvector('english', coalesce(webvtt, '')), 'C')",
persisted=True,
),
)
)
# Add GIN index for the search vector
transcripts.append_constraint(
sqlalchemy.Index(
"idx_transcript_search_vector_en",
"search_vector_en",
postgresql_using="gin",
)
)
def generate_transcript_name() -> str:
now = datetime.now(timezone.utc)
return f"Transcript {now.strftime('%Y-%m-%d %H:%M:%S')}"
@@ -103,8 +191,6 @@ class TranscriptParticipant(BaseModel):
class Transcript(BaseModel):
"""Full transcript model with all fields."""
model_config = ConfigDict(from_attributes=True)
id: str = Field(default_factory=generate_uuid4)
user_id: str | None = None
name: str = Field(default_factory=generate_transcript_name)
@@ -273,7 +359,6 @@ class Transcript(BaseModel):
class TranscriptController:
async def get_all(
self,
session: AsyncSession,
user_id: str | None = None,
order_by: str | None = None,
filter_empty: bool | None = False,
@@ -298,114 +383,102 @@ class TranscriptController:
- `search_term`: filter transcripts by search term
"""
query = select(TranscriptModel).join(
RoomModel, TranscriptModel.room_id == RoomModel.id, isouter=True
query = transcripts.select().join(
rooms, transcripts.c.room_id == rooms.c.id, isouter=True
)
if user_id:
query = query.where(
or_(TranscriptModel.user_id == user_id, RoomModel.is_shared)
or_(transcripts.c.user_id == user_id, rooms.c.is_shared)
)
else:
query = query.where(RoomModel.is_shared)
query = query.where(rooms.c.is_shared)
if source_kind:
query = query.where(TranscriptModel.source_kind == source_kind)
query = query.where(transcripts.c.source_kind == source_kind)
if room_id:
query = query.where(TranscriptModel.room_id == room_id)
query = query.where(transcripts.c.room_id == room_id)
if search_term:
query = query.where(TranscriptModel.title.ilike(f"%{search_term}%"))
query = query.where(transcripts.c.title.ilike(f"%{search_term}%"))
# Exclude heavy JSON columns from list queries
# Get all ORM column attributes except excluded ones
transcript_columns = [
getattr(TranscriptModel, col.name)
for col in TranscriptModel.__table__.c
if col.name not in exclude_columns
col for col in transcripts.c if col.name not in exclude_columns
]
query = query.with_only_columns(
*transcript_columns,
RoomModel.name.label("room_name"),
transcript_columns
+ [
rooms.c.name.label("room_name"),
]
)
if order_by is not None:
field = getattr(TranscriptModel, order_by[1:])
field = getattr(transcripts.c, order_by[1:])
if order_by.startswith("-"):
field = field.desc()
query = query.order_by(field)
if filter_empty:
query = query.filter(TranscriptModel.status != "idle")
query = query.filter(transcripts.c.status != "idle")
if filter_recording:
query = query.filter(TranscriptModel.status != "recording")
query = query.filter(transcripts.c.status != "recording")
# print(query.compile(compile_kwargs={"literal_binds": True}))
if return_query:
return query
result = await session.execute(query)
return [dict(row) for row in result.mappings().all()]
results = await get_database().fetch_all(query)
return results
async def get_by_id(
self, session: AsyncSession, transcript_id: str, **kwargs
) -> Transcript | None:
async def get_by_id(self, transcript_id: str, **kwargs) -> Transcript | None:
"""
Get a transcript by id
"""
query = select(TranscriptModel).where(TranscriptModel.id == transcript_id)
query = transcripts.select().where(transcripts.c.id == transcript_id)
if "user_id" in kwargs:
query = query.where(TranscriptModel.user_id == kwargs["user_id"])
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
query = query.where(transcripts.c.user_id == kwargs["user_id"])
result = await get_database().fetch_one(query)
if not result:
return None
return Transcript.model_validate(row)
return Transcript(**result)
async def get_by_recording_id(
self, session: AsyncSession, recording_id: str, **kwargs
self, recording_id: str, **kwargs
) -> Transcript | None:
"""
Get a transcript by recording_id
"""
query = select(TranscriptModel).where(
TranscriptModel.recording_id == recording_id
)
query = transcripts.select().where(transcripts.c.recording_id == recording_id)
if "user_id" in kwargs:
query = query.where(TranscriptModel.user_id == kwargs["user_id"])
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
query = query.where(transcripts.c.user_id == kwargs["user_id"])
result = await get_database().fetch_one(query)
if not result:
return None
return Transcript.model_validate(row)
return Transcript(**result)
async def get_by_room_id(
self, session: AsyncSession, room_id: str, **kwargs
) -> list[Transcript]:
async def get_by_room_id(self, room_id: str, **kwargs) -> list[Transcript]:
"""
Get transcripts by room_id (direct access without joins)
"""
query = select(TranscriptModel).where(TranscriptModel.room_id == room_id)
query = transcripts.select().where(transcripts.c.room_id == room_id)
if "user_id" in kwargs:
query = query.where(TranscriptModel.user_id == kwargs["user_id"])
query = query.where(transcripts.c.user_id == kwargs["user_id"])
if "order_by" in kwargs:
order_by = kwargs["order_by"]
field = getattr(TranscriptModel, order_by[1:])
field = getattr(transcripts.c, order_by[1:])
if order_by.startswith("-"):
field = field.desc()
query = query.order_by(field)
results = await session.execute(query)
return [
Transcript.model_validate(dict(row)) for row in results.mappings().all()
]
results = await get_database().fetch_all(query)
return [Transcript(**result) for result in results]
async def get_by_id_for_http(
self,
session: AsyncSession,
transcript_id: str,
user_id: str | None,
) -> Transcript:
@@ -418,14 +491,13 @@ class TranscriptController:
This method checks the share mode of the transcript and the user_id
to determine if the user can access the transcript.
"""
query = select(TranscriptModel).where(TranscriptModel.id == transcript_id)
result = await session.execute(query)
row = result.scalar_one_or_none()
if not row:
query = transcripts.select().where(transcripts.c.id == transcript_id)
result = await get_database().fetch_one(query)
if not result:
raise HTTPException(status_code=404, detail="Transcript not found")
# if the transcript is anonymous, share mode is not checked
transcript = Transcript.model_validate(row)
transcript = Transcript(**result)
if transcript.user_id is None:
return transcript
@@ -448,7 +520,6 @@ class TranscriptController:
async def add(
self,
session: AsyncSession,
name: str,
source_kind: SourceKind,
source_language: str = "en",
@@ -473,15 +544,14 @@ class TranscriptController:
meeting_id=meeting_id,
room_id=room_id,
)
query = insert(TranscriptModel).values(**transcript.model_dump())
await session.execute(query)
await session.commit()
query = transcripts.insert().values(**transcript.model_dump())
await get_database().execute(query)
return transcript
# TODO investigate why mutate= is used. it's used in one place currently, maybe because of ORM field updates.
# using mutate=True is discouraged
async def update(
self, session: AsyncSession, transcript: Transcript, values: dict, mutate=False
self, transcript: Transcript, values: dict, mutate=False
) -> Transcript:
"""
Update a transcript fields with key/values in values.
@@ -490,12 +560,11 @@ class TranscriptController:
values = TranscriptController._handle_topics_update(values)
query = (
update(TranscriptModel)
.where(TranscriptModel.id == transcript.id)
transcripts.update()
.where(transcripts.c.id == transcript.id)
.values(**values)
)
await session.execute(query)
await session.commit()
await get_database().execute(query)
if mutate:
for key, value in values.items():
setattr(transcript, key, value)
@@ -524,14 +593,13 @@ class TranscriptController:
async def remove_by_id(
self,
session: AsyncSession,
transcript_id: str,
user_id: str | None = None,
) -> None:
"""
Remove a transcript by id
"""
transcript = await self.get_by_id(session, transcript_id)
transcript = await self.get_by_id(transcript_id)
if not transcript:
return
if user_id is not None and transcript.user_id != user_id:
@@ -551,7 +619,7 @@ class TranscriptController:
if transcript.recording_id:
try:
recording = await recordings_controller.get_by_id(
session, transcript.recording_id
transcript.recording_id
)
if recording:
try:
@@ -562,40 +630,33 @@ class TranscriptController:
exc_info=e,
recording_id=transcript.recording_id,
)
await recordings_controller.remove_by_id(
session, transcript.recording_id
)
await recordings_controller.remove_by_id(transcript.recording_id)
except Exception as e:
logger.warning(
"Failed to delete recording row",
exc_info=e,
recording_id=transcript.recording_id,
)
query = delete(TranscriptModel).where(TranscriptModel.id == transcript_id)
await session.execute(query)
await session.commit()
query = transcripts.delete().where(transcripts.c.id == transcript_id)
await get_database().execute(query)
async def remove_by_recording_id(self, session: AsyncSession, recording_id: str):
async def remove_by_recording_id(self, recording_id: str):
"""
Remove a transcript by recording_id
"""
query = delete(TranscriptModel).where(
TranscriptModel.recording_id == recording_id
)
await session.execute(query)
await session.commit()
query = transcripts.delete().where(transcripts.c.recording_id == recording_id)
await get_database().execute(query)
@asynccontextmanager
async def transaction(self, session: AsyncSession):
async def transaction(self):
"""
A context manager for database transaction
"""
async with session.begin():
async with get_database().transaction(isolation="serializable"):
yield
async def append_event(
self,
session: AsyncSession,
transcript: Transcript,
event: str,
data: Any,
@@ -604,12 +665,11 @@ class TranscriptController:
Append an event to a transcript
"""
resp = transcript.add_event(event=event, data=data)
await self.update(session, transcript, {"events": transcript.events_dump()})
await self.update(transcript, {"events": transcript.events_dump()})
return resp
async def upsert_topic(
self,
session: AsyncSession,
transcript: Transcript,
topic: TranscriptTopic,
) -> TranscriptEvent:
@@ -617,9 +677,9 @@ class TranscriptController:
Upsert topics to a transcript
"""
transcript.upsert_topic(topic)
await self.update(session, transcript, {"topics": transcript.topics_dump()})
await self.update(transcript, {"topics": transcript.topics_dump()})
async def move_mp3_to_storage(self, session: AsyncSession, transcript: Transcript):
async def move_mp3_to_storage(self, transcript: Transcript):
"""
Move mp3 file to storage
"""
@@ -643,16 +703,12 @@ class TranscriptController:
# indicate on the transcript that the audio is now on storage
# mutates transcript argument
await self.update(
session, transcript, {"audio_location": "storage"}, mutate=True
)
await self.update(transcript, {"audio_location": "storage"}, mutate=True)
# unlink the local file
transcript.audio_mp3_filename.unlink(missing_ok=True)
async def download_mp3_from_storage(
self, session: AsyncSession, transcript: Transcript
):
async def download_mp3_from_storage(self, transcript: Transcript):
"""
Download audio from storage
"""
@@ -664,7 +720,6 @@ class TranscriptController:
async def upsert_participant(
self,
session: AsyncSession,
transcript: Transcript,
participant: TranscriptParticipant,
) -> TranscriptParticipant:
@@ -672,14 +727,11 @@ class TranscriptController:
Add/update a participant to a transcript
"""
result = transcript.upsert_participant(participant)
await self.update(
session, transcript, {"participants": transcript.participants_dump()}
)
await self.update(transcript, {"participants": transcript.participants_dump()})
return result
async def delete_participant(
self,
session: AsyncSession,
transcript: Transcript,
participant_id: str,
):
@@ -687,31 +739,28 @@ class TranscriptController:
Delete a participant from a transcript
"""
transcript.delete_participant(participant_id)
await self.update(
session, transcript, {"participants": transcript.participants_dump()}
)
await self.update(transcript, {"participants": transcript.participants_dump()})
async def set_status(
self, session: AsyncSession, transcript_id: str, status: TranscriptStatus
self, transcript_id: str, status: TranscriptStatus
) -> TranscriptEvent | None:
"""
Update the status of a transcript
Will add an event STATUS + update the status field of transcript
"""
async with self.transaction(session):
transcript = await self.get_by_id(session, transcript_id)
async with self.transaction():
transcript = await self.get_by_id(transcript_id)
if not transcript:
raise Exception(f"Transcript {transcript_id} not found")
if transcript.status == status:
return
resp = await self.append_event(
session,
transcript=transcript,
event="STATUS",
data=StrValue(value=status),
)
await self.update(session, transcript, {"status": status})
await self.update(transcript, {"status": status})
return resp

View File

@@ -0,0 +1,9 @@
"""Database utility functions."""
from reflector.db import get_database
def is_postgresql() -> bool:
return get_database().url.scheme and get_database().url.scheme.startswith(
"postgresql"
)

View File

@@ -0,0 +1,227 @@
import json
from pathlib import Path
from typing import Any, Dict, List, Literal, Optional, Union
from pydantic import BaseModel
from typing_extensions import TypedDict
class ParticipantInfo(BaseModel):
jid: str
nick: str
id: str
is_moderator: bool = False
class ParticipantLeftInfo(BaseModel):
jid: str
nick: Optional[str] = None
duration_seconds: Optional[int] = None
class RoomCreatedEvent(BaseModel):
type: Literal["room_created"]
timestamp: int
room_name: str
room_jid: str
meeting_url: str
class RecordingStartedEvent(BaseModel):
type: Literal["recording_started"]
timestamp: int
room_name: str
session_id: str
jibri_jid: str
class RecordingStoppedEvent(BaseModel):
type: Literal["recording_stopped"]
timestamp: int
room_name: str
session_id: str
meeting_url: str
class ParticipantJoinedEvent(BaseModel):
type: Literal["participant_joined"]
timestamp: int
room_name: str
participant: ParticipantInfo
class ParticipantLeftEvent(BaseModel):
type: Literal["participant_left"]
timestamp: int
room_name: str
participant: ParticipantLeftInfo
class SpeakerActiveEvent(BaseModel):
type: Literal["speaker_active"]
timestamp: int
room_name: str
speaker_jid: str
speaker_nick: str
duration: int
class DominantSpeakerChangedEvent(BaseModel):
type: Literal["dominant_speaker_changed"]
timestamp: int
room_name: str
previous: str
current: str
JitsiEvent = Union[
RoomCreatedEvent,
RecordingStartedEvent,
RecordingStoppedEvent,
ParticipantJoinedEvent,
ParticipantLeftEvent,
SpeakerActiveEvent,
DominantSpeakerChangedEvent,
]
class RoomInfo(TypedDict):
name: str
jid: str
created_at: int
meeting_url: str
recording_stopped_at: Optional[int]
class ParticipantData(TypedDict):
jid: str
nick: str
id: str
is_moderator: bool
joined_at: int
left_at: Optional[int]
duration: Optional[int]
events: List[str]
class SpeakerStats(TypedDict):
total_time: int
nick: str
class ParsedMetadata(TypedDict):
room: RoomInfo
participants: List[ParticipantData]
speaker_stats: Dict[str, SpeakerStats]
event_count: int
class JitsiEventParser:
def parse_event(self, event_data: Dict[str, Any]) -> Optional[JitsiEvent]:
event_type = event_data.get("type")
try:
if event_type == "room_created":
return RoomCreatedEvent(**event_data)
elif event_type == "recording_started":
return RecordingStartedEvent(**event_data)
elif event_type == "recording_stopped":
return RecordingStoppedEvent(**event_data)
elif event_type == "participant_joined":
return ParticipantJoinedEvent(**event_data)
elif event_type == "participant_left":
return ParticipantLeftEvent(**event_data)
elif event_type == "speaker_active":
return SpeakerActiveEvent(**event_data)
elif event_type == "dominant_speaker_changed":
return DominantSpeakerChangedEvent(**event_data)
else:
return None
except Exception:
return None
def parse_events_file(self, recording_path: str) -> ParsedMetadata:
events_file = Path(recording_path) / "events.jsonl"
room_info: RoomInfo = {
"name": "",
"jid": "",
"created_at": 0,
"meeting_url": "",
"recording_stopped_at": None,
}
if not events_file.exists():
return ParsedMetadata(
room=room_info, participants=[], speaker_stats={}, event_count=0
)
events: List[JitsiEvent] = []
participants: Dict[str, ParticipantData] = {}
speaker_stats: Dict[str, SpeakerStats] = {}
with open(events_file, "r") as f:
for line in f:
if not line.strip():
continue
try:
event_data = json.loads(line)
event = self.parse_event(event_data)
if event is None:
continue
events.append(event)
if isinstance(event, RoomCreatedEvent):
room_info = {
"name": event.room_name,
"jid": event.room_jid,
"created_at": event.timestamp,
"meeting_url": event.meeting_url,
"recording_stopped_at": None,
}
elif isinstance(event, ParticipantJoinedEvent):
participants[event.participant.id] = {
"jid": event.participant.jid,
"nick": event.participant.nick,
"id": event.participant.id,
"is_moderator": event.participant.is_moderator,
"joined_at": event.timestamp,
"left_at": None,
"duration": None,
"events": ["joined"],
}
elif isinstance(event, ParticipantLeftEvent):
participant_id = event.participant.jid.split("/")[0]
if participant_id in participants:
participants[participant_id]["left_at"] = event.timestamp
participants[participant_id]["duration"] = (
event.participant.duration_seconds
)
participants[participant_id]["events"].append("left")
elif isinstance(event, SpeakerActiveEvent):
if event.speaker_jid not in speaker_stats:
speaker_stats[event.speaker_jid] = {
"total_time": 0,
"nick": event.speaker_nick,
}
speaker_stats[event.speaker_jid]["total_time"] += event.duration
elif isinstance(event, RecordingStoppedEvent):
room_info["recording_stopped_at"] = event.timestamp
room_info["meeting_url"] = event.meeting_url
except (json.JSONDecodeError, Exception):
continue
return ParsedMetadata(
room=room_info,
participants=list(participants.values()),
speaker_stats=speaker_stats,
event_count=len(events),
)

View File

@@ -13,10 +13,8 @@ from pathlib import Path
import av
import structlog
from celery import chain, shared_task
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.asynctask import asynctask
from reflector.db import get_session_factory
from reflector.db.rooms import rooms_controller
from reflector.db.transcripts import (
SourceKind,
@@ -55,7 +53,6 @@ from reflector.processors.types import (
)
from reflector.settings import settings
from reflector.storage import get_transcripts_storage
from reflector.worker.session_decorator import with_session
from reflector.worker.webhook import send_transcript_webhook
@@ -100,23 +97,17 @@ class PipelineMainFile(PipelineMainBase):
@broadcast_to_sockets
async def set_status(self, transcript_id: str, status: TranscriptStatus):
async with self.lock_transaction():
async with get_session_factory()() as session:
return await transcripts_controller.set_status(
session, transcript_id, status
)
return await transcripts_controller.set_status(transcript_id, status)
async def process(self, file_path: Path):
"""Main entry point for file processing"""
self.logger.info(f"Starting file pipeline for {file_path}")
async with get_session_factory()() as session:
transcript = await transcripts_controller.get_by_id(
session, self.transcript_id
)
transcript = await self.get_transcript()
# Clear transcript as we're going to regenerate everything
# Clear transcript as we're going to regenerate everything
async with self.transaction():
await transcripts_controller.update(
session,
transcript,
{
"events": [],
@@ -132,7 +123,6 @@ class PipelineMainFile(PipelineMainBase):
# Run parallel processing
await self.run_parallel_processing(
session,
audio_path,
audio_url,
transcript.source_language,
@@ -141,8 +131,7 @@ class PipelineMainFile(PipelineMainBase):
self.logger.info("File pipeline complete")
async with get_session_factory()() as session:
await transcripts_controller.set_status(session, transcript.id, "ended")
await transcripts_controller.set_status(transcript.id, "ended")
async def extract_and_write_audio(
self, file_path: Path, transcript: Transcript
@@ -204,7 +193,6 @@ class PipelineMainFile(PipelineMainBase):
async def run_parallel_processing(
self,
session,
audio_path: Path,
audio_url: str,
source_language: str,
@@ -218,7 +206,7 @@ class PipelineMainFile(PipelineMainBase):
# Phase 1: Parallel processing of independent tasks
transcription_task = self.transcribe_file(audio_url, source_language)
diarization_task = self.diarize_file(audio_url)
waveform_task = self.generate_waveform(session, audio_path)
waveform_task = self.generate_waveform(audio_path)
results = await asyncio.gather(
transcription_task, diarization_task, waveform_task, return_exceptions=True
@@ -266,7 +254,7 @@ class PipelineMainFile(PipelineMainBase):
)
results = await asyncio.gather(
self.generate_title(topics),
self.generate_summaries(session, topics),
self.generate_summaries(topics),
return_exceptions=True,
)
@@ -318,9 +306,9 @@ class PipelineMainFile(PipelineMainBase):
self.logger.error(f"Diarization failed: {e}")
return None
async def generate_waveform(self, session: AsyncSession, audio_path: Path):
async def generate_waveform(self, audio_path: Path):
"""Generate and save waveform"""
transcript = await transcripts_controller.get_by_id(session, self.transcript_id)
transcript = await self.get_transcript()
processor = AudioWaveformProcessor(
audio_path=audio_path,
@@ -373,13 +361,13 @@ class PipelineMainFile(PipelineMainBase):
await processor.flush()
async def generate_summaries(self, session, topics: list[TitleSummary]):
async def generate_summaries(self, topics: list[TitleSummary]):
"""Generate long and short summaries from topics"""
if not topics:
self.logger.warning("No topics for summary generation")
return
transcript = await transcripts_controller.get_by_id(session, self.transcript_id)
transcript = await self.get_transcript()
processor = TranscriptFinalSummaryProcessor(
transcript=transcript,
callback=self.on_long_summary,
@@ -395,15 +383,14 @@ class PipelineMainFile(PipelineMainBase):
@shared_task
@asynctask
@with_session
async def task_send_webhook_if_needed(session, *, transcript_id: str):
async def task_send_webhook_if_needed(*, transcript_id: str):
"""Send webhook if this is a room recording with webhook configured"""
transcript = await transcripts_controller.get_by_id(session, transcript_id)
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript:
return
if transcript.source_kind == SourceKind.ROOM and transcript.room_id:
room = await rooms_controller.get_by_id(session, transcript.room_id)
room = await rooms_controller.get_by_id(transcript.room_id)
if room and room.webhook_url:
logger.info(
"Dispatching webhook",
@@ -418,10 +405,10 @@ async def task_send_webhook_if_needed(session, *, transcript_id: str):
@shared_task
@asynctask
@with_session
async def task_pipeline_file_process(session, *, transcript_id: str):
async def task_pipeline_file_process(*, transcript_id: str):
"""Celery task for file pipeline processing"""
transcript = await transcripts_controller.get_by_id(session, transcript_id)
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript:
raise Exception(f"Transcript {transcript_id} not found")

View File

@@ -20,11 +20,9 @@ import av
import boto3
from celery import chord, current_task, group, shared_task
from pydantic import BaseModel
from sqlalchemy.ext.asyncio import AsyncSession
from structlog import BoundLogger as Logger
from reflector.asynctask import asynctask
from reflector.db import get_session_factory
from reflector.db.meetings import meeting_consent_controller, meetings_controller
from reflector.db.recordings import recordings_controller
from reflector.db.rooms import rooms_controller
@@ -64,7 +62,6 @@ from reflector.processors.types import (
from reflector.processors.types import Transcript as TranscriptProcessorType
from reflector.settings import settings
from reflector.storage import get_transcripts_storage
from reflector.worker.session_decorator import with_session_and_transcript
from reflector.ws_manager import WebsocketManager, get_ws_manager
from reflector.zulip import (
get_zulip_message,
@@ -99,10 +96,9 @@ def get_transcript(func):
@functools.wraps(func)
async def wrapper(**kwargs):
transcript_id = kwargs.pop("transcript_id")
async with get_session_factory()() as session:
transcript = await transcripts_controller.get_by_id(session, transcript_id)
transcript = await transcripts_controller.get_by_id(transcript_id=transcript_id)
if not transcript:
raise Exception(f"Transcript {transcript_id} not found")
raise Exception("Transcript {transcript_id} not found")
# Enhanced logger with Celery task context
tlogger = logger.bind(transcript_id=transcript.id)
@@ -143,9 +139,11 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
self._ws_manager = get_ws_manager()
return self._ws_manager
async def get_transcript(self, session: AsyncSession) -> Transcript:
async def get_transcript(self) -> Transcript:
# fetch the transcript
result = await transcripts_controller.get_by_id(session, self.transcript_id)
result = await transcripts_controller.get_by_id(
transcript_id=self.transcript_id
)
if not result:
raise Exception("Transcript not found")
return result
@@ -177,8 +175,8 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
@asynccontextmanager
async def transaction(self):
async with self.lock_transaction():
async with get_session_factory()() as session:
yield session
async with transcripts_controller.transaction():
yield
@broadcast_to_sockets
async def on_status(self, status):
@@ -209,17 +207,13 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
# when the status of the pipeline changes, update the transcript
async with self._lock:
async with get_session_factory()() as session:
return await transcripts_controller.set_status(
session, self.transcript_id, status
)
return await transcripts_controller.set_status(self.transcript_id, status)
@broadcast_to_sockets
async def on_transcript(self, data):
async with self.transaction() as session:
transcript = await self.get_transcript(session)
async with self.transaction():
transcript = await self.get_transcript()
return await transcripts_controller.append_event(
session,
transcript=transcript,
event="TRANSCRIPT",
data=TranscriptText(text=data.text, translation=data.translation),
@@ -236,11 +230,10 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
)
if isinstance(data, TitleSummaryWithIdProcessorType):
topic.id = data.id
async with self.transaction() as session:
transcript = await self.get_transcript(session)
await transcripts_controller.upsert_topic(session, transcript, topic)
async with self.transaction():
transcript = await self.get_transcript()
await transcripts_controller.upsert_topic(transcript, topic)
return await transcripts_controller.append_event(
session,
transcript=transcript,
event="TOPIC",
data=topic,
@@ -249,18 +242,16 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
@broadcast_to_sockets
async def on_title(self, data):
final_title = TranscriptFinalTitle(title=data.title)
async with self.transaction() as session:
transcript = await self.get_transcript(session)
async with self.transaction():
transcript = await self.get_transcript()
if not transcript.title:
await transcripts_controller.update(
session,
transcript,
{
"title": final_title.title,
},
)
return await transcripts_controller.append_event(
session,
transcript=transcript,
event="FINAL_TITLE",
data=final_title,
@@ -269,17 +260,15 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
@broadcast_to_sockets
async def on_long_summary(self, data):
final_long_summary = TranscriptFinalLongSummary(long_summary=data.long_summary)
async with self.transaction() as session:
transcript = await self.get_transcript(session)
async with self.transaction():
transcript = await self.get_transcript()
await transcripts_controller.update(
session,
transcript,
{
"long_summary": final_long_summary.long_summary,
},
)
return await transcripts_controller.append_event(
session,
transcript=transcript,
event="FINAL_LONG_SUMMARY",
data=final_long_summary,
@@ -290,17 +279,15 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
final_short_summary = TranscriptFinalShortSummary(
short_summary=data.short_summary
)
async with self.transaction() as session:
transcript = await self.get_transcript(session)
async with self.transaction():
transcript = await self.get_transcript()
await transcripts_controller.update(
session,
transcript,
{
"short_summary": final_short_summary.short_summary,
},
)
return await transcripts_controller.append_event(
session,
transcript=transcript,
event="FINAL_SHORT_SUMMARY",
data=final_short_summary,
@@ -308,30 +295,29 @@ class PipelineMainBase(PipelineRunner[PipelineMessage], Generic[PipelineMessage]
@broadcast_to_sockets
async def on_duration(self, data):
async with self.transaction() as session:
async with self.transaction():
duration = TranscriptDuration(duration=data)
transcript = await self.get_transcript(session)
transcript = await self.get_transcript()
await transcripts_controller.update(
session,
transcript,
{
"duration": duration.duration,
},
)
return await transcripts_controller.append_event(
session, transcript=transcript, event="DURATION", data=duration
transcript=transcript, event="DURATION", data=duration
)
@broadcast_to_sockets
async def on_waveform(self, data):
async with self.transaction() as session:
async with self.transaction():
waveform = TranscriptWaveform(waveform=data)
transcript = await self.get_transcript(session)
transcript = await self.get_transcript()
return await transcripts_controller.append_event(
session, transcript=transcript, event="WAVEFORM", data=waveform
transcript=transcript, event="WAVEFORM", data=waveform
)
@@ -344,8 +330,7 @@ class PipelineMainLive(PipelineMainBase):
async def create(self) -> Pipeline:
# create a context for the whole rtc transaction
# add a customised logger to the context
async with get_session_factory()() as session:
transcript = await self.get_transcript(session)
transcript = await self.get_transcript()
processors = [
AudioFileWriterProcessor(
@@ -393,8 +378,7 @@ class PipelineMainDiarization(PipelineMainBase[AudioDiarizationInput]):
# now let's start the pipeline by pushing information to the
# first processor diarization processor
# XXX translation is lost when converting our data model to the processor model
async with get_session_factory()() as session:
transcript = await self.get_transcript(session)
transcript = await self.get_transcript()
# diarization works only if the file is uploaded to an external storage
if transcript.audio_location == "local":
@@ -427,8 +411,7 @@ class PipelineMainFromTopics(PipelineMainBase[TitleSummaryWithIdProcessorType]):
async def create(self) -> Pipeline:
# get transcript
async with get_session_factory()() as session:
self._transcript = transcript = await self.get_transcript(session)
self._transcript = transcript = await self.get_transcript()
# create pipeline
processors = self.get_processors()
@@ -533,7 +516,8 @@ async def pipeline_convert_to_mp3(transcript: Transcript, logger: Logger):
logger.info("Convert to mp3 done")
async def pipeline_upload_mp3(session, transcript: Transcript, logger: Logger):
@get_transcript
async def pipeline_upload_mp3(transcript: Transcript, logger: Logger):
if not settings.TRANSCRIPT_STORAGE_BACKEND:
logger.info("No storage backend configured, skipping mp3 upload")
return
@@ -551,7 +535,7 @@ async def pipeline_upload_mp3(session, transcript: Transcript, logger: Logger):
return
# Upload to external storage and delete the file
await transcripts_controller.move_mp3_to_storage(session, transcript)
await transcripts_controller.move_mp3_to_storage(transcript)
logger.info("Upload mp3 done")
@@ -580,23 +564,20 @@ async def pipeline_summaries(transcript: Transcript, logger: Logger):
logger.info("Summaries done")
async def cleanup_consent(session, transcript: Transcript, logger: Logger):
@get_transcript
async def cleanup_consent(transcript: Transcript, logger: Logger):
logger.info("Starting consent cleanup")
consent_denied = False
recording = None
try:
if transcript.recording_id:
recording = await recordings_controller.get_by_id(
session, transcript.recording_id
)
recording = await recordings_controller.get_by_id(transcript.recording_id)
if recording and recording.meeting_id:
meeting = await meetings_controller.get_by_id(
session, recording.meeting_id
)
meeting = await meetings_controller.get_by_id(recording.meeting_id)
if meeting:
consent_denied = await meeting_consent_controller.has_any_denial(
session, meeting.id
meeting.id
)
except Exception as e:
logger.error(f"Failed to get fetch consent: {e}", exc_info=e)
@@ -625,7 +606,7 @@ async def cleanup_consent(session, transcript: Transcript, logger: Logger):
logger.error(f"Failed to delete Whereby recording: {e}", exc_info=e)
# non-transactional, files marked for deletion not actually deleted is possible
await transcripts_controller.update(session, transcript, {"audio_deleted": True})
await transcripts_controller.update(transcript, {"audio_deleted": True})
# 2. Delete processed audio from transcript storage S3 bucket
if transcript.audio_location == "storage":
storage = get_transcripts_storage()
@@ -649,14 +630,15 @@ async def cleanup_consent(session, transcript: Transcript, logger: Logger):
logger.info("Consent cleanup done")
async def pipeline_post_to_zulip(session, transcript: Transcript, logger: Logger):
@get_transcript
async def pipeline_post_to_zulip(transcript: Transcript, logger: Logger):
logger.info("Starting post to zulip")
if not transcript.recording_id:
logger.info("Transcript has no recording")
return
recording = await recordings_controller.get_by_id(session, transcript.recording_id)
recording = await recordings_controller.get_by_id(transcript.recording_id)
if not recording:
logger.info("Recording not found")
return
@@ -665,12 +647,12 @@ async def pipeline_post_to_zulip(session, transcript: Transcript, logger: Logger
logger.info("Recording has no meeting")
return
meeting = await meetings_controller.get_by_id(session, recording.meeting_id)
meeting = await meetings_controller.get_by_id(recording.meeting_id)
if not meeting:
logger.info("No meeting found for this recording")
return
room = await rooms_controller.get_by_id(session, meeting.room_id)
room = await rooms_controller.get_by_id(meeting.room_id)
if not room:
logger.error(f"Missing room for a meeting {meeting.id}")
return
@@ -696,7 +678,7 @@ async def pipeline_post_to_zulip(session, transcript: Transcript, logger: Logger
room.zulip_stream, room.zulip_topic, message
)
await transcripts_controller.update(
session, transcript, {"zulip_message_id": response["id"]}
transcript, {"zulip_message_id": response["id"]}
)
logger.info("Posted to zulip")
@@ -727,11 +709,8 @@ async def task_pipeline_convert_to_mp3(*, transcript_id: str):
@shared_task
@asynctask
@with_session_and_transcript
async def task_pipeline_upload_mp3(
session, *, transcript: Transcript, logger: Logger, transcript_id: str
):
await pipeline_upload_mp3(session, transcript=transcript, logger=logger)
async def task_pipeline_upload_mp3(*, transcript_id: str):
await pipeline_upload_mp3(transcript_id=transcript_id)
@shared_task
@@ -754,20 +733,14 @@ async def task_pipeline_final_summaries(*, transcript_id: str):
@shared_task
@asynctask
@with_session_and_transcript
async def task_cleanup_consent(
session, *, transcript: Transcript, logger: Logger, transcript_id: str
):
await cleanup_consent(session, transcript=transcript, logger=logger)
async def task_cleanup_consent(*, transcript_id: str):
await cleanup_consent(transcript_id=transcript_id)
@shared_task
@asynctask
@with_session_and_transcript
async def task_pipeline_post_to_zulip(
session, *, transcript: Transcript, logger: Logger, transcript_id: str
):
await pipeline_post_to_zulip(session, transcript=transcript, logger=logger)
async def task_pipeline_post_to_zulip(*, transcript_id: str):
await pipeline_post_to_zulip(transcript_id=transcript_id)
def pipeline_post(*, transcript_id: str):
@@ -799,16 +772,14 @@ def pipeline_post(*, transcript_id: str):
async def pipeline_process(transcript: Transcript, logger: Logger):
try:
if transcript.audio_location == "storage":
async with get_session_factory()() as session:
await transcripts_controller.download_mp3_from_storage(transcript)
transcript.audio_waveform_filename.unlink(missing_ok=True)
await transcripts_controller.update(
session,
transcript,
{
"topics": [],
},
)
await transcripts_controller.download_mp3_from_storage(transcript)
transcript.audio_waveform_filename.unlink(missing_ok=True)
await transcripts_controller.update(
transcript,
{
"topics": [],
},
)
# open audio
audio_filename = next(transcript.data_path.glob("upload.*"), None)
@@ -840,14 +811,12 @@ async def pipeline_process(transcript: Transcript, logger: Logger):
except Exception as exc:
logger.error("Pipeline error", exc_info=exc)
async with get_session_factory()() as session:
await transcripts_controller.update(
session,
transcript,
{
"status": "error",
},
)
await transcripts_controller.update(
transcript,
{
"status": "error",
},
)
raise
logger.info("Pipeline ended")

View File

@@ -4,8 +4,11 @@ import tempfile
from pathlib import Path
from typing import Annotated, TypedDict
from profanityfilter import ProfanityFilter
from pydantic import BaseModel, Field, PrivateAttr
from reflector.redis_cache import redis_cache
class DiarizationSegment(TypedDict):
"""Type definition for diarization segment containing speaker information"""
@@ -17,6 +20,9 @@ class DiarizationSegment(TypedDict):
PUNC_RE = re.compile(r"[.;:?!…]")
profanity_filter = ProfanityFilter()
profanity_filter.set_censor("*")
class AudioFile(BaseModel):
name: str
@@ -118,11 +124,21 @@ def words_to_segments(words: list[Word]) -> list[TranscriptSegment]:
class Transcript(BaseModel):
translation: str | None = None
words: list[Word] = []
words: list[Word] = None
@property
def raw_text(self):
# Uncensored text
return "".join([word.text for word in self.words])
@redis_cache(prefix="profanity", duration=3600 * 24 * 7)
def _get_censored_text(self, text: str):
return profanity_filter.censor(text).strip()
@property
def text(self):
return "".join([word.text for word in self.words])
# Censored text
return self._get_censored_text(self.raw_text)
@property
def human_timestamp(self):
@@ -154,6 +170,12 @@ class Transcript(BaseModel):
word.start += offset
word.end += offset
def clone(self):
words = [
Word(text=word.text, start=word.start, end=word.end) for word in self.words
]
return Transcript(text=self.text, translation=self.translation, words=words)
def as_segments(self) -> list[TranscriptSegment]:
return words_to_segments(self.words)

View File

@@ -1,17 +1,10 @@
import asyncio
import functools
import json
from typing import Optional
import redis
import redis.asyncio as redis_async
import structlog
from redis.exceptions import LockError
from reflector.settings import settings
logger = structlog.get_logger(__name__)
redis_clients = {}
@@ -28,12 +21,6 @@ def get_redis_client(db=0):
return redis_clients[db]
async def get_async_redis_client(db: int = 0):
return await redis_async.from_url(
f"redis://{settings.REDIS_HOST}:{settings.REDIS_PORT}/{db}"
)
def redis_cache(prefix="cache", duration=3600, db=settings.REDIS_CACHE_DB, argidx=1):
"""
Cache the result of a function in Redis.
@@ -62,87 +49,3 @@ def redis_cache(prefix="cache", duration=3600, db=settings.REDIS_CACHE_DB, argid
return wrapper
return decorator
class RedisAsyncLock:
def __init__(
self,
key: str,
timeout: int = 120,
extend_interval: int = 30,
skip_if_locked: bool = False,
blocking: bool = True,
blocking_timeout: Optional[float] = None,
):
self.key = f"async_lock:{key}"
self.timeout = timeout
self.extend_interval = extend_interval
self.skip_if_locked = skip_if_locked
self.blocking = blocking
self.blocking_timeout = blocking_timeout
self._lock = None
self._redis = None
self._extend_task = None
self._acquired = False
async def _extend_lock_periodically(self):
while True:
try:
await asyncio.sleep(self.extend_interval)
if self._lock:
await self._lock.extend(self.timeout, replace_ttl=True)
logger.debug("Extended lock", key=self.key)
except LockError:
logger.warning("Failed to extend lock", key=self.key)
break
except asyncio.CancelledError:
break
except Exception as e:
logger.error("Error extending lock", key=self.key, error=str(e))
break
async def __aenter__(self):
self._redis = await get_async_redis_client()
self._lock = self._redis.lock(
self.key,
timeout=self.timeout,
blocking=self.blocking,
blocking_timeout=self.blocking_timeout,
)
self._acquired = await self._lock.acquire()
if not self._acquired:
if self.skip_if_locked:
logger.warning(
"Lock already acquired by another process, skipping", key=self.key
)
return self
else:
raise LockError(f"Failed to acquire lock: {self.key}")
self._extend_task = asyncio.create_task(self._extend_lock_periodically())
logger.info("Acquired lock", key=self.key)
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
if self._extend_task:
self._extend_task.cancel()
try:
await self._extend_task
except asyncio.CancelledError:
pass
if self._acquired and self._lock:
try:
await self._lock.release()
logger.info("Released lock", key=self.key)
except LockError:
logger.debug("Lock already released or expired", key=self.key)
if self._redis:
await self._redis.aclose()
@property
def acquired(self) -> bool:
return self._acquired

View File

@@ -1,412 +0,0 @@
"""
ICS Calendar Synchronization Service
This module provides services for fetching, parsing, and synchronizing ICS (iCalendar)
calendar feeds with room booking data in the database.
Key Components:
- ICSFetchService: Handles HTTP fetching and parsing of ICS calendar data
- ICSSyncService: Manages the synchronization process between ICS feeds and database
Example Usage:
# Sync a room's calendar
room = Room(id="room1", name="conference-room", ics_url="https://cal.example.com/room.ics")
result = await ics_sync_service.sync_room_calendar(room)
# Result structure:
{
"status": "success", # success|unchanged|error|skipped
"hash": "abc123...", # MD5 hash of ICS content
"events_found": 5, # Events matching this room
"total_events": 12, # Total events in calendar within time window
"events_created": 2, # New events added to database
"events_updated": 3, # Existing events modified
"events_deleted": 1 # Events soft-deleted (no longer in calendar)
}
Event Matching:
Events are matched to rooms by checking if the room's full URL appears in the
event's LOCATION or DESCRIPTION fields. Only events within a 25-hour window
(1 hour ago to 24 hours from now) are processed.
Input: ICS calendar URL (e.g., "https://calendar.google.com/calendar/ical/...")
Output: EventData objects with structured calendar information:
{
"ics_uid": "event123@google.com",
"title": "Team Meeting",
"description": "Weekly sync meeting",
"location": "https://meet.company.com/conference-room",
"start_time": datetime(2024, 1, 15, 14, 0, tzinfo=UTC),
"end_time": datetime(2024, 1, 15, 15, 0, tzinfo=UTC),
"attendees": [
{"email": "user@company.com", "name": "John Doe", "role": "ORGANIZER"},
{"email": "attendee@company.com", "name": "Jane Smith", "status": "ACCEPTED"}
],
"ics_raw_data": "BEGIN:VEVENT\nUID:event123@google.com\n..."
}
"""
import hashlib
from datetime import date, datetime, timedelta, timezone
from enum import Enum
from typing import TypedDict
import httpx
import pytz
import structlog
from icalendar import Calendar, Event
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.db.calendar_events import CalendarEvent, calendar_events_controller
from reflector.db.rooms import Room, rooms_controller
from reflector.redis_cache import RedisAsyncLock
from reflector.settings import settings
logger = structlog.get_logger()
EVENT_WINDOW_DELTA_START = timedelta(hours=-1)
EVENT_WINDOW_DELTA_END = timedelta(hours=24)
class SyncStatus(str, Enum):
SUCCESS = "success"
UNCHANGED = "unchanged"
ERROR = "error"
SKIPPED = "skipped"
class AttendeeData(TypedDict, total=False):
email: str | None
name: str | None
status: str | None
role: str | None
class EventData(TypedDict):
ics_uid: str
title: str | None
description: str | None
location: str | None
start_time: datetime
end_time: datetime
attendees: list[AttendeeData]
ics_raw_data: str
class SyncStats(TypedDict):
events_created: int
events_updated: int
events_deleted: int
class SyncResultBase(TypedDict):
status: SyncStatus
class SyncResult(SyncResultBase, total=False):
hash: str | None
events_found: int
total_events: int
events_created: int
events_updated: int
events_deleted: int
error: str | None
reason: str | None
class ICSFetchService:
def __init__(self):
self.client = httpx.AsyncClient(
timeout=30.0, headers={"User-Agent": "Reflector/1.0"}
)
async def fetch_ics(self, url: str) -> str:
response = await self.client.get(url)
response.raise_for_status()
return response.text
def parse_ics(self, ics_content: str) -> Calendar:
return Calendar.from_ical(ics_content)
def extract_room_events(
self, calendar: Calendar, room_name: str, room_url: str
) -> tuple[list[EventData], int]:
events = []
total_events = 0
now = datetime.now(timezone.utc)
window_start = now + EVENT_WINDOW_DELTA_START
window_end = now + EVENT_WINDOW_DELTA_END
for component in calendar.walk():
if component.name != "VEVENT":
continue
status = component.get("STATUS", "").upper()
if status == "CANCELLED":
continue
# Count total non-cancelled events in the time window
event_data = self._parse_event(component)
if event_data and window_start <= event_data["start_time"] <= window_end:
total_events += 1
# Check if event matches this room
if self._event_matches_room(component, room_name, room_url):
events.append(event_data)
return events, total_events
def _event_matches_room(self, event: Event, room_name: str, room_url: str) -> bool:
location = str(event.get("LOCATION", ""))
description = str(event.get("DESCRIPTION", ""))
# Only match full room URL
# XXX leaved here as a patterns, to later be extended with tinyurl or such too
patterns = [
room_url,
]
# Check location and description for patterns
text_to_check = f"{location} {description}".lower()
for pattern in patterns:
if pattern.lower() in text_to_check:
return True
return False
def _parse_event(self, event: Event) -> EventData | None:
uid = str(event.get("UID", ""))
summary = str(event.get("SUMMARY", ""))
description = str(event.get("DESCRIPTION", ""))
location = str(event.get("LOCATION", ""))
dtstart = event.get("DTSTART")
dtend = event.get("DTEND")
if not dtstart:
return None
# Convert fields
start_time = self._normalize_datetime(
dtstart.dt if hasattr(dtstart, "dt") else dtstart
)
end_time = (
self._normalize_datetime(dtend.dt if hasattr(dtend, "dt") else dtend)
if dtend
else start_time + timedelta(hours=1)
)
attendees = self._parse_attendees(event)
# Get raw event data for storage
raw_data = event.to_ical().decode("utf-8")
return {
"ics_uid": uid,
"title": summary,
"description": description,
"location": location,
"start_time": start_time,
"end_time": end_time,
"attendees": attendees,
"ics_raw_data": raw_data,
}
def _normalize_datetime(self, dt) -> datetime:
# Ensure datetime is with timezone, if not, assume UTC
if isinstance(dt, date) and not isinstance(dt, datetime):
dt = datetime.combine(dt, datetime.min.time())
dt = pytz.UTC.localize(dt)
elif isinstance(dt, datetime):
if dt.tzinfo is None:
dt = pytz.UTC.localize(dt)
else:
dt = dt.astimezone(pytz.UTC)
return dt
def _parse_attendees(self, event: Event) -> list[AttendeeData]:
# Extracts attendee information from both ATTENDEE and ORGANIZER properties.
# Handles malformed comma-separated email addresses in single ATTENDEE fields
# by splitting them into separate attendee entries. Returns a list of attendee
# data including email, name, status, and role information.
final_attendees = []
attendees = event.get("ATTENDEE", [])
if not isinstance(attendees, list):
attendees = [attendees]
for att in attendees:
email_str = str(att).replace("mailto:", "") if att else None
# Handle malformed comma-separated email addresses in a single ATTENDEE field
if email_str and "," in email_str:
# Split comma-separated emails and create separate attendee entries
email_parts = [email.strip() for email in email_str.split(",")]
for email in email_parts:
if email and "@" in email:
clean_email = email.replace("MAILTO:", "").replace(
"mailto:", ""
)
att_data: AttendeeData = {
"email": clean_email,
"name": att.params.get("CN")
if hasattr(att, "params") and email == email_parts[0]
else None,
"status": att.params.get("PARTSTAT")
if hasattr(att, "params") and email == email_parts[0]
else None,
"role": att.params.get("ROLE")
if hasattr(att, "params") and email == email_parts[0]
else None,
}
final_attendees.append(att_data)
else:
# Normal single attendee
att_data: AttendeeData = {
"email": email_str,
"name": att.params.get("CN") if hasattr(att, "params") else None,
"status": att.params.get("PARTSTAT")
if hasattr(att, "params")
else None,
"role": att.params.get("ROLE") if hasattr(att, "params") else None,
}
final_attendees.append(att_data)
# Add organizer
organizer = event.get("ORGANIZER")
if organizer:
org_email = (
str(organizer).replace("mailto:", "").replace("MAILTO:", "")
if organizer
else None
)
org_data: AttendeeData = {
"email": org_email,
"name": organizer.params.get("CN")
if hasattr(organizer, "params")
else None,
"role": "ORGANIZER",
}
final_attendees.append(org_data)
return final_attendees
class ICSSyncService:
def __init__(self):
self.fetch_service = ICSFetchService()
async def sync_room_calendar(self, session: AsyncSession, room: Room) -> SyncResult:
async with RedisAsyncLock(
f"ics_sync_room:{room.id}", skip_if_locked=True
) as lock:
if not lock.acquired:
logger.warning("ICS sync already in progress for room", room_id=room.id)
return {
"status": SyncStatus.SKIPPED,
"reason": "Sync already in progress",
}
return await self._sync_room_calendar(session, room)
async def _sync_room_calendar(
self, session: AsyncSession, room: Room
) -> SyncResult:
if not room.ics_enabled or not room.ics_url:
return {"status": SyncStatus.SKIPPED, "reason": "ICS not configured"}
try:
if not self._should_sync(room):
return {"status": SyncStatus.SKIPPED, "reason": "Not time to sync yet"}
ics_content = await self.fetch_service.fetch_ics(room.ics_url)
calendar = self.fetch_service.parse_ics(ics_content)
content_hash = hashlib.md5(ics_content.encode()).hexdigest()
if room.ics_last_etag == content_hash:
logger.info("No changes in ICS for room", room_id=room.id)
room_url = f"{settings.UI_BASE_URL}/{room.name}"
events, total_events = self.fetch_service.extract_room_events(
calendar, room.name, room_url
)
return {
"status": SyncStatus.UNCHANGED,
"hash": content_hash,
"events_found": len(events),
"total_events": total_events,
"events_created": 0,
"events_updated": 0,
"events_deleted": 0,
}
# Extract matching events
room_url = f"{settings.UI_BASE_URL}/{room.name}"
events, total_events = self.fetch_service.extract_room_events(
calendar, room.name, room_url
)
sync_result = await self._sync_events_to_database(session, room.id, events)
# Update room sync metadata
await rooms_controller.update(
session,
room,
{
"ics_last_sync": datetime.now(timezone.utc),
"ics_last_etag": content_hash,
},
mutate=False,
)
return {
"status": SyncStatus.SUCCESS,
"hash": content_hash,
"events_found": len(events),
"total_events": total_events,
**sync_result,
}
except Exception as e:
logger.error("Failed to sync ICS for room", room_id=room.id, error=str(e))
return {"status": SyncStatus.ERROR, "error": str(e)}
def _should_sync(self, room: Room) -> bool:
if not room.ics_last_sync:
return True
time_since_sync = datetime.now(timezone.utc) - room.ics_last_sync
return time_since_sync.total_seconds() >= room.ics_fetch_interval
async def _sync_events_to_database(
self, session: AsyncSession, room_id: str, events: list[EventData]
) -> SyncStats:
created = 0
updated = 0
current_ics_uids = []
for event_data in events:
calendar_event = CalendarEvent(room_id=room_id, **event_data)
existing = await calendar_events_controller.get_by_ics_uid(
session, room_id, event_data["ics_uid"]
)
if existing:
updated += 1
else:
created += 1
await calendar_events_controller.upsert(session, calendar_event)
current_ics_uids.append(event_data["ics_uid"])
# Soft delete events that are no longer in calendar
deleted = await calendar_events_controller.soft_delete_missing(
session, room_id, current_ics_uids
)
return {
"events_created": created,
"events_updated": updated,
"events_deleted": deleted,
}
ics_sync_service = ICSSyncService()

View File

@@ -123,12 +123,23 @@ class Settings(BaseSettings):
# Whereby integration
WHEREBY_API_URL: str = "https://api.whereby.dev/v1"
WHEREBY_API_KEY: NonEmptyString | None = None
# Jibri integration
JIBRI_RECORDINGS_PATH: str = "/recordings"
WHEREBY_WEBHOOK_SECRET: str | None = None
AWS_WHEREBY_ACCESS_KEY_ID: str | None = None
AWS_WHEREBY_ACCESS_KEY_SECRET: str | None = None
AWS_PROCESS_RECORDING_QUEUE_URL: str | None = None
SQS_POLLING_TIMEOUT_SECONDS: int = 60
# Jitsi Meet
JITSI_DOMAIN: str = "meet.jit.si"
JITSI_JWT_SECRET: str | None = None
JITSI_WEBHOOK_SECRET: str | None = None
JITSI_APP_ID: str = "reflector"
JITSI_JWT_ISSUER: str = "reflector"
JITSI_JWT_AUDIENCE: str = "jitsi"
# Zulip integration
ZULIP_REALM: str | None = None
ZULIP_API_KEY: str | None = None

View File

@@ -9,12 +9,12 @@ async def export_db(filename: str) -> None:
filename = pathlib.Path(filename).resolve()
settings.DATABASE_URL = f"sqlite:///{filename}"
from reflector.db import get_session_factory
from reflector.db.transcripts import transcripts_controller
from reflector.db import get_database, transcripts
session_factory = get_session_factory()
async with session_factory() as session:
transcripts = await transcripts_controller.get_all(session)
database = get_database()
await database.connect()
transcripts = await database.fetch_all(transcripts.select())
await database.disconnect()
def export_transcript(transcript, output_dir):
for topic in transcript.topics:

View File

@@ -8,12 +8,12 @@ async def export_db(filename: str) -> None:
filename = pathlib.Path(filename).resolve()
settings.DATABASE_URL = f"sqlite:///{filename}"
from reflector.db import get_session_factory
from reflector.db.transcripts import transcripts_controller
from reflector.db import get_database, transcripts
session_factory = get_session_factory()
async with session_factory() as session:
transcripts = await transcripts_controller.get_all(session)
database = get_database()
await database.connect()
transcripts = await database.fetch_all(transcripts.select())
await database.disconnect()
def export_transcript(transcript):
tid = transcript.id

View File

@@ -11,9 +11,6 @@ import time
from pathlib import Path
from typing import Any, Dict, List, Literal
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.db import get_session_factory
from reflector.db.transcripts import SourceKind, TranscriptTopic, transcripts_controller
from reflector.logger import logger
from reflector.pipelines.main_file_pipeline import (
@@ -53,7 +50,6 @@ TranscriptId = str
# common interface for every flow: it needs an Entry in db with specific ceremony (file path + status + actual file in file system)
# ideally we want to get rid of it at some point
async def prepare_entry(
session: AsyncSession,
source_path: str,
source_language: str,
target_language: str,
@@ -61,7 +57,6 @@ async def prepare_entry(
file_path = Path(source_path)
transcript = await transcripts_controller.add(
session,
file_path.name,
# note that the real file upload has SourceKind: LIVE for the reason of it's an error
source_kind=SourceKind.FILE,
@@ -83,20 +78,16 @@ async def prepare_entry(
logger.info(f"Copied {source_path} to {upload_path}")
# pipelines expect entity status "uploaded"
await transcripts_controller.update(session, transcript, {"status": "uploaded"})
await transcripts_controller.update(transcript, {"status": "uploaded"})
return transcript.id
# same reason as prepare_entry
async def extract_result_from_entry(
session: AsyncSession,
transcript_id: TranscriptId,
output_path: str,
transcript_id: TranscriptId, output_path: str
) -> None:
post_final_transcript = await transcripts_controller.get_by_id(
session, transcript_id
)
post_final_transcript = await transcripts_controller.get_by_id(transcript_id)
# assert post_final_transcript.status == "ended"
# File pipeline doesn't set status to "ended", only live pipeline does https://github.com/Monadical-SAS/reflector/issues/582
@@ -124,7 +115,6 @@ async def extract_result_from_entry(
async def process_live_pipeline(
session: AsyncSession,
transcript_id: TranscriptId,
):
"""Process transcript_id with transcription and diarization"""
@@ -133,9 +123,7 @@ async def process_live_pipeline(
await live_pipeline_process(transcript_id=transcript_id)
print(f"Processing complete for transcript {transcript_id}", file=sys.stderr)
pre_final_transcript = await transcripts_controller.get_by_id(
session, transcript_id
)
pre_final_transcript = await transcripts_controller.get_by_id(transcript_id)
# assert documented behaviour: after process, the pipeline isn't ended. this is the reason of calling pipeline_post
assert pre_final_transcript.status != "ended"
@@ -172,17 +160,21 @@ async def process(
pipeline: Literal["live", "file"],
output_path: str = None,
):
session_factory = get_session_factory()
async with session_factory() as session:
from reflector.db import get_database
database = get_database()
# db connect is a part of ceremony
await database.connect()
try:
transcript_id = await prepare_entry(
session,
source_path,
source_language,
target_language,
)
pipeline_handlers = {
"live": lambda tid: process_live_pipeline(session, tid),
"live": process_live_pipeline,
"file": process_file_pipeline,
}
@@ -192,7 +184,9 @@ async def process(
await handler(transcript_id)
await extract_result_from_entry(session, transcript_id, output_path)
await extract_result_from_entry(transcript_id, output_path)
finally:
await database.disconnect()
if __name__ == "__main__":

View File

@@ -0,0 +1,17 @@
# Video Platform Abstraction Layer
"""
This module provides an abstraction layer for different video conferencing platforms.
It allows seamless switching between providers (Whereby, Daily.co, etc.) without
changing the core application logic.
"""
from .base import MeetingData, VideoPlatformClient, VideoPlatformConfig
from .registry import get_platform_client, register_platform
__all__ = [
"VideoPlatformClient",
"VideoPlatformConfig",
"MeetingData",
"get_platform_client",
"register_platform",
]

View File

@@ -0,0 +1,82 @@
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Any, Dict, Optional
from pydantic import BaseModel
from reflector.db.rooms import Room
class MeetingData(BaseModel):
"""Standardized meeting data returned by all platforms."""
meeting_id: str
room_name: str
room_url: str
host_room_url: str
platform: str
extra_data: Dict[str, Any] = {} # Platform-specific data
class VideoPlatformConfig(BaseModel):
"""Configuration for a video platform."""
api_key: str
webhook_secret: str
api_url: Optional[str] = None
subdomain: Optional[str] = None
s3_bucket: Optional[str] = None
s3_region: Optional[str] = None
aws_role_arn: Optional[str] = None
aws_access_key_id: Optional[str] = None
aws_access_key_secret: Optional[str] = None
class VideoPlatformClient(ABC):
"""Abstract base class for video platform integrations."""
PLATFORM_NAME: str = ""
def __init__(self, config: VideoPlatformConfig):
self.config = config
@abstractmethod
async def create_meeting(
self, room_name_prefix: str, end_date: datetime, room: Room
) -> MeetingData:
"""Create a new meeting room."""
pass
@abstractmethod
async def get_room_sessions(self, room_name: str) -> Dict[str, Any]:
"""Get session information for a room."""
pass
@abstractmethod
async def delete_room(self, room_name: str) -> bool:
"""Delete a room. Returns True if successful."""
pass
@abstractmethod
async def upload_logo(self, room_name: str, logo_path: str) -> bool:
"""Upload a logo to the room. Returns True if successful."""
pass
@abstractmethod
def verify_webhook_signature(
self, body: bytes, signature: str, timestamp: Optional[str] = None
) -> bool:
"""Verify webhook signature for security."""
pass
def format_recording_config(self, room: Room) -> Dict[str, Any]:
"""Format recording configuration for the platform.
Can be overridden by specific implementations."""
if room.recording_type == "cloud" and self.config.s3_bucket:
return {
"type": room.recording_type,
"bucket": self.config.s3_bucket,
"region": self.config.s3_region,
"trigger": room.recording_trigger,
}
return {"type": room.recording_type}

View File

@@ -0,0 +1,54 @@
"""Factory for creating video platform clients based on configuration."""
from typing import TYPE_CHECKING, Literal, Optional, overload
from reflector.db.rooms import VideoPlatform
from reflector.settings import settings
from .base import VideoPlatformClient, VideoPlatformConfig
from .registry import get_platform_client
if TYPE_CHECKING:
from .jitsi import JitsiClient
from .whereby import WherebyClient
def get_platform_config(platform: str) -> VideoPlatformConfig:
"""Get configuration for a specific platform."""
if platform == VideoPlatform.WHEREBY:
return VideoPlatformConfig(
api_key=settings.WHEREBY_API_KEY or "",
webhook_secret=settings.WHEREBY_WEBHOOK_SECRET or "",
api_url=settings.WHEREBY_API_URL,
s3_bucket=settings.RECORDING_STORAGE_AWS_BUCKET_NAME,
aws_access_key_id=settings.AWS_WHEREBY_ACCESS_KEY_ID,
aws_access_key_secret=settings.AWS_WHEREBY_ACCESS_KEY_SECRET,
)
elif platform == VideoPlatform.JITSI:
return VideoPlatformConfig(
api_key="", # Jitsi uses JWT, no API key
webhook_secret=settings.JITSI_WEBHOOK_SECRET or "",
api_url=f"https://{settings.JITSI_DOMAIN}",
)
else:
raise ValueError(f"Unknown platform: {platform}")
@overload
def create_platform_client(platform: Literal["jitsi"]) -> "JitsiClient": ...
@overload
def create_platform_client(platform: Literal["whereby"]) -> "WherebyClient": ...
def create_platform_client(platform: str) -> VideoPlatformClient:
"""Create a video platform client instance."""
config = get_platform_config(platform)
return get_platform_client(platform, config)
def get_platform_for_room(room_id: Optional[str] = None) -> str:
"""Determine which platform to use for a room based on feature flags."""
# For now, default to whereby since we don't have feature flags yet
return VideoPlatform.WHEREBY

View File

@@ -0,0 +1,4 @@
from .client import JitsiClient, JitsiMeetingData
from .router import router
__all__ = ["JitsiClient", "JitsiMeetingData", "router"]

View File

@@ -0,0 +1,111 @@
import hmac
from datetime import datetime, timezone
from hashlib import sha256
from typing import Any, Dict, Optional
import jwt
from reflector.db.rooms import Room, VideoPlatform
from reflector.settings import settings
from reflector.utils import generate_uuid4
from ..base import MeetingData, VideoPlatformClient
class JitsiMeetingData(MeetingData):
@property
def user_jwt(self) -> str:
return self.extra_data.get("user_jwt", "")
@property
def host_jwt(self) -> str:
return self.extra_data.get("host_jwt", "")
@property
def domain(self) -> str:
return self.extra_data.get("domain", "")
class JitsiClient(VideoPlatformClient):
PLATFORM_NAME = VideoPlatform.JITSI
def _generate_jwt(self, room: str, moderator: bool, exp: datetime) -> str:
if not settings.JITSI_JWT_SECRET:
raise ValueError("JITSI_JWT_SECRET is required for JWT generation")
payload = {
"aud": settings.JITSI_JWT_AUDIENCE,
"iss": settings.JITSI_JWT_ISSUER,
"sub": settings.JITSI_DOMAIN,
"room": room,
"exp": int(exp.timestamp()),
"context": {
"user": {
"name": "Reflector User",
"moderator": moderator,
},
"features": {
"recording": True,
"livestreaming": False,
},
},
}
return jwt.encode(payload, settings.JITSI_JWT_SECRET, algorithm="HS256")
async def create_meeting(
self, room_name_prefix: str, end_date: datetime, room: Room
) -> JitsiMeetingData:
jitsi_room = f"reflector-{room.name}-{generate_uuid4()}"
user_jwt = self._generate_jwt(room=jitsi_room, moderator=False, exp=end_date)
host_jwt = self._generate_jwt(room=jitsi_room, moderator=True, exp=end_date)
room_url = f"https://{settings.JITSI_DOMAIN}/{jitsi_room}?jwt={user_jwt}"
host_room_url = f"https://{settings.JITSI_DOMAIN}/{jitsi_room}?jwt={host_jwt}"
return JitsiMeetingData(
meeting_id=generate_uuid4(),
room_name=jitsi_room,
room_url=room_url,
host_room_url=host_room_url,
platform=self.PLATFORM_NAME,
extra_data={
"user_jwt": user_jwt,
"host_jwt": host_jwt,
"domain": settings.JITSI_DOMAIN,
},
)
async def get_room_sessions(self, room_name: str) -> Dict[str, Any]:
return {
"roomName": room_name,
"sessions": [
{
"sessionId": generate_uuid4(),
"startTime": datetime.now(tz=timezone.utc).isoformat(),
"participants": [],
"isActive": True,
}
],
}
async def delete_room(self, room_name: str) -> bool:
return True
async def upload_logo(self, room_name: str, logo_path: str) -> bool:
return True
def verify_webhook_signature(
self, body: bytes, signature: str, timestamp: Optional[str] = None
) -> bool:
if not signature or not self.config.webhook_secret:
return False
try:
expected = hmac.new(
self.config.webhook_secret.encode(), body, sha256
).hexdigest()
return hmac.compare_digest(expected, signature)
except Exception:
return False

View File

@@ -0,0 +1,165 @@
import hmac
from datetime import datetime
from hashlib import sha256
from typing import Any, Dict
from fastapi import APIRouter, HTTPException, Request
from pydantic import BaseModel
from reflector.db.meetings import meetings_controller
from reflector.settings import settings
try:
from reflector.video_platforms import create_platform_client
except ImportError:
# PyJWT not yet installed, will be added in final task
def create_platform_client(platform: str):
return None
router = APIRouter()
class JitsiWebhookEvent(BaseModel):
event: str
room: str
timestamp: datetime
data: Dict[str, Any] = {}
class JibriRecordingEvent(BaseModel):
room_name: str
recording_file: str
recording_status: str
timestamp: datetime
def verify_jitsi_webhook_signature(body: bytes, signature: str) -> bool:
"""Verify Jitsi webhook signature using HMAC-SHA256."""
if not signature or not settings.JITSI_WEBHOOK_SECRET:
return False
try:
client = create_platform_client("jitsi")
if client is None:
# Fallback verification when platform client not available
expected = hmac.new(
settings.JITSI_WEBHOOK_SECRET.encode(), body, sha256
).hexdigest()
return hmac.compare_digest(expected, signature)
return client.verify_webhook_signature(body, signature)
except Exception:
return False
@router.post("/jitsi/events")
async def jitsi_events_webhook(event: JitsiWebhookEvent, request: Request):
"""
Handle Prosody event-sync webhooks from Jitsi Meet.
Expected event types:
- muc-occupant-joined: participant joined the room
- muc-occupant-left: participant left the room
- jibri-recording-on: recording started
- jibri-recording-off: recording stopped
"""
# Verify webhook signature
body = await request.body()
signature = request.headers.get("x-jitsi-signature", "")
if not verify_jitsi_webhook_signature(body, signature):
raise HTTPException(status_code=401, detail="Invalid webhook signature")
# Find meeting by room name
meeting = await meetings_controller.get_by_room_name(event.room)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
# Handle participant events
if event.event == "muc-occupant-joined":
# Store event and update participant count
await meetings_controller.participant_joined(
meeting.id, {"timestamp": event.timestamp, "data": event.data}
)
current_count = getattr(meeting, "num_clients", 0)
await meetings_controller.update_meeting(
meeting.id, num_clients=current_count + 1
)
elif event.event == "muc-occupant-left":
# Store event and update participant count
await meetings_controller.participant_left(
meeting.id, {"timestamp": event.timestamp, "data": event.data}
)
current_count = getattr(meeting, "num_clients", 0)
await meetings_controller.update_meeting(
meeting.id, num_clients=max(0, current_count - 1)
)
elif event.event == "jibri-recording-on":
# Store recording started event
await meetings_controller.recording_started(
meeting.id, {"timestamp": event.timestamp, "data": event.data}
)
elif event.event == "jibri-recording-off":
# Store recording stopped event
await meetings_controller.recording_stopped(
meeting.id, {"timestamp": event.timestamp, "data": event.data}
)
return {"status": "ok", "event": event.event, "room": event.room}
@router.post("/jibri/recording-complete")
async def jibri_recording_complete(event: JibriRecordingEvent, request: Request):
"""
Handle Jibri recording completion webhook.
This endpoint is called by the Jibri finalize script when a recording
is completed and uploaded to storage.
"""
# Verify webhook signature
body = await request.body()
signature = request.headers.get("x-jitsi-signature", "")
if not verify_jitsi_webhook_signature(body, signature):
raise HTTPException(status_code=401, detail="Invalid webhook signature")
# Find meeting by room name
meeting = await meetings_controller.get_by_room_name(event.room_name)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
# Store recording completion event
await meetings_controller.add_event(
meeting.id,
"recording_completed",
{
"recording_file": event.recording_file,
"recording_status": event.recording_status,
"timestamp": event.timestamp,
},
)
# TODO: Trigger recording processing pipeline
# This is where we would:
# 1. Download the recording file from Jibri storage
# 2. Create a transcript record in the database
# 3. Queue the audio processing tasks (chunking, transcription, etc.)
# 4. Update meeting status to indicate recording is being processed
return {
"status": "ok",
"room_name": event.room_name,
"recording_file": event.recording_file,
"message": "Recording processing queued",
}
@router.get("/jitsi/health")
async def jitsi_health_check():
"""Simple health check endpoint for Jitsi webhook configuration."""
return {
"status": "ok",
"service": "jitsi-webhooks",
"timestamp": datetime.now(tz=timezone.utc).isoformat(),
"webhook_secret_configured": bool(settings.JITSI_WEBHOOK_SECRET),
}

View File

@@ -0,0 +1,3 @@
"""Jitsi-specific worker tasks."""
# Placeholder for Jitsi recording tasks

View File

@@ -0,0 +1,56 @@
from typing import TYPE_CHECKING, Dict, Literal, Type, overload
from .base import VideoPlatformClient, VideoPlatformConfig
if TYPE_CHECKING:
from .jitsi import JitsiClient
from .whereby import WherebyClient
# Registry of available video platforms
_PLATFORMS: Dict[str, Type[VideoPlatformClient]] = {}
def register_platform(name: str, client_class: Type[VideoPlatformClient]):
"""Register a video platform implementation."""
_PLATFORMS[name.lower()] = client_class
@overload
def get_platform_client(
platform: Literal["jitsi"], config: VideoPlatformConfig
) -> "JitsiClient": ...
@overload
def get_platform_client(
platform: Literal["whereby"], config: VideoPlatformConfig
) -> "WherebyClient": ...
def get_platform_client(
platform: str, config: VideoPlatformConfig
) -> VideoPlatformClient:
"""Get a video platform client instance."""
platform_lower = platform.lower()
if platform_lower not in _PLATFORMS:
raise ValueError(f"Unknown video platform: {platform}")
client_class = _PLATFORMS[platform_lower]
return client_class(config)
def get_available_platforms() -> list[str]:
"""Get list of available platform names."""
return list(_PLATFORMS.keys())
# Auto-register built-in platforms
def _register_builtin_platforms():
from .jitsi import JitsiClient
from .whereby import WherebyClient
register_platform("jitsi", JitsiClient)
register_platform("whereby", WherebyClient)
_register_builtin_platforms()

View File

@@ -0,0 +1,6 @@
"""Whereby video platform integration."""
from .client import WherebyClient
from .router import router
__all__ = ["WherebyClient", "router"]

View File

@@ -0,0 +1,113 @@
import hmac
from datetime import datetime
from hashlib import sha256
from typing import Any, Dict, Optional
import httpx
from reflector.db.rooms import Room, VideoPlatform
from reflector.settings import settings
from ..base import MeetingData, VideoPlatformClient
class WherebyClient(VideoPlatformClient):
PLATFORM_NAME = VideoPlatform.WHEREBY
def __init__(self, config):
super().__init__(config)
self.headers = {
"Content-Type": "application/json; charset=utf-8",
"Authorization": f"Bearer {self.config.api_key}",
}
self.timeout = 10
async def create_meeting(
self, room_name_prefix: str, end_date: datetime, room: Room
) -> MeetingData:
data = {
"isLocked": room.is_locked,
"roomNamePrefix": room_name_prefix,
"roomNamePattern": "uuid",
"roomMode": room.room_mode,
"endDate": end_date.isoformat(),
"recording": {
"type": room.recording_type,
"destination": {
"provider": "s3",
"bucket": settings.RECORDING_STORAGE_AWS_BUCKET_NAME,
"accessKeyId": self.config.aws_access_key_id,
"accessKeySecret": self.config.aws_access_key_secret,
"fileFormat": "mp4",
},
"startTrigger": room.recording_trigger,
},
"fields": ["hostRoomUrl"],
}
async with httpx.AsyncClient() as client:
response = await client.post(
f"{self.config.api_url}/meetings",
headers=self.headers,
json=data,
timeout=self.timeout,
)
response.raise_for_status()
meeting_data = response.json()
return MeetingData(
meeting_id=meeting_data["meetingId"],
room_name=meeting_data["roomName"],
room_url=meeting_data["roomUrl"],
host_room_url=meeting_data["hostRoomUrl"],
platform=self.PLATFORM_NAME,
extra_data={
"startDate": meeting_data["startDate"],
"endDate": meeting_data["endDate"],
"recording": meeting_data.get("recording", {}),
},
)
async def get_room_sessions(self, room_name: str) -> Dict[str, Any]:
async with httpx.AsyncClient() as client:
response = await client.get(
f"{self.config.api_url}/insights/room-sessions?roomName={room_name}",
headers=self.headers,
timeout=self.timeout,
)
response.raise_for_status()
return response.json()
async def delete_room(self, room_name: str) -> bool:
return True
async def upload_logo(self, room_name: str, logo_path: str) -> bool:
try:
async with httpx.AsyncClient() as client:
with open(logo_path, "rb") as f:
response = await client.put(
f"{self.config.api_url}/rooms{room_name}/theme/logo",
headers={
"Authorization": f"Bearer {self.config.api_key}",
},
timeout=self.timeout,
files={"image": f},
)
response.raise_for_status()
return True
except Exception:
return False
def verify_webhook_signature(
self, body: bytes, signature: str, timestamp: Optional[str] = None
) -> bool:
if not signature or not self.config.webhook_secret:
return False
try:
expected = hmac.new(
self.config.webhook_secret.encode(), body, sha256
).hexdigest()
return hmac.compare_digest(expected, signature)
except Exception:
return False

View File

@@ -68,7 +68,8 @@ async def whereby_webhook(event: WherebyWebhookEvent, request: Request):
raise HTTPException(status_code=404, detail="Meeting not found")
if event.type in ["room.client.joined", "room.client.left"]:
update_data = {"num_clients": event.data["numClients"]}
await meetings_controller.update_meeting(meeting.id, **update_data)
await meetings_controller.update_meeting(
meeting.id, num_clients=event.data["numClients"]
)
return {"status": "ok"}

View File

@@ -0,0 +1,4 @@
"""Whereby-specific worker tasks."""
# Placeholder for Whereby-specific background tasks
# This can be extended with Whereby-specific processing tasks in the future

View File

@@ -0,0 +1,126 @@
from pathlib import Path
from typing import Annotated, Any, Dict, Optional
import structlog
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
import reflector.auth as auth
from reflector.db.transcripts import SourceKind, transcripts_controller
from reflector.jibri_events import JitsiEventParser
from reflector.pipelines.main_file_pipeline import task_pipeline_file_process
from reflector.settings import settings
logger = structlog.get_logger(__name__)
router = APIRouter(prefix="/api/v1/jibri", tags=["jibri"])
class RecordingReadyRequest(BaseModel):
session_id: str
path: str # Relative path from recordings directory
meeting_url: str
@router.post("/recording-ready")
async def handle_recording_ready(
request: RecordingReadyRequest,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
) -> Dict[str, Any]:
user_id = user["sub"] if user else None
recordings_base = Path(settings.JIBRI_RECORDINGS_PATH or "/recordings")
recording_path = recordings_base / request.path
if not recording_path.exists():
raise HTTPException(
status_code=404, detail=f"Recording path not found: {request.path}"
)
recording_file = recording_path / "recording.mp4"
events_file = recording_path / "events.jsonl"
if not recording_file.exists():
raise HTTPException(status_code=404, detail="Recording file not found")
# Parse events if available
metadata = {}
participant_count = 0
if events_file.exists():
parser = JitsiEventParser()
metadata = parser.parse_events_file(str(recording_path))
participant_count = len(metadata.get("participants", []))
logger.info(
"Parsed Jibri events",
session_id=request.session_id,
event_count=metadata.get("event_count", 0),
participant_count=participant_count,
)
else:
logger.warning("No events file found", session_id=request.session_id)
metadata = {
"room": {"meeting_url": request.meeting_url, "name": request.session_id},
"participants": [],
"speaker_stats": {},
"event_count": 0,
}
# Create transcript using controller
title = f"Meeting: {metadata.get('room', {}).get('name', request.session_id)}"
transcript = await transcripts_controller.add(
name=title,
source_kind=SourceKind.FILE,
source_language="en",
target_language="en",
user_id=user_id,
)
# Store Jitsi data in appropriate fields
update_data = {}
# Store participants if available
if metadata.get("participants"):
update_data["participants"] = metadata["participants"]
# Store events data (room info, speaker stats, etc.)
update_data["events"] = {
"jitsi_metadata": metadata,
"session_id": request.session_id,
"recording_path": str(recording_path),
"meeting_url": request.meeting_url,
}
if update_data:
await transcripts_controller.update(transcript, update_data)
# Copy recording file to transcript data path
# The pipeline expects the file to be in the transcript's data path
upload_file = transcript.data_path / "upload.mp4"
upload_file.parent.mkdir(parents=True, exist_ok=True)
# Create symlink or copy the file
import shutil
shutil.copy2(recording_file, upload_file)
# Update status to uploaded
await transcripts_controller.update(transcript, {"status": "uploaded"})
# Trigger processing pipeline
task_pipeline_file_process.delay(transcript_id=transcript.id)
logger.info(
"Jibri recording ready for processing",
transcript_id=transcript.id,
session_id=request.session_id,
participant_count=participant_count,
)
return {
"status": "accepted",
"transcript_id": transcript.id,
"session_id": request.session_id,
"events_found": events_file.exists(),
"participant_count": participant_count,
}

View File

@@ -10,7 +10,6 @@ from reflector.db.meetings import (
meeting_consent_controller,
meetings_controller,
)
from reflector.db.rooms import rooms_controller
router = APIRouter()
@@ -42,34 +41,3 @@ async def meeting_audio_consent(
updated_consent = await meeting_consent_controller.upsert(consent)
return {"status": "success", "consent_id": updated_consent.id}
@router.patch("/meetings/{meeting_id}/deactivate")
async def meeting_deactivate(
meeting_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user)],
):
user_id = user["sub"] if user else None
if not user_id:
raise HTTPException(status_code=401, detail="Authentication required")
meeting = await meetings_controller.get_by_id(meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
if not meeting.is_active:
return {"status": "success", "meeting_id": meeting_id}
# Only room owner or meeting creator can deactivate
room = await rooms_controller.get_by_id(meeting.room_id)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
if user_id != room.user_id and user_id != meeting.user_id:
raise HTTPException(
status_code=403, detail="Only the room owner can deactivate meetings"
)
await meetings_controller.update_meeting(meeting_id, is_active=False)
return {"status": "success", "meeting_id": meeting_id}

View File

@@ -1,28 +1,35 @@
import logging
import sqlite3
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Annotated, Any, Literal, Optional
from typing import Annotated, Literal, Optional
import asyncpg.exceptions
from fastapi import APIRouter, Depends, HTTPException
from fastapi_pagination import Page
from fastapi_pagination.ext.sqlalchemy import paginate
from fastapi_pagination.ext.databases import apaginate
from pydantic import BaseModel
from redis.exceptions import LockError
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db.calendar_events import calendar_events_controller
from reflector.db import get_database
from reflector.db.meetings import meetings_controller
from reflector.db.rooms import rooms_controller
from reflector.redis_cache import RedisAsyncLock
from reflector.services.ics_sync import ics_sync_service
from reflector.db.rooms import VideoPlatform, rooms_controller
from reflector.settings import settings
from reflector.whereby import create_meeting, upload_logo
from reflector.video_platforms.factory import (
create_platform_client,
)
from reflector.worker.webhook import test_webhook
logger = logging.getLogger(__name__)
router = APIRouter()
def parse_datetime_with_timezone(iso_string: str) -> datetime:
dt = datetime.fromisoformat(iso_string)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
class Room(BaseModel):
id: str
@@ -37,11 +44,7 @@ class Room(BaseModel):
recording_type: str
recording_trigger: str
is_shared: bool
ics_url: Optional[str] = None
ics_fetch_interval: int = 300
ics_enabled: bool = False
ics_last_sync: Optional[datetime] = None
ics_last_etag: Optional[str] = None
platform: VideoPlatform = VideoPlatform.WHEREBY
class RoomDetails(Room):
@@ -53,22 +56,10 @@ class Meeting(BaseModel):
id: str
room_name: str
room_url: str
# TODO it's not always present, | None
host_room_url: str
start_date: datetime
end_date: datetime
user_id: str | None = None
room_id: str | None = None
is_locked: bool = False
room_mode: Literal["normal", "group"] = "normal"
recording_type: Literal["none", "local", "cloud"] = "cloud"
recording_trigger: Literal[
"none", "prompt", "automatic", "automatic-2nd-participant"
] = "automatic-2nd-participant"
num_clients: int = 0
is_active: bool = True
calendar_event_id: str | None = None
calendar_metadata: dict[str, Any] | None = None
class CreateRoom(BaseModel):
@@ -83,30 +74,22 @@ class CreateRoom(BaseModel):
is_shared: bool
webhook_url: str
webhook_secret: str
ics_url: Optional[str] = None
ics_fetch_interval: int = 300
ics_enabled: bool = False
platform: VideoPlatform
class UpdateRoom(BaseModel):
name: Optional[str] = None
zulip_auto_post: Optional[bool] = None
zulip_stream: Optional[str] = None
zulip_topic: Optional[str] = None
is_locked: Optional[bool] = None
room_mode: Optional[str] = None
recording_type: Optional[str] = None
recording_trigger: Optional[str] = None
is_shared: Optional[bool] = None
webhook_url: Optional[str] = None
webhook_secret: Optional[str] = None
ics_url: Optional[str] = None
ics_fetch_interval: Optional[int] = None
ics_enabled: Optional[bool] = None
class CreateRoomMeeting(BaseModel):
allow_duplicated: Optional[bool] = False
name: str
zulip_auto_post: bool
zulip_stream: str
zulip_topic: str
is_locked: bool
room_mode: str
recording_type: str
recording_trigger: str
is_shared: bool
webhook_url: str
webhook_secret: str
platform: VideoPlatform
class DeletionStatus(BaseModel):
@@ -121,123 +104,43 @@ class WebhookTestResult(BaseModel):
response_preview: str | None = None
class ICSStatus(BaseModel):
status: Literal["enabled", "disabled"]
last_sync: Optional[datetime] = None
next_sync: Optional[datetime] = None
last_etag: Optional[str] = None
events_count: int = 0
class SyncStatus(str, Enum):
success = "success"
unchanged = "unchanged"
error = "error"
skipped = "skipped"
class ICSSyncResult(BaseModel):
status: SyncStatus
hash: Optional[str] = None
events_found: int = 0
total_events: int = 0
events_created: int = 0
events_updated: int = 0
events_deleted: int = 0
error: Optional[str] = None
reason: Optional[str] = None
class CalendarEventResponse(BaseModel):
id: str
room_id: str
ics_uid: str
title: Optional[str] = None
description: Optional[str] = None
start_time: datetime
end_time: datetime
attendees: Optional[list[dict]] = None
location: Optional[str] = None
last_synced: datetime
created_at: datetime
updated_at: datetime
router = APIRouter()
def parse_datetime_with_timezone(iso_string: str) -> datetime:
"""Parse ISO datetime string and ensure timezone awareness (defaults to UTC if naive)."""
dt = datetime.fromisoformat(iso_string)
if dt.tzinfo is None:
dt = dt.replace(tzinfo=timezone.utc)
return dt
@router.get("/rooms", response_model=Page[RoomDetails])
async def rooms_list(
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> list[RoomDetails]:
if not user and not settings.PUBLIC_MODE:
raise HTTPException(status_code=401, detail="Not authenticated")
user_id = user["sub"] if user else None
query = await rooms_controller.get_all(
session, user_id=user_id, order_by="-created_at", return_query=True
return await apaginate(
get_database(),
await rooms_controller.get_all(
user_id=user_id, order_by="-created_at", return_query=True
),
)
return await paginate(session, query)
@router.get("/rooms/{room_id}", response_model=RoomDetails)
async def rooms_get(
room_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_id_for_http(session, room_id, user_id=user_id)
room = await rooms_controller.get_by_id_for_http(room_id, user_id=user_id)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
return room
@router.get("/rooms/name/{room_name}", response_model=RoomDetails)
async def rooms_get_by_name(
room_name: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
# Convert to RoomDetails format (add webhook fields if user is owner)
room_dict = room.__dict__.copy()
if user_id == room.user_id:
# User is owner, include webhook details if available
room_dict["webhook_url"] = getattr(room, "webhook_url", None)
room_dict["webhook_secret"] = getattr(room, "webhook_secret", None)
else:
# Non-owner, hide webhook details
room_dict["webhook_url"] = None
room_dict["webhook_secret"] = None
return RoomDetails(**room_dict)
@router.post("/rooms", response_model=Room)
async def rooms_create(
room: CreateRoom,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
return await rooms_controller.add(
session,
name=room.name,
user_id=user_id,
zulip_auto_post=room.zulip_auto_post,
@@ -250,9 +153,7 @@ async def rooms_create(
is_shared=room.is_shared,
webhook_url=room.webhook_url,
webhook_secret=room.webhook_secret,
ics_url=room.ics_url,
ics_fetch_interval=room.ics_fetch_interval,
ics_enabled=room.ics_enabled,
platform=room.platform,
)
@@ -261,14 +162,13 @@ async def rooms_update(
room_id: str,
info: UpdateRoom,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_id_for_http(session, room_id, user_id=user_id)
room = await rooms_controller.get_by_id_for_http(room_id, user_id=user_id)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
values = info.dict(exclude_unset=True)
await rooms_controller.update(session, room, values)
await rooms_controller.update(room, values)
return room
@@ -276,67 +176,78 @@ async def rooms_update(
async def rooms_delete(
room_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_id(session, room_id, user_id=user_id)
room = await rooms_controller.get_by_id(room_id, user_id=user_id)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
await rooms_controller.remove_by_id(session, room.id, user_id=user_id)
await rooms_controller.remove_by_id(room.id, user_id=user_id)
return DeletionStatus(status="ok")
@router.post("/rooms/{room_name}/meeting", response_model=Meeting)
async def rooms_create_meeting(
room_name: str,
info: CreateRoomMeeting,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
room = await rooms_controller.get_by_name(room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
try:
async with RedisAsyncLock(
f"create_meeting:{room_name}",
timeout=30,
extend_interval=10,
blocking_timeout=5.0,
) as lock:
current_time = datetime.now(timezone.utc)
current_time = datetime.now(timezone.utc)
meeting = await meetings_controller.get_active(room=room, current_time=current_time)
meeting = None
if not info.allow_duplicated:
meeting = await meetings_controller.get_active(
session, room=room, current_time=current_time
)
if meeting is None:
end_date = current_time + timedelta(hours=8)
platform = room.platform
client = create_platform_client(platform)
platform_meeting = await client.create_meeting("", end_date=end_date, room=room)
await client.upload_logo(platform_meeting.room_name, "./images/logo.png")
meeting_data = {
"meeting_id": platform_meeting.meeting_id,
"room_name": platform_meeting.room_name,
"room_url": platform_meeting.room_url,
"host_room_url": platform_meeting.host_room_url,
"start_date": current_time,
"end_date": end_date,
}
try:
meeting = await meetings_controller.create(
id=meeting_data["meeting_id"],
room_name=meeting_data["room_name"],
room_url=meeting_data["room_url"],
host_room_url=meeting_data["host_room_url"],
start_date=meeting_data["start_date"],
end_date=meeting_data["end_date"],
user_id=user_id,
room=room,
)
except (asyncpg.exceptions.UniqueViolationError, sqlite3.IntegrityError):
logger.info(
"Race condition detected for room %s - fetching existing meeting",
room.name,
)
logger.warning(
"Platform meeting %s was created but not used (resource leak) for room %s",
meeting_data["meeting_id"],
room.name,
)
meeting = await meetings_controller.get_active(
room=room, current_time=current_time
)
if meeting is None:
end_date = current_time + timedelta(hours=8)
whereby_meeting = await create_meeting("", end_date=end_date, room=room)
await upload_logo(whereby_meeting["roomName"], "./images/logo.png")
meeting = await meetings_controller.create(
session,
id=whereby_meeting["meetingId"],
room_name=whereby_meeting["roomName"],
room_url=whereby_meeting["roomUrl"],
host_room_url=whereby_meeting["hostRoomUrl"],
start_date=parse_datetime_with_timezone(
whereby_meeting["startDate"]
),
end_date=parse_datetime_with_timezone(whereby_meeting["endDate"]),
room=room,
logger.error(
"Meeting disappeared after race condition for room %s",
room.name,
exc_info=True,
)
raise HTTPException(
status_code=503, detail="Unable to join meeting - please try again"
)
except LockError:
logger.warning("Failed to acquire lock for room %s within timeout", room_name)
raise HTTPException(
status_code=503, detail="Meeting creation in progress, please try again"
)
if user_id != room.user_id:
meeting.host_room_url = ""
@@ -348,12 +259,10 @@ async def rooms_create_meeting(
async def rooms_test_webhook(
room_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
"""Test webhook configuration by sending a sample payload."""
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_id(session, room_id)
room = await rooms_controller.get_by_id(room_id)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
@@ -364,209 +273,3 @@ async def rooms_test_webhook(
result = await test_webhook(room_id)
return WebhookTestResult(**result)
@router.post("/rooms/{room_name}/ics/sync", response_model=ICSSyncResult)
async def rooms_sync_ics(
room_name: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
if user_id != room.user_id:
raise HTTPException(
status_code=403, detail="Only room owner can trigger ICS sync"
)
if not room.ics_enabled or not room.ics_url:
raise HTTPException(status_code=400, detail="ICS not configured for this room")
result = await ics_sync_service.sync_room_calendar(session, room)
if result["status"] == "error":
raise HTTPException(
status_code=500, detail=result.get("error", "Unknown error")
)
return ICSSyncResult(**result)
@router.get("/rooms/{room_name}/ics/status", response_model=ICSStatus)
async def rooms_ics_status(
room_name: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
if user_id != room.user_id:
raise HTTPException(
status_code=403, detail="Only room owner can view ICS status"
)
next_sync = None
if room.ics_enabled and room.ics_last_sync:
next_sync = room.ics_last_sync + timedelta(seconds=room.ics_fetch_interval)
events = await calendar_events_controller.get_by_room(
session, room.id, include_deleted=False
)
return ICSStatus(
status="enabled" if room.ics_enabled else "disabled",
last_sync=room.ics_last_sync,
next_sync=next_sync,
last_etag=room.ics_last_etag,
events_count=len(events),
)
@router.get("/rooms/{room_name}/meetings", response_model=list[CalendarEventResponse])
async def rooms_list_meetings(
room_name: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
events = await calendar_events_controller.get_by_room(
session, room.id, include_deleted=False
)
if user_id != room.user_id:
for event in events:
event.description = None
event.attendees = None
return events
@router.get(
"/rooms/{room_name}/meetings/upcoming", response_model=list[CalendarEventResponse]
)
async def rooms_list_upcoming_meetings(
room_name: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
minutes_ahead: int = 120,
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
events = await calendar_events_controller.get_upcoming(
session, room.id, minutes_ahead=minutes_ahead
)
if user_id != room.user_id:
for event in events:
event.description = None
event.attendees = None
return events
@router.get("/rooms/{room_name}/meetings/active", response_model=list[Meeting])
async def rooms_list_active_meetings(
room_name: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
current_time = datetime.now(timezone.utc)
meetings = await meetings_controller.get_all_active_for_room(
session, room=room, current_time=current_time
)
# Hide host URLs from non-owners
if user_id != room.user_id:
for meeting in meetings:
meeting.host_room_url = ""
return meetings
@router.get("/rooms/{room_name}/meetings/{meeting_id}", response_model=Meeting)
async def rooms_get_meeting(
room_name: str,
meeting_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
"""Get a single meeting by ID within a specific room."""
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
meeting = await meetings_controller.get_by_id(session, meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
if meeting.room_id != room.id:
raise HTTPException(
status_code=403, detail="Meeting does not belong to this room"
)
if user_id != room.user_id and not room.is_shared:
meeting.host_room_url = ""
return meeting
@router.post("/rooms/{room_name}/meetings/{meeting_id}/join", response_model=Meeting)
async def rooms_join_meeting(
room_name: str,
meeting_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
room = await rooms_controller.get_by_name(session, room_name)
if not room:
raise HTTPException(status_code=404, detail="Room not found")
meeting = await meetings_controller.get_by_id(session, meeting_id)
if not meeting:
raise HTTPException(status_code=404, detail="Meeting not found")
if meeting.room_id != room.id:
raise HTTPException(
status_code=403, detail="Meeting does not belong to this room"
)
if not meeting.is_active:
raise HTTPException(status_code=400, detail="Meeting is not active")
current_time = datetime.now(timezone.utc)
if meeting.end_date <= current_time:
raise HTTPException(status_code=400, detail="Meeting has ended")
# Hide host URL from non-owners
if user_id != room.user_id:
meeting.host_room_url = ""
return meeting

View File

@@ -3,13 +3,12 @@ from typing import Annotated, Literal, Optional
from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi_pagination import Page
from fastapi_pagination.ext.sqlalchemy import paginate
from fastapi_pagination.ext.databases import apaginate
from jose import jwt
from pydantic import BaseModel, Field, constr, field_serializer
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db import get_database
from reflector.db.meetings import meetings_controller
from reflector.db.rooms import rooms_controller
from reflector.db.search import (
@@ -150,25 +149,24 @@ async def transcripts_list(
source_kind: SourceKind | None = None,
room_id: str | None = None,
search_term: str | None = None,
session: AsyncSession = Depends(get_session),
):
if not user and not settings.PUBLIC_MODE:
raise HTTPException(status_code=401, detail="Not authenticated")
user_id = user["sub"] if user else None
query = await transcripts_controller.get_all(
session,
user_id=user_id,
source_kind=SourceKind(source_kind) if source_kind else None,
room_id=room_id,
search_term=search_term,
order_by="-created_at",
return_query=True,
return await apaginate(
get_database(),
await transcripts_controller.get_all(
user_id=user_id,
source_kind=SourceKind(source_kind) if source_kind else None,
room_id=room_id,
search_term=search_term,
order_by="-created_at",
return_query=True,
),
)
return await paginate(session, query)
@router.get("/transcripts/search", response_model=SearchResponse)
async def transcripts_search(
@@ -180,7 +178,6 @@ async def transcripts_search(
user: Annotated[
Optional[auth.UserInfo], Depends(auth.current_user_optional)
] = None,
session: AsyncSession = Depends(get_session),
):
"""
Full-text search across transcript titles and content.
@@ -199,7 +196,7 @@ async def transcripts_search(
source_kind=source_kind,
)
results, total = await search_controller.search_transcripts(session, search_params)
results, total = await search_controller.search_transcripts(search_params)
return SearchResponse(
results=results,
@@ -214,11 +211,9 @@ async def transcripts_search(
async def transcripts_create(
info: CreateTranscript,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
return await transcripts_controller.add(
session,
info.name,
source_kind=info.source_kind or SourceKind.LIVE,
source_language=info.source_language,
@@ -338,11 +333,10 @@ class GetTranscriptTopicWithWordsPerSpeaker(GetTranscriptTopic):
async def transcript_get(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
return await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
@@ -351,16 +345,13 @@ async def transcript_update(
transcript_id: str,
info: UpdateTranscript,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
values = info.dict(exclude_unset=True)
updated_transcript = await transcripts_controller.update(
session, transcript, values
)
updated_transcript = await transcripts_controller.update(transcript, values)
return updated_transcript
@@ -368,20 +359,19 @@ async def transcript_update(
async def transcript_delete(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id(session, transcript_id)
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript:
raise HTTPException(status_code=404, detail="Transcript not found")
if transcript.meeting_id:
meeting = await meetings_controller.get_by_id(session, transcript.meeting_id)
room = await rooms_controller.get_by_id(session, meeting.room_id)
meeting = await meetings_controller.get_by_id(transcript.meeting_id)
room = await rooms_controller.get_by_id(meeting.room_id)
if room.is_shared:
user_id = None
await transcripts_controller.remove_by_id(session, transcript.id, user_id=user_id)
await transcripts_controller.remove_by_id(transcript.id, user_id=user_id)
return DeletionStatus(status="ok")
@@ -392,11 +382,10 @@ async def transcript_delete(
async def transcript_get_topics(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
# convert to GetTranscriptTopic
@@ -412,11 +401,10 @@ async def transcript_get_topics(
async def transcript_get_topics_with_words(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
# convert to GetTranscriptTopicWithWords
@@ -434,11 +422,10 @@ async def transcript_get_topics_with_words_per_speaker(
transcript_id: str,
topic_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
# get the topic from the transcript
@@ -457,11 +444,10 @@ async def transcript_post_to_zulip(
topic: str,
include_topics: bool,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if not transcript:
raise HTTPException(status_code=404, detail="Transcript not found")
@@ -481,5 +467,5 @@ async def transcript_post_to_zulip(
if not message_updated:
response = await send_message_to_zulip(stream, topic, content)
await transcripts_controller.update(
session, transcript, {"zulip_message_id": response["id"]}
transcript, {"zulip_message_id": response["id"]}
)

View File

@@ -9,10 +9,8 @@ from typing import Annotated, Optional
import httpx
from fastapi import APIRouter, Depends, HTTPException, Request, Response, status
from jose import jwt
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db.transcripts import AudioWaveform, transcripts_controller
from reflector.settings import settings
from reflector.views.transcripts import ALGORITHM
@@ -34,7 +32,6 @@ async def transcript_get_audio_mp3(
request: Request,
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
token: str | None = None,
):
user_id = user["sub"] if user else None
@@ -51,7 +48,7 @@ async def transcript_get_audio_mp3(
raise unauthorized_exception
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if transcript.audio_location == "storage":
@@ -89,7 +86,7 @@ async def transcript_get_audio_mp3(
return range_requests_response(
request,
transcript.audio_mp3_filename.as_posix(),
transcript.audio_mp3_filename,
content_type="audio/mpeg",
content_disposition=f"attachment; filename={filename}",
)
@@ -99,18 +96,13 @@ async def transcript_get_audio_mp3(
async def transcript_get_audio_waveform(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> AudioWaveform:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if not transcript.audio_waveform_filename.exists():
raise HTTPException(status_code=404, detail="Audio not found")
audio_waveform = transcript.audio_waveform
if not audio_waveform:
raise HTTPException(status_code=404, detail="Audio waveform not found")
return audio_waveform
return transcript.audio_waveform

View File

@@ -8,10 +8,8 @@ from typing import Annotated, Optional
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel, ConfigDict, Field
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db.transcripts import TranscriptParticipant, transcripts_controller
from reflector.views.types import DeletionStatus
@@ -39,11 +37,10 @@ class UpdateParticipant(BaseModel):
async def transcript_get_participants(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> list[Participant]:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if transcript.participants is None:
@@ -60,11 +57,10 @@ async def transcript_add_participant(
transcript_id: str,
participant: CreateParticipant,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> Participant:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
# ensure the speaker is unique
@@ -77,7 +73,7 @@ async def transcript_add_participant(
)
obj = await transcripts_controller.upsert_participant(
session, transcript, TranscriptParticipant(**participant.dict())
transcript, TranscriptParticipant(**participant.dict())
)
return Participant.model_validate(obj)
@@ -87,11 +83,10 @@ async def transcript_get_participant(
transcript_id: str,
participant_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> Participant:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
for p in transcript.participants:
@@ -107,11 +102,10 @@ async def transcript_update_participant(
participant_id: str,
participant: UpdateParticipant,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> Participant:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
# ensure the speaker is unique
@@ -136,7 +130,7 @@ async def transcript_update_participant(
fields = participant.dict(exclude_unset=True)
obj = obj.copy(update=fields)
await transcripts_controller.upsert_participant(session, transcript, obj)
await transcripts_controller.upsert_participant(transcript, obj)
return Participant.model_validate(obj)
@@ -145,11 +139,10 @@ async def transcript_delete_participant(
transcript_id: str,
participant_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> DeletionStatus:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
await transcripts_controller.delete_participant(session, transcript, participant_id)
await transcripts_controller.delete_participant(transcript, participant_id)
return DeletionStatus(status="ok")

View File

@@ -3,10 +3,8 @@ from typing import Annotated, Optional
import celery
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db.transcripts import transcripts_controller
from reflector.pipelines.main_file_pipeline import task_pipeline_file_process
@@ -21,11 +19,10 @@ class ProcessStatus(BaseModel):
async def transcript_process(
transcript_id: str,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if transcript.locked:

View File

@@ -8,10 +8,8 @@ from typing import Annotated, Optional
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db.transcripts import transcripts_controller
router = APIRouter()
@@ -38,11 +36,10 @@ async def transcript_assign_speaker(
transcript_id: str,
assignment: SpeakerAssignment,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> SpeakerAssignmentStatus:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if not transcript:
@@ -82,9 +79,7 @@ async def transcript_assign_speaker(
# if the participant does not have a speaker, create one
if participant.speaker is None:
participant.speaker = transcript.find_empty_speaker()
await transcripts_controller.upsert_participant(
session, transcript, participant
)
await transcripts_controller.upsert_participant(transcript, participant)
speaker = participant.speaker
@@ -105,7 +100,6 @@ async def transcript_assign_speaker(
for topic in changed_topics:
transcript.upsert_topic(topic)
await transcripts_controller.update(
session,
transcript,
{
"topics": transcript.topics_dump(),
@@ -120,11 +114,10 @@ async def transcript_merge_speaker(
transcript_id: str,
merge: SpeakerMerge,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
) -> SpeakerAssignmentStatus:
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if not transcript:
@@ -170,7 +163,6 @@ async def transcript_merge_speaker(
for topic in changed_topics:
transcript.upsert_topic(topic)
await transcripts_controller.update(
session,
transcript,
{
"topics": transcript.topics_dump(),

View File

@@ -3,10 +3,8 @@ from typing import Annotated, Optional
import av
from fastapi import APIRouter, Depends, HTTPException, UploadFile
from pydantic import BaseModel
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db.transcripts import transcripts_controller
from reflector.pipelines.main_file_pipeline import task_pipeline_file_process
@@ -24,11 +22,10 @@ async def transcript_record_upload(
total_chunks: int,
chunk: UploadFile,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if transcript.locked:
@@ -92,7 +89,7 @@ async def transcript_record_upload(
container.close()
# set the status to "uploaded"
await transcripts_controller.update(session, transcript, {"status": "uploaded"})
await transcripts_controller.update(transcript, {"status": "uploaded"})
# launch a background task to process the file
task_pipeline_file_process.delay(transcript_id=transcript_id)

View File

@@ -1,10 +1,8 @@
from typing import Annotated, Optional
from fastapi import APIRouter, Depends, HTTPException, Request
from sqlalchemy.ext.asyncio import AsyncSession
import reflector.auth as auth
from reflector.db import get_session
from reflector.db.transcripts import transcripts_controller
from .rtc_offer import RtcOffer, rtc_offer_base
@@ -18,11 +16,10 @@ async def transcript_record_webrtc(
params: RtcOffer,
request: Request,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
session: AsyncSession = Depends(get_session),
):
user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id_for_http(
session, transcript_id, user_id=user_id
transcript_id, user_id=user_id
)
if transcript.locked:

View File

@@ -24,7 +24,7 @@ async def transcript_events_websocket(
# user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
):
# user_id = user["sub"] if user else None
transcript = await transcripts_controller.get_by_id(session, transcript_id)
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript:
raise HTTPException(status_code=404, detail="Transcript not found")

View File

@@ -20,7 +20,7 @@ else:
"reflector.worker.healthcheck",
"reflector.worker.process",
"reflector.worker.cleanup",
"reflector.worker.ics_sync",
"reflector.worker.jitsi_events",
]
)
@@ -34,18 +34,14 @@ else:
"task": "reflector.worker.process.process_meetings",
"schedule": float(settings.SQS_POLLING_TIMEOUT_SECONDS),
},
"process_jitsi_events": {
"task": "reflector.worker.jitsi_events.process_jitsi_events",
"schedule": 5.0, # Process every 5 seconds
},
"reprocess_failed_recordings": {
"task": "reflector.worker.process.reprocess_failed_recordings",
"schedule": crontab(hour=5, minute=0), # Midnight EST
},
"sync_all_ics_calendars": {
"task": "reflector.worker.ics_sync.sync_all_ics_calendars",
"schedule": 60.0, # Run every minute to check which rooms need sync
},
"create_upcoming_meetings": {
"task": "reflector.worker.ics_sync.create_upcoming_meetings",
"schedule": 30.0, # Run every 30 seconds to create upcoming meetings
},
}
if settings.PUBLIC_MODE:

View File

@@ -5,21 +5,22 @@ Deletes old anonymous transcripts and their associated meetings/recordings.
Transcripts are the main entry point - any associated data is also removed.
"""
import asyncio
from datetime import datetime, timedelta, timezone
from typing import TypedDict
import structlog
from celery import shared_task
from databases import Database
from pydantic.types import PositiveInt
from sqlalchemy import delete, select
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.asynctask import asynctask
from reflector.db.base import MeetingModel, RecordingModel, TranscriptModel
from reflector.db.transcripts import transcripts_controller
from reflector.db import get_database
from reflector.db.meetings import meetings
from reflector.db.recordings import recordings
from reflector.db.transcripts import transcripts, transcripts_controller
from reflector.settings import settings
from reflector.storage import get_recordings_storage
from reflector.worker.session_decorator import with_session
logger = structlog.get_logger(__name__)
@@ -34,49 +35,51 @@ class CleanupStats(TypedDict):
async def delete_single_transcript(
session: AsyncSession, transcript_data: dict, stats: CleanupStats
db: Database, transcript_data: dict, stats: CleanupStats
):
transcript_id = transcript_data["id"]
meeting_id = transcript_data["meeting_id"]
recording_id = transcript_data["recording_id"]
try:
if meeting_id:
await session.execute(
delete(MeetingModel).where(MeetingModel.id == meeting_id)
)
stats["meetings_deleted"] += 1
logger.info("Deleted associated meeting", meeting_id=meeting_id)
async with db.transaction(isolation="serializable"):
if meeting_id:
await db.execute(meetings.delete().where(meetings.c.id == meeting_id))
stats["meetings_deleted"] += 1
logger.info("Deleted associated meeting", meeting_id=meeting_id)
if recording_id:
result = await session.execute(
select(RecordingModel).where(RecordingModel.id == recording_id)
)
recording = result.mappings().first()
if recording:
try:
await get_recordings_storage().delete_file(recording["object_key"])
except Exception as storage_error:
logger.warning(
"Failed to delete recording from storage",
recording_id=recording_id,
object_key=recording["object_key"],
error=str(storage_error),
if recording_id:
recording = await db.fetch_one(
recordings.select().where(recordings.c.id == recording_id)
)
if recording:
try:
await get_recordings_storage().delete_file(
recording["object_key"]
)
except Exception as storage_error:
logger.warning(
"Failed to delete recording from storage",
recording_id=recording_id,
object_key=recording["object_key"],
error=str(storage_error),
)
await db.execute(
recordings.delete().where(recordings.c.id == recording_id)
)
stats["recordings_deleted"] += 1
logger.info(
"Deleted associated recording", recording_id=recording_id
)
await session.execute(
delete(RecordingModel).where(RecordingModel.id == recording_id)
)
stats["recordings_deleted"] += 1
logger.info("Deleted associated recording", recording_id=recording_id)
await transcripts_controller.remove_by_id(session, transcript_id)
stats["transcripts_deleted"] += 1
logger.info(
"Deleted transcript",
transcript_id=transcript_id,
created_at=transcript_data["created_at"].isoformat(),
)
await transcripts_controller.remove_by_id(transcript_id)
stats["transcripts_deleted"] += 1
logger.info(
"Deleted transcript",
transcript_id=transcript_id,
created_at=transcript_data["created_at"].isoformat(),
)
except Exception as e:
error_msg = f"Failed to delete transcript {transcript_id}: {str(e)}"
logger.error(error_msg, exc_info=e)
@@ -84,30 +87,18 @@ async def delete_single_transcript(
async def cleanup_old_transcripts(
session: AsyncSession, cutoff_date: datetime, stats: CleanupStats
db: Database, cutoff_date: datetime, stats: CleanupStats
):
"""Delete old anonymous transcripts and their associated recordings/meetings."""
query = select(
TranscriptModel.id,
TranscriptModel.meeting_id,
TranscriptModel.recording_id,
TranscriptModel.created_at,
).where(
(TranscriptModel.created_at < cutoff_date) & (TranscriptModel.user_id.is_(None))
query = transcripts.select().where(
(transcripts.c.created_at < cutoff_date) & (transcripts.c.user_id.is_(None))
)
result = await session.execute(query)
old_transcripts = result.mappings().all()
old_transcripts = await db.fetch_all(query)
logger.info(f"Found {len(old_transcripts)} old transcripts to delete")
for transcript_data in old_transcripts:
try:
await delete_single_transcript(session, transcript_data, stats)
except Exception as e:
error_msg = f"Failed to delete transcript {transcript_data['id']}: {str(e)}"
logger.error(error_msg, exc_info=e)
stats["errors"].append(error_msg)
await delete_single_transcript(db, transcript_data, stats)
def log_cleanup_results(stats: CleanupStats):
@@ -127,7 +118,6 @@ def log_cleanup_results(stats: CleanupStats):
async def cleanup_old_public_data(
session: AsyncSession,
days: PositiveInt | None = None,
) -> CleanupStats | None:
if days is None:
@@ -150,7 +140,8 @@ async def cleanup_old_public_data(
"errors": [],
}
await cleanup_old_transcripts(session, cutoff_date, stats)
db = get_database()
await cleanup_old_transcripts(db, cutoff_date, stats)
log_cleanup_results(stats)
return stats
@@ -161,6 +152,5 @@ async def cleanup_old_public_data(
retry_kwargs={"max_retries": 3, "countdown": 300},
)
@asynctask
@with_session
async def cleanup_old_public_data_task(session: AsyncSession, days: int | None = None):
await cleanup_old_public_data(session, days=days)
def cleanup_old_public_data_task(days: int | None = None):
asyncio.run(cleanup_old_public_data(days=days))

View File

@@ -1,186 +0,0 @@
from datetime import datetime, timedelta, timezone
import structlog
from celery import shared_task
from celery.utils.log import get_task_logger
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.asynctask import asynctask
from reflector.db.calendar_events import calendar_events_controller
from reflector.db.meetings import meetings_controller
from reflector.db.rooms import rooms_controller
from reflector.redis_cache import RedisAsyncLock
from reflector.services.ics_sync import SyncStatus, ics_sync_service
from reflector.whereby import create_meeting, upload_logo
from reflector.worker.session_decorator import with_session
logger = structlog.wrap_logger(get_task_logger(__name__))
@shared_task
@asynctask
@with_session
async def sync_room_ics(session: AsyncSession, room_id: str):
try:
room = await rooms_controller.get_by_id(session, room_id)
if not room:
logger.warning("Room not found for ICS sync", room_id=room_id)
return
if not room.ics_enabled or not room.ics_url:
logger.debug("ICS not enabled for room", room_id=room_id)
return
logger.info("Starting ICS sync for room", room_id=room_id, room_name=room.name)
result = await ics_sync_service.sync_room_calendar(session, room)
if result["status"] == SyncStatus.SUCCESS:
logger.info(
"ICS sync completed successfully",
room_id=room_id,
events_found=result.get("events_found", 0),
events_created=result.get("events_created", 0),
events_updated=result.get("events_updated", 0),
events_deleted=result.get("events_deleted", 0),
)
elif result["status"] == SyncStatus.UNCHANGED:
logger.debug("ICS content unchanged", room_id=room_id)
elif result["status"] == SyncStatus.ERROR:
logger.error("ICS sync failed", room_id=room_id, error=result.get("error"))
else:
logger.debug(
"ICS sync skipped", room_id=room_id, reason=result.get("reason")
)
except Exception as e:
logger.error("Unexpected error during ICS sync", room_id=room_id, error=str(e))
@shared_task
@asynctask
@with_session
async def sync_all_ics_calendars(session: AsyncSession):
try:
logger.info("Starting sync for all ICS-enabled rooms")
ics_enabled_rooms = await rooms_controller.get_ics_enabled(session)
logger.info(f"Found {len(ics_enabled_rooms)} rooms with ICS enabled")
for room in ics_enabled_rooms:
if not _should_sync(room):
logger.debug("Skipping room, not time to sync yet", room_id=room.id)
continue
sync_room_ics.delay(room.id)
logger.info("Queued sync tasks for all eligible rooms")
except Exception as e:
logger.error("Error in sync_all_ics_calendars", error=str(e))
def _should_sync(room) -> bool:
if not room.ics_last_sync:
return True
time_since_sync = datetime.now(timezone.utc) - room.ics_last_sync
return time_since_sync.total_seconds() >= room.ics_fetch_interval
MEETING_DEFAULT_DURATION = timedelta(hours=1)
async def create_upcoming_meetings_for_event(
session: AsyncSession, event, create_window, room_id, room
):
if event.start_time <= create_window:
return
existing_meeting = await meetings_controller.get_by_calendar_event(
session, event.id
)
if existing_meeting:
return
logger.info(
"Pre-creating meeting for calendar event",
room_id=room_id,
event_id=event.id,
event_title=event.title,
)
try:
end_date = event.end_time or (event.start_time + MEETING_DEFAULT_DURATION)
whereby_meeting = await create_meeting(
"",
end_date=end_date,
room=room,
)
await upload_logo(whereby_meeting["roomName"], "./images/logo.png")
meeting = await meetings_controller.create(
session,
id=whereby_meeting["meetingId"],
room_name=whereby_meeting["roomName"],
room_url=whereby_meeting["roomUrl"],
host_room_url=whereby_meeting["hostRoomUrl"],
start_date=datetime.fromisoformat(whereby_meeting["startDate"]),
end_date=datetime.fromisoformat(whereby_meeting["endDate"]),
room=room,
calendar_event_id=event.id,
calendar_metadata={
"title": event.title,
"description": event.description,
"attendees": event.attendees,
},
)
logger.info(
"Meeting pre-created successfully",
meeting_id=meeting.id,
event_id=event.id,
)
except Exception as e:
logger.error(
"Failed to pre-create meeting",
room_id=room_id,
event_id=event.id,
error=str(e),
)
@shared_task
@asynctask
@with_session
async def create_upcoming_meetings(session: AsyncSession):
async with RedisAsyncLock("create_upcoming_meetings", skip_if_locked=True) as lock:
if not lock.acquired:
logger.warning(
"Another worker is already creating upcoming meetings, skipping"
)
return
try:
logger.info("Starting creation of upcoming meetings")
ics_enabled_rooms = await rooms_controller.get_ics_enabled(session)
now = datetime.now(timezone.utc)
create_window = now - timedelta(minutes=6)
for room in ics_enabled_rooms:
events = await calendar_events_controller.get_upcoming(
session,
room.id,
minutes_ahead=7,
)
for event in events:
await create_upcoming_meetings_for_event(
session, event, create_window, room.id, room
)
logger.info("Completed pre-creation check for upcoming meetings")
except Exception as e:
logger.error("Error in create_upcoming_meetings", error=str(e))

View File

@@ -0,0 +1,281 @@
"""
Celery tasks for consuming Jitsi events from Redis queues.
"""
import json
from datetime import datetime
from typing import Any, Dict
import redis
import structlog
from sqlalchemy.orm import Session
from reflector.database import get_db_sync
from reflector.models import Meeting, Transcript
from reflector.settings import settings
from reflector.worker.app import app
logger = structlog.get_logger(__name__)
class JitsiEventProcessor:
"""Process Jitsi events from Redis queues."""
def __init__(self):
self.redis_client = redis.Redis(
host=settings.REDIS_HOST or "redis",
port=settings.REDIS_PORT or 6379,
decode_responses=True,
)
self.participants = {} # room_name -> {jid: participant_info}
self.speaker_stats = {} # room_name -> {jid: stats}
def process_participant_joined(self, data: Dict[str, Any], db: Session):
"""Track participant joining a room."""
room_name = data["room_name"]
participant = {
"jid": data["participant_jid"],
"nick": data["participant_nick"],
"id": data["participant_id"],
"is_moderator": data.get("is_moderator", False),
"joined_at": datetime.now(),
}
if room_name not in self.participants:
self.participants[room_name] = {}
self.participants[room_name][participant["jid"]] = participant
logger.info(
"Participant joined",
room=room_name,
participant=participant["nick"],
total_participants=len(self.participants[room_name]),
)
# Update meeting in database if exists
meeting = (
db.query(Meeting)
.filter(
Meeting.room_name == room_name,
Meeting.status.in_(["active", "pending"]),
)
.first()
)
if meeting:
# Store participant info in meeting metadata
metadata = meeting.metadata or {}
if "participants" not in metadata:
metadata["participants"] = []
metadata["participants"].append(
{
"id": participant["id"],
"name": participant["nick"],
"joined_at": participant["joined_at"].isoformat(),
"is_moderator": participant["is_moderator"],
}
)
meeting.metadata = metadata
db.commit()
def process_participant_left(self, data: Dict[str, Any], db: Session):
"""Track participant leaving a room."""
room_name = data["room_name"]
participant_jid = data["participant_jid"]
if room_name in self.participants:
if participant_jid in self.participants[room_name]:
participant = self.participants[room_name][participant_jid]
participant["left_at"] = datetime.now()
logger.info(
"Participant left",
room=room_name,
participant=participant["nick"],
duration=(
participant["left_at"] - participant["joined_at"]
).total_seconds(),
)
# Update meeting in database
meeting = (
db.query(Meeting)
.filter(
Meeting.room_name == room_name,
Meeting.status.in_(["active", "pending"]),
)
.first()
)
if meeting and meeting.metadata and "participants" in meeting.metadata:
for p in meeting.metadata["participants"]:
if p["id"] == participant["id"]:
p["left_at"] = participant["left_at"].isoformat()
break
db.commit()
def process_speaker_stats(self, data: Dict[str, Any], db: Session):
"""Update speaker statistics."""
room_name = data["room_jid"].split("@")[0]
self.speaker_stats[room_name] = data["stats"]
logger.debug(
"Speaker stats updated", room=room_name, speakers=len(data["stats"])
)
def process_recording_completed(self, data: Dict[str, Any], db: Session):
"""Process completed recording with all metadata."""
room_name = data["room_name"]
meeting_url = data["meeting_url"]
recording_path = data["recording_path"]
recording_file = data["recording_file"]
logger.info(
"Recording completed", room=room_name, url=meeting_url, path=recording_path
)
# Get participant data for this room
participants = self.participants.get(room_name, {})
speaker_stats = self.speaker_stats.get(room_name, {})
# Create transcript record with full metadata
transcript = Transcript(
title=f"Recording: {room_name}",
source_url=meeting_url,
metadata={
"jitsi": {
"room_name": room_name,
"meeting_url": meeting_url,
"recording_path": recording_path,
"participants": [
{
"id": p["id"],
"name": p["nick"],
"joined_at": p["joined_at"].isoformat(),
"left_at": p.get("left_at", datetime.now()).isoformat(),
"is_moderator": p["is_moderator"],
"speaking_time": speaker_stats.get(p["jid"], {}).get(
"total_time", 0
),
}
for p in participants.values()
],
"speaker_stats": speaker_stats,
}
},
status="pending",
)
db.add(transcript)
db.commit()
# Trigger processing pipeline
from reflector.pipelines.main_transcript_pipeline import TranscriptMainPipeline
pipeline = TranscriptMainPipeline()
pipeline.create(transcript.id, recording_file)
# Clean up room data
self.participants.pop(room_name, None)
self.speaker_stats.pop(room_name, None)
logger.info(
"Transcript created",
transcript_id=transcript.id,
participants=len(participants),
has_speaker_stats=bool(speaker_stats),
)
processor = JitsiEventProcessor()
@app.task(name="reflector.worker.jitsi_events.process_jitsi_events")
def process_jitsi_events():
"""
Process Jitsi events from Redis queue.
This should be called periodically by Celery Beat.
"""
db = next(get_db_sync())
processed = 0
try:
# Process up to 100 events per run
for _ in range(100):
# Pop event from queue (blocking with 1 second timeout)
event_data = processor.redis_client.brpop(
["jitsi:events:queue", "jitsi:recordings:queue"], timeout=1
)
if not event_data:
break
queue_name, event_json = event_data
event = json.loads(event_json)
event_type = event["type"]
data = event["data"]
logger.debug(f"Processing event: {event_type}")
# Route to appropriate processor
if event_type == "participant_joined":
processor.process_participant_joined(data, db)
elif event_type == "participant_left":
processor.process_participant_left(data, db)
elif event_type == "speaker_stats_update":
processor.process_speaker_stats(data, db)
elif event_type == "recording_completed":
processor.process_recording_completed(data, db)
else:
logger.warning(f"Unknown event type: {event_type}")
processed += 1
if processed > 0:
logger.info(f"Processed {processed} Jitsi events")
except Exception as e:
logger.error(f"Error processing Jitsi events: {e}")
raise
finally:
db.close()
return processed
@app.task(name="reflector.worker.jitsi_events.consume_jitsi_stream")
def consume_jitsi_stream():
"""
Alternative: Use Redis Streams for more reliable event processing.
Redis Streams provide better guarantees and consumer groups.
"""
db = next(get_db_sync())
try:
# Read from stream with consumer group
events = processor.redis_client.xreadgroup(
"reflector-consumers",
"worker-1",
{"jitsi:events": ">"},
count=10,
block=1000,
)
for stream_name, messages in events:
for message_id, data in messages:
event = json.loads(data[b"event"])
# Process event...
# Acknowledge message
processor.redis_client.xack(
stream_name, "reflector-consumers", message_id
)
except Exception as e:
logger.error(f"Error consuming stream: {e}")
raise
finally:
db.close()

View File

@@ -9,8 +9,6 @@ import structlog
from celery import shared_task
from celery.utils.log import get_task_logger
from pydantic import ValidationError
from redis.exceptions import LockError
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.db.meetings import meetings_controller
from reflector.db.recordings import Recording, recordings_controller
@@ -18,10 +16,8 @@ from reflector.db.rooms import rooms_controller
from reflector.db.transcripts import SourceKind, transcripts_controller
from reflector.pipelines.main_file_pipeline import task_pipeline_file_process
from reflector.pipelines.main_live_pipeline import asynctask
from reflector.redis_cache import get_redis_client
from reflector.settings import settings
from reflector.whereby import get_room_sessions
from reflector.worker.session_decorator import with_session
from reflector.video_platforms.factory import create_platform_client
logger = structlog.wrap_logger(get_task_logger(__name__))
@@ -77,39 +73,30 @@ def process_messages():
@shared_task
@asynctask
@with_session
async def process_recording(session: AsyncSession, bucket_name: str, object_key: str):
async def process_recording(bucket_name: str, object_key: str):
logger.info("Processing recording: %s/%s", bucket_name, object_key)
# extract a guid and a datetime from the object key
room_name = f"/{object_key[:36]}"
recorded_at = parse_datetime_with_timezone(object_key[37:57])
meeting = await meetings_controller.get_by_room_name(session, room_name)
if not meeting:
logger.warning("Room not found, may be deleted ?", room_name=room_name)
return
meeting = await meetings_controller.get_by_room_name(room_name)
room = await rooms_controller.get_by_id(meeting.room_id)
room = await rooms_controller.get_by_id(session, meeting.room_id)
recording = await recordings_controller.get_by_object_key(
session, bucket_name, object_key
)
recording = await recordings_controller.get_by_object_key(bucket_name, object_key)
if not recording:
recording = await recordings_controller.create(
session,
Recording(
bucket_name=bucket_name,
object_key=object_key,
recorded_at=recorded_at,
meeting_id=meeting.id,
),
)
)
transcript = await transcripts_controller.get_by_recording_id(session, recording.id)
transcript = await transcripts_controller.get_by_recording_id(recording.id)
if transcript:
await transcripts_controller.update(
session,
transcript,
{
"topics": [],
@@ -117,7 +104,6 @@ async def process_recording(session: AsyncSession, bucket_name: str, object_key:
)
else:
transcript = await transcripts_controller.add(
session,
"",
source_kind=SourceKind.ROOM,
source_language="en",
@@ -153,110 +139,44 @@ async def process_recording(session: AsyncSession, bucket_name: str, object_key:
finally:
container.close()
await transcripts_controller.update(session, transcript, {"status": "uploaded"})
await transcripts_controller.update(transcript, {"status": "uploaded"})
task_pipeline_file_process.delay(transcript_id=transcript.id)
@shared_task
@asynctask
@with_session
async def process_meetings(session: AsyncSession):
"""
Checks which meetings are still active and deactivates those that have ended.
Deactivation logic:
- Active sessions: Keep meeting active regardless of scheduled time
- No active sessions:
* Calendar meetings:
- If previously used (had sessions): Deactivate immediately
- If never used: Keep active until scheduled end time, then deactivate
* On-the-fly meetings: Deactivate immediately (created when someone joins,
so no sessions means everyone left)
Uses distributed locking to prevent race conditions when multiple workers
process the same meeting simultaneously.
"""
async def process_meetings():
logger.info("Processing meetings")
meetings = await meetings_controller.get_all_active(session)
current_time = datetime.now(timezone.utc)
redis_client = get_redis_client()
processed_count = 0
skipped_count = 0
meetings = await meetings_controller.get_all_active()
for meeting in meetings:
logger_ = logger.bind(meeting_id=meeting.id, room_name=meeting.room_name)
lock_key = f"meeting_process_lock:{meeting.id}"
lock = redis_client.lock(lock_key, timeout=120)
try:
if not lock.acquire(blocking=False):
logger_.debug("Meeting is being processed by another worker, skipping")
skipped_count += 1
continue
# Process the meeting
should_deactivate = False
end_date = meeting.end_date
if end_date.tzinfo is None:
end_date = end_date.replace(tzinfo=timezone.utc)
# This API call could be slow, extend lock if needed
response = await get_room_sessions(meeting.room_name)
try:
# Extend lock after slow operation to ensure we still hold it
lock.extend(120, replace_ttl=True)
except LockError:
logger_.warning("Lost lock for meeting, skipping")
continue
room_sessions = response.get("results", [])
has_active_sessions = room_sessions and any(
rs["endedAt"] is None for rs in room_sessions
)
has_had_sessions = bool(room_sessions)
if has_active_sessions:
logger_.debug("Meeting still has active sessions, keep it")
elif has_had_sessions:
should_deactivate = True
logger_.info("Meeting ended - all participants left")
elif current_time > end_date:
should_deactivate = True
logger_.info(
"Meeting deactivated - scheduled time ended with no participants",
is_active = False
end_date = meeting.end_date
if end_date.tzinfo is None:
end_date = end_date.replace(tzinfo=timezone.utc)
if end_date > datetime.now(timezone.utc):
# Get room sessions using platform client
platform = getattr(meeting, "platform", "whereby")
client = create_platform_client(platform)
if client:
response = await client.get_room_sessions(meeting.room_name)
room_sessions = response.get("results", [])
is_active = not room_sessions or any(
rs["endedAt"] is None for rs in room_sessions
)
else:
logger_.debug("Meeting not yet started, keep it")
# Fallback: assume meeting is still active if we can't check
is_active = True
if not is_active:
await meetings_controller.update_meeting(meeting.id, is_active=False)
logger.info("Meeting %s is deactivated", meeting.id)
if should_deactivate:
await meetings_controller.update_meeting(
session, meeting.id, is_active=False
)
logger_.info("Meeting is deactivated")
processed_count += 1
except Exception:
logger_.error("Error processing meeting", exc_info=True)
finally:
try:
lock.release()
except LockError:
pass # Lock already released or expired
logger.info(
"Processed meetings finished",
processed_count=processed_count,
skipped_count=skipped_count,
)
logger.info("Processed meetings")
@shared_task
@asynctask
@with_session
async def reprocess_failed_recordings(session: AsyncSession):
async def reprocess_failed_recordings():
"""
Find recordings in the S3 bucket and check if they have proper transcriptions.
If not, requeue them for processing.
@@ -287,7 +207,7 @@ async def reprocess_failed_recordings(session: AsyncSession):
continue
recording = await recordings_controller.get_by_object_key(
session, bucket_name, object_key
bucket_name, object_key
)
if not recording:
logger.info(f"Queueing recording for processing: {object_key}")
@@ -298,12 +218,10 @@ async def reprocess_failed_recordings(session: AsyncSession):
transcript = None
try:
transcript = await transcripts_controller.get_by_recording_id(
session, recording.id
recording.id
)
except ValidationError:
await transcripts_controller.remove_by_recording_id(
session, recording.id
)
await transcripts_controller.remove_by_recording_id(recording.id)
logger.warning(
f"Removed invalid transcript for recording: {recording.id}"
)

View File

@@ -1,109 +0,0 @@
"""
Session management decorator for async worker tasks.
This decorator ensures that all worker tasks have a properly managed database session
that stays open for the entire duration of the task execution.
"""
import functools
from typing import Any, Callable, TypeVar
from celery import current_task
from reflector.db import get_session_factory
from reflector.db.transcripts import transcripts_controller
from reflector.logger import logger
F = TypeVar("F", bound=Callable[..., Any])
def with_session(func: F) -> F:
"""
Decorator that provides an AsyncSession as the first argument to the decorated function.
This should be used AFTER the @asynctask decorator on Celery tasks to ensure
proper session management throughout the task execution.
Example:
@shared_task
@asynctask
@with_session
async def my_task(session: AsyncSession, arg1: str, arg2: int):
# session is automatically provided and managed
result = await some_controller.get_by_id(session, arg1)
...
"""
@functools.wraps(func)
async def wrapper(*args, **kwargs):
session_factory = get_session_factory()
async with session_factory() as session:
async with session.begin():
# Pass session as first argument to the decorated function
return await func(session, *args, **kwargs)
return wrapper
def with_session_and_transcript(func: F) -> F:
"""
Decorator that provides both an AsyncSession and a Transcript to the decorated function.
This decorator:
1. Extracts transcript_id from kwargs
2. Creates and manages a database session
3. Fetches the transcript using the session
4. Creates an enhanced logger with Celery task context
5. Passes session, transcript, and logger to the decorated function
This should be used AFTER the @asynctask decorator on Celery tasks.
Example:
@shared_task
@asynctask
@with_session_and_transcript
async def my_task(session: AsyncSession, transcript: Transcript, logger: Logger, arg1: str):
# session, transcript, and logger are automatically provided
room = await rooms_controller.get_by_id(session, transcript.room_id)
...
"""
@functools.wraps(func)
async def wrapper(*args, **kwargs):
transcript_id = kwargs.pop("transcript_id", None)
if not transcript_id:
raise ValueError(
"transcript_id is required for @with_session_and_transcript"
)
session_factory = get_session_factory()
async with session_factory() as session:
async with session.begin():
# Fetch the transcript
transcript = await transcripts_controller.get_by_id(
session, transcript_id
)
if not transcript:
raise Exception(f"Transcript {transcript_id} not found")
# Create enhanced logger with Celery task context
tlogger = logger.bind(transcript_id=transcript.id)
if current_task:
tlogger = tlogger.bind(
task_id=current_task.request.id,
task_name=current_task.name,
worker_hostname=current_task.request.hostname,
task_retries=current_task.request.retries,
transcript_id=transcript_id,
)
try:
# Pass session, transcript, and logger to the decorated function
return await func(
session, transcript=transcript, logger=tlogger, *args, **kwargs
)
except Exception:
tlogger.exception("Error in task execution")
raise
return wrapper

View File

@@ -10,14 +10,12 @@ import httpx
import structlog
from celery import shared_task
from celery.utils.log import get_task_logger
from sqlalchemy.ext.asyncio import AsyncSession
from reflector.db.rooms import rooms_controller
from reflector.db.transcripts import transcripts_controller
from reflector.pipelines.main_live_pipeline import asynctask
from reflector.settings import settings
from reflector.utils.webvtt import topics_to_webvtt
from reflector.worker.session_decorator import with_session
logger = structlog.wrap_logger(get_task_logger(__name__))
@@ -41,13 +39,11 @@ def generate_webhook_signature(payload: bytes, secret: str, timestamp: str) -> s
retry_backoff_max=3600, # Max 1 hour between retries
)
@asynctask
@with_session
async def send_transcript_webhook(
self,
transcript_id: str,
room_id: str,
event_id: str,
session: AsyncSession,
):
log = logger.bind(
transcript_id=transcript_id,
@@ -57,12 +53,12 @@ async def send_transcript_webhook(
try:
# Fetch transcript and room
transcript = await transcripts_controller.get_by_id(session, transcript_id)
transcript = await transcripts_controller.get_by_id(transcript_id)
if not transcript:
log.error("Transcript not found, skipping webhook")
return
room = await rooms_controller.get_by_id(session, room_id)
room = await rooms_controller.get_by_id(room_id)
if not room:
log.error("Room not found, skipping webhook")
return

Some files were not shown because too many files have changed in this diff Show More