CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Reflector is an AI-powered audio transcription and meeting analysis platform with real-time processing capabilities. The system consists of:
- Frontend: Next.js 14 React application (`www/`) with Chakra UI and real-time WebSocket integration
- Backend: Python FastAPI server (`server/`) with async database operations and background processing
- Processing: GPU-accelerated ML pipeline for transcription, diarization, and summarization via Modal.com
- Infrastructure: Redis, PostgreSQL/SQLite, Celery workers, WebRTC streaming
Development Commands
Backend (Python) - cd server/
Setup and Dependencies:
# Install dependencies
uv sync
# Database migrations (first run or schema changes)
uv run alembic upgrade head
# Start services
docker compose up -d redis
Development:
# Start FastAPI server
uv run -m reflector.app --reload
# Start Celery worker for background tasks
uv run celery -A reflector.worker.app worker --loglevel=info
# Start Celery beat scheduler (optional, for cron jobs)
uv run celery -A reflector.worker.app beat
Testing:
# Run all tests with coverage
uv run pytest
# Run specific test file
uv run pytest tests/test_transcripts.py
# Run tests with verbose output
uv run pytest -v
Process Audio Files:
# Process local audio file manually
uv run python -m reflector.tools.process path/to/audio.wav
Frontend (Next.js) - cd www/
Setup:
# Install dependencies
yarn install
# Copy configuration templates
cp .env_template .env
cp config-template.ts config.ts
Development:
# Start development server
yarn dev
# Generate TypeScript API client from OpenAPI spec
yarn openapi
# Lint code
yarn lint
# Format code
yarn format
# Build for production
yarn build
Docker Compose (Full Stack)
# Start all services
docker compose up -d
# Start specific services
docker compose up -d redis server worker
Architecture Overview
Backend Processing Pipeline
The audio processing follows a modular pipeline architecture:
- Audio Input: WebRTC streaming, file upload, or cloud recording ingestion
- Chunking: Audio split into processable segments (`AudioChunkerProcessor`)
- Transcription: Whisper or Modal.com GPU processing (`AudioTranscriptAutoProcessor`)
- Diarization: Speaker identification (`AudioDiarizationAutoProcessor`)
- Text Processing: Formatting, translation, topic detection
- Summarization: AI-powered summaries and title generation
- Storage: Database persistence with optional S3 backend
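The stages above can be pictured as a chain of async steps in which each step's output feeds the next. The sketch below is a toy model built on that assumption only. `run_pipeline`, `chunk`, `transcribe` and `format_text` are hypothetical names, and the real processors (such as `AudioChunkerProcessor`) have their own interfaces.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical stage type: an async function that takes the previous stage's
# output and returns the input for the next stage.
Stage = Callable[[Any], Awaitable[Any]]


async def run_pipeline(stages: list[Stage], data: Any) -> Any:
    """Run stages in order, feeding each stage's output into the next."""
    for stage in stages:
        data = await stage(data)
    return data


async def chunk(samples: list[float]) -> list[list[float]]:
    # Chunking: split samples into fixed-size segments (size 3 is arbitrary).
    size = 3
    return [samples[i:i + size] for i in range(0, len(samples), size)]


async def transcribe(chunks: list[list[float]]) -> list[str]:
    # Transcription placeholder: a real stage would call Whisper or a Modal endpoint.
    return [f"segment {i} [n={len(c)}]" for i, c in enumerate(chunks)]


async def format_text(segments: list[str]) -> str:
    # Text processing placeholder: join the segments into one string.
    return "; ".join(segments)


result = asyncio.run(run_pipeline([chunk, transcribe, format_text], [0.0] * 7))
print(result)  # segment 0 [n=3]; segment 1 [n=3]; segment 2 [n=1]
```

Keeping every stage behind the same "input in, output out" shape means a stage can be swapped (for example Whisper versus Modal transcription) without touching its neighbours.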
Database Models
Core entities:
- `transcript`: Main table with processing results, summaries, topics, participants
- `meeting`: Live meeting sessions with consent management
- `room`: Virtual meeting spaces with configuration
- `recording`: Audio/video file metadata and processing status
API Structure
All endpoints are prefixed with `/v1/`:
- `transcripts/` - CRUD operations for transcripts
- `transcripts_audio/` - Audio streaming and download
- `transcripts_webrtc/` - Real-time WebRTC endpoints
- `transcripts_websocket/` - WebSocket for live updates
- `meetings/` - Meeting lifecycle management
- `rooms/` - Virtual room management
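Client code can build URLs under the `/v1/` prefix with a small helper. This is a sketch under stated assumptions. `api_url` is a hypothetical helper, and the base URL would come from wherever the backend address is configured (for the frontend, `NEXT_PUBLIC_REFLECTOR_API_URL`). The path passed in is assumed to be the route path, which may not match the router names listed above one-to-one.

```python
from urllib.parse import urljoin


def api_url(base_url: str, endpoint: str) -> str:
    """Join a backend base URL with a path under the /v1/ prefix (hypothetical helper)."""
    # urljoin replaces the last path segment of the base unless the base ends
    # with "/", so normalise it first to keep any path prefix such as /api.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, "v1/" + endpoint.lstrip("/"))


print(api_url("http://localhost:8000", "transcripts"))  # http://localhost:8000/v1/transcripts
print(api_url("https://example.com/api/", "/rooms"))     # https://example.com/api/v1/rooms
```

The normalisation matters because `urljoin("https://example.com/api", "v1/rooms")` would drop `/api` and yield `https://example.com/v1/rooms`.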
Frontend Architecture
- App Router: Next.js 14 with route groups for organization
- State: React Context pattern, no Redux
- Real-time: WebSocket integration for live transcription updates
- Auth: NextAuth.js with Authentik OAuth/OIDC provider
- UI: Chakra UI components with Tailwind CSS utilities
Key Configuration
Environment Variables
Backend (server/.env):
- `DATABASE_URL` - Database connection string
- `REDIS_URL` - Redis broker for Celery
- `TRANSCRIPT_BACKEND=modal` + `TRANSCRIPT_MODAL_API_KEY` - Modal.com transcription
- `DIARIZATION_BACKEND=modal` + `DIARIZATION_MODAL_API_KEY` - Modal.com diarization
- `TRANSLATION_BACKEND=modal` + `TRANSLATION_MODAL_API_KEY` - Modal.com translation
- `WHEREBY_API_KEY` - Video platform integration
- `REFLECTOR_AUTH_BACKEND` - Authentication method (none, jwt)
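Each `*_BACKEND` variable selects an implementation, and the Modal backends also need an API key. Below is a minimal sketch of that selection for translation. It assumes a `modal` default, a `passthrough` option that leaves text unchanged, and an endpoint URL that must be set for Modal. `make_translator`, `ModalTranslator`, `PassthroughTranslator` and the `TRANSLATION_URL` variable are hypothetical names, not Reflector's actual settings code.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Union


@dataclass
class ModalTranslator:
    # Hypothetical stand-in for a Modal-backed translation processor.
    url: str
    api_key: Optional[str]


@dataclass
class PassthroughTranslator:
    # Hypothetical no-op backend that returns text unchanged.
    def translate(self, text: str) -> str:
        return text


def make_translator(env: Mapping[str, str]) -> Union[ModalTranslator, PassthroughTranslator]:
    """Pick a translation backend from environment-style settings."""
    backend = env.get("TRANSLATION_BACKEND", "modal")  # assumed default
    if backend == "modal":
        url = env.get("TRANSLATION_URL")  # hypothetical variable name
        if not url:
            # Fail fast at startup instead of on the first request.
            raise ValueError("TRANSLATION_URL must be set when TRANSLATION_BACKEND=modal")
        return ModalTranslator(url=url, api_key=env.get("TRANSLATION_MODAL_API_KEY"))
    if backend == "passthrough":
        return PassthroughTranslator()
    raise ValueError(f"Unknown TRANSLATION_BACKEND: {backend!r}")
```

In a server process this would be called with `os.environ` or a settings object. Accepting any mapping keeps it testable with a plain dict.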
Frontend (www/.env):
- `NEXTAUTH_URL`, `NEXTAUTH_SECRET` - Authentication configuration
- `NEXT_PUBLIC_REFLECTOR_API_URL` - Backend API endpoint
- `REFLECTOR_DOMAIN_CONFIG` - Feature flags and domain settings
Testing Strategy
- Backend: pytest with async support, HTTP client mocking, audio processing tests
- Frontend: No test suite yet; Jest and React Testing Library are candidates
- Coverage: Backend test coverage reports are kept in `htmlcov/`
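In practice, "HTTP client mocking" means swapping the network-bound call for a fake. Below is a minimal sketch using the standard library's `unittest.mock.AsyncMock`. The `Translator` class and its `_translate` method are hypothetical stand-ins, not the project's actual test fixtures.

```python
import asyncio
from unittest.mock import AsyncMock


class Translator:
    """Hypothetical processor whose only network-bound step is _translate."""

    async def _translate(self, text: str, target: str) -> str:
        raise NotImplementedError("a real backend would call a remote service here")

    async def translate(self, text: str, target: str) -> str:
        # Skip the backend entirely for empty input.
        if not text:
            return text
        return await self._translate(text, target)


async def test_translate_delegates_to_backend() -> None:
    translator = Translator()
    # Replace the network call with an AsyncMock so no HTTP request is made.
    translator._translate = AsyncMock(return_value="bonjour")

    assert await translator.translate("hello", "fr") == "bonjour"
    translator._translate.assert_awaited_once_with("hello", "fr")

    # Empty input never reaches the backend, so the await count stays at one.
    assert await translator.translate("", "fr") == ""
    translator._translate.assert_awaited_once()


asyncio.run(test_translate_delegates_to_backend())
```

If the suite uses the pytest-asyncio plugin (an assumption, not confirmed here), the test coroutine would be marked `@pytest.mark.asyncio` and collected by pytest instead of being run with `asyncio.run`.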
GPU Processing
Modal.com integration for scalable ML processing:
- Deploy changes: `modal run server/gpu/path/to/model.py`
- Requires a Modal account with the `REFLECTOR_GPU_APIKEY` secret
- Falls back to local processing when Modal is unavailable
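The document does not say how the local fallback is triggered. One plausible approach is sketched below under that caveat: try the remote backend with a timeout, then fall back to a local implementation on timeout or network errors. `transcribe_with_fallback`, the backend callables and the 30-second default timeout are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical backend signature: raw audio bytes in, transcript text out.
Transcriber = Callable[[bytes], Awaitable[str]]


async def transcribe_with_fallback(
    audio: bytes,
    remote: Transcriber,
    local: Transcriber,
    timeout: float = 30.0,
) -> str:
    """Try the remote (e.g. Modal) backend; use the local one on timeout or network error."""
    try:
        return await asyncio.wait_for(remote(audio), timeout=timeout)
    except (asyncio.TimeoutError, OSError):
        # ConnectionError is a subclass of OSError, so unreachable hosts land here too.
        return await local(audio)


async def unreachable_remote(audio: bytes) -> str:
    raise ConnectionError("Modal endpoint unreachable")


async def slow_remote(audio: bytes) -> str:
    await asyncio.sleep(1)
    return "remote transcript"


async def local_model(audio: bytes) -> str:
    return f"local transcript of {len(audio)} bytes"


print(asyncio.run(transcribe_with_fallback(b"abcd", unreachable_remote, local_model)))
print(asyncio.run(transcribe_with_fallback(b"abcd", slow_remote, local_model, timeout=0.05)))
```

Only transport-level failures trigger the fallback here. Errors such as an invalid API key would surface differently depending on the HTTP client, so a real implementation would need to decide whether those should fall back too.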
Common Issues
- Permissions: Browser microphone access must be granted in System Preferences
- Audio Routing: Use BlackHole (Mac) for merging multiple audio sources
- WebRTC: Ensure proper CORS configuration for cross-origin streaming
- Database: Run `uv run alembic upgrade head` after pulling schema changes
Pipeline/worker-related info
For any worker or pipeline work, search for `Pipeline` classes and their `create` or `build` methods to find the main processor sequence. Then look for task orchestration patterns (`chord`, `group`, `chain`) to identify the post-processing flow and its parallel execution chains. Together, these give a high-level view of how the processing pipeline is organized.
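When reading the post-processing flow, it can help to model the three Celery canvas primitives in plain Python:

- `chain` runs tasks in sequence.
- `group` runs tasks in parallel on the same input.
- `chord` runs a group and then passes the list of results to a callback.

The sketch below is a stdlib analogy built on `concurrent.futures`, not Celery's API. The task functions (`generate_title`, `summarize`, `count_words`, `combine`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

Task = Callable[[Any], Any]


def chain(*tasks: Task) -> Task:
    """Run tasks in sequence, passing each result to the next task."""
    def run(value: Any) -> Any:
        for task in tasks:
            value = task(value)
        return value
    return run


def group(*tasks: Task) -> Callable[[Any], List[Any]]:
    """Run every task on the same input in parallel; results keep task order."""
    def run(value: Any) -> List[Any]:
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda task: task(value), tasks))
    return run


def chord(header: Callable[[Any], List[Any]], callback: Callable[[List[Any]], Any]) -> Task:
    """Run a group, then hand the list of its results to a single callback."""
    def run(value: Any) -> Any:
        return callback(header(value))
    return run


# Hypothetical post-processing tasks operating on a transcript string.
def generate_title(text: str) -> str:
    return text.split(".")[0]


def summarize(text: str) -> str:
    return " ".join(text.split()[:3])


def count_words(text: str) -> int:
    return len(text.split())


def combine(results: List[Any]) -> dict:
    title, summary, words = results
    return {"title": title, "summary": summary, "words": words}


postprocess = chain(str.strip, chord(group(generate_title, summarize, count_words), combine))
print(postprocess("  Weekly sync. We discussed the roadmap.  "))
# {'title': 'Weekly sync', 'summary': 'Weekly sync. We', 'words': 6}
```

In Celery, the same shape distributes each task to workers through the broker. When looking for the parallel branches in the real code, the `group`/`chord` boundaries are the places to read first.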