reflector

mirror of https://github.com/Monadical-SAS/reflector.git synced 2026-04-09 23:36:47 +00:00

Author	SHA1	Message	Date
Igor Monadical	9a258abc02	feat: api tokens (#705) * feat: api tokens (vibe) * self-review * remove token terminology + pr comments (vibe) * return email_verified --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2025-10-20 12:55:25 -04:00
Sergey Mankovsky	9a71af145e	fix: update transcript list on reprocess (#676) * Update transcript list on reprocess * Fix transcript create * Fix multiple sockets issue * Pass token in sec websocket protocol * userEvent parse example * transcript list invalidation non-abstraction * Emit only relevant events to the user room * Add ws close code const * Refactor user websocket endpoint * Refactor user events provider --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2025-10-07 19:11:30 +02:00
Sergey Mankovsky	5d98754305	fix: security review (#656) * Add security review doc * Add tests to reproduce security issues * Fix security issues * Fix tests * Set auth auth backend for tests * Fix ics api tests * Fix transcript mutate check * Update frontent env var names * Remove permissions doc	2025-09-29 23:07:49 +02:00
Mathieu Virbel	6f680b5795	feat: calendar integration (#608 ) * feat: calendar integration * feat: add ICS calendar API endpoints for room configuration and sync * feat: add Celery background tasks for ICS sync * feat: implement Phase 2 - Multiple active meetings per room with grace period This commit adds support for multiple concurrent meetings per room, implementing grace period logic and improved meeting lifecycle management for calendar integration. ## Database Changes - Remove unique constraint preventing multiple active meetings per room - Add last_participant_left_at field to track when meeting becomes empty - Add grace_period_minutes field (default: 15) for configurable grace period ## Meeting Controller Enhancements - Add get_all_active_for_room() to retrieve all active meetings for a room - Add get_active_by_calendar_event() to find meetings by calendar event ID - Maintain backward compatibility with existing get_active() method ## New API Endpoints - GET /rooms/{room_name}/meetings/active - List all active meetings - POST /rooms/{room_name}/meetings/{meeting_id}/join - Join specific meeting ## Meeting Lifecycle Improvements - 15-minute grace period after last participant leaves - Automatic reactivation when participant rejoins during grace period - Force close calendar meetings 30 minutes after scheduled end time - Update process_meetings task to handle multiple active meetings ## Whereby Integration - Clear grace period when participants join via webhook events - Track participant count for grace period management ## Testing - Add comprehensive tests for multiple active meetings - Test grace period behavior and participant rejoin scenarios - Test calendar meeting force closure logic - All 5 new tests passing This enables proper calendar integration with overlapping meetings while preventing accidental meeting closures through the grace period mechanism. 
* feat: implement frontend for calendar integration (Phase 3 & 4) - Created MeetingSelection component for choosing between multiple active meetings - Shows both active meetings and upcoming calendar events (30 min ahead) - Displays meeting metadata with privacy controls (owner-only details) - Supports creation of unscheduled meetings alongside calendar meetings - Added waiting page for users joining before scheduled start time - Shows countdown timer until meeting begins - Auto-transitions to meeting when calendar event becomes active - Handles early joining with proper routing - Created collapsible info panel showing meeting details - Displays calendar metadata (title, description, attendees) - Shows participant count and duration - Privacy-aware: sensitive info only visible to room owners - Integrated ICS settings into room configuration dialog - Test connection functionality with immediate feedback - Manual sync trigger with detailed results - Shows last sync time and ETag for monitoring - Configurable sync intervals (1 min to 1 hour) - New /room/{roomName} route for meeting selection - Waiting room at /room/{roomName}/wait?eventId={id} - Classic room page at /{roomName} with meeting info - Uses sessionStorage to pass selected meeting between pages - Added new endpoints for active/upcoming meetings - Regenerated TypeScript client with latest OpenAPI spec - Proper error handling and loading states - Auto-refresh every 30 seconds for live updates - Color-coded badges for meeting status - Attendee status indicators (accepted/declined/tentative) - Responsive design with Chakra UI components - Clear visual hierarchy between active and upcoming meetings - Smart truncation for long attendee lists This completes the frontend implementation for calendar integration, enabling users to seamlessly join scheduled meetings from their calendar applications. 
* WIP: Migrate calendar integration frontend to React Query - Migrate all calendar components from useApi to React Query hooks - Fix Chakra UI v3 compatibility issues (Card, Progress, spacing props, leftIcon) - Update backend Meeting model to include calendar fields - Replace imperative API calls with declarative React Query patterns - Remove old OpenAPI generated files that conflict with new structure * fix: alembic migrations * feat: add calendar migration * feat: update ics, first version working * feat: implement tabbed interface for room edit dialog - Add General, Calendar, and Share tabs to organize room settings - Move ICS settings to dedicated Calendar tab - Move Zulip configuration to Share tab - Keep basic room settings and webhooks in General tab - Remove redundant migration file - Fix Chakra UI v3 compatibility issues in calendar components 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: infinite loop * feat: improve ICS calendar sync UX and fix room URL matching - Replace "Test Connection" button with "Force Sync" button (Edit Room only) - Show detailed sync results: total events downloaded vs room matches - Remove emoticons and auto-hide timeout for cleaner UX - Fix room URL matching to use UI_BASE_URL instead of BASE_URL - Replace FaSync icon with LuRefreshCw for consistency - Clear sync results when dialog closes or Force Sync pressed - Update tests to reflect UI_BASE_URL change and exact URL matching 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: reorganize room edit dialog and fix Force Sync button - Move WebHook configuration from General to dedicated WebHook tab - Add WebHook tab after Share tab in room edit dialog - Fix Force Sync button not appearing by adding missing isEditing prop - Fix indentation issues in MeetingSelection component 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude 
<noreply@anthropic.com> * feat: complete calendar integration with UI improvements and code cleanup Calendar Integration Tasks: - Update upcoming meetings window from 30 to 120 minutes - Include currently happening events in upcoming meetings API - Create shared time utility functions (formatDateTime, formatCountdown, formatStartedAgo) - Improve ongoing meetings UI logic with proper time detection - Fix backend code organization and remove excessive documentation UI/UX Improvements: - Restructure room page layout using MinimalHeader pattern - Remove borders from header and footer elements - Change button text from "Leave Meeting" to "Leave Room" - Remove "Back to Reflector" footer for cleaner design - Extract WaitPageClient component for better separation Backend Changes: - calendar_events.py: Fix import organization and extend timing window - rooms.py: Update API default from 30 to 120 minutes - Enhanced test coverage for ongoing meeting scenarios Frontend Changes: - MinimalHeader: Add onLeave prop for custom navigation - MeetingSelection: Complete layout restructure with shared utilities - timeUtils: New shared utility file for consistent time formatting 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: remove wait page and simplify Join button with 5-minute disable logic - Remove entire wait page directory and associated files - Update handleJoinUpcoming to create unscheduled meeting directly - Simplify Join button to single state: - Always shows "Join" text - Blue when meeting can be joined (ongoing or within 5 minutes) - Gray/disabled when more than 5 minutes away - Remove confusing "Join Now", "Join Early" text variations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: improve calendar integration and meeting UI - Refactor ICS sync tasks to use @asynctask decorator for cleaner async handling - Extract meeting creation logic into reusable 
function - Improve meeting selection UI with distinct current/upcoming sections - Add early join functionality for upcoming meetings within 5-minute window - Simplify non-ICS room workflow with direct Whereby embed - Fix import paths and component organization 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * feat: restore original recording consent functionality - Remove custom ConsentDialogButton and WherebyEmbed components - Merge RoomClient logic back into main room page - Restore original consent UI: blue button with toast modal - Maintain calendar integration features for ICS-enabled rooms - Add consent-handler.md documentation of original functionality - Preserve focus management and accessibility features * fix: redirect Join Now button to local meeting page - Change handleJoinDirect to use onMeetingSelect instead of opening external URL - Join Now button now navigates to /{roomName}/{meetingId} instead of whereby.com - Maintains proper routing within the application * feat: remove restrictive message for non-owners in private rooms - Remove confusing message about room owner permissions - Cleaner UI for all users regardless of ownership status - Users will only see available meetings and join options * feat: improve meeting selection UI for better readability - Limit page content to max 800px width for better 4K display readability - Remove LIVE tag badge for cleaner interface - Remove shadow from main live meeting box - Remove blue border and hover effects for minimal design - Change background to neutral gray for less visual noise * feat: add room by name endpoint for non-authenticated access - Add GET /rooms/name/{room_name} backend endpoint - Endpoint supports non-authenticated access for public rooms - Returns RoomDetails with webhook fields hidden for non-owners - Update useRoomGetByName hook to use new direct endpoint - Remove authentication requirement from frontend hook - Regenerate API client 
types Fixes: Non-authenticated users can now access room lobbies * feat: add friendly message when no meetings are ongoing - Show centered message with calendar icon when no meetings are active - Message text: 'No meetings right now' with helpful description - Contextual text for owners/shared rooms mentioning quick meeting option - Consistent gray styling matching the rest of the interface - Only displays when both currentMeetings and upcomingMeetings are empty * style: center no meetings message and remove background - Change from Box to Flex with flex=1 for vertical centering - Remove gray background, border radius, and padding - Message now appears cleanly centered in available space - Maintains horizontal and vertical centering * feat: move Create Meeting button to header - Remove 'Start a Quick Meeting' box from main content area - Add showCreateButton and onCreateMeeting props to MinimalHeader - Create Meeting button now appears in header left of Leave Room - Only shows for room owners or shared room users - Update no meetings message to remove reference to quick meeting below - Cleaner, more accessible UI with actions in the header * style: change room title and no meetings text to pure black - Update room title in MinimalHeader from gray.700 to black - Update 'No meetings right now' text from gray.700 to black - Improves visual hierarchy and readability - Consistent with other pages' styling * style: linting * fix: remove plan files * fix: alembic migration with named foreign keys * feat: add SyncStatus enum and refactor ICS sync to use rooms controller - Add SyncStatus enum to replace string literals in ICS sync status - Replace direct SQL queries in worker with rooms_controller.get_ics_enabled() - Improve type safety and maintainability of ICS sync code - Enum values: SUCCESS, UNCHANGED, ERROR, SKIPPED maintain backward compatibility * refactor: remove unnecessary docstring from get_ics_enabled method The function name is self-explanatory * fix: import 
top level * feat: use Literal type for ICSStatus.status field - Changed ICSStatus.status from str to Literal['enabled', 'disabled'] - Improves type safety and API documentation * feat: update TypeScript definitions for ICSStatus Literal type - OpenAPI generation now properly reflects Literal['enabled', 'disabled'] type - Improves type safety for frontend consumers of the API - Applied automatic formatting via pre-commit hooks * refactor: replace loguru with structlog in ics_sync service - Replace loguru import with structlog in services/ics_sync.py - Update logging calls to use structlog's structured format with keyword args - Maintains consistency with other services using structlog - Changes: logger.info(f'...') -> logger.info('...', key=value) format * chore: remove loguru dependency and improve type annotations - Remove loguru from dependencies in pyproject.toml (replaced with structlog) - Update meeting controller methods to properly return Optional types - Update dependency lock file after loguru removal * fix: resolve pyflakes warnings in ics_sync and meetings modules Remove unused imports and variables to clean up code quality * Remove grace period logic and improve meeting deactivation - Removed grace_period_minutes and last_participant_left_at fields - Simplified deactivation logic based on actual usage patterns: * Active sessions: Keep meeting active regardless of scheduled time * Calendar meetings: Wait until scheduled end if unused, deactivate immediately once used and empty * On-the-fly meetings: Deactivate immediately when empty - Created migration to drop unused database columns - Updated tests to remove grace period test cases * Update test to match new deactivation logic for calendar meetings * fix: remove unwanted file * fix: incompleted changes from EVENT_WINDOW* * fix: update room ICS API tests to include required webhook fields and correct URL - Add webhook_url and webhook_secret fields to room creation tests - Fix room URL matching in ICS 
sync test to use UI_BASE_URL instead of BASE_URL - Aligns test with actual API requirements and ICS sync service implementation * fix: add Redis distributed locking to prevent race conditions in process_meetings - Implement per-meeting locks using Redis to prevent concurrent processing - Add lock extension after slow API calls (Whereby) to handle long-running operations - Use redis-py's built-in lock.extend() with replace_ttl=True for simple TTL refresh - Track and log skipped meetings when locked by other workers - Document SSRF analysis showing it's low-risk due to async worker isolation This prevents multiple workers from processing the same meeting simultaneously, which could cause state corruption or duplicate deactivations. * refactor: rename MinimalHeader to MeetingMinimalHeader for clarity * fix: minor code quality improvements - add emoji constants, fix type safety, cleanup comments * fix: database migration * self-pr review * self-pr review * self-pr review treeshake * fix: local fixes * fix: creation of meeting * fix: meeting selection create button * compile fix * fix: meeting selection responsive * fix: rework process logic for meeting * fix: meeting useEffect frontend-only dedupe (#647) * meeting useEffect frontend-only dedupe * format * also get room by name backend fix --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com> * invalidate meeting list on new meeting * test fix * room url copy button for ics * calendar refresh quick action icon * remove work.md * meeting page frontend fixes * hide number of meeting participants * Revert "hide number of meeting participants" This reverts commit `38906c5d1a`. * ui bits * ui bits * remove log * room name typing stricten * feat: protect atomic operation involving external service with redlock --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Igor Monadical <igor@monadical.com> Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2025-09-17 16:43:20 -06:00
Sergey Mankovsky	ab859d65a6	feat: self-hosted gpu api (#636) * Self-hosted gpu api * Refactor self-hosted api * Rename model api tests * Use lifespan instead of startup event * Fix self hosted imports * Add newlines * Add response models * Move gpu dir to the root * Add project description * Refactor lifespan * Update env var names for model api tests * Preload diarizarion service * Refactor uploaded file paths	2025-09-17 18:52:03 +02:00
Igor Monadical	79f161436e	chore: meeting user id removal and room id requirement (#635) * chore: remove meeting user id and make meeting room id required * meeting room_id optional * orphaned meeting room ids DATA migration * ci fix * fix meeting_room_id_fkey downgrade * fix migration rollback * fix: put index back (meeting room id) * fix: put index back (meeting room id) * fix: put index back (meeting room id) * remove noop migrations --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2025-09-12 13:07:58 -04:00
Igor Monadical	f81fe9948a	fix: anonymous users transcript permissions (#621) * fix: public transcript visibility * fix: transcript permissions frontend * dead code removal * chore: remove unused code * fix search tests * fix search tests --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2025-09-09 10:50:29 -04:00
Sergey Mankovsky	0663700a61	fix: align whisper transcriber api with parakeet (#602) * Documents transcriber api * Update whisper transcriber api to match parakeet * Update api transcription spec * Return 400 for unsupported file type * Add params to api spec * Update whisper transcriber implementation to match parakeet	2025-09-05 10:52:14 +02:00
Igor Monadical	695d1a957d	fix: search-logspam (#593) * fix: search-logspam * llm comment * fix tests --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2025-08-29 18:55:51 -04:00
Mathieu Virbel	6f0c7c1a5e	feat(cleanup): add automatic data retention for public instances (#574 ) * feat(cleanup): add automatic data retention for public instances - Add Celery task to clean up anonymous data after configurable retention period - Delete transcripts, meetings, and orphaned recordings older than retention days - Only runs when PUBLIC_MODE is enabled to prevent accidental data loss - Properly removes all associated files (local and S3 storage) - Add manual cleanup tool for testing and intervention - Configure retention via PUBLIC_DATA_RETENTION_DAYS setting (default: 7 days) Fixes #571 * fix: apply pre-commit formatting fixes * fix: properly delete recording files from storage during cleanup - Add storage deletion for orphaned recordings in both cleanup task and manual tool - Delete from storage before removing database records - Log warnings if storage deletion fails but continue with database cleanup * Apply suggestion from @pr-agent-monadical[bot] Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com> * Apply suggestion from @pr-agent-monadical[bot] Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com> * refactor: cleanup_old_data for better logging * fix: linting * test: fix meeting cleanup test to not require room controller - Simplify test by directly inserting meetings into database - Remove dependency on non-existent rooms_controller.create method - Tests now pass successfully * fix: linting * refactor: simplify cleanup tool to use worker implementation - Remove duplicate cleanup logic from manual tool - Use the same _cleanup_old_public_data function from worker - Remove dry-run feature as requested - Prevent code duplication and ensure consistency - Update documentation to reflect changes * refactor: split cleanup worker into smaller functions - Move all imports to the top of the file - Extract cleanup logic into separate functions: - 
cleanup_old_transcripts() - cleanup_old_meetings() - cleanup_orphaned_recordings() - log_cleanup_results() - Make code more maintainable and testable - Add days parameter support to Celery task - Update manual tool to work with refactored code * feat: add TypedDict typing for cleanup stats - Add CleanupStats TypedDict for better type safety - Update all function signatures to use proper typing - Add return type annotations to _cleanup_old_public_data - Improves code maintainability and IDE support * feat: add CASCADE DELETE to meeting_consent foreign key - Add ondelete="CASCADE" to meeting_consent.meeting_id foreign key - Generate and apply migration to update existing constraint - Remove manual consent deletion from cleanup code - Add unit test to verify CASCADE DELETE behavior * style: linting * fix: alembic migration branchpoint * fix: correct downgrade constraint name in CASCADE DELETE migration * fix: regenerate CASCADE DELETE migration with proper constraint names - Delete problematic migration and regenerate with correct names - Use explicit constraint name in both upgrade and downgrade - Ensure migration works bidirectionally - All tests passing including CASCADE DELETE test * style: linting * refactor: simplify cleanup to use transcripts as entry point - Remove orphaned_recordings cleanup (not part of this PR scope) - Remove separate old_meetings cleanup - Transcripts are now the main entry point for cleanup - Associated meetings and recordings are deleted with their transcript - Use single database connection for all operations - Update tests to reflect new approach * refactor: cleanup and rename functions for clarity - Rename _cleanup_old_public_data to cleanup_old_public_data (make public) - Rename celery task to cleanup_old_public_data_task for clarity - Update docstrings and improve code organization - Remove unnecessary comments and simplify deletion logic - Update tests to use new function names - All tests passing * style: linting\ * style: typing 
and review * fix: add transaction on cleanup_single_transcript * fix: naming --------- Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>	2025-08-29 08:47:14 -06:00
Mathieu Virbel	9dfd76996f	fix: file pipeline status reporting and websocket updates (#589) * feat: use file pipeline for upload and reprocess action * fix: make file pipeline correctly report status events * fix: duplication of transcripts_controller * fix: tests * test: fix file upload test * test: fix reprocess * fix: also patch from main_file_pipeline (how patch is done is dependent of file import unfortunately)	2025-08-29 00:58:14 -06:00
Igor Loskutov	124ce03bf8	fix: Igor/evaluation (#575) * fix: impossible import error (#563) * evaluation cli - database events experiment * hallucinations * evaluation - unhallucinate * evaluation - unhallucinate * roll back reliability link * self reviewio * lint * self review * add file pipeline to cli * add file pipeline to cli + sorting * remove cli tests * remove ai comments * comments	2025-08-28 12:07:34 -04:00
Igor Loskutov	d70beee51b	fix: include shared rooms to search (#558) * include shared rooms to search * tests vibe * tests vibe * tests vibe * tests vibe * tests vibe * tests vibe * tests vibe * remove tests, thats too much	2025-08-21 14:52:29 -04:00
Mathieu Virbel	3ea7f6b7b6	feat: pipeline improvement with file processing, parakeet, silero-vad (#540) * feat: improve pipeline threading, and transcriber (parakeet and silero vad) * refactor: remove whisperx, implement parakeet * refactor: make audio_chunker more smart and wait for speech, instead of fixed frame * refactor: make audio merge to always downscale the audio to 16k for transcription * refactor: make the audio transcript modal accepting batches * refactor: improve type safety and remove prometheus metrics - Add DiarizationSegment TypedDict for proper diarization typing - Replace List/Optional with modern Python list/| None syntax - Remove all Prometheus metrics from TranscriptDiarizationAssemblerProcessor - Add comprehensive file processing pipeline with parallel execution - Update processor imports and type annotations throughout - Implement optimized file pipeline as default in process.py tool * refactor: convert FileDiarizationProcessor I/O types to BaseModel Update FileDiarizationInput and FileDiarizationOutput to inherit from BaseModel instead of plain classes, following the standard pattern used by other processors in the codebase. 
* test: add tests for file transcript and diarization with pytest-recording * build: add pytest-recording * feat: add local pyannote for testing * fix: replace PyAV AudioResampler with torchaudio for reliable audio processing - Replace problematic PyAV AudioResampler that was causing ValueError: [Errno 22] Invalid argument - Use torchaudio.functional.resample for robust sample rate conversion - Optimize processing: skip conversion for already 16kHz mono audio - Add direct WAV writing with Python wave module for better performance - Consolidate duplicate downsample checks for cleaner code - Maintain list[av.AudioFrame] input interface - Required for Silero VAD which needs 16kHz mono audio * fix: replace PyAV AudioResampler with torchaudio solution - Resolves ValueError: [Errno 22] Invalid argument in AudioMergeProcessor - Replaces problematic PyAV AudioResampler with torchaudio.functional.resample - Optimizes processing to skip unnecessary conversions when audio is already 16kHz mono - Uses direct WAV writing with Python's wave module for better performance - Fixes test_basic_process to disable diarization (pyannote dependency not installed) - Updates test expectations to match actual processor behavior - Removes unused pydub dependency from pyproject.toml - Adds comprehensive TEST_ANALYSIS.md documenting test suite status * feat: add parameterized test for both diarization modes - Adds @pytest.mark.parametrize to test_basic_process with enable_diarization=[False, True] - Test with diarization=False always passes (tests core AudioMergeProcessor functionality) - Test with diarization=True gracefully skips when pyannote.audio is not installed - Provides comprehensive test coverage for both pipeline configurations * fix: resolve pipeline property naming conflict in AudioDiarizationPyannoteProcessor - Renames 'pipeline' property to 'diarization_pipeline' to avoid conflict with base Processor.pipeline attribute - Fixes AttributeError: 'property 'pipeline' object has no 
setter' when set_pipeline() is called - Updates property usage in _diarize method to use new name - Now correctly supports pipeline initialization for diarization processing * fix: add local for pyannote * test: add diarization test * fix: resample on audio merge now working * fix: correctly restore timestamp * fix: display exception in a threaded processor if that happen * Update pyproject.toml * ci: remove option * ci: update astral-sh/setup-uv * test: add monadical url for pytest-recording * refactor: remove previous version * build: move faster whisper to local dep * test: fix missing import * refactor: improve main_file_pipeline organization and error handling - Move all imports to the top of the file - Create unified EmptyPipeline class to replace duplicate mock pipeline code - Remove timeout and fallback logic - let processors handle their own retries - Fix error handling to raise any exception from parallel tasks - Add proper type hints and validation for captured results * fix: wrong function * fix: remove task_done * feat: add configurable file processing timeouts for modal processors - Add TRANSCRIPT_FILE_TIMEOUT setting (default: 600s) for file transcription - Add DIARIZATION_FILE_TIMEOUT setting (default: 600s) for file diarization - Replace hardcoded timeout=600 with configurable settings in modal processors - Allows customization of timeout values via environment variables * fix: use logger * fix: worker process meetings now use file pipeline * fix: topic not gathered * refactor: remove prepare(), pipeline now work * refactor: implement many review from Igor * test: add test for test_pipeline_main_file * refactor: remove doc * doc: add doc * ci: update build to use native arm64 builder * fix: merge fixes * refactor: changes from Igor review + add test (not by default) to test gpu modal part * ci: update to our own runner linux-amd64 * ci: try using suggested mode=min * fix: update diarizer for latest modal, and use volume * fix: modal file extension 
detection * fix: put the diarizer as A100	2025-08-20 20:07:19 -06:00
Igor Loskutov	009590c080	feat: search frontend (#551 ) * feat: better highlight * feat(search): add long_summary to search vector for improved search results - Update search vector to include long_summary with weight B (between title A and webvtt C) - Modify SearchController to fetch long_summary and prioritize its snippets - Generate snippets from long_summary first (max 2), then from webvtt for remaining slots - Add comprehensive tests for long_summary search functionality - Create migration to update search_vector_en column in PostgreSQL This improves search quality by including summarized content which often contains key topics and themes that may not be explicitly mentioned in the transcript. * fix: address code review feedback for search enhancements - Fix test file inconsistencies by removing references to non-existent model fields - Comment out tests for unimplemented features (room_ids, status filters, date ranges) - Update tests to only use currently available fields (room_id singular, no room_name/processing_status) - Mark future functionality tests with @pytest.mark.skip - Make snippet counts configurable - Add LONG_SUMMARY_MAX_SNIPPETS constant (default: 2) - Replace hardcoded value with configurable constant - Improve error handling consistency in WebVTT parsing - Use different log levels for different error types (debug for malformed, warning for decode, error for unexpected) - Add catch-all exception handler for unexpected errors - Include stack trace for critical errors All existing tests pass with these changes. 
* fix: correct datetime test to include required duration field * feat: better highlight * feat: search room names * feat: acknowledge deleted room * feat: search filters fix and rank removal * chore: minor refactoring * feat: better matches frontend * chore: self-review (vibe) * chore: self-review WIP * chore: self-review WIP * chore: self-review WIP * chore: self-review WIP * chore: self-review WIP * chore: self-review WIP * chore: self-review WIP * remove swc (vibe) * search url query sync (vibe) * search url query sync (vibe) * better casts and cap while * PR review + simplify frontend hook * pr: remove search db timeouts * cleanup tests * tests cleanup * frontend cleanup * index declarations * refactor frontend (self-review) * fix search pagination * clear "x" for search input * pagination max pages fix * chore: cleanup * cleanup * cleanup * cleanup * cleanup * cleanup * cleanup * cleanup * lockfile * pr review	2025-08-20 20:56:45 -04:00
Sergey Mankovsky	b9d891d342	feat: delete recording with transcript (#547) * Delete recording with transcript * Delete confirmation dialog * Use aws storage abstraction for recording deletion * Test recording deleted with transcript * Use get transcript storage * Fix the test * Add env vars for recording storage	2025-08-14 20:45:30 +02:00
Mathieu Virbel	9eab952c63	feat: postgresql migration and removal of sqlite in pytest (#546 ) * feat: remove support of sqlite, 100% postgres * fix: more migration and make datetime timezone aware in postgres * fix: change how database is get, and use contextvar to have difference instance between different loops * test: properly use client fixture that handle lifetime/database connection * fix: add missing client fixture parameters to test functions This commit fixes NameError issues where test functions were trying to use the 'client' fixture but didn't have it as a parameter. The changes include: 1. Added 'client' parameter to test functions in: - test_transcripts_audio_download.py (6 functions including fixture) - test_transcripts_speaker.py (3 functions) - test_transcripts_upload.py (1 function) - test_transcripts_rtc_ws.py (2 functions + appserver fixture) 2. Resolved naming conflicts in test_transcripts_rtc_ws.py where both HTTP client and StreamClient were using variable name 'client'. StreamClient instances are now named 'stream_client' to avoid conflicts. 3. Added missing 'from reflector.app import app' import in rtc_ws tests. Background: Previously implemented contextvars solution with get_database() function resolves asyncio event loop conflicts in Celery tasks. The global client fixture was also created to replace manual AsyncClient instances, ensuring proper FastAPI application lifecycle management and database connections during tests. All tests now pass except for 2 pre-existing RTC WebSocket test failures related to asyncpg connection issues unrelated to these fixes. * fix: ensure task are correctly closed * fix: make separate event loop for the live server * fix: make default settings pointing at postgres * build: remove pytest-docker deps out of dev, just tests group	2025-08-14 11:40:52 -06:00
Igor Loskutov	6fb5cb21c2	feat: search backend (#537 ) * docs: transient docs * chore: cleanup * webvtt WIP * webvtt field * chore: webvtt tests comments * chore: remove useless tests * feat: search TASK.md * feat: full text search by title/webvtt * chore: search api task * feat: search api * feat: search API * chore: rm task md * chore: roll back unnecessary validators * chore: pr review WIP * chore: pr review WIP * chore: pr review * chore: top imports * feat: better lint + ci * feat: better lint + ci * feat: better lint + ci * feat: better lint + ci * chore: lint * chore: lint * fix: db datetime definitions * fix: flush() params * fix: update transcript mutability expectation / test * fix: update transcript mutability expectation / test * chore: auto review * chore: new controller extraction * chore: new controller extraction * chore: cleanup * chore: review WIP * chore: pr WIP * chore: remove ci lint * chore: openapi regeneration * chore: openapi regeneration * chore: postgres test doc * fix: .dockerignore for arm binaries * fix: .dockerignore for arm binaries * fix: cap test loops * fix: cap test loops * fix: cap test loops * fix: get_transcript_topics * chore: remove flow.md docs and claude guidance * chore: remove claude.md db doc * chore: remove claude.md db doc * chore: remove claude.md db doc * chore: remove claude.md db doc	2025-08-13 10:03:38 -04:00
Mathieu Virbel	dc177af3ff	feat: implement service-specific Modal API keys with auto processor pattern (#528 ) * fix: refactor modal API key configuration for better separation of concerns - Split generic MODAL_API_KEY into service-specific keys: - TRANSCRIPT_API_KEY for transcription service - DIARIZATION_API_KEY for diarization service - TRANSLATE_API_KEY for translation service - Remove deprecated _MODAL_API_KEY settings - Add proper validation to ensure URLs are set when using modal processors - Update README with new configuration format BREAKING CHANGE: Configuration keys have changed. Update your .env file: - TRANSCRIPT_MODAL_API_KEY → TRANSCRIPT_API_KEY - LLM_MODAL_API_KEY → (removed, use TRANSCRIPT_API_KEY) - Add DIARIZATION_API_KEY and TRANSLATE_API_KEY if using those services fix: update Modal backend configuration to use service-specific API keys - Changed from generic MODAL_API_KEY to service-specific keys: - TRANSCRIPT_MODAL_API_KEY for transcription - DIARIZATION_MODAL_API_KEY for diarization - TRANSLATION_MODAL_API_KEY for translation - Updated audio_transcript_modal.py and audio_diarization_modal.py to use modal_api_key parameter - Updated documentation in README.md, CLAUDE.md, and env.example * feat: implement auto/modal pattern for translation processor - Created TranscriptTranslatorAutoProcessor following the same pattern as transcript/diarization - Created TranscriptTranslatorModalProcessor with TRANSLATION_MODAL_API_KEY support - Added TRANSLATION_BACKEND setting (defaults to "modal") - Updated all imports to use TranscriptTranslatorAutoProcessor instead of TranscriptTranslatorProcessor - Updated env.example with TRANSLATION_BACKEND and TRANSLATION_MODAL_API_KEY - Updated test to expect TranscriptTranslatorModalProcessor name - All tests passing * refactor: simplify transcript_translator base class to match other processors - Moved all implementation from base class to modal processor - Base class now only defines abstract _translate method - 
Follows the same minimal pattern as audio_diarization and audio_transcript base classes - Updated test mock to use _translate instead of get_translation - All tests passing * chore: clean up settings and improve type annotations - Remove deprecated generic API key variables from settings - Add comments to group Modal-specific settings - Improve type annotations for modal_api_key parameters * fix: typing * fix: passing key to openai * test: fix rtc test failing due to change on transcript It also correctly setup database from sqlite, in case our configuration is setup to postgres. * ci: deactivate translation backend by default * test: fix modal->mock * refactor: implementing igor review, mock to passthrough	2025-08-04 12:07:30 -06:00
Mathieu Virbel	28ac031ff6	feat: use llamaindex everywhere (#525 ) * feat: use llamaindex for transcript final title too * refactor: removed llm backend, replaced with one single class+llamaindex * refactor: self-review * fix: typing * fix: tests * refactor: extract clean_title and add tests * test: fix * test: remove ensure_casing/nltk * fix: tiny mistake	2025-08-01 12:13:00 -06:00
Mathieu Virbel	f5b82d44e3	style: use ruff for linting and formatting (#524 )	2025-07-31 17:57:43 -06:00
Mathieu Virbel	406164033d	feat: new summary using phi-4 and llama-index (#519 ) * feat: add litellm backend implementation * refactor: improve generate/completion methods for base LLM * refactor: remove tokenizer logic * style: apply code formatting * fix: remove hallucinations from LLM responses * refactor: comprehensive LLM and summarization rework * chore: remove debug code * feat: add structured output support to LiteLLM * refactor: apply self-review improvements * docs: add model structured output comments * docs: update model structured output comments * style: apply linting and formatting fixes * fix: resolve type logic bug * refactor: apply PR review feedback * refactor: apply additional PR review feedback * refactor: apply final PR review feedback * fix: improve schema passing for LLMs without structured output * feat: add PR comments and logger improvements * docs: update README and add HTTP logging * feat: improve HTTP logging * feat: add summary chunking functionality * fix: resolve title generation runtime issues * refactor: apply self-review improvements * style: apply linting and formatting * feat: implement LiteLLM class structure * style: apply linting and formatting fixes * docs: env template model name fix * chore: remove older litellm class * chore: format * refactor: simplify OpenAILLM * refactor: OpenAILLM tokenizer * refactor: self-review * refactor: self-review * refactor: self-review * chore: format * chore: remove LLM_USE_STRUCTURED_OUTPUT from envs * chore: roll back migration lint changes * chore: roll back migration lint changes * fix: make summary llm configuration optional for the tests * fix: missing f-string * fix: tweak the prompt for summary title * feat: try llamaindex for summarization * fix: complete refactor of summary builder using llamaindex and structured output when possible * fix: separate prompt as constant * fix: typings * fix: enhance prompt to prevent mentioning others subject while summarize one * fix: various 
changes after self-review * fix: from igor review --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2025-07-31 15:29:29 -06:00
Igor Loskutov	27b43d85ab	feat: Diarization cli (#509 ) * diarisation cli * feat: s3 upload for modal diarisation cli call * chore: cleanup * chore: s3 cleanup improvement * chore: lint * chore: cleanup * chore: cleanup * chore: cleanup * chore: cleanup	2025-07-25 16:24:06 -04:00
Mathieu Virbel	2a2af5fff2	fix: remove fief out of the source code (#502 ) * fix: remove fief out of the source code * fix: remove corresponding test about migration	2025-07-21 21:09:05 -06:00
Mathieu Virbel	86ce68651f	build: move to uv (#488 ) * build: move to uv * build: add packages declaration * build: move to python 3.12, as sentencepiece does not work on 3.13 * ci: remove pre-commit check, will be done in another branch. * ci: fix name checkout * ci: update lock and dockerfile * test: remove event_loop, not needed in python 3.12 * test: updated test due to av returning AudioFrame with 4096 samples instead of 1024 * build: prevent using fastapi cli, because there is no way to set default port I don't want to pass --port 1250 every time, so back on previous approach. I deactivated auto-reload for production. * ci: remove main.py * test: fix quirk with httpx	2025-07-16 18:10:11 -06:00
Mathieu Virbel	5267ab2d37	feat: retake summary using NousResearch/Hermes-3-Llama-3.1-8B model (#415 ) This feature a new modal endpoint, and a complete new way to build the summary. ## SummaryBuilder The summary builder is based on conversational model, where an exchange between the model and the user is made. This allow more context inclusion and a better respect of the rules. It requires an endpoint with OpenAI-like completions endpoint (/v1/chat/completions) ## vLLM Hermes3 Unlike previous deployment, this one use vLLM, which gives OpenAI-like completions endpoint out of the box. It could also handle guided JSON generation, so jsonformer is not needed. But, the model is quite good to follow JSON schema if asked in the prompt. ## Conversion of long/short into summary builder The builder is identifying participants, find key subjects, get a summary for each, then get a quick recap. The quick recap is used as a short_summary, while the markdown including the quick recap + key subjects + summaries are used for the long_summary. This is why the nextjs component has to be updated, to correctly style h1 and keep the new line of the markdown.	2024-09-14 02:28:38 +02:00
Mathieu Virbel	873cbb0a42	fix: user migration confusion with user_id (#401 ) + added tests	2024-09-03 22:07:36 +02:00
Mathieu Virbel	03561453c5	feat: Monadical SSO as replacement of Fief (#393 ) * sso: first pass for integrating SSO still have issue on refreshing maybe customize the login page, or completely avoid it make 100% to understand how session server/client are working need to test with different configuration option (features flags and requireLogin) * sso: correctly handle refresh token, with pro-active refresh Going on interceptors make extra calls to reflector when 401. We need then to circle back with NextJS backend to update the jwt, session, then retry the failed request. I prefered to go pro-active, and ensure the session AND jwt are always up to date. A minute before the expiration, we'll try to refresh it. useEffect() of NextJS cannot be asynchronous, so we cannot wait for the token to be refreshed. Every 20s, a minute before the expiration (so 3x in total max) we'll try to renew. When the accessToken is renewed, the session is updated, and dispatching up to the client, which updates the useApi(). Therefore, no component will left without a incorrect token. * fixes: issue with missing key on react-select-search because the default value is undefined * sso: fixes login/logout button, and avoid seeing the login with authentik page when clicking * sso: ensure /transcripts/new is not behind protected page, and feature flags page are honored * sso: fixes user sub->id * fixes: remove old layout not used * fixes: set default NEXT_PUBLIC_SITE_URL as localhost * fixes: removing fief again due to merge with main * sso: ensure session is always ready before doing any action * sso: add migration from fief to jwt in server, only from transcripts list * fixes: user tests * fixes: compilation issues	2024-09-03 19:27:15 +02:00
Sergey Mankovsky	9b36adedae	Fix broken test	2024-07-19 12:30:01 +02:00
Sergey Mankovsky	fa6467c5ae	Merge pull request #375 from Monadical-SAS/restart-processing Restart processing	2024-07-19 12:00:30 +02:00
Sergey Mankovsky	68be967e66	Don't request permission for file upload	2024-07-18 15:18:36 +02:00
Sergey Mankovsky	562f2c94f9	Restart processing	2024-07-18 11:34:42 +02:00
Sergey Mankovsky	df5b735959	Chunked file upload	2024-07-15 11:25:47 +02:00
Sergey Mankovsky	42bdabbc1b	Merge pull request #335 from Monadical-SAS/sara/UI-improvements Sara/UI improvements & fix transcript deletion	2024-05-28 12:28:50 +02:00
projects-g	72b22d1005	Update all modal deployments and change seamless configuration due to changes in src repo (#353 ) * update all modal deployments and change seamless configuration due to change in src repo * add fixture	2024-04-16 21:12:24 +05:30
Sara	6fe61cd5e3	fix transcript delete	2024-01-13 18:27:12 +01:00
Mathieu Virbel	eba60b7de4	Merge branch 'main' into feat-api-speaker-reassignment	2023-12-15 11:14:19 +01:00
Mathieu Virbel	a15a63bc8d	server: add reviewed field in transcript	2023-12-13 15:42:17 +01:00
Mathieu Virbel	6585002dfa	tests/upload: use shorter audio	2023-12-13 12:09:56 +01:00
Mathieu Virbel	f7f67521fc	server: try reconcile both tests webrtc and upload with celery worker	2023-12-13 11:25:46 +01:00
Mathieu Virbel	e5e1b70213	server: include endpoint to upload an audio/video file	2023-12-12 20:39:15 +01:00
Mathieu Virbel	37b11fdcb8	server: allow reassign speaker range using participant_id	2023-12-12 10:57:21 +01:00
Mathieu Virbel	d790308ec7	server: add an endpoint to merge speaker	2023-12-11 19:56:24 +01:00
Mathieu Virbel	07b29d42a7	server: add topic duration, and endpoint for getting words group per speaker on a topic	2023-12-11 19:46:05 +01:00
Mathieu Virbel	6f3d7df507	server: add API to reassign speakers, and get topics with words	2023-12-06 16:41:18 +01:00
Mathieu Virbel	7ac6d25217	server: add participant API Also break out views into different files for easier reading	2023-11-30 19:13:37 +01:00
Mathieu Virbel	3ebb21923b	server: enhance diarization algorithm	2023-11-29 20:34:43 +01:00
Mathieu Virbel	99b973f36f	server: fix tests	2023-11-22 14:41:40 +01:00
Mathieu Virbel	5ffa931822	server: update backend tests results (rpc does not work with chords)	2023-11-22 14:41:40 +01:00
Sara	a846e38fbd	fix waveform in pipeline	2023-11-17 13:38:32 +01:00

100 Commits