Commit Graph

454 Commits

Author SHA1 Message Date
e91979abbc feat: use jitsi file system 2025-09-17 15:16:03 -06:00
95e8011975 Merge main into jisti-integration branch
- Resolved conflicts in server/reflector/views/rooms.py to keep platform-agnostic approach
- Resolved conflicts in www/app/[roomName]/page.tsx to keep VideoPlatformEmbed approach
- Accepted main's version of generated API files (schemas.gen.ts, services.gen.ts, types.gen.ts)
- Removed config-template.ts as per main branch changes
2025-09-15 12:53:49 -06:00
5f143fe364 fix: zulip and consent handler on the file pipeline (#645) 2025-09-15 10:49:20 -06:00
Igor Monadical
79f161436e chore: meeting user id removal and room id requirement (#635)
* chore: remove meeting user id and make meeting room id required

* meeting room_id optional

* orphaned meeting room ids DATA migration

* ci fix

* fix meeting_room_id_fkey downgrade

* fix migration rollback

* fix: put index back (meeting room id)

* fix: put index back (meeting room id)

* fix: put index back (meeting room id)

* remove noop migrations

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-09-12 13:07:58 -04:00
Igor Monadical
b3a8e9739d chore: whereby & s3 settings env error reporting (#637)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-09-11 17:52:34 -04:00
fc363bd49b fix: missing follow_redirects=True on modal endpoint (#630) 2025-09-10 08:15:47 -06:00
Igor Monadical
cde99ca271 fix: auth post (#624)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-09-09 15:48:07 -04:00
Igor Monadical
f81fe9948a fix: anonymous users transcript permissions (#621)
* fix: public transcript visibility

* fix: transcript permissions frontend

* dead code removal

* chore: remove unused code

* fix search tests

* fix search tests

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-09-09 10:50:29 -04:00
Igor Monadical
c4d2825c81 feat: frontend openapi react query (#606)
* refactor: migrate from @hey-api/openapi-ts to openapi-react-query

- Replace @hey-api/openapi-ts with openapi-typescript and openapi-react-query
- Generate TypeScript types from OpenAPI spec
- Set up React Query infrastructure with QueryClientProvider
- Migrate all API hooks to use React Query patterns
- Maintain backward compatibility for existing components
- Remove old API infrastructure and dependencies

* fix: resolve import errors and add missing api hooks

- Create constants.ts for RECORD_A_MEETING_URL
- Add api-types.ts for backward compatible type exports
- Update all imports from deleted api folder to new locations
- Add missing React Query hooks for rooms and zulip operations
- Create useApi compatibility layer for unmigrated components

* feat: migrate components to React Query hooks

- Add comprehensive API hooks for all operations
- Migrate rooms page to use React Query mutations
- Update transcript title component to use mutation hook
- Refactor share/privacy component with proper error handling
- Remove direct API client usage in favor of hooks

* feat: complete migration from @hey-api/openapi-ts to openapi-react-query

- Migrated all components from useApi compatibility layer to direct React Query hooks
- Added new hooks for participant operations, room meetings, and speaker operations
- Updated all imports from old api module to api-types
- Fixed TypeScript types and API endpoint signatures
- Removed deprecated useApi.ts compatibility layer
- Fixed SourceKind enum values to match OpenAPI spec
- Added @ts-ignore for Zulip endpoints not in OpenAPI spec yet
- Fixed all compilation errors and type issues

* fix: authentication flow with React Query migration

- Fix middleware management in apiClient to properly handle auth tokens
- Update ApiAuthProvider to correctly configure base URL and auth
- Add missing NextAuth API route handler at app/api/auth/[...nextauth]/route.ts
- Remove middleware ejection attempts (not supported by openapi-fetch)
- Use global variables to store current auth token and API URL
- Setup middleware once on initialization instead of repeatedly adding

This fixes the login/logout flow that was broken after migrating from
the useApi compatibility layer to native React Query hooks.

* fix: prevent unauthorized API calls before authentication

- Add global AuthGuard component to handle authentication at layout level
- Make all API query hooks conditional on authentication status
- Define public routes (like /transcripts/new) that don't require auth
- Fix login flow to use NextAuth signIn instead of non-existent /login route
- Prevent 401 errors by waiting for auth token before making API calls

Previously, all routes under (app) were publicly accessible with each page
handling auth individually. Now authentication is enforced globally while
still allowing specific routes to remain public.

* refactor: remove redundant client-side AuthGuard

The authentication is already properly handled by Next.js middleware
in middleware.ts with LOGIN_REQUIRED_PAGES. The middleware approach is
superior as it:
- Provides server-side protection before page loads
- Prevents flash of unauthorized content
- Centralizes auth logic in one place
- Better performance (no client-side JS needed)

Keep the API hooks conditional to prevent 401 errors before token is ready.

* fix: use direct status check for API query authentication

Changed all query hooks to use direct `status === "authenticated"` check
instead of derived `isAuthenticated && !isLoading` to avoid race conditions
where queries might fire before the authentication token is properly set.

This prevents the brief 401 errors that occur on page refresh when the
session is being restored.

* fix: correct content-type header for FormData uploads

Previously, the API client was setting a default Content-Type of application/json
for all requests, which broke file uploads that need multipart/form-data.

Now the client only sets application/json when the body is not FormData,
allowing FormData to automatically set the correct multipart boundary.

* fix: resolve authentication race condition with React Query

Previously, API calls were being made before the auth token was configured,
causing initial 401 errors that would retry with 200 after token setup.

Changes:
- Add global auth readiness tracking in apiClient
- Create useAuthReady hook that checks both session and token state
- Update all API hooks to use isAuthReady instead of just session status
- Add AuthWrapper component at layout level for consistent loading UX
- Show spinner while authentication initializes across all pages

This ensures API calls only fire after authentication is fully configured,
eliminating the 401/retry pattern and improving user experience.

* refactor: clean up api-hooks.ts comments and improve search invalidation

- Remove redundant function category comments (exports are self-explanatory)
- Remove obvious inline comments for query invalidation
- Fix search endpoint invalidation to clear all queries regardless of parameters

* refactor: remove api-types.ts compatibility layer

- Migrated all 29 files from api-types.ts to use reflector-api.d.ts directly
- Removed $SourceKind manual enum in favor of OpenAPI-generated types
- Fixed unrelated Spinner component TypeScript error in AuthWrapper.tsx
- All imports now use: import type { components } from "path/to/reflector-api"
- Deleted api-types.ts file completely

* refactor: rename api-hooks.ts to apiHooks.ts for consistency

- Renamed api-hooks.ts to apiHooks.ts to follow camelCase convention
- Updated all 21 import statements across the codebase
- Maintains consistency with other non-component files (apiClient.tsx, useAuthReady.ts, etc.)
- Follows established naming pattern: PascalCase for components, camelCase for utilities/hooks

* chore: add .playwright-mcp to .gitignore

* refactor: remove SK helper object and use inline type casting in FilterSidebar

Replace the SK (SourceKind) helper object with direct inline type casting
to simplify the code and reduce unnecessary abstraction.

* chore: clean up migration comments from React Query refactoring

- Remove temporary "// Use new React Query hooks" comments
- Remove "// React Query hooks" comments from browse and rooms pages
- Update package.json script name from codegen to openapi for consistency

* refactor: remove Redis dependencies from frontend authentication

- Replace Redis/Redlock with in-memory cache for token management
- Remove @vercel/kv, ioredis, and redlock dependencies from package.json
- Implement simple lock mechanism for concurrent token refresh prevention
- Use Map-based cache with TTL for token storage
- Maintain same authentication flow without external dependencies

This simplifies the infrastructure requirements and removes the need for
Redis while maintaining the same functionality through in-memory caching.

* fix: add staleTime to prevent cross-tab staled data

* fix: remove infinite re-render loop in useSessionAccessToken

The hook was maintaining redundant local state that caused re-renders
on every update, which triggered NextAuth to continuously refetch the
session, resulting in hundreds of POST requests to /api/auth/session.

Simplified the hook to directly return session values without
unnecessary state duplication.

* fix: handle undefined access tokens in auth.ts

Added fallback to empty string for potentially undefined access_token
and refresh_token from NextAuth account object to satisfy
JWTWithAccessToken type requirements.

* Igor/mathieu/frontend openapi react query (#597)

* small typing

* typing fixes

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>

* self-review-fix

* authReady callback simplify

* fix auth

* fix compose

* room detail page fix

* compile fix

* room edit fix

* normalize auth provider

* room edition state granular management

* cover TODOs + cross-tab cache

* session auto refresh blink

* schema generator error type doc

* protect from zombie auth

* clarify access token refresh logic a bit

* remove react-query tab sharing cache

* remove react-query tab sharing cache

* websocket dupe react devmode protection

* invalidate room on room update

* redis cache

* test ts server

* ci randomness

* less edgy config (ci)

* less edgy config (ci)

* less edgy config (ci)

* ci randomness

* ci randomness

* ci randomness

* ci randomness

* less edgy config (ci)

* added vs edited room state cleanup

* file upload real-time state management fix

* prettier auth state ternary

* prettier auth state ternary

* proper api address from env

* INTERVAL_REFRESH_MS

* node version 20 for tests

* github debug

* github debug

* github debug

* github debug

* github debug

* github debug

* github debug

* github debug

* github debug

* github debug

* github debug

* CI debug

* CI debug

* nextjs magic

* nextjs magic

* doc

* client-side stale auth soft safety net

---------

Co-authored-by: Mathieu Virbel <mat@meltingrocks.com>
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-09-05 16:01:31 -06:00
0663700a61 fix: align whisper transcriber api with parakeet (#602)
* Documents transcriber api

* Update whisper transcriber api to match parakeet

* Update api transcription spec

* Return 400 for unsupported file type

* Add params to api spec

* Update whisper transcriber implementation to match parakeet
2025-09-05 10:52:14 +02:00
dc82f8bb3b fix: source kind for file processing (#601) 2025-09-04 08:42:21 -06:00
41224a424c docs: move platform-jitsi.md to docs/ directory 2025-09-02 18:28:50 -06:00
dd0089906f fix: replace datetime.utcnow() with datetime.now(tz=timezone.utc) in Jitsi health check 2025-09-02 18:25:55 -06:00
fa559b1970 feat: update and expand video platform tests
- Update existing tests for StrEnum instead of string literals
- Add comprehensive WherebyClient tests with HTTP mocking
- Add webhook event storage tests for participant and recording events
- Add typing overload tests for create_platform_client factory
- Update webhook test paths to new video_platforms router locations
- Fix mock ordering and parameter issues in async tests
- Test all platform client functionality including signature verification
- Verify webhook event storage with proper timestamp handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:16:41 -06:00
52eff2acc0 feat: clean up legacy code and remove excessive documentation
- Remove excessive inline comments from meeting creation flow
- Remove verbose docstrings from simple property methods and basic functions
- Clean up obvious comments like 'Generate JWT tokens', 'Build room URLs'
- Remove unnecessary explanatory comments in platform clients
- Keep only essential documentation for complex logic
- Simplify race condition handling comments
- Remove excessive method documentation for simple operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:08:12 -06:00
7875ec3432 feat: move platform routers to video_platforms folders
- Move Jitsi router from views/jitsi.py to video_platforms/jitsi/router.py
- Move Whereby router from views/whereby.py to video_platforms/whereby/router.py
- Update __init__.py files to export routers from platform packages
- Update app.py imports to use video_platforms instead of views
- Remove old view files after successful migration
- Maintain exact same API endpoint paths (/v1/jitsi, /v1/whereby)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:05:04 -06:00
398be06fad feat: add typing overloads and clean up platform client factory
- Add typing overloads to get_platform_client for JitsiClient and WherebyClient return types
- Add overloads to create_platform_client in factory for better IDE support
- Remove PyJWT fallback imports from views/rooms.py
- Remove platform defaults from CreateRoom and UpdateRoom models
- Clean up legacy whereby fallback code in meeting creation
- Use direct platform client access instead of conditional fallbacks

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 18:02:43 -06:00
da700069d9 Add webhook events storage to meetings model
- Add events column as JSON type to meetings table with default empty array
- Add events: List[Dict[str, Any]] field to Meeting model
- Create migration 2890b5104577 for events column and apply successfully
- Add MeetingController helper methods for event storage:
  - add_event() for generic event storage with timestamps
  - participant_joined(), participant_left() for participant tracking
  - recording_started(), recording_stopped() for recording events
  - get_events() for event retrieval
- Update Jitsi webhook endpoints to store events:
  - Store participant join/leave events with data and timestamps
  - Store recording start/stop events from Prosody webhooks
  - Store recording completion events from Jibri finalize script
- Events stored with type, timestamp, and data for webhook history tracking
- Fix linting and formatting issues

Addresses PR feedback point 12: save webhook events in meetings events field
2025-09-02 17:53:35 -06:00
51229a1790 Fix Jitsi client issues and create typed meeting data
- Remove 'transcription': True from JWT features in _generate_jwt
- Replace int(time.time()) with generate_uuid4() for room naming to avoid conflicts
- Replace datetime.utcnow() with datetime.now(tz=timezone.utc) for proper timezone handling
- Create JitsiMeetingData(MeetingData) class with typed extra_data properties
- Update PLATFORM_NAME = VideoPlatform.JITSI to use enum
- Update create_meeting to return JitsiMeetingData instance with proper typing
- Fix get_room_sessions mock to use timezone-aware datetime
- Export JitsiMeetingData from jitsi module

Addresses PR feedback points 4, 5, 6, 10: remove transcription features, use UUID,
proper datetime handling, and typed meeting data
2025-09-02 17:44:04 -06:00
2d2c23f7cc Create video_platforms/whereby structure and WherebyClient
- Create video_platforms/whereby/ directory with __init__.py, client.py, tasks.py
- Implement WherebyClient inheriting from VideoPlatformClient interface
- Move all functions from whereby.py into WherebyClient methods
- Use VideoPlatform.WHEREBY enum for PLATFORM_NAME
- Register WherebyClient in platform registry
- Update factory.py to include S3 bucket config for whereby
- Update worker process to use platform abstraction for get_room_sessions
- Preserve exact API behavior for meeting activity detection
- Maintain AWS S3 configuration handling in WherebyClient
- Fix linting and formatting issues

Addresses PR feedback point 7: implement video_platforms/whereby structure
Note: whereby.py kept for legacy fallback until task 7 cleanup
2025-09-02 17:40:32 -06:00
0acb9cac79 Replace Literal with VideoPlatform StrEnum for platform field
- Create VideoPlatform StrEnum with WHEREBY and JITSI values
- Update rooms.py and meetings.py to use VideoPlatform enum
- Update views/rooms.py and video_platforms/factory.py to use enum values
- Generate new migration with proper server_default='whereby'
- Apply migration successfully with backward compatibility
- Fix linting and formatting issues

Addresses PR feedback point 1: use StrEnum instead of Literal[]
2025-09-02 17:36:14 -06:00
249234238c feat: add comprehensive video platform test suite
- Created complete test coverage for video platform abstraction
- Tests for base classes, JitsiClient implementation, and platform registry
- JWT generation tests with proper mocking and error scenarios
- Webhook signature verification tests (valid/invalid/missing secret)
- Platform factory tests for Jitsi and Whereby configuration
- Registry tests for platform registration and client creation
- Webhook endpoint tests with signature verification and error cases
- Integration tests for rooms endpoint with platform abstraction
- 24 comprehensive test cases covering all video platform functionality
- All tests passing with proper mocking and isolation

🤖 Generated with Claude Code
2025-09-02 16:54:58 -06:00
42a603d5c3 feat: add PyJWT dependency and finalize Jitsi integration
- Added PyJWT>=2.8.0 to pyproject.toml dependencies
- Installed dependency via uv sync successfully
- Verified JWT generation functionality works correctly
- Confirmed platform factory creates JitsiClient instances
- Validated database migrations applied (platform fields available)
- Tested webhook endpoints are registered and functional
- Verified FastAPI app starts without errors with full integration
- All integration tests pass - Jitsi platform fully functional

🤖 Generated with Claude Code
2025-09-02 16:28:44 -06:00
6d2092f950 feat: create comprehensive Jitsi integration documentation
- Added complete end-user configuration guide at server/platform-jitsi.md
- Covers prerequisites, environment setup, and Jitsi Meet configuration
- Includes JWT authentication, Jibri recording, and Prosody event-sync setup
- Provides troubleshooting guide with common issues and solutions
- Documents security best practices and performance optimization
- Includes testing procedures and migration guidance from Whereby
- Ready for production deployment with step-by-step instructions
- Uses environment variable placeholders for security

🤖 Generated with Claude Code
2025-09-02 16:24:47 -06:00
f2bb6aaecb feat: update rooms.py to use video platform abstraction
- Added platform field to Room, CreateRoom, and UpdateRoom models
- Updated rooms_create function to pass platform parameter
- Rewrote rooms_create_meeting to use platform factory pattern
- Added graceful fallback to legacy whereby implementation
- Maintained API compatibility and error handling patterns
- Prepared for multi-platform support (Whereby/Jitsi)

🤖 Generated with Claude Code
2025-09-02 16:21:58 -06:00
2b136ac7b0 feat: create Jitsi webhook endpoints for event handling
- Added comprehensive Jitsi webhook endpoint in views/jitsi.py
- Handles Prosody event-sync events (muc-occupant-joined/left)
- Implements participant counting following whereby.py pattern
- Added Jibri recording completion webhook endpoint
- Includes signature verification with fallback when platform client unavailable
- Registered router in app.py for /v1/jitsi endpoints
- Added health check endpoint for webhook configuration

🤖 Generated with Claude Code
2025-09-02 16:19:54 -06:00
3f4fc26483 feat: register Jitsi platform in video platforms factory and registry
- Added JitsiClient registration to platform registry
- Enables dynamic platform selection through factory pattern
- Factory configuration already supports Jitsi settings
- Platform abstraction layer now supports beide Whereby and Jitsi

🤖 Generated with Claude Code
2025-09-02 16:17:32 -06:00
8e5ef5bca6 feat: implement JitsiClient with JWT authentication
Complete implementation of JitsiClient following VideoPlatformClient interface
with JWT-based room access control and webhook signature verification.

- Add JWT token generation with proper payload structure
- Implement unique room name generation with timestamp
- Create separate user/host JWT tokens with moderator permissions
- Build secure room URLs with embedded JWT parameters
- Add HMAC-SHA256 webhook signature verification for Prosody events
- Implement all abstract methods with Jitsi-specific behavior
- Include comprehensive typing and error handling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:15:49 -06:00
d49fdcb38d feat: create video platforms architecture with Jitsi directory structure
Create complete video platforms abstraction layer following daily.co branch
pattern with Jitsi-specific directory structure.

- Add video_platforms base module with abstract classes
- Create VideoPlatformClient, MeetingData, VideoPlatformConfig interfaces
- Add platform registry system for client management
- Create factory pattern for platform client creation
- Add Jitsi directory structure with __init__.py, tasks.py, client.py
- Configure Jitsi platform in factory with JWT-based authentication

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:14:42 -06:00
d42380abf1 feat: add Jitsi configuration settings
Add comprehensive Jitsi Meet configuration settings to settings.py
following the same pattern as WHEREBY settings.

- Add JITSI_DOMAIN with meet.jit.si default
- Add JITSI_JWT_SECRET for JWT token signing
- Add JITSI_WEBHOOK_SECRET for webhook validation
- Add JITSI_APP_ID, JITSI_JWT_ISSUER, JITSI_JWT_AUDIENCE for JWT configuration
- Follow consistent naming and typing patterns

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:11:00 -06:00
cf64e1a3d9 feat: add database migration for platform field
Generate Alembic migration to add platform column to rooms and meetings
tables enabling multi-platform video conferencing support.

- Add platform column to meeting table with whereby default
- Add platform column to room table with whereby default
- Migration tested successfully with alembic upgrade head

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:10:11 -06:00
ea53ca7000 feat: add platform field to Room and Meeting models
Add platform column to rooms and meetings database tables with Literal typing
to support multiple video conferencing platforms (whereby, jitsi).

- Add platform column to rooms/meetings SQLAlchemy tables with whereby default
- Update Room/Meeting Pydantic models with platform field and Literal typing
- Modify RoomController.add() to accept platform parameter
- Update MeetingController.create() to copy platform from room

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-02 16:08:38 -06:00
Igor Monadical
695d1a957d fix: search-logspam (#593)
* fix: search-logspam

* llm comment

* fix tests

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-08-29 18:55:51 -04:00
84a381220b fix: make webhook secret/url allowing null (#590) 2025-08-29 11:55:18 -06:00
88ed7cfa78 feat(rooms): add webhook for transcript completion (#578)
* feat(rooms): add webhook notifications for transcript completion

- Add webhook_url and webhook_secret fields to rooms table
- Create Celery task with 24-hour retry window using exponential backoff
- Send transcript metadata, diarized text, topics, and summaries via webhook
- Add HMAC signature verification for webhook security
- Add test endpoint POST /v1/rooms/{room_id}/webhook/test
- Update frontend with webhook configuration UI and test button
- Auto-generate webhook secret if not provided
- Trigger webhook after successful file pipeline processing for room recordings

* style: linting

* fix: remove unwanted files

* fix: update openapi gen

* fix: self-review

* docs: add comprehensive webhook documentation

- Document webhook configuration, events, and payloads
- Include transcript.completed and test event examples
- Add security considerations and best practices
- Provide example webhook receiver implementation
- Document retry policy and signature verification

* fix: remove audio_mp3_url from webhook payload

- Remove audio download URL generation from webhook
- Update documentation to reflect the change
- Keep only frontend_url for accessing transcripts

* docs: remove unwanted section

* fix: correct API method name and type imports for rooms

- Fix v1RoomsRetrieve to v1RoomsGet
- Update Room type to RoomDetails throughout frontend
- Fix type imports in useRoomList, RoomList, RoomTable, and RoomCards

* feat: add show/hide toggle for webhook secret field

- Add eye icon button to reveal/hide webhook secret when editing
- Show password dots when webhook secret is hidden
- Reset visibility state when opening/closing dialog
- Only show toggle button when editing existing room with secret

* fix: resolve event loop conflict in webhook test endpoint

- Extract webhook test logic into shared async function
- Call async function directly from FastAPI endpoint
- Keep Celery task wrapper for background processing
- Fixes RuntimeError: event loop already running

* refactor: remove unnecessary Celery task for webhook testing

- Webhook testing is synchronous and provides immediate feedback
- No need for background processing via Celery
- Keep only the async function called directly from API endpoint

* feat: improve webhook test error messages and display

- Show HTTP status code in error messages
- Parse JSON error responses to extract meaningful messages
- Improved UI layout for webhook test results
- Added colored background for success/error states
- Better text wrapping for long error messages

* docs: adjust doc

* fix: review

* fix: update attempts to match close 24h

* fix: add event_id

* fix: changed to uuid, to have new event_id when reprocess.

* style: linting

* fix: alembic revision
2025-08-29 10:07:49 -06:00
6f0c7c1a5e feat(cleanup): add automatic data retention for public instances (#574)
* feat(cleanup): add automatic data retention for public instances

- Add Celery task to clean up anonymous data after configurable retention period
- Delete transcripts, meetings, and orphaned recordings older than retention days
- Only runs when PUBLIC_MODE is enabled to prevent accidental data loss
- Properly removes all associated files (local and S3 storage)
- Add manual cleanup tool for testing and intervention
- Configure retention via PUBLIC_DATA_RETENTION_DAYS setting (default: 7 days)

Fixes #571

* fix: apply pre-commit formatting fixes

* fix: properly delete recording files from storage during cleanup

- Add storage deletion for orphaned recordings in both cleanup task and manual tool
- Delete from storage before removing database records
- Log warnings if storage deletion fails but continue with database cleanup

* Apply suggestion from @pr-agent-monadical[bot]

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* Apply suggestion from @pr-agent-monadical[bot]

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* refactor: cleanup_old_data for better logging

* fix: linting

* test: fix meeting cleanup test to not require room controller

- Simplify test by directly inserting meetings into database
- Remove dependency on non-existent rooms_controller.create method
- Tests now pass successfully

* fix: linting

* refactor: simplify cleanup tool to use worker implementation

- Remove duplicate cleanup logic from manual tool
- Use the same _cleanup_old_public_data function from worker
- Remove dry-run feature as requested
- Prevent code duplication and ensure consistency
- Update documentation to reflect changes

* refactor: split cleanup worker into smaller functions

- Move all imports to the top of the file
- Extract cleanup logic into separate functions:
  - cleanup_old_transcripts()
  - cleanup_old_meetings()
  - cleanup_orphaned_recordings()
  - log_cleanup_results()
- Make code more maintainable and testable
- Add days parameter support to Celery task
- Update manual tool to work with refactored code

* feat: add TypedDict typing for cleanup stats

- Add CleanupStats TypedDict for better type safety
- Update all function signatures to use proper typing
- Add return type annotations to _cleanup_old_public_data
- Improves code maintainability and IDE support

* feat: add CASCADE DELETE to meeting_consent foreign key

- Add ondelete="CASCADE" to meeting_consent.meeting_id foreign key
- Generate and apply migration to update existing constraint
- Remove manual consent deletion from cleanup code
- Add unit test to verify CASCADE DELETE behavior

* style: linting

* fix: alembic migration branchpoint

* fix: correct downgrade constraint name in CASCADE DELETE migration

* fix: regenerate CASCADE DELETE migration with proper constraint names

- Delete problematic migration and regenerate with correct names
- Use explicit constraint name in both upgrade and downgrade
- Ensure migration works bidirectionally
- All tests passing including CASCADE DELETE test

* style: linting

* refactor: simplify cleanup to use transcripts as entry point

- Remove orphaned_recordings cleanup (not part of this PR scope)
- Remove separate old_meetings cleanup
- Transcripts are now the main entry point for cleanup
- Associated meetings and recordings are deleted with their transcript
- Use single database connection for all operations
- Update tests to reflect new approach

* refactor: cleanup and rename functions for clarity

- Rename _cleanup_old_public_data to cleanup_old_public_data (make public)
- Rename celery task to cleanup_old_public_data_task for clarity
- Update docstrings and improve code organization
- Remove unnecessary comments and simplify deletion logic
- Update tests to use new function names
- All tests passing

* style: linting\

* style: typing and review

* fix: add transaction on cleanup_single_transcript

* fix: naming

---------

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
2025-08-29 08:47:14 -06:00
9dfd76996f fix: file pipeline status reporting and websocket updates (#589)
* feat: use file pipeline for upload and reprocess action

* fix: make file pipeline correctly report status events

* fix: duplication of transcripts_controller

* fix: tests

* test: fix file upload test

* test: fix reprocess

* fix: also patch from main_file_pipeline

(how patch is done is dependent of file import unfortunately)
2025-08-29 00:58:14 -06:00
f5331a2107 style: more type annotations to parakeet transcriber (#581)
* feat: add comprehensive type annotations to Parakeet transcriber

- Add TypedDict for WordTiming with word, start, end fields
- Add NamedTuple for TimeSegment, AudioSegment, and TranscriptResult
- Add type hints to all generator functions (vad_segment_generator, batch_speech_segments, etc.)
- Add enforce_word_timing_constraints function to prevent word timing overlaps
- Refactor batch_segment_to_audio_segment to reuse pad_audio function

* doc: add note about space
2025-08-28 12:22:07 -06:00
Igor Loskutov
124ce03bf8 fix: Igor/evaluation (#575)
* fix: impossible import error (#563)

* evaluation cli - database events experiment

* hallucinations

* evaluation - unhallucinate

* evaluation - unhallucinate

* roll back reliability link

* self reviewio

* lint

* self review

* add file pipeline to cli

* add file pipeline to cli + sorting

* remove cli tests

* remove ai comments

* comments
2025-08-28 12:07:34 -04:00
7030e0f236 fix: optimize parakeet transcription batching algorithm (#577)
* refactor: optimize transcription batching to accumulate speech segments

- Changed VAD segment generator to return full audio array instead of segments
- Removed segment filtering step
- Modified batch_segments to accumulate maximum speech including silence
- Transcribe larger continuous chunks instead of individual speech segments

* fix: correct transcribe_batch call to use list and fix batch unpacking

* fix: simplify

* fix: remove unused variables

* fix: add typing
2025-08-27 10:32:04 -06:00
9265d201b5 fix: restore previous behavior on live pipeline + audio downscaler (#561)
This commit restore the original behavior with frame cutting. While
silero is used on our gpu for files, look like it's not working great on
the live pipeline. To be investigated, but at the moment, what we keep
is:

- refactored to extract the downscale for further processing in the
pipeline
- remove any downscale implementation from audio_chunker and audio_merge
- removed batching from audio_merge too for now
2025-08-22 10:49:26 -06:00
0c3878ac3c fix: docker image not loading libgomp.so.1 for torch (#560)
On ARM64, the docker iamge crash because torch cannot load libgomp.so.1
-- Look like pytorch does not install the same packages depending the
platform.

AMD64:

/app/.venv/lib/python3.12/site-packages/torch/lib/libgomp.so.1
/app/.venv/lib/python3.12/site-packages/ctranslate2.libs/libgomp-a34b3233.so.1.0.0
/app/.venv/lib/python3.12/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0

ARM64:

/app/.venv/lib/python3.12/site-packages/ctranslate2.libs/libgomp-d22c30c5.so.1.0.0
/app/.venv/lib/python3.12/site-packages/scikit_learn.libs/libgomp-947d5fa1.so.1.0.0
/app/.venv/lib/python3.12/site-packages/torch.libs/libgomp-947d5fa1.so.1.0.0
2025-08-21 16:41:35 -06:00
Igor Loskutov
d70beee51b fix: include shared rooms to search (#558)
* include shared rooms to search

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* remove tests, thats too much
2025-08-21 14:52:29 -04:00
Igor Loskutov
07981e8090 fix: webvtt db null expectation mismatch (#556) 2025-08-20 23:22:41 -06:00
7592679a35 build: separate silero-vad and force torch to be resolved without nvidia (#555)
* build: separate silero-vad and force torch to be resolved without nvidia

* build: also add torchaudio as cpu version
2025-08-20 22:23:48 -06:00
af16178f86 ci: use github-token to get around potential api throttling + rework dockerfile (#554)
* ci: use github-token to get around potential api throttling

* build: put pyannote-audio separate to the project

* fix: now that we have a readme, use it

* build: add UV_NO_CACHE
2025-08-20 21:59:29 -06:00
3ea7f6b7b6 feat: pipeline improvement with file processing, parakeet, silero-vad (#540)
* feat: improve pipeline threading, and transcriber (parakeet and silero vad)

* refactor: remove whisperx, implement parakeet

* refactor: make audio_chunker more smart and wait for speech, instead of fixed frame

* refactor: make audio merge to always downscale the audio to 16k for transcription

* refactor: make the audio transcript modal accepting batches

* refactor: improve type safety and remove prometheus metrics

- Add DiarizationSegment TypedDict for proper diarization typing
- Replace List/Optional with modern Python list/| None syntax
- Remove all Prometheus metrics from TranscriptDiarizationAssemblerProcessor
- Add comprehensive file processing pipeline with parallel execution
- Update processor imports and type annotations throughout
- Implement optimized file pipeline as default in process.py tool

* refactor: convert FileDiarizationProcessor I/O types to BaseModel

Update FileDiarizationInput and FileDiarizationOutput to inherit from
BaseModel instead of plain classes, following the standard pattern
used by other processors in the codebase.

* test: add tests for file transcript and diarization with pytest-recording

* build: add pytest-recording

* feat: add local pyannote for testing

* fix: replace PyAV AudioResampler with torchaudio for reliable audio processing

- Replace problematic PyAV AudioResampler that was causing ValueError: [Errno 22] Invalid argument
- Use torchaudio.functional.resample for robust sample rate conversion
- Optimize processing: skip conversion for already 16kHz mono audio
- Add direct WAV writing with Python wave module for better performance
- Consolidate duplicate downsample checks for cleaner code
- Maintain list[av.AudioFrame] input interface
- Required for Silero VAD which needs 16kHz mono audio

* fix: replace PyAV AudioResampler with torchaudio solution

- Resolves ValueError: [Errno 22] Invalid argument in AudioMergeProcessor
- Replaces problematic PyAV AudioResampler with torchaudio.functional.resample
- Optimizes processing to skip unnecessary conversions when audio is already 16kHz mono
- Uses direct WAV writing with Python's wave module for better performance
- Fixes test_basic_process to disable diarization (pyannote dependency not installed)
- Updates test expectations to match actual processor behavior
- Removes unused pydub dependency from pyproject.toml
- Adds comprehensive TEST_ANALYSIS.md documenting test suite status

* feat: add parameterized test for both diarization modes

- Adds @pytest.mark.parametrize to test_basic_process with enable_diarization=[False, True]
- Test with diarization=False always passes (tests core AudioMergeProcessor functionality)
- Test with diarization=True gracefully skips when pyannote.audio is not installed
- Provides comprehensive test coverage for both pipeline configurations

* fix: resolve pipeline property naming conflict in AudioDiarizationPyannoteProcessor

- Renames 'pipeline' property to 'diarization_pipeline' to avoid conflict with base Processor.pipeline attribute
- Fixes AttributeError: 'property 'pipeline' object has no setter' when set_pipeline() is called
- Updates property usage in _diarize method to use new name
- Now correctly supports pipeline initialization for diarization processing

* fix: add local for pyannote

* test: add diarization test

* fix: resample on audio merge now working

* fix: correctly restore timestamp

* fix: display exception in a threaded processor if that happen

* Update pyproject.toml

* ci: remove option

* ci: update astral-sh/setup-uv

* test: add monadical url for pytest-recording

* refactor: remove previous version

* build: move faster whisper to local dep

* test: fix missing import

* refactor: improve main_file_pipeline organization and error handling

- Move all imports to the top of the file
- Create unified EmptyPipeline class to replace duplicate mock pipeline code
- Remove timeout and fallback logic - let processors handle their own retries
- Fix error handling to raise any exception from parallel tasks
- Add proper type hints and validation for captured results

* fix: wrong function

* fix: remove task_done

* feat: add configurable file processing timeouts for modal processors

- Add TRANSCRIPT_FILE_TIMEOUT setting (default: 600s) for file transcription
- Add DIARIZATION_FILE_TIMEOUT setting (default: 600s) for file diarization
- Replace hardcoded timeout=600 with configurable settings in modal processors
- Allows customization of timeout values via environment variables

* fix: use logger

* fix: worker process meetings now use file pipeline

* fix: topic not gathered

* refactor: remove prepare(), pipeline now work

* refactor: implement many review from Igor

* test: add test for test_pipeline_main_file

* refactor: remove doc

* doc: add doc

* ci: update build to use native arm64 builder

* fix: merge fixes

* refactor: changes from Igor review + add test (not by default) to test gpu modal part

* ci: update to our own runner linux-amd64

* ci: try using suggested mode=min

* fix: update diarizer for latest modal, and use volume

* fix: modal file extension detection

* fix: put the diarizer as A100
2025-08-20 20:07:19 -06:00
Igor Loskutov
009590c080 feat: search frontend (#551)
* feat: better highlight

* feat(search): add long_summary to search vector for improved search results

- Update search vector to include long_summary with weight B (between title A and webvtt C)
- Modify SearchController to fetch long_summary and prioritize its snippets
- Generate snippets from long_summary first (max 2), then from webvtt for remaining slots
- Add comprehensive tests for long_summary search functionality
- Create migration to update search_vector_en column in PostgreSQL

This improves search quality by including summarized content which often contains
key topics and themes that may not be explicitly mentioned in the transcript.

* fix: address code review feedback for search enhancements

- Fix test file inconsistencies by removing references to non-existent model fields
  - Comment out tests for unimplemented features (room_ids, status filters, date ranges)
  - Update tests to only use currently available fields (room_id singular, no room_name/processing_status)
  - Mark future functionality tests with @pytest.mark.skip

- Make snippet counts configurable
  - Add LONG_SUMMARY_MAX_SNIPPETS constant (default: 2)
  - Replace hardcoded value with configurable constant

- Improve error handling consistency in WebVTT parsing
  - Use different log levels for different error types (debug for malformed, warning for decode, error for unexpected)
  - Add catch-all exception handler for unexpected errors
  - Include stack trace for critical errors

All existing tests pass with these changes.

* fix: correct datetime test to include required duration field

* feat: better highlight

* feat: search room names

* feat: acknowledge deleted room

* feat: search filters fix and rank removal

* chore: minor refactoring

* feat: better matches frontend

* chore: self-review (vibe)

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* remove swc (vibe)

* search url query sync (vibe)

* search url query sync (vibe)

* better casts and cap while

* PR review + simplify frontend hook

* pr: remove search db timeouts

* cleanup tests

* tests cleanup

* frontend cleanup

* index declarations

* refactor frontend (self-review)

* fix search pagination

* clear "x" for search input

* pagination max pages fix

* chore: cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* lockfile

* pr review
2025-08-20 20:56:45 -04:00
Igor Loskutov
fe5d344cff diarization cli: throw on modal errors (#553) 2025-08-20 10:21:52 -04:00
Igor Loskutov
86455ce573 chore: type fixes (#544)
* chore: type fixes

* chore: type fixes
2025-08-18 16:31:23 -04:00