Commit Graph

81 Commits

Author SHA1 Message Date
Igor Monadical
695d1a957d fix: search-logspam (#593)
* fix: search-logspam

* llm comment

* fix tests

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-08-29 18:55:51 -04:00
84a381220b fix: make webhook secret/url allowing null (#590) 2025-08-29 11:55:18 -06:00
88ed7cfa78 feat(rooms): add webhook for transcript completion (#578)
* feat(rooms): add webhook notifications for transcript completion

- Add webhook_url and webhook_secret fields to rooms table
- Create Celery task with 24-hour retry window using exponential backoff
- Send transcript metadata, diarized text, topics, and summaries via webhook
- Add HMAC signature verification for webhook security
- Add test endpoint POST /v1/rooms/{room_id}/webhook/test
- Update frontend with webhook configuration UI and test button
- Auto-generate webhook secret if not provided
- Trigger webhook after successful file pipeline processing for room recordings

* style: linting

* fix: remove unwanted files

* fix: update openapi gen

* fix: self-review

* docs: add comprehensive webhook documentation

- Document webhook configuration, events, and payloads
- Include transcript.completed and test event examples
- Add security considerations and best practices
- Provide example webhook receiver implementation
- Document retry policy and signature verification

* fix: remove audio_mp3_url from webhook payload

- Remove audio download URL generation from webhook
- Update documentation to reflect the change
- Keep only frontend_url for accessing transcripts

* docs: remove unwanted section

* fix: correct API method name and type imports for rooms

- Fix v1RoomsRetrieve to v1RoomsGet
- Update Room type to RoomDetails throughout frontend
- Fix type imports in useRoomList, RoomList, RoomTable, and RoomCards

* feat: add show/hide toggle for webhook secret field

- Add eye icon button to reveal/hide webhook secret when editing
- Show password dots when webhook secret is hidden
- Reset visibility state when opening/closing dialog
- Only show toggle button when editing existing room with secret

* fix: resolve event loop conflict in webhook test endpoint

- Extract webhook test logic into shared async function
- Call async function directly from FastAPI endpoint
- Keep Celery task wrapper for background processing
- Fixes RuntimeError: event loop already running

* refactor: remove unnecessary Celery task for webhook testing

- Webhook testing is synchronous and provides immediate feedback
- No need for background processing via Celery
- Keep only the async function called directly from API endpoint

* feat: improve webhook test error messages and display

- Show HTTP status code in error messages
- Parse JSON error responses to extract meaningful messages
- Improved UI layout for webhook test results
- Added colored background for success/error states
- Better text wrapping for long error messages

* docs: adjust doc

* fix: review

* fix: update attempts to match close 24h

* fix: add event_id

* fix: changed to uuid, to have new event_id when reprocess.

* style: linting

* fix: alembic revision
2025-08-29 10:07:49 -06:00
6f0c7c1a5e feat(cleanup): add automatic data retention for public instances (#574)
* feat(cleanup): add automatic data retention for public instances

- Add Celery task to clean up anonymous data after configurable retention period
- Delete transcripts, meetings, and orphaned recordings older than retention days
- Only runs when PUBLIC_MODE is enabled to prevent accidental data loss
- Properly removes all associated files (local and S3 storage)
- Add manual cleanup tool for testing and intervention
- Configure retention via PUBLIC_DATA_RETENTION_DAYS setting (default: 7 days)

Fixes #571

* fix: apply pre-commit formatting fixes

* fix: properly delete recording files from storage during cleanup

- Add storage deletion for orphaned recordings in both cleanup task and manual tool
- Delete from storage before removing database records
- Log warnings if storage deletion fails but continue with database cleanup

* Apply suggestion from @pr-agent-monadical[bot]

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* Apply suggestion from @pr-agent-monadical[bot]

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* refactor: cleanup_old_data for better logging

* fix: linting

* test: fix meeting cleanup test to not require room controller

- Simplify test by directly inserting meetings into database
- Remove dependency on non-existent rooms_controller.create method
- Tests now pass successfully

* fix: linting

* refactor: simplify cleanup tool to use worker implementation

- Remove duplicate cleanup logic from manual tool
- Use the same _cleanup_old_public_data function from worker
- Remove dry-run feature as requested
- Prevent code duplication and ensure consistency
- Update documentation to reflect changes

* refactor: split cleanup worker into smaller functions

- Move all imports to the top of the file
- Extract cleanup logic into separate functions:
  - cleanup_old_transcripts()
  - cleanup_old_meetings()
  - cleanup_orphaned_recordings()
  - log_cleanup_results()
- Make code more maintainable and testable
- Add days parameter support to Celery task
- Update manual tool to work with refactored code

* feat: add TypedDict typing for cleanup stats

- Add CleanupStats TypedDict for better type safety
- Update all function signatures to use proper typing
- Add return type annotations to _cleanup_old_public_data
- Improves code maintainability and IDE support

* feat: add CASCADE DELETE to meeting_consent foreign key

- Add ondelete="CASCADE" to meeting_consent.meeting_id foreign key
- Generate and apply migration to update existing constraint
- Remove manual consent deletion from cleanup code
- Add unit test to verify CASCADE DELETE behavior

* style: linting

* fix: alembic migration branchpoint

* fix: correct downgrade constraint name in CASCADE DELETE migration

* fix: regenerate CASCADE DELETE migration with proper constraint names

- Delete problematic migration and regenerate with correct names
- Use explicit constraint name in both upgrade and downgrade
- Ensure migration works bidirectionally
- All tests passing including CASCADE DELETE test

* style: linting

* refactor: simplify cleanup to use transcripts as entry point

- Remove orphaned_recordings cleanup (not part of this PR scope)
- Remove separate old_meetings cleanup
- Transcripts are now the main entry point for cleanup
- Associated meetings and recordings are deleted with their transcript
- Use single database connection for all operations
- Update tests to reflect new approach

* refactor: cleanup and rename functions for clarity

- Rename _cleanup_old_public_data to cleanup_old_public_data (make public)
- Rename celery task to cleanup_old_public_data_task for clarity
- Update docstrings and improve code organization
- Remove unnecessary comments and simplify deletion logic
- Update tests to use new function names
- All tests passing

* style: linting\

* style: typing and review

* fix: add transaction on cleanup_single_transcript

* fix: naming

---------

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
2025-08-29 08:47:14 -06:00
9dfd76996f fix: file pipeline status reporting and websocket updates (#589)
* feat: use file pipeline for upload and reprocess action

* fix: make file pipeline correctly report status events

* fix: duplication of transcripts_controller

* fix: tests

* test: fix file upload test

* test: fix reprocess

* fix: also patch from main_file_pipeline

(how patch is done is dependent of file import unfortunately)
2025-08-29 00:58:14 -06:00
Igor Loskutov
d70beee51b fix: include shared rooms to search (#558)
* include shared rooms to search

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* tests vibe

* remove tests, thats too much
2025-08-21 14:52:29 -04:00
Igor Loskutov
07981e8090 fix: webvtt db null expectation mismatch (#556) 2025-08-20 23:22:41 -06:00
Igor Loskutov
009590c080 feat: search frontend (#551)
* feat: better highlight

* feat(search): add long_summary to search vector for improved search results

- Update search vector to include long_summary with weight B (between title A and webvtt C)
- Modify SearchController to fetch long_summary and prioritize its snippets
- Generate snippets from long_summary first (max 2), then from webvtt for remaining slots
- Add comprehensive tests for long_summary search functionality
- Create migration to update search_vector_en column in PostgreSQL

This improves search quality by including summarized content which often contains
key topics and themes that may not be explicitly mentioned in the transcript.

* fix: address code review feedback for search enhancements

- Fix test file inconsistencies by removing references to non-existent model fields
  - Comment out tests for unimplemented features (room_ids, status filters, date ranges)
  - Update tests to only use currently available fields (room_id singular, no room_name/processing_status)
  - Mark future functionality tests with @pytest.mark.skip

- Make snippet counts configurable
  - Add LONG_SUMMARY_MAX_SNIPPETS constant (default: 2)
  - Replace hardcoded value with configurable constant

- Improve error handling consistency in WebVTT parsing
  - Use different log levels for different error types (debug for malformed, warning for decode, error for unexpected)
  - Add catch-all exception handler for unexpected errors
  - Include stack trace for critical errors

All existing tests pass with these changes.

* fix: correct datetime test to include required duration field

* feat: better highlight

* feat: search room names

* feat: acknowledge deleted room

* feat: search filters fix and rank removal

* chore: minor refactoring

* feat: better matches frontend

* chore: self-review (vibe)

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* remove swc (vibe)

* search url query sync (vibe)

* search url query sync (vibe)

* better casts and cap while

* PR review + simplify frontend hook

* pr: remove search db timeouts

* cleanup tests

* tests cleanup

* frontend cleanup

* index declarations

* refactor frontend (self-review)

* fix search pagination

* clear "x" for search input

* pagination max pages fix

* chore: cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* lockfile

* pr review
2025-08-20 20:56:45 -04:00
2fccd81bcd fix: use structlog not logging (#550) 2025-08-15 15:41:23 -06:00
1311714451 ci: add pre-commit hook and fix linting issues (#545)
* style: deactivate PLC0415 only on part that it's ok

+ re-run pre-commit run --all

* ci: add pre-commit hook

* build: move from yarn to pnpm

* build: move from yarn to pnpm

* build: fix node-version

* ci: install pnpm prior node (?)

* build: update deps and pnpm trying to fix vercel build

* feat: docker www corepack

* style: pre-commit

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-08-14 20:59:54 -06:00
b9d891d342 feat: delete recording with transcript (#547)
* Delete recording with transcript

* Delete confirmation dialog

* Use aws storage abstraction for recording deletion

* Test recording deleted with transcript

* Use get transcript storage

* Fix the test

* Add env vars for recording storage
2025-08-14 20:45:30 +02:00
9eab952c63 feat: postgresql migration and removal of sqlite in pytest (#546)
* feat: remove support of sqlite, 100% postgres

* fix: more migration and make datetime timezone aware in postgres

* fix: change how database is get, and use contextvar to have difference instance between different loops

* test: properly use client fixture that handle lifetime/database connection

* fix: add missing client fixture parameters to test functions

This commit fixes NameError issues where test functions were trying to use
the 'client' fixture but didn't have it as a parameter. The changes include:

1. Added 'client' parameter to test functions in:
   - test_transcripts_audio_download.py (6 functions including fixture)
   - test_transcripts_speaker.py (3 functions)
   - test_transcripts_upload.py (1 function)
   - test_transcripts_rtc_ws.py (2 functions + appserver fixture)

2. Resolved naming conflicts in test_transcripts_rtc_ws.py where both HTTP
   client and StreamClient were using variable name 'client'. StreamClient
   instances are now named 'stream_client' to avoid conflicts.

3. Added missing 'from reflector.app import app' import in rtc_ws tests.

Background: Previously implemented contextvars solution with get_database()
function resolves asyncio event loop conflicts in Celery tasks. The global
client fixture was also created to replace manual AsyncClient instances,
ensuring proper FastAPI application lifecycle management and database
connections during tests.

All tests now pass except for 2 pre-existing RTC WebSocket test failures
related to asyncpg connection issues unrelated to these fixes.

* fix: ensure task are correctly closed

* fix: make separate event loop for the live server

* fix: make default settings pointing at postgres

* build: remove pytest-docker deps out of dev, just tests group
2025-08-14 11:40:52 -06:00
Igor Loskutov
6fb5cb21c2 feat: search backend (#537)
* docs: transient docs

* chore: cleanup

* webvtt WIP

* webvtt field

* chore: webvtt tests comments

* chore: remove useless tests

* feat: search TASK.md

* feat: full text search by title/webvtt

* chore: search api task

* feat: search api

* feat: search API

* chore: rm task md

* chore: roll back unnecessary validators

* chore: pr review WIP

* chore: pr review WIP

* chore: pr review

* chore: top imports

* feat: better lint + ci

* feat: better lint + ci

* feat: better lint + ci

* feat: better lint + ci

* chore: lint

* chore: lint

* fix: db datetime definitions

* fix: flush() params

* fix: update transcript mutability expectation / test

* fix: update transcript mutability expectation / test

* chore: auto review

* chore: new controller extraction

* chore: new controller extraction

* chore: cleanup

* chore: review WIP

* chore: pr WIP

* chore: remove ci lint

* chore: openapi regeneration

* chore: openapi regeneration

* chore: postgres test doc

* fix: .dockerignore for arm binaries

* fix: .dockerignore for arm binaries

* fix: cap test loops

* fix: cap test loops

* fix: cap test loops

* fix: get_transcript_topics

* chore: remove flow.md docs and claude guidance

* chore: remove claude.md db doc

* chore: remove claude.md db doc

* chore: remove claude.md db doc

* chore: remove claude.md db doc
2025-08-13 10:03:38 -04:00
f5b82d44e3 style: use ruff for linting and formatting (#524) 2025-07-31 17:57:43 -06:00
2a2af5fff2 fix: remove fief out of the source code (#502)
* fix: remove fief out of the source code

* fix: remove corresponding test about migration
2025-07-21 21:09:05 -06:00
033bd4bc48 feat: improve transcript listing with room_id (#496)
Added a new field in transcript for room_id, and set room_id/meeting_id
in a transcript now. Use this field to list the transcripts. URL is now
very fast.
2025-07-17 15:43:36 -06:00
baf2822b81 fix: browse page timestamps show UTC instead of user local time (#482)
* fix: browse page timestamps show UTC instead of user local time

Closes #474

* fix: tests
2025-07-15 21:17:53 -06:00
f3ae187274 fix: waveform can generate NaN in json database (#481)
* refactor: fixes transcript duration type, NaN in waveform, and prepare for postgres migration

* fix: ensure we don't have NaN in waveform

* fix: missing assertionerror

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* fix: potential empty array

---------

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
2025-07-15 20:46:19 -06:00
9deb717e5b refactor: improve transcript list performance (#480)
* refactor: improve transcript list performance

* fix: sync openapi

* fix: frontend types

* fix: remove drop table _alembic_tmp_meeting

* fix: remove create table too

* fix: remove uq_recording_object_key
2025-07-15 15:10:05 -06:00
3d370336cc fix: alembic migrations (#470)
* fix: alembic migrations

This commit fixes all the migrations that was half-backed, due to auto
creation in the db init before. The process was to checkout at the
commit where the migration was created, and use --autogenerate to
regenerate at the state of the migration. 4 migrations was fixed.

It also includes a workflow to ensure migration can applies correctly.

* fix: db migration check

* fix: nullable on meeting_consent

* fix: try fixing tests
2025-06-27 12:03:10 -06:00
9f70f76557 fix: prevent auto creation of database (#469) 2025-06-26 11:52:20 -06:00
Igor Loskutov
351fb54f25 ai review partial fix 2025-06-19 12:49:23 -04:00
Igor Loskutov
6cb46dc64f format 2025-06-19 10:13:57 -04:00
Igor Loskutov
49f4b65f47 lint 2025-06-18 19:47:08 -04:00
Igor Loskutov
9c20e8b9f6 self-review 2025-06-18 17:32:13 -04:00
Igor Loskutov
c23e0e07ef update audio-deleted flow 2025-06-18 15:43:50 -04:00
Igor Loskutov
0c91f5dd59 slop review WIP 2025-06-17 19:26:11 -04:00
Igor Loskutov
91c7c8b83a meeting consent vibe 2025-06-17 16:30:23 -04:00
Igor Loskutov
7bb2962f94 consent preparation 2025-06-17 12:18:41 -04:00
f43045b41c Add recordings 2025-03-11 15:12:25 +01:00
78a522ac7c Fix reprocessing 2025-03-03 15:15:17 +01:00
326a4688d8 Shared rooms and transcripts 2025-02-10 14:59:18 +01:00
8b60c56f41 Remove is active check 2025-02-05 13:00:30 +01:00
0477c8ec44 Fix linting 2025-01-28 13:02:49 +01:00
dd021e9e71 Deactivate meeting when session ends 2025-01-28 12:41:23 +01:00
753a2615db Fix transcript reprocessing 2025-01-21 16:52:51 +01:00
159bd82e1c Create new meeting after previous has ended 2024-12-24 14:18:35 +01:00
2cbcfefb3f Remove viewer room url 2024-10-11 13:50:22 +02:00
4899a3854c Fix paging 2024-10-04 19:20:41 +02:00
ecb91bedc3 Add shared rooms 2024-10-04 17:20:35 +02:00
39d02ab265 Add transcript source kind 2024-10-04 16:38:29 +02:00
ebb32ee613 Transcriptions filtering and search 2024-10-04 13:28:17 +02:00
83857507ea Make sure room names are unique 2024-09-25 13:13:18 +02:00
6d976044d0 Update zulip message 2024-09-06 16:09:44 +02:00
f5f8992b79 Automatic recording by default 2024-09-04 12:57:05 +02:00
5c89a07996 Room config 2024-09-04 12:34:28 +02:00
873cbb0a42 fix: user migration confusion with user_id (#401)
+ added tests
2024-09-03 22:07:36 +02:00
03561453c5 feat: Monadical SSO as replacement of Fief (#393)
* sso: first pass for integrating SSO

still have issue on refreshing
maybe customize the login page, or completely avoid it
make 100% to understand how session server/client are working
need to test with different configuration option (features flags and
requireLogin)

* sso: correctly handle refresh token, with pro-active refresh

Going on interceptors make extra calls to reflector when 401.
We need then to circle back with NextJS backend to update the jwt,
session, then retry the failed request.

I prefered to go pro-active, and ensure the session AND jwt are always
up to date.

A minute before the expiration, we'll try to refresh it. useEffect() of
NextJS cannot be asynchronous, so we cannot wait for the token to be
refreshed.

Every 20s, a minute before the expiration (so 3x in total max) we'll try
to renew. When the accessToken is renewed, the session is updated, and
dispatching up to the client, which updates the useApi().

Therefore, no component will left without a incorrect token.

* fixes: issue with missing key on react-select-search because the default value is undefined

* sso: fixes login/logout button, and avoid seeing the login with authentik page when clicking

* sso: ensure /transcripts/new is not behind protected page, and feature flags page are honored

* sso: fixes user sub->id

* fixes: remove old layout not used

* fixes: set default NEXT_PUBLIC_SITE_URL as localhost

* fixes: removing fief again due to merge with main

* sso: ensure session is always ready before doing any action

* sso: add migration from fief to jwt in server, only from transcripts list

* fixes: user tests

* fixes: compilation issues
2024-09-03 19:27:15 +02:00
f7d2cdea2a Fix linting 2024-08-28 13:44:24 +02:00
2e7ae9a6ab Celery task for posting 2024-08-28 13:42:08 +02:00