Commit Graph

50 Commits

Author SHA1 Message Date
e0c71c5548 refactor: migrate to SQLAlchemy 2.0 ORM-style patterns
- Replace __table__.join() with ORM-style joins using select_from().outerjoin()
- Replace __table__.delete() with delete(Model) in tests
- Migrate from **row.__dict__ to model_validate() with ConfigDict(from_attributes=True)
- Add ConfigDict(from_attributes=True) to all Pydantic models for proper SQLAlchemy model conversion
- Update all controller methods to use model_validate() instead of dict unpacking

This completes the migration to SQLAlchemy 2.0 recommended patterns while maintaining
backwards compatibility and improving code consistency.
2025-09-23 16:46:37 -06:00
06639d4d8f feat: migrate SQLAlchemy from 1.4 to 2.0 with ORM style
- Remove encode/databases dependency, use native SQLAlchemy 2.0 async
- Convert all table definitions to Declarative Mapping pattern
- Update all controllers to accept session parameter (dependency injection)
- Convert all queries from Core style to ORM style
- Remove PostgreSQL compatibility checks (PostgreSQL only now)
- Add proper typing for engine and session factories
2025-09-18 12:19:53 -06:00
9dfd76996f fix: file pipeline status reporting and websocket updates (#589)
* feat: use file pipeline for upload and reprocess action

* fix: make file pipeline correctly report status events

* fix: duplication of transcripts_controller

* fix: tests

* test: fix file upload test

* test: fix reprocess

* fix: also patch from main_file_pipeline

(how patch is done is dependent of file import unfortunately)
2025-08-29 00:58:14 -06:00
Igor Loskutov
009590c080 feat: search frontend (#551)
* feat: better highlight

* feat(search): add long_summary to search vector for improved search results

- Update search vector to include long_summary with weight B (between title A and webvtt C)
- Modify SearchController to fetch long_summary and prioritize its snippets
- Generate snippets from long_summary first (max 2), then from webvtt for remaining slots
- Add comprehensive tests for long_summary search functionality
- Create migration to update search_vector_en column in PostgreSQL

This improves search quality by including summarized content which often contains
key topics and themes that may not be explicitly mentioned in the transcript.

* fix: address code review feedback for search enhancements

- Fix test file inconsistencies by removing references to non-existent model fields
  - Comment out tests for unimplemented features (room_ids, status filters, date ranges)
  - Update tests to only use currently available fields (room_id singular, no room_name/processing_status)
  - Mark future functionality tests with @pytest.mark.skip

- Make snippet counts configurable
  - Add LONG_SUMMARY_MAX_SNIPPETS constant (default: 2)
  - Replace hardcoded value with configurable constant

- Improve error handling consistency in WebVTT parsing
  - Use different log levels for different error types (debug for malformed, warning for decode, error for unexpected)
  - Add catch-all exception handler for unexpected errors
  - Include stack trace for critical errors

All existing tests pass with these changes.

* fix: correct datetime test to include required duration field

* feat: better highlight

* feat: search room names

* feat: acknowledge deleted room

* feat: search filters fix and rank removal

* chore: minor refactoring

* feat: better matches frontend

* chore: self-review (vibe)

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* chore: self-review WIP

* remove swc (vibe)

* search url query sync (vibe)

* search url query sync (vibe)

* better casts and cap while

* PR review + simplify frontend hook

* pr: remove search db timeouts

* cleanup tests

* tests cleanup

* frontend cleanup

* index declarations

* refactor frontend (self-review)

* fix search pagination

* clear "x" for search input

* pagination max pages fix

* chore: cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* lockfile

* pr review
2025-08-20 20:56:45 -04:00
2fccd81bcd fix: use structlog not logging (#550) 2025-08-15 15:41:23 -06:00
1311714451 ci: add pre-commit hook and fix linting issues (#545)
* style: deactivate PLC0415 only on part that it's ok

+ re-run pre-commit run --all

* ci: add pre-commit hook

* build: move from yarn to pnpm

* build: move from yarn to pnpm

* build: fix node-version

* ci: install pnpm prior node (?)

* build: update deps and pnpm trying to fix vercel build

* feat: docker www corepack

* style: pre-commit

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-08-14 20:59:54 -06:00
b9d891d342 feat: delete recording with transcript (#547)
* Delete recording with transcript

* Delete confirmation dialog

* Use aws storage abstraction for recording deletion

* Test recording deleted with transcript

* Use get transcript storage

* Fix the test

* Add env vars for recording storage
2025-08-14 20:45:30 +02:00
9eab952c63 feat: postgresql migration and removal of sqlite in pytest (#546)
* feat: remove support of sqlite, 100% postgres

* fix: more migration and make datetime timezone aware in postgres

* fix: change how database is get, and use contextvar to have difference instance between different loops

* test: properly use client fixture that handle lifetime/database connection

* fix: add missing client fixture parameters to test functions

This commit fixes NameError issues where test functions were trying to use
the 'client' fixture but didn't have it as a parameter. The changes include:

1. Added 'client' parameter to test functions in:
   - test_transcripts_audio_download.py (6 functions including fixture)
   - test_transcripts_speaker.py (3 functions)
   - test_transcripts_upload.py (1 function)
   - test_transcripts_rtc_ws.py (2 functions + appserver fixture)

2. Resolved naming conflicts in test_transcripts_rtc_ws.py where both HTTP
   client and StreamClient were using variable name 'client'. StreamClient
   instances are now named 'stream_client' to avoid conflicts.

3. Added missing 'from reflector.app import app' import in rtc_ws tests.

Background: Previously implemented contextvars solution with get_database()
function resolves asyncio event loop conflicts in Celery tasks. The global
client fixture was also created to replace manual AsyncClient instances,
ensuring proper FastAPI application lifecycle management and database
connections during tests.

All tests now pass except for 2 pre-existing RTC WebSocket test failures
related to asyncpg connection issues unrelated to these fixes.

* fix: ensure task are correctly closed

* fix: make separate event loop for the live server

* fix: make default settings pointing at postgres

* build: remove pytest-docker deps out of dev, just tests group
2025-08-14 11:40:52 -06:00
Igor Loskutov
6fb5cb21c2 feat: search backend (#537)
* docs: transient docs

* chore: cleanup

* webvtt WIP

* webvtt field

* chore: webvtt tests comments

* chore: remove useless tests

* feat: search TASK.md

* feat: full text search by title/webvtt

* chore: search api task

* feat: search api

* feat: search API

* chore: rm task md

* chore: roll back unnecessary validators

* chore: pr review WIP

* chore: pr review WIP

* chore: pr review

* chore: top imports

* feat: better lint + ci

* feat: better lint + ci

* feat: better lint + ci

* feat: better lint + ci

* chore: lint

* chore: lint

* fix: db datetime definitions

* fix: flush() params

* fix: update transcript mutability expectation / test

* fix: update transcript mutability expectation / test

* chore: auto review

* chore: new controller extraction

* chore: new controller extraction

* chore: cleanup

* chore: review WIP

* chore: pr WIP

* chore: remove ci lint

* chore: openapi regeneration

* chore: openapi regeneration

* chore: postgres test doc

* fix: .dockerignore for arm binaries

* fix: .dockerignore for arm binaries

* fix: cap test loops

* fix: cap test loops

* fix: cap test loops

* fix: get_transcript_topics

* chore: remove flow.md docs and claude guidance

* chore: remove claude.md db doc

* chore: remove claude.md db doc

* chore: remove claude.md db doc

* chore: remove claude.md db doc
2025-08-13 10:03:38 -04:00
f5b82d44e3 style: use ruff for linting and formatting (#524) 2025-07-31 17:57:43 -06:00
033bd4bc48 feat: improve transcript listing with room_id (#496)
Added a new field in transcript for room_id, and set room_id/meeting_id
in a transcript now. Use this field to list the transcripts. URL is now
very fast.
2025-07-17 15:43:36 -06:00
baf2822b81 fix: browse page timestamps show UTC instead of user local time (#482)
* fix: browse page timestamps show UTC instead of user local time

Closes #474

* fix: tests
2025-07-15 21:17:53 -06:00
f3ae187274 fix: waveform can generate NaN in json database (#481)
* refactor: fixes transcript duration type, NaN in waveform, and prepare for postgres migration

* fix: ensure we don't have NaN in waveform

* fix: missing assertionerror

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* fix: potential empty array

---------

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
2025-07-15 20:46:19 -06:00
9deb717e5b refactor: improve transcript list performance (#480)
* refactor: improve transcript list performance

* fix: sync openapi

* fix: frontend types

* fix: remove drop table _alembic_tmp_meeting

* fix: remove create table too

* fix: remove uq_recording_object_key
2025-07-15 15:10:05 -06:00
Igor Loskutov
6cb46dc64f format 2025-06-19 10:13:57 -04:00
Igor Loskutov
49f4b65f47 lint 2025-06-18 19:47:08 -04:00
Igor Loskutov
9c20e8b9f6 self-review 2025-06-18 17:32:13 -04:00
Igor Loskutov
c23e0e07ef update audio-deleted flow 2025-06-18 15:43:50 -04:00
Igor Loskutov
0c91f5dd59 slop review WIP 2025-06-17 19:26:11 -04:00
Igor Loskutov
7bb2962f94 consent preparation 2025-06-17 12:18:41 -04:00
f43045b41c Add recordings 2025-03-11 15:12:25 +01:00
78a522ac7c Fix reprocessing 2025-03-03 15:15:17 +01:00
326a4688d8 Shared rooms and transcripts 2025-02-10 14:59:18 +01:00
753a2615db Fix transcript reprocessing 2025-01-21 16:52:51 +01:00
4899a3854c Fix paging 2024-10-04 19:20:41 +02:00
39d02ab265 Add transcript source kind 2024-10-04 16:38:29 +02:00
ebb32ee613 Transcriptions filtering and search 2024-10-04 13:28:17 +02:00
6d976044d0 Update zulip message 2024-09-06 16:09:44 +02:00
2381428ae2 Link recorded meeting to a transcript 2024-08-09 17:30:45 +02:00
Sara
1116a0e484 fix folder deletion 2024-01-13 18:33:21 +01:00
Sara
6fe61cd5e3 fix transcript delete 2024-01-13 18:27:12 +01:00
Sara
8ca8fb141b Merge branch 'feat-api-speaker-reassignment' of github.com:Monadical-SAS/reflector into sara/feat-speaker-reassign 2023-12-13 15:47:17 +01:00
a15a63bc8d server: add reviewed field in transcript 2023-12-13 15:42:17 +01:00
Sara
d8c4f29d72 Merge branch 'feat-api-speaker-reassignment' of github.com:Monadical-SAS/reflector into sara/feat-speaker-reassign 2023-12-12 11:48:20 +01:00
37b11fdcb8 server: allow reassign speaker range using participant_id 2023-12-12 10:57:21 +01:00
Sara
c12f396b82 quick fix participant get create 2023-12-08 19:44:47 +01:00
6f3d7df507 server: add API to reassign speakers, and get topics with words 2023-12-06 16:41:18 +01:00
84a1350df7 hotfix/server: fix participants loading on old meetings 2023-12-01 18:18:09 +01:00
7ac6d25217 server: add participant API
Also break out views into different files for easier reading
2023-11-30 19:13:37 +01:00
Sara
2212d440d4 Merge branch 'main' of github.com:Monadical-SAS/reflector into feat-sharing 2023-11-22 19:28:45 +01:00
06b29d9bd4 server: add audio_location and move to external storage if possible 2023-11-22 14:41:40 +01:00
aecc3a0c3b server: first attempts to split post pipeline as single celery tasks 2023-11-22 14:41:40 +01:00
Sara
fe7f1a0e78 Merge branch 'main' of github.com:Monadical-SAS/reflector into feat-sharing 2023-11-21 12:11:58 +01:00
Sara
1fc261a669 try to move waveform to pipeline 2023-11-15 20:30:00 +01:00
226b92c347 www/server: introduce share mode 2023-11-07 12:39:48 +01:00
eb76cd9bcd server/www: rename topic text field to transcript
This aleviate the current issue with vercel deployment
2023-11-02 19:59:56 +01:00
9642d0fd1e hotfix/server: fix duplication of topics 2023-11-02 19:40:45 +01:00
3e7031d031 server: do not remove empty or recording transcripts by default
We should have the possibility to delete or hide them later
2023-11-02 17:39:21 +01:00
07c4d080c2 server: refactor with diarization, logic works 2023-11-02 17:39:21 +01:00
1c42473da0 server: refactor with clearer pipeline instanciation and linked to model 2023-11-02 17:39:21 +01:00