Reflector
Reflector is an AI-powered audio transcription and meeting analysis platform that provides real-time transcription, speaker diarization, translation, and summarization for audio content and live meetings. It runs entirely on local models (Whisper/Parakeet, Pyannote, Seamless-M4T, and a local LLM such as Phi-4).
What is Reflector?
Reflector is a web application that utilizes local models to process audio content, providing:
- Real-time Transcription: Convert speech to text using Whisper (multi-language) or Parakeet (English) models
- Speaker Diarization: Identify and label different speakers using Pyannote 3.1
- Live Translation: Translate audio content in real-time to many languages with Facebook Seamless-M4T
- Topic Detection & Summarization: Extract key topics and generate concise summaries using LLMs
- Meeting Recording: Create permanent records of meetings with searchable transcripts
Currently, we provide a modal.com GPU template for deployment.
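Reflector orchestrates these models in a pipeline. As a rough illustration of the kind of local-model call involved, here is what a bare Whisper transcription looks like using the openai-whisper package directly (this is not Reflector's internal API):

```python
# Standalone example using the openai-whisper package directly,
# not Reflector's internal pipeline.
import whisper

model = whisper.load_model("base")  # downloads the model on first use
result = model.transcribe("audio.wav")
print(result["text"])
```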
Background
The project architecture consists of three primary components:
- Back-End: Python server that offers an API and data persistence, found in server/.
- Front-End: NextJS React project hosted on Vercel, located in www/.
- GPU implementation: services such as speech-to-text transcription, topic generation, automated summaries, and translations.
It also uses Authentik for authentication when enabled.
Contribution Guidelines
All new contributions should be made in a separate branch and go through a Pull Request. Conventional Commits must be used for the PR title and commits.
Usage
To record both your voice and the meeting you're taking part in, you need:
- For an in-person meeting, make sure your microphone is in range of all participants.
- If using several microphones, make sure to merge the audio feeds into one with an external tool.
- For an online meeting, if you do not use headphones, your microphone should be able to pick up both your voice and the audio feed of the meeting.
- If you want to use headphones, you need to merge the audio feeds with an external tool.
Permissions:
You may need to grant the browser microphone access to record audio:
- System Preferences -> Privacy & Security -> Microphone
- System Preferences -> Privacy & Security -> Accessibility
You will be prompted to grant these when you try to connect.
How to Install Blackhole (Mac Only)
This is an external tool for merging the audio feeds as explained in the previous section of this document. Note: We currently do not have instructions for Windows users.
- Install Blackhole-2ch (2 ch is enough) using one of the two options listed.
- Set up an "Aggregate Device" to route web audio and local microphone input.
- Set up a Multi-Output Device.
- Then go to System Preferences -> Sound and choose the devices created from the Output and Input tabs.
- If everything is configured properly, the input from your local microphone and the browser-run meeting are aggregated into one virtual stream you can listen to, and the output is fed back to your specified output devices.
Installation
Note: we're working toward a better installation process; these instructions are not fully accurate for now.
Frontend
Start with cd www.
Installation
pnpm install
cp .env_template .env
cp config-template.ts config.ts
Then, fill in the environment variables in .env and the configuration in config.ts as needed. If you are unsure how to proceed, ask in Zulip.
Run in development mode
pnpm dev
Then (after completing server setup and starting it) open http://localhost:3000 to view it in the browser.
OpenAPI Code Generation
To generate the TypeScript files from the openapi.json file, make sure the python server is running, then run:
pnpm openapi
Backend
Start with cd server.
Run in development mode
docker compose up -d redis
# on the first run, or if the schemas changed
uv run alembic upgrade head
# start the worker
uv run celery -A reflector.worker.app worker --loglevel=info
# start the app
uv run -m reflector.app --reload
Then fill .env with the omitted values (ask in Zulip).
Crontab (optional)
For crontab jobs (only a healthcheck for now), start Celery beat (you don't need it in your local dev environment):
uv run celery -A reflector.worker.app beat
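For reference, periodic jobs are registered through Celery's beat schedule. A minimal sketch, with a hypothetical task name and an import path assumed from the `celery -A reflector.worker.app` invocation above (see the worker module for the real ones):

```python
# Minimal sketch of a Celery beat entry; names below are hypothetical.
from celery.schedules import crontab

from reflector.worker import app  # import path assumed from `celery -A reflector.worker.app`

app.conf.beat_schedule = {
    "healthcheck-every-5-minutes": {
        "task": "reflector.worker.healthcheck",  # hypothetical task name
        "schedule": crontab(minute="*/5"),
    },
}
```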
GPU models
Currently, Reflector relies heavily on custom local models deployed on Modal. All the microservices are available in server/gpu/.
To deploy LLM changes to Modal, you need to:
- have a Modal account
- set up the required secret in your Modal account (REFLECTOR_GPU_APIKEY)
- install the Modal CLI
- connect your Modal CLI to your account if not done previously
modal run path/to/required/llm
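Each service under server/gpu/ is a Modal app. As a shape-only sketch (not Reflector's actual code), a GPU function on Modal looks roughly like this, reusing the secret name from above:

```python
# Shape-only sketch of a Modal GPU service; see server/gpu/ for the real apps.
import modal

app = modal.App("example-gpu-service")

@app.function(
    gpu="A10G",  # GPU type is an example, not Reflector's actual choice
    secrets=[modal.Secret.from_name("REFLECTOR_GPU_APIKEY")],
)
def transcribe(audio_bytes: bytes) -> str:
    # model loading and inference would happen here
    ...
```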
Using local files
You can manually process an audio file by calling the process tool:
uv run python -m reflector.tools.process path/to/audio.wav
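If you want to process several files, a small wrapper sketch (run from server/; the recordings/ directory is illustrative, not a real project path):

```python
# Batch-process every .wav in a directory by shelling out to the process tool.
import subprocess
from pathlib import Path

for audio in sorted(Path("recordings").glob("*.wav")):  # illustrative path
    subprocess.run(
        ["uv", "run", "python", "-m", "reflector.tools.process", str(audio)],
        check=True,
    )
```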