Igor Monadical b468427f1b feat: local llm support + standalone-script doc/draft (#856)
* feat: local LLM via Ollama + structured output response_format

- Add setup script (scripts/setup-local-llm.sh) for one-command Ollama setup
  Mac: native Metal GPU, Linux: containerized via docker-compose profiles
- Add ollama-gpu and ollama-cpu docker-compose profiles for Linux
- Add extra_hosts to server/hatchet-worker-llm for host.docker.internal
- Pass response_format JSON schema in StructuredOutputWorkflow.extract(),
  enabling grammar-based constrained decoding on Ollama/llama.cpp/vLLM/OpenAI
  (request shape sketched below)
- Update .env.example with Ollama as default LLM option
- Add Ollama PRD and local dev setup docs
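
For illustration, the request shape against an OpenAI-compatible endpoint looks
roughly like this (URL, model, and schema are hypothetical stand-ins, not
Reflector's actual values):

curl http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "phi4",
    "messages": [{"role": "user", "content": "Summarize: ..."}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "extraction",
        "schema": {
          "type": "object",
          "properties": {"summary": {"type": "string"}},
          "required": ["summary"]
        }
      }
    }
  }'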

* refactor: move Ollama services to docker-compose.standalone.yml

Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone
deployment; Mac devs never use them. A separate file keeps the main
compose file clean and provides a natural home for future standalone
services (MinIO, etc.).

Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
Mac: docker compose up -d (native Ollama, no standalone file needed)

* fix: correct PRD goal (demo/eval, not dev replacement) and processor naming

* chore: remove completed PRD, rename setup doc, drop response_format tests

- Remove docs/01_ollama.prd.md (implementation complete)
- Rename local-dev-setup.md -> standalone-local-setup.md
- Remove TestResponseFormat class from test_llm_retry.py

* docs: resolve standalone storage step — skip S3 for live-only mode

* docs: add TASKS.md for standalone env defaults + setup script work

* feat: add unified setup-local-dev.sh for standalone deployment

Single script takes fresh clone to working Reflector: Ollama/LLM setup,
env file generation (server/.env + www/.env.local), docker compose up,
health checks. No Hatchet in standalone — live pipeline is pure Celery.

* chore: rename to setup-standalone, remove redundant setup-local-llm.sh

* feat: add custom S3 endpoint support + Garage standalone storage

Add TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL setting to enable S3-compatible
backends (Garage, MinIO). When set, uses path-style addressing and
routes all requests to the custom endpoint. When unset, AWS behavior
is unchanged.
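
As a sketch, pointing transcript storage at a local Garage instance amounts to
one line in server/.env (3900 is Garage's default S3 API port, one of the infra
ports the setup script later checks):

TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL=http://localhost:3900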

- AwsStorage: accept aws_endpoint_url, pass to all 6 session.client()
  calls, configure path-style addressing and base_url
- Fix 4 direct AwsStorage constructions in Hatchet workflows to pass
  endpoint_url (they would have silently targeted the wrong endpoint)
- Standalone: add Garage service to docker-compose.standalone.yml,
  setup script initializes layout/bucket/key and writes credentials
- Fix compose_cmd() bug: Mac path was missing standalone yml
- garage.toml template with runtime secret generation via openssl
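
For reference, Garage's rpc_secret is a 32-byte hex string, so the runtime
generation step boils down to:

openssl rand -hex 32   # value for rpc_secret in garage.toml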

* fix: standalone setup — garage config, symlink handling, healthcheck

- garage.toml: fix rpc_secret field name (was secret_transmitter),
  move to top-level per Garage v1.1.0 spec, remove unused [s3_web]
- setup-standalone.sh: resolve symlinked .env files before writing,
  always ensure all standalone-critical vars via env_set,
  fix garage key create/info syntax (positional arg, not --name),
  avoid overwriting key secret with "(redacted)" on re-run,
  use compose_cmd in health check
- docker-compose.standalone.yml: fix garage healthcheck (no curl in
  image, use /garage stats instead)

* docs: update standalone md — symlink handling, garage config template

* docs: add troubleshooting section + port conflict check in setup script

Port conflicts from stale next dev / other worktree processes silently
shadow Docker container port mappings, causing env vars to appear ignored.
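
To spot such a conflict by hand (illustrative; the setup script automates this
across all checked ports):

lsof -iTCP:3000 -sTCP:LISTEN   # who is holding the frontend port?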

* fix: invalidate transcript query on STATUS websocket event

Without this, the processing page never redirects after completion
because the redirect logic watches the REST query data, not the
WebSocket status state.

Cherry-picked from feat-dag-progress (faec509a).

* fix: local env setup (#855)

* Ensure rate limit

* Increase nextjs compilation speed

* Fix daily no content handling

* Simplify daily webhook creation

* Fix webhook request validation

* feat: add local pyannote file diarization processor (#858)

* feat: add local pyannote file diarization processor

Enables file diarization without Modal by using pyannote.audio locally.
Downloads model bundle from S3 on first use, caches locally, patches
config to use local paths. Set DIARIZATION_BACKEND=pyannote to enable.

* fix: standalone setup enables pyannote diarization and public mode

Replace DIARIZATION_ENABLED=false with DIARIZATION_BACKEND=pyannote so
file uploads get speaker diarization out of the box. Add PUBLIC_MODE=true
so unauthenticated users can list/browse transcripts.
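
In env terms, the standalone defaults after this change are:

DIARIZATION_BACKEND=pyannote   # replaces DIARIZATION_ENABLED=false
PUBLIC_MODE=true               # unauthenticated users can list/browse transcripts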

* fix: touch env files before first compose_cmd in standalone setup

docker-compose.yml references www/.env.local as env_file, but the
setup script only creates it in step 4. compose_cmd calls in step 3
(Garage) fail on a fresh clone when the file doesn't exist yet.
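
The fix boils down to pre-creating the files compose expects (a sketch of the
idea, not the script verbatim):

touch server/.env www/.env.local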

* feat: standalone uses self-hosted GPU service for transcription+diarization

Replace the in-process pyannote approach with the self-hosted gpu/self_hosted/
service. Same HTTP API as Modal; TRANSCRIPT_URL/DIARIZATION_URL just point to
the local container.

- Add gpu/self_hosted/Dockerfile.cpu (GPU Dockerfile minus NVIDIA CUDA)
- Add S3 model bundle fallback in diarizer.py when HF_TOKEN not set
- Add gpu service to docker-compose.standalone.yml with compose env overrides
- Fix /browse empty in PUBLIC_MODE (search+list queries filtered out roomless transcripts)
- Remove audio_diarization_pyannote.py, file_diarization_pyannote.py and tests
- Remove pyannote-audio from server local deps

* fix: allow unauthenticated GPU requests when no API key configured

OAuth2PasswordBearer with auto_error=True rejects requests without
Authorization header before apikey_auth can check if auth is needed.

* fix: rename standalone gpu service to cpu to match Dockerfile.cpu usage

* docs: add programmatic testing section and fix gpu->cpu naming in setup script/docs

- Add "Testing programmatically" section to standalone docs with curl commands
  for creating transcript, uploading audio, polling status, checking result
- Fix setup-standalone.sh to reference `cpu` service (was still `gpu` after rename)
- Update all docs references from gpu to cpu service naming
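
The general shape of that flow, with hypothetical ports and paths (see the
docs section for the real commands):

curl -s -X POST http://localhost:PORT/v1/transcripts                 # create a transcript
curl -s -T audio.wav http://localhost:PORT/v1/transcripts/ID/upload  # upload audio
curl -s http://localhost:PORT/v1/transcripts/ID                      # poll until status is "ended"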

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>

* Fix websocket disconnect errors

* Fix event loop is closed in Celery workers

* Allow reprocessing idle multitrack transcripts

* fix: set source_kind to FILE on audio file upload

The upload endpoint left source_kind as the default LIVE even when
a file was uploaded. Now sets it to FILE when the upload completes.

* Add hatchet env vars

* fix: improve port conflict detection and ollama model check in standalone setup

- Filter OrbStack/Docker Desktop PIDs from port conflict check (false positives on Mac)
- Check all infra ports (5432, 6379, 3900, 3903) not just app ports
- Fix ollama model detection to match on name column only
- Document OrbStack and cross-project port conflicts in troubleshooting

* fix: processing page auto-redirect after file upload completes

Three fixes for the processing page not redirecting when status becomes "ended":

- Add useWebSockets to processing page so it receives STATUS events
- Remove OAuth2PasswordBearer from auth_none — broke WebSocket endpoints (500)
- Reconnect stale Redis in ws_manager when Celery worker reuses dead event loop

* fix: mock Celery broker in idle transcript validation test

test_validation_idle_transcript_with_recording_allowed called
validate_transcript_for_processing without mocking
task_is_scheduled_or_active, which attempts a real Celery
broker connection (AMQP port 5672). Other tests in the same
file already mock this — apply the same pattern here.

* Enable server host mode

* Fix webrtc connection

* Remove turbopack

* fix: standalone GPU service connectivity with host network mode

Server runs with network_mode: host and can't resolve Docker service
names. Publish cpu port as 8100 on host, point server at localhost:8100.
Worker stays on bridge network using cpu:8000. Add dummy
TRANSCRIPT_MODAL_API_KEY since OpenAI SDK requires it even for local
endpoints.
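
In env terms, the split looks roughly like this (values reconstructed from the
description above, not copied from the repo):

# server/.env (server on the host network, reaches the published port)
TRANSCRIPT_URL=http://localhost:8100
DIARIZATION_URL=http://localhost:8100
TRANSCRIPT_MODAL_API_KEY=dummy   # OpenAI SDK requires a non-empty key even locally

# worker (bridge network) keeps resolving the service name:
# TRANSCRIPT_URL=http://cpu:8000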

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
Co-authored-by: Sergey Mankovsky <sergey@mankovsky.dev>

Reflector

Reflector is an AI-powered audio transcription and meeting analysis platform that provides real-time transcription, speaker diarization, translation, and summarization for audio content and live meetings. It can run 100% on local models (Whisper/Parakeet, Pyannote, Seamless-M4T, and a local LLM such as Phi-4).

What is Reflector?

Reflector is a web application that utilizes local models to process audio content, providing:

  • Real-time Transcription: Convert speech to text using Whisper (multi-language) or Parakeet (English) models
  • Speaker Diarization: Identify and label different speakers using Pyannote 3.1
  • Live Translation: Translate audio content in real-time to many languages with Facebook Seamless-M4T
  • Topic Detection & Summarization: Extract key topics and generate concise summaries using LLMs
  • Meeting Recording: Create permanent records of meetings with searchable transcripts

Currently we provide a modal.com GPU template for deployment.

Background

The project architecture consists of three primary components:

  • Back-End: Python server that offers an API and data persistence, found in server/.
  • Front-End: NextJS React project hosted on Vercel, located in www/.
  • GPU implementation: provides services such as speech-to-text transcription, topic generation, automated summaries, and translations.

It also uses authentik for authentication, when enabled.

Contribution Guidelines

All new contributions should be made in a separate branch and go through a Pull Request. Conventional Commits must be used for the PR title and commits.

Usage

To record both your voice and the meeting you're taking part in, you need:

  • For an in-person meeting, make sure your microphone is in range of all participants.
  • If using several microphones, make sure to merge the audio feeds into one with an external tool.
  • For an online meeting, if you do not use headphones, your microphone should be able to pick up both your voice and the audio feed of the meeting.
  • If you want to use headphones, you need to merge the audio feeds with an external tool.

Permissions:

You may have to grant the browser microphone access under System Preferences -> Privacy & Security -> Microphone (and possibly System Preferences -> Privacy & Security -> Accessibility) to record audio. You will be prompted to provide these permissions when you try to connect.

How to Install Blackhole (Mac Only)

This is an external tool for merging the audio feeds as explained in the previous section of this document. Note: We currently do not have instructions for Windows users.

  • Install BlackHole-2ch (2 channels are enough) by one of the two listed options.
  • Set up an "Aggregate Device" to route web audio and local microphone input.
  • Set up a Multi-Output Device.
  • Then go to System Preferences -> Sound and choose the devices created from the Output and Input tabs.
  • If everything is configured properly, the input from your local microphone and the browser-run meeting are aggregated into one virtual stream, and the output is fed back to your specified output devices.

Installation

Note: we're working toward a better installation flow; these instructions are not fully accurate for now.

Frontend

Start with cd www.

Installation

pnpm install
cp .env.example .env

Then, fill in the environment variables in .env as needed. If you are unsure how to proceed, ask in Zulip.

Run in development mode

pnpm dev

Then (after completing server setup and starting it) open http://localhost:3000 to view it in the browser.

OpenAPI Code Generation

To generate the TypeScript files from the openapi.json file, make sure the Python server is running, then run:

pnpm openapi

Backend

Start with cd server.

Run in development mode

docker compose up -d redis

# on the first run, or if the schemas changed
uv run alembic upgrade head

# start the worker
uv run celery -A reflector.worker.app worker --loglevel=info

# start the app
uv run -m reflector.app --reload

Then fill .env with the omitted values (ask in Zulip).

Crontab (optional)

For crontab (only a healthcheck for now), start celery beat (you don't need it in your local dev environment):

uv run celery -A reflector.worker.app beat

GPU models

Currently, Reflector heavily uses custom local models deployed on Modal. All the microservices are available in server/gpu/.

To deploy LLM changes to Modal, you need to (example sketch after the list):

  • have a Modal account
  • set up the required secret in your Modal account (REFLECTOR_GPU_APIKEY)
  • install the modal CLI
  • connect your modal CLI to your account, if not done previously
  • run modal run path/to/required/llm
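
A sketch of the last two steps (the secret name comes from the list above; the
service filename is a hypothetical stand-in for path/to/required/llm):

modal secret create REFLECTOR_GPU_APIKEY REFLECTOR_GPU_APIKEY=<your-key>
modal run server/gpu/llm.py   # hypothetical path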

Using local files

You can manually process an audio file by calling the process tool:

uv run python -m reflector.tools.process path/to/audio.wav

Reprocessing any transcription

uv run -m reflector.tools.process_transcript 81ec38d1-9dd7-43d2-b3f8-51f4d34a07cd --sync

Build-time env variables

Next.js projects typically rely on NEXT_PUBLIC_-prefixed build-time vars. We don't use those, because we need to serve a customizable prebuilt Docker container.

Instead, all the variables are runtime. Variables needed by the frontend are served to the app at initial render.

This also means there's no static prebuild and no static JS/HTML files to serve.
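
In practice, one prebuilt image can serve any deployment, configured entirely
at start time (image name hypothetical):

docker run -p 3000:3000 --env-file www/.env.local reflector-www:latest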

Feature Flags

Reflector uses environment variable-based feature flags to control application functionality. These flags allow you to enable or disable features without code changes.

Available Feature Flags

Feature Flag    Environment Variable
requireLogin    FEATURE_REQUIRE_LOGIN
privacy         FEATURE_PRIVACY
browse          FEATURE_BROWSE
sendToZulip     FEATURE_SEND_TO_ZULIP
rooms           FEATURE_ROOMS

Setting Feature Flags

Feature flags are controlled via environment variables using the pattern FEATURE_{FEATURE_NAME} where {FEATURE_NAME} is the SCREAMING_SNAKE_CASE version of the feature name.

Examples:

# Enable user authentication requirement
FEATURE_REQUIRE_LOGIN=true

# Disable browse functionality
FEATURE_BROWSE=false

# Enable Zulip integration
FEATURE_SEND_TO_ZULIP=true