Commit Graph

517 Commits

Author SHA1 Message Date
Igor Loskutov
67aea78243 fix: mock Celery broker in idle transcript validation test
test_validation_idle_transcript_with_recording_allowed called
validate_transcript_for_processing without mocking
task_is_scheduled_or_active, which attempts a real Celery
broker connection (AMQP port 5672). Other tests in the same
file already mock this — apply the same pattern here.
2026-02-11 16:26:24 -05:00
Igor Loskutov
2d81321733 fix: processing page auto-redirect after file upload completes
Three fixes for the processing page not redirecting when status becomes "ended":

- Add useWebSockets to processing page so it receives STATUS events
- Remove OAuth2PasswordBearer from auth_none — broke WebSocket endpoints (500)
- Reconnect stale Redis in ws_manager when Celery worker reuses dead event loop
2026-02-11 15:53:21 -05:00
Igor Loskutov
f6201dd378 fix: set source_kind to FILE on audio file upload
The upload endpoint left source_kind as the default LIVE even when
a file was uploaded. Now sets it to FILE when the upload completes.
2026-02-11 13:37:55 -05:00
Igor Loskutov
9f62959069 feat: standalone uses self-hosted GPU service for transcription+diarization
Replace in-process pyannote approach with self-hosted gpu/self_hosted/ service.
Same HTTP API as Modal — just TRANSCRIPT_URL/DIARIZATION_URL point to local container.

- Add gpu/self_hosted/Dockerfile.cpu (GPU Dockerfile minus NVIDIA CUDA)
- Add S3 model bundle fallback in diarizer.py when HF_TOKEN not set
- Add gpu service to docker-compose.standalone.yml with compose env overrides
- Fix /browse empty in PUBLIC_MODE (search+list queries filtered out roomless transcripts)
- Remove audio_diarization_pyannote.py, file_diarization_pyannote.py and tests
- Remove pyannote-audio from server local deps
2026-02-11 13:37:55 -05:00
Igor Loskutov
0353c23a94 feat: add local pyannote file diarization processor
Enables file diarization without Modal by using pyannote.audio locally.
Downloads model bundle from S3 on first use, caches locally, patches
config to use local paths. Set DIARIZATION_BACKEND=pyannote to enable.
2026-02-11 13:37:12 -05:00
Sergey Mankovsky
7372f80530 Allow reprocessing idle multitrack transcripts 2026-02-11 19:29:29 +01:00
Sergey Mankovsky
208361c8cc Fix event loop is closed in Celery workers 2026-02-11 19:29:23 +01:00
Sergey Mankovsky
70d17997ef Fix websocket disconnect errors 2026-02-11 19:29:16 +01:00
adc4c20bf4 feat: add local pyannote file diarization processor (#858)
* feat: add local pyannote file diarization processor

Enables file diarization without Modal by using pyannote.audio locally.
Downloads model bundle from S3 on first use, caches locally, patches
config to use local paths. Set DIARIZATION_BACKEND=pyannote to enable.

* fix: standalone setup enables pyannote diarization and public mode

Replace DIARIZATION_ENABLED=false with DIARIZATION_BACKEND=pyannote so
file uploads get speaker diarization out of the box. Add PUBLIC_MODE=true
so unauthenticated users can list/browse transcripts.

* fix: touch env files before first compose_cmd in standalone setup

docker-compose.yml references www/.env.local as env_file, but the
setup script only creates it in step 4. compose_cmd calls in step 3
(Garage) fail on a fresh clone when the file doesn't exist yet.

* feat: standalone uses self-hosted GPU service for transcription+diarization

Replace in-process pyannote approach with self-hosted gpu/self_hosted/ service.
Same HTTP API as Modal — just TRANSCRIPT_URL/DIARIZATION_URL point to local container.

- Add gpu/self_hosted/Dockerfile.cpu (GPU Dockerfile minus NVIDIA CUDA)
- Add S3 model bundle fallback in diarizer.py when HF_TOKEN not set
- Add gpu service to docker-compose.standalone.yml with compose env overrides
- Fix /browse empty in PUBLIC_MODE (search+list queries filtered out roomless transcripts)
- Remove audio_diarization_pyannote.py, file_diarization_pyannote.py and tests
- Remove pyannote-audio from server local deps

* fix: allow unauthenticated GPU requests when no API key configured

OAuth2PasswordBearer with auto_error=True rejects requests without
Authorization header before apikey_auth can check if auth is needed.

* fix: rename standalone gpu service to cpu to match Dockerfile.cpu usage

* docs: add programmatic testing section and fix gpu->cpu naming in setup script/docs

- Add "Testing programmatically" section to standalone docs with curl commands
  for creating transcript, uploading audio, polling status, checking result
- Fix setup-standalone.sh to reference `cpu` service (was still `gpu` after rename)
- Update all docs references from gpu to cpu service naming

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-02-11 12:41:32 -05:00
Sergey Mankovsky
ec4f356b4c fix: local env setup (#855)
* Ensure rate limit

* Increase nextjs compilation speed

* Fix daily no content handling

* Simplify daily webhook creation

* Fix webhook request validation
2026-02-11 16:59:21 +01:00
Igor Loskutov
2f669dfd89 feat: add custom S3 endpoint support + Garage standalone storage
Add TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL setting to enable S3-compatible
backends (Garage, MinIO). When set, uses path-style addressing and
routes all requests to the custom endpoint. When unset, AWS behavior
is unchanged.

- AwsStorage: accept aws_endpoint_url, pass to all 6 session.client()
  calls, configure path-style addressing and base_url
- Fix 4 direct AwsStorage constructions in Hatchet workflows to pass
  endpoint_url (would have silently targeted wrong endpoint)
- Standalone: add Garage service to docker-compose.standalone.yml,
  setup script initializes layout/bucket/key and writes credentials
- Fix compose_cmd() bug: Mac path was missing standalone yml
- garage.toml template with runtime secret generation via openssl
2026-02-10 18:40:23 -05:00
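
The endpoint switch described in this commit could look roughly like the sketch below. The function names are hypothetical and this is not the AwsStorage implementation; it only illustrates how an optional endpoint URL can select path-style addressing while leaving the AWS default untouched.

```python
def client_kwargs(region, endpoint_url=None):
    # Keyword arguments one might pass to every session.client("s3", ...) call.
    # With botocore, path-style addressing would additionally be requested via
    # Config(s3={"addressing_style": "path"}); omitted to keep this stdlib-only.
    kwargs = {"region_name": region}
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def object_base_url(bucket, region, endpoint_url=None):
    # Path-style: the bucket is a path segment on the custom endpoint.
    if endpoint_url:
        return f"{endpoint_url.rstrip('/')}/{bucket}"
    # Virtual-hosted style: the AWS default when no endpoint is configured.
    return f"https://{bucket}.s3.{region}.amazonaws.com"


def object_url(bucket, key, region, endpoint_url=None):
    return f"{object_base_url(bucket, region, endpoint_url)}/{key}"
```

Building the kwargs in one helper is one way to avoid the issue the commit fixes, where some direct constructions did not pass the endpoint and would have targeted the wrong backend.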
Igor Loskutov
d25d77333c chore: rename to setup-standalone, remove redundant setup-local-llm.sh 2026-02-10 17:51:03 -05:00
Igor Loskutov
608a3805c5 chore: remove completed PRD, rename setup doc, drop response_format tests
- Remove docs/01_ollama.prd.md (implementation complete)
- Rename local-dev-setup.md -> standalone-local-setup.md
- Remove TestResponseFormat class from test_llm_retry.py
2026-02-10 16:14:33 -05:00
Igor Loskutov
663345ece6 feat: local LLM via Ollama + structured output response_format
- Add setup script (scripts/setup-local-llm.sh) for one-command Ollama setup
  Mac: native Metal GPU, Linux: containerized via docker-compose profiles
- Add ollama-gpu and ollama-cpu docker-compose profiles for Linux
- Add extra_hosts to server/hatchet-worker-llm for host.docker.internal
- Pass response_format JSON schema in StructuredOutputWorkflow.extract()
  enabling grammar-based constrained decoding on Ollama/llama.cpp/vLLM/OpenAI
- Update .env.example with Ollama as default LLM option
- Add Ollama PRD and local dev setup docs
2026-02-10 15:55:21 -05:00
15ab2e306e feat: Daily+hatchet default (#846)
* feat: set Daily as default video platform

Daily.co has been battle-tested and is ready to be the default.
Whereby remains available for rooms that explicitly set it.

* feat: enforce Hatchet for all multitrack processing

Remove use_celery option from rooms - multitrack (Daily) recordings
now always use Hatchet workflows. Celery remains for single-track
(Whereby) file processing only.

- Remove use_celery column from room table
- Simplify dispatch logic to always use Hatchet for multitracks
- Update tests to mock Hatchet instead of Celery

* fix: update whereby test to patch Hatchet instead of removed Celery import

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-02-05 18:38:08 -05:00
1ce1c7a910 fix: websocket tests (#825)
* fix websocket tests

* fix: restore timeout and fix celery test infrastructure

- Re-add timeout=1.0 to ws_manager pubsub loop (prevents CPU spin?)
- Use Redis for Celery tests (memory:// broker doesn't support chords)
- Add timeout param to in-memory subscriber mock
- Remove duplicate celery_includes fixture from rtc_ws tests

* fix: remove redundant inline imports in test files

* fix: update gitleaks ignore for moved s3_key line

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-02-05 14:23:31 -05:00
8707c6694a fix: use Daily API recording.duration as master source for transcript duration (#844)
Set duration early in get_participants from Daily API (seconds -> ms),
ensuring post_zulip has the value before mixdown_tracks completes.

Removes redundant duration update from mixdown_tracks.

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-02-03 17:15:03 -05:00
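
The seconds-to-milliseconds conversion mentioned in this commit can be sketched as follows. The helper name is hypothetical, and the assumption that the recording duration arrives as a number of seconds (possibly fractional) is an illustration, not a statement about the Daily API response shape.

```python
def seconds_to_ms(duration_seconds):
    # Convert a duration in seconds to whole milliseconds.
    return int(round(float(duration_seconds) * 1000))
```

Converting once, at the point where the value is first read, gives later steps a single unit to rely on.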
4acde4b7fd fix: increase TIMEOUT_MEDIUM from 2m to 5m for LLM tasks (#843)
Topic detection was timing out on longer transcripts when LLM
responses are slow. This affects detect_chunk_topic and other
LLM-calling tasks that use TIMEOUT_MEDIUM.

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-02-03 16:05:16 -05:00
Igor Loskutov
c05d1f03cd fix: match httpx pad with hatchet audio timeout 2026-01-30 15:56:18 -05:00
Igor Loskutov
23eb1371cb fix: daily multitrack pipeline finalize dependency fix 2026-01-30 15:19:27 -05:00
7fde64e252 feat: modal padding (#837)
* Add Modal backend for audio padding

- Create reflector_padding.py Modal deployment (CPU-based)
- Add PaddingWorkflow with conditional Modal/local backend
- Update deploy-all.sh to include padding deployment

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-30 13:11:51 -05:00
fc3ef6c893 feat: mixdown optional (#834)
* optional mixdown

* optional mixdown

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-23 15:51:18 -05:00
6c175a11d8 feat: brady bunch (#816)
* brady bunch PRD/tasks

* clean dead daily.co code

* brady bunch prototype (no-mistakes)

* brady bunch prototype (no-mistakes) review

* self-review

* daily poll time match (no-mistakes)

* daily poll self-review (no-mistakes)

* daily poll self-review (no-mistakes)

* daily co doc

* cleanup

* cleanup

* self-review (no-mistakes)

* self-review (no-mistakes)

* self-review

* self-review

* ui typefix

* dupe calls error handling proper

* daily reflector data model doc

* logging style fix

* migration merge

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-23 12:33:06 -05:00
6e786b7631 hatchet processing resilience: several fixes (#831)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-22 19:03:33 -05:00
c723752b7e feat: set hatchet as default for multitracks (#822)
* set hatchet as default for multitracks

* fix: pipeline routing tests for hatchet-default branch

- Create room with use_celery=True to force Celery backend in tests
- Link transcript to room to enable multitrack pipeline routing
- Fixes test failures caused by missing HATCHET_CLIENT_TOKEN in test env

* Update server/reflector/services/transcript_process.py

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
2026-01-21 17:05:03 -05:00
23d2bc283d fix: ics non-sync bugfix (#823)
* ics non-sync bugfix

* fix tests

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-21 15:10:19 -05:00
c8743fdf1c fix webhook tests (#826)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-21 14:31:20 -05:00
8a293882ad timeout to untighten ws python loop (#821)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-20 16:29:09 -05:00
3b6540eae5 feat: worker affinity (#819)
* worker affinity

* worker affinity

* worker affinity

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-20 12:27:16 -05:00
Igor Loskutov
7ca9cad937 docs 2026-01-09 15:37:15 -05:00
Igor Loskutov
3be7fc0b9a 200ms webm daily doc 2026-01-09 10:54:12 -05:00
407c15299f docs: docs website + installation (#778)
* feat: WIP doc (vibe started and iterated)

* install from scratch docs

* caddyfile.example

* gitignore

* authentik script

* authentik script

* authentik script

* llm doc

* authentik ongoing

* more daily setup logs

* doc website

* gpu self hosted setup guide (no-mistakes)

* doc review round

* doc review round

* doc review round

* update doc site sidebars

* feat(docs): add mermaid diagram support

* docs polishing

* live pipeline doc

* move pipeline dev docs to dev docs location

* doc pr review iteration

* dockerfile healthcheck

* docs/pr-comments

* remove jwt comment

* llm suggestion

* pr comments

* pr comments

* document auto migrations

* cleanup docs

---------

Co-authored-by: Mathieu Virbel <mat@meltingrocks.com>
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-01-06 17:25:02 -05:00
e644d6497b correct workflow name for hatchet (#815)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-29 20:36:36 -05:00
5f7b1ff1a6 fix: webhook parity, pipeline rename, waveform constant fix (#806)
* pipeline fixes: whereby Hatchet preparation

* send_webhook fixes

* cleanup

* self-review

* comment

* webhook util functions: less dependencies

* remove comment

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-26 18:00:32 -05:00
2d0df48767 feat: devex/hatchet log progress track (#813)
* progress track for some hatchet tasks

* remove inline imports / type fixes

* progress callback for mixdown - move to a function

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-26 14:10:21 -05:00
5baa6dd92e pipeline type fixes (#812)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-26 11:28:43 -05:00
bab1e2d537 dynamic mixdown hatchet (#811)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-23 19:48:16 -05:00
e886153ae1 fix hatchet parallel syntax (#810)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-23 18:45:06 -05:00
7b352f465e dont always enable hatchet (#809)
* dont always enable hatchet

* fix hatchet worker params

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-23 18:15:33 -05:00
3cf9757ac2 diarization flow - parallelize better (#808)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-23 17:35:43 -05:00
d9d3938192 better hatchet concurrency limits (#807)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-23 17:26:23 -05:00
594bcc09e0 feat: parallelize hatchet (#804)
* parallelize hatchet (no-mistakes)

* dry (no-mistakes) (minimal)

* comments

* self-review

* self-review

* self-review

* self-review

* pr comments

* pr comments

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-23 11:03:36 -05:00
1dac999b56 feat: durable (#794)
* durable (no-mistakes)

* hatchet no-mistake

* hatchet no-mistake

* hatchet no-mistake, better logging

* remove conductor and add hatchet tests (no-mistakes)

* self-review (no-mistakes)

* hatchet logs

* remove shadow mode for hatchet

* and add hatchet processor setting to room

* .

* cleanup

* hatchet init db

* self-review (no-mistakes)

* self-review (no-mistakes)

* hatchet: restore zulip report

* self-review round

* self-review round

* self-review round

* dry hatchet with celery

* dry hatchet with celery - 2

* self-review round

* more NES instead of str

* self-review wip

* self-review round

* self-review round

* self-review round

* can_replay cancelled

* add forgotten file

* pr autoreviewer fixes

* better log webhook events

* durable_started return

* migration sync

* latest changes feature parity

* migration merge

* pr review

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-22 12:09:20 -05:00
f580b996ee feat: increase daily recording max duration (#801)
* increase daily recording max duration

* recording end time: 3h min

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-22 09:02:14 -05:00
225783496f feat: consent disable feature (#799)
* consent disable feature (no-mistakes)

* sync migration

* consent disable refactor

* daily backend code refactor

* consent skip feature

* consent skip feature

* no forced whereby recording indicator

* active meetings type precision

* cleanup

* cleanup

---------

Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-22 08:47:07 -05:00
964cd78bb6 feat: identify action items (#790)
* Identify action items

* Add action items to mock summary

* Add action items validator

* Remove final prefix from action items

* Make on action items callback required

* Don't mutate action items response

* Assign action items to none on error

* Use timeout constant

* Exclude action items from transcript list
2025-12-18 21:13:47 +01:00
5f458aa4a7 fix: automatically reprocess daily recordings (#797)
* Automatically reprocess recordings

* Restore the comments

* Remove redundant check

* Fix indent

* Add comment about cyclic import
2025-12-18 21:10:04 +01:00
5f7dfadabd fix: retry on workflow timeout (#798) 2025-12-18 20:49:06 +01:00
c62e3c0753 incorporate daily api undocumented feature (#796)
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-12-17 09:51:55 -05:00
0eba147018 fix: populate room_name in transcript GET endpoint (#783)
Fixes monadical/internalai#14
2025-12-11 12:37:59 +01:00