* Increase max connections
* Classify hard and transient hatchet errors
* Handle partial success in fan-out
* Force reprocessing of error transcripts
* Stop retrying on 402 payment required
* Avoid httpx/hatchet timeout race
* Add retry wrapper to get_response for transient errors
* Add retry backoff
* Return falsy results unchanged so get_response won't retry on empty strings
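A minimal sketch of the retry behavior described in the three commits above; `get_response` is the function named in the subjects, while `TransientLLMError`, the attempt count, and the backoff constants are illustrative assumptions:

```python
import asyncio

class TransientLLMError(Exception):
    """Stand-in for the transient/hard classification from the commits above."""

async def get_response(prompt: str) -> str:  # placeholder for the real call
    ...

async def get_response_with_retry(prompt: str, attempts: int = 3) -> str:
    delay = 1.0
    for attempt in range(attempts):
        try:
            result = await get_response(prompt)
        except TransientLLMError:
            if attempt == attempts - 1:
                raise  # hard errors and exhausted retries propagate
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff
            continue
        return result  # falsy results (e.g. "") are returned, not retried
    return ""
```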
* Skip error status in on_workflow_failure when transcript already ended
* Fix pre-commit issues
* Fail step on first fan-out failure instead of skipping
* fix: local processing instead of http server for cpu
* add fallback token if service worker doesn't work
* chore: rename processors to keep processor pattern up to date and allow other processors to be created and used with env vars
* feat: enable Daily.co in self-hosted + only schedule tasks when necessary
* feat: refactor aws storage to be platform agnostic + add local pad tracking with self-hosted support
* Upgrade to nextjs 16
* Update sentry config
* Force dynamic for health route
* Upgrade eslint config
* Upgrade jest
* Move types to dev dependencies
* Remove pages from tailwind config
* Replace img with next/image
* fix: switch structured output to tool-call with reflection retry
Replace the two-pass StructuredOutputWorkflow (TreeSummarize → acomplete)
with astructured_predict + reflection retry loop for structured LLM output.
- Enable function-calling mode (is_function_calling_model=True)
- Use astructured_predict with PromptTemplate for first attempt
- On ValidationError/parse failure, retry with reflection feedback
- Add min_length=10 to TopicResponse title/summary fields
- Remove dead StructuredOutputWorkflow class and its event types
- Rewrite tests to match new astructured_predict approach
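A minimal sketch of the new approach, assuming a llama_index LLM exposing `astructured_predict`; the prompt text and retry count are illustrative, the `texts` wiring reflects the follow-up fix just below, and `min_length=8` is the value the fields end up with two commits later:

```python
from llama_index.core.prompts import PromptTemplate
from pydantic import BaseModel, Field, ValidationError

class TopicResponse(BaseModel):
    title: str = Field(min_length=8)
    summary: str = Field(min_length=8)

async def extract_topic(llm, texts: str, max_retries: int = 3) -> TopicResponse:
    prompt = PromptTemplate(
        "Identify the topic of this transcript.\n{texts}\n{feedback}"
    )
    feedback = ""
    for _ in range(max_retries):
        try:
            return await llm.astructured_predict(
                TopicResponse, prompt, texts=texts, feedback=feedback
            )
        except (ValidationError, ValueError) as exc:
            # reflection: feed the validation failure into the next attempt
            feedback = f"Your previous answer was invalid: {exc}. Try again."
    raise RuntimeError("structured extraction failed after retries")
```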
* fix: include texts parameter in astructured_predict prompt
The switch to astructured_predict dropped the texts parameter entirely,
causing summary prompts (participants, subjects, action items) to be
sent without the transcript content. Combine texts with the prompt
before calling astructured_predict, mirroring what TreeSummarize did.
* fix: reduce TopicResponse min_length from 10 to 8 for title and summary
* ci: try fixing job spawning on GitHub
* ci: fix for new arm64 builder
* fix: remove max_tokens cap to support thinking models (Kimi-K2.5)
Thinking/reasoning models like Kimi-K2.5 use output tokens for internal
chain-of-thought before generating the visible response. When max_tokens
was set (500 or 2048), the thinking budget consumed all available tokens,
leaving an empty response — causing TreeSummarize to return '' and
crashing the topic detection retry workflow.
Set max_tokens default to None so the model controls its own output
budget, allowing thinking models to complete both reasoning and response.
Also fix process.py CLI tool to import the Celery worker app before
dispatching tasks, ensuring the Redis broker config is used instead of
Celery's default AMQP transport.
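A sketch of the resulting default, assuming the llama_index `OpenAILike` client (constructor arguments illustrative):

```python
from llama_index.llms.openai_like import OpenAILike

def make_llm(model: str, max_tokens: int | None = None) -> OpenAILike:
    # default None: the model keeps its full token budget for both
    # chain-of-thought and the visible response; cap only when configured
    kwargs = {"max_tokens": max_tokens} if max_tokens is not None else {}
    return OpenAILike(model=model, is_function_calling_model=True, **kwargs)
```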
* fix: remove max_tokens=200 cap from final title processor
Same thinking model issue — 200 tokens is especially tight and would be
entirely consumed by chain-of-thought reasoning, producing an empty title.
* Update server/reflector/tools/process.py
Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
* fix: remove max_tokens=500 cap from topic detector processor
Same thinking model fix — this is the original callsite that was failing
with Kimi-K2.5, producing empty TreeSummarize responses.
---------
Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
* feat: add change_seq to transcripts for ingestion support
Add a monotonically increasing change_seq column to the transcript table,
backed by a PostgreSQL sequence and BEFORE INSERT OR UPDATE trigger. Every
mutation gets a new sequence value, letting external ingesters checkpoint
and never miss an update.
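A minimal alembic-style sketch of the mechanism; the sequence, function, and trigger names follow the description above, though the real migration may differ:

```python
from alembic import op

def upgrade() -> None:
    op.execute("CREATE SEQUENCE transcript_change_seq")
    op.execute("ALTER TABLE transcript ADD COLUMN change_seq BIGINT")
    op.execute(
        """
        CREATE OR REPLACE FUNCTION transcript_bump_change_seq() RETURNS trigger AS $$
        BEGIN
            NEW.change_seq := nextval('transcript_change_seq');
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;
        """
    )
    op.execute(
        """
        CREATE TRIGGER transcript_change_seq_trg
        BEFORE INSERT OR UPDATE ON transcript
        FOR EACH ROW EXECUTE FUNCTION transcript_bump_change_seq();
        """
    )
```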
* chore: regenerate frontend API types
* Standalone on Ubuntu
* fix: use port 3043 for Caddy, disable rooms, remove dead Caddyfile
- Caddy mapped to host port 3043 instead of 80/443 to avoid conflicts
- FEATURE_ROOMS=false in standalone web service
- Removed scripts/standalone/Caddyfile (dead code on this branch)
- Updated all URLs, port checks, docs to reference :3043
---------
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
* feat: remove network_mode host for standalone by fixing WebRTC port range and ICE candidates
aioice hardcodes bind(addr, 0) for ICE UDP sockets, making port mapping
impossible in Docker bridge networking. This adds two env-var-gated
mechanisms to replace network_mode: host:
1. WEBRTC_PORT_RANGE (e.g. "50000-50100"): monkey-patches aioice to bind
UDP sockets within a known range, so they can be mapped in Docker.
2. WEBRTC_HOST (e.g. "host.docker.internal"): rewrites container-internal
IPs in SDP answers with the Docker host's real IP, so LAN clients can
reach the ICE candidates.
Both default to None — no effect on existing deployments.
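The real change patches aioice's candidate gathering; here is the same idea sketched at the socket layer, under the assumption that every wildcard-port UDP bind in the process comes from ICE gathering:

```python
import os
import socket

_orig_bind = socket.socket.bind

def _bind_webrtc_range(self, address):
    """Rewrite wildcard-port UDP binds into the configured range."""
    port_range = os.environ.get("WEBRTC_PORT_RANGE")  # e.g. "50000-50100"
    if port_range and self.type == socket.SOCK_DGRAM and address[1] == 0:
        lo, hi = (int(p) for p in port_range.split("-"))
        for candidate in range(lo, hi + 1):
            try:
                return _orig_bind(self, (address[0], candidate))
            except OSError:
                continue  # port taken, try the next one
    return _orig_bind(self, address)  # unset or exhausted: original behavior

socket.socket.bind = _bind_webrtc_range
```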
* fix: drop the sidecar approach to host IP detection; the standalone script determines the external IP and passes it in
* style: reformat
---------
Co-authored-by: tito <tito@titos-Mac-Studio.local>
* feat: add Caddy reverse proxy with auto HTTPS for LAN access and auto-derive WebSocket URL
Add a Caddy service to docker-compose.standalone.yml that provides automatic
HTTPS with local certificates, enabling secure access to both the frontend
and API from the local network through a single entrypoint.
Backend changes:
- Add ROOT_PATH setting to FastAPI so the API can be served under /api prefix
- Route frontend and API (/server-api) through Caddy reverse proxy
Frontend changes:
- Support WEBSOCKET_URL=auto to derive the WebSocket URL from API_URL
automatically, using the page protocol (http→ws, https→wss) and host
- Make WEBSOCKET_URL env var optional instead of required
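The derivation rule, sketched in Python for clarity (the real implementation lives in the Next.js frontend):

```python
from urllib.parse import urlparse

def derive_ws_url(api_url: str) -> str:
    parts = urlparse(api_url)
    scheme = "wss" if parts.scheme == "https" else "ws"  # http→ws, https→wss
    return f"{scheme}://{parts.netloc}"
```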
* style: pre-commit
* fix: make standalone compose self-contained (drop !reset dependency)
docker-compose.standalone.yml used !reset YAML tags to clear
network_mode and volumes from the base compose. !reset requires
Compose v2.24+ and breaks on Colima + brew-installed compose.
Rewrite as a fully self-contained file with all services defined
directly (server, worker, beat, redis, postgres, web, garage, cpu,
gpu-nvidia, ollama, ollama-cpu). No longer overlays docker-compose.yml.
Update setup-standalone.sh compose_cmd() to use only the standalone
file instead of both files.
* fix: update standalone docs to match self-contained compose usage
---------
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
* fix: reliably detect a missing docker compose plugin
Without the plugin, `docker compose version` can still exit 0
by falling through to `docker version`. Grep for "Compose" in
the output to reliably detect the plugin.
Without the compose plugin, `docker compose -f ...` produces a
misleading "unknown shorthand flag: 'f'" error instead of telling
the user compose is missing.
* fix: harden standalone setup script and compose config
- Add rebuild_images() to setup-standalone.sh that runs `compose build`
before `up -d`, with image hash comparison to log whether each service
was rebuilt or unchanged
- Blank HATCHET_CLIENT_SERVER_URL/HOST_PORT in standalone compose since
Hatchet is not started (localhost URLs break after network_mode:host removal)
- Fix grep -qx -> -qxF for ollama model matching (dots in model names)
Replace network_mode:host with standard compose networking for macOS
Docker Desktop compatibility. Add dump_diagnostics() for automatic
failure debugging and docker-exec-based server health checks.
* feat: standalone frontend uses production build instead of dev server
Override web service in docker-compose.standalone.yml to build from
www/Dockerfile (multi-stage: deps → build → standalone runner) instead
of running pnpm dev with bind-mounted source.
* chore: move standalone compose TODO to Huly issue RFFR-46
* fix: add required env vars for standalone production frontend
The standalone web service (node server.js) has no bind-mounted .env
files and the base env_file (.env.local) has API_URL commented out.
Next.js standalone server can't auto-load .env files without them on
disk, so all required vars must be explicit in the compose override.
---------
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
* fix: live flow real-time updates during processing
Three gaps caused transcript pages to require manual refresh after
live recording/processing:
1. UserEventsProvider only invalidated list queries on TRANSCRIPT_STATUS,
not individual transcript queries. Now parses data.id from the event
and calls invalidateTranscript for the specific transcript.
2. useWebSockets had no reconnection logic — a dropped WS silently
killed all real-time updates. Added exponential backoff reconnection
(1s-30s, max 10 retries) with intentional close detection.
3. No polling fallback — WS was single point of failure. Added
conditional refetchInterval to useTranscriptGet that polls every 5s
when transcript status is processing/uploaded/recording.
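The reconnection code itself is TypeScript in useWebSockets; this sketch only pins down the backoff schedule described in (2):

```python
MAX_RETRIES = 10

def reconnect_delay(attempt: int) -> float:
    """Delay before reconnect attempt (0-based): 1s, 2s, 4s, ... capped at 30s."""
    if attempt >= MAX_RETRIES:
        raise RuntimeError("giving up: max reconnect attempts reached")
    return min(30.0, 2.0 ** attempt)
```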
* feat: type-safe WebSocket events via OpenAPI stub
Define Pydantic models with Literal discriminators for all WS events
(9 transcript-level, 5 user-level). Expose via stub GET endpoints so
pnpm openapi generates TS discriminated unions with exhaustive switch
narrowing on the frontend.
- New server/reflector/ws_events.py with TranscriptWsEvent and UserWsEvent
- Tighten backend emit signatures with TranscriptEventName literal
- Frontend uses generated types, removes Zod schema and manual casts
- Fix pre-existing bugs: waveform mapping, FINAL_LONG_SUMMARY field name
- STATUS value now typed as TranscriptStatus literal end-to-end
- TOPIC handler simplified to query invalidation only (avoids shape mismatch)
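A minimal sketch of the stub-endpoint idea with two of the event types; the payload shapes are illustrative, and the real models live in server/reflector/ws_events.py:

```python
from typing import Annotated, Literal, Union

from fastapi import FastAPI
from pydantic import BaseModel, Field

class StatusEvent(BaseModel):
    event: Literal["STATUS"]
    data: dict  # TranscriptStatus payload in the real models

class TopicEvent(BaseModel):
    event: Literal["TOPIC"]
    data: dict  # GetTranscriptTopic payload in the real models

TranscriptWsEvent = Annotated[
    Union[StatusEvent, TopicEvent], Field(discriminator="event")
]

app = FastAPI()

@app.get("/_stub/ws-events/transcript", response_model=TranscriptWsEvent)
async def transcript_ws_event_stub():
    # never called at runtime; exists so `pnpm openapi` sees the union
    raise NotImplementedError
```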
* fix: restore TOPIC WS handler with immediate state update
The setTopics call provides instant topic rendering during live
transcription. Query invalidation still follows for full data sync.
* fix: align TOPIC WS event data with GetTranscriptTopic shape
Convert TranscriptTopic → GetTranscriptTopic in pipeline before
emitting, so WS sends segments instead of words. Removes the
`as unknown as Topic` cast on the frontend.
* fix: use NonEmptyString and TranscriptStatus in user WS event models
---------
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
* feat: local LLM via Ollama + structured output response_format
- Add setup script (scripts/setup-local-llm.sh) for one-command Ollama setup
Mac: native Metal GPU, Linux: containerized via docker-compose profiles
- Add ollama-gpu and ollama-cpu docker-compose profiles for Linux
- Add extra_hosts to server/hatchet-worker-llm for host.docker.internal
- Pass response_format JSON schema in StructuredOutputWorkflow.extract()
enabling grammar-based constrained decoding on Ollama/llama.cpp/vLLM/OpenAI
- Update .env.example with Ollama as default LLM option
- Add Ollama PRD and local dev setup docs
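A sketch of the response_format call, assuming an OpenAI-compatible server (Ollama, vLLM, llama.cpp) that honors JSON-schema constrained decoding; the model name and base_url are illustrative:

```python
from openai import AsyncOpenAI
from pydantic import BaseModel

class TopicResponse(BaseModel):
    title: str
    summary: str

client = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

async def extract(prompt: str) -> TopicResponse:
    resp = await client.chat.completions.create(
        model="llama3.1",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "TopicResponse",
                "schema": TopicResponse.model_json_schema(),
            },
        },
    )
    return TopicResponse.model_validate_json(resp.choices[0].message.content)
```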
* refactor: move Ollama services to docker-compose.standalone.yml
Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone
deployment. Mac devs never use them. Separate file keeps the main
compose clean and provides a natural home for future standalone services
(MinIO, etc.).
Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d
Mac: docker compose up -d (native Ollama, no standalone file needed)
* fix: correct PRD goal (demo/eval, not dev replacement) and processor naming
* chore: remove completed PRD, rename setup doc, drop response_format tests
- Remove docs/01_ollama.prd.md (implementation complete)
- Rename local-dev-setup.md -> standalone-local-setup.md
- Remove TestResponseFormat class from test_llm_retry.py
* docs: resolve standalone storage step — skip S3 for live-only mode
* docs: add TASKS.md for standalone env defaults + setup script work
* feat: add unified setup-local-dev.sh for standalone deployment
A single script takes a fresh clone to a working Reflector: Ollama/LLM setup,
env file generation (server/.env + www/.env.local), docker compose up,
health checks. No Hatchet in standalone — live pipeline is pure Celery.
* chore: rename to setup-standalone, remove redundant setup-local-llm.sh
* feat: add custom S3 endpoint support + Garage standalone storage
Add TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL setting to enable S3-compatible
backends (Garage, MinIO). When set, uses path-style addressing and
routes all requests to the custom endpoint. When unset, AWS behavior
is unchanged.
- AwsStorage: accept aws_endpoint_url, pass to all 6 session.client()
calls, configure path-style addressing and base_url
- Fix 4 direct AwsStorage constructions in Hatchet workflows to pass
endpoint_url (would have silently targeted wrong endpoint)
- Standalone: add Garage service to docker-compose.standalone.yml,
setup script initializes layout/bucket/key and writes credentials
- Fix compose_cmd() bug: Mac path was missing standalone yml
- garage.toml template with runtime secret generation via openssl
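The gist of the endpoint override as a boto3-style sketch; the real code threads `aws_endpoint_url` through `session.client()` in AwsStorage, and the helper name here is illustrative:

```python
import boto3
from botocore.config import Config

def make_s3_client(endpoint_url: str | None = None):
    if endpoint_url:
        # S3-compatible backends (Garage, MinIO): path-style addressing
        return boto3.client(
            "s3",
            endpoint_url=endpoint_url,
            config=Config(s3={"addressing_style": "path"}),
        )
    return boto3.client("s3")  # unset: AWS behavior unchanged
```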
* fix: standalone setup — garage config, symlink handling, healthcheck
- garage.toml: fix rpc_secret field name (was secret_transmitter),
move to top-level per Garage v1.1.0 spec, remove unused [s3_web]
- setup-standalone.sh: resolve symlinked .env files before writing,
always ensure all standalone-critical vars via env_set,
fix garage key create/info syntax (positional arg, not --name),
avoid overwriting key secret with "(redacted)" on re-run,
use compose_cmd in health check
- docker-compose.standalone.yml: fix garage healthcheck (no curl in
image, use /garage stats instead)
* docs: update standalone md — symlink handling, garage config template
* docs: add troubleshooting section + port conflict check in setup script
Port conflicts from stale next dev / other worktree processes silently
shadow Docker container port mappings, causing env vars to appear ignored.
* fix: invalidate transcript query on STATUS websocket event
Without this, the processing page never redirects after completion
because the redirect logic watches the REST query data, not the
WebSocket status state.
Cherry-picked from feat-dag-progress (faec509a).
* fix: local env setup (#855)
* Ensure rate limiting
* Increase nextjs compilation speed
* Fix daily no content handling
* Simplify daily webhook creation
* Fix webhook request validation
* feat: add local pyannote file diarization processor (#858)
* feat: add local pyannote file diarization processor
Enables file diarization without Modal by using pyannote.audio locally.
Downloads model bundle from S3 on first use, caches locally, patches
config to use local paths. Set DIARIZATION_BACKEND=pyannote to enable.
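Roughly the local flow, with a hypothetical `download_bundle_from_s3` helper and an assumed cache location:

```python
from pathlib import Path
from pyannote.audio import Pipeline

CACHE = Path("~/.cache/reflector/pyannote").expanduser()  # assumed location

def load_pipeline() -> Pipeline:
    config = CACHE / "config.yaml"
    if not config.exists():
        download_bundle_from_s3(CACHE)  # hypothetical helper
    # config references local model paths after patching
    return Pipeline.from_pretrained(config)
```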
* fix: standalone setup enables pyannote diarization and public mode
Replace DIARIZATION_ENABLED=false with DIARIZATION_BACKEND=pyannote so
file uploads get speaker diarization out of the box. Add PUBLIC_MODE=true
so unauthenticated users can list/browse transcripts.
* fix: touch env files before first compose_cmd in standalone setup
docker-compose.yml references www/.env.local as env_file, but the
setup script only creates it in step 4. compose_cmd calls in step 3
(Garage) fail on a fresh clone when the file doesn't exist yet.
* feat: standalone uses self-hosted GPU service for transcription+diarization
Replace in-process pyannote approach with self-hosted gpu/self_hosted/ service.
Same HTTP API as Modal — just TRANSCRIPT_URL/DIARIZATION_URL point to local container.
- Add gpu/self_hosted/Dockerfile.cpu (GPU Dockerfile minus NVIDIA CUDA)
- Add S3 model bundle fallback in diarizer.py when HF_TOKEN not set
- Add gpu service to docker-compose.standalone.yml with compose env overrides
- Fix /browse empty in PUBLIC_MODE (search+list queries filtered out roomless transcripts)
- Remove audio_diarization_pyannote.py, file_diarization_pyannote.py and tests
- Remove pyannote-audio from server local deps
* fix: allow unauthenticated GPU requests when no API key configured
OAuth2PasswordBearer with auto_error=True rejects requests without
Authorization header before apikey_auth can check if auth is needed.
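A sketch of the fix, relying on FastAPI's documented `auto_error=False` behavior (a missing Authorization header then yields None instead of an immediate 401); `get_configured_api_key` is an illustrative stand-in:

```python
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer

oauth2 = OAuth2PasswordBearer(tokenUrl="token", auto_error=False)

async def apikey_auth(token: str | None = Depends(oauth2)) -> str | None:
    configured = get_configured_api_key()  # illustrative settings lookup
    if configured is None:
        return None  # no key configured: allow unauthenticated access
    if token != configured:
        raise HTTPException(status_code=401, detail="invalid API key")
    return token
```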
* fix: rename standalone gpu service to cpu to match Dockerfile.cpu usage
* docs: add programmatic testing section and fix gpu->cpu naming in setup script/docs
- Add "Testing programmatically" section to standalone docs with curl commands
for creating transcript, uploading audio, polling status, checking result
- Fix setup-standalone.sh to reference `cpu` service (was still `gpu` after rename)
- Update all docs references from gpu to cpu service naming
---------
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
* Fix websocket disconnect errors
* Fix event loop is closed in Celery workers
* Allow reprocessing idle multitrack transcripts
* fix: set source_kind to FILE on audio file upload
The upload endpoint left source_kind as the default LIVE even when
a file was uploaded. Now sets it to FILE when the upload completes.
* Add hatchet env vars
* fix: improve port conflict detection and ollama model check in standalone setup
- Filter OrbStack/Docker Desktop PIDs from port conflict check (false positives on Mac)
- Check all infra ports (5432, 6379, 3900, 3903) not just app ports
- Fix ollama model detection to match on name column only
- Document OrbStack and cross-project port conflicts in troubleshooting
* fix: processing page auto-redirect after file upload completes
Three fixes for the processing page not redirecting when status becomes "ended":
- Add useWebSockets to processing page so it receives STATUS events
- Remove OAuth2PasswordBearer from auth_none — broke WebSocket endpoints (500)
- Reconnect stale Redis in ws_manager when Celery worker reuses dead event loop
* fix: mock Celery broker in idle transcript validation test
test_validation_idle_transcript_with_recording_allowed called
validate_transcript_for_processing without mocking
task_is_scheduled_or_active, which attempts a real Celery
broker connection (AMQP port 5672). Other tests in the same
file already mock this — apply the same pattern here.
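The pattern, sketched with pytest-mock; the dotted patch target and the `transcript` fixture are illustrative:

```python
def test_validation_idle_transcript_with_recording_allowed(mocker, transcript):
    # patch target path is illustrative; other tests in the file patch the
    # same name so no real AMQP broker connection is attempted
    mocker.patch(
        "reflector.worker.task_is_scheduled_or_active",
        return_value=False,
    )
    # validate_transcript_for_processing imported from the module under test
    validate_transcript_for_processing(transcript)
```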
* Enable server host mode
* Fix webrtc connection
* Remove turbopack
* fix: standalone GPU service connectivity with host network mode
Server runs with network_mode: host and can't resolve Docker service
names. Publish cpu port as 8100 on host, point server at localhost:8100.
Worker stays on bridge network using cpu:8000. Add dummy
TRANSCRIPT_MODAL_API_KEY since OpenAI SDK requires it even for local
endpoints.
---------
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
Co-authored-by: Sergey Mankovsky <sergey@mankovsky.dev>
* feat: set Daily as default video platform
Daily.co has been battle-tested and is ready to be the default.
Whereby remains available for rooms that explicitly set it.
* feat: enforce Hatchet for all multitrack processing
Remove use_celery option from rooms - multitrack (Daily) recordings
now always use Hatchet workflows. Celery remains for single-track
(Whereby) file processing only.
- Remove use_celery column from room table
- Simplify dispatch logic to always use Hatchet for multitracks
- Update tests to mock Hatchet instead of Celery
* fix: update whereby test to patch Hatchet instead of removed Celery import
---------
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>