feat: standalone frontend uses production build instead of dev server

Override web service in docker-compose.standalone.yml to build from www/Dockerfile (multi-stage: deps → build → standalone runner) instead of running pnpm dev with bind-mounted source.
fix: standalone GPU service connectivity with host network mode
2026-05-08 20:15:18 +00:00 · 2026-02-12 11:27:45 -05:00 · 2026-02-11 18:10:20 -05:00 · 2026-02-11 23:15:32 +01:00 · 2026-02-11 23:11:46 +01:00 · 2026-02-11 23:11:31 +01:00
6 changed files with 32 additions and 204 deletions
--- a/COMPOSE_STANDALONE_TODO.md
+++ b/COMPOSE_STANDALONE_TODO.md
@@ -0,0 +1,10 @@
+# Standalone Compose: Remaining Production Work
+
+## Server/worker/beat: remove host network mode + bind mounts
+
+Currently `server` uses `network_mode: host` and all three services bind-mount `./server/:/app/`. For full standalone prod:
+
+- Remove `network_mode: host` from server
+- Remove bind-mount volumes from server, worker, beat (use built image only)
+- Update `compose_cmd` in `setup-standalone.sh` to not rely on host network
+- Change `SERVER_API_URL` from `http://host.docker.internal:1250` to `http://server:1250` (server reachable via Docker network once off host mode)
--- a/docker-compose.standalone.yml
+++ b/docker-compose.standalone.yml
@@ -76,6 +76,15 @@ services:
      DIARIZATION_BACKEND: modal
      DIARIZATION_URL: http://cpu:8000

+  web:
+    image: reflector-frontend-standalone
+    build:
+      context: ./www
+    command: ["node", "server.js"]
+    volumes: !reset []
+    environment:
+      NODE_ENV: production
+
  cpu:
    build:
      context: ./gpu/self_hosted
--- a/server/reflector/hatchet/workflows/daily_multitrack_pipeline.py
+++ b/server/reflector/hatchet/workflows/daily_multitrack_pipeline.py
@@ -720,6 +720,7 @@ async def detect_topics(input: PipelineInput, ctx: Context) -> TopicsResult:
                chunk_text=chunk["text"],
                timestamp=chunk["timestamp"],
                duration=chunk["duration"],
+                words=chunk["words"],
            )
        )
        for chunk in chunks
@@ -731,41 +732,31 @@ async def detect_topics(input: PipelineInput, ctx: Context) -> TopicsResult:
        TopicChunkResult(**result[TaskName.DETECT_CHUNK_TOPIC]) for result in results
    ]

-    # Build index-to-words map from local chunks (words not in child workflow results)
-    chunks_by_index = {chunk["index"]: chunk["words"] for chunk in chunks}
-
    async with fresh_db_connection():
        transcript = await transcripts_controller.get_by_id(input.transcript_id)
        if not transcript:
            raise ValueError(f"Transcript {input.transcript_id} not found")

-        # Clear topics for idempotency on retry (each topic gets a fresh UUID,
-        # so upsert_topic would append duplicates without this)
-        await transcripts_controller.update(transcript, {"topics": []})
-
        for chunk in topic_chunks:
-            chunk_words = chunks_by_index[chunk.chunk_index]
            topic = TranscriptTopic(
                title=chunk.title,
                summary=chunk.summary,
                timestamp=chunk.timestamp,
-                transcript=" ".join(w.text for w in chunk_words),
-                words=chunk_words,
+                transcript=" ".join(w.text for w in chunk.words),
+                words=chunk.words,
            )
            await transcripts_controller.upsert_topic(transcript, topic)
            await append_event_and_broadcast(
                input.transcript_id, transcript, "TOPIC", topic, logger=logger
            )

-    # Words omitted from TopicsResult — already persisted to DB above.
-    # Downstream tasks that need words refetch from DB.
    topics_list = [
        TitleSummary(
            title=chunk.title,
            summary=chunk.summary,
            timestamp=chunk.timestamp,
            duration=chunk.duration,
-            transcript=TranscriptType(words=[]),
+            transcript=TranscriptType(words=chunk.words),
        )
        for chunk in topic_chunks
    ]
@@ -851,8 +842,9 @@ async def extract_subjects(input: PipelineInput, ctx: Context) -> SubjectsResult
    ctx.log(f"extract_subjects: starting for transcript_id={input.transcript_id}")

    topics_result = ctx.task_output(detect_topics)
+    topics = topics_result.topics

-    if not topics_result.topics:
+    if not topics:
        ctx.log("extract_subjects: no topics, returning empty subjects")
        return SubjectsResult(
            subjects=[],
@@ -865,13 +857,11 @@ async def extract_subjects(input: PipelineInput, ctx: Context) -> SubjectsResult
    # sharing DB connections and LLM HTTP pools across forks
    from reflector.db.transcripts import transcripts_controller  # noqa: PLC0415
    from reflector.llm import LLM  # noqa: PLC0415
-    from reflector.processors.types import words_to_segments  # noqa: PLC0415

    async with fresh_db_connection():
        transcript = await transcripts_controller.get_by_id(input.transcript_id)

-        # Build transcript text from DB topics (words omitted from task output
-        # to reduce Hatchet payload size — refetch from DB where they were persisted)
+        # Build transcript text from topics (same logic as TranscriptFinalSummaryProcessor)
        speakermap = {}
        if transcript and transcript.participants:
            speakermap = {
@@ -881,8 +871,8 @@ async def extract_subjects(input: PipelineInput, ctx: Context) -> SubjectsResult
            }

        text_lines = []
-        for db_topic in transcript.topics:
-            for segment in words_to_segments(db_topic.words):
+        for topic in topics:
+            for segment in topic.transcript.as_segments():
                name = speakermap.get(segment.speaker, f"Speaker {segment.speaker}")
                text_lines.append(f"{name}: {segment.text}")

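The `extract_subjects` hunk above builds speaker-labelled lines from each topic's words. The sketch below shows that idea in isolation. It assumes a simplified `Word` dataclass and a hypothetical `group_by_speaker` helper in place of the project's `Transcript.as_segments()`, whose real segmentation rules may differ. It also assumes words carry their own leading whitespace, as in the `_make_words` test fixture.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    # Simplified stand-in for reflector.processors.types.Word (assumption).
    text: str
    start: float
    end: float
    speaker: int = 0


def group_by_speaker(words: list[Word]) -> list[tuple[int, str]]:
    """Hypothetical helper: merge consecutive words spoken by the same speaker."""
    segments = []
    for speaker, run in groupby(words, key=lambda w: w.speaker):
        # Words are assumed to include their own leading space.
        segments.append((speaker, "".join(w.text for w in run).strip()))
    return segments


def build_text_lines(words: list[Word], speakermap: dict[int, str]) -> list[str]:
    """Format each segment as "<name>: <text>", falling back to "Speaker N"."""
    return [
        f"{speakermap.get(speaker, f'Speaker {speaker}')}: {text}"
        for speaker, text in group_by_speaker(words)
    ]


words = [
    Word("Hello", 0.0, 0.5, speaker=0),
    Word(" world.", 0.5, 1.0, speaker=0),
    Word(" How", 1.0, 1.5, speaker=1),
    Word(" are", 1.5, 2.0, speaker=1),
    Word(" you?", 2.0, 2.5, speaker=1),
]
lines = build_text_lines(words, {0: "Alice"})
print(lines)
```

Speaker 0 is mapped to a name, while speaker 1 has no entry in `speakermap` and falls back to the `Speaker 1` label, mirroring the `speakermap.get(...)` default in the diff.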
--- a/server/reflector/hatchet/workflows/models.py
+++ b/server/reflector/hatchet/workflows/models.py
@@ -95,6 +95,7 @@ class TopicChunkResult(BaseModel):
    summary: str
    timestamp: float
    duration: float
+    words: list[Word]


 class TopicsResult(BaseModel):
--- a/server/reflector/hatchet/workflows/topic_chunk_processing.py
+++ b/server/reflector/hatchet/workflows/topic_chunk_processing.py
@@ -20,6 +20,7 @@ from reflector.hatchet.constants import LLM_RATE_LIMIT_KEY, TIMEOUT_MEDIUM
 from reflector.hatchet.workflows.models import TopicChunkResult
 from reflector.logger import logger
 from reflector.processors.prompts import TOPIC_PROMPT
+from reflector.processors.types import Word


 class TopicChunkInput(BaseModel):
@@ -29,6 +30,7 @@ class TopicChunkInput(BaseModel):
    chunk_text: str
    timestamp: float
    duration: float
+    words: list[Word]


 hatchet = HatchetClientManager.get_client()
@@ -97,4 +99,5 @@ async def detect_chunk_topic(input: TopicChunkInput, ctx: Context) -> TopicChunk
        summary=response.summary,
        timestamp=input.timestamp,
        duration=input.duration,
+        words=input.words,
    )
--- a/server/tests/test_hatchet_payload_thinning.py
+++ b/server/tests/test_hatchet_payload_thinning.py
@@ -1,185 +0,0 @@
-"""
-Tests for Hatchet payload thinning optimizations.
-
-Verifies that:
-1. TopicChunkInput no longer carries words
-2. TopicChunkResult no longer carries words
-3. words_to_segments() matches Transcript.as_segments(is_multitrack=False) — behavioral equivalence
-   for the extract_subjects refactoring
-4. TopicsResult can be constructed with empty transcript words
-"""
-
-from reflector.hatchet.workflows.models import TopicChunkResult
-from reflector.hatchet.workflows.topic_chunk_processing import TopicChunkInput
-from reflector.processors.types import Word
-
-
-def _make_words(speaker: int = 0, start: float = 0.0) -> list[Word]:
-    return [
-        Word(text="Hello", start=start, end=start + 0.5, speaker=speaker),
-        Word(text=" world.", start=start + 0.5, end=start + 1.0, speaker=speaker),
-    ]
-
-
-class TestTopicChunkInputNoWords:
-    """TopicChunkInput must not have a words field."""
-
-    def test_no_words_field(self):
-        assert "words" not in TopicChunkInput.model_fields
-
-    def test_construction_without_words(self):
-        inp = TopicChunkInput(
-            chunk_index=0, chunk_text="Hello world.", timestamp=0.0, duration=1.0
-        )
-        assert inp.chunk_index == 0
-        assert inp.chunk_text == "Hello world."
-
-    def test_rejects_words_kwarg(self):
-        """Passing words= should raise a validation error (field doesn't exist)."""
-        import pydantic
-
-        try:
-            TopicChunkInput(
-                chunk_index=0,
-                chunk_text="text",
-                timestamp=0.0,
-                duration=1.0,
-                words=_make_words(),
-            )
-            # If pydantic is configured to ignore extra, this won't raise.
-            # Verify the field is still absent from the model.
-            assert "words" not in TopicChunkInput.model_fields
-        except pydantic.ValidationError:
-            pass  # Expected
-
-
-class TestTopicChunkResultNoWords:
-    """TopicChunkResult must not have a words field."""
-
-    def test_no_words_field(self):
-        assert "words" not in TopicChunkResult.model_fields
-
-    def test_construction_without_words(self):
-        result = TopicChunkResult(
-            chunk_index=0,
-            title="Test",
-            summary="Summary",
-            timestamp=0.0,
-            duration=1.0,
-        )
-        assert result.title == "Test"
-        assert result.chunk_index == 0
-
-    def test_serialization_roundtrip(self):
-        """Serialized TopicChunkResult has no words key."""
-        result = TopicChunkResult(
-            chunk_index=0,
-            title="Test",
-            summary="Summary",
-            timestamp=0.0,
-            duration=1.0,
-        )
-        data = result.model_dump()
-        assert "words" not in data
-        reconstructed = TopicChunkResult(**data)
-        assert reconstructed == result
-
-
-class TestWordsToSegmentsBehavioralEquivalence:
-    """words_to_segments() must produce same output as Transcript.as_segments(is_multitrack=False).
-
-    This ensures the extract_subjects refactoring (from task output topic.transcript.as_segments()
-    to words_to_segments(db_topic.words)) preserves identical behavior.
-    """
-
-    def test_single_speaker(self):
-        from reflector.processors.types import Transcript as TranscriptType
-        from reflector.processors.types import words_to_segments
-
-        words = _make_words(speaker=0)
-        direct = words_to_segments(words)
-        via_transcript = TranscriptType(words=words).as_segments(is_multitrack=False)
-
-        assert len(direct) == len(via_transcript)
-        for d, v in zip(direct, via_transcript):
-            assert d.text == v.text
-            assert d.speaker == v.speaker
-            assert d.start == v.start
-            assert d.end == v.end
-
-    def test_multiple_speakers(self):
-        from reflector.processors.types import Transcript as TranscriptType
-        from reflector.processors.types import words_to_segments
-
-        words = [
-            Word(text="Hello", start=0.0, end=0.5, speaker=0),
-            Word(text=" world.", start=0.5, end=1.0, speaker=0),
-            Word(text=" How", start=1.0, end=1.5, speaker=1),
-            Word(text=" are", start=1.5, end=2.0, speaker=1),
-            Word(text=" you?", start=2.0, end=2.5, speaker=1),
-        ]
-
-        direct = words_to_segments(words)
-        via_transcript = TranscriptType(words=words).as_segments(is_multitrack=False)
-
-        assert len(direct) == len(via_transcript)
-        for d, v in zip(direct, via_transcript):
-            assert d.text == v.text
-            assert d.speaker == v.speaker
-
-    def test_empty_words(self):
-        from reflector.processors.types import Transcript as TranscriptType
-        from reflector.processors.types import words_to_segments
-
-        assert words_to_segments([]) == []
-        assert TranscriptType(words=[]).as_segments(is_multitrack=False) == []
-
-
-class TestTopicsResultEmptyWords:
-    """TopicsResult can carry topics with empty transcript words."""
-
-    def test_construction_with_empty_words(self):
-        from reflector.hatchet.workflows.models import TopicsResult
-        from reflector.processors.types import TitleSummary
-        from reflector.processors.types import Transcript as TranscriptType
-
-        topics = [
-            TitleSummary(
-                title="Topic A",
-                summary="Summary A",
-                timestamp=0.0,
-                duration=5.0,
-                transcript=TranscriptType(words=[]),
-            ),
-            TitleSummary(
-                title="Topic B",
-                summary="Summary B",
-                timestamp=5.0,
-                duration=5.0,
-                transcript=TranscriptType(words=[]),
-            ),
-        ]
-        result = TopicsResult(topics=topics)
-        assert len(result.topics) == 2
-        for t in result.topics:
-            assert t.transcript.words == []
-
-    def test_serialization_roundtrip(self):
-        from reflector.hatchet.workflows.models import TopicsResult
-        from reflector.processors.types import TitleSummary
-        from reflector.processors.types import Transcript as TranscriptType
-
-        topics = [
-            TitleSummary(
-                title="Topic",
-                summary="Summary",
-                timestamp=0.0,
-                duration=1.0,
-                transcript=TranscriptType(words=[]),
-            )
-        ]
-        result = TopicsResult(topics=topics)
-        data = result.model_dump()
-        reconstructed = TopicsResult(**data)
-        assert len(reconstructed.topics) == 1
-        assert reconstructed.topics[0].transcript.words == []
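The deleted tests guarded a payload-thinning optimisation. With this change, `words` travel through the task inputs and outputs again. One rough way to gauge the trade-off is to compare serialized sizes with and without the word list. The sketch below uses plain dicts and `json` rather than the project's Pydantic models or Hatchet's actual serialization, so the numbers are illustrative only.

```python
import json


def make_words(n: int) -> list[dict]:
    """Build n synthetic word entries with the fields used by the test fixtures."""
    return [
        {"text": f" w{i}", "start": i * 0.5, "end": i * 0.5 + 0.5, "speaker": 0}
        for i in range(n)
    ]


def chunk_result(words: list[dict]) -> dict:
    """Plain-dict stand-in for TopicChunkResult (assumption, not the real model)."""
    return {
        "chunk_index": 0,
        "title": "Test",
        "summary": "Summary",
        "timestamp": 0.0,
        "duration": 1.0,
        "words": words,
    }


# Compare the JSON size of a result without words against one carrying 500 words.
thin = len(json.dumps(chunk_result([])))
full = len(json.dumps(chunk_result(make_words(500))))
print(f"without words: {thin} characters, with 500 words: {full} characters")
```

The payload grows roughly linearly with the number of words per chunk, which is the cost this change accepts in exchange for not refetching words from the database downstream.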
Author	SHA1	Message	Date
Igor Loskutov	0d4c5c463c	feat: standalone frontend uses production build instead of dev server Override web service in docker-compose.standalone.yml to build from www/Dockerfile (multi-stage: deps → build → standalone runner) instead of running pnpm dev with bind-mounted source.	2026-02-12 11:27:45 -05:00
Igor Loskutov	f6a23cfddd	fix: standalone GPU service connectivity with host network mode Server runs with network_mode: host and can't resolve Docker service names. Publish cpu port as 8100 on host, point server at localhost:8100. Worker stays on bridge network using cpu:8000. Add dummy TRANSCRIPT_MODAL_API_KEY since OpenAI SDK requires it even for local endpoints.	2026-02-11 18:10:20 -05:00
Sergey Mankovsky	b1405af8c7	Remove turbopack	2026-02-11 23:15:32 +01:00
Sergey Mankovsky	71ad8a294f	Fix webrtc connection	2026-02-11 23:11:46 +01:00
Sergey Mankovsky	bba272505f	Enable server host mode	2026-02-11 23:11:31 +01:00
Igor Loskutov	67aea78243	fix: mock Celery broker in idle transcript validation test test_validation_idle_transcript_with_recording_allowed called validate_transcript_for_processing without mocking task_is_scheduled_or_active, which attempts a real Celery broker connection (AMQP port 5672). Other tests in the same file already mock this — apply the same pattern here.	2026-02-11 16:26:24 -05:00
Igor Loskutov	2d81321733	fix: processing page auto-redirect after file upload completes Three fixes for the processing page not redirecting when status becomes "ended": - Add useWebSockets to processing page so it receives STATUS events - Remove OAuth2PasswordBearer from auth_none — broke WebSocket endpoints (500) - Reconnect stale Redis in ws_manager when Celery worker reuses dead event loop	2026-02-11 15:53:21 -05:00
Igor Loskutov	8c2b720564	fix: improve port conflict detection and ollama model check in standalone setup - Filter OrbStack/Docker Desktop PIDs from port conflict check (false positives on Mac) - Check all infra ports (5432, 6379, 3900, 3903) not just app ports - Fix ollama model detection to match on name column only - Document OrbStack and cross-project port conflicts in troubleshooting	2026-02-11 14:17:19 -05:00
Sergey Mankovsky	88e945ec00	Add hatchet env vars	2026-02-11 20:02:29 +01:00
Igor Loskutov	f6201dd378	fix: set source_kind to FILE on audio file upload The upload endpoint left source_kind as the default LIVE even when a file was uploaded. Now sets it to FILE when the upload completes.	2026-02-11 13:37:55 -05:00
Igor Loskutov	9f62959069	feat: standalone uses self-hosted GPU service for transcription+diarization Replace in-process pyannote approach with self-hosted gpu/self_hosted/ service. Same HTTP API as Modal — just TRANSCRIPT_URL/DIARIZATION_URL point to local container. - Add gpu/self_hosted/Dockerfile.cpu (GPU Dockerfile minus NVIDIA CUDA) - Add S3 model bundle fallback in diarizer.py when HF_TOKEN not set - Add gpu service to docker-compose.standalone.yml with compose env overrides - Fix /browse empty in PUBLIC_MODE (search+list queries filtered out roomless transcripts) - Remove audio_diarization_pyannote.py, file_diarization_pyannote.py and tests - Remove pyannote-audio from server local deps	2026-02-11 13:37:55 -05:00
Igor Loskutov	0353c23a94	feat: add local pyannote file diarization processor Enables file diarization without Modal by using pyannote.audio locally. Downloads model bundle from S3 on first use, caches locally, patches config to use local paths. Set DIARIZATION_BACKEND=pyannote to enable.	2026-02-11 13:37:12 -05:00
Sergey Mankovsky	7372f80530	Allow reprocessing idle multitrack transcripts	2026-02-11 19:29:29 +01:00
Sergey Mankovsky	208361c8cc	Fix event loop is closed in Celery workers	2026-02-11 19:29:23 +01:00
Sergey Mankovsky	70d17997ef	Fix websocket disconnect errors	2026-02-11 19:29:16 +01:00
Igor Monadical	adc4c20bf4	feat: add local pyannote file diarization processor (#858) * feat: add local pyannote file diarization processor Enables file diarization without Modal by using pyannote.audio locally. Downloads model bundle from S3 on first use, caches locally, patches config to use local paths. Set DIARIZATION_BACKEND=pyannote to enable. * fix: standalone setup enables pyannote diarization and public mode Replace DIARIZATION_ENABLED=false with DIARIZATION_BACKEND=pyannote so file uploads get speaker diarization out of the box. Add PUBLIC_MODE=true so unauthenticated users can list/browse transcripts. * fix: touch env files before first compose_cmd in standalone setup docker-compose.yml references www/.env.local as env_file, but the setup script only creates it in step 4. compose_cmd calls in step 3 (Garage) fail on a fresh clone when the file doesn't exist yet. * feat: standalone uses self-hosted GPU service for transcription+diarization Replace in-process pyannote approach with self-hosted gpu/self_hosted/ service. Same HTTP API as Modal — just TRANSCRIPT_URL/DIARIZATION_URL point to local container. - Add gpu/self_hosted/Dockerfile.cpu (GPU Dockerfile minus NVIDIA CUDA) - Add S3 model bundle fallback in diarizer.py when HF_TOKEN not set - Add gpu service to docker-compose.standalone.yml with compose env overrides - Fix /browse empty in PUBLIC_MODE (search+list queries filtered out roomless transcripts) - Remove audio_diarization_pyannote.py, file_diarization_pyannote.py and tests - Remove pyannote-audio from server local deps * fix: allow unauthenticated GPU requests when no API key configured OAuth2PasswordBearer with auto_error=True rejects requests without Authorization header before apikey_auth can check if auth is needed. * fix: rename standalone gpu service to cpu to match Dockerfile.cpu usage * docs: add programmatic testing section and fix gpu->cpu naming in setup script/docs - Add "Testing programmatically" section to standalone docs with curl commands for creating transcript, uploading audio, polling status, checking result - Fix setup-standalone.sh to reference `cpu` service (was still `gpu` after rename) - Update all docs references from gpu to cpu service naming --------- Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>	2026-02-11 12:41:32 -05:00
Sergey Mankovsky	ec4f356b4c	fix: local env setup (#855) * Ensure rate limit * Increase nextjs compilation speed * Fix daily no content handling * Simplify daily webhook creation * Fix webhook request validation	2026-02-11 16:59:21 +01:00
Igor Loskutov	39573626e9	fix: invalidate transcript query on STATUS websocket event Without this, the processing page never redirects after completion because the redirect logic watches the REST query data, not the WebSocket status state. Cherry-picked from feat-dag-progress (`faec509a`).	2026-02-10 20:27:34 -05:00
Igor Loskutov	d9aa6d6eb0	docs: add troubleshooting section + port conflict check in setup script Port conflicts from stale next dev / other worktree processes silently shadow Docker container port mappings, causing env vars to appear ignored.	2026-02-10 19:54:04 -05:00
Igor Loskutov	e1ea914675	docs: update standalone md — symlink handling, garage config template	2026-02-10 19:05:02 -05:00
Igor Loskutov	7200f3c65f	fix: standalone setup — garage config, symlink handling, healthcheck - garage.toml: fix rpc_secret field name (was secret_transmitter), move to top-level per Garage v1.1.0 spec, remove unused [s3_web] - setup-standalone.sh: resolve symlinked .env files before writing, always ensure all standalone-critical vars via env_set, fix garage key create/info syntax (positional arg, not --name), avoid overwriting key secret with "(redacted)" on re-run, use compose_cmd in health check - docker-compose.standalone.yml: fix garage healthcheck (no curl in image, use /garage stats instead)	2026-02-10 19:04:42 -05:00
Igor Loskutov	2f669dfd89	feat: add custom S3 endpoint support + Garage standalone storage Add TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL setting to enable S3-compatible backends (Garage, MinIO). When set, uses path-style addressing and routes all requests to the custom endpoint. When unset, AWS behavior is unchanged. - AwsStorage: accept aws_endpoint_url, pass to all 6 session.client() calls, configure path-style addressing and base_url - Fix 4 direct AwsStorage constructions in Hatchet workflows to pass endpoint_url (would have silently targeted wrong endpoint) - Standalone: add Garage service to docker-compose.standalone.yml, setup script initializes layout/bucket/key and writes credentials - Fix compose_cmd() bug: Mac path was missing standalone yml - garage.toml template with runtime secret generation via openssl	2026-02-10 18:40:23 -05:00
Igor Loskutov	d25d77333c	chore: rename to setup-standalone, remove redundant setup-local-llm.sh	2026-02-10 17:51:03 -05:00
Igor Loskutov	427254fe33	feat: add unified setup-local-dev.sh for standalone deployment Single script takes fresh clone to working Reflector: Ollama/LLM setup, env file generation (server/.env + www/.env.local), docker compose up, health checks. No Hatchet in standalone — live pipeline is pure Celery.	2026-02-10 17:47:12 -05:00
Igor Loskutov	46750abad9	docs: add TASKS.md for standalone env defaults + setup script work	2026-02-10 17:12:01 -05:00
Igor Loskutov	f36b95b09f	docs: resolve standalone storage step — skip S3 for live-only mode	2026-02-10 16:48:18 -05:00
Igor Loskutov	608a3805c5	chore: remove completed PRD, rename setup doc, drop response_format tests - Remove docs/01_ollama.prd.md (implementation complete) - Rename local-dev-setup.md -> standalone-local-setup.md - Remove TestResponseFormat class from test_llm_retry.py	2026-02-10 16:14:33 -05:00
Igor Loskutov	d0af8ffdb7	fix: correct PRD goal (demo/eval, not dev replacement) and processor naming	2026-02-10 16:07:16 -05:00
Igor Loskutov	33a93db802	refactor: move Ollama services to docker-compose.standalone.yml Ollama profiles (ollama-gpu, ollama-cpu) are only for Linux standalone deployment. Mac devs never use them. Separate file keeps the main compose clean and provides a natural home for future standalone services (MinIO, etc.). Linux: docker compose -f docker-compose.yml -f docker-compose.standalone.yml --profile ollama-gpu up -d Mac: docker compose up -d (native Ollama, no standalone file needed)	2026-02-10 16:02:28 -05:00
Igor Loskutov	663345ece6	feat: local LLM via Ollama + structured output response_format - Add setup script (scripts/setup-local-llm.sh) for one-command Ollama setup Mac: native Metal GPU, Linux: containerized via docker-compose profiles - Add ollama-gpu and ollama-cpu docker-compose profiles for Linux - Add extra_hosts to server/hatchet-worker-llm for host.docker.internal - Pass response_format JSON schema in StructuredOutputWorkflow.extract() enabling grammar-based constrained decoding on Ollama/llama.cpp/vLLM/OpenAI - Update .env.example with Ollama as default LLM option - Add Ollama PRD and local dev setup docs	2026-02-10 15:55:21 -05:00
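
Commit 2f669dfd89 in the table above describes switching to path-style addressing when `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL` is set. The sketch below illustrates the difference between the two addressing styles. The `object_url` function, the `garage` hostname, and the bucket name are hypothetical; the real `AwsStorage` class delegates URL construction to its S3 client, so this is not its implementation.

```python
from typing import Optional


def object_url(
    bucket: str,
    key: str,
    region: str = "us-east-1",
    endpoint_url: Optional[str] = None,
) -> str:
    """Hypothetical helper showing how an object URL differs per addressing style."""
    if endpoint_url:
        # Path-style: the bucket is the first path segment under the custom endpoint.
        return f"{endpoint_url.rstrip('/')}/{bucket}/{key}"
    # Virtual-hosted style (AWS default): the bucket is part of the hostname.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


# Custom S3-compatible endpoint (e.g. a local Garage container on port 3900).
print(object_url("reflector", "audio/a.mp3", endpoint_url="http://garage:3900"))
# No endpoint configured: standard AWS addressing.
print(object_url("reflector", "audio/a.mp3"))
```

Path-style addressing matters for self-hosted backends because the bucket name does not need to resolve as a DNS subdomain of the endpoint.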