Conductor Migration Tasks
This document defines atomic, isolated work items for migrating the Daily.co multitrack diarization pipeline from Celery to Conductor. Each task is self-contained with clear dependencies, acceptance criteria, and references to the codebase.
Task Index
| ID | Title | Phase | Dependencies | Complexity |
|---|---|---|---|---|
| INFRA-001 | Add Conductor container to docker-compose | 1 | None | Low |
| INFRA-002 | Create Conductor Python client wrapper | 1 | INFRA-001 | Medium |
| INFRA-003 | Add Conductor environment configuration | 1 | INFRA-001 | Low |
| INFRA-004 | Create health check endpoint for Conductor | 1 | INFRA-002 | Low |
| TASK-001 | Create task definitions registry module | 2 | INFRA-002 | Medium |
| TASK-002 | Implement get_recording worker | 2 | TASK-001 | Low |
| TASK-003 | Implement get_participants worker | 2 | TASK-001 | Low |
| TASK-004a | Implement pad_track: extract stream metadata | 2 | TASK-001 | Medium |
| TASK-004b | Implement pad_track: PyAV padding filter | 2 | TASK-004a | Medium |
| TASK-004c | Implement pad_track: S3 upload padded file | 2 | TASK-004b | Low |
| TASK-005a | Implement mixdown_tracks: build filter graph | 2 | TASK-001 | Medium |
| TASK-005b | Implement mixdown_tracks: S3 streaming + upload | 2 | TASK-005a | Medium |
| TASK-006 | Implement generate_waveform worker | 2 | TASK-001 | Medium |
| TASK-007 | Implement transcribe_track worker | 2 | TASK-001 | Medium |
| TASK-008 | Implement merge_transcripts worker | 2 | TASK-001 | Medium |
| TASK-009 | Implement detect_topics worker | 2 | TASK-001 | Medium |
| TASK-010 | Implement generate_title worker | 2 | TASK-001 | Low |
| TASK-011 | Implement generate_summary worker | 2 | TASK-001 | Medium |
| TASK-012 | Implement finalize worker | 2 | TASK-001 | Medium |
| TASK-013 | Implement cleanup_consent worker | 2 | TASK-001 | Low |
| TASK-014 | Implement post_zulip worker | 2 | TASK-001 | Low |
| TASK-015 | Implement send_webhook worker | 2 | TASK-001 | Low |
| TASK-016 | Implement generate_dynamic_fork_tasks helper | 2 | TASK-001 | Low |
| STATE-001 | Add workflow_id to Recording model | 2 | INFRA-002 | Low |
| WFLOW-001 | Create workflow definition JSON with FORK_JOIN_DYNAMIC | 3 | TASK-002..015 | High |
| WFLOW-002 | Implement workflow registration script | 3 | WFLOW-001 | Medium |
| EVENT-001 | Add PIPELINE_PROGRESS WebSocket event (requires frontend ticket) | 2 | None | Medium |
| EVENT-002 | Emit progress events from workers (requires frontend ticket) | 2 | EVENT-001, TASK-002..015 | Medium |
| INTEG-001 | Modify pipeline trigger to start Conductor workflow | 4 | WFLOW-002, STATE-001 | Medium |
| SHADOW-001 | Implement shadow mode toggle | 4 | INTEG-001 | Medium |
| SHADOW-002 | Add result comparison: content fields | 4 | SHADOW-001 | Medium |
| CUTOVER-001 | Create feature flag for Conductor-only mode | 5 | SHADOW-001 | Low |
| CUTOVER-002 | Add fallback to Celery on Conductor failure | 5 | CUTOVER-001 | Medium |
| CLEANUP-001 | Remove deprecated Celery task code | 6 | CUTOVER-001 | Medium |
| CLEANUP-002 | Update documentation | 6 | CLEANUP-001 | Low |
| TEST-001a | Integration tests: API workers (defer to human if complex) | 2 | TASK-002, TASK-003 | Low |
| TEST-001b | Integration tests: audio workers (defer to human if complex) | 2 | TASK-004c, TASK-005b, TASK-006 | Medium |
| TEST-001c | Integration tests: transcription workers (defer to human if complex) | 2 | TASK-007, TASK-008 | Medium |
| TEST-001d | Integration tests: LLM workers (defer to human if complex) | 2 | TASK-009..011 | Medium |
| TEST-001e | Integration tests: finalization workers (defer to human if complex) | 2 | TASK-012..015 | Low |
| TEST-002 | E2E test for complete workflow (defer to human if complex) | 3 | WFLOW-002 | High |
| TEST-003 | Shadow mode comparison tests (defer to human tester if too complex) | 4 | SHADOW-002 | Medium |
Phase 1: Infrastructure Setup
INFRA-001: Add Conductor Container to docker-compose
Description: Add the Conductor OSS standalone container to the docker-compose configuration.
Files to Modify:
docker-compose.yml
Implementation Details:
conductor:
image: conductoross/conductor-standalone:3.15.0
ports:
- 8127:8080
- 5001:5000
environment:
- conductor.db.type=memory # Use postgres in production
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 5
Acceptance Criteria:
- Conductor UI accessible at http://localhost:8127
- Swagger docs available at http://localhost:8127/swagger-ui/index.html
- Health endpoint returns 200
Dependencies: None
Reference Files:
docs/conductor-pipeline-mock/docker-compose.yml
INFRA-002: Create Conductor Python Client Wrapper
Description:
Create a reusable client wrapper module for interacting with the Conductor server using the conductor-python SDK.
Files to Create:
server/reflector/conductor/__init__.py
server/reflector/conductor/client.py
Implementation Details:
# server/reflector/conductor/client.py
from conductor.client.configuration.configuration import Configuration
from conductor.client.orkes_clients import OrkesClients
from conductor.client.workflow_client import WorkflowClient
from reflector.settings import settings
class ConductorClientManager:
_instance = None
@classmethod
def get_client(cls) -> WorkflowClient:
if cls._instance is None:
config = Configuration(
server_api_url=settings.CONDUCTOR_SERVER_URL,
debug=settings.CONDUCTOR_DEBUG,
)
cls._instance = OrkesClients(config)
return cls._instance.get_workflow_client()
@classmethod
def start_workflow(cls, name: str, version: int, input_data: dict) -> str:
"""Start a workflow and return the workflow ID."""
client = cls.get_client()
return client.start_workflow_by_name(name, input_data, version=version)
@classmethod
def get_workflow_status(cls, workflow_id: str) -> dict:
"""Get the current status of a workflow."""
client = cls.get_client()
return client.get_workflow(workflow_id, include_tasks=True)
Acceptance Criteria:
- Can connect to Conductor server
- Can start a workflow
- Can retrieve workflow status
- Proper error handling for connection failures
Dependencies: INFRA-001
Reference Files:
docs/conductor-pipeline-mock/src/main.py
docs/conductor-pipeline-mock/src/register_workflow.py
INFRA-003: Add Conductor Environment Configuration
Description: Add environment variables for Conductor configuration to the settings module.
Files to Modify:
server/reflector/settings.py
server/.env_template
Implementation Details:
# Add to settings.py
CONDUCTOR_SERVER_URL: str = "http://conductor:8080/api"
CONDUCTOR_DEBUG: bool = False
CONDUCTOR_ENABLED: bool = False # Feature flag
CONDUCTOR_SHADOW_MODE: bool = False # Run both Celery and Conductor
Acceptance Criteria:
- Settings load from environment variables
- Default values work for local development
- Docker container uses internal hostname
Dependencies: INFRA-001
Reference Files:
server/reflector/settings.py
INFRA-004: Create Health Check Endpoint for Conductor
Description: Add an endpoint to check Conductor server connectivity and status.
Files to Create:
server/reflector/views/conductor.py
Files to Modify:
server/reflector/app.py (register router)
Implementation Details:
from fastapi import APIRouter
from reflector.conductor.client import ConductorClientManager
router = APIRouter(prefix="/conductor", tags=["conductor"])
@router.get("/health")
async def conductor_health():
try:
client = ConductorClientManager.get_client()
# Conductor SDK health check
return {"status": "healthy", "connected": True}
except Exception as e:
return {"status": "unhealthy", "error": str(e)}
Acceptance Criteria:
- Endpoint returns healthy when Conductor is up
- Endpoint returns unhealthy with error when Conductor is down
- Does not block on slow responses
Dependencies: INFRA-002
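If the conductor-python SDK does not expose a lightweight ping, a hedged alternative is to hit Conductor's /health endpoint directly with a hard timeout so the view never blocks on a slow server. Minimal sketch, assuming httpx is available and CONDUCTOR_SERVER_URL ends in /api (per INFRA-003):
import httpx
from reflector.settings import settings

async def conductor_health_via_http() -> dict:
    """Ping Conductor's /health endpoint with a short timeout (sketch only)."""
    base_url = settings.CONDUCTOR_SERVER_URL.removesuffix("/api")
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            resp = await client.get(f"{base_url}/health")
            resp.raise_for_status()
        return {"status": "healthy", "connected": True}
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}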
Phase 2: Task Decomposition - Worker Definitions
TASK-001: Create Task Definitions Registry Module
Description: Create a module that registers all task definitions with the Conductor server on startup.
Files to Create:
server/reflector/conductor/tasks/__init__.py
server/reflector/conductor/tasks/definitions.py
server/reflector/conductor/tasks/register.py
Implementation Details:
Task definition schema:
TASK_DEFINITIONS = [
{
"name": "get_recording",
"retryCount": 3,
"timeoutSeconds": 60,
"responseTimeoutSeconds": 30,
"inputKeys": ["recording_id"],
"outputKeys": ["id", "mtg_session_id", "room_name", "duration"],
"ownerEmail": "reflector@example.com",
},
# ... all other tasks
]
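A minimal registration sketch for register.py, assuming the plain Conductor REST API (POST /metadata/taskdefs accepts a list of task definitions); the conductor-python metadata client could be used instead:
# server/reflector/conductor/tasks/register.py (sketch)
import requests

from reflector.conductor.tasks.definitions import TASK_DEFINITIONS
from reflector.settings import settings

def register_task_definitions() -> None:
    """Register all task definitions with the Conductor metadata API."""
    resp = requests.post(
        f"{settings.CONDUCTOR_SERVER_URL}/metadata/taskdefs",
        json=TASK_DEFINITIONS,
        headers={"Content-Type": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    register_task_definitions()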
Acceptance Criteria:
- All 16 task types defined with correct timeouts
- Registration script runs successfully
- Tasks visible in Conductor UI
Dependencies: INFRA-002
Reference Files:
docs/conductor-pipeline-mock/src/register_workflow.py (lines 10-112)
CONDUCTOR_MIGRATION_REQUIREMENTS.md (Module 5 section)
TASK-002: Implement get_recording Worker
Description: Create a Conductor worker that fetches recording metadata from the Daily.co API.
Files to Create:
server/reflector/conductor/workers/__init__.py
server/reflector/conductor/workers/get_recording.py
Implementation Details:
from conductor.client.worker.worker_task import worker_task
from conductor.client.http.models import Task, TaskResult
from conductor.client.http.models.task_result_status import TaskResultStatus
from reflector.video_platforms.factory import create_platform_client
@worker_task(task_definition_name="get_recording")
async def get_recording(task: Task) -> TaskResult:
recording_id = task.input_data.get("recording_id")
async with create_platform_client("daily") as client:
recording = await client.get_recording(recording_id)
result = TaskResult(
task_id=task.task_id,
workflow_instance_id=task.workflow_instance_id,
worker_id=task.worker_id,
)
result.status = TaskResultStatus.COMPLETED
result.output_data = {
"id": recording.id,
"mtg_session_id": recording.mtgSessionId,
"room_name": recording.roomName,
"duration": recording.duration,
}
return result
Input Contract:
{"recording_id": "string"}
Output Contract:
{"id": "string", "mtg_session_id": "string", "room_name": "string", "duration": "number"}
Acceptance Criteria:
- Worker polls for tasks correctly
- Handles Daily.co API errors gracefully
- Returns correct output schema
- Timeout: 60s, Response timeout: 30s, Retries: 3
Dependencies: TASK-001
Reference Files:
server/reflector/worker/process.py (lines 218-294)
docs/conductor-pipeline-mock/src/workers.py (lines 13-26)
TASK-003: Implement get_participants Worker
Description: Create a Conductor worker that fetches meeting participants from the Daily.co API.
Files to Create:
server/reflector/conductor/workers/get_participants.py
Implementation Details:
@worker_task(task_definition_name="get_participants")
async def get_participants(task: Task) -> TaskResult:
mtg_session_id = task.input_data.get("mtg_session_id")
async with create_platform_client("daily") as client:
payload = await client.get_meeting_participants(mtg_session_id)
participants = [
{"participant_id": p.participant_id, "user_name": p.user_name, "user_id": p.user_id}
for p in payload.data
]
result = TaskResult(...)
result.output_data = {"participants": participants}
return result
Input Contract:
{"mtg_session_id": "string"}
Output Contract:
{"participants": [{"participant_id": "string", "user_name": "string", "user_id": "string|null"}]}
Acceptance Criteria:
- Fetches participants from Daily.co API
- Maps participant IDs to names correctly
- Handles missing mtg_session_id
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 513-596)
docs/conductor-pipeline-mock/src/workers.py (lines 29-42)
TASK-004a: Implement pad_track - Extract Stream Metadata
Description: Extract stream.start_time from WebM container metadata for timestamp alignment.
Files to Create:
server/reflector/conductor/workers/pad_track.py (partial - metadata extraction)
Implementation Details:
def _extract_stream_start_time_from_container(source_url: str) -> float:
"""Extract start_time from WebM stream metadata using PyAV."""
container = av.open(source_url, options={
"reconnect": "1",
"reconnect_streamed": "1",
"reconnect_delay_max": "30",
})
audio_stream = container.streams.audio[0]
raw_start = audio_stream.start_time
start_time = float(raw_start * audio_stream.time_base) if raw_start is not None else 0.0
container.close()
return start_time
Acceptance Criteria:
- Opens WebM container from S3 presigned URL
- Extracts start_time from audio stream metadata
- Handles missing/invalid start_time (returns 0)
- Closes container properly
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 56-85), _extract_stream_start_time_from_container() method
TASK-004b: Implement pad_track - PyAV Padding Filter
Description: Apply adelay filter using PyAV filter graph to pad audio with silence.
Files to Modify:
server/reflector/conductor/workers/pad_track.py (add filter logic)
Implementation Details:
def _apply_audio_padding_to_file(in_container, output_path: str, start_time_seconds: float):
"""Apply adelay filter to pad audio with silence."""
delay_ms = math.floor(start_time_seconds * 1000)
graph = av.filter.Graph()
src = graph.add("abuffer", args=abuf_args, name="src")
aresample_f = graph.add("aresample", args="async=1", name="ares")
delays_arg = f"{delay_ms}|{delay_ms}"
adelay_f = graph.add("adelay", args=f"delays={delays_arg}:all=1", name="delay")
sink = graph.add("abuffersink", name="sink")
src.link_to(aresample_f)
aresample_f.link_to(adelay_f)
adelay_f.link_to(sink)
graph.configure()
# Process frames through filter graph
# Write to output file
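The frame-processing step left as comments above can be sketched as follows. This is a minimal sketch, not the pipeline's exact code: it assumes av is imported at module level, src/sink are the abuffer/abuffersink contexts from the graph, the output container and encoder stream for the padded file are already open (codec choice is an assumption), and that PyAV buffer sources accept push(None) to signal end of stream:
def _pump_frames(in_container, src, sink, out_container, out_stream) -> None:
    """Push decoded frames through the padding graph and mux the filtered output."""
    def drain():
        while True:
            try:
                out_frame = sink.pull()
            except (av.error.BlockingIOError, av.error.EOFError):
                break
            for packet in out_stream.encode(out_frame):
                out_container.mux(packet)

    for frame in in_container.decode(audio=0):
        src.push(frame)
        drain()
    src.push(None)  # signal end of stream to the buffer source
    drain()
    for packet in out_stream.encode(None):  # flush the encoder
        out_container.mux(packet)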
Acceptance Criteria:
- Constructs correct filter graph chain
- Calculates delay_ms correctly (start_time * 1000)
- Handles stereo audio (delay per channel)
- Edge case: skip if start_time <= 0
Dependencies: TASK-004a
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 87-188), _apply_audio_padding_to_file() method
Technical Notes:
- Filter chain: abuffer -> aresample -> adelay -> abuffersink
- adelay format: delays={ms}|{ms}:all=1
TASK-004c: Implement pad_track - S3 Upload
Description: Complete the pad_track worker by uploading padded file to S3 and returning presigned URL.
Files to Modify:
server/reflector/conductor/workers/pad_track.py (complete worker)
Implementation Details:
@worker_task(task_definition_name="pad_track")
async def pad_track(task: Task) -> TaskResult:
track_index = task.input_data.get("track_index")
s3_key = task.input_data.get("s3_key")
bucket_name = task.input_data.get("bucket_name")
transcript_id = task.input_data.get("transcript_id")
storage = get_transcripts_storage()
source_url = await storage.get_file_url(s3_key, expires_in=7200, bucket=bucket_name)
# Use helpers from 004a and 004b
start_time = _extract_stream_start_time_from_container(source_url)
in_container = av.open(source_url, options={"reconnect": "1", "reconnect_streamed": "1"})
padded_path = f"/tmp/padded_track_{track_index}.webm"
_apply_audio_padding_to_file(in_container, padded_path, start_time)
# Upload to S3
storage_key = f"{transcript_id}/padded_track_{track_index}.webm"
await storage.put_file(storage_key, padded_path)
padded_url = await storage.get_file_url(storage_key, expires_in=7200)
result.output_data = {"padded_url": padded_url, "size": file_size, "track_index": track_index}
return result
Input Contract:
{"track_index": "number", "s3_key": "string", "bucket_name": "string", "transcript_id": "string"}
Output Contract:
{"padded_url": "string", "size": "number", "track_index": "number"}
Acceptance Criteria:
- Uploads padded file to S3
- Returns presigned URL (7200s expiry)
- Timeout: 300s, Response timeout: 120s, Retries: 3
Dependencies: TASK-004b
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 190-210)
TASK-005a: Implement mixdown_tracks - Build Filter Graph
Description: Build PyAV filter graph for mixing N audio tracks with amix filter.
Files to Create:
server/reflector/conductor/workers/mixdown_tracks.py (partial - filter graph)
Implementation Details:
def _build_mixdown_filter_graph(containers: list, out_stream) -> av.filter.Graph:
"""Build filter graph: N abuffer -> amix -> aformat -> sink."""
graph = av.filter.Graph()
# Create abuffer for each input
abuffers = []
for i, container in enumerate(containers):
audio_stream = container.streams.audio[0]
abuf_args = f"time_base={...}:sample_rate=48000:sample_fmt=fltp:channel_layout=stereo"
abuffers.append(graph.add("abuffer", args=abuf_args, name=f"src{i}"))
# amix with normalize=0 to prevent volume reduction
amix = graph.add("amix", args=f"inputs={len(containers)}:normalize=0", name="amix")
aformat = graph.add("aformat", args="sample_fmts=s16:channel_layouts=stereo", name="aformat")
sink = graph.add("abuffersink", name="sink")
# Link all sources to amix
for abuf in abuffers:
abuf.link_to(amix)
amix.link_to(aformat)
aformat.link_to(sink)
graph.configure()
return graph
Acceptance Criteria:
- Creates abuffer per input track
- Uses amix with normalize=0
- Outputs stereo s16 format
- Handles variable number of inputs (1-N tracks)
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 324-420)
Technical Notes:
- amix normalize=0 prevents volume reduction when mixing
- Output format: stereo, s16 for MP3 encoding
TASK-005b: Implement mixdown_tracks - S3 Streaming and Upload
Description: Complete mixdown worker with S3 streaming input and upload output.
Files to Modify:
server/reflector/conductor/workers/mixdown_tracks.py (complete worker)
Implementation Details:
@worker_task(task_definition_name="mixdown_tracks")
async def mixdown_tracks(task: Task) -> TaskResult:
padded_urls = task.input_data.get("padded_urls", [])
transcript_id = task.input_data.get("transcript_id")
# Open containers with reconnect options for S3 streaming
containers = []
for url in padded_urls:
containers.append(av.open(url, options={
"reconnect": "1", "reconnect_streamed": "1", "reconnect_delay_max": "30"
}))
# Build filter graph and process
graph = _build_mixdown_filter_graph(containers, ...)
# Encode to MP3 and upload
storage = get_transcripts_storage()
storage_path = f"{transcript_id}/audio.mp3"
await storage.put_file(storage_path, mp3_file)
result.output_data = {"audio_key": storage_path, "duration": duration, "size": file_size}
return result
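The "build filter graph and process" and "encode to MP3" steps commented above can be sketched as below. Assumptions: av is imported at module level, srcs is the list of abuffer contexts and sink the abuffersink from _build_mixdown_filter_graph() (returned or looked up by name), the "mp3" encoder is available in the FFmpeg build, and inputs are interleaved round-robin so amix sees roughly aligned frames:
import itertools

def _mix_and_encode_mp3(containers, srcs, sink, mp3_path: str) -> None:
    """Round-robin decode every padded track into the amix graph and encode to MP3."""
    out_container = av.open(mp3_path, mode="w")
    out_stream = out_container.add_stream("mp3", rate=48000)

    def drain():
        while True:
            try:
                mixed = sink.pull()
            except (av.error.BlockingIOError, av.error.EOFError):
                break
            for packet in out_stream.encode(mixed):
                out_container.mux(packet)

    decoders = [c.decode(audio=0) for c in containers]
    for frames in itertools.zip_longest(*decoders):
        for src, frame in zip(srcs, frames):
            if frame is not None:
                src.push(frame)
        drain()
    for src in srcs:
        src.push(None)  # end of stream for each buffer source
    drain()
    for packet in out_stream.encode(None):  # flush the encoder
        out_container.mux(packet)
    out_container.close()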
Input Contract:
{"padded_urls": ["string"], "transcript_id": "string"}
Output Contract:
{"audio_key": "string", "duration": "number", "size": "number"}
Acceptance Criteria:
- Opens all padded tracks via presigned URLs
- Handles S3 streaming with reconnect options
- Encodes to MP3 format
- Uploads to {transcript_id}/audio.mp3
- Returns duration for broadcast
- Timeout: 600s, Response timeout: 300s, Retries: 3
Dependencies: TASK-005a
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 420-498)
TASK-006: Implement generate_waveform Worker
Description: Create a Conductor worker that generates waveform visualization data from the mixed audio.
Files to Create:
server/reflector/conductor/workers/generate_waveform.py
Implementation Details:
@worker_task(task_definition_name="generate_waveform")
async def generate_waveform(task: Task) -> TaskResult:
audio_key = task.input_data.get("audio_key")
transcript_id = task.input_data.get("transcript_id")
# Use AudioWaveformProcessor to generate peaks
# This processor uses librosa/scipy internally
result.output_data = {"waveform": waveform_peaks}
return result
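The worker should delegate to AudioWaveformProcessor as noted above; purely as an illustration of the peak computation, and assuming the mixed audio has been downloaded to a local temp file, the shape of the calculation is (num_peaks is an assumed parameter, not the processor's API):
import numpy as np
import librosa

def compute_waveform_peaks(local_audio_path: str, num_peaks: int = 1000) -> list[float]:
    """Downsample the audio into num_peaks absolute-amplitude peaks."""
    samples, _sr = librosa.load(local_audio_path, sr=None, mono=True)
    if samples.size == 0:
        return []
    bin_size = max(1, samples.size // num_peaks)
    trimmed = samples[: bin_size * (samples.size // bin_size)]
    bins = trimmed.reshape(-1, bin_size)
    return np.abs(bins).max(axis=1).astype(float).tolist()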
Input Contract:
{"audio_key": "string", "transcript_id": "string"}
Output Contract:
{"waveform": ["number"]}
Acceptance Criteria:
- Generates waveform peaks array
- Broadcasts WAVEFORM event to WebSocket
- Stores waveform JSON locally
- Timeout: 120s, Response timeout: 60s, Retries: 3
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 670-678)
server/reflector/processors/audio_waveform_processor.py
docs/conductor-pipeline-mock/src/workers.py (lines 79-92)
TASK-007: Implement transcribe_track Worker
Description: Create a Conductor worker that transcribes a single audio track using GPU (Modal.com) or local Whisper.
Files to Create:
server/reflector/conductor/workers/transcribe_track.py
Implementation Details:
@worker_task(task_definition_name="transcribe_track")
async def transcribe_track(task: Task) -> TaskResult:
track_index = task.input_data.get("track_index")
audio_url = task.input_data.get("audio_url")
language = task.input_data.get("language", "en")
transcript = await transcribe_file_with_processor(audio_url, language)
# Tag all words with speaker index
for word in transcript.words:
word.speaker = track_index
result.output_data = {
"words": [w.model_dump() for w in transcript.words],
"track_index": track_index,
}
return result
Input Contract:
{
"track_index": "number",
"audio_url": "string",
"language": "string"
}
Output Contract:
{
"words": [{"word": "string", "start": "number", "end": "number", "speaker": "number"}],
"track_index": "number"
}
Acceptance Criteria:
- Calls Modal.com GPU transcription service
- Tags words with correct speaker index
- Handles empty transcription results
- Timeout: 1800s, Response timeout: 900s, Retries: 3
Dependencies: TASK-001, CACHE-001
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 747-748)
server/reflector/pipelines/transcription_helpers.py
server/reflector/processors/file_transcript_auto.py
docs/conductor-pipeline-mock/src/workers.py (lines 95-109)
Technical Notes:
- This is the most expensive operation (GPU time)
- Should implement caching to avoid re-transcription on retries (see CACHE-002)
- Environment variable: TRANSCRIPT_MODAL_API_KEY
TASK-008: Implement merge_transcripts Worker
Description: Create a Conductor worker that merges multiple track transcriptions into a single timeline sorted by timestamp.
Files to Create:
server/reflector/conductor/workers/merge_transcripts.py
Implementation Details:
@worker_task(task_definition_name="merge_transcripts")
async def merge_transcripts(task: Task) -> TaskResult:
transcripts = task.input_data.get("transcripts", [])
transcript_id = task.input_data.get("transcript_id")
all_words = []
for t in transcripts:
if isinstance(t, dict) and "words" in t:
all_words.extend(t["words"])
# Sort by start timestamp
all_words.sort(key=lambda w: w.get("start", 0))
# Broadcast TRANSCRIPT event
await broadcast_transcript_event(transcript_id, all_words)
result.output_data = {
"all_words": all_words,
"word_count": len(all_words),
}
return result
Input Contract:
{
"transcripts": [{"words": [...]}],
"transcript_id": "string"
}
Output Contract:
{"all_words": [...], "word_count": "number"}
Acceptance Criteria:
- Merges words from all tracks
- Sorts by start timestamp
- Preserves speaker attribution
- Broadcasts TRANSCRIPT event
- Updates transcript.events in DB
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 727-736)
docs/conductor-pipeline-mock/src/workers.py (lines 112-131)
TASK-009: Implement detect_topics Worker
Description: Create a Conductor worker that detects topics using LLM calls.
Files to Create:
server/reflector/conductor/workers/detect_topics.py
Implementation Details:
@worker_task(task_definition_name="detect_topics")
async def detect_topics(task: Task) -> TaskResult:
words = task.input_data.get("words", [])
transcript_id = task.input_data.get("transcript_id")
target_language = task.input_data.get("target_language", "en")
# Uses TranscriptTopicDetectorProcessor
# Chunks words into groups of 300, calls LLM per chunk
topics = await topic_processing.detect_topics(
TranscriptType(words=words),
target_language,
on_topic_callback=lambda t: broadcast_topic_event(transcript_id, t),
empty_pipeline=EmptyPipeline(logger),
)
result.output_data = {
"topics": [t.model_dump() for t in topics]
}
return result
Input Contract:
{
"words": [...],
"transcript_id": "string",
"target_language": "string"
}
Output Contract:
{"topics": [{"id": "string", "title": "string", "summary": "string", "timestamp": "number", "duration": "number"}]}
Acceptance Criteria:
- Chunks words in groups of 300
- Calls LLM for each chunk
- Broadcasts TOPIC event for each detected topic
- Returns complete topics list
- Timeout: 300s, Response timeout: 120s, Retries: 3
Dependencies: TASK-001, CACHE-001
Reference Files:
server/reflector/pipelines/topic_processing.py (lines 34-63)
server/reflector/processors/transcript_topic_detector.py
docs/conductor-pipeline-mock/src/workers.py (lines 134-147)
Technical Notes:
- Number of LLM calls: ceil(word_count / 300)
- Uses TranscriptTopicDetectorProcessor
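For illustration, the 300-word chunking behind that call count amounts to the following (hypothetical helper name; the real splitting lives inside the processor):
def chunk_words(words: list[dict], chunk_size: int = 300) -> list[list[dict]]:
    """Split the merged word list into fixed-size chunks; one LLM call per chunk."""
    return [words[i : i + chunk_size] for i in range(0, len(words), chunk_size)]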
TASK-010: Implement generate_title Worker
Description: Create a Conductor worker that generates a meeting title from detected topics using LLM.
Files to Create:
server/reflector/conductor/workers/generate_title.py
Implementation Details:
@worker_task(task_definition_name="generate_title")
async def generate_title(task: Task) -> TaskResult:
topics = task.input_data.get("topics", [])
transcript_id = task.input_data.get("transcript_id")
if not topics:
result.output_data = {"title": "Untitled Meeting"}
return result
# Uses TranscriptFinalTitleProcessor
title = await topic_processing.generate_title(
topics,
on_title_callback=lambda t: broadcast_title_event(transcript_id, t),
empty_pipeline=EmptyPipeline(logger),
logger=logger,
)
result.output_data = {"title": title}
return result
Input Contract:
{"topics": [...], "transcript_id": "string"}
Output Contract:
{"title": "string"}
Acceptance Criteria:
- Generates title from topic summaries
- Broadcasts FINAL_TITLE event
- Updates transcript.title in DB
- Handles empty topics list
- Timeout: 60s, Response timeout: 30s, Retries: 3
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/topic_processing.py (lines 66-84)
server/reflector/pipelines/main_multitrack_pipeline.py (lines 760-766)
docs/conductor-pipeline-mock/src/workers.py (lines 150-163)
TASK-011: Implement generate_summary Worker
Description: Create a Conductor worker that generates long and short summaries from topics and words using LLM.
Files to Create:
server/reflector/conductor/workers/generate_summary.py
Implementation Details:
@worker_task(task_definition_name="generate_summary")
async def generate_summary(task: Task) -> TaskResult:
words = task.input_data.get("words", [])
topics = task.input_data.get("topics", [])
transcript_id = task.input_data.get("transcript_id")
transcript = await transcripts_controller.get_by_id(transcript_id)
# Uses TranscriptFinalSummaryProcessor
await topic_processing.generate_summaries(
topics, transcript,
on_long_summary_callback=lambda s: broadcast_long_summary_event(transcript_id, s),
on_short_summary_callback=lambda s: broadcast_short_summary_event(transcript_id, s),
empty_pipeline=EmptyPipeline(logger),
logger=logger,
)
result.output_data = {
"summary": long_summary,
"short_summary": short_summary,
}
return result
Input Contract:
{
"words": [...],
"topics": [...],
"transcript_id": "string"
}
Output Contract:
{"summary": "string", "short_summary": "string"}
Acceptance Criteria:
- Generates long summary
- Generates short summary
- Broadcasts FINAL_LONG_SUMMARY event
- Broadcasts FINAL_SHORT_SUMMARY event
- Updates transcript.long_summary and transcript.short_summary in DB
- Timeout: 300s, Response timeout: 120s, Retries: 3
Dependencies: TASK-001, CACHE-001
Reference Files:
server/reflector/pipelines/topic_processing.py (lines 86-109)
server/reflector/pipelines/main_multitrack_pipeline.py (lines 768-777)
docs/conductor-pipeline-mock/src/workers.py (lines 166-180)
Technical Notes:
- LLM calls: 2 + 2*M where M = number of subjects (max 6)
TASK-012: Implement finalize Worker
Description: Create a Conductor worker that finalizes the transcript status and updates the database.
Files to Create:
server/reflector/conductor/workers/finalize.py
Implementation Details:
@worker_task(task_definition_name="finalize")
async def finalize(task: Task) -> TaskResult:
transcript_id = task.input_data.get("transcript_id")
title = task.input_data.get("title")
summary = task.input_data.get("summary")
short_summary = task.input_data.get("short_summary")
duration = task.input_data.get("duration")
transcript = await transcripts_controller.get_by_id(transcript_id)
await transcripts_controller.update(transcript, {
"status": "ended",
"title": title,
"long_summary": summary,
"short_summary": short_summary,
"duration": duration,
})
# Broadcast STATUS event
await broadcast_status_event(transcript_id, "ended")
result.output_data = {"status": "COMPLETED"}
return result
Input Contract:
{
"transcript_id": "string",
"title": "string",
"summary": "string",
"short_summary": "string",
"duration": "number"
}
Output Contract:
{"status": "string"}
Acceptance Criteria:
- Updates transcript status to "ended"
- Persists title, summaries, duration
- Broadcasts STATUS event with "ended"
- Idempotent (can be retried safely)
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_multitrack_pipeline.py (lines 745, 787-791)
docs/conductor-pipeline-mock/src/workers.py (lines 183-196)
TASK-013: Implement cleanup_consent Worker
Description: Create a Conductor worker that checks participant consent and deletes audio if denied.
Files to Create:
server/reflector/conductor/workers/cleanup_consent.py
Implementation Details:
@worker_task(task_definition_name="cleanup_consent")
async def cleanup_consent(task: Task) -> TaskResult:
transcript_id = task.input_data.get("transcript_id")
# Check if any participant denied consent
# Delete audio from S3 if so
# Implementation mirrors task_cleanup_consent from main_live_pipeline
result.output_data = {
"audio_deleted": deleted,
"reason": reason,
}
return result
Input Contract:
{"transcript_id": "string"}
Output Contract:
{"audio_deleted": "boolean", "reason": "string|null"}
Acceptance Criteria:
- Checks all participant consent statuses
- Deletes audio from S3 if any denied
- Updates transcript.audio_deleted flag
- Idempotent deletes
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_live_pipeline.py - task_cleanup_consent
server/reflector/pipelines/main_multitrack_pipeline.py (line 794)
TASK-014: Implement post_zulip Worker
Description: Create a Conductor worker that posts or updates a Zulip message with the transcript summary.
Files to Create:
server/reflector/conductor/workers/post_zulip.py
Implementation Details:
@worker_task(task_definition_name="post_zulip")
async def post_zulip(task: Task) -> TaskResult:
transcript_id = task.input_data.get("transcript_id")
# Uses existing Zulip integration
# Post new message or update existing using message_id
result.output_data = {"message_id": message_id}
return result
Input Contract:
{"transcript_id": "string"}
Output Contract:
{"message_id": "string|null"}
Acceptance Criteria:
- Posts to configured Zulip channel
- Updates existing message if message_id exists
- Handles Zulip API errors gracefully
- Timeout: 60s, Response timeout: 30s, Retries: 5
Dependencies: TASK-001
Reference Files:
server/reflector/pipelines/main_live_pipeline.py - task_pipeline_post_to_zulip
server/reflector/pipelines/main_multitrack_pipeline.py (line 795)
server/reflector/zulip.py
TASK-015: Implement send_webhook Worker
Description: Create a Conductor worker that sends the transcript completion webhook to the configured URL.
Files to Create:
server/reflector/conductor/workers/send_webhook.py
Implementation Details:
@worker_task(task_definition_name="send_webhook")
async def send_webhook(task: Task) -> TaskResult:
transcript_id = task.input_data.get("transcript_id")
room_id = task.input_data.get("room_id")
# Uses existing webhook logic from webhook.py
# Includes HMAC signature if secret configured
result.output_data = {
"sent": success,
"status_code": status_code,
}
return result
Input Contract:
{"transcript_id": "string", "room_id": "string"}
Output Contract:
{"sent": "boolean", "status_code": "number|null"}
Acceptance Criteria:
- Sends webhook with correct payload schema
- Includes HMAC signature
- Retries on 5xx, not on 4xx
- Timeout: 60s, Response timeout: 30s, Retries: 30
Dependencies: TASK-001
Reference Files:
server/reflector/worker/webhook.py
server/reflector/pipelines/main_file_pipeline.py - task_send_webhook_if_needed
server/reflector/pipelines/main_multitrack_pipeline.py (line 796)
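The worker should reuse the existing logic in server/reflector/worker/webhook.py. Purely to illustrate the HMAC acceptance criterion, a signed POST could look like this (header name and payload shape are hypothetical; httpx assumed available):
import hashlib
import hmac
import json

import httpx

async def post_signed_webhook(url: str, payload: dict, secret: str | None) -> int:
    """POST the payload, attaching an HMAC-SHA256 signature when a secret is set."""
    body = json.dumps(payload).encode()
    headers = {"Content-Type": "application/json"}
    if secret:
        digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
        headers["X-Reflector-Signature"] = digest  # hypothetical header name
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(url, content=body, headers=headers)
    return resp.status_code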
TASK-016: Implement generate_dynamic_fork_tasks Helper
Description: Create a helper worker that generates dynamic task definitions for FORK_JOIN_DYNAMIC. This is required because Conductor's FORK_JOIN_DYNAMIC needs pre-computed task lists and input maps.
Files to Create:
server/reflector/conductor/workers/generate_dynamic_fork_tasks.py
Implementation Details:
@worker_task(task_definition_name="generate_dynamic_fork_tasks")
def generate_dynamic_fork_tasks(task: Task) -> TaskResult:
tracks = task.input_data.get("tracks", [])
task_type = task.input_data.get("task_type") # "pad_track" or "transcribe_track"
transcript_id = task.input_data.get("transcript_id")
tasks = []
inputs = {}
for idx, track in enumerate(tracks):
ref_name = f"{task_type}_{idx}"
tasks.append({
"name": task_type,
"taskReferenceName": ref_name,
"type": "SIMPLE"
})
inputs[ref_name] = {
"track_index": idx,
"transcript_id": transcript_id,
# Additional task-specific inputs based on task_type
}
result.output_data = {"tasks": tasks, "inputs": inputs}
return result
Input Contract:
{
"tracks": [{"s3_key": "string"}],
"task_type": "pad_track" | "transcribe_track",
"transcript_id": "string",
"bucket_name": "string"
}
Output Contract:
{
"tasks": [{"name": "string", "taskReferenceName": "string", "type": "SIMPLE"}],
"inputs": {"ref_name": {...input_data...}}
}
Acceptance Criteria:
- Generates correct task list for variable track counts (1, 2, ... N)
- Generates correct input map with task-specific parameters
- Supports both pad_track and transcribe_track task types
- Timeout: 30s, Response timeout: 15s, Retries: 3
Dependencies: TASK-001
Technical Notes:
- This helper is required because FORK_JOIN_DYNAMIC expects dynamicTasks and dynamicTasksInput parameters
- The workflow uses this helper twice: once for padding, once for transcription
- Each invocation has different task_type and additional inputs
Phase 2 (Continued): State Management
STATE-001: Add workflow_id to Recording Model
Description:
Add a workflow_id field to the Recording model to track the Conductor workflow associated with each recording.
Files to Modify:
server/reflector/db/recordings.py - Create migration file
Implementation Details:
# In Recording model
workflow_id: Optional[str] = Column(String, nullable=True, index=True)
Acceptance Criteria:
- Migration adds nullable workflow_id column
- Index created for workflow_id lookups
- Recording can be queried by workflow_id
Dependencies: INFRA-002
Reference Files:
CONDUCTOR_MIGRATION_REQUIREMENTS.md (Module 7: State Management)
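A hedged sketch of the migration, assuming the project uses Alembic; the table name and revision identifiers are placeholders to be replaced with the project's actual values:
"""add workflow_id to recording"""
import sqlalchemy as sa
from alembic import op

revision = "add_workflow_id"    # placeholder
down_revision = "previous_rev"  # placeholder

def upgrade() -> None:
    op.add_column("recording", sa.Column("workflow_id", sa.String(), nullable=True))
    op.create_index("ix_recording_workflow_id", "recording", ["workflow_id"])

def downgrade() -> None:
    op.drop_index("ix_recording_workflow_id", table_name="recording")
    op.drop_column("recording", "workflow_id")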
Phase 3: Workflow Definition
WFLOW-001: Create Workflow Definition JSON with FORK_JOIN_DYNAMIC
Description: Define the complete workflow DAG in Conductor's workflow definition format, including dynamic forking for variable track counts.
Files to Create:
server/reflector/conductor/workflows/diarization_pipeline.json
Implementation Details:
The workflow must include:
- Sequential: get_recording -> get_participants
- FORK_JOIN_DYNAMIC: pad_track for each track
- Sequential: mixdown_tracks -> generate_waveform
- FORK_JOIN_DYNAMIC: transcribe_track for each track (parallel!)
- Sequential: merge_transcripts -> detect_topics
- FORK_JOIN: generate_title || generate_summary
- Sequential: finalize -> cleanup_consent -> post_zulip -> send_webhook
FORK_JOIN_DYNAMIC Pattern:
{
"name": "fork_track_padding",
"taskReferenceName": "fork_track_padding",
"type": "FORK_JOIN_DYNAMIC",
"inputParameters": {
"dynamicTasks": "${generate_padding_tasks.output.tasks}",
"dynamicTasksInput": "${generate_padding_tasks.output.inputs}"
},
"dynamicForkTasksParam": "dynamicTasks",
"dynamicForkTasksInputParamName": "dynamicTasksInput"
}
This requires a helper task that generates the dynamic fork structure based on track count.
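For reference, a sketch of how the helper, the dynamic fork, and its JOIN chain together in the task list, written here in Python-dict form mirroring the JSON pattern above (a JOIN must directly follow each FORK_JOIN_DYNAMIC and, with an empty joinOn, collects all dynamically forked results; reference names are illustrative):
padding_section = [
    {
        "name": "generate_dynamic_fork_tasks",
        "taskReferenceName": "generate_padding_tasks",
        "type": "SIMPLE",
        "inputParameters": {
            "tracks": "${workflow.input.tracks}",
            "task_type": "pad_track",
            "transcript_id": "${workflow.input.transcript_id}",
            "bucket_name": "${workflow.input.bucket_name}",
        },
    },
    {
        "name": "fork_track_padding",
        "taskReferenceName": "fork_track_padding",
        "type": "FORK_JOIN_DYNAMIC",
        "dynamicForkTasksParam": "dynamicTasks",
        "dynamicForkTasksInputParamName": "dynamicTasksInput",
        "inputParameters": {
            "dynamicTasks": "${generate_padding_tasks.output.tasks}",
            "dynamicTasksInput": "${generate_padding_tasks.output.inputs}",
        },
    },
    {
        "name": "join_track_padding",
        "taskReferenceName": "join_track_padding",
        "type": "JOIN",
        "joinOn": [],
    },
]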
Acceptance Criteria:
- Valid Conductor workflow schema
- All task references match registered task definitions
- Input/output parameter mappings correct
- FORK_JOIN_DYNAMIC works with 1, 2, ... N tracks
- JOIN correctly collects all parallel results
- DAG renders correctly in Conductor UI
Dependencies: TASK-002 through TASK-015
Reference Files:
docs/conductor-pipeline-mock/src/register_workflow.py (lines 125-304)
CONDUCTOR_MIGRATION_REQUIREMENTS.md (Module 3 section, Target Architecture diagram)
WFLOW-002: Implement Workflow Registration Script
Description: Create a script that registers the workflow definition with the Conductor server.
Files to Create:
server/reflector/conductor/workflows/register.py
Implementation Details:
import json
import requests
from reflector.settings import settings
def register_workflow():
with open("diarization_pipeline.json") as f:
workflow = json.load(f)
resp = requests.put(
f"{settings.CONDUCTOR_SERVER_URL}/metadata/workflow",
json=[workflow],
headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
Acceptance Criteria:
- Workflow visible in Conductor UI
- Can start workflow via API
- DAG renders correctly in UI
Dependencies: WFLOW-001
Reference Files:
docs/conductor-pipeline-mock/src/register_workflow.py (lines 317-327)
Phase 2 (Continued): WebSocket Events
EVENT-001: Add PIPELINE_PROGRESS WebSocket Event
Description: Define a new WebSocket event type for granular pipeline progress tracking.
⚠️ Note: Requires separate frontend ticket to add UI consumer for this event.
Files to Modify:
server/reflector/db/transcripts.py (add event type)
server/reflector/ws_manager.py (ensure broadcast support)
Implementation Details:
# New event schema
class PipelineProgressEvent(BaseModel):
event: str = "PIPELINE_PROGRESS"
data: PipelineProgressData
class PipelineProgressData(BaseModel):
workflow_id: str
current_step: str
step_index: int
total_steps: int
step_status: Literal["pending", "in_progress", "completed", "failed"]
Acceptance Criteria:
- Event schema defined
- Works with existing WebSocket infrastructure
- Frontend ticket created for progress UI consumer
Dependencies: None
Reference Files:
CONDUCTOR_MIGRATION_REQUIREMENTS.md (Module 6 section)
server/reflector/pipelines/main_live_pipeline.py (broadcast_to_sockets decorator)
EVENT-002: Emit Progress Events from Workers
Description: Modify workers to emit PIPELINE_PROGRESS events at start and completion of each task.
⚠️ Note: Requires separate frontend ticket to add UI consumer (see EVENT-001).
Files to Modify:
- All worker files in server/reflector/conductor/workers/
Implementation Details:
async def emit_progress(transcript_id: str, step: str, status: str, index: int, total: int):
ws_manager = get_ws_manager()
await ws_manager.send_json(
room_id=f"ts:{transcript_id}",
message={
"event": "PIPELINE_PROGRESS",
"data": {
"current_step": step,
"step_index": index,
"total_steps": total,
"step_status": status,
}
}
)
@worker_task(task_definition_name="transcribe_track")
async def transcribe_track(task: Task) -> TaskResult:
await emit_progress(transcript_id, "transcribe_track", "in_progress", 6, 14)
# ... processing ...
await emit_progress(transcript_id, "transcribe_track", "completed", 6, 14)
Acceptance Criteria:
- Progress emitted at task start
- Progress emitted at task completion
Dependencies: EVENT-001, TASK-002 through TASK-015
Phase 4: Integration
INTEG-001: Modify Pipeline Trigger to Start Conductor Workflow
Description:
Replace task_pipeline_multitrack_process.delay() with Conductor workflow start in process_multitrack_recording.
This single change captures BOTH webhook AND polling entry paths, since both converge at this function.
Files to Modify:
server/reflector/worker/process.py
Implementation Details:
# In _process_multitrack_recording_inner(), around line 289
# Replace:
# task_pipeline_multitrack_process.delay(
# transcript_id=transcript.id,
# bucket_name=bucket_name,
# track_keys=filter_cam_audio_tracks(track_keys),
# )
# With:
if settings.CONDUCTOR_ENABLED:
from reflector.conductor.client import ConductorClientManager
from reflector.db.recordings import recordings_controller
workflow_id = ConductorClientManager.start_workflow(
name="diarization_pipeline",
version=1,
input_data={
"recording_id": recording_id,
"room_name": daily_room_name,
"tracks": [{"s3_key": k} for k in filter_cam_audio_tracks(track_keys)],
"bucket_name": bucket_name,
"transcript_id": transcript.id,
"room_id": room.id,
}
)
logger.info("Started Conductor workflow", workflow_id=workflow_id, transcript_id=transcript.id)
# Store workflow_id on recording for status tracking
await recordings_controller.update(recording, {"workflow_id": workflow_id})
if not settings.CONDUCTOR_SHADOW_MODE:
return # Don't trigger Celery
# Existing Celery trigger (runs in shadow mode or when Conductor disabled)
task_pipeline_multitrack_process.delay(
transcript_id=transcript.id,
bucket_name=bucket_name,
track_keys=filter_cam_audio_tracks(track_keys),
)
Acceptance Criteria:
- Conductor workflow started from process_multitrack_recording
- Workflow ID stored on Recording model
- Both webhook and polling paths covered (single integration point)
- Celery still triggered in shadow mode
Dependencies: WFLOW-002, STATE-001
Reference Files:
server/reflector/worker/process.py (lines 172-293)
CONDUCTOR_MIGRATION_REQUIREMENTS.md (Module 4 section)
SHADOW-001: Implement Shadow Mode Toggle
Description: Add configuration and logic to run both Celery and Conductor pipelines simultaneously for comparison.
Files to Modify:
server/reflector/settings.py (already has CONDUCTOR_SHADOW_MODE from INFRA-003)
server/reflector/worker/process.py (INTEG-001 already implements shadow mode logic)
Implementation Details:
# settings.py (already done in INFRA-003)
CONDUCTOR_SHADOW_MODE: bool = False
# worker/process.py (in _process_multitrack_recording_inner)
if settings.CONDUCTOR_ENABLED:
workflow_id = ConductorClientManager.start_workflow(...)
await recordings_controller.update(recording, {"workflow_id": workflow_id})
if not settings.CONDUCTOR_SHADOW_MODE:
return # Conductor only - skip Celery
# If shadow mode, fall through to Celery trigger below
# Celery trigger (runs when Conductor disabled OR in shadow mode)
task_pipeline_multitrack_process.delay(...)
Acceptance Criteria:
- Both pipelines triggered when CONDUCTOR_SHADOW_MODE=True
- Only Conductor triggered when CONDUCTOR_ENABLED=True and SHADOW_MODE=False
- Only Celery triggered when CONDUCTOR_ENABLED=False
- workflow_id stored on Recording model for comparison
Dependencies: INTEG-001
Note: INTEG-001 already implements the shadow mode toggle logic. This task verifies the implementation and adds any missing comparison/monitoring infrastructure.
Reference Files:
CONDUCTOR_MIGRATION_REQUIREMENTS.md (Phase 3: Shadow Mode)
SHADOW-002: Add Result Comparison - Content Fields
Description: Compare content fields (title, summaries, topics, word counts) between Celery and Conductor outputs.
Files to Create:
server/reflector/conductor/shadow_compare.py
Implementation Details:
async def compare_content_results(recording_id: str, workflow_id: str) -> dict:
"""Compare content results from Celery and Conductor pipelines."""
celery_transcript = await transcripts_controller.get_by_recording_id(recording_id)
workflow_status = ConductorClientManager.get_workflow_status(workflow_id)
differences = []
# Compare title
if celery_transcript.title != workflow_status.output.get("title"):
differences.append({"field": "title", ...})
# Compare summaries, topics, word_count
...
return {"match": len(differences) == 0, "differences": differences}
Acceptance Criteria:
- Compares title, long_summary, short_summary
- Compares topic count and content
- Compares word_count
- Logs differences for debugging
Dependencies: SHADOW-001
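A small field-diff helper (hypothetical name) keeps the comparison logic uniform across title, summaries, and counts; compare_content_results() can then build its report from diff_fields(...) over ["title", "long_summary", "short_summary", "word_count"]:
def diff_fields(celery_values: dict, conductor_values: dict, fields: list[str]) -> list[dict]:
    """Return one entry per field whose Celery and Conductor values differ."""
    differences = []
    for field in fields:
        a = celery_values.get(field)
        b = conductor_values.get(field)
        if a != b:
            differences.append({"field": field, "celery": a, "conductor": b})
    return differences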
Phase 5: Cutover
CUTOVER-001: Create Feature Flag for Conductor-Only Mode
Description: Enable Conductor-only mode by setting environment variables. No code changes required.
Files to Modify:
.env or environment configuration
Implementation Details:
# .env (production)
CONDUCTOR_ENABLED=true # Enable Conductor
CONDUCTOR_SHADOW_MODE=false # Disable shadow mode (Conductor only)
The logic is already implemented in INTEG-001:
# worker/process.py (_process_multitrack_recording_inner)
if settings.CONDUCTOR_ENABLED:
workflow_id = ConductorClientManager.start_workflow(...)
if not settings.CONDUCTOR_SHADOW_MODE:
return # Conductor only - Celery not triggered
# Celery only reached if Conductor disabled or shadow mode enabled
task_pipeline_multitrack_process.delay(...)
Acceptance Criteria:
- Set CONDUCTOR_ENABLED=true in production environment
- Set CONDUCTOR_SHADOW_MODE=false
- Verify Celery not triggered (check logs for "Started Conductor workflow")
- Can toggle back via environment variables without code changes
Dependencies: SHADOW-001
Note: This is primarily a configuration change. The code logic is already in place from INTEG-001.
CUTOVER-002: Add Fallback to Celery on Conductor Failure
Description: Implement automatic fallback to Celery pipeline if Conductor fails to start or process a workflow.
Files to Modify:
server/reflector/worker/process.py
server/reflector/conductor/client.py
Implementation Details:
# In _process_multitrack_recording_inner()
if settings.CONDUCTOR_ENABLED:
try:
workflow_id = ConductorClientManager.start_workflow(
name="diarization_pipeline",
version=1,
input_data={...}
)
logger.info("Conductor workflow started", workflow_id=workflow_id, transcript_id=transcript.id)
await recordings_controller.update(recording, {"workflow_id": workflow_id})
if not settings.CONDUCTOR_SHADOW_MODE:
return # Success - don't trigger Celery
except Exception as e:
logger.error(
"Conductor workflow start failed, falling back to Celery",
error=str(e),
transcript_id=transcript.id,
exc_info=True,
)
# Fall through to Celery trigger below
# Celery fallback (runs on Conductor failure, or when disabled, or in shadow mode)
task_pipeline_multitrack_process.delay(
transcript_id=transcript.id,
bucket_name=bucket_name,
track_keys=filter_cam_audio_tracks(track_keys),
)
Acceptance Criteria:
- Celery triggered on Conductor connection failure
- Celery triggered on workflow start failure
- Errors logged with full context for debugging
- workflow_id still stored if partially successful
Dependencies: CUTOVER-001
Phase 6: Cleanup
CLEANUP-001: Remove Deprecated Celery Task Code
Description: After successful migration, remove the old Celery-based pipeline code.
Files to Modify:
server/reflector/pipelines/main_multitrack_pipeline.py - Remove entire file
server/reflector/worker/process.py - Remove task_pipeline_multitrack_process.delay() call
server/reflector/pipelines/main_live_pipeline.py - Remove shared utilities if unused
Implementation Details:
# worker/process.py - Remove Celery fallback entirely
if settings.CONDUCTOR_ENABLED:
workflow_id = ConductorClientManager.start_workflow(...)
await recordings_controller.update(recording, {"workflow_id": workflow_id})
return # No Celery fallback
# Delete this:
# task_pipeline_multitrack_process.delay(...)
Acceptance Criteria:
- main_multitrack_pipeline.py deleted
- Celery trigger removed from worker/process.py
- Old task imports removed
- No new recordings processed via Celery
- Code removed after stability period (1-2 weeks)
Dependencies: CUTOVER-001
CLEANUP-002: Update Documentation
Description: Update all documentation to reflect the new Conductor-based architecture.
Files to Modify:
CLAUDE.md
README.md
docs/ (if applicable)
Files to Archive:
CONDUCTOR_MIGRATION_REQUIREMENTS.md (move to docs/archive/)
Acceptance Criteria:
- Architecture diagrams updated
- API documentation reflects new endpoints
- Runbooks updated for Conductor operations
Dependencies: CLEANUP-001
Testing Tasks
⚠️ Note: All test tasks should be deferred to human tester if automated testing proves too complex or time-consuming.
TEST-001a: Integration Tests - API Workers
Description: Write integration tests for get_recording and get_participants workers.
Files to Create:
server/tests/conductor/test_workers_api.py
Implementation Details:
@pytest.mark.asyncio
async def test_get_recording_worker():
with patch("reflector.conductor.workers.get_recording.create_platform_client") as mock:
mock.return_value.__aenter__.return_value.get_recording.return_value = MockRecording()
task = Task(input_data={"recording_id": "rec_123"})
result = await get_recording(task)
assert result.status == TaskResultStatus.COMPLETED
assert result.output_data["id"] == "rec_123"
Acceptance Criteria:
- get_recording worker tested with mock Daily.co API
- get_participants worker tested with mock response
- Error handling tested (API failures)
Dependencies: TASK-002, TASK-003
TEST-001b: Integration Tests - Audio Processing Workers
Description: Write integration tests for pad_track, mixdown_tracks, and generate_waveform workers.
Files to Create:
server/tests/conductor/test_workers_audio.py
Acceptance Criteria:
- pad_track worker tested with mock S3 and sample WebM
- mixdown_tracks worker tested with mock audio streams
- generate_waveform worker tested
- PyAV filter graph execution verified
Dependencies: TASK-004c, TASK-005b, TASK-006
TEST-001c: Integration Tests - Transcription Workers
Description: Write integration tests for transcribe_track and merge_transcripts workers.
Files to Create:
server/tests/conductor/test_workers_transcription.py
Acceptance Criteria:
- transcribe_track worker tested with mock Modal.com response
- merge_transcripts worker tested with multiple track inputs
- Word sorting by timestamp verified
Dependencies: TASK-007, TASK-008
TEST-001d: Integration Tests - LLM Workers
Description: Write integration tests for detect_topics, generate_title, and generate_summary workers.
Files to Create:
server/tests/conductor/test_workers_llm.py
Acceptance Criteria:
- detect_topics worker tested with mock LLM response
- generate_title worker tested
- generate_summary worker tested
- WebSocket event broadcasting verified
Dependencies: TASK-009, TASK-010, TASK-011
TEST-001e: Integration Tests - Finalization Workers
Description: Write integration tests for finalize, cleanup_consent, post_zulip, and send_webhook workers.
Files to Create:
server/tests/conductor/test_workers_finalization.py
Acceptance Criteria:
- finalize worker tested (DB update)
- cleanup_consent worker tested (S3 deletion)
- post_zulip worker tested with mock API
- send_webhook worker tested with HMAC verification
Dependencies: TASK-012, TASK-013, TASK-014, TASK-015
TEST-002: E2E Test for Complete Workflow
Description: Create an end-to-end test that runs the complete Conductor workflow with mock services.
Files to Create:
server/tests/conductor/test_workflow_e2e.py
Implementation Details:
@pytest.mark.asyncio
async def test_complete_diarization_workflow():
# Start Conductor in test mode
workflow_id = ConductorClientManager.start_workflow(
"diarization_pipeline", 1,
{"recording_id": "test_123", "tracks": [...]}
)
# Wait for completion
status = await wait_for_workflow(workflow_id, timeout=60)
assert status.status == "COMPLETED"
assert status.output["title"] is not None
Acceptance Criteria:
- Complete workflow runs successfully
- All tasks execute in correct order
- FORK_JOIN_DYNAMIC parallelism works
- Output matches expected schema
Dependencies: WFLOW-002
TEST-003: Shadow Mode Comparison Tests
Description: Write tests that verify Celery and Conductor produce equivalent results.
Files to Create:
server/tests/conductor/test_shadow_compare.py
Acceptance Criteria:
- Same input produces same output
- Timing differences documented
- Edge cases handled
Dependencies: SHADOW-002
Appendix: Task Timeout Reference
| Task | Timeout (s) | Response Timeout (s) | Retry Count |
|---|---|---|---|
| get_recording | 60 | 30 | 3 |
| get_participants | 60 | 30 | 3 |
| pad_track | 300 | 120 | 3 |
| mixdown_tracks | 600 | 300 | 3 |
| generate_waveform | 120 | 60 | 3 |
| transcribe_track | 1800 | 900 | 3 |
| merge_transcripts | 60 | 30 | 3 |
| detect_topics | 300 | 120 | 3 |
| generate_title | 60 | 30 | 3 |
| generate_summary | 300 | 120 | 3 |
| finalize | 60 | 30 | 3 |
| cleanup_consent | 60 | 30 | 3 |
| post_zulip | 60 | 30 | 5 |
| send_webhook | 60 | 30 | 30 |
Appendix: File Structure After Migration
server/reflector/
├── conductor/
│ ├── __init__.py
│ ├── client.py # Conductor SDK wrapper
│ ├── cache.py # Idempotency cache
│ ├── shadow_compare.py # Shadow mode comparison
│ ├── tasks/
│ │ ├── __init__.py
│ │ ├── definitions.py # Task definitions with timeouts
│ │ └── register.py # Registration script
│ ├── workers/
│ │ ├── __init__.py
│ │ ├── get_recording.py
│ │ ├── get_participants.py
│ │ ├── pad_track.py
│ │ ├── mixdown_tracks.py
│ │ ├── generate_waveform.py
│ │ ├── transcribe_track.py
│ │ ├── merge_transcripts.py
│ │ ├── detect_topics.py
│ │ ├── generate_title.py
│ │ ├── generate_summary.py
│ │ ├── finalize.py
│ │ ├── cleanup_consent.py
│ │ ├── post_zulip.py
│ │ └── send_webhook.py
│ └── workflows/
│ ├── diarization_pipeline.json
│ └── register.py
├── views/
│ └── conductor.py # Health & status endpoints
└── ...existing files...