Files
reflector/TASKS.md
2025-12-16 13:24:05 -05:00

3.2 KiB

Durable Workflow Migration Tasks

This document defines atomic, isolated work items for migrating the Daily.co multitrack diarization pipeline from Celery to durable workflow orchestration using Hatchet.


Provider Selection

# .env
DURABLE_WORKFLOW_PROVIDER=none      # Celery only (default)
DURABLE_WORKFLOW_PROVIDER=hatchet   # Use Hatchet
DURABLE_WORKFLOW_SHADOW_MODE=true   # Run both Hatchet + Celery (for comparison)

Task Index

ID Title Status
INFRA-001 Add container to docker-compose Done
INFRA-002 Create Python client wrapper Done
INFRA-003 Add environment configuration Done
TASK-001 Create workflow definition Done
TASK-002 get_recording task Done
TASK-003 get_participants task Done
TASK-004 pad_track task Done
TASK-005 mixdown_tracks task Done
TASK-006 generate_waveform task Done
TASK-007 transcribe_track task Done
TASK-008 merge_transcripts task Done (in process_tracks)
TASK-009 detect_topics task Done
TASK-010 generate_title task Done
TASK-011 generate_summary task Done
TASK-012 finalize task Done
TASK-013 cleanup_consent task Done
TASK-014 post_zulip task Done
TASK-015 send_webhook task Done
EVENT-001 Progress WebSocket events Done
INTEG-001 Pipeline trigger integration Done
SHADOW-001 Shadow mode toggle Done
TEST-001 Integration tests Pending
TEST-002 E2E workflow test Pending
CUTOVER-001 Production cutover Pending
CLEANUP-001 Remove Celery code Pending

File Structure

server/reflector/hatchet/
├── client.py                    # SDK wrapper
├── progress.py                  # WebSocket progress emission
├── run_workers.py               # Worker startup
└── workflows/
    ├── diarization_pipeline.py  # Main workflow with all tasks
    └── track_processing.py      # Child workflow (pad + transcribe)

Remaining Work

TEST-001: Integration Tests

  • Test each task with mocked external services
  • Test error handling and retries

TEST-002: E2E Workflow Test

  • Complete workflow run with real Daily.co recording
  • Verify output matches Celery pipeline
  • Performance comparison

CUTOVER-001: Production Cutover

  • Deploy with DURABLE_WORKFLOW_PROVIDER=hatchet
  • Monitor for failures
  • Compare results with shadow mode if needed

CLEANUP-001: Remove Celery Code

  • Remove main_multitrack_pipeline.py
  • Remove Celery task triggers
  • Update documentation

Known Issues

Hatchet

  • See HATCHET_LLM_OBSERVATIONS.md for debugging notes
  • SDK v1.21+ API changes (breaking)
  • JWT token Docker networking issues
  • Worker appears hung without debug mode
  • Workflow replay is version-locked (use --force to run latest code)

Quick Start

Hatchet

# Start infrastructure
docker compose up -d hatchet hatchet-worker

# Workers auto-register on startup

Trigger Workflow

# Set provider in .env
DURABLE_WORKFLOW_PROVIDER=hatchet

# Process a Daily.co recording via webhook or API
# The pipeline trigger automatically uses the configured provider