mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2025-12-22 05:09:05 +00:00
116 lines
3.2 KiB
Markdown
116 lines
3.2 KiB
Markdown
# Durable Workflow Migration Tasks
|
|
|
|
This document defines atomic, isolated work items for migrating the Daily.co multitrack diarization pipeline from Celery to durable workflow orchestration using **Hatchet**.
|
|
|
|
---
|
|
|
|
## Provider Selection
|
|
|
|
```bash
|
|
# .env
|
|
DURABLE_WORKFLOW_PROVIDER=none # Celery only (default)
|
|
DURABLE_WORKFLOW_PROVIDER=hatchet # Use Hatchet
|
|
DURABLE_WORKFLOW_SHADOW_MODE=true # Run both Hatchet + Celery (for comparison)
|
|
```
|
|
|
|
---
|
|
|
|
## Task Index
|
|
|
|
| ID | Title | Status |
|
|
|----|-------|--------|
|
|
| INFRA-001 | Add container to docker-compose | Done |
|
|
| INFRA-002 | Create Python client wrapper | Done |
|
|
| INFRA-003 | Add environment configuration | Done |
|
|
| TASK-001 | Create workflow definition | Done |
|
|
| TASK-002 | get_recording task | Done |
|
|
| TASK-003 | get_participants task | Done |
|
|
| TASK-004 | pad_track task | Done |
|
|
| TASK-005 | mixdown_tracks task | Done |
|
|
| TASK-006 | generate_waveform task | Done |
|
|
| TASK-007 | transcribe_track task | Done |
|
|
| TASK-008 | merge_transcripts task | Done (in process_tracks) |
|
|
| TASK-009 | detect_topics task | Done |
|
|
| TASK-010 | generate_title task | Done |
|
|
| TASK-011 | generate_summary task | Done |
|
|
| TASK-012 | finalize task | Done |
|
|
| TASK-013 | cleanup_consent task | Done |
|
|
| TASK-014 | post_zulip task | Done |
|
|
| TASK-015 | send_webhook task | Done |
|
|
| EVENT-001 | Progress WebSocket events | Done |
|
|
| INTEG-001 | Pipeline trigger integration | Done |
|
|
| SHADOW-001 | Shadow mode toggle | Done |
|
|
| TEST-001 | Integration tests | Pending |
|
|
| TEST-002 | E2E workflow test | Pending |
|
|
| CUTOVER-001 | Production cutover | Pending |
|
|
| CLEANUP-001 | Remove Celery code | Pending |
|
|
|
|
---
|
|
|
|
## File Structure
|
|
|
|
```
|
|
server/reflector/hatchet/
|
|
├── client.py # SDK wrapper
|
|
├── progress.py # WebSocket progress emission
|
|
├── run_workers.py # Worker startup
|
|
└── workflows/
|
|
├── diarization_pipeline.py # Main workflow with all tasks
|
|
└── track_processing.py # Child workflow (pad + transcribe)
|
|
```
|
|
|
|
---
|
|
|
|
## Remaining Work
|
|
|
|
### TEST-001: Integration Tests
|
|
- [ ] Test each task with mocked external services
|
|
- [ ] Test error handling and retries
|
|
|
|
### TEST-002: E2E Workflow Test
|
|
- [ ] Complete workflow run with real Daily.co recording
|
|
- [ ] Verify output matches Celery pipeline
|
|
- [ ] Performance comparison
|
|
|
|
### CUTOVER-001: Production Cutover
|
|
- [ ] Deploy with `DURABLE_WORKFLOW_PROVIDER=hatchet`
|
|
- [ ] Monitor for failures
|
|
- [ ] Compare results with shadow mode if needed
|
|
|
|
### CLEANUP-001: Remove Celery Code
|
|
- [ ] Remove `main_multitrack_pipeline.py`
|
|
- [ ] Remove Celery task triggers
|
|
- [ ] Update documentation
|
|
|
|
---
|
|
|
|
## Known Issues
|
|
|
|
### Hatchet
|
|
- See `HATCHET_LLM_OBSERVATIONS.md` for debugging notes
|
|
- SDK v1.21+ API changes (breaking)
|
|
- JWT token Docker networking issues
|
|
- Worker appears hung without debug mode
|
|
- Workflow replay is version-locked (use --force to run latest code)
|
|
|
|
---
|
|
|
|
## Quick Start
|
|
|
|
### Hatchet
|
|
```bash
|
|
# Start infrastructure
|
|
docker compose up -d hatchet hatchet-worker
|
|
|
|
# Workers auto-register on startup
|
|
```
|
|
|
|
### Trigger Workflow
|
|
```bash
|
|
# Set provider in .env
|
|
DURABLE_WORKFLOW_PROVIDER=hatchet
|
|
|
|
# Process a Daily.co recording via webhook or API
|
|
# The pipeline trigger automatically uses the configured provider
|
|
```
|