meeting consent vibe

This commit is contained in:
Igor Loskutov
2025-06-17 16:30:23 -04:00
parent b85338754e
commit 91c7c8b83a
19 changed files with 3929 additions and 3836 deletions

425
GUIDE.md
View File

@@ -1,425 +0,0 @@
# Codebase Review Guide: Audio Storage Consent Implementation
This guide walks through the relevant parts of the codebase for implementing the audio storage consent flow. **Important**: This implementation works with post-processing deletion, not real-time recording control, due to Whereby integration constraints.
## System Reality: Recording Detection Constraints
**Critical Understanding**:
- **No real-time recording detection** - System only discovers recordings after they complete via SQS polling (60+ second delay)
- **Cannot stop recordings in progress** - Whereby controls recording entirely based on room configuration
- **Limited webhooks** - Only `room.client.joined/left` events available, no recording events
- **Post-processing intervention only** - Can only mark recordings for deletion during SQS processing
## 1. Current Consent Implementation (TO BE REMOVED)
### File: `www/app/[roomName]/page.tsx`
**Purpose:** Room entry page with blocking consent dialog
**Key Areas:**
- **Line 24:** `const [consentGiven, setConsentGiven] = useState<boolean | null>(null);`
- **Lines 34-36:** `handleConsent` function that sets consent state
- **Lines 80-124:** Consent UI blocking room entry
- **Line 80:** `if (!isAuthenticated && !consentGiven)` - blocking condition
**Current Logic:**
```typescript
// Lines 99-111: Consent request UI
{consentGiven === null ? (
<>
<Text fontSize="lg" fontWeight="bold">
This meeting may be recorded. Do you consent to being recorded?
</Text>
<HStack spacing={4}>
<Button variant="outline" onClick={() => handleConsent(false)}>
No, I do not consent
</Button>
<Button colorScheme="blue" onClick={() => handleConsent(true)}>
Yes, I consent
</Button>
</HStack>
</>
) : (
// Lines 114-120: Rejection message
<Text>You cannot join the meeting without consenting...</Text>
)}
```
**What to Change:** Remove entire consent blocking logic, allow direct room entry.
---
## 2. Whereby Integration Reality
### File: `www/app/[roomName]/page.tsx`
**Purpose:** Main room page where video call happens via whereby-embed
**Key Whereby Integration:**
- **Line 129:** `<whereby-embed>` element - this IS the video call
- **Lines 26-28:** Room URL from meeting API
- **Lines 48-57:** Event listeners for whereby events
**What Happens:**
1. `useRoomMeeting()` calls backend to create/get Whereby meeting
2. Whereby automatically records based on room `recording_trigger` configuration
3. **NO real-time recording status** - system doesn't know when recording starts/stops
### File: `www/app/[roomName]/useRoomMeeting.tsx`
**Purpose:** Creates or retrieves Whereby meeting for room
**Key Flow:**
- **Line 48:** Calls `v1RoomsCreateMeeting({ roomName })`
- **Lines 49-52:** Returns meeting with `room_url` and `host_room_url`
- Meeting includes recording configuration from room settings
**What to Add:** Consent dialog overlay on the whereby-embed - always ask for consent regardless of meeting configuration (simplified approach).
---
## 3. Recording Discovery System (POST-PROCESSING ONLY)
### File: `server/reflector/worker/process.py`
**Purpose:** Discovers recordings after they complete via SQS polling
**Key Areas:**
- **Lines 24-62:** `process_messages()` - polls SQS every 60 seconds
- **Lines 66-133:** `process_recording()` - processes discovered recording files
- **Lines 69-71:** Extracts meeting info from S3 object key format
**Current Discovery Flow:**
```python
# Lines 69-71: Parse S3 object key
room_name = f"/{object_key[:36]}" # First 36 chars = room GUID
recorded_at = datetime.fromisoformat(object_key[37:57]) # Timestamp
# Lines 73-74: Link to meeting
meeting = await meetings_controller.get_by_room_name(room_name)
room = await rooms_controller.get_by_id(meeting.room_id)
```
**What to Add:** Consent checking after transcript processing - always create transcript first, then delete only audio files if consent denied.
### File: `server/reflector/worker/app.py`
**Purpose:** Celery task scheduling
**Key Schedule:**
- **Lines 26-29:** `process_messages` runs every 60 seconds
- **Lines 30-33:** `process_meetings` runs every 60 seconds to check meeting status
**Reality:** consent must be requested during the meeting, not based on recording detection.
---
## 4. Meeting-Based Consent Timing
### File: `server/reflector/views/whereby.py`
**Purpose:** Whereby webhook handler - receives participant join/leave events
**Key Areas:**
- **Lines 69-72:** Handles `room.client.joined` and `room.client.left` events
- **Line 71:** Updates `num_clients` count in meeting record
**Current Logic:**
```python
# Lines 69-72: Participant tracking
if event.type in ["room.client.joined", "room.client.left"]:
await meetings_controller.update_meeting(
meeting.id, num_clients=event.data["numClients"]
)
```
**What to Add:** ALWAYS ask for consent - no triggers, no conditions. Simple list field to track who denied consent.
### File: `server/reflector/db/meetings.py`
**Purpose:** Meeting database model and recording configuration
**Key Recording Config:**
- **Lines 56-59:** Recording trigger options:
- `"automatic"` - Recording starts immediately
- `"automatic-2nd-participant"` (default) - Recording starts when 2nd person joins
- `"prompt"` - Manual recording start
- `"none"` - No recording
**Current Meeting Model:**
```python
# Lines 56-59: Recording configuration
recording_type: Literal["none", "local", "cloud"] = "cloud"
recording_trigger: Literal[
"none", "prompt", "automatic", "automatic-2nd-participant"
] = "automatic-2nd-participant"
```
**What to Add:** Dictionary field `participant_consent_responses: dict[str, bool]` in Meeting model to store {user_id: true/false}. ALWAYS ask for consent - no complex logic.
---
## 5. Consent Implementation (NO WebSockets Needed)
**Consent is meeting-level, not transcript-level** - WebSocket events are for transcript processing, not consent.
### Simple Consent Flow:
1. **Frontend**: Show consent dialog when meeting loads
2. **User Response**: Direct API call to `/meetings/{meeting_id}/consent`
3. **Backend**: Store response in meeting record
4. **SQS Processing**: Check consent during recording processing
**No WebSocket events needed** - consent is a simple API interaction, not real-time transcript data.
---
## 4. Backend WebSocket System
### File: `server/reflector/views/transcripts_websocket.py`
**Purpose:** Server-side WebSocket endpoint for real-time events
**Key Areas:**
- **Lines 19-55:** `transcript_events_websocket` function
- **Line 32:** Room ID format: `room_id = f"ts:{transcript_id}"`
- **Lines 37-44:** Initial event sending to new connections
- **Lines 42-43:** Filtering events: `if name in ("TRANSCRIPT", "STATUS"): continue`
**Current Flow:**
1. WebSocket connects to `/transcripts/{transcript_id}/events`
2. Server adds user to Redis room `ts:{transcript_id}`
3. Server sends historical events (except TRANSCRIPT/STATUS)
4. Server waits for new events via Redis pub/sub
**What to Add:** Handle new consent events in the message flow.
### File: `server/reflector/ws_manager.py`
**Purpose:** Redis pub/sub WebSocket management
**Key Areas:**
- **Lines 61-99:** `WebsocketManager` class
- **Lines 78-79:** `send_json` method for broadcasting
- **Lines 88-98:** `_pubsub_data_reader` for distributing messages
**Broadcasting Pattern:**
```python
# Line 78: How to broadcast to all users in a room
async def send_json(self, room_id: str, message: dict) -> None:
await self.pubsub_client.send_json(room_id, message)
```
**What to Use:** This system for broadcasting consent requests and responses.
---
## 5. Database Models and Migrations
### File: `server/reflector/db/transcripts.py`
**Purpose:** Transcript database model and controller
**Key Areas:**
- **Lines 28-73:** `transcripts` SQLAlchemy table definition
- **Lines 149-172:** `Transcript` Pydantic model
- **Lines 304-614:** `TranscriptController` class with database operations
**Current Schema Fields:**
```python
# Lines 31-72: Key existing columns
sqlalchemy.Column("id", sqlalchemy.String, primary_key=True),
sqlalchemy.Column("status", sqlalchemy.String),
sqlalchemy.Column("duration", sqlalchemy.Integer),
sqlalchemy.Column("locked", sqlalchemy.Boolean),
sqlalchemy.Column("audio_location", sqlalchemy.String, server_default="local"),
# ... more columns
```
**Audio File Management:**
- **Lines 225-230:** Audio file path properties
- **Lines 252-284:** `get_audio_url` method for accessing audio
- **Lines 554-571:** `move_mp3_to_storage` for cloud storage
**What to Add:** New columns for consent tracking and deletion marking.
### File: `server/migrations/versions/b9348748bbbc_reviewed.py`
**Purpose:** Example migration pattern for adding boolean columns
**Pattern:**
```python
# Lines 20-23: Adding boolean column with default
def upgrade() -> None:
op.add_column('transcript', sa.Column('reviewed', sa.Boolean(),
server_default=sa.text('0'), nullable=False))
def downgrade() -> None:
op.drop_column('transcript', 'reviewed')
```
**What to Follow:** This pattern for adding consent columns.
---
## 6. API Endpoint Patterns
### File: `server/reflector/views/transcripts.py`
**Purpose:** REST API endpoints for transcript operations
**Key Areas:**
- **Lines 29-30:** Router setup: `router = APIRouter()`
- **Lines 70-85:** `CreateTranscript` and `UpdateTranscript` models
- **Lines 122-135:** Example POST endpoint: `transcripts_create`
**Endpoint Pattern:**
```python
# Lines 122-135: Standard endpoint structure
@router.post("/transcripts", response_model=GetTranscript)
async def transcripts_create(
info: CreateTranscript,
user: Annotated[Optional[auth.UserInfo], Depends(auth.current_user_optional)],
):
user_id = user["sub"] if user else None
return await transcripts_controller.add(...)
```
**Authentication Pattern:**
- **Line 125:** Optional user authentication dependency
- **Line 127:** Extract user ID: `user_id = user["sub"] if user else None`
**What to Follow:** This pattern for new consent endpoint.
---
## 7. Live Pipeline System
### File: `server/reflector/pipelines/main_live_pipeline.py`
**Purpose:** Real-time processing pipeline during recording
**Key Areas:**
- **Lines 80-96:** `@broadcast_to_sockets` decorator for WebSocket events
- **Lines 98-104:** `@get_transcript` decorator for database access
- **Line 56:** WebSocket manager import: `from reflector.ws_manager import get_ws_manager`
**Event Broadcasting Pattern:**
```python
# Lines 80-95: Decorator for broadcasting events
def broadcast_to_sockets(func):
async def wrapper(self, *args, **kwargs):
resp = await func(self, *args, **kwargs)
if resp is None:
return
await self.ws_manager.send_json(
room_id=self.ws_room_id,
message=resp.model_dump(mode="json"),
)
return wrapper
```
---
## 8. Modal/Dialog Patterns
### File: `www/app/(app)/transcripts/[transcriptId]/shareModal.tsx`
**Purpose:** Example modal implementation using fixed overlay
**Key Areas:**
- **Lines 105-176:** Modal implementation using `fixed inset-0` overlay
- **Lines 107-108:** Overlay styling: `fixed inset-0 bg-gray-600 bg-opacity-50`
- **Lines 152-170:** Button patterns for actions
**Modal Structure:**
```typescript
// Lines 105-109: Modal overlay and container
<div className="absolute">
{props.show && (
<div className="fixed inset-0 bg-gray-600 bg-opacity-50 overflow-y-auto h-full w-full z-50">
<div className="relative top-20 mx-auto p-5 w-96 shadow-lg rounded-md bg-white">
// Modal content...
</div>
</div>
)}
</div>
```
### File: `www/app/(app)/transcripts/shareAndPrivacy.tsx`
**Purpose:** Example using Chakra UI Modal components
**Key Areas:**
- **Lines 10-16:** Chakra UI Modal imports
- **Lines 86-100:** Chakra Modal structure
**Chakra Modal Pattern:**
```typescript
// Lines 86-94: Chakra UI Modal structure
<Modal isOpen={!!showModal} onClose={() => setShowModal(false)} size={"xl"}>
<ModalOverlay />
<ModalContent>
<ModalHeader>Share</ModalHeader>
<ModalBody>
// Modal content...
</ModalBody>
</ModalContent>
</Modal>
```
**What to Choose:** Either pattern works - fixed overlay for simple cases, Chakra UI for consistent styling.
---
## 9. Audio File Management
### File: `server/reflector/db/transcripts.py`
**Purpose:** Audio file storage and access
**Key Methods:**
- **Lines 225-230:** File path properties
- `audio_wav_filename`: Local WAV file path
- `audio_mp3_filename`: Local MP3 file path
- `storage_audio_path`: Cloud storage path
- **Lines 252-284:** `get_audio_url()` - Generate access URL
- **Lines 554-571:** `move_mp3_to_storage()` - Move to cloud
- **Lines 572-580:** `download_mp3_from_storage()` - Download from cloud
**File Path Properties:**
```python
# Lines 225-230: Audio file locations
@property
def audio_wav_filename(self):
return self.data_path / "audio.wav"
@property
def audio_mp3_filename(self):
return self.data_path / "audio.mp3"
```
**Storage Logic:**
- **Line 253:** Local files: `if self.audio_location == "local"`
- **Line 255:** Cloud storage: `elif self.audio_location == "storage"`
**What to Modify:** Add deletion logic and update `get_audio_url` to handle deleted files.
---
## 10. Review Checklist
Before implementing, manually review these areas with the **meeting-based consent** approach:
### Frontend Changes
- [ ] **Room Entry**: Remove consent blocking in `www/app/[roomName]/page.tsx:80-124`
- [ ] **Meeting UI**: Add consent dialog overlay on `whereby-embed` in `www/app/[roomName]/page.tsx:126+`
- [ ] **Meeting Hook**: Update `www/app/[roomName]/useRoomMeeting.tsx` to provide meeting data for consent
- [ ] **WebSocket Events**: Add consent event handlers (meeting-based, not transcript-based)
- [ ] **User Identification**: Add browser fingerprinting for anonymous users
### Backend Changes - Meeting Scope
- [ ] **Database**: Create `meeting_consent` table migration following `server/migrations/versions/b9348748bbbc_reviewed.py` pattern
- [ ] **Meeting Model**: Add consent tracking in `server/reflector/db/meetings.py`
- [ ] **Recording Model**: Add deletion flags in `server/reflector/db/recordings.py`
- [ ] **API**: Add meeting consent endpoint in `server/reflector/views/meetings.py`
- [ ] **Whereby Webhook**: Update `server/reflector/views/whereby.py` to trigger consent based on participant count
- [ ] **SQS Processing**: Update `server/reflector/worker/process.py` to check consent before processing recordings
### Critical Integration Points
- [ ] **Consent Timing**: ALWAYS ask for consent - no conditions, no triggers, no participant count checks
- [ ] **SQS Processing**: Always create transcript first, then delete only audio files if consent denied
- [ ] **Meeting Scoping**: All consent tracking uses `meeting_id`, not `room_id` (rooms are reused)
- [ ] **Post-Processing Only**: No real-time recording control - all intervention happens during SQS processing
### Testing Strategy
- [ ] **Multiple Participants**: Test consent collection from multiple users in same meeting
- [ ] **Room Reuse**: Verify consent doesn't affect other meetings in same room
- [ ] **Recording Triggers**: Test different `recording_trigger` configurations
- [ ] **SQS Deletion**: Verify recordings are deleted from S3 when consent denied
- [ ] **Timing Edge Cases**: Test consent given after recording already started
**Reality Check**: This implementation works with **post-processing deletion only**. We cannot stop recordings in progress or detect exactly when they start. Consent timing is estimated based on meeting configuration and participant events.