feat: add transcript format parameter to GET endpoint (#709)

* feat: add transcript format parameter to GET endpoint Add transcript_format query parameter to /v1/transcripts/{id} endpoint with support for multiple output formats using discriminated unions. Formats supported: - text: Plain speaker dialogue (default) - text-timestamped: Dialogue with [MM:SS] timestamps - webvtt-named: WebVTT subtitles with participant names - json: Structured segments with full metadata Response models use Pydantic discriminated unions with transcript_format as discriminator field. POST/PATCH endpoints return GetTranscriptWithParticipants for minimal responses. GET endpoint returns format-specific models. * Copy transcript format * Regenerate types * Fix transcript formats * Don't throw inside try * Remove any type * Toast share copy errors * transcript_format exhaustiveness and python idiomatic assert_never * format_timestamp_mmss clear type definition * Rename seconds_to_timestamp * Test transcript format with overlapping speakers * exact match for vtt multispeaker test --------- Co-authored-by: Sergey Mankovsky <sergey@monadical.com> Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2026-05-06 11:15:18 +00:00 · 2025-11-26 11:51:14 -06:00
parent 3aef926203
commit f6ca07505f
12 changed files with 1625 additions and 138 deletions
--- a/docs/transcript.md
+++ b/docs/transcript.md
@@ -0,0 +1,241 @@
+# Transcript Formats
+
+The Reflector API provides multiple output formats for transcript data through the `transcript_format` query parameter on the GET `/v1/transcripts/{id}` endpoint.
+
+## Overview
+
+When retrieving a transcript, you can specify the desired format using the `transcript_format` query parameter. The API supports four formats optimized for different use cases:
+
+- **text** - Plain text with speaker names (default)
+- **text-timestamped** - Timestamped text with speaker names
+- **webvtt-named** - WebVTT subtitle format with participant names
+- **json** - Structured JSON segments with full metadata
+
+All formats include participant information when available, resolving speaker IDs to actual names.
+
+## Query Parameter Usage
+
+```
+GET /v1/transcripts/{id}?transcript_format={format}
+```
+
+### Parameters
+
+- `transcript_format` (optional): The desired output format
+  - Type: `"text" | "text-timestamped" | "webvtt-named" | "json"`
+  - Default: `"text"`
+
+## Format Descriptions
+
+### Text Format (`text`)
+
+**Use case:** Simple, human-readable transcript for display or export.
+
+**Format:** Speaker names followed by their dialogue, one line per segment.
+
+**Example:**
+```
+John Smith: Hello everyone
+Jane Doe: Hi there
+John Smith: How are you today?
+```
+
+**Request:**
+```bash
+GET /v1/transcripts/{id}?transcript_format=text
+```
+
+**Response:**
+```json
+{
+  "id": "transcript_123",
+  "name": "Meeting Recording",
+  "transcript_format": "text",
+  "transcript": "John Smith: Hello everyone\nJane Doe: Hi there\nJohn Smith: How are you today?",
+  "participants": [
+    {"id": "p1", "speaker": 0, "name": "John Smith"},
+    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
+  ],
+  ...
+}
+```
+
+### Text Timestamped Format (`text-timestamped`)
+
+**Use case:** Transcript with timing information for navigation or reference.
+
+**Format:** `[MM:SS]` timestamp prefix before each speaker and dialogue.
+
+**Example:**
+```
+[00:00] John Smith: Hello everyone
+[00:05] Jane Doe: Hi there
+[00:12] John Smith: How are you today?
+```
+
+**Request:**
+```bash
+GET /v1/transcripts/{id}?transcript_format=text-timestamped
+```
+
+**Response:**
+```json
+{
+  "id": "transcript_123",
+  "name": "Meeting Recording",
+  "transcript_format": "text-timestamped",
+  "transcript": "[00:00] John Smith: Hello everyone\n[00:05] Jane Doe: Hi there\n[00:12] John Smith: How are you today?",
+  "participants": [
+    {"id": "p1", "speaker": 0, "name": "John Smith"},
+    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
+  ],
+  ...
+}
+```
+
+### WebVTT Named Format (`webvtt-named`)
+
+**Use case:** Subtitle files for video players, accessibility tools, or video editing.
+
+**Format:** Standard WebVTT subtitle format with voice tags using participant names.
+
+**Example:**
+```
+WEBVTT
+
+00:00:00.000 --> 00:00:05.000
+<v John Smith>Hello everyone
+
+00:00:05.000 --> 00:00:12.000
+<v Jane Doe>Hi there
+
+00:00:12.000 --> 00:00:18.000
+<v John Smith>How are you today?
+```
+
+**Request:**
+```bash
+GET /v1/transcripts/{id}?transcript_format=webvtt-named
+```
+
+**Response:**
+```json
+{
+  "id": "transcript_123",
+  "name": "Meeting Recording",
+  "transcript_format": "webvtt-named",
+  "transcript": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\n<v John Smith>Hello everyone\n\n...",
+  "participants": [
+    {"id": "p1", "speaker": 0, "name": "John Smith"},
+    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
+  ],
+  ...
+}
+```
+
+### JSON Format (`json`)
+
+**Use case:** Programmatic access with full timing and speaker metadata.
+
+**Format:** Array of segment objects with speaker information, text content, and precise timing.
+
+**Example:**
+```json
+[
+  {
+    "speaker": 0,
+    "speaker_name": "John Smith",
+    "text": "Hello everyone",
+    "start": 0.0,
+    "end": 5.0
+  },
+  {
+    "speaker": 1,
+    "speaker_name": "Jane Doe",
+    "text": "Hi there",
+    "start": 5.0,
+    "end": 12.0
+  },
+  {
+    "speaker": 0,
+    "speaker_name": "John Smith",
+    "text": "How are you today?",
+    "start": 12.0,
+    "end": 18.0
+  }
+]
+```
+
+**Request:**
+```bash
+GET /v1/transcripts/{id}?transcript_format=json
+```
+
+**Response:**
+```json
+{
+  "id": "transcript_123",
+  "name": "Meeting Recording",
+  "transcript_format": "json",
+  "transcript": [
+    {
+      "speaker": 0,
+      "speaker_name": "John Smith",
+      "text": "Hello everyone",
+      "start": 0.0,
+      "end": 5.0
+    },
+    {
+      "speaker": 1,
+      "speaker_name": "Jane Doe",
+      "text": "Hi there",
+      "start": 5.0,
+      "end": 12.0
+    }
+  ],
+  "participants": [
+    {"id": "p1", "speaker": 0, "name": "John Smith"},
+    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
+  ],
+  ...
+}
+```
+
+## Response Structure
+
+All formats return the same base transcript metadata with an additional `transcript_format` field and format-specific `transcript` field:
+
+### Common Fields
+
+- `id`: Transcript identifier
+- `user_id`: Owner user ID (if authenticated)
+- `name`: Transcript name
+- `status`: Processing status
+- `locked`: Whether transcript is locked for editing
+- `duration`: Total duration in seconds
+- `title`: Auto-generated or custom title
+- `short_summary`: Brief summary
+- `long_summary`: Detailed summary
+- `created_at`: Creation timestamp
+- `share_mode`: Access control setting
+- `source_language`: Original audio language
+- `target_language`: Translation target language
+- `reviewed`: Whether transcript has been reviewed
+- `meeting_id`: Associated meeting ID (if applicable)
+- `source_kind`: Source type (live, file, room)
+- `room_id`: Associated room ID (if applicable)
+- `audio_deleted`: Whether audio has been deleted
+- `participants`: Array of participant objects with speaker mappings
+
+### Format-Specific Fields
+
+- `transcript_format`: The format identifier (discriminator field)
+- `transcript`: The formatted transcript content (string for text/webvtt formats, array for json format)
+
+## Speaker Name Resolution
+
+All formats resolve speaker IDs to participant names when available:
+
+- If a participant exists for the speaker ID, their name is used
+- If no participant exists, a default name like "Speaker 0" is generated
+- Speaker IDs are integers (0, 1, 2, etc.) assigned during diarization