* feat: add transcript format parameter to GET endpoint
Add transcript_format query parameter to /v1/transcripts/{id} endpoint
with support for multiple output formats using discriminated unions.
Formats supported:
- text: Plain speaker dialogue (default)
- text-timestamped: Dialogue with [MM:SS] timestamps
- webvtt-named: WebVTT subtitles with participant names
- json: Structured segments with full metadata
Response models use Pydantic discriminated unions with transcript_format
as discriminator field. POST/PATCH endpoints return GetTranscriptWithParticipants
for minimal responses. GET endpoint returns format-specific models.
* Copy transcript format
* Regenerate types
* Fix transcript formats
* Don't throw inside try
* Remove any type
* Toast share copy errors
* transcript_format exhaustiveness and python idiomatic assert_never
* format_timestamp_mmss clear type definition
* Rename seconds_to_timestamp
* Test transcript format with overlapping speakers
* exact match for vtt multispeaker test
---------
Co-authored-by: Sergey Mankovsky <sergey@monadical.com>
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
Transcript Formats
The Reflector API provides multiple output formats for transcript data through the transcript_format query parameter on the GET /v1/transcripts/{id} endpoint.
Overview
When retrieving a transcript, you can specify the desired format using the transcript_format query parameter. The API supports four formats optimized for different use cases:
- text - Plain text with speaker names (default)
- text-timestamped - Timestamped text with speaker names
- webvtt-named - WebVTT subtitle format with participant names
- json - Structured JSON segments with full metadata
All formats include participant information when available, resolving speaker IDs to actual names.
Query Parameter Usage
GET /v1/transcripts/{id}?transcript_format={format}
Parameters
- transcript_format (optional): The desired output format
  - Type: "text" | "text-timestamped" | "webvtt-named" | "json"
  - Default: "text"
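As an illustration of the query parameter above, the following minimal sketch builds the request URL and rejects unknown format values on the client before anything is sent. The helper name build_transcript_url and the base URL used in the usage line are hypothetical and are not part of the Reflector API.

```python
from urllib.parse import quote, urlencode

# The four values documented for transcript_format.
VALID_FORMATS = ("text", "text-timestamped", "webvtt-named", "json")


def build_transcript_url(base_url: str, transcript_id: str,
                         transcript_format: str = "text") -> str:
    """Return the GET URL for a transcript (hypothetical client helper)."""
    if transcript_format not in VALID_FORMATS:
        raise ValueError(f"unsupported transcript_format: {transcript_format!r}")
    # Percent-encode the ID so it is safe as a single path segment.
    path = f"/v1/transcripts/{quote(transcript_id, safe='')}"
    query = urlencode({"transcript_format": transcript_format})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Usage with a placeholder host (an assumption, not a real deployment):
url = build_transcript_url("https://reflector.example.com", "transcript_123", "json")
```

Validating on the client is optional; it only surfaces a typo in the format name earlier than the server would.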
Format Descriptions
Text Format (text)
Use case: Simple, human-readable transcript for display or export.
Format: Speaker names followed by their dialogue, one line per segment.
Example:
John Smith: Hello everyone
Jane Doe: Hi there
John Smith: How are you today?
Request:
GET /v1/transcripts/{id}?transcript_format=text
Response:
{
"id": "transcript_123",
"name": "Meeting Recording",
"transcript_format": "text",
"transcript": "John Smith: Hello everyone\nJane Doe: Hi there\nJohn Smith: How are you today?",
"participants": [
{"id": "p1", "speaker": 0, "name": "John Smith"},
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
],
...
}
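The text format can be pictured as a simple join over segments. The sketch below is not the server's implementation; it assumes segments shaped like the json format entries (speaker, text) and a hypothetical names mapping from speaker ID to participant name.

```python
from typing import Dict, List


def render_text(segments: List[dict], names: Dict[int, str]) -> str:
    """Render one 'Name: dialogue' line per segment (illustrative only)."""
    lines = []
    for seg in segments:
        # Fall back to a generated name when no participant is mapped.
        name = names.get(seg["speaker"], f"Speaker {seg['speaker']}")
        lines.append(f"{name}: {seg['text']}")
    return "\n".join(lines)
```

Called with the three example segments and the mapping {0: "John Smith", 1: "Jane Doe"}, this yields the same string as the transcript field in the response above.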
Text Timestamped Format (text-timestamped)
Use case: Transcript with timing information for navigation or reference.
Format: [MM:SS] timestamp prefix before each speaker and dialogue.
Example:
[00:00] John Smith: Hello everyone
[00:05] Jane Doe: Hi there
[00:12] John Smith: How are you today?
Request:
GET /v1/transcripts/{id}?transcript_format=text-timestamped
Response:
{
"id": "transcript_123",
"name": "Meeting Recording",
"transcript_format": "text-timestamped",
"transcript": "[00:00] John Smith: Hello everyone\n[00:05] Jane Doe: Hi there\n[00:12] John Smith: How are you today?",
"participants": [
{"id": "p1", "speaker": 0, "name": "John Smith"},
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
],
...
}
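The [MM:SS] prefix can be derived from a segment's start time in seconds. This is a sketch under two assumptions that the documentation does not confirm: fractional seconds are truncated, and minutes keep counting past 59 rather than rolling over into hours. The function name to_mmss is hypothetical.

```python
def to_mmss(seconds: float) -> str:
    """Format a non-negative offset in seconds as '[MM:SS]' (assumed rules)."""
    total = int(seconds)  # assumption: drop the fractional part
    minutes, secs = divmod(total, 60)
    # Assumption: no hours field, so 3600 seconds renders as 60 minutes.
    return f"[{minutes:02d}:{secs:02d}]"


def render_text_timestamped(segments, names):
    """Prefix each 'Name: dialogue' line with the segment's start time."""
    return "\n".join(
        f"{to_mmss(seg['start'])} "
        f"{names.get(seg['speaker'], 'Speaker ' + str(seg['speaker']))}: {seg['text']}"
        for seg in segments
    )
```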
WebVTT Named Format (webvtt-named)
Use case: Subtitle files for video players, accessibility tools, or video editing.
Format: Standard WebVTT subtitle format with voice tags using participant names.
Example:
WEBVTT

00:00:00.000 --> 00:00:05.000
<v John Smith>Hello everyone

00:00:05.000 --> 00:00:12.000
<v Jane Doe>Hi there

00:00:12.000 --> 00:00:18.000
<v John Smith>How are you today?
Request:
GET /v1/transcripts/{id}?transcript_format=webvtt-named
Response:
{
"id": "transcript_123",
"name": "Meeting Recording",
"transcript_format": "webvtt-named",
"transcript": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\n<v John Smith>Hello everyone\n\n...",
"participants": [
{"id": "p1", "speaker": 0, "name": "John Smith"},
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
],
...
}
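To make the cue layout concrete, here is a minimal sketch that produces a string of the same shape from segment data. It is not the server's code: the function names are hypothetical, and it does not escape characters such as & or < in the dialogue, which a production WebVTT writer would need to handle.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT 'HH:MM:SS.mmm' timestamp."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_webvtt(segments, names):
    """Emit a WEBVTT header followed by one voice-tagged cue per segment."""
    blocks = ["WEBVTT"]
    for seg in segments:
        name = names.get(seg["speaker"], f"Speaker {seg['speaker']}")
        timing = f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}"
        blocks.append(f"{timing}\n<v {name}>{seg['text']}")
    # Blocks are separated by blank lines, as WebVTT requires.
    return "\n\n".join(blocks) + "\n"
```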
JSON Format (json)
Use case: Programmatic access with full timing and speaker metadata.
Format: Array of segment objects with speaker information, text content, and precise timing.
Example:
[
{
"speaker": 0,
"speaker_name": "John Smith",
"text": "Hello everyone",
"start": 0.0,
"end": 5.0
},
{
"speaker": 1,
"speaker_name": "Jane Doe",
"text": "Hi there",
"start": 5.0,
"end": 12.0
},
{
"speaker": 0,
"speaker_name": "John Smith",
"text": "How are you today?",
"start": 12.0,
"end": 18.0
}
]
Request:
GET /v1/transcripts/{id}?transcript_format=json
Response:
{
"id": "transcript_123",
"name": "Meeting Recording",
"transcript_format": "json",
"transcript": [
{
"speaker": 0,
"speaker_name": "John Smith",
"text": "Hello everyone",
"start": 0.0,
"end": 5.0
},
{
"speaker": 1,
"speaker_name": "Jane Doe",
"text": "Hi there",
"start": 5.0,
"end": 12.0
}
],
"participants": [
{"id": "p1", "speaker": 0, "name": "John Smith"},
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
],
...
}
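Because the json format exposes start and end per segment, simple analytics can be computed directly from the response. As one possible use, the sketch below totals speaking time per speaker; the function name speaking_time is hypothetical and the field names are the ones shown in the example above.

```python
from collections import defaultdict
from typing import Dict, List


def speaking_time(segments: List[dict]) -> Dict[str, float]:
    """Sum segment durations (in seconds) per speaker_name."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker_name"]] += seg["end"] - seg["start"]
    return dict(totals)
```

Segments from overlapping speakers are counted independently here, so the totals can add up to more than the transcript duration.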
Response Structure
All formats return the same base transcript metadata, plus a transcript_format field and a format-specific transcript field:
Common Fields
- id: Transcript identifier
- user_id: Owner user ID (if authenticated)
- name: Transcript name
- status: Processing status
- locked: Whether transcript is locked for editing
- duration: Total duration in seconds
- title: Auto-generated or custom title
- short_summary: Brief summary
- long_summary: Detailed summary
- created_at: Creation timestamp
- share_mode: Access control setting
- source_language: Original audio language
- target_language: Translation target language
- reviewed: Whether transcript has been reviewed
- meeting_id: Associated meeting ID (if applicable)
- source_kind: Source type (live, file, room)
- room_id: Associated room ID (if applicable)
- audio_deleted: Whether audio has been deleted
- participants: Array of participant objects with speaker mappings
Format-Specific Fields
- transcript_format: The format identifier (discriminator field)
- transcript: The formatted transcript content (string for text/webvtt formats, array for json format)
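On the client side, the discriminator tells you which type to expect in transcript before you touch it. The following sketch uses only the standard library (not the server's Pydantic models) to branch on transcript_format and normalize any response into a list of display lines; the helper name transcript_lines is hypothetical.

```python
from typing import Any, Dict, List

# Formats whose transcript field is a single string.
STRING_FORMATS = {"text", "text-timestamped", "webvtt-named"}


def transcript_lines(response: Dict[str, Any]) -> List[str]:
    """Return display lines for a decoded GET response, whatever its format."""
    fmt = response["transcript_format"]
    body = response["transcript"]
    if fmt in STRING_FORMATS:
        if not isinstance(body, str):
            raise TypeError(f"expected a string transcript for format {fmt!r}")
        return body.splitlines()
    if fmt == "json":
        if not isinstance(body, list):
            raise TypeError("expected a list of segments for format 'json'")
        return [f"{seg['speaker_name']}: {seg['text']}" for seg in body]
    # Unknown discriminator: fail loudly instead of guessing the shape.
    raise ValueError(f"unknown transcript_format: {fmt!r}")
```

Raising on an unrecognized value keeps the client from silently mishandling a format added later.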
Speaker Name Resolution
All formats resolve speaker IDs to participant names when available:
- If a participant exists for the speaker ID, their name is used
- If no participant exists, a default name like "Speaker 0" is generated
- Speaker IDs are integers (0, 1, 2, etc.) assigned during diarization
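The resolution rules above can be sketched as a lookup with a generated fallback. This is an illustration rather than the server's code; the function name resolve_speaker_name is hypothetical, and participants are assumed to be shaped like the objects in the example responses.

```python
from typing import List


def resolve_speaker_name(speaker: int, participants: List[dict]) -> str:
    """Return the participant name for a speaker ID, or 'Speaker N' if unmapped."""
    for participant in participants:
        if participant.get("speaker") == speaker:
            return participant["name"]
    return f"Speaker {speaker}"


participants = [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"},
]
known = resolve_speaker_name(1, participants)    # a mapped speaker
unknown = resolve_speaker_name(2, participants)  # no participant for ID 2
```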