mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2025-12-20 20:29:06 +00:00
* feat: add transcript format parameter to GET endpoint
Add transcript_format query parameter to /v1/transcripts/{id} endpoint
with support for multiple output formats using discriminated unions.
Formats supported:
- text: Plain speaker dialogue (default)
- text-timestamped: Dialogue with [MM:SS] timestamps
- webvtt-named: WebVTT subtitles with participant names
- json: Structured segments with full metadata
Response models use Pydantic discriminated unions with transcript_format
as discriminator field. POST/PATCH endpoints return GetTranscriptWithParticipants
for minimal responses. GET endpoint returns format-specific models.
* Copy transcript format
* Regenerate types
* Fix transcript formats
* Don't throw inside try
* Remove any type
* Toast share copy errors
* transcript_format exhaustiveness and python idiomatic assert_never
* format_timestamp_mmss clear type definition
* Rename seconds_to_timestamp
* Test transcript format with overlapping speakers
* exact match for vtt multispeaker test
---------
Co-authored-by: Sergey Mankovsky <sergey@monadical.com>
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
# Transcript Formats

The Reflector API provides multiple output formats for transcript data through the `transcript_format` query parameter on the GET `/v1/transcripts/{id}` endpoint.

## Overview

When retrieving a transcript, you can specify the desired format using the `transcript_format` query parameter. The API supports four formats optimized for different use cases:

- **text** - Plain text with speaker names (default)
- **text-timestamped** - Timestamped text with speaker names
- **webvtt-named** - WebVTT subtitle format with participant names
- **json** - Structured JSON segments with full metadata

All formats include participant information when available, resolving speaker IDs to actual names.

## Query Parameter Usage

```
GET /v1/transcripts/{id}?transcript_format={format}
```

### Parameters

- `transcript_format` (optional): The desired output format
  - Type: `"text" | "text-timestamped" | "webvtt-named" | "json"`
  - Default: `"text"`

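As a minimal sketch of how a client might assemble this request, the helper below validates the format value against the four documented options and builds the URL. The function name `build_transcript_url` and the host `reflector.example.com` are assumptions made for illustration; they are not part of the Reflector API or its client libraries.

```python
from urllib.parse import quote, urlencode

# The four values documented above; "text" is the documented default.
TRANSCRIPT_FORMATS = ("text", "text-timestamped", "webvtt-named", "json")


def build_transcript_url(
    base_url: str, transcript_id: str, transcript_format: str = "text"
) -> str:
    """Return the GET URL for a transcript in the requested format (hypothetical helper)."""
    if transcript_format not in TRANSCRIPT_FORMATS:
        raise ValueError(f"unsupported transcript_format: {transcript_format!r}")
    # Percent-encode the ID so it is safe to place in a path segment.
    path = f"/v1/transcripts/{quote(transcript_id, safe='')}"
    query = urlencode({"transcript_format": transcript_format})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Hypothetical host, used only for illustration.
print(build_transcript_url("https://reflector.example.com", "transcript_123", "webvtt-named"))
```

Rejecting unknown values on the client side is a design choice of this sketch; how the server responds to an unsupported value is not described here.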
## Format Descriptions

### Text Format (`text`)

**Use case:** Simple, human-readable transcript for display or export.

**Format:** Speaker names followed by their dialogue, one line per segment.

**Example:**
```
John Smith: Hello everyone
Jane Doe: Hi there
John Smith: How are you today?
```

**Request:**
```
GET /v1/transcripts/{id}?transcript_format=text
```

**Response:**
```json
{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "text",
  "transcript": "John Smith: Hello everyone\nJane Doe: Hi there\nJohn Smith: How are you today?",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
```

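To make the "one line per segment" rule concrete, here is a minimal sketch that renders the text format from a list of segments. It assumes each segment is a dict with `speaker_name` and `text` keys, as in the `json` format; `render_text` is a hypothetical name and this is not Reflector's server-side implementation.

```python
def render_text(segments: list[dict]) -> str:
    """Join segments as 'Speaker Name: dialogue', one line per segment (illustrative only)."""
    return "\n".join(f"{seg['speaker_name']}: {seg['text']}" for seg in segments)


segments = [
    {"speaker_name": "John Smith", "text": "Hello everyone"},
    {"speaker_name": "Jane Doe", "text": "Hi there"},
    {"speaker_name": "John Smith", "text": "How are you today?"},
]
print(render_text(segments))
```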
### Text Timestamped Format (`text-timestamped`)

**Use case:** Transcript with timing information for navigation or reference.

**Format:** `[MM:SS]` timestamp prefix before each speaker and dialogue.

**Example:**
```
[00:00] John Smith: Hello everyone
[00:05] Jane Doe: Hi there
[00:12] John Smith: How are you today?
```

**Request:**
```
GET /v1/transcripts/{id}?transcript_format=text-timestamped
```

**Response:**
```json
{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "text-timestamped",
  "transcript": "[00:00] John Smith: Hello everyone\n[00:05] Jane Doe: Hi there\n[00:12] John Smith: How are you today?",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
```

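The `[MM:SS]` prefix can be sketched as follows. Two behaviours here are assumptions rather than documented facts: fractional seconds are truncated, and minutes keep counting past 59 instead of rolling over into hours. The names `to_mmss` and `render_text_timestamped` are hypothetical.

```python
def to_mmss(seconds: float) -> str:
    """Format an offset in seconds as [MM:SS] (assumes truncation, no hour rollover)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def render_text_timestamped(segments: list[dict]) -> str:
    """Prefix each 'Speaker Name: dialogue' line with the segment's start time."""
    return "\n".join(
        f"{to_mmss(seg['start'])} {seg['speaker_name']}: {seg['text']}"
        for seg in segments
    )


segments = [
    {"speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0},
    {"speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0},
    {"speaker_name": "John Smith", "text": "How are you today?", "start": 12.0},
]
print(render_text_timestamped(segments))
```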
### WebVTT Named Format (`webvtt-named`)

**Use case:** Subtitle files for video players, accessibility tools, or video editing.

**Format:** Standard WebVTT subtitle format with voice tags using participant names.

**Example:**
```
WEBVTT

00:00:00.000 --> 00:00:05.000
<v John Smith>Hello everyone

00:00:05.000 --> 00:00:12.000
<v Jane Doe>Hi there

00:00:12.000 --> 00:00:18.000
<v John Smith>How are you today?
```

**Request:**
```
GET /v1/transcripts/{id}?transcript_format=webvtt-named
```

**Response:**
```json
{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "webvtt-named",
  "transcript": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\n<v John Smith>Hello everyone\n\n...",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
```

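The cue layout shown above can be reproduced with a short sketch. It assumes segments carry `start` and `end` offsets in seconds plus a resolved `speaker_name`; the helper names are hypothetical, and the sketch does not escape characters such as `&` or `>` in names or text, which a production generator would need to handle.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT cue timestamp, HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_webvtt(segments: list[dict]) -> str:
    """Emit the WEBVTT header followed by one voice-tagged cue per segment."""
    blocks = ["WEBVTT"]
    for seg in segments:
        timing = f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}"
        blocks.append(f"{timing}\n<v {seg['speaker_name']}>{seg['text']}")
    # A blank line separates the header and each cue.
    return "\n\n".join(blocks)


segments = [
    {"speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
    {"speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
]
print(render_webvtt(segments))
```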
### JSON Format (`json`)

**Use case:** Programmatic access with full timing and speaker metadata.

**Format:** Array of segment objects with speaker information, text content, and precise timing.

**Example:**
```json
[
  {
    "speaker": 0,
    "speaker_name": "John Smith",
    "text": "Hello everyone",
    "start": 0.0,
    "end": 5.0
  },
  {
    "speaker": 1,
    "speaker_name": "Jane Doe",
    "text": "Hi there",
    "start": 5.0,
    "end": 12.0
  },
  {
    "speaker": 0,
    "speaker_name": "John Smith",
    "text": "How are you today?",
    "start": 12.0,
    "end": 18.0
  }
]
```

**Request:**
```
GET /v1/transcripts/{id}?transcript_format=json
```

**Response:**
```json
{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "json",
  "transcript": [
    {
      "speaker": 0,
      "speaker_name": "John Smith",
      "text": "Hello everyone",
      "start": 0.0,
      "end": 5.0
    },
    {
      "speaker": 1,
      "speaker_name": "Jane Doe",
      "text": "Hi there",
      "start": 5.0,
      "end": 12.0
    }
  ],
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
```

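As one example of the programmatic use this format is meant for, the sketch below loads the segment array into typed records and totals speaking time per speaker. The `Segment` dataclass and `speaking_time` function are illustrative names, not types exported by Reflector, and only the five fields shown in the example above are read.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: int
    speaker_name: str
    text: str
    start: float  # seconds from the start of the recording
    end: float


def parse_segments(raw_segments: list[dict]) -> list[Segment]:
    """Pick the documented fields explicitly so extra keys are ignored."""
    return [
        Segment(s["speaker"], s["speaker_name"], s["text"], s["start"], s["end"])
        for s in raw_segments
    ]


def speaking_time(segments: list[Segment]) -> dict[str, float]:
    """Sum segment durations (end - start) per speaker name."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_name] += seg.end - seg.start
    return dict(totals)


body = json.loads("""
[
  {"speaker": 0, "speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
  {"speaker": 1, "speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
  {"speaker": 0, "speaker_name": "John Smith", "text": "How are you today?", "start": 12.0, "end": 18.0}
]
""")
print(speaking_time(parse_segments(body)))
```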
## Response Structure

All formats return the same base transcript metadata, plus a `transcript_format` field and a format-specific `transcript` field:

### Common Fields

- `id`: Transcript identifier
- `user_id`: Owner user ID (if authenticated)
- `name`: Transcript name
- `status`: Processing status
- `locked`: Whether transcript is locked for editing
- `duration`: Total duration in seconds
- `title`: Auto-generated or custom title
- `short_summary`: Brief summary
- `long_summary`: Detailed summary
- `created_at`: Creation timestamp
- `share_mode`: Access control setting
- `source_language`: Original audio language
- `target_language`: Translation target language
- `reviewed`: Whether transcript has been reviewed
- `meeting_id`: Associated meeting ID (if applicable)
- `source_kind`: Source type (live, file, room)
- `room_id`: Associated room ID (if applicable)
- `audio_deleted`: Whether audio has been deleted
- `participants`: Array of participant objects with speaker mappings

### Format-Specific Fields

- `transcript_format`: The format identifier (discriminator field)
- `transcript`: The formatted transcript content (a string for the `text`, `text-timestamped`, and `webvtt-named` formats; an array of segment objects for the `json` format)

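Because `transcript_format` acts as the discriminator, a client can branch on it before touching `transcript`. The following is a minimal sketch under that assumption; `transcript_as_text` is a hypothetical helper, and it works on plain dicts rather than on Reflector's own response models.

```python
from typing import Any

STRING_FORMATS = ("text", "text-timestamped", "webvtt-named")


def transcript_as_text(response: dict[str, Any]) -> str:
    """Return the transcript as a single string, whatever format was requested."""
    fmt = response["transcript_format"]
    body = response["transcript"]
    if fmt in STRING_FORMATS:
        # These formats carry the transcript as one preformatted string.
        if not isinstance(body, str):
            raise TypeError(f"expected a string transcript for format {fmt!r}")
        return body
    if fmt == "json":
        # The json format carries a list of segment objects instead.
        return "\n".join(f"{seg['speaker_name']}: {seg['text']}" for seg in body)
    raise ValueError(f"unknown transcript_format: {fmt!r}")


json_response = {
    "transcript_format": "json",
    "transcript": [
        {"speaker": 0, "speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
        {"speaker": 1, "speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
    ],
}
print(transcript_as_text(json_response))
```

Raising on an unrecognized discriminator value keeps the branch exhaustive: if a new format is added later, the client fails loudly instead of misreading the payload.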
## Speaker Name Resolution

All formats resolve speaker IDs to participant names when available:

- If a participant exists for the speaker ID, their name is used
- If no participant exists, a default name like "Speaker 0" is generated
- Speaker IDs are integers (0, 1, 2, etc.) assigned during diarization

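The resolution rules above can be sketched as a small lookup with a fallback. The participant dicts mirror the `participants` array from the response examples; `resolve_speaker_name` is a hypothetical name, and the fallback string simply follows the "Speaker 0" pattern described above.

```python
def resolve_speaker_name(speaker: int, participants: list[dict]) -> str:
    """Return the participant's name for a speaker ID, or a generated default."""
    for participant in participants:
        if participant.get("speaker") == speaker:
            return participant["name"]
    # No participant is mapped to this diarization ID.
    return f"Speaker {speaker}"


participants = [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"},
]
print(resolve_speaker_name(1, participants))
print(resolve_speaker_name(2, participants))
```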