mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2025-12-20 12:19:06 +00:00
feat: add transcript format parameter to GET endpoint (#709)
* feat: add transcript format parameter to GET endpoint
Add transcript_format query parameter to /v1/transcripts/{id} endpoint
with support for multiple output formats using discriminated unions.
Formats supported:
- text: Plain speaker dialogue (default)
- text-timestamped: Dialogue with [MM:SS] timestamps
- webvtt-named: WebVTT subtitles with participant names
- json: Structured segments with full metadata
Response models use Pydantic discriminated unions with transcript_format
as discriminator field. POST/PATCH endpoints return GetTranscriptWithParticipants
for minimal responses. GET endpoint returns format-specific models.
* Copy transcript format
* Regenerate types
* Fix transcript formats
* Don't throw inside try
* Remove any type
* Toast share copy errors
* transcript_format exhaustiveness and python idiomatic assert_never
* format_timestamp_mmss clear type definition
* Rename seconds_to_timestamp
* Test transcript format with overlapping speakers
* exact match for vtt multispeaker test
---------
Co-authored-by: Sergey Mankovsky <sergey@monadical.com>
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
This commit is contained in:
241
docs/transcript.md
Normal file
241
docs/transcript.md
Normal file
@@ -0,0 +1,241 @@
|
||||
# Transcript Formats
|
||||
|
||||
The Reflector API provides multiple output formats for transcript data through the `transcript_format` query parameter on the GET `/v1/transcripts/{id}` endpoint.
|
||||
|
||||
## Overview
|
||||
|
||||
When retrieving a transcript, you can specify the desired format using the `transcript_format` query parameter. The API supports four formats optimized for different use cases:
|
||||
|
||||
- **text** - Plain text with speaker names (default)
|
||||
- **text-timestamped** - Timestamped text with speaker names
|
||||
- **webvtt-named** - WebVTT subtitle format with participant names
|
||||
- **json** - Structured JSON segments with full metadata
|
||||
|
||||
All formats include participant information when available, resolving speaker IDs to actual names.
|
||||
|
||||
## Query Parameter Usage
|
||||
|
||||
```
|
||||
GET /v1/transcripts/{id}?transcript_format={format}
|
||||
```
|
||||
|
||||
### Parameters
|
||||
|
||||
- `transcript_format` (optional): The desired output format
|
||||
- Type: `"text" | "text-timestamped" | "webvtt-named" | "json"`
|
||||
- Default: `"text"`
|
||||
|
||||
## Format Descriptions
|
||||
|
||||
### Text Format (`text`)
|
||||
|
||||
**Use case:** Simple, human-readable transcript for display or export.
|
||||
|
||||
**Format:** Speaker names followed by their dialogue, one line per segment.
|
||||
|
||||
**Example:**
|
||||
```
|
||||
John Smith: Hello everyone
|
||||
Jane Doe: Hi there
|
||||
John Smith: How are you today?
|
||||
```
|
||||
|
||||
**Request:**
|
||||
```bash
|
||||
GET /v1/transcripts/{id}?transcript_format=text
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "transcript_123",
|
||||
"name": "Meeting Recording",
|
||||
"transcript_format": "text",
|
||||
"transcript": "John Smith: Hello everyone\nJane Doe: Hi there\nJohn Smith: How are you today?",
|
||||
"participants": [
|
||||
{"id": "p1", "speaker": 0, "name": "John Smith"},
|
||||
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
|
||||
],
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
### Text Timestamped Format (`text-timestamped`)
|
||||
|
||||
**Use case:** Transcript with timing information for navigation or reference.
|
||||
|
||||
**Format:** `[MM:SS]` timestamp prefix before each speaker and dialogue.
|
||||
|
||||
**Example:**
|
||||
```
|
||||
[00:00] John Smith: Hello everyone
|
||||
[00:05] Jane Doe: Hi there
|
||||
[00:12] John Smith: How are you today?
|
||||
```
|
||||
|
||||
**Request:**
|
||||
```bash
|
||||
GET /v1/transcripts/{id}?transcript_format=text-timestamped
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "transcript_123",
|
||||
"name": "Meeting Recording",
|
||||
"transcript_format": "text-timestamped",
|
||||
"transcript": "[00:00] John Smith: Hello everyone\n[00:05] Jane Doe: Hi there\n[00:12] John Smith: How are you today?",
|
||||
"participants": [
|
||||
{"id": "p1", "speaker": 0, "name": "John Smith"},
|
||||
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
|
||||
],
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
### WebVTT Named Format (`webvtt-named`)
|
||||
|
||||
**Use case:** Subtitle files for video players, accessibility tools, or video editing.
|
||||
|
||||
**Format:** Standard WebVTT subtitle format with voice tags using participant names.
|
||||
|
||||
**Example:**
|
||||
```
|
||||
WEBVTT
|
||||
|
||||
00:00:00.000 --> 00:00:05.000
|
||||
<v John Smith>Hello everyone
|
||||
|
||||
00:00:05.000 --> 00:00:12.000
|
||||
<v Jane Doe>Hi there
|
||||
|
||||
00:00:12.000 --> 00:00:18.000
|
||||
<v John Smith>How are you today?
|
||||
```
|
||||
|
||||
**Request:**
|
||||
```bash
|
||||
GET /v1/transcripts/{id}?transcript_format=webvtt-named
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "transcript_123",
|
||||
"name": "Meeting Recording",
|
||||
"transcript_format": "webvtt-named",
|
||||
"transcript": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\n<v John Smith>Hello everyone\n\n...",
|
||||
"participants": [
|
||||
{"id": "p1", "speaker": 0, "name": "John Smith"},
|
||||
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
|
||||
],
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
### JSON Format (`json`)
|
||||
|
||||
**Use case:** Programmatic access with full timing and speaker metadata.
|
||||
|
||||
**Format:** Array of segment objects with speaker information, text content, and precise timing.
|
||||
|
||||
**Example:**
|
||||
```json
|
||||
[
|
||||
{
|
||||
"speaker": 0,
|
||||
"speaker_name": "John Smith",
|
||||
"text": "Hello everyone",
|
||||
"start": 0.0,
|
||||
"end": 5.0
|
||||
},
|
||||
{
|
||||
"speaker": 1,
|
||||
"speaker_name": "Jane Doe",
|
||||
"text": "Hi there",
|
||||
"start": 5.0,
|
||||
"end": 12.0
|
||||
},
|
||||
{
|
||||
"speaker": 0,
|
||||
"speaker_name": "John Smith",
|
||||
"text": "How are you today?",
|
||||
"start": 12.0,
|
||||
"end": 18.0
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
**Request:**
|
||||
```bash
|
||||
GET /v1/transcripts/{id}?transcript_format=json
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "transcript_123",
|
||||
"name": "Meeting Recording",
|
||||
"transcript_format": "json",
|
||||
"transcript": [
|
||||
{
|
||||
"speaker": 0,
|
||||
"speaker_name": "John Smith",
|
||||
"text": "Hello everyone",
|
||||
"start": 0.0,
|
||||
"end": 5.0
|
||||
},
|
||||
{
|
||||
"speaker": 1,
|
||||
"speaker_name": "Jane Doe",
|
||||
"text": "Hi there",
|
||||
"start": 5.0,
|
||||
"end": 12.0
|
||||
}
|
||||
],
|
||||
"participants": [
|
||||
{"id": "p1", "speaker": 0, "name": "John Smith"},
|
||||
{"id": "p2", "speaker": 1, "name": "Jane Doe"}
|
||||
],
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
## Response Structure
|
||||
|
||||
All formats return the same base transcript metadata with an additional `transcript_format` field and format-specific `transcript` field:
|
||||
|
||||
### Common Fields
|
||||
|
||||
- `id`: Transcript identifier
|
||||
- `user_id`: Owner user ID (if authenticated)
|
||||
- `name`: Transcript name
|
||||
- `status`: Processing status
|
||||
- `locked`: Whether transcript is locked for editing
|
||||
- `duration`: Total duration in seconds
|
||||
- `title`: Auto-generated or custom title
|
||||
- `short_summary`: Brief summary
|
||||
- `long_summary`: Detailed summary
|
||||
- `created_at`: Creation timestamp
|
||||
- `share_mode`: Access control setting
|
||||
- `source_language`: Original audio language
|
||||
- `target_language`: Translation target language
|
||||
- `reviewed`: Whether transcript has been reviewed
|
||||
- `meeting_id`: Associated meeting ID (if applicable)
|
||||
- `source_kind`: Source type (live, file, room)
|
||||
- `room_id`: Associated room ID (if applicable)
|
||||
- `audio_deleted`: Whether audio has been deleted
|
||||
- `participants`: Array of participant objects with speaker mappings
|
||||
|
||||
### Format-Specific Fields
|
||||
|
||||
- `transcript_format`: The format identifier (discriminator field)
|
||||
- `transcript`: The formatted transcript content (string for text/webvtt formats, array for json format)
|
||||
|
||||
## Speaker Name Resolution
|
||||
|
||||
All formats resolve speaker IDs to participant names when available:
|
||||
|
||||
- If a participant exists for the speaker ID, their name is used
|
||||
- If no participant exists, a default name like "Speaker 0" is generated
|
||||
- Speaker IDs are integers (0, 1, 2, etc.) assigned during diarization
|
||||
Reference in New Issue
Block a user