reflector/docs/transcript.md
Mathieu Virbel f6ca07505f feat: add transcript format parameter to GET endpoint (#709)
* feat: add transcript format parameter to GET endpoint

Add transcript_format query parameter to /v1/transcripts/{id} endpoint
with support for multiple output formats using discriminated unions.

Formats supported:
- text: Plain speaker dialogue (default)
- text-timestamped: Dialogue with [MM:SS] timestamps
- webvtt-named: WebVTT subtitles with participant names
- json: Structured segments with full metadata

Response models use Pydantic discriminated unions with transcript_format
as discriminator field. POST/PATCH endpoints return GetTranscriptWithParticipants
for minimal responses. GET endpoint returns format-specific models.

* Copy transcript format

* Regenerate types

* Fix transcript formats

* Don't throw inside try

* Remove any type

* Toast share copy errors

* transcript_format exhaustiveness and python idiomatic assert_never

* format_timestamp_mmss clear type definition

* Rename seconds_to_timestamp

* Test transcript format with overlapping speakers

* exact match for vtt multispeaker test

---------

Co-authored-by: Sergey Mankovsky <sergey@monadical.com>
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>
2025-11-26 18:51:14 +01:00


Transcript Formats

The Reflector API provides multiple output formats for transcript data through the transcript_format query parameter on the GET /v1/transcripts/{id} endpoint.

Overview

Set transcript_format when retrieving a transcript to choose the output format. The API supports four formats, each suited to a different use case:

  • text - Plain text with speaker names (default)
  • text-timestamped - Timestamped text with speaker names
  • webvtt-named - WebVTT subtitle format with participant names
  • json - Structured JSON segments with full metadata

All formats include participant information when available, resolving speaker IDs to actual names.

Query Parameter Usage

GET /v1/transcripts/{id}?transcript_format={format}

Parameters

  • transcript_format (optional): The desired output format
    • Type: "text" | "text-timestamped" | "webvtt-named" | "json"
    • Default: "text"
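
As an illustration of the query parameter, the following sketch builds a request URL with Python's standard library. The helper name build_transcript_url and the host reflector.example.com are hypothetical and not part of the Reflector API. Only the path and the transcript_format values come from this document.

```python
from urllib.parse import quote, urlencode

# Allowed values, copied from the parameter description above.
TRANSCRIPT_FORMATS = ("text", "text-timestamped", "webvtt-named", "json")


def build_transcript_url(
    base_url: str, transcript_id: str, transcript_format: str = "text"
) -> str:
    """Return the GET URL for a transcript in the requested format.

    Hypothetical helper: validates the format client-side before building the URL.
    """
    if transcript_format not in TRANSCRIPT_FORMATS:
        raise ValueError(f"unsupported transcript_format: {transcript_format!r}")
    # Percent-encode the ID so it is safe to place in the URL path.
    path = f"/v1/transcripts/{quote(transcript_id, safe='')}"
    query = urlencode({"transcript_format": transcript_format})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Example with a placeholder host:
url = build_transcript_url(
    "https://reflector.example.com", "transcript_123", "webvtt-named"
)
```

Omitting the third argument requests the default text format.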

Format Descriptions

Text Format (text)

Use case: Simple, human-readable transcript for display or export.

Format: Speaker names followed by their dialogue, one line per segment.

Example:

John Smith: Hello everyone
Jane Doe: Hi there
John Smith: How are you today?

Request:

GET /v1/transcripts/{id}?transcript_format=text

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "text",
  "transcript": "John Smith: Hello everyone\nJane Doe: Hi there\nJohn Smith: How are you today?",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}

Text Timestamped Format (text-timestamped)

Use case: Transcript with timing information for navigation or reference.

Format: Each line starts with a [MM:SS] timestamp, followed by the speaker name and their dialogue.

Example:

[00:00] John Smith: Hello everyone
[00:05] Jane Doe: Hi there
[00:12] John Smith: How are you today?

Request:

GET /v1/transcripts/{id}?transcript_format=text-timestamped

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "text-timestamped",
  "transcript": "[00:00] John Smith: Hello everyone\n[00:05] Jane Doe: Hi there\n[00:12] John Smith: How are you today?",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
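
The line layout above can be reproduced from segment data with a few lines of Python. This is a minimal sketch, not the server's implementation: the function names are hypothetical, and it assumes that fractional seconds are truncated and that the minutes field simply grows past 59 for long recordings.

```python
def mmss(seconds: float) -> str:
    """Format a start time in seconds as [MM:SS] (assumption: truncate fractions)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def render_text_timestamped(segments: list) -> str:
    """Join segments into one '[MM:SS] Name: text' line per segment."""
    return "\n".join(
        f"{mmss(seg['start'])} {seg['speaker_name']}: {seg['text']}"
        for seg in segments
    )


# Segments matching the example in this section.
segments = [
    {"speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0},
    {"speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0},
    {"speaker_name": "John Smith", "text": "How are you today?", "start": 12.0},
]
rendered = render_text_timestamped(segments)
```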

WebVTT Named Format (webvtt-named)

Use case: Subtitle files for video players, accessibility tools, or video editing.

Format: Standard WebVTT subtitle format with voice tags using participant names.

Example:

WEBVTT

00:00:00.000 --> 00:00:05.000
<v John Smith>Hello everyone

00:00:05.000 --> 00:00:12.000
<v Jane Doe>Hi there

00:00:12.000 --> 00:00:18.000
<v John Smith>How are you today?

Request:

GET /v1/transcripts/{id}?transcript_format=webvtt-named

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "webvtt-named",
  "transcript": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\n<v John Smith>Hello everyone\n\n...",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
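
To make the cue layout concrete, the sketch below renders segments into the WebVTT shape shown above. The function names are hypothetical and this is not the server's implementation. It also assumes speaker names and text contain no characters that need WebVTT escaping (such as & or <).

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_webvtt(segments: list) -> str:
    """Render one cue per segment, using a <v Name> voice tag for the speaker."""
    cues = [
        f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}\n"
        f"<v {seg['speaker_name']}>{seg['text']}"
        for seg in segments
    ]
    # The header line must be followed by a blank line before the first cue.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


vtt = render_webvtt(
    [
        {"speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
        {"speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
    ]
)
```

On the client side, the transcript string from a webvtt-named response can be written to a .vtt file as-is.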

JSON Format (json)

Use case: Programmatic access with full timing and speaker metadata.

Format: Array of segment objects with speaker information, text content, and precise timing.

Example:

[
  {
    "speaker": 0,
    "speaker_name": "John Smith",
    "text": "Hello everyone",
    "start": 0.0,
    "end": 5.0
  },
  {
    "speaker": 1,
    "speaker_name": "Jane Doe",
    "text": "Hi there",
    "start": 5.0,
    "end": 12.0
  },
  {
    "speaker": 0,
    "speaker_name": "John Smith",
    "text": "How are you today?",
    "start": 12.0,
    "end": 18.0
  }
]

Request:

GET /v1/transcripts/{id}?transcript_format=json

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "json",
  "transcript": [
    {
      "speaker": 0,
      "speaker_name": "John Smith",
      "text": "Hello everyone",
      "start": 0.0,
      "end": 5.0
    },
    {
      "speaker": 1,
      "speaker_name": "Jane Doe",
      "text": "Hi there",
      "start": 5.0,
      "end": 12.0
    }
  ],
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
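
A client might consume the json format along these lines. This is a sketch under stated assumptions: the Segment class and both helper functions are hypothetical, and only the five segment fields shown in the example are read, so any additional fields the API returns are ignored.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: int
    speaker_name: str
    text: str
    start: float  # seconds
    end: float  # seconds


def parse_segments(response_body: str) -> list:
    """Parse a json-format response body into Segment objects."""
    payload = json.loads(response_body)
    if payload.get("transcript_format") != "json":
        raise ValueError("expected a response with transcript_format 'json'")
    return [
        Segment(
            speaker=item["speaker"],
            speaker_name=item["speaker_name"],
            text=item["text"],
            start=float(item["start"]),
            end=float(item["end"]),
        )
        for item in payload["transcript"]
    ]


def talk_time(segments: list) -> dict:
    """Sum segment durations, in seconds, per speaker name."""
    totals = {}
    for seg in segments:
        totals[seg.speaker_name] = totals.get(seg.speaker_name, 0.0) + (seg.end - seg.start)
    return totals


# A trimmed response body based on the example above.
body = json.dumps(
    {
        "id": "transcript_123",
        "transcript_format": "json",
        "transcript": [
            {"speaker": 0, "speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
            {"speaker": 1, "speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
        ],
    }
)
parsed = parse_segments(body)
```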

Response Structure

All formats return the same base transcript metadata, plus a transcript_format field and a format-specific transcript field:

Common Fields

  • id: Transcript identifier
  • user_id: Owner user ID (if authenticated)
  • name: Transcript name
  • status: Processing status
  • locked: Whether transcript is locked for editing
  • duration: Total duration in seconds
  • title: Auto-generated or custom title
  • short_summary: Brief summary
  • long_summary: Detailed summary
  • created_at: Creation timestamp
  • share_mode: Access control setting
  • source_language: Original audio language
  • target_language: Translation target language
  • reviewed: Whether transcript has been reviewed
  • meeting_id: Associated meeting ID (if applicable)
  • source_kind: Source type (live, file, room)
  • room_id: Associated room ID (if applicable)
  • audio_deleted: Whether audio has been deleted
  • participants: Array of participant objects with speaker mappings

Format-Specific Fields

  • transcript_format: The format identifier (discriminator field)
  • transcript: The formatted transcript content (a string for the text, text-timestamped, and webvtt-named formats; an array of segment objects for the json format)
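
Because transcript_format acts as a discriminator, a client can branch on it before touching the transcript field. The following is a minimal sketch of that pattern using plain dictionaries. The function summarize_transcript is hypothetical and exists only to show the dispatch.

```python
from typing import Any, Dict

# Formats whose transcript field is a single string.
STRING_FORMATS = ("text", "text-timestamped", "webvtt-named")


def summarize_transcript(response: Dict[str, Any]) -> str:
    """Describe a response by dispatching on its transcript_format value."""
    fmt = response["transcript_format"]
    body = response["transcript"]
    if fmt in STRING_FORMATS:
        if not isinstance(body, str):
            raise TypeError(f"{fmt} responses should carry a string transcript")
        return f"{fmt}: {len(body.splitlines())} lines"
    if fmt == "json":
        if not isinstance(body, list):
            raise TypeError("json responses should carry a list of segments")
        return f"json: {len(body)} segments"
    # Fail loudly on a value this client does not know about.
    raise ValueError(f"unknown transcript_format: {fmt!r}")


text_summary = summarize_transcript(
    {"transcript_format": "text", "transcript": "John Smith: Hello everyone\nJane Doe: Hi there"}
)
```

Raising on unknown values keeps the client from silently misreading the transcript field if new formats are added later.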

Speaker Name Resolution

All formats resolve speaker IDs to participant names when available:

  • If a participant exists for the speaker ID, their name is used
  • If no participant exists, a default name like "Speaker 0" is generated
  • Speaker IDs are integers (0, 1, 2, etc.) assigned during diarization
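
The resolution rules above can be sketched as a small lookup. The function name resolve_speaker_name is hypothetical. The participant shape follows the participants arrays in the example responses.

```python
def resolve_speaker_name(speaker: int, participants: list) -> str:
    """Return the participant name for a speaker ID, or a 'Speaker N' fallback."""
    for participant in participants:
        if participant.get("speaker") == speaker:
            return participant["name"]
    # No participant is mapped to this speaker ID.
    return f"Speaker {speaker}"


participants = [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"},
]
known = resolve_speaker_name(1, participants)
unknown = resolve_speaker_name(2, participants)
```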