mirror of https://github.com/Monadical-SAS/reflector.git synced 2025-12-20 12:19:06 +00:00

Mathieu Virbel f6ca07505f feat: add transcript format parameter to GET endpoint (#709)

* feat: add transcript format parameter to GET endpoint

Add transcript_format query parameter to /v1/transcripts/{id} endpoint
with support for multiple output formats using discriminated unions.

Formats supported:
- text: Plain speaker dialogue (default)
- text-timestamped: Dialogue with [MM:SS] timestamps
- webvtt-named: WebVTT subtitles with participant names
- json: Structured segments with full metadata

Response models use Pydantic discriminated unions with transcript_format
as discriminator field. POST/PATCH endpoints return GetTranscriptWithParticipants
for minimal responses. GET endpoint returns format-specific models.

* Copy transcript format

* Regenerate types

* Fix transcript formats

* Don't throw inside try

* Remove any type

* Toast share copy errors

* transcript_format exhaustiveness and python idiomatic assert_never

* format_timestamp_mmss clear type definition

* Rename seconds_to_timestamp

* Test transcript format with overlapping speakers

* exact match for vtt multispeaker test

---------

Co-authored-by: Sergey Mankovsky <sergey@monadical.com>
Co-authored-by: Igor Loskutov <igor.loskutoff@gmail.com>

2025-11-26 18:51:14 +01:00

Transcript Formats

The Reflector API provides multiple output formats for transcript data through the transcript_format query parameter on the GET /v1/transcripts/{id} endpoint.

Overview

When retrieving a transcript, you can specify the desired format using the transcript_format query parameter. The API supports four formats optimized for different use cases:

- text: Plain text with speaker names (default)
- text-timestamped: Timestamped text with speaker names
- webvtt-named: WebVTT subtitle format with participant names
- json: Structured JSON segments with full metadata

All formats include participant information when available, resolving speaker IDs to actual names.

Query Parameter Usage

GET /v1/transcripts/{id}?transcript_format={format}

Parameters

transcript_format (optional): The desired output format
- Type: "text" | "text-timestamped" | "webvtt-named" | "json"
- Default: "text"
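
As a rough illustration of how a client might build the request URL, the sketch below uses only Python's standard library. The base URL `https://reflector.example.com` and the helper name `build_transcript_url` are placeholders chosen for this example; they are not part of the Reflector API.

```python
from urllib.parse import quote, urlencode

# The four values documented for the transcript_format parameter.
TRANSCRIPT_FORMATS = ("text", "text-timestamped", "webvtt-named", "json")


def build_transcript_url(base_url: str, transcript_id: str, transcript_format: str = "text") -> str:
    """Build the GET URL for a transcript in the requested format (hypothetical helper)."""
    if transcript_format not in TRANSCRIPT_FORMATS:
        raise ValueError(f"unsupported transcript_format: {transcript_format!r}")
    # Percent-encode the ID so it is safe to place in the path.
    path = f"/v1/transcripts/{quote(transcript_id, safe='')}"
    query = urlencode({"transcript_format": transcript_format})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Placeholder host and transcript ID, for illustration only.
print(build_transcript_url("https://reflector.example.com", "transcript_123", "webvtt-named"))
```

Validating the value on the client side is a design choice of this sketch; how the server responds to an unknown value is not described here.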

Format Descriptions

Text Format (`text`)

Use case: Simple, human-readable transcript for display or export.

Format: Speaker names followed by their dialogue, one line per segment.

Example:

John Smith: Hello everyone
Jane Doe: Hi there
John Smith: How are you today?

Request:

GET /v1/transcripts/{id}?transcript_format=text

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "text",
  "transcript": "John Smith: Hello everyone\nJane Doe: Hi there\nJohn Smith: How are you today?",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
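
To make the layout of the text format concrete, here is a minimal sketch that produces the same string from a list of segments. It assumes segments shaped like the ones in the json format (`speaker_name`, `text`); the function name `render_text` is made up for this example and is not taken from the Reflector codebase.

```python
def render_text(segments: list[dict]) -> str:
    """Join segments as 'Speaker Name: text', one line per segment."""
    return "\n".join(f"{seg['speaker_name']}: {seg['text']}" for seg in segments)


# Sample data mirroring the example above.
segments = [
    {"speaker": 0, "speaker_name": "John Smith", "text": "Hello everyone"},
    {"speaker": 1, "speaker_name": "Jane Doe", "text": "Hi there"},
    {"speaker": 0, "speaker_name": "John Smith", "text": "How are you today?"},
]

print(render_text(segments))
```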

Text Timestamped Format (`text-timestamped`)

Use case: Transcript with timing information for navigation or reference.

Format: Each line starts with a [MM:SS] timestamp, followed by the speaker name and dialogue.

Example:

[00:00] John Smith: Hello everyone
[00:05] Jane Doe: Hi there
[00:12] John Smith: How are you today?

Request:

GET /v1/transcripts/{id}?transcript_format=text-timestamped

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "text-timestamped",
  "transcript": "[00:00] John Smith: Hello everyone\n[00:05] Jane Doe: Hi there\n[00:12] John Smith: How are you today?",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
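
The [MM:SS] prefix can be sketched as follows. This is an illustration rather than the server's implementation: the helper names are hypothetical, and two behaviours are assumptions of this example, namely that fractional seconds are dropped and that minutes keep counting past 59 for recordings longer than an hour.

```python
def to_mmss(seconds: float) -> str:
    """Format a start time in seconds as a [MM:SS] prefix (hypothetical helper)."""
    total = int(seconds)  # assumption: fractional seconds are dropped
    minutes, secs = divmod(total, 60)  # assumption: minutes are not wrapped at 60
    return f"[{minutes:02d}:{secs:02d}]"


def render_text_timestamped(segments: list[dict]) -> str:
    """Prefix each 'Speaker Name: text' line with the segment's start time."""
    return "\n".join(
        f"{to_mmss(seg['start'])} {seg['speaker_name']}: {seg['text']}" for seg in segments
    )


# Sample data mirroring the example above.
segments = [
    {"speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0},
    {"speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0},
    {"speaker_name": "John Smith", "text": "How are you today?", "start": 12.0},
]

print(render_text_timestamped(segments))
```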

WebVTT Named Format (`webvtt-named`)

Use case: Subtitle files for video players, accessibility tools, or video editing.

Format: Standard WebVTT subtitle format with voice tags using participant names.

Example:

WEBVTT

00:00:00.000 --> 00:00:05.000
<v John Smith>Hello everyone

00:00:05.000 --> 00:00:12.000
<v Jane Doe>Hi there

00:00:12.000 --> 00:00:18.000
<v John Smith>How are you today?

Request:

GET /v1/transcripts/{id}?transcript_format=webvtt-named

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "webvtt-named",
  "transcript": "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\n<v John Smith>Hello everyone\n\n...",
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
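
The cue layout shown above can be reproduced with a short sketch. The function names are hypothetical, the segment shape is borrowed from the json format, and the trailing newline is a choice made for this example; the server's actual rendering may differ in such details.

```python
def to_vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_webvtt(segments: list[dict]) -> str:
    """Render one cue per segment, with the speaker name in a <v> voice tag."""
    cues = [
        f"{to_vtt_timestamp(seg['start'])} --> {to_vtt_timestamp(seg['end'])}\n"
        f"<v {seg['speaker_name']}>{seg['text']}"
        for seg in segments
    ]
    # The header and each cue are separated by a blank line.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


# Sample data mirroring the example above.
segments = [
    {"speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
    {"speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
    {"speaker_name": "John Smith", "text": "How are you today?", "start": 12.0, "end": 18.0},
]

print(render_webvtt(segments))
```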

JSON Format (`json`)

Use case: Programmatic access with full timing and speaker metadata.

Format: Array of segment objects with speaker information, text content, and precise timing.

Example:

[
  {
    "speaker": 0,
    "speaker_name": "John Smith",
    "text": "Hello everyone",
    "start": 0.0,
    "end": 5.0
  },
  {
    "speaker": 1,
    "speaker_name": "Jane Doe",
    "text": "Hi there",
    "start": 5.0,
    "end": 12.0
  },
  {
    "speaker": 0,
    "speaker_name": "John Smith",
    "text": "How are you today?",
    "start": 12.0,
    "end": 18.0
  }
]

Request:

GET /v1/transcripts/{id}?transcript_format=json

Response:

{
  "id": "transcript_123",
  "name": "Meeting Recording",
  "transcript_format": "json",
  "transcript": [
    {
      "speaker": 0,
      "speaker_name": "John Smith",
      "text": "Hello everyone",
      "start": 0.0,
      "end": 5.0
    },
    {
      "speaker": 1,
      "speaker_name": "Jane Doe",
      "text": "Hi there",
      "start": 5.0,
      "end": 12.0
    }
  ],
  "participants": [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"}
  ],
  ...
}
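
Because the json format exposes `start` and `end` for every segment, derived statistics are easy to compute on the client. As one possible use, the sketch below totals speaking time per speaker; the function name `speaking_time` is made up for this example.

```python
from collections import defaultdict


def speaking_time(segments: list[dict]) -> dict[str, float]:
    """Sum (end - start) per speaker name, in seconds."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg["speaker_name"]] += seg["end"] - seg["start"]
    return dict(totals)


# Segments copied from the example above.
segments = [
    {"speaker": 0, "speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
    {"speaker": 1, "speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
    {"speaker": 0, "speaker_name": "John Smith", "text": "How are you today?", "start": 12.0, "end": 18.0},
]

print(speaking_time(segments))  # {'John Smith': 11.0, 'Jane Doe': 7.0}
```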

Response Structure

All formats return the same base transcript metadata, plus a transcript_format field and a format-specific transcript field:

Common Fields

id: Transcript identifier
user_id: Owner user ID (if authenticated)
name: Transcript name
status: Processing status
locked: Whether transcript is locked for editing
duration: Total duration in seconds
title: Auto-generated or custom title
short_summary: Brief summary
long_summary: Detailed summary
created_at: Creation timestamp
share_mode: Access control setting
source_language: Original audio language
target_language: Translation target language
reviewed: Whether transcript has been reviewed
meeting_id: Associated meeting ID (if applicable)
source_kind: Source type (live, file, room)
room_id: Associated room ID (if applicable)
audio_deleted: Whether audio has been deleted
participants: Array of participant objects with speaker mappings

Format-Specific Fields

transcript_format: The format identifier (discriminator field)
transcript: The formatted transcript content (a string for the text, text-timestamped, and webvtt-named formats; an array of segment objects for the json format)
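
Since `transcript_format` acts as the discriminator, a client can branch on it before touching `transcript`. The following is a minimal client-side sketch using plain dictionaries; the function name `transcript_lines` is hypothetical, and the server-side Pydantic models are not reproduced here.

```python
STRING_FORMATS = ("text", "text-timestamped", "webvtt-named")


def transcript_lines(response: dict) -> list[str]:
    """Return the transcript as a list of lines, whatever the response format."""
    fmt = response["transcript_format"]
    transcript = response["transcript"]
    if fmt in STRING_FORMATS:
        # For these formats, transcript is a single string.
        return transcript.splitlines()
    if fmt == "json":
        # For json, transcript is an array of segment objects.
        return [f"{seg['speaker_name']}: {seg['text']}" for seg in transcript]
    raise ValueError(f"unknown transcript_format: {fmt!r}")


# Trimmed-down sample responses; real responses carry the common fields as well.
text_response = {
    "transcript_format": "text",
    "transcript": "John Smith: Hello everyone\nJane Doe: Hi there",
}
json_response = {
    "transcript_format": "json",
    "transcript": [
        {"speaker": 0, "speaker_name": "John Smith", "text": "Hello everyone", "start": 0.0, "end": 5.0},
        {"speaker": 1, "speaker_name": "Jane Doe", "text": "Hi there", "start": 5.0, "end": 12.0},
    ],
}

print(transcript_lines(text_response) == transcript_lines(json_response))
```

Raising on an unrecognised value keeps the branch exhaustive, so a format added later fails loudly instead of being silently mishandled.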

Speaker Name Resolution

All formats resolve speaker IDs to participant names when available:

- If a participant exists for the speaker ID, their name is used.
- If no participant exists, a default name like "Speaker 0" is generated.
- Speaker IDs are integers (0, 1, 2, etc.) assigned during diarization.
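
The resolution rules above can be sketched in a few lines. The function name `resolve_speaker_name` is hypothetical, and participants are assumed to be shaped like the entries of the `participants` array in the response examples.

```python
def resolve_speaker_name(speaker: int, participants: list[dict]) -> str:
    """Return the participant's name for a speaker ID, or a generated default."""
    for participant in participants:
        if participant.get("speaker") == speaker:
            return participant["name"]
    # No participant is mapped to this speaker ID.
    return f"Speaker {speaker}"


participants = [
    {"id": "p1", "speaker": 0, "name": "John Smith"},
    {"id": "p2", "speaker": 1, "name": "Jane Doe"},
]

print(resolve_speaker_name(1, participants))  # Jane Doe
print(resolve_speaker_name(2, participants))  # Speaker 2
```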
