DataIndex API Reference
DataIndex aggregates data from all connected sources (email, calendar, Zulip, meetings, documents) into a unified query interface. Every piece of data is an entity with a common base structure plus type-specific fields.
Base URL: http://localhost:42000/dataindex/api/v1 (via Caddy) or http://localhost:42180/api/v1 (direct)
Entity Types
All entities share these base fields:
| Field | Type | Description |
|---|---|---|
| id | string | Format: connector_name:native_id |
| entity_type | string | One of the types below |
| timestamp | datetime | When the entity occurred |
| contact_ids | string[] | ContactDB IDs of people involved |
| connector_id | string | Which connector produced this |
| title | string? | Display title |
| parent_id | string? | Parent entity (e.g., thread for a message) |
| raw_data | dict | Original source data (excluded by default) |
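As an illustration of the id format above, here is a minimal sketch that splits an entity id into its two parts. The names EntityRef and parse_entity_id are hypothetical, and the sketch assumes connector names never contain a colon while native IDs might, so it splits on the first colon only.

```python
from typing import NamedTuple


class EntityRef(NamedTuple):
    """Hypothetical container for the two halves of an entity id."""
    connector_name: str
    native_id: str


def parse_entity_id(entity_id: str) -> EntityRef:
    """Split 'connector_name:native_id' on the first colon only."""
    # Assumption: the connector name has no colon; the native id may have some.
    connector_name, sep, native_id = entity_id.partition(":")
    if not sep or not connector_name or not native_id:
        raise ValueError(f"not a connector_name:native_id value: {entity_id!r}")
    return EntityRef(connector_name, native_id)
```

For example, parse_entity_id("zulip:12345").connector_name gives "zulip", which can be handy for grouping mixed results by source.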
calendar_event
From ICS calendar feeds.
| Field | Type | Description |
|---|---|---|
| start_time | datetime? | Event start |
| end_time | datetime? | Event end |
| all_day | bool | All-day event flag |
| description | string? | Event description |
| location | string? | Event location |
| attendees | dict[] | Attendee list |
| organizer_contact_id | string? | ContactDB ID of organizer |
| status | string? | Event status |
| calendar_name | string? | Source calendar name |
| meeting_url | string? | Video call link |
meeting
From Reflector (recorded meetings with transcripts).
| Field | Type | Description |
|---|---|---|
| start_time | datetime? | Meeting start |
| end_time | datetime? | Meeting end |
| participants | MeetingParticipant[] | People in the meeting |
| meeting_platform | string? | Platform (e.g., "jitsi") |
| transcript | string? | Full transcript text |
| summary | string? | AI-generated summary |
| meeting_url | string? | Meeting link |
| recording_url | string? | Recording link |
| location | string? | Physical location |
| room_name | string? | Virtual room name (also indicates meeting location; see below) |
MeetingParticipant fields: display_name, contact_id?, platform_user_id?, email?, speaker?
room_name as location indicator: The room_name field often encodes where the meeting took place (e.g., a Jitsi room name like standup-office-bogota). Use it to infer the meeting location when the location field is not set.
Participant and contact coverage is incomplete. Meeting data comes from Reflector, which only tracks users who are logged into the Reflector platform. This means:
- contact_ids only contains ContactDB IDs for Reflector-logged participants who were matched to a known contact. It is often a subset of the actual attendees, so do not assume it is the full list.
- participants is more complete than contact_ids but still only includes people detected by Reflector. Not all participants have accounts or could be identified, so some attendees may be entirely absent from this list.
- contact_id within a participant may be null if the person was detected but couldn't be matched to a ContactDB entry.

Consequence for queries: Filtering meetings by contact_ids will miss meetings where the person attended but wasn't logged into Reflector or wasn't resolved. To get better coverage, combine multiple strategies:
- Filter by contact_ids for resolved participants
- Search participants[].display_name client-side for name matches
- Use POST /search with the person's name to search meeting transcripts and summaries
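The first two strategies can be combined client-side once meeting entities have been fetched. The following is a rough sketch, not part of the API: meeting_involves is a hypothetical helper, it assumes meetings arrive as plain dicts shaped like the tables above, and it compares contact IDs as strings because the exact type of a participant's contact_id is not stated here.

```python
from typing import Any


def meeting_involves(meeting: dict[str, Any], contact_id: str, name: str) -> bool:
    """Return True if a meeting entity appears to involve the given person."""
    # Strategy 1: resolved ContactDB IDs on the entity itself.
    if contact_id in (meeting.get("contact_ids") or []):
        return True
    needle = name.casefold()
    for participant in meeting.get("participants") or []:
        # A participant's contact_id may be null, so check it defensively.
        pid = participant.get("contact_id")
        if pid is not None and str(pid) == contact_id:
            return True
        # Strategy 2: case-insensitive substring match on display_name.
        display_name = participant.get("display_name") or ""
        if needle and needle in display_name.casefold():
            return True
    return False
```

Substring matching on names can produce false positives (for example, two people sharing a first name), so results are best treated as candidates. Transcript and summary matches would still need a separate POST /search call.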
email
From mbsync email sync.
| Field | Type | Description |
|---|---|---|
| thread_id | string? | Email thread grouping |
| text_content | string? | Plain text body |
| html_content | string? | HTML body |
| snippet | string? | Preview snippet |
| from_contact_id | string? | Sender's ContactDB ID |
| to_contact_ids | string[] | Recipient ContactDB IDs |
| cc_contact_ids | string[] | CC recipient ContactDB IDs |
| has_attachments | bool | Has attachments flag |
| attachments | dict[] | Attachment metadata |
conversation
A Zulip stream/channel.
| Field | Type | Description |
|---|---|---|
| recent_messages | dict[] | Recent messages in the conversation |
conversation_message
A single message in a Zulip conversation.
| Field | Type | Description |
|---|---|---|
| message | string? | Message text content |
| mentioned_contact_ids | string[] | ContactDB IDs of mentioned people |
threaded_conversation
A Zulip topic thread (group of messages under a topic).
| Field | Type | Description |
|---|---|---|
| recent_messages | dict[] | Recent messages in the thread |
document
From HedgeDoc, API ingestion, or other document sources.
| Field | Type | Description |
|---|---|---|
| content | string? | Document body text |
| description | string? | Document description |
| mimetype | string? | MIME type |
| url | string? | Source URL |
| revision_id | string? | Revision identifier |
webpage
From browser history extension.
| Field | Type | Description |
|---|---|---|
| url | string | Page URL |
| visit_time | datetime | When visited |
| text_content | string? | Page text content |
REST Endpoints
GET /api/v1/query — Exhaustive Filtered Enumeration
Use when you need all entities matching specific criteria. Supports pagination.
When to use: "List all meetings since January", "Get all emails from Alice", "Count calendar events this week"
Query parameters:
| Parameter | Type | Description |
|---|---|---|
| entity_types | string (repeat) | Filter by type; repeat the parameter for multiple values: ?entity_types=email&entity_types=meeting |
| contact_ids | string | Comma-separated ContactDB IDs: "1,42" |
| connector_ids | string | Comma-separated connector IDs: "zulip,reflector" |
| date_from | string | ISO datetime lower bound (UTC if no timezone) |
| date_to | string | ISO datetime upper bound |
| search | string? | Text filter on content fields |
| parent_id | string? | Filter by parent entity |
| thread_id | string? | Filter emails by thread ID |
| room_name | string? | Filter meetings by room name |
| limit | int | Max results per page (default 50) |
| offset | int | Pagination offset (default 0) |
| sort_by | string | "timestamp" (default), "title", "contact_activity", etc. |
| sort_order | string | "desc" (default) or "asc" |
| include_raw_data | bool | Include raw_data field (default false) |
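The two list encodings in this table are easy to mix up: entity_types is repeated, while contact_ids and connector_ids are comma-separated. The sketch below builds a query URL with the standard library to show the difference. build_query_url is a hypothetical helper, the Caddy base URL is taken from the top of this document, and sending booleans as lowercase true/false is an assumption.

```python
from urllib.parse import urlencode

# Base URL as listed at the top of this reference (via Caddy).
BASE_URL = "http://localhost:42000/dataindex/api/v1"


def build_query_url(entity_types=(), contact_ids=(), connector_ids=(), **filters):
    """Build a GET /query URL from the documented parameters."""
    # entity_types is a repeated parameter: one key=value pair per type.
    params = [("entity_types", t) for t in entity_types]
    # contact_ids and connector_ids are single comma-separated strings.
    if contact_ids:
        params.append(("contact_ids", ",".join(contact_ids)))
    if connector_ids:
        params.append(("connector_ids", ",".join(connector_ids)))
    for key, value in filters.items():
        if value is None:
            continue
        if isinstance(value, bool):
            # Assumption: the server accepts lowercase booleans.
            value = "true" if value else "false"
        params.append((key, value))
    return f"{BASE_URL}/query?{urlencode(params)}"


url = build_query_url(entity_types=["email", "meeting"], contact_ids=["1", "42"], limit=50)
```

Here url contains entity_types twice and a single percent-encoded contact_ids value (the comma becomes %2C).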
Response format:
{
"items": [...],
"total": 152,
"page": 1,
"size": 50,
"pages": 4
}
Pagination: loop with offset increments until offset >= total. See notebook-patterns.md for a reusable helper.
POST /api/v1/search — Semantic Search
Use when you need relevant results for a natural-language question. Returns ranked text chunks. No pagination — set a higher limit instead.
When to use: "What was discussed about the product roadmap?", "Find conversations about hiring"
Request body (JSON):
{
"search_text": "product roadmap decisions",
"entity_types": ["meeting", "threaded_conversation"],
"contact_ids": ["1", "42"],
"date_from": "2025-01-01T00:00:00Z",
"date_to": "2025-06-01T00:00:00Z",
"connector_ids": ["reflector", "zulip"],
"limit": 20
}
Response: {results: [...chunks], total_count} — each chunk has entity_ids, entity_type, connector_id, content, timestamp.
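A minimal standard-library sketch of sending this request follows. The function names are hypothetical, the URL assumes the Caddy base URL from the top of this document, and the sketch assumes the server expects a Content-Type of application/json.

```python
import json
import urllib.request

# Assumption: Caddy base URL from the top of this reference plus /search.
SEARCH_URL = "http://localhost:42000/dataindex/api/v1/search"


def build_search_request(search_text, limit=20, **filters):
    """Build (but do not send) a POST /search request."""
    body = {"search_text": search_text, "limit": limit}
    # Optional filters such as entity_types, contact_ids, date_from, date_to.
    body.update({k: v for k, v in filters.items() if v is not None})
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(search_text, **kwargs):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_search_request(search_text, **kwargs)) as resp:
        return json.load(resp)
```

Because there is no pagination, a caller who wants broader coverage would pass a larger limit, for example search("hiring", entity_types=["threaded_conversation"], limit=100).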
GET /api/v1/entities/{id} — Get Entity by ID
Retrieve full details of a single entity. The id path parameter uses the same format as the base id field: connector_name:native_id.
GET /api/v1/connectors/status — Connector Status
Get sync status for all connectors (last sync time, entity count, health).
Common Query Recipes
| Question | entity_type + connector_id |
|---|---|
| Meetings I attended | meeting + reflector, with your contact_id |
| Upcoming calendar events | calendar_event + ics_calendar, date_from=now |
| Emails from someone | email + mbsync_email, with their contact_id |
| Zulip threads about a topic | threaded_conversation + zulip, search="topic" |
| All documents | document + hedgedoc |
| Chat messages mentioning someone | conversation_message + zulip, with contact_id |
| What was discussed about X? | Use POST /search with search_text |
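As one worked recipe, the "Upcoming calendar events" row can be expressed as a parameter dict. This is a sketch under assumptions: upcoming_events_params is a hypothetical helper, and sort_order "asc" is chosen here so the soonest events come first; the table itself does not specify a sort.

```python
from datetime import datetime, timezone


def upcoming_events_params(now=None):
    """Query parameters for calendar events starting from the current time."""
    now = now or datetime.now(timezone.utc)
    return {
        "entity_types": ["calendar_event"],
        "connector_ids": "ics_calendar",
        # Timezone-aware ISO datetime, so the server need not assume UTC.
        "date_from": now.isoformat(),
        # Assumption: ascending order lists the soonest events first.
        "sort_order": "asc",
    }


params = upcoming_events_params(datetime(2025, 1, 1, tzinfo=timezone.utc))
```

The date_from value contains a "+" in its UTC offset, so it should be URL-encoded when placed in a query string; most HTTP clients do this automatically when given a params dict.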