# DataIndex API Reference

DataIndex aggregates data from all connected sources (email, calendar, Zulip, meetings, documents) into a unified query interface. Every piece of data is an **entity** with a common base structure plus type-specific fields.

**Base URL:** `http://localhost:42000/dataindex/api/v1` (via Caddy) or `http://localhost:42180/api/v1` (direct)

## Entity Types

All entities share these base fields:

| Field          | Type     | Description                                 |
|----------------|----------|---------------------------------------------|
| `id`           | string   | Format: `connector_name:native_id`          |
| `entity_type`  | string   | One of the types below                      |
| `timestamp`    | datetime | When the entity occurred                    |
| `contact_ids`  | string[] | ContactDB IDs of people involved            |
| `connector_id` | string   | Which connector produced this               |
| `title`        | string?  | Display title                               |
| `parent_id`    | string?  | Parent entity (e.g., thread for a message)  |
| `raw_data`     | dict     | Original source data (excluded by default)  |

### `calendar_event`

From ICS calendar feeds.

| Field                  | Type      | Description               |
|------------------------|-----------|---------------------------|
| `start_time`           | datetime? | Event start               |
| `end_time`             | datetime? | Event end                 |
| `all_day`              | bool      | All-day event flag        |
| `description`          | string?   | Event description         |
| `location`             | string?   | Event location            |
| `attendees`            | dict[]    | Attendee list             |
| `organizer_contact_id` | string?   | ContactDB ID of organizer |
| `status`               | string?   | Event status              |
| `calendar_name`        | string?   | Source calendar name      |
| `meeting_url`          | string?   | Video call link           |

### `meeting`

From Reflector (recorded meetings with transcripts).

| Field              | Type                 | Description                                                    |
|--------------------|----------------------|----------------------------------------------------------------|
| `start_time`       | datetime?            | Meeting start                                                  |
| `end_time`         | datetime?            | Meeting end                                                    |
| `participants`     | MeetingParticipant[] | People in the meeting                                          |
| `meeting_platform` | string?              | Platform (e.g., "jitsi")                                       |
| `transcript`       | string?              | Full transcript text                                           |
| `summary`          | string?              | AI-generated summary                                           |
| `meeting_url`      | string?              | Meeting link                                                   |
| `recording_url`    | string?              | Recording link                                                 |
| `location`         | string?              | Physical location                                              |
| `room_name`        | string?              | Virtual room name (also indicates meeting location; see below) |

**MeetingParticipant** fields: `display_name`, `contact_id?`, `platform_user_id?`, `email?`, `speaker?`

> **`room_name` as location indicator:** The `room_name` field often encodes where the meeting took place (e.g., a Jitsi room name like `standup-office-bogota`). Use it to infer the meeting location when `location` is not set.

> **Participant and contact coverage is incomplete.** Meeting data comes from Reflector, which only tracks users who are logged into the Reflector platform. This means:
>
> - **`contact_ids`** only contains ContactDB IDs for Reflector-logged participants who were matched to a known contact. It is often a **subset** of the actual attendees, so do not assume it is the full list.
> - **`participants`** is more complete than `contact_ids` but still only includes people detected by Reflector. Not all participants have accounts or could be identified, so some attendees may be entirely absent from this list.
> - **`contact_id` within a participant** may be `null` if the person was detected but couldn't be matched to a ContactDB entry.
>
> **Consequence for queries:** Filtering meetings by `contact_ids` will **miss meetings** where the person attended but wasn't logged into Reflector or wasn't resolved. To get better coverage, combine multiple strategies:
>
> 1. Filter by `contact_ids` for resolved participants
> 2. Search `participants[].display_name` client-side for name matches
> 3. Use `POST /search` with the person's name to search meeting transcripts and summaries

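As a rough illustration of strategies 1 and 2 above, the sketch below checks already-fetched `meeting` entities on the client side. The function name `meeting_involves` and the sample entities are hypothetical; it assumes each entity is a plain dict shaped like the field tables above, with `contact_ids` as a list of strings and each participant as a dict. Strategy 3 still requires a separate `POST /search` call.

```python
def meeting_involves(meeting: dict, contact_id: str, name: str) -> bool:
    """Return True if a meeting entity plausibly involves the given person."""
    # Strategy 1: the person was resolved to a ContactDB ID at the entity level.
    if contact_id in (meeting.get("contact_ids") or []):
        return True
    needle = name.casefold()
    for participant in meeting.get("participants") or []:
        # The participant entry itself may be resolved to the contact.
        if participant.get("contact_id") == contact_id:
            return True
        # Strategy 2: fall back to a case-insensitive display-name match.
        display_name = participant.get("display_name") or ""
        if needle and needle in display_name.casefold():
            return True
    return False


# Made-up sample entities, for illustration only.
meetings = [
    {"id": "reflector:a", "contact_ids": ["42"], "participants": []},
    {"id": "reflector:b", "contact_ids": [],
     "participants": [{"display_name": "Alice Example", "contact_id": None}]},
    {"id": "reflector:c", "contact_ids": ["7"],
     "participants": [{"display_name": "Bob", "contact_id": "7"}]},
]
matched = [m["id"] for m in meetings if meeting_involves(m, "42", "alice")]
print(matched)
```

Substring matching on display names can produce false positives (for example, two people who share a first name), so treat name-only matches as candidates rather than confirmed attendance.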
### `email`

From mbsync email sync.

| Field             | Type     | Description                |
|-------------------|----------|----------------------------|
| `thread_id`       | string?  | Email thread grouping      |
| `text_content`    | string?  | Plain text body            |
| `html_content`    | string?  | HTML body                  |
| `snippet`         | string?  | Preview snippet            |
| `from_contact_id` | string?  | Sender's ContactDB ID      |
| `to_contact_ids`  | string[] | Recipient ContactDB IDs    |
| `cc_contact_ids`  | string[] | CC recipient ContactDB IDs |
| `has_attachments` | bool     | Has attachments flag       |
| `attachments`     | dict[]   | Attachment metadata        |

### `conversation`

A Zulip stream/channel.

| Field             | Type   | Description                         |
|-------------------|--------|-------------------------------------|
| `recent_messages` | dict[] | Recent messages in the conversation |

### `conversation_message`

A single message in a Zulip conversation.

| Field                   | Type     | Description                       |
|-------------------------|----------|-----------------------------------|
| `message`               | string?  | Message text content              |
| `mentioned_contact_ids` | string[] | ContactDB IDs of mentioned people |

### `threaded_conversation`

A Zulip topic thread (group of messages under a topic).

| Field             | Type   | Description                   |
|-------------------|--------|-------------------------------|
| `recent_messages` | dict[] | Recent messages in the thread |

### `document`

From HedgeDoc, API ingestion, or other document sources.

| Field         | Type    | Description          |
|---------------|---------|----------------------|
| `content`     | string? | Document body text   |
| `description` | string? | Document description |
| `mimetype`    | string? | MIME type            |
| `url`         | string? | Source URL           |
| `revision_id` | string? | Revision identifier  |

### `webpage`

From browser history extension.

| Field          | Type     | Description       |
|----------------|----------|-------------------|
| `url`          | string   | Page URL          |
| `visit_time`   | datetime | When visited      |
| `text_content` | string?  | Page text content |

## REST Endpoints

### GET `/api/v1/query`: Exhaustive Filtered Enumeration

Use when you need **all** entities matching specific criteria. Supports pagination.

**When to use:** "List all meetings since January", "Get all emails from Alice", "Count calendar events this week"

**Query parameters:**

| Parameter          | Type            | Description                                                                                             |
|--------------------|-----------------|---------------------------------------------------------------------------------------------------------|
| `entity_types`     | string (repeat) | Filter by type; repeat the parameter for multiple values: `?entity_types=email&entity_types=meeting`    |
| `contact_ids`      | string          | Comma-separated ContactDB IDs: `"1,42"`                                                                 |
| `connector_ids`    | string          | Comma-separated connector IDs: `"zulip,reflector"`                                                      |
| `date_from`        | string          | ISO datetime lower bound (UTC if no timezone)                                                           |
| `date_to`          | string          | ISO datetime upper bound                                                                                |
| `search`           | string?         | Text filter on content fields                                                                           |
| `parent_id`        | string?         | Filter by parent entity                                                                                 |
| `thread_id`        | string?         | Filter emails by thread ID                                                                              |
| `room_name`        | string?         | Filter meetings by room name                                                                            |
| `limit`            | int             | Max results per page (default 50)                                                                       |
| `offset`           | int             | Pagination offset (default 0)                                                                           |
| `sort_by`          | string          | `"timestamp"` (default), `"title"`, `"contact_activity"`, etc.                                          |
| `sort_order`       | string          | `"desc"` (default) or `"asc"`                                                                           |
| `include_raw_data` | bool            | Include the `raw_data` field (default false)                                                            |

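The two list encodings in the table above (a repeated `entity_types` parameter versus comma-joined `contact_ids` and `connector_ids`) are easy to mix up, so here is a minimal sketch of building the query string with the standard library. The helper name `build_query_string` is hypothetical, and the lowercase `true`/`false` spelling for booleans is an assumption about what the server accepts.

```python
from urllib.parse import urlencode


def build_query_string(entity_types=(), contact_ids=(), connector_ids=(), **other):
    """Encode /query filters: repeated entity_types, comma-joined ID lists."""
    # entity_types is sent once per value: entity_types=a&entity_types=b
    params = [("entity_types", t) for t in entity_types]
    # contact_ids and connector_ids are each a single comma-separated value.
    if contact_ids:
        params.append(("contact_ids", ",".join(str(c) for c in contact_ids)))
    if connector_ids:
        params.append(("connector_ids", ",".join(connector_ids)))
    for key, value in other.items():
        if value is None:
            continue  # omit unset filters
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean spelling
        params.append((key, str(value)))
    # urlencode percent-encodes commas and colons; servers decode them back.
    return urlencode(params)


query = build_query_string(
    entity_types=["email", "meeting"],
    contact_ids=[1, 42],
    date_from="2025-01-01T00:00:00Z",
    limit=50,
)
print(query)
```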
**Response format:**

```json
{
  "items": [...],
  "total": 152,
  "page": 1,
  "size": 50,
  "pages": 4
}
```

**Pagination:** loop with offset increments until `offset >= total`. See [notebook-patterns.md] for a reusable helper.

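The loop described above can be sketched as a small generator. This is an illustrative version, not the helper from [notebook-patterns.md]: the names `iter_entities` and `http_fetcher` are hypothetical, and it assumes the endpoint is reachable as `/query` relative to the Base URL given at the top of this document.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def iter_entities(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every item by advancing offset until offset >= total."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page.get("items", [])
        yield from items
        offset += limit
        # Stop once the offset passes the reported total; the empty-page
        # check is a defensive guard against a total that changes mid-scan.
        if offset >= page.get("total", 0) or not items:
            break


def http_fetcher(base_url: str, filters: dict) -> Callable[[int, int], dict]:
    """Build a page fetcher for GET {base_url}/query (path is an assumption)."""
    def fetch_page(offset: int, limit: int) -> dict:
        # doseq=True turns list values into repeated parameters.
        query = urlencode({**filters, "limit": limit, "offset": offset}, doseq=True)
        with urlopen(f"{base_url}/query?{query}") as response:
            return json.load(response)
    return fetch_page
```

Against a running instance, usage would look like `list(iter_entities(http_fetcher("http://localhost:42180/api/v1", {"entity_types": ["meeting"]})))`. Passing the fetcher in as a function keeps the loop itself testable without a server.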
### POST `/api/v1/search`: Semantic Search

Use when you need **relevant** results for a natural-language question. Returns ranked text chunks. There is no pagination; set a higher `limit` instead.

**When to use:** "What was discussed about the product roadmap?", "Find conversations about hiring"

**Request body (JSON):**

```json
{
  "search_text": "product roadmap decisions",
  "entity_types": ["meeting", "threaded_conversation"],
  "contact_ids": ["1", "42"],
  "date_from": "2025-01-01T00:00:00Z",
  "date_to": "2025-06-01T00:00:00Z",
  "connector_ids": ["reflector", "zulip"],
  "limit": 20
}
```

**Response:** `{results: [...chunks], total_count}`. Each chunk has `entity_ids`, `entity_type`, `connector_id`, `content`, `timestamp`.

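A minimal sketch of calling the search endpoint and reducing the ranked chunks to the entities they came from. The function names are hypothetical, the `/search` path is assumed to be relative to the Base URL, and `entity_ids` is assumed to be a list of entity ID strings on each chunk.

```python
import json
from urllib.request import Request, urlopen


def build_search_payload(search_text: str, limit: int = 20, **filters) -> dict:
    """Assemble the JSON body, leaving out filters that are not set."""
    payload = {"search_text": search_text, "limit": limit}
    payload.update({k: v for k, v in filters.items() if v is not None})
    return payload


def post_search(base_url: str, payload: dict) -> dict:
    """POST the payload to {base_url}/search (path is an assumption)."""
    request = Request(
        f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)


def unique_entity_ids(response: dict) -> list[str]:
    """Collect entity IDs from ranked chunks, keeping first-seen order."""
    seen: dict[str, None] = {}
    for chunk in response.get("results", []):
        for entity_id in chunk.get("entity_ids", []):
            seen.setdefault(entity_id, None)
    return list(seen)
```

Because several chunks can point at the same entity, de-duplicating while preserving rank order gives a list of IDs that can then be passed to the entity-by-ID endpoint for full details.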
### GET `/api/v1/entities/{id}`: Get Entity by ID

Retrieve full details of a single entity. The `id` path parameter uses the entity ID format `connector_name:native_id`.

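For illustration, the sketch below splits an entity ID into its two parts and builds the request URL. Both helper names are hypothetical. It assumes the connector name never contains a colon (so the split happens at the first one), that the path is `/entities/{id}` relative to the Base URL, and that percent-encoding characters such as `/` in the native ID is what the server expects.

```python
from urllib.parse import quote


def split_entity_id(entity_id: str) -> tuple[str, str]:
    """Split 'connector_name:native_id' at the first colon."""
    connector_name, separator, native_id = entity_id.partition(":")
    if not separator or not connector_name or not native_id:
        raise ValueError(f"not a connector_name:native_id value: {entity_id!r}")
    return connector_name, native_id


def entity_url(base_url: str, entity_id: str) -> str:
    """Build the entity URL, percent-encoding everything except the colon."""
    split_entity_id(entity_id)  # validate the format before building the URL
    return f"{base_url}/entities/{quote(entity_id, safe=':')}"


print(entity_url("http://localhost:42180/api/v1", "zulip:123"))
```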
### GET `/api/v1/connectors/status`: Connector Status

Get sync status for all connectors (last sync time, entity count, health).

## Common Query Recipes

| Question                         | entity_type + connector_id                         |
|----------------------------------|----------------------------------------------------|
| Meetings I attended              | `meeting` + `reflector`, with your contact_id      |
| Upcoming calendar events         | `calendar_event` + `ics_calendar`, date_from=now   |
| Emails from someone              | `email` + `mbsync_email`, with their contact_id    |
| Zulip threads about a topic      | `threaded_conversation` + `zulip`, search="topic"  |
| All documents                    | `document` + `hedgedoc`                            |
| Chat messages mentioning someone | `conversation_message` + `zulip`, with contact_id  |
| What was discussed about X?      | Use `POST /search` with `search_text`              |

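As one worked example of a recipe, the "Upcoming calendar events" row could translate into `GET /query` parameters as sketched below. The function name is hypothetical, and sorting ascending (soonest first) is an added assumption that the recipe table does not state.

```python
from datetime import datetime, timezone
from typing import Optional


def upcoming_events_params(now: Optional[datetime] = None) -> dict:
    """Query parameters for calendar events from `now` onward."""
    # Use an explicit UTC timestamp so date_from is unambiguous.
    now = now or datetime.now(timezone.utc)
    return {
        "entity_types": "calendar_event",
        "connector_ids": "ics_calendar",
        "date_from": now.isoformat(),
        "sort_order": "asc",  # assumption: show the soonest events first
    }


params = upcoming_events_params(datetime(2025, 1, 1, tzinfo=timezone.utc))
print(params["date_from"])
```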
[notebook-patterns.md]: ./notebook-patterns.md