# Connectors and Data Sources

Each connector ingests data from an external source into DataIndex. Connectors generally run periodic background syncs to keep data fresh; `api_document` is an exception and receives documents through the REST API instead.

Use `list_connectors()` at runtime to see which connectors are actually configured; not every connector listed below is active in every deployment.

## Connector → Entity Type Mapping

| Connector ID | Entity Types Produced | Description |
|------------------|-----------------------------------------------------------------|----------------------------------|
| `reflector` | `meeting` | Meeting recordings + transcripts |
| `ics_calendar` | `calendar_event` | ICS calendar feed events |
| `mbsync_email` | `email` | Email via mbsync IMAP sync |
| `zulip` | `conversation`, `conversation_message`, `threaded_conversation` | Zulip chat streams and topics |
| `babelfish` | `conversation_message`, `threaded_conversation` | Chat translation bridge |
| `hedgedoc` | `document` | HedgeDoc collaborative documents |
| `contactdb` | `contact` | Synced from ContactDB (static) |
| `browser_history`| `webpage` | Browser extension page visits |
| `api_document` | `document` | API-ingested documents (static) |

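The mapping in the table can also be read in reverse, which helps when an entity type such as `document` comes from more than one connector. The following is a minimal sketch that copies the table into a plain dictionary; the `CONNECTOR_ENTITY_TYPES` name and the `connectors_for` helper are illustrative and not part of DataIndex.

```python
# Connector ID -> entity types, copied from the mapping table.
# This dictionary is a local illustration, not a DataIndex API.
CONNECTOR_ENTITY_TYPES = {
    "reflector": ["meeting"],
    "ics_calendar": ["calendar_event"],
    "mbsync_email": ["email"],
    "zulip": ["conversation", "conversation_message", "threaded_conversation"],
    "babelfish": ["conversation_message", "threaded_conversation"],
    "hedgedoc": ["document"],
    "contactdb": ["contact"],
    "browser_history": ["webpage"],
    "api_document": ["document"],
}


def connectors_for(entity_type):
    """Return the connector IDs that can produce the given entity type."""
    return [
        connector_id
        for connector_id, types in CONNECTOR_ENTITY_TYPES.items()
        if entity_type in types
    ]


print(connectors_for("document"))  # ['hedgedoc', 'api_document']
```

In a real deployment the set of active connectors should still come from `list_connectors()`, since this static copy cannot know what is configured.
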
## Per-Connector Details

### `reflector` — Meeting Recordings

Ingests meetings from Reflector, Monadical's meeting recording tool.

- **Entity type:** `meeting`
- **Key fields:** `transcript`, `summary`, `participants`, `start_time`, `end_time`, `room_name`
- **Use cases:** Find meetings someone attended, search meeting transcripts, get summaries
- **Tip:** Filter with `contact_ids` to find meetings involving specific people. The `transcript` field contains speaker-diarized text.

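To make the `contact_ids` tip concrete, here is a small local sketch of what "meetings involving specific people" could mean. It assumes `participants` holds a list of contact IDs and that a meeting matches when any requested contact attended; the real filter is applied by DataIndex and may behave differently. The sample records and the `meetings_with` helper are hypothetical.

```python
# Hypothetical meeting entities; only the field names come from this document.
meetings = [
    {"room_name": "standup", "participants": [1, 2], "summary": "Daily sync"},
    {"room_name": "design-review", "participants": [2, 3], "summary": "API review"},
    {"room_name": "retro", "participants": [4], "summary": "Sprint retro"},
]


def meetings_with(items, contact_ids):
    """Keep meetings where at least one of the given contacts participated."""
    wanted = set(contact_ids)
    return [m for m in items if wanted.intersection(m["participants"])]


print([m["room_name"] for m in meetings_with(meetings, [2])])
# ['standup', 'design-review']
```
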
### `ics_calendar` — Calendar Events

Parses ICS calendar feeds (Google Calendar, Outlook, etc.).

- **Entity type:** `calendar_event`
- **Key fields:** `start_time`, `end_time`, `attendees`, `location`, `description`, `calendar_name`
- **Use cases:** Check upcoming events, find events with specific attendees, review past schedule
- **Tip:** Multiple calendar feeds may be configured as separate connectors (e.g., `personal_calendar`, `work_calendar`). Use `list_connectors()` to discover them.

### `mbsync_email` — Email

Syncs email via mbsync (IMAP).

- **Entity type:** `email`
- **Key fields:** `text_content`, `from_contact_id`, `to_contact_ids`, `cc_contact_ids`, `thread_id`, `has_attachments`
- **Use cases:** Find emails from/to someone, search email content, track email threads
- **Tip:** Use `from_contact_id` and `to_contact_ids` together with the `contact_ids` filter. For thread grouping, use the `thread_id` field.

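Thread grouping with `thread_id` can be sketched on the client side as follows. The sample records are hypothetical, and the values shown for `thread_id` and `from_contact_id` are assumptions about their shape; only the field names come from the list above.

```python
from collections import defaultdict

# Hypothetical email entities, as they might look after a query.
emails = [
    {"thread_id": "t1", "from_contact_id": 1, "text_content": "Kickoff agenda"},
    {"thread_id": "t2", "from_contact_id": 3, "text_content": "Invoice attached"},
    {"thread_id": "t1", "from_contact_id": 2, "text_content": "Re: Kickoff agenda"},
]


def group_by_thread(items):
    """Collect emails into lists keyed by their thread_id, preserving order."""
    threads = defaultdict(list)
    for email in items:
        threads[email["thread_id"]].append(email)
    return dict(threads)


threads = group_by_thread(emails)
for thread_id, thread in threads.items():
    print(thread_id, len(thread))
```

Grouping after retrieval keeps the query simple; each resulting list holds the messages of one conversation in the order they were returned.
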
### `zulip` — Chat

Ingests Zulip streams, topics, and messages.

- **Entity types:**
  - `conversation`: a Zulip stream/channel with recent messages
  - `conversation_message`: individual chat messages
  - `threaded_conversation`: a topic thread within a stream
- **Key fields:** `message`, `mentioned_contact_ids`, `recent_messages`
- **Use cases:** Find discussions about a topic, track who said what, find @-mentions
- **Tip:** Use `threaded_conversation` to find topic-level discussions. Use `conversation_message` with `mentioned_contact_ids` to find messages that mention specific people.

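As an illustration of working with `mentioned_contact_ids`, the sketch below builds a lookup from contact ID to the messages that mention that contact. It assumes each `conversation_message` carries a list of contact IDs in `mentioned_contact_ids`; the `id` field, the integer IDs, and the sample messages are hypothetical.

```python
from collections import defaultdict

# Hypothetical conversation_message entities.
messages = [
    {"id": "m1", "message": "@alice can you review?", "mentioned_contact_ids": [1]},
    {"id": "m2", "message": "@alice @bob standup moved", "mentioned_contact_ids": [1, 2]},
    {"id": "m3", "message": "no mentions here", "mentioned_contact_ids": []},
]


def index_mentions(msgs):
    """Map each mentioned contact ID to the IDs of the messages mentioning it."""
    index = defaultdict(list)
    for msg in msgs:
        for contact_id in msg["mentioned_contact_ids"]:
            index[contact_id].append(msg["id"])
    return dict(index)


mentions = index_mentions(messages)
print(mentions.get(1, []))  # ['m1', 'm2']
```
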
### `babelfish` — Translation Bridge

Ingests translated chat messages from the Babelfish service.

- **Entity types:** `conversation_message`, `threaded_conversation`
- **Use cases:** Similar to Zulip, but for translated cross-language conversations
- **Tip:** Query alongside the `zulip` connector for complete conversation coverage.

### `hedgedoc` — Collaborative Documents

Syncs documents from HedgeDoc (collaborative markdown editor).

- **Entity type:** `document`
- **Key fields:** `content`, `description`, `url`, `revision_id`
- **Use cases:** Find documents by content, track document revisions
- **Tip:** Use `search()` for semantic document search rather than the `query_entities` text filter.

### `contactdb` — Contact Sync (Static)

Mirrors contacts from ContactDB into DataIndex for unified search.

- **Entity type:** `contact`
- **Note:** This is a read-only mirror. Use ContactDB MCP tools directly for contact operations.

### `browser_history` — Browser Extension (Static)

Captures visited webpages from a browser extension.

- **Entity type:** `webpage`
- **Key fields:** `url`, `visit_time`, `text_content`
- **Use cases:** Find previously visited pages, search page content

### `api_document` — API Documents (Static)

Documents ingested via the REST API (e.g., uploaded PDFs, imported files).

- **Entity type:** `document`
- **Note:** These are ingested via `POST /api/v1/ingest/documents`, not periodic sync.

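The ingest call could be prepared along these lines. This is a sketch only: the endpoint path comes from the note above, but the base URL, the JSON payload fields (`title`, `content`), and the absence of authentication are assumptions, since the request schema is not described here. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of your DataIndex deployment.
BASE_URL = "http://localhost:8000"


def build_ingest_request(document):
    """Build (but do not send) a POST request for the ingest endpoint."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/ingest/documents",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The payload fields below are illustrative, not a documented schema.
req = build_ingest_request({"title": "Q3 report", "content": "Quarterly summary text"})
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; check the actual API schema and authentication requirements before doing so.
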