internalai-agent/docs/connectors-and-sources.md
2026-02-10 18:19:30 -06:00


Connectors and Data Sources

Each connector ingests data from an external source into DataIndex. Connectors run periodic background syncs to keep data fresh.

Call list_connectors() at runtime to see which connectors are actually configured. A given deployment may not have every connector listed below.
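As a rough illustration of that check, the sketch below guards a query on whether a connector is configured. The list_connectors() defined here is only a local stand-in so the example runs on its own; the return shape (a list of dicts with a connector_id key) and the sample IDs are assumptions, not the documented response format.

```python
def list_connectors():
    """Local stand-in for the real list_connectors() tool.

    The return shape (list of dicts keyed by "connector_id") is an assumption.
    """
    return [
        {"connector_id": "reflector"},
        {"connector_id": "zulip"},
        {"connector_id": "work_calendar"},  # hypothetical custom calendar connector
    ]


def configured_ids(connectors):
    """Collect the IDs of the connectors that are configured."""
    return {c["connector_id"] for c in connectors}


active = configured_ids(list_connectors())

# Check before querying, rather than assuming a connector exists.
for wanted in ("zulip", "mbsync_email"):
    status = "configured" if wanted in active else "not configured, skip"
    print(f"{wanted}: {status}")
```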

Connector → Entity Type Mapping

| Connector ID | Entity Types Produced | Description |
|---|---|---|
| reflector | meeting | Meeting recordings + transcripts |
| ics_calendar | calendar_event | ICS calendar feed events |
| mbsync_email | email | Email via mbsync IMAP sync |
| zulip | conversation, conversation_message, threaded_conversation | Zulip chat streams and topics |
| babelfish | conversation_message, threaded_conversation | Chat translation bridge |
| hedgedoc | document | HedgeDoc collaborative documents |
| contactdb | contact | Synced from ContactDB (static) |
| browser_history | webpage | Browser extension page visits |
| api_document | document | API-ingested documents (static) |

Per-Connector Details

reflector — Meeting Recordings

Ingests meetings from Reflector, Monadical's meeting recording tool.

  • Entity type: meeting
  • Key fields: transcript, summary, participants, start_time, end_time, room_name
  • Use cases: Find meetings someone attended, search meeting transcripts, get summaries
  • Tip: Filter with contact_ids to find meetings involving specific people. The transcript field contains speaker-diarized text.
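To make the contact_ids tip concrete, here is a small local sketch of the intended semantics: keep only meetings that involve every requested contact. It runs on in-memory sample records rather than on DataIndex. Representing participants as a list of integer contact IDs is an assumption, as are the sample values.

```python
def meetings_with_all(meetings, contact_ids):
    """Return meetings whose participants include every ID in contact_ids."""
    wanted = set(contact_ids)
    return [m for m in meetings if wanted <= set(m["participants"])]


# Hypothetical sample records; only the field names come from this page.
sample_meetings = [
    {"room_name": "standup", "participants": [1, 2, 3], "summary": "Daily sync"},
    {"room_name": "design", "participants": [2, 4], "summary": "API review"},
    {"room_name": "retro", "participants": [1, 2], "summary": "Sprint retro"},
]

matches = meetings_with_all(sample_meetings, [1, 2])
print([m["room_name"] for m in matches])  # ['standup', 'retro']
```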

ics_calendar — Calendar Events

Parses ICS calendar feeds (Google Calendar, Outlook, etc.).

  • Entity type: calendar_event
  • Key fields: start_time, end_time, attendees, location, description, calendar_name
  • Use cases: Check upcoming events, find events with specific attendees, review past schedule
  • Tip: Multiple calendar feeds may be configured as separate connectors (e.g., personal_calendar, work_calendar). Use list_connectors() to discover them.

mbsync_email — Email

Syncs email via mbsync (IMAP).

  • Entity type: email
  • Key fields: text_content, from_contact_id, to_contact_ids, cc_contact_ids, thread_id, has_attachments
  • Use cases: Find emails from/to someone, search email content, track email threads
  • Tip: Combine the contact_ids filter with the from_contact_id and to_contact_ids fields to narrow results by sender or recipient. To group messages into threads, use the thread_id field.
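Thread grouping by thread_id can be sketched as a client-side step over already-fetched email records. The example assumes each record is a dict carrying the fields listed above; the sample values and the helper name group_by_thread are made up for illustration.

```python
from collections import defaultdict


def group_by_thread(emails):
    """Group email records by thread_id, keeping the input order within a thread."""
    threads = defaultdict(list)
    for email in emails:
        threads[email["thread_id"]].append(email)
    return dict(threads)


# Hypothetical sample records using field names from this page.
sample_emails = [
    {"thread_id": "t1", "from_contact_id": 1, "text_content": "Proposal attached"},
    {"thread_id": "t2", "from_contact_id": 3, "text_content": "Invoice"},
    {"thread_id": "t1", "from_contact_id": 2, "text_content": "Re: Proposal attached"},
]

threads = group_by_thread(sample_emails)
for thread_id, messages in threads.items():
    print(thread_id, len(messages))
```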

zulip — Chat

Ingests Zulip streams, topics, and messages.

  • Entity types:
    • conversation — A Zulip stream/channel with recent messages
    • conversation_message — Individual chat messages
    • threaded_conversation — A topic thread within a stream
  • Key fields: message, mentioned_contact_ids, recent_messages
  • Use cases: Find discussions about a topic, track who said what, find @-mentions
  • Tip: Use threaded_conversation to find topic-level discussions. Use conversation_message with mentioned_contact_ids to find messages that mention specific people.
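The mention lookup described in the tip can be sketched over a list of conversation_message records. This assumes mentioned_contact_ids is a list of contact IDs on each record, which seems likely from the field name but is not confirmed here; the sample data is invented.

```python
from collections import Counter


def messages_mentioning(messages, contact_id):
    """Return the messages that mention the given contact."""
    return [m for m in messages if contact_id in m.get("mentioned_contact_ids", [])]


def mention_counts(messages):
    """Count how often each contact is mentioned across the messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.get("mentioned_contact_ids", []))
    return counts


# Hypothetical conversation_message records.
sample_messages = [
    {"message": "@alice can you review?", "mentioned_contact_ids": [7]},
    {"message": "@alice @bob see above", "mentioned_contact_ids": [7, 12]},
    {"message": "Deployed.", "mentioned_contact_ids": []},
]

print(len(messages_mentioning(sample_messages, 7)))  # 2
print(mention_counts(sample_messages)[12])           # 1
```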

babelfish — Translation Bridge

Ingests translated chat messages from the Babelfish service.

  • Entity types: conversation_message, threaded_conversation
  • Use cases: Similar to Zulip but for translated cross-language conversations
  • Tip: Query alongside the zulip connector for complete conversation coverage.
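One way to combine results from both connectors is to tag each message with its source and sort the union chronologically. This is a sketch under assumptions: the timestamp field (ISO 8601 strings in a single format, so they sort lexicographically) and the sample records are hypothetical, and fetching the results is left out.

```python
def merge_by_time(results_by_connector):
    """Merge per-connector message lists into one chronologically sorted list.

    Each message is copied and tagged with the connector it came from.
    """
    merged = []
    for connector_id, messages in results_by_connector.items():
        for message in messages:
            merged.append({**message, "connector_id": connector_id})
    return sorted(merged, key=lambda m: m["timestamp"])


# Hypothetical results, as if fetched separately from each connector.
results = {
    "zulip": [
        {"message": "Release is out", "timestamp": "2026-02-10T09:00:00"},
        {"message": "Thanks!", "timestamp": "2026-02-10T09:10:00"},
    ],
    "babelfish": [
        {"message": "Translated reply", "timestamp": "2026-02-10T09:05:00"},
    ],
}

timeline = merge_by_time(results)
print([m["connector_id"] for m in timeline])  # ['zulip', 'babelfish', 'zulip']
```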

hedgedoc — Collaborative Documents

Syncs documents from HedgeDoc (collaborative markdown editor).

  • Entity type: document
  • Key fields: content, description, url, revision_id
  • Use cases: Find documents by content, track document revisions
  • Tip: Use search() for semantic document search rather than the query_entities text filter.

contactdb — Contact Sync (Static)

Mirrors contacts from ContactDB into DataIndex for unified search.

  • Entity type: contact
  • Note: This is a read-only mirror. Use ContactDB MCP tools directly for contact operations.

browser_history — Browser Extension (Static)

Captures visited webpages from a browser extension.

  • Entity type: webpage
  • Key fields: url, visit_time, text_content
  • Use cases: Find previously visited pages, search page content

api_document — API Documents (Static)

Documents ingested via the REST API (e.g., uploaded PDFs, imported files).

  • Entity type: document
  • Note: These are ingested via POST /api/v1/ingest/documents, not periodic sync.
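A request to the ingest endpoint might be assembled as shown below, using only the Python standard library. Only the path /api/v1/ingest/documents comes from this page. The host, the JSON payload fields (title, content), and the absence of authentication headers are all assumptions, so check the API reference before relying on them. The sketch builds the request without sending it.

```python
import json
import urllib.request


def build_ingest_request(base_url, document):
    """Build (but do not send) a POST request for the document ingest endpoint."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/ingest/documents",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ingest_request(
    "http://localhost:8000",  # hypothetical host
    {"title": "Q3 report", "content": "Quarterly numbers"},  # hypothetical payload fields
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```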