feat: first commit

2026-02-10 18:19:30 -06:00
commit b18ee3b564
8 changed files with 1322 additions and 0 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,143 @@
+# Agent Documentation — InternalAI Platform
+
+The InternalAI platform aggregates company data from email, calendars, Zulip chat, meetings, and documents into two core APIs. These docs give LLM agents the context they need to build programmatic workflows — typically as marimo notebooks — that answer analytical questions about people and their interactions.
+
+## Routing Table
+
+| I need to...                                | Read                          |
+|---------------------------------------------|-------------------------------|
+| Understand the company and its tools        | [company-context.md]          |
+| Look up people, contacts, relationships     | [contactdb-api.md]            |
+| Query emails, meetings, chats, documents    | [dataindex-api.md]            |
+| Know which connector provides what data     | [connectors-and-sources.md]   |
+| Create a marimo analysis notebook           | [notebook-patterns.md]        |
+
+## API Base URLs
+
+| Service    | Swagger UI                                        | OpenAPI JSON                           |
+|------------|---------------------------------------------------|----------------------------------------|
+| ContactDB  | `http://localhost:42000/contactdb-api/docs`       | `/contactdb-api/openapi.json`          |
+| DataIndex  | `http://localhost:42000/dataindex/docs`           | `/dataindex/openapi.json`              |
+
+Direct backend ports (without Caddy proxy):
+- ContactDB: `http://localhost:42800`
+- DataIndex: `http://localhost:42180`
+
+## Common Questions → API Calls
+
+Use this table to translate natural-language questions into API calls. All paths below are relative to the Caddy proxy (`http://localhost:42000`).
+
+| Question | API Call | Notes |
+|----------|----------|-------|
+| "Who am I?" | `GET /contactdb-api/api/contacts/me` | Returns your contact record: name, emails, bio, contact_id |
+| "Find Alice" / "Who is Alice?" | `GET /contactdb-api/api/contacts?search=Alice` | Returns matching contacts with their IDs |
+| "What's Alice's contact ID?" | `GET /contactdb-api/api/contacts?search=Alice` → use `contacts[0].id` | Needed for all DataIndex queries about a person |
+| "Find contact by email" | `GET /contactdb-api/api/contacts/by-email/{email}` | Direct lookup |
+| "My recent meetings" | `GET /dataindex/api/v1/query?entity_types=meeting&contact_ids={my_id}` | Get `my_id` from `GET /contactdb-api/api/contacts/me` first |
+| "Emails with Alice" | `GET /dataindex/api/v1/query?entity_types=email&contact_ids={alice_id}` | Matches sender, to, or cc |
+| "What was discussed about X?" | `POST /dataindex/api/v1/search` with `{"search_text": "X"}` | Semantic search across all data |
+| "Zulip threads about hiring" | `GET /dataindex/api/v1/query?entity_types=threaded_conversation&search=hiring` | Text filter on content |
+| "My calendar this week" | `GET /dataindex/api/v1/query?entity_types=calendar_event&contact_ids={my_id}&date_from=...&date_to=...` | Set date range |
+| "Who are the most active contacts?" | `GET /contactdb-api/api/contacts?sort_by=hotness&min_hotness=50` | Hotness = 0-100 interaction score |
+| "What connectors are available?" | `GET /dataindex/api/v1/connectors/status` | Lists all data sources and sync status |
+
+**Key pattern:** Any question about "me" / "my" / "I" requires calling `GET /contactdb-api/api/contacts/me` first to get your `contact_id`, then using that ID in subsequent DataIndex queries.
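+
+The two-step pattern can be sketched as follows for the "My calendar this week" row. This is a minimal illustration under stated assumptions, not the platform's own client: `get_json` is a hypothetical helper (any callable that performs a GET and returns the decoded JSON body), the ID is assumed to be exposed as `contact_id` on the `/contacts/me` record, the week is assumed to start on Monday, and the timestamp format is assumed to be ISO 8601 in UTC. The `items` key follows the response shape used in the Quick Example in this file.
+
+```python
+from datetime import date, datetime, time, timedelta, timezone
+from typing import Any, Callable
+
+BASE = "http://localhost:42000"  # Caddy proxy base URL from this document
+
+# Hypothetical helper type: performs a GET and returns the decoded JSON body.
+GetJson = Callable[[str, dict], Any]
+
+
+def week_bounds(today: date) -> tuple[str, str]:
+    """Return UTC timestamps for this week's Monday 00:00 and the next Monday 00:00."""
+    monday = today - timedelta(days=today.weekday())
+    start = datetime.combine(monday, time.min, tzinfo=timezone.utc)
+    end = start + timedelta(days=7)
+    fmt = "%Y-%m-%dT%H:%M:%SZ"
+    return start.strftime(fmt), end.strftime(fmt)
+
+
+def my_calendar_this_week(get_json: GetJson, today: date) -> list:
+    # Step 1: resolve "me" to a contact ID.
+    me = get_json(f"{BASE}/contactdb-api/api/contacts/me", {})
+    my_id = me["contact_id"]  # assumption: field name of the ID on this record
+
+    # Step 2: use that ID in the DataIndex query.
+    date_from, date_to = week_bounds(today)
+    data = get_json(f"{BASE}/dataindex/api/v1/query", {
+        "entity_types": "calendar_event",
+        "contact_ids": str(my_id),
+        "date_from": date_from,
+        "date_to": date_to,
+    })
+    return data["items"]  # first page only; larger results need pagination
+```
+
+With httpx, `get_json` could be as small as `lambda url, params: client.get(url, params=params).json()`. Whether `date_to` is inclusive is not stated here, so check [dataindex-api.md] before relying on the upper bound.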
+
+## Workflow
+
+### When to create a marimo notebook
+
+Any request that involves **analysis over a period of time** (e.g., "meetings this month", "emails since January", "interaction trends") is likely to return a **large volume of data** — too much to process inline. In these cases, **always produce a marimo notebook** (a `.py` file following the patterns in [notebook-patterns.md]).
+
+Also create a notebook when the user asks to "create a workflow", "write a workflow", or "build an analysis".
+
+If you're unsure whether a question is simple enough to answer directly or needs a notebook, **ask the user**.
+
+### File naming and location
+
+All notebooks go in the **`workflows/`** directory. Use a sequential number prefix so workflows stay ordered by creation:
+
+```
+workflows/<NNN>_<topic>_<scope>.py
+```
+
+- `<NNN>` — zero-padded sequence number (`001`, `002`, …). Look at existing files in `workflows/` to determine the next number.
+- `<topic>` — what is being analyzed, in snake_case (e.g., `greyhaven_meetings`, `alice_emails`, `hiring_discussions`)
+- `<scope>` — time range or qualifier (e.g., `january`, `q1_2026`, `last_30d`, `all_time`)
+
+**Examples:**
+
+```
+workflows/001_greyhaven_meetings_january.py
+workflows/002_alice_emails_q1_2026.py
+workflows/003_hiring_discussions_last_30d.py
+workflows/004_team_interaction_timeline_all_time.py
+```
+
+**Before creating a new workflow**, list existing files in `workflows/` to find the highest number and increment it.
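+
+The numbering rule can be sketched as a small helper. This is an illustration only: `next_workflow_path` is a hypothetical name, and the sketch assumes that only files matching `<NNN>_*.py` count toward the sequence.
+
+```python
+import re
+from pathlib import Path
+
+# Matches names such as "003_hiring_discussions_last_30d.py".
+_NUMBERED = re.compile(r"^(\d{3})_.+\.py$")
+
+
+def next_workflow_path(workflows_dir: Path, topic: str, scope: str) -> Path:
+    """Build the path for the next workflow file, one past the highest existing prefix."""
+    numbers = [
+        int(match.group(1))
+        for path in workflows_dir.glob("*.py")
+        if (match := _NUMBERED.match(path.name))
+    ]
+    next_number = max(numbers, default=0) + 1
+    return workflows_dir / f"{next_number:03d}_{topic}_{scope}.py"
+```
+
+Taking the maximum rather than counting files keeps the sequence increasing even if an earlier workflow was deleted.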
+
+### Plan before you implement
+
+Before writing any notebook, **always propose a plan first** and get the user's approval. The plan should describe:
+
+1. **Goal** — What question are we answering?
+2. **Data sources** — Which entity types and API endpoints will be used?
+3. **Algorithm / ETL steps** — Step-by-step description of the data pipeline: what gets fetched, how it's filtered, joined, or aggregated, and what the final output looks like.
+4. **Output format** — Table columns, charts, or summary statistics the user will see.
+
+Only proceed to implementation after the user confirms the plan.
+
+### Steps
+
+1. **Identify people** — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /contactdb-api/api/contacts/me`.
+2. **Find data** — Use DataIndex `GET /dataindex/api/v1/query` (exhaustive, paginated) or `POST /dataindex/api/v1/search` (semantic, ranked), filtering by `contact_ids`, `entity_types`, `date_from`/`date_to`, and `connector_ids`.
+3. **Analyze** — For simple answers, process the API response directly. For complex multi-step analysis, build a marimo notebook (see [notebook-patterns.md]).
+
+### Quick Example (Python)
+
+> "Find all emails involving Alice since January"
+
+```python
+import httpx
+
+CONTACTDB = "http://localhost:42000/contactdb-api"
+DATAINDEX = "http://localhost:42000/dataindex/api/v1"
+PAGE_SIZE = 50
+
+# The context manager closes the connection pool when the block exits.
+with httpx.Client(timeout=30) as client:
+    # 1. Resolve "Alice" to a contact_id
+    resp = client.get(f"{CONTACTDB}/api/contacts", params={"search": "Alice"})
+    resp.raise_for_status()
+    contacts = resp.json()["contacts"]
+    if not contacts:
+        raise SystemExit('No contact matches "Alice"')
+    alice_id = contacts[0]["id"]  # first match, e.g. 42
+
+    # 2. Fetch all emails involving Alice, one page at a time
+    emails = []
+    offset = 0
+    while True:
+        resp = client.get(f"{DATAINDEX}/query", params={
+            "entity_types": "email",
+            "contact_ids": str(alice_id),
+            "date_from": "2025-01-01T00:00:00Z",  # "since January"; adjust the year as needed
+            "limit": PAGE_SIZE,
+            "offset": offset,
+        })
+        resp.raise_for_status()
+        data = resp.json()
+        emails.extend(data["items"])
+        offset += PAGE_SIZE
+        if offset >= data["total"]:
+            break
+
+print(f"Found {len(emails)} emails involving Alice")
+```
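+
+The pagination loop in the example above can be factored into a reusable generator. The sketch below is one possible shape, not part of the platform: `iter_items` and `fetch_page` are hypothetical names, and the response is assumed to carry `items` and `total` as in the example.
+
+```python
+from typing import Any, Callable, Iterator
+
+# Hypothetical helper type: takes query params and returns one decoded JSON page.
+FetchPage = Callable[[dict], dict]
+
+
+def iter_items(fetch_page: FetchPage, filters: dict, page_size: int = 50) -> Iterator[Any]:
+    """Yield every item matching `filters`, requesting one page at a time."""
+    offset = 0
+    while True:
+        page = fetch_page({**filters, "limit": page_size, "offset": offset})
+        yield from page["items"]
+        offset += page_size
+        # Stop at the reported total, or on an empty page as a safety net.
+        if offset >= page["total"] or not page["items"]:
+            break
+```
+
+With httpx, `fetch_page` could be `lambda params: client.get(f"{DATAINDEX}/query", params=params).json()`, and the emails from the example become `list(iter_items(fetch_page, {"entity_types": "email", "contact_ids": str(alice_id)}))`.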
+
+## File Index
+
+- [company-context.md] — Business context, team structure, vocabulary
+- [contactdb-api.md] — ContactDB entities and REST endpoints
+- [dataindex-api.md] — DataIndex entity types, query modes, REST endpoints
+- [connectors-and-sources.md] — Connector-to-entity-type mapping
+- [notebook-patterns.md] — Marimo notebook patterns and common API workflows
+
+[company-context.md]: ./docs/company-context.md
+[contactdb-api.md]: ./docs/contactdb-api.md
+[dataindex-api.md]: ./docs/dataindex-api.md
+[connectors-and-sources.md]: ./docs/connectors-and-sources.md
+[notebook-patterns.md]: ./docs/notebook-patterns.md