internalai-agent/AGENTS.md
Commit d04aa26f31 by Mathieu Virbel: Update marimo notebook docs with lessons from workflow debugging
- Add rules: all imports in setup cell, cell output at top level,
  async cells need async def, return classes from model cells,
  use python-dotenv for .env loading
- Add marimo check validation step to AGENTS.md and notebook-patterns.md
- Add "always create new workflow" rule to AGENTS.md
- Add new doc sections: Cell Output Must Be at the Top Level,
  Async Cells, Cells That Define Classes, Fixing _unparsable_cell,
  Checking Notebooks Before Running
- Update all code examples to follow new import/output rules
- Update workflows/lib/llm.py for mirascope v2 API
2026-02-10 19:25:53 -06:00

# Agent Documentation — InternalAI Platform

The InternalAI platform aggregates company data from email, calendars, Zulip chat, meetings, and documents into two core APIs. These docs give LLM agents the context they need to build programmatic workflows — typically as marimo notebooks — that answer analytical questions about people and their interactions.

## Routing Table

| I need to... | Read |
| --- | --- |
| Understand the company and its tools | company-context.md |
| Look up people, contacts, relationships | contactdb-api.md |
| Query emails, meetings, chats, documents | dataindex-api.md |
| Know which connector provides what data | connectors-and-sources.md |
| Create a marimo analysis notebook | notebook-patterns.md |

## API Base URLs

| Service | Swagger UI | OpenAPI JSON |
| --- | --- | --- |
| ContactDB | http://localhost:42000/contactdb-api/docs | /contactdb-api/openapi.json |
| DataIndex | http://localhost:42000/dataindex/docs | /dataindex/openapi.json |

Direct backend ports (bypassing the Caddy proxy):

- ContactDB: http://localhost:42800
- DataIndex: http://localhost:42180

## Common Questions → API Calls

Use this table to translate natural language questions into API calls. The base URLs below assume the Caddy proxy (http://localhost:42000).

| Question | API Call | Notes |
| --- | --- | --- |
| "Who am I?" | GET /contactdb-api/api/contacts/me | Returns your contact record: name, emails, bio, contact_id |
| "Find Alice" / "Who is Alice?" | GET /contactdb-api/api/contacts?search=Alice | Returns matching contacts with their IDs |
| "What's Alice's contact ID?" | GET /contactdb-api/api/contacts?search=Alice → use contacts[0].id | Needed for all DataIndex queries about a person |
| "Find contact by email" | GET /contactdb-api/api/contacts/by-email/{email} | Direct lookup |
| "My recent meetings" | GET /dataindex/api/v1/query?entity_types=meeting&contact_ids={my_id} | Get my_id from /contacts/me first |
| "Emails with Alice" | GET /dataindex/api/v1/query?entity_types=email&contact_ids={alice_id} | Matches sender, to, or cc |
| "What was discussed about X?" | POST /dataindex/api/v1/search with {"search_text": "X"} | Semantic search across all data |
| "Zulip threads about hiring" | GET /dataindex/api/v1/query?entity_types=threaded_conversation&search=hiring | Text filter on content |
| "My calendar this week" | GET /dataindex/api/v1/query?entity_types=calendar_event&contact_ids={my_id}&date_from=...&date_to=... | Set date range |
| "Who are the most active contacts?" | GET /contactdb-api/api/contacts?sort_by=hotness&min_hotness=50 | Hotness = 0-100 interaction score |
| "What connectors are available?" | GET /dataindex/api/v1/connectors/status | Lists all data sources and sync status |

Key pattern: Any question about "me" / "my" / "I" requires calling GET /contactdb-api/api/contacts/me first to get your contact_id, then using that ID in subsequent DataIndex queries.
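A minimal sketch of that two-step pattern (stdlib only; the endpoint paths come from the tables above, while the helper functions and their names are illustrative, not part of the platform API):

```python
from urllib.parse import urlencode

# Base URLs assume the Caddy proxy, per the tables above.
CONTACTDB = "http://localhost:42000/contactdb-api"
DATAINDEX = "http://localhost:42000/dataindex/api/v1"

def me_endpoint() -> str:
    """Step 1: resolve 'me' to a contact record; the response carries your contact_id."""
    return f"{CONTACTDB}/api/contacts/me"

def my_query_url(my_id: int, entity_type: str, **filters: str) -> str:
    """Step 2: build a DataIndex query scoped to that contact_id."""
    params = {"entity_types": entity_type, "contact_ids": str(my_id), **filters}
    return f"{DATAINDEX}/query?{urlencode(params)}"

print(my_query_url(7, "meeting"))
# http://localhost:42000/dataindex/api/v1/query?entity_types=meeting&contact_ids=7
```

Fetch `me_endpoint()` with any HTTP client, read the contact id from the JSON, then GET the URL built by `my_query_url` (extra filters such as date_from/date_to can be passed as keyword arguments).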

## Workflow

### When to create a marimo notebook

Any request that involves analysis over a period of time (e.g., "meetings this month", "emails since January", "interaction trends") is likely to return a large volume of data — too much to process inline. In these cases, always produce a marimo notebook (a .py file following the patterns in notebook-patterns.md).

Also create a notebook when the user asks to "create a workflow", "write a workflow", or "build an analysis".

If you're unsure whether a question is simple enough to answer directly or needs a notebook, ask the user.

### Always create a new workflow

When the user requests a workflow, always create a new notebook file. Do not modify or re-run an existing workflow unless the user explicitly asks you to (e.g., "update workflow 001", "fix the sentiment notebook", "re-run the existing analysis"). Each new request gets its own sequentially numbered file — even if it covers a similar topic to an earlier workflow.

### File naming and location

All notebooks go in the workflows/ directory. Use a sequential number prefix so workflows stay ordered by creation:

```
workflows/<NNN>_<topic>_<scope>.py
```

- <NNN> — zero-padded sequence number (001, 002, …). Look at existing files in workflows/ to determine the next number.
- <topic> — what is being analyzed, in snake_case (e.g., greyhaven_meetings, alice_emails, hiring_discussions)
- <scope> — time range or qualifier (e.g., january, q1_2026, last_30d, all_time)

Examples:

```
workflows/001_greyhaven_meetings_january.py
workflows/002_alice_emails_q1_2026.py
workflows/003_hiring_discussions_last_30d.py
workflows/004_team_interaction_timeline_all_time.py
```

Before creating a new workflow, list existing files in workflows/ to find the highest number and increment it.
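The "find the highest number and increment it" step can be sketched as a small stdlib helper (the function name and signature are illustrative):

```python
import re
from pathlib import Path

def next_workflow_name(workflows_dir: str, topic: str, scope: str) -> str:
    """Scan workflows_dir for NNN_*.py files and build the next sequential filename."""
    nums = [
        int(m.group(1))
        for p in Path(workflows_dir).glob("*.py")
        if (m := re.match(r"(\d{3})_", p.name))
    ]
    # max(..., default=0) means the first workflow in an empty directory gets 001.
    return f"{max(nums, default=0) + 1:03d}_{topic}_{scope}.py"
```

For example, with 001_... and 003_... already present, the next name starts with 004_.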

### Plan before you implement

Before writing any notebook, always propose a plan first and get the user's approval. The plan should describe:

  1. Goal — What question are we answering?
  2. Data sources — Which entity types and API endpoints will be used?
  3. Algorithm / ETL steps — Step-by-step description of the data pipeline: what gets fetched, how it's filtered, joined, or aggregated, and what the final output looks like.
  4. Output format — Table columns, charts, or summary statistics the user will see.

Only proceed to implementation after the user confirms the plan.

### Validate before delivering

After writing or editing a notebook, always run uvx marimo check to verify it has no structural errors (duplicate variables, undefined names, branch expressions, etc.):

```
uvx marimo check workflows/NNN_topic_scope.py
```

A clean check (no output, exit code 0) means the notebook is valid. Fix any errors before delivering the notebook to the user.

### Steps

  1. Identify people — Use ContactDB to resolve names/emails to contact_id values. For "me"/"my" questions, always start with GET /api/contacts/me.
  2. Find data — Use DataIndex GET /query (exhaustive, paginated) or POST /search (semantic, ranked) with contact_ids, entity_types, date_from/date_to, connector_ids filters.
  3. Analyze — For simple answers, process the API response directly. For complex multi-step analysis, build a marimo notebook (see notebook-patterns.md).

## Quick Example (Python)

"Find all emails involving Alice since January"

```python
import httpx

CONTACTDB = "http://localhost:42000/contactdb-api"
DATAINDEX = "http://localhost:42000/dataindex/api/v1"
client = httpx.Client(timeout=30)

# 1. Resolve "Alice" to a contact_id
resp = client.get(f"{CONTACTDB}/api/contacts", params={"search": "Alice"})
alice_id = resp.json()["contacts"][0]["id"]  # e.g. 42

# 2. Fetch all emails involving Alice (with pagination)
emails = []
offset = 0
while True:
    resp = client.get(f"{DATAINDEX}/query", params={
        "entity_types": "email",
        "contact_ids": str(alice_id),
        "date_from": "2025-01-01T00:00:00Z",
        "limit": 50,
        "offset": offset,
    })
    data = resp.json()
    emails.extend(data["items"])
    if offset + 50 >= data["total"]:
        break
    offset += 50

print(f"Found {len(emails)} emails involving Alice")
```
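The pagination loop above generalizes to any /query call. One way to factor it out is to inject the page fetch as a function, so the helper stays client-agnostic (the helper name and the injected-fetch design are illustrative, not part of the platform API):

```python
from typing import Callable, Iterator

def paginate(fetch: Callable[[int], dict], page_size: int = 50) -> Iterator[dict]:
    """Yield items across all pages; fetch(offset) must return {"items": [...], "total": N}."""
    offset = 0
    while True:
        page = fetch(offset)
        yield from page["items"]
        if offset + page_size >= page["total"]:
            break
        offset += page_size
```

In the example above, fetch(offset) would wrap the client.get(...).json() call, with "offset": offset merged into its params.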

## File Index