chore: remove unused docs/ directory

2026-02-20 15:11:22 -06:00
parent 864038a60e
commit fc3bdf86ba
5 changed files with 0 additions and 1316 deletions
--- a/docs/company-context.md
+++ b/docs/company-context.md
@@ -1,43 +0,0 @@
-# Company Context
-
-## About Monadical
-
-Monadical is a software consultancy founded in 2016. The company operates across multiple locations: Montreal and Vancouver (Canada), and Medellin and Cali (Colombia). The team builds internal products alongside client work.
-
-### Internal Products
-
-- **Reflector** — Meeting recording and transcription tool (produces meeting entities in DataIndex)
-- **GreyHaven / InternalAI platform** — A local-first platform that aggregates personal data and resolves contacts to enable automation and analysis
-
-## Communication Tools
-
-| Tool       | Role                        | Data in DataIndex?  |
-|------------|-----------------------------|---------------------|
-| Zulip      | Primary internal chat       | Yes (connector: `zulip`) |
-| Fastmail/Email | External communication      | Yes (connector: `mbsync_email`) |
-| Calendar   | Scheduling (ICS feeds)      | Yes (connector: `ics_calendar`) |
-| Reflector  | Meeting recordings          | Yes (connector: `reflector`) |
-| HedgeDoc   | Collaborative documents     | Yes (connector: `hedgedoc`) |
-
-## How the company is working
-
-We use Zulip as our main communication hub. Zulip has channels (top level) and topics (low level). Expected behavior differs from channel to channel.
-
-### Zulip channels
-
-Here is a list of Zulip streams and their IDs, with context on how the company is organized:
-
-- InternalAI (zulip:stream:193) is about this specific platform.
-- Leads (zulip:stream:78) is where we discuss our leads and clients. We usually create one topic per lead or client, so when searching for information about a client, always check whether a topic matching the client or company name exists.
-- Checkins (zulip:stream:24) usually has one topic per employee. This is where employees post what they did or plan to do over a period of time, or general status updates. Not everybody uses it regularly.
-- Devcap (zulip:stream:156) is where we discuss our investments and the due diligence done before investing. One topic per company.
-- General (zulip:stream:21) is for discussions on various subjects, company-wide or about services.
-- Engineering (zulip:stream:25) is where we discuss engineering issues, services, and new tools to try.
-- Learning (zulip:stream:31) is where we share links to new tools, ideas, and things to learn about.
-- Reflector (zulip:stream:155) is dedicated to Reflector development and usage.
-- GreyHaven is split across multiple streams: branding (zulip:stream:206), GreyHaven-specific leads (zulip:stream:208, one topic per lead), and marketing (zulip:stream:212).
-
-### Meeting and Calendar
-
-Some people in the company have a dedicated Reflector room for their meetings; this shows up in the `room_name` field of the `meeting` entity.
-For people like Max, DataIndex has calendar information, and most of his calendar events have a corresponding Reflector meeting. However, there is no direct link between calendar events and Reflector meetings: to figure out which recording corresponds to a given event, the two have to be correlated (e.g., by matching start and end times).
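The calendar-to-meeting correlation described above can be approximated by time overlap. This is a minimal sketch with a hypothetical helper (`match_meeting` is not part of DataIndex); it assumes calendar events and Reflector meetings have already been fetched as dicts carrying ISO-8601 `start_time`/`end_time` strings with an explicit timezone, and the 15-minute slack is an arbitrary default.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def overlap_seconds(a_start, a_end, b_start, b_end) -> float:
    """Length of the intersection of two time intervals, in seconds (0 if disjoint)."""
    return max(0.0, (min(a_end, b_end) - max(a_start, b_start)).total_seconds())


def match_meeting(event: dict, meetings: list[dict], slack: timedelta = timedelta(minutes=15)) -> dict | None:
    """Return the meeting that overlaps the calendar event the most, or None.

    `slack` widens the event window to tolerate recordings that start late or run over.
    """
    ev_start = parse_ts(event["start_time"]) - slack
    ev_end = parse_ts(event["end_time"]) + slack
    best, best_overlap = None, 0.0
    for meeting in meetings:
        if not meeting.get("start_time") or not meeting.get("end_time"):
            continue  # both fields are optional on `meeting` entities
        ov = overlap_seconds(ev_start, ev_end, parse_ts(meeting["start_time"]), parse_ts(meeting["end_time"]))
        if ov > best_overlap:
            best, best_overlap = meeting, ov
    return best
```

For people with a dedicated room, narrowing the candidates by `room_name` first should reduce false matches when meetings run back to back.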
--- a/docs/connectors-and-sources.md
+++ b/docs/connectors-and-sources.md
@@ -1,99 +0,0 @@
-# Connectors and Data Sources
-
-Each connector ingests data from an external source into DataIndex. Connectors run periodic background syncs to keep data fresh.
-
-Use `list_connectors()` at runtime to see which connectors are actually configured — not all connectors below may be active in every deployment.
-
-## Connector → Entity Type Mapping
-
-| Connector ID     | Entity Types Produced                                           | Description                      |
-|------------------|-----------------------------------------------------------------|----------------------------------|
-| `reflector`      | `meeting`                                                       | Meeting recordings + transcripts |
-| `ics_calendar`   | `calendar_event`                                                | ICS calendar feed events         |
-| `mbsync_email`   | `email`                                                         | Email via mbsync IMAP sync       |
-| `zulip`          | `conversation`, `conversation_message`, `threaded_conversation` | Zulip chat streams and topics    |
-| `babelfish`      | `conversation_message`, `threaded_conversation`                 | Chat translation bridge          |
-| `hedgedoc`       | `document`                                                      | HedgeDoc collaborative documents |
-| `contactdb`      | `contact`                                                       | Synced from ContactDB (static)   |
-| `browser_history`| `webpage`                                                       | Browser extension page visits    |
-| `api_document`   | `document`                                                      | API-ingested documents (static)  |
-
-## Per-Connector Details
-
-### `reflector` — Meeting Recordings
-
-Ingests meetings from Reflector, Monadical's meeting recording tool.
-
-- **Entity type:** `meeting`
-- **Key fields:** `transcript`, `summary`, `participants`, `start_time`, `end_time`, `room_name`
-- **Use cases:** Find meetings someone attended, search meeting transcripts, get summaries
-- **Tip:** Filter with `contact_ids` to find meetings involving specific people. The `transcript` field contains speaker-diarized text.
-
-### `ics_calendar` — Calendar Events
-
-Parses ICS calendar feeds (Google Calendar, Outlook, etc.).
-
-- **Entity type:** `calendar_event`
-- **Key fields:** `start_time`, `end_time`, `attendees`, `location`, `description`, `calendar_name`
-- **Use cases:** Check upcoming events, find events with specific attendees, review past schedule
-- **Tip:** Multiple calendar feeds may be configured as separate connectors (e.g., `personal_calendar`, `work_calendar`). Use `list_connectors()` to discover them.
-
-### `mbsync_email` — Email
-
-Syncs email via mbsync (IMAP).
-
-- **Entity type:** `email`
-- **Key fields:** `text_content`, `from_contact_id`, `to_contact_ids`, `cc_contact_ids`, `thread_id`, `has_attachments`
-- **Use cases:** Find emails from/to someone, search email content, track email threads
-- **Tip:** Use `from_contact_id` and `to_contact_ids` with the `contact_ids` filter. For thread grouping, use the `thread_id` field.
-
-### `zulip` — Chat
-
-Ingests Zulip streams, topics, and messages.
-
-- **Entity types:**
-  - `conversation` — A Zulip stream/channel with recent messages
-  - `conversation_message` — Individual chat messages
-  - `threaded_conversation` — A topic thread within a stream
-- **Key fields:** `message`, `mentioned_contact_ids`, `recent_messages`
-- **Use cases:** Find discussions about a topic, track who said what, find @-mentions
-- **Tip:** Use `threaded_conversation` to find topic-level discussions. Use `conversation_message` with `mentioned_contact_ids` to find messages that mention specific people.
-
-### `babelfish` — Translation Bridge
-
-Ingests translated chat messages from the Babelfish service.
-
-- **Entity types:** `conversation_message`, `threaded_conversation`
-- **Use cases:** Similar to Zulip, but for translated cross-language conversations
-- **Tip:** Query alongside the `zulip` connector for complete conversation coverage.
-
-### `hedgedoc` — Collaborative Documents
-
-Syncs documents from HedgeDoc (collaborative markdown editor).
-
-- **Entity type:** `document`
-- **Key fields:** `content`, `description`, `url`, `revision_id`
-- **Use cases:** Find documents by content, track document revisions
-- **Tip:** Use `search()` for semantic document search rather than the `query_entities` text filter.
-
-### `contactdb` — Contact Sync (Static)
-
-Mirrors contacts from ContactDB into DataIndex for unified search.
-
-- **Entity type:** `contact`
-- **Note:** This is a read-only mirror. Use ContactDB MCP tools directly for contact operations.
-
-### `browser_history` — Browser Extension (Static)
-
-Captures visited webpages from a browser extension.
-
-- **Entity type:** `webpage`
-- **Key fields:** `url`, `visit_time`, `text_content`
-- **Use cases:** Find previously visited pages, search page content
-
-### `api_document` — API Documents (Static)
-
-Documents ingested via the REST API (e.g., uploaded PDFs, imported files).
-
-- **Entity type:** `document`
-- **Note:** These are ingested via `POST /api/v1/ingest/documents`, not via periodic sync.
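Because several connectors can emit the same entity type (e.g., `zulip` and `babelfish` both produce `threaded_conversation`), it can help to derive a `connector_ids` filter from the mapping table. A minimal sketch with a hypothetical `connectors_for` helper; it hard-codes the static table, so custom calendar connector IDs such as `personal_calendar` will not appear in it. Where possible, pass the IDs reported by `list_connectors()` as `active`.

```python
from __future__ import annotations

# Static copy of the "Connector -> Entity Type Mapping" table above.
CONNECTOR_ENTITY_TYPES: dict[str, set[str]] = {
    "reflector": {"meeting"},
    "ics_calendar": {"calendar_event"},
    "mbsync_email": {"email"},
    "zulip": {"conversation", "conversation_message", "threaded_conversation"},
    "babelfish": {"conversation_message", "threaded_conversation"},
    "hedgedoc": {"document"},
    "contactdb": {"contact"},
    "browser_history": {"webpage"},
    "api_document": {"document"},
}


def connectors_for(entity_type: str, active: set[str] | None = None) -> list[str]:
    """Sorted connector IDs that produce `entity_type`, optionally restricted to `active` ones."""
    ids = sorted(c for c, types in CONNECTOR_ENTITY_TYPES.items() if entity_type in types)
    if active is not None:
        ids = [c for c in ids if c in active]
    return ids
```

For example, `",".join(connectors_for("conversation_message"))` yields `"babelfish,zulip"`, which is the comma-separated form DataIndex's `connector_ids` query parameter expects.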
--- a/docs/contactdb-api.md
+++ b/docs/contactdb-api.md
@@ -1,154 +0,0 @@
-# ContactDB API Reference
-
-ContactDB is the people directory. It stores contacts, their platform identities, relationships, notes, and links. Every person across all data sources resolves to a single ContactDB `contact_id`.
-
-**Base URL:** `http://localhost:42000/contactdb-api/` (direct) or `http://caddy/contactdb-api/` (via greywall sandbox)
-
-## Core Entities
-
-### Contact
-
-The central entity — represents a person.
-
-| Field                | Type                | Description                                    |
-|----------------------|---------------------|------------------------------------------------|
-| `id`                 | int                 | Unique contact ID                              |
-| `name`               | string              | Display name                                   |
-| `emails`             | EmailField[]        | `{type, value, preferred}`                     |
-| `phones`             | PhoneField[]        | `{type, value, preferred}`                     |
-| `bio`                | string?             | Short biography                                |
-| `avatar_url`         | string?             | Profile image URL                              |
-| `personal_info`      | PersonalInfo        | Birthday, partner, children, role, company, location, how_we_met |
-| `interests`          | string[]            | Topics of interest                             |
-| `values`             | string[]            | Personal values                                |
-| `tags`               | string[]            | User-assigned tags                             |
-| `profile_description`| string?             | Extended description                           |
-| `is_placeholder`     | bool                | Auto-created stub (not yet fully resolved)     |
-| `is_service_account` | bool                | Non-human account (bot, no-reply)              |
-| `stats`              | ContactStats        | Interaction statistics (see below)             |
-| `enrichment_data`    | dict                | Data from enrichment providers                 |
-| `platform_identities`| PlatformIdentity[]  | Identities on various platforms                |
-| `created_at`         | datetime            | When created                                   |
-| `updated_at`         | datetime            | Last modified                                  |
-| `merged_into_id`     | int?                | If merged, target contact ID                   |
-| `deleted_at`         | datetime?           | Soft-delete timestamp                          |
-
-### ContactStats
-
-| Field                    | Type          | Description                          |
-|--------------------------|---------------|--------------------------------------|
-| `total_messages`         | int           | Total messages across platforms       |
-| `platforms_count`        | int           | Number of platforms active on         |
-| `last_interaction_at`    | string?       | ISO datetime of last interaction      |
-| `interaction_count_30d`  | int           | Interactions in last 30 days          |
-| `interaction_count_90d`  | int           | Interactions in last 90 days          |
-| `hotness`                | HotnessScore? | Composite engagement score (0-100)   |
-
-### PlatformIdentity
-
-Links a contact to a specific platform account.
-
-| Field              | Type      | Description                              |
-|--------------------|-----------|------------------------------------------|
-| `id`               | int       | Identity record ID                       |
-| `contact_id`       | int       | Parent contact                           |
-| `source`           | string    | Data provenance (e.g., `dataindex_zulip`)|
-| `platform`         | string    | Platform name (e.g., `email`, `zulip`)   |
-| `platform_user_id` | string    | User ID on that platform                 |
-| `display_name`     | string?   | Name shown on that platform              |
-| `avatar_url`       | string?   | Platform-specific avatar                 |
-| `bio`              | string?   | Platform-specific bio                    |
-| `extra_data`       | dict      | Additional platform-specific data        |
-| `first_seen_at`    | datetime  | When first observed                      |
-| `last_seen_at`     | datetime  | When last observed                       |
-
-### Relationship
-
-Tracks connections between contacts.
-
-| Field                  | Type      | Description                          |
-|------------------------|-----------|--------------------------------------|
-| `id`                   | int       | Relationship ID                      |
-| `from_contact_id`      | int       | Source contact                       |
-| `to_contact_id`        | int       | Target contact                       |
-| `relationship_type`    | string    | Type (e.g., "colleague", "client")   |
-| `since_date`           | date?     | When relationship started            |
-| `relationship_metadata`| dict      | Additional metadata                  |
-
-### Note
-
-Free-text notes attached to a contact.
-
-| Field        | Type     | Description          |
-|--------------|----------|----------------------|
-| `id`         | int      | Note ID              |
-| `contact_id` | int      | Parent contact       |
-| `content`    | string   | Note text            |
-| `created_by` | string   | Who wrote it         |
-| `created_at` | datetime | When created         |
-
-### Link
-
-External URLs associated with a contact.
-
-| Field        | Type     | Description              |
-|--------------|----------|--------------------------|
-| `id`         | int      | Link ID                  |
-| `contact_id` | int      | Parent contact           |
-| `type`       | string   | Link type (e.g., "github", "linkedin") |
-| `label`      | string   | Display label            |
-| `url`        | string   | URL                      |
-
-## REST Endpoints
-
-### GET `/api/contacts` — List/search contacts
-
-Primary way to find contacts. Returns `{contacts: [...], total, limit, offset}`.
-
-**Query parameters:**
-
-| Parameter              | Type          | Description                                  |
-|------------------------|---------------|----------------------------------------------|
-| `search`               | string?       | Search in name and bio                       |
-| `is_placeholder`       | bool?         | Filter by placeholder status                 |
-| `is_service_account`   | bool?         | Filter by service account status             |
-| `sort_by`              | string?       | `"hotness"`, `"name"`, or `"updated_at"`     |
-| `min_hotness`          | float?        | Minimum hotness score (0-100)                |
-| `max_hotness`          | float?        | Maximum hotness score (0-100)                |
-| `platforms`            | string[]?     | Contacts with ALL specified platforms (AND)  |
-| `last_interaction_from`| string?       | ISO datetime lower bound                     |
-| `last_interaction_to`  | string?       | ISO datetime upper bound                     |
-| `limit`                | int           | Max results (1-100, default 50)              |
-| `offset`               | int           | Pagination offset (default 0)                |
-
-### GET `/api/contacts/me` — Get self contact
-
-Returns the platform operator's own contact record. **Call this first** in most workflows to get your own `contact_id`.
-
-### GET `/api/contacts/{id}` — Get contact by ID
-
-Get full details for a single contact by numeric ID.
-
-### GET `/api/contacts/by-email/{email}` — Get contact by email
-
-Look up a contact by email address.
-
-### Other Endpoints
-
-| Method | Path                                    | Description                      |
-|--------|-----------------------------------------|----------------------------------|
-| POST   | `/api/contacts`                         | Create contact                   |
-| PUT    | `/api/contacts/{id}`                    | Update contact                   |
-| DELETE | `/api/contacts/{id}`                    | Delete contact                   |
-| POST   | `/api/contacts/merge`                   | Merge two contacts               |
-| GET    | `/api/contacts/{id}/relationships`      | List relationships               |
-| GET    | `/api/contacts/{id}/notes`              | List notes                       |
-| GET    | `/api/contacts/{id}/links`              | List links                       |
-| GET    | `/api/platform-identities/contacts/{id}`| List platform identities         |
-
-## Usage Pattern
-
-1. **Start with `GET /api/contacts/me`** to get the operator's contact ID
-2. **Search by name** with `GET /api/contacts?search=Alice`
-3. **Use contact IDs** from results as filters in DataIndex queries (`contact_ids` parameter)
-4. **Paginate** large result sets with `offset` increments
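The usage pattern above can be sketched with the standard library alone. This assumes the endpoint paths are appended to the direct base URL, that no authentication is required, and that responses have the `{contacts, total, limit, offset}` shape described for `GET /api/contacts`; `search_contacts` and `contact_ids_param` are hypothetical helpers, and `get_json` is injectable so the pagination logic can be exercised without a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Direct base URL; inside the greywall sandbox use http://caddy/contactdb-api instead.
CONTACTDB = "http://localhost:42000/contactdb-api"


def http_get_json(url: str):
    """Fetch a URL and decode its JSON body (add auth headers if your deployment needs them)."""
    with urlopen(url) as resp:
        return json.load(resp)


def search_contacts(name: str, get_json=http_get_json, page_size: int = 50):
    """Yield every contact whose name or bio matches `name`, following limit/offset pagination.

    `page_size` must stay within the documented 1-100 range.
    """
    offset = 0
    while True:
        query = urlencode({"search": name, "limit": page_size, "offset": offset})
        page = get_json(f"{CONTACTDB}/api/contacts?{query}")
        yield from page["contacts"]
        offset += len(page["contacts"])
        if not page["contacts"] or offset >= page["total"]:
            return


def contact_ids_param(contacts) -> str:
    """Format ContactDB IDs as the comma-separated string DataIndex's `contact_ids` expects."""
    return ",".join(str(c["id"]) for c in contacts)
```

In a live session, step 1 would be `http_get_json(f"{CONTACTDB}/api/contacts/me")`, step 2 `list(search_contacts("Alice"))`, and step 3 passing `contact_ids_param(...)` to a DataIndex query.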
--- a/docs/dataindex-api.md
+++ b/docs/dataindex-api.md
@@ -1,218 +0,0 @@
-# DataIndex API Reference
-
-DataIndex aggregates data from all connected sources (email, calendar, Zulip, meetings, documents) into a unified query interface. Every piece of data is an **entity** with a common base structure plus type-specific fields.
-
-**Base URL:** `http://localhost:42000/dataindex/api/v1/` (direct) or `http://caddy/dataindex/api/v1/` (via greywall sandbox)
-
-## Entity Types
-
-All entities share these base fields:
-
-| Field                | Type        | Description                                 |
-|----------------------|-------------|---------------------------------------------|
-| `id`                 | string      | Format: `connector_name:native_id`          |
-| `entity_type`        | string      | One of the types below                      |
-| `timestamp`          | datetime    | When the entity occurred                    |
-| `contact_ids`        | string[]    | ContactDB IDs of people involved            |
-| `connector_id`       | string      | Which connector produced this               |
-| `title`              | string?     | Display title                               |
-| `parent_id`          | string?     | Parent entity (e.g., thread for a message)  |
-| `raw_data`           | dict        | Original source data (excluded by default)  |
-
-### `calendar_event`
-
-From ICS calendar feeds.
-
-| Field                 | Type        | Description                    |
-|-----------------------|-------------|--------------------------------|
-| `start_time`          | datetime?   | Event start                    |
-| `end_time`            | datetime?   | Event end                      |
-| `all_day`             | bool        | All-day event flag             |
-| `description`         | string?     | Event description              |
-| `location`            | string?     | Event location                 |
-| `attendees`           | dict[]      | Attendee list                  |
-| `organizer_contact_id`| string?     | ContactDB ID of organizer      |
-| `status`              | string?     | Event status                   |
-| `calendar_name`       | string?     | Source calendar name           |
-| `meeting_url`         | string?     | Video call link                |
-
-### `meeting`
-
-From Reflector (recorded meetings with transcripts).
-
-| Field              | Type                | Description                       |
-|--------------------|---------------------|-----------------------------------|
-| `start_time`       | datetime?           | Meeting start                     |
-| `end_time`         | datetime?           | Meeting end                       |
-| `participants`     | MeetingParticipant[]| People in the meeting             |
-| `meeting_platform` | string?             | Platform (e.g., "jitsi")          |
-| `transcript`       | string?             | Full transcript text              |
-| `summary`          | string?             | AI-generated summary              |
-| `meeting_url`      | string?             | Meeting link                      |
-| `recording_url`    | string?             | Recording link                    |
-| `location`         | string?             | Physical location                 |
-| `room_name`        | string?             | Virtual room name (also indicates meeting location — see below) |
-
-**MeetingParticipant** fields: `display_name`, `contact_id?`, `platform_user_id?`, `email?`, `speaker?`
-
-> **`room_name` as location indicator:** The `room_name` field often encodes where the meeting took place (e.g., a Jitsi room name like `standup-office-bogota`). Use it to infer the meeting location when `location` is not set.
-
-> **Participant and contact coverage is incomplete.** Meeting data comes from Reflector, which only tracks users who are logged into the Reflector platform. This means:
->
-> - **`contact_ids`** only contains ContactDB IDs for Reflector-logged participants who were matched to a known contact. It will often be a **subset** of the actual attendees — do not assume it is the full list.
-> - **`participants`** is more complete than `contact_ids` but still only includes people detected by Reflector. Not all participants have accounts or could be identified — some attendees may be entirely absent from this list.
-> - **`contact_id` within a participant** may be `null` if the person was detected but couldn't be matched to a ContactDB entry.
->
-> **Consequence for queries:** Filtering meetings by `contact_ids` will **miss meetings** where the person attended but wasn't logged into Reflector or wasn't resolved. To get better coverage, combine multiple strategies:
->
-> 1. Filter by `contact_ids` for resolved participants
-> 2. Search `participants[].display_name` client-side for name matches
-> 3. Use `POST /search` with the person's name to search meeting transcripts and summaries
-
-### `email`
-
-From mbsync email sync.
-
-| Field              | Type      | Description                          |
-|--------------------|-----------|--------------------------------------|
-| `thread_id`        | string?   | Email thread grouping                |
-| `text_content`     | string?   | Plain text body                      |
-| `html_content`     | string?   | HTML body                            |
-| `snippet`          | string?   | Preview snippet                      |
-| `from_contact_id`  | string?   | Sender's ContactDB ID               |
-| `to_contact_ids`   | string[]  | Recipient ContactDB IDs             |
-| `cc_contact_ids`   | string[]  | CC recipient ContactDB IDs          |
-| `has_attachments`  | bool      | Has attachments flag                 |
-| `attachments`      | dict[]    | Attachment metadata                  |
-
-### `conversation`
-
-A Zulip stream/channel.
-
-| Field              | Type    | Description                            |
-|--------------------|---------|----------------------------------------|
-| `recent_messages`  | dict[]  | Recent messages in the conversation    |
-
-### `conversation_message`
-
-A single message in a Zulip conversation.
-
-| Field                   | Type      | Description                       |
-|-------------------------|-----------|-----------------------------------|
-| `message`               | string?   | Message text content              |
-| `mentioned_contact_ids` | string[]  | ContactDB IDs of mentioned people |
-
-### `threaded_conversation`
-
-A Zulip topic thread (group of messages under a topic).
-
-| Field              | Type    | Description                            |
-|--------------------|---------|----------------------------------------|
-| `recent_messages`  | dict[]  | Recent messages in the thread          |
-
-### `document`
-
-From HedgeDoc, API ingestion, or other document sources.
-
-| Field          | Type      | Description                  |
-|----------------|-----------|------------------------------|
-| `content`      | string?   | Document body text           |
-| `description`  | string?   | Document description         |
-| `mimetype`     | string?   | MIME type                    |
-| `url`          | string?   | Source URL                   |
-| `revision_id`  | string?   | Revision identifier          |
-
-### `webpage`
-
-From browser history extension.
-
-| Field          | Type      | Description                  |
-|----------------|-----------|------------------------------|
-| `url`          | string    | Page URL                     |
-| `visit_time`   | datetime  | When visited                 |
-| `text_content` | string?   | Page text content            |
-
-## REST Endpoints
-
-### GET `/api/v1/query` — Exhaustive Filtered Enumeration
-
-Use when you need **all** entities matching specific criteria. Supports pagination.
-
-**When to use:** "List all meetings since January", "Get all emails from Alice", "Count calendar events this week"
-
-**Query parameters:**
-
-| Parameter        | Type          | Description                                    |
-|------------------|---------------|------------------------------------------------|
-| `entity_types`   | string (repeat) | Filter by type — repeat param for multiple: `?entity_types=email&entity_types=meeting` |
-| `contact_ids`    | string        | Comma-separated ContactDB IDs: `"1,42"`        |
-| `connector_ids`  | string        | Comma-separated connector IDs: `"zulip,reflector"` |
-| `date_from`      | string        | ISO datetime lower bound (UTC if no timezone)  |
-| `date_to`        | string        | ISO datetime upper bound                       |
-| `search`         | string?       | Text filter on content fields                  |
-| `parent_id`      | string?       | Filter by parent entity                        |
-| `thread_id`      | string?       | Filter emails by thread ID                     |
-| `room_name`      | string?       | Filter meetings by room name                   |
-| `limit`          | int           | Max results per page (default 50)              |
-| `offset`         | int           | Pagination offset (default 0)                  |
-| `sort_by`        | string        | `"timestamp"` (default), `"title"`, `"contact_activity"`, etc. |
-| `sort_order`     | string        | `"desc"` (default) or `"asc"`                  |
-| `include_raw_data`| bool         | Include raw_data field (default false)         |
-
-**Response format:**
-
-```json
-{
-  "items": [...],
-  "total": 152,
-  "page": 1,
-  "size": 50,
-  "pages": 4
-}
-```
-
-**Pagination:** loop with offset increments until `offset >= total`. See [notebook-patterns.md] for a reusable helper.
-
-### POST `/api/v1/search` — Semantic Search
-
-Use when you need **relevant** results for a natural-language question. Returns ranked text chunks. No pagination — set a higher `limit` instead.
-
-**When to use:** "What was discussed about the product roadmap?", "Find conversations about hiring"
-
-**Request body (JSON):**
-
-```json
-{
-  "search_text": "product roadmap decisions",
-  "entity_types": ["meeting", "threaded_conversation"],
-  "contact_ids": ["1", "42"],
-  "date_from": "2025-01-01T00:00:00Z",
-  "date_to": "2025-06-01T00:00:00Z",
-  "connector_ids": ["reflector", "zulip"],
-  "limit": 20
-}
-```
-
-**Response:** `{results: [...chunks], total_count}` — each chunk has `entity_ids`, `entity_type`, `connector_id`, `content`, `timestamp`.
-
-### GET `/api/v1/entities/{id}` — Get Entity by ID
-
-Retrieve full details of a single entity. The `id` path parameter has the format `connector_name:native_id`.
-
-### GET `/api/v1/connectors/status` — Connector Status
-
-Get sync status for all connectors (last sync time, entity count, health).
-
-## Common Query Recipes
-
-| Question                              | entity_type + connector_id               |
-|---------------------------------------|------------------------------------------|
-| Meetings I attended                   | `meeting` + `reflector`, with your contact_id |
-| Upcoming calendar events              | `calendar_event` + `ics_calendar`, date_from=now |
-| Emails from someone                   | `email` + `mbsync_email`, with their contact_id |
-| Zulip threads about a topic           | `threaded_conversation` + `zulip`, search="topic" |
-| All documents                         | `document` + `hedgedoc`                  |
-| Chat messages mentioning someone      | `conversation_message` + `zulip`, with contact_id |
-| What was discussed about X?           | Use `POST /search` with `search_text`    |
-
-[notebook-patterns.md]: ./notebook-patterns.md
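A standalone sketch of the `GET /query` pagination loop, using only the standard library; it is not the helper from notebook-patterns.md. It assumes no authentication and the `{items, total, ...}` response shape shown above, and encodes parameters as the table describes: `entity_types` repeated, `contact_ids` and `connector_ids` comma-separated. `query_all` is hypothetical, and `get_json` is injectable so the loop can be tested offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Direct base URL; inside the greywall sandbox use http://caddy/dataindex/api/v1 instead.
DATAINDEX = "http://localhost:42000/dataindex/api/v1"


def http_get_json(url: str):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def query_all(entity_types, contact_ids=(), connector_ids=(), get_json=http_get_json, limit=50, **filters):
    """Collect every entity matching the filters from GET /query, page by page.

    Extra keyword arguments (e.g. date_from, search, room_name) are passed through unchanged.
    """
    params = {"entity_types": list(entity_types), "limit": limit, **filters}
    if contact_ids:
        params["contact_ids"] = ",".join(str(c) for c in contact_ids)
    if connector_ids:
        params["connector_ids"] = ",".join(connector_ids)
    items, offset = [], 0
    while True:
        params["offset"] = offset
        # doseq=True turns the entity_types list into repeated ?entity_types=... pairs.
        page = get_json(f"{DATAINDEX}/query?{urlencode(params, doseq=True)}")
        items.extend(page["items"])
        offset += len(page["items"])
        # Also stop on an empty page so a shrinking result set cannot loop forever.
        if not page["items"] or offset >= page["total"]:
            return items
```

Advancing `offset` by the number of items actually returned, rather than by `limit`, avoids skipping entities if the server caps the page size below the requested `limit`.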
--- a/docs/notebook-patterns.md
+++ b/docs/notebook-patterns.md
@@ -1,802 +0,0 @@
-# Marimo Notebook Patterns
-
-This guide covers how to create [marimo](https://marimo.io) notebooks for data analysis against the InternalAI platform APIs. Marimo notebooks are plain `.py` files with reactive cells — no `.ipynb` format, no Jupyter dependency.
-
-## Marimo Basics
-
-A marimo notebook is a Python file with `@app.cell` decorated functions. Each cell returns values as a tuple, and other cells receive them as function parameters — marimo builds a reactive DAG automatically.
-
-```python
-import marimo
-app = marimo.App()
-
-@app.cell
-def cell_one():
-    x = 42
-    return (x,)
-
-@app.cell
-def cell_two(x):
-    # Re-runs automatically when x changes
-    result = x * 2
-    return (result,)
-```
-
-**Key rules:**
-- Cells declare dependencies via function parameters
-- Cells return values as tuples: `return (var1, var2,)`
-- The **last expression at the top level** of a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees). Expressions inside `if`/`else`/`for` blocks do **not** count — see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level) below
-- Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below)
-- No manual execution order; the DAG determines it
-- **Variable names must be unique across cells.** Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them **private** to that cell — marimo ignores `_`-prefixed names.
-- **All imports must go in the `setup` cell.** Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place **all** imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`.
-
-### Cell Variable Scoping — Example
-
-This is the **most common mistake**. Any variable assigned at the top level of a cell (not inside a `def` or comprehension) is tracked by marimo. If two cells assign the same name, the notebook refuses to run.
-
-**BROKEN** — `resp` is defined at top level in both cells:
-
-```python
-# Cell A
-@app.cell
-def search_meetings(client, DATAINDEX):
-    resp = client.post(f"{DATAINDEX}/search", json={...})  # defines 'resp'
-    resp.raise_for_status()
-    results = resp.json()["results"]
-    return (results,)
-
-# Cell B
-@app.cell
-def fetch_details(client, DATAINDEX, results):
-    resp = client.get(f"{DATAINDEX}/entities/{results[0]}")  # also defines 'resp' → ERROR
-    meeting = resp.json()
-    return (meeting,)
-```
-
-> **Error:** `MultipleDefinitionError: variable 'resp' is defined in multiple cells`
-
-**FIXED** — prefix cell-local variables with `_`:
-
-```python
-# Cell A
-@app.cell
-def search_meetings(client, DATAINDEX):
-    _resp = client.post(f"{DATAINDEX}/search", json={...})  # _resp is cell-private
-    _resp.raise_for_status()
-    results = _resp.json()["results"]
-    return (results,)
-
-# Cell B
-@app.cell
-def fetch_details(client, DATAINDEX, results):
-    _resp = client.get(f"{DATAINDEX}/entities/{results[0]}")  # _resp is cell-private, no conflict
-    meeting = _resp.json()
-    return (meeting,)
-```
-
-**Rule of thumb:** if a variable is only used within the cell to compute a return value, prefix it with `_`. Only leave names unprefixed if another cell needs to receive them.
-
-> **Note:** Variables inside nested `def` functions are naturally local and don't need `_` prefixes — e.g., `resp` inside a `def fetch_all(...)` helper is fine because it's scoped to the function, not the cell.
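-
-To make the scoping rule concrete, here is a rough approximation built on Python's standard `ast` module. `cell_globals` is a hypothetical helper, not part of marimo. It deliberately ignores edge cases such as walrus targets inside comprehensions, `except ... as` names, and `global` statements, so treat `marimo check` as the real source of truth:
-
-```python
-import ast
-
-
-def cell_globals(source: str) -> set[str]:
-    """Approximate the names a marimo cell would track (hypothetical helper)."""
-    names: set[str] = set()
-    # These node types open their own scope; names bound inside them stay local
-    scoped = (ast.Lambda, ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)
-
-    def visit(node: ast.AST) -> None:
-        for child in ast.iter_child_nodes(node):
-            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
-                names.add(child.name)  # The def/class name itself is cell-level...
-                continue  # ...but its body is a separate scope
-            if isinstance(child, scoped):
-                continue
-            if isinstance(child, (ast.Import, ast.ImportFrom)):
-                for alias in child.names:
-                    # `import os.path` binds `os`; `import x as y` binds `y`
-                    names.add(alias.asname or alias.name.split(".")[0])
-            elif isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store):
-                names.add(child.id)
-            visit(child)  # if/for/with/try blocks do NOT create a new scope
-
-    visit(ast.parse(source))
-    # `_`-prefixed names are private to the cell, so they cannot clash
-    return {n for n in names if not n.startswith("_")}
-
-
-src = """
-resp = client.get(url)
-import asyncio
-_rows = [x for x in range(3)]
-def helper():
-    inner = 1
-if resp:
-    flag = True
-"""
-print(sorted(cell_globals(src)))  # ['asyncio', 'flag', 'helper', 'resp']
-```
-
-The output shows the rule at work. `flag` is tracked even though it is assigned inside an `if`. `inner` and the comprehension variable `x` are not tracked, and `_rows` is private.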
-
-### Cell Output Must Be at the Top Level
-
-Marimo only renders the **last expression at the top level** of a cell as rich output. An expression buried inside an `if`/`else`, `for`, `try`, or any other block is **not** displayed — it's silently discarded.
-
-**BROKEN** — `_df` inside the `if` branch is never rendered, and `mo.md()` inside `if`/`else` is also discarded:
-
-```python
-@app.cell
-def show_results(results, mo, pl):
-    if results:
-        _df = pl.DataFrame(results)
-        mo.md(f"**Found {len(results)} results**")
-        _df  # Inside an if block — marimo does NOT display this
-    else:
-        mo.md("**No results found**")  # Also inside a block — NOT displayed
-    return
-```
-
-**FIXED** — split into separate cells. Each cell displays exactly **one thing** at the top level:
-
-```python
-# Cell 1: build the data, return it
-@app.cell
-def build_results(results, pl):
-    results_df = pl.DataFrame(results) if results else None
-    return (results_df,)
-
-# Cell 2: heading — mo.md() is the top-level expression (use ternary for conditional text)
-@app.cell
-def show_results_heading(results_df, mo):
-    mo.md(f"**Found {len(results_df)} results**" if results_df is not None else "**No results found**")
-
-# Cell 3: table — DataFrame is the top-level expression
-@app.cell
-def show_results_table(results_df):
-    results_df  # Top-level expression — marimo renders this as interactive table
-```
-
-**Rules:**
-- Each cell should display **one thing** — either `mo.md()` OR a DataFrame, never both
-- `mo.md()` must be a **top-level expression**, not inside `if`/`else`/`for`/`try` blocks
-- Build conditional text using variables or ternary expressions, then call `mo.md(_text)` at the top level
-- For DataFrames, use a standalone display cell: `def show_table(df): df`
-
-### Async Cells
-
-When a cell uses `await` (e.g., for `llm_call` or `asyncio.gather`), you **must** declare it as `async def`:
-
-```python
-@app.cell
-async def analyze(meetings, llm_call, ResponseModel, asyncio):
-    async def _score(meeting):
-        return await llm_call(prompt=..., response_model=ResponseModel)
-
-    results = await asyncio.gather(*[_score(_m) for _m in meetings])
-    return (results,)
-```
-
-Note that `asyncio` is imported in the `setup` cell and received here as a parameter — never `import asyncio` inside individual cells.
-
-If you write `await` in a non-async cell, marimo cannot parse the cell and saves it as an `_unparsable_cell` string literal — the cell won't run, and you'll see `SyntaxError: 'return' outside function` or similar errors. See [Fixing `_unparsable_cell`](#fixing-_unparsable_cell) below.
-
-### Cells That Define Classes Must Return Them
-
-If a cell defines Pydantic models (or any class) that other cells need, it **must** return them:
-
-```python
-# BaseModel and Field are imported in the setup cell and received as parameters
-@app.cell
-def models(BaseModel, Field):
-    class MeetingSentiment(BaseModel):
-        overall_sentiment: str
-        sentiment_score: int = Field(description="Score from -10 to +10")
-
-    class FrustrationExtraction(BaseModel):
-        has_frustrations: bool
-        frustrations: list[dict]
-
-    return (MeetingSentiment, FrustrationExtraction,)  # Other cells receive these as parameters
-```
-
-A bare `return` (or no return) means those classes are invisible to the rest of the notebook.
-
-### Fixing `_unparsable_cell`
-
-When marimo can't parse a cell into a proper `@app.cell` function, it saves the raw code as `app._unparsable_cell("...", name="cell_name")`. These cells **won't run** and show errors like `SyntaxError: 'return' outside function`.
-
-**Common causes:**
-1. Using `await` without making the cell `async def`
-2. Using `return` in code that marimo failed to wrap into a function (usually a side effect of cause 1)
-
-**How to fix:** Convert the `_unparsable_cell` string back into a proper `@app.cell` decorated function:
-
-```python
-# BROKEN — saved as _unparsable_cell because of top-level await
-app._unparsable_cell("""
-results = await asyncio.gather(...)
-return results
-""", name="my_cell")
-
-# FIXED — proper async cell function (asyncio imported in setup, received as parameter)
-@app.cell
-async def my_cell(some_dependency, asyncio):
-    results = await asyncio.gather(...)
-    return (results,)
-```
-
-**Key differences to note when converting:**
-- Wrap the code in an `async def` function (if it uses `await`)
-- Add cell dependencies as function parameters (including imports like `asyncio`)
-- Return values as tuples: `return (var,)` not `return var`
-- Prefix cell-local variables with `_`
-- Never add `import` statements inside the cell — all imports belong in `setup`
-
-### Inline Dependencies with PEP 723
-
-Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:
-
-```python
-# /// script
-# requires-python = ">=3.12"
-# dependencies = [
-#     "marimo",
-#     "httpx",
-#     "polars",
-#     "mirascope[openai]",
-#     "pydantic",
-#     "python-dotenv",
-# ]
-# ///
-```
-
-### Checking Notebooks Before Running
-
-Always run `marimo check` before opening or running a notebook. It catches common issues — duplicate variable definitions, `_unparsable_cell` blocks, branch expressions that won't display, and more — without needing to start the full editor:
-
-```bash
-uvx marimo check notebook.py           # Check a single notebook
-uvx marimo check workflows/            # Check all notebooks in a directory
-uvx marimo check --fix notebook.py     # Auto-fix fixable issues
-```
-
-**Run this after every edit.** A clean `marimo check` (no output, exit code 0) means the notebook is structurally valid. Any errors must be fixed before running.
-
-### Running Notebooks
-
-```bash
-uvx marimo edit notebook.py   # Interactive editor (best for development)
-uvx marimo run notebook.py    # Read-only web app
-uv run notebook.py            # Script mode (terminal output)
-```
-
-### Inspecting Cell Outputs
-
-In `marimo edit`, the last top-level expression of every cell is displayed as rich output below the cell (not its `return` value). This is the primary way to introspect API responses:
-
-- **Dicts/lists** render as collapsible JSON trees — click to expand nested fields
-- **Polars/Pandas DataFrames** render as interactive sortable tables
-- **Strings** render as plain text
-
-To inspect a raw API response, just make it the last expression:
-
-```python
-@app.cell
-def inspect_response(client, DATAINDEX):
-    _resp = client.get(f"{DATAINDEX}/query", params={
-        "entity_types": "meeting", "limit": 2,
-    })
-    _resp.json()  # This gets displayed as a collapsible JSON tree
-```
-
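-To inspect an intermediate value alongside other output, wrap it in `mo.accordion` and combine the pieces with `mo.vstack`. Only the last top-level expression is displayed, so two separate expressions would hide the first one:
-
-```python
-@app.cell
-def debug_meetings(meetings, mo):
-    _count = mo.md(f"**Count:** {len(meetings)}")
-    # Show the first item's structure in a collapsible section (empty if there are no meetings)
-    _raw = mo.accordion({"First meeting raw": mo.json(meetings[0])}) if meetings else mo.md("")
-    mo.vstack([_count, _raw])  # One top-level expression, so both parts render
-```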
-
-## Notebook Skeleton
-
-Every notebook against InternalAI follows this structure:
-
-```python
-# /// script
-# requires-python = ">=3.12"
-# dependencies = [
-#     "marimo",
-#     "httpx",
-#     "polars",
-#     "mirascope[openai]",
-#     "pydantic",
-#     "python-dotenv",
-# ]
-# ///
-
-import marimo
-app = marimo.App()
-
-@app.cell
-def params():
-    """User parameters — edit these to change the workflow's behavior."""
-    SEARCH_TERMS = ["greyhaven"]
-    DATE_FROM = "2026-01-01T00:00:00Z"
-    DATE_TO = "2026-02-01T00:00:00Z"
-    TARGET_PERSON = None  # Set to a name like "Alice" to filter by person, or None for all
-    return DATE_FROM, DATE_TO, SEARCH_TERMS, TARGET_PERSON
-
-@app.cell
-def config():
-    BASE = "http://localhost:42000"
-    CONTACTDB = f"{BASE}/contactdb-api"
-    DATAINDEX = f"{BASE}/dataindex/api/v1"
-    return (CONTACTDB, DATAINDEX,)
-
-@app.cell
-def setup():
-    from dotenv import load_dotenv
-    load_dotenv(".env")  # Resolved against the current working directory (run from the project root)
-
-    import asyncio  # All imports go here — never import inside other cells
-    import httpx
-    import marimo as mo
-    import polars as pl
-    from pydantic import BaseModel, Field
-    client = httpx.Client(timeout=30)
-    return (asyncio, client, mo, pl, BaseModel, Field,)
-
-# --- your IN / ETL / OUT cells here ---
-
-if __name__ == "__main__":
-    app.run()
-```
-
-> **`load_dotenv(".env")`** reads the `.env` file by name, resolving the path against the current working directory, so launch marimo from the directory that contains `.env` (the project root). This makes `LLM_API_KEY` and other env vars available to `os.getenv()` calls in `lib/llm.py` without requiring the shell to have them pre-set. Always include `python-dotenv` in PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell.
-
-**The `params` cell must always be the first cell** after `app = marimo.App()`. It contains all user-configurable constants (search terms, date ranges, target names, etc.) as plain Python values. This way the user can tweak the workflow by editing a single cell at the top — no need to hunt through the code for hardcoded values.
-
-## Pagination Helper
-
-The DataIndex `GET /query` endpoint paginates with `limit` and `offset`. Always paginate — result sets can be large.
-
-```python
-@app.cell
-def helpers(client):
-    def fetch_all(url, params):
-        """Fetch all pages from a paginated DataIndex endpoint."""
-        all_items = []
-        limit = params.get("limit", 50)
-        params = {**params, "limit": limit, "offset": 0}
-        while True:
-            resp = client.get(url, params=params)
-            resp.raise_for_status()
-            data = resp.json()
-            all_items.extend(data["items"])
-            if params["offset"] + limit >= data["total"]:
-                break
-            params["offset"] += limit
-        return all_items
-
-    def resolve_contact(name, contactdb_url):
-        """Find a contact by name and return the first match (a dict with at least `id` and `name`)."""
-        resp = client.get(f"{contactdb_url}/api/contacts", params={"search": name})
-        resp.raise_for_status()
-        contacts = resp.json()["contacts"]
-        if not contacts:
-            raise ValueError(f"No contact found for '{name}'")
-        return contacts[0]
-
-    return (fetch_all, resolve_contact,)
-```
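-
-The termination logic of `fetch_all` is easy to get wrong. Pulling it out of the HTTP code makes it testable against an in-memory page source. The sketch below is a hypothetical refactoring with these assumptions:
-
-- `fetch_page(offset, limit)` stands in for one `GET /query` call.
-- Each page is a dict with `items` and `total`, like the responses used above.
-- The `if not items` guard is an addition that is not in the original helper. It stops the loop if the server returns an empty page before reaching `total`.
-
-```python
-from typing import Callable
-
-
-def paginate(fetch_page: Callable[[int, int], dict], limit: int = 50) -> list:
-    """Collect every item from an offset/limit-paginated source."""
-    all_items: list = []
-    offset = 0
-    while True:
-        page = fetch_page(offset, limit)
-        items = page["items"]
-        all_items.extend(items)
-        # Stop once this page reaches the reported total, or if a page comes back empty
-        if not items or offset + limit >= page["total"]:
-            break
-        offset += limit
-    return all_items
-
-
-# In-memory stand-in for the DataIndex endpoint, for testing only
-DATA = list(range(120))
-
-
-def fake_page(offset: int, limit: int) -> dict:
-    return {"items": DATA[offset:offset + limit], "total": len(DATA)}
-
-
-print(len(paginate(fake_page, limit=50)))  # 3 pages: 50 + 50 + 20 = 120
-```
-
-Inside the `helpers` cell, `fetch_all` could then delegate to `paginate`. Its `fetch_page` would call `client.get(url, params={**params, "offset": offset, "limit": limit})`, run `raise_for_status()`, and return the parsed JSON.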
-
-## Pattern 1: Emails Involving a Specific Person
-
-Emails have `from_contact_id`, `to_contact_ids`, and `cc_contact_ids`. The query API's `contact_ids` filter matches entities where the contact appears in **any** of these roles.
-
-```python
-@app.cell
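-def find_person(resolve_contact, CONTACTDB, TARGET_PERSON):
-    # TARGET_PERSON is set in the params cell (e.g. "Alice"); it must not be None here
-    _target = resolve_contact(TARGET_PERSON, CONTACTDB)
-    target_id = _target["id"]
-    target_name = _target["name"]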
-    return (target_id, target_name,)
-
-@app.cell
-def fetch_emails(fetch_all, DATAINDEX, target_id, DATE_FROM):
-    emails = fetch_all(f"{DATAINDEX}/query", {
-        "entity_types": "email",
-        "contact_ids": str(target_id),
-        "date_from": DATE_FROM,  # From the params cell
-        "sort_order": "desc",
-    })
-    return (emails,)
-
-@app.cell
-def email_table(emails, target_id, target_name, pl):
-    email_df = pl.DataFrame([{
-        "date": e["timestamp"][:10],
-        "subject": e.get("title", "(no subject)"),
-        "direction": (
-            "sent" if str(target_id) == str(e.get("from_contact_id"))
-            else "received"
-        ),
-        "snippet": (e.get("snippet") or e.get("text_content") or "")[:100],
-    } for e in emails])
-    return (email_df,)
-
-@app.cell
-def show_emails(email_df, target_name, mo):
-    mo.md(f"## Emails involving {target_name} ({len(email_df)} total)")
-
-@app.cell
-def display_email_table(email_df):
-    email_df  # Renders as interactive table in marimo edit
-```
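-
-The `direction` column above only separates "sent" from "received". Because the `contact_ids` filter matches any role, some of the "received" emails may be ones where the person was only CC'd. The small pure function below is a hypothetical helper, not part of the workflow library. It reports the exact role from the `from_contact_id`, `to_contact_ids`, and `cc_contact_ids` fields, and it compares IDs as strings, mirroring the `str(...)` comparison in `email_table`:
-
-```python
-def email_role(email: dict, contact_id) -> str | None:
-    """Return 'from', 'to', or 'cc' for the contact's role in an email, or None."""
-    cid = str(contact_id)
-    if str(email.get("from_contact_id")) == cid:
-        return "from"
-    # `or []` guards against the field being missing or null
-    if cid in {str(c) for c in email.get("to_contact_ids") or []}:
-        return "to"
-    if cid in {str(c) for c in email.get("cc_contact_ids") or []}:
-        return "cc"
-    return None
-
-
-sample = {"from_contact_id": 7, "to_contact_ids": [3], "cc_contact_ids": [42]}
-print(email_role(sample, 42))  # cc
-```
-
-In a notebook, define it in the `helpers` cell and return it alongside `fetch_all`. `email_table` can then receive it as a parameter and add a `"role": email_role(e, target_id)` column.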
-
-## Pattern 2: Meetings with a Specific Participant
-
-Meetings have a `participants` list where each entry may or may not have a resolved `contact_id`. The query API's `contact_ids` filter only matches **resolved** participants.
-
-**Strategy:** Query by `contact_ids` to get meetings with resolved participants, then optionally do a client-side check on `participants[].display_name` or `transcript` for unresolved ones.
-
-> **Always include `room_name` in meeting tables.** The `room_name` field contains the virtual room name (e.g., `standup-office-bogota`) and often indicates where the meeting took place. It's useful context when `title` is generic or missing — include it as a column alongside `title`.
-
-```python
-@app.cell
-def fetch_meetings(fetch_all, DATAINDEX, target_id, DATE_FROM):
-    # Get meetings where the target appears in contact_ids
-    resolved_meetings = fetch_all(f"{DATAINDEX}/query", {
-        "entity_types": "meeting",
-        "contact_ids": str(target_id),
-        "date_from": DATE_FROM,
-    })
-    return (resolved_meetings,)
-
-@app.cell
-def meeting_table(resolved_meetings, target_name, pl):
-    _rows = []
-    for _m in resolved_meetings:
-        _participants = _m.get("participants", [])
-        _names = [_p["display_name"] for _p in _participants]
-        _rows.append({
-            "date": (_m.get("start_time") or _m["timestamp"])[:10],
-            "title": _m.get("title", "Untitled"),
-            "room_name": _m.get("room_name", ""),
-            "participants": ", ".join(_names),
-            "has_transcript": _m.get("transcript") is not None,
-            "has_summary": _m.get("summary") is not None,
-        })
-    meeting_df = pl.DataFrame(_rows)
-    return (meeting_df,)
-```
-
-To also find meetings where the person was present but **not resolved** (guest), search the transcript:
-
-```python
-@app.cell
-def search_unresolved(client, DATAINDEX, target_name):
-    # Semantic search for the person's name in meeting transcripts
-    _resp = client.post(f"{DATAINDEX}/search", json={
-        "search_text": target_name,
-        "entity_types": ["meeting"],
-        "limit": 50,
-    })
-    _resp.raise_for_status()
-    transcript_hits = _resp.json()["results"]
-    return (transcript_hits,)
-```
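-
-The client-side check from the strategy above can be sketched as a plain function over meeting entities. The sketch rests on these assumptions:
-
-- A participant counts as unresolved when `contact_id` is missing or null.
-- `transcript` is a plain string. If your meetings store a structured transcript, adapt the second check.
-- Matching is a case-insensitive substring test, so expect some false positives for short or common names.
-
-```python
-def mentions_unresolved(meeting: dict, name: str) -> bool:
-    """True if `name` matches an unresolved participant or appears in the transcript."""
-    needle = name.casefold()
-    for p in meeting.get("participants") or []:
-        # Only unresolved participants matter here; resolved ones are already
-        # covered by the `contact_ids` query
-        if p.get("contact_id") is None and needle in (p.get("display_name") or "").casefold():
-            return True
-    transcript = meeting.get("transcript")
-    return isinstance(transcript, str) and needle in transcript.casefold()
-
-
-guest_meeting = {
-    "participants": [{"display_name": "Alice (guest)", "contact_id": None}],
-    "transcript": None,
-}
-print(mentions_unresolved(guest_meeting, "alice"))  # True
-```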
-
-## Pattern 3: Calendar Events → Meeting Correlation
-
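-Calendar events and meetings are separate entities from different connectors. To find which calendar events had a corresponding recorded meeting, match them by start time: the example below pairs each event with the first meeting that started within 15 minutes of it. `my_id` is your own contact ID, assumed to be resolved in an earlier cell (for example with `resolve_contact`).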
-
-```python
-@app.cell
-def fetch_calendar_and_meetings(fetch_all, DATAINDEX, my_id, DATE_FROM):
-    events = fetch_all(f"{DATAINDEX}/query", {
-        "entity_types": "calendar_event",
-        "contact_ids": str(my_id),
-        "date_from": DATE_FROM,
-        "sort_by": "timestamp",
-        "sort_order": "asc",
-    })
-    meetings = fetch_all(f"{DATAINDEX}/query", {
-        "entity_types": "meeting",
-        "contact_ids": str(my_id),
-        "date_from": DATE_FROM,
-    })
-    return (events, meetings,)
-
-@app.cell
-def correlate(events, meetings, pl):
-    def _parse_dt(s):
-        # Importing inside the helper keeps `datetime` out of the cell's top-level names
-        from datetime import datetime
-        if not s:
-            return None
-        return datetime.fromisoformat(s.replace("Z", "+00:00"))
-
-    # Index meetings by start_time for matching
-    _meeting_by_time = {}
-    for _m in meetings:
-        _start = _parse_dt(_m.get("start_time"))
-        if _start:
-            _meeting_by_time[_start] = _m
-
-    _rows = []
-    for _ev in events:
-        _ev_start = _parse_dt(_ev.get("start_time"))
-        _ev_end = _parse_dt(_ev.get("end_time"))
-        if not _ev_start:
-            continue
-
-        # Find meeting within 15-min window of calendar event start
-        _matched = None
-        for _m_start, _m in _meeting_by_time.items():
-            if abs((_m_start - _ev_start).total_seconds()) < 900:
-                _matched = _m
-                break
-
-        _rows.append({
-            "date": _ev_start.strftime("%Y-%m-%d"),
-            "time": _ev_start.strftime("%H:%M"),
-            "event_title": _ev.get("title", "(untitled)"),
-            "has_recording": _matched is not None,
-            "meeting_title": _matched.get("title", "") if _matched else "",
-            "attendee_count": len(_ev.get("attendees", [])),
-        })
-
-    calendar_df = pl.DataFrame(_rows)
-    return (calendar_df,)
-```
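-
-The loop above stops at the first meeting inside the 15-minute window and scans every meeting for every event. If several recordings start close together, matching the *closest* start time is usually more accurate. The sketch below shows one way to do that with the standard `bisect` module on a sorted list of start times. `closest_within` is a hypothetical helper, and it assumes all datetimes are mutually comparable (for example, all timezone-aware):
-
-```python
-from bisect import bisect_left
-from datetime import datetime, timedelta, timezone
-
-
-def closest_within(sorted_starts: list[datetime], target: datetime,
-                   window: timedelta = timedelta(minutes=15)):
-    """Index of the start time closest to `target` within `window`, or None."""
-    i = bisect_left(sorted_starts, target)
-    # The closest value is either just before or at the insertion point
-    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_starts)]
-    if not candidates:
-        return None
-    best = min(candidates, key=lambda j: abs(sorted_starts[j] - target))
-    return best if abs(sorted_starts[best] - target) < window else None
-
-
-def utc(h: int, m: int) -> datetime:
-    return datetime(2026, 1, 5, h, m, tzinfo=timezone.utc)
-
-
-starts = [utc(9, 0), utc(9, 10), utc(14, 0)]
-print(closest_within(starts, utc(9, 8)))   # 1 (09:10 is 2 minutes away)
-print(closest_within(starts, utc(11, 0)))  # None (nothing within 15 minutes)
-```
-
-To use this in `correlate`, sort the `(start, meeting)` pairs by start once and keep the start times in their own list. Then look up the meeting at the returned index for each event.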
-
-## Pattern 4: Full Interaction Timeline for a Person
-
-Combine emails, meetings, and Zulip messages into a single chronological view.
-
-```python
-@app.cell
-def fetch_all_interactions(fetch_all, DATAINDEX, target_id, DATE_FROM):
-    all_entities = fetch_all(f"{DATAINDEX}/query", {
-        "contact_ids": str(target_id),
-        "date_from": DATE_FROM,
-        "sort_by": "timestamp",
-        "sort_order": "desc",
-    })
-    return (all_entities,)
-
-@app.cell
-def interaction_timeline(all_entities, target_name, pl):
-    _rows = []
-    for _e in all_entities:
-        _etype = _e["entity_type"]
-        _summary = ""
-        if _etype == "email":
-            _summary = _e.get("snippet") or _e.get("title") or ""
-        elif _etype == "meeting":
-            _summary = _e.get("summary") or _e.get("title") or ""
-        elif _etype == "conversation_message":
-            _summary = (_e.get("message") or "")[:120]
-        elif _etype == "threaded_conversation":
-            _summary = _e.get("title") or ""
-        elif _etype == "calendar_event":
-            _summary = _e.get("title") or ""
-        else:
-            _summary = _e.get("title") or _e["entity_type"]
-
-        _rows.append({
-            "date": _e["timestamp"][:10],
-            "type": _etype,
-            "source": _e["connector_id"],
-            "summary": _summary[:120],
-        })
-
-    timeline_df = pl.DataFrame(_rows)
-    return (timeline_df,)
-
-@app.cell
-def show_timeline(timeline_df, target_name, mo):
-    mo.md(f"## Interaction Timeline: {target_name} ({len(timeline_df)} events)")
-
-@app.cell
-def display_timeline(timeline_df):
-    timeline_df
-```
-
-## Pattern 5: LLM Filtering with `lib.llm`
-
-When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the `llm_call` helper from `workflows/lib`. It sends each item to an LLM and parses the response into a typed Pydantic model.
-
-**Prerequisites:** Copy `.env.example` to `.env` and fill in your `LLM_API_KEY`. Add `mirascope`, `pydantic`, and `python-dotenv` to the notebook's PEP 723 dependencies.
-
-```python
-# /// script
-# requires-python = ">=3.12"
-# dependencies = [
-#     "marimo",
-#     "httpx",
-#     "polars",
-#     "mirascope[openai]",
-#     "pydantic",
-#     "python-dotenv",
-# ]
-# ///
-```
-
-### Setup cell — load `.env` and import `llm_call`
-
-```python
-@app.cell
-def setup():
-    from dotenv import load_dotenv
-    load_dotenv(".env")  # Makes LLM_API_KEY available to lib/llm.py
-
-    import asyncio
-    import httpx
-    import marimo as mo
-    import polars as pl
-    from pydantic import BaseModel, Field
-    from lib.llm import llm_call
-    client = httpx.Client(timeout=30)
-    return (asyncio, client, llm_call, mo, pl, BaseModel, Field,)
-```
-
-### Define a response model
-
-Create a Pydantic model that describes the structured output you want from the LLM:
-
-```python
-@app.cell
-def models(BaseModel, Field):
-
-    class RelevanceScore(BaseModel):
-        relevant: bool
-        reason: str
-        score: int = Field(description="Relevance from 0 (unrelated) to 10 (central to the topic)")
-
-    return (RelevanceScore,)
-```
-
-### Filter entities through the LLM
-
-Iterate over fetched entities and call `llm_call` for each one. Since `llm_call` is async, use `asyncio.gather` to process items concurrently:
-
-```python
-@app.cell
-async def llm_filter(meetings, llm_call, RelevanceScore, pl, mo, asyncio):
-    _topic = "Greyhaven"
-
-    async def _score(meeting):
-        _text = meeting.get("summary") or meeting.get("title") or ""
-        _result = await llm_call(
-            prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}",
-            response_model=RelevanceScore,
-            system_prompt="Score the relevance of this meeting to the given topic. Set relevant=true if score >= 5.",
-        )
-        return {**meeting, "llm_relevant": _result.relevant, "llm_reason": _result.reason, "llm_score": _result.score}
-
-    scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings])
-    relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]]
-
-    mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'")
-    return (relevant_meetings,)
-```
-
-### Tips for LLM filtering
-
-- **Keep prompts short** — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity.
-- **Use structured output** — always pass a `response_model` so you get typed fields back, not free-text.
-- **Batch wisely** — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits.
-- **Cache results** — LLM calls are slow and cost money. Keep the scoring in its own cell. marimo re-runs a cell only when one of its inputs changes, so editing downstream cells won't trigger re-scoring.
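-
-One way to follow the "batch wisely" tip without restructuring the cell is to cap the number of in-flight requests with `asyncio.Semaphore` instead of splitting the work into explicit chunks. In the sketch, `fake_score` is a hypothetical stand-in coroutine in place of `llm_call`. The limit of 5 is an arbitrary example, not a documented rate limit:
-
-```python
-import asyncio
-
-
-async def gather_bounded(coro_fn, items, max_concurrency: int = 5):
-    """Like asyncio.gather over items, but with at most `max_concurrency` calls in flight."""
-    sem = asyncio.Semaphore(max_concurrency)
-
-    async def _run(item):
-        async with sem:  # Waits here while max_concurrency calls are already running
-            return await coro_fn(item)
-
-    # gather returns results in the same order as the input items
-    return await asyncio.gather(*[_run(item) for item in items])
-
-
-in_flight = 0
-peak = 0
-
-
-async def fake_score(item):
-    """Stand-in for an LLM call that records how many calls overlap."""
-    global in_flight, peak
-    in_flight += 1
-    peak = max(peak, in_flight)
-    await asyncio.sleep(0.01)
-    in_flight -= 1
-    return item * 2
-
-
-results = asyncio.run(gather_bounded(fake_score, range(20), max_concurrency=5))
-print(results[:3], peak)  # [0, 2, 4] 5
-```
-
-Inside an `async def` cell, you would write `await gather_bounded(_score, meetings)` instead of calling `asyncio.run`, which is meant for plain scripts.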
-
-## Do / Don't — Quick Reference for LLM Agents
-
-When generating marimo notebooks, follow these rules strictly. Violations cause `MultipleDefinitionError` at runtime, cells saved as `_unparsable_cell`, or output that silently never renders.
-
-### Do
-
-- **Prefix cell-local variables with `_`** — `_resp`, `_rows`, `_m`, `_data`, `_chunk`. Marimo treats `_`-prefixed names as private to the cell, so they won't clash across cells.
-- **Put all imports in the `setup` cell** and pass them as cell parameters: `def my_cell(client, mo, pl, asyncio):`. Never `import` inside other cells — even `import asyncio` in two async cells causes `MultipleDefinitionError`.
-- **Give returned DataFrames unique names** — `email_df`, `meeting_df`, `timeline_df`. Never use a bare `df` that might collide with another cell.
-- **Return only values other cells need** — everything else should be `_`-prefixed and stays private to the cell.
-- **Import stdlib modules in `setup` too** — even `from datetime import datetime` creates a top-level name. If two cells both import `datetime`, marimo errors. Import it once in `setup` and receive it as a parameter, or use it inside a `_`-prefixed helper function where it's naturally scoped.
-- **Every non-utility cell must show a preview** — see the "Cell Output Previews" section below.
-- **Use separate display cells for DataFrames** — the build cell returns the DataFrame and shows a `mo.md()` count/heading; a standalone display cell (e.g., `def show_table(df): df`) renders it as an interactive table the user can sort and filter.
-- **Include `room_name` when listing meetings** — the virtual room name provides useful context about where the meeting took place (e.g., `standup-office-bogota`). Show it as a column alongside `title`.
-- **Keep cell output expressions at the top level** — if a cell conditionally displays a DataFrame, initialize `_output = None` before the `if`/`else`, assign inside the branches, then put `_output` as the last top-level expression. Expressions inside `if`/`else`/`for` blocks are silently ignored by marimo.
-- **Put all user parameters in a `params` cell as the first cell** — date ranges, search terms, target names, limits. Never hardcode these values deeper in the notebook.
-- **Declare cells as `async def` when using `await`** — `@app.cell` followed by `async def cell_name(...)`. This includes cells using `asyncio.gather`, `await llm_call(...)`, or any async API.
-- **Return classes/models from cells that define them** — if a cell defines `class MyModel(BaseModel)`, return it so other cells can use it as a parameter: `return (MyModel,)`.
-- **Use `python-dotenv` to load `.env`** — add `python-dotenv` to PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell (before importing `lib.llm`). This ensures `LLM_API_KEY` and other env vars are available without requiring them to be pre-set in the shell.
-
-### Don't
-
-- **Don't define the same variable name in two cells** — even `resp = ...` in cell A and `resp = ...` in cell B is a fatal error.
-- **Don't `import` inside non-setup cells** — every `import X` defines a top-level variable `X`. If two cells both `import asyncio`, marimo raises `MultipleDefinitionError` and refuses to run. Put all imports in the `setup` cell and receive them as function parameters.
-- **Don't use generic top-level names** like `df`, `rows`, `resp`, `data`, `result` — either prefix with `_` or give them a unique descriptive name.
-- **Don't return temporary variables** — if `_rows` is only used to build a DataFrame, keep it `_`-prefixed and only return the DataFrame.
-- **Don't use `await` in a non-async cell** — this causes marimo to save the cell as `_unparsable_cell` (a string literal that won't execute). Always use `async def` for cells that call async functions.
-- **Don't define classes in a cell without returning them** — a bare `return` or no return makes classes invisible to the DAG. Other cells can't receive them as parameters.
-- **Don't put display expressions inside `if`/`else`/`for` blocks** — marimo only renders the last top-level expression. A DataFrame inside an `if` branch is silently discarded. Use the `_output = None` pattern from the Do list instead, or split the display into separate cells (see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level)).
-
-## Cell Output Previews
-
-Every cell that fetches, transforms, or produces data **must display a preview** so the user can validate results at each step. The only exceptions are **utility cells** (config, setup, helpers) that only define constants or functions.
-
-Think from the user's perspective: when they open the notebook in `marimo edit`, each cell should tell them something useful — a count, a sample, a summary. Silent cells that do work but show nothing are hard to debug and validate.
-
-### What to show
-
-| Cell type | What to preview |
-|-----------|----------------|
-| API fetch (list of items) | `mo.md(f"**Fetched {len(items)} meetings**")` |
-| DataFrame build | The DataFrame itself as last expression (renders as interactive table) |
-| Scalar result | `mo.md(f"**Contact:** {name} (id={contact_id})")` |
-| Search / filter | `mo.md(f"**{len(hits)} results** matching '{term}'")` |
-| Final output | Full DataFrame or `mo.md()` summary as last expression |
-
-### Example: fetch cell with preview
-
-**Bad** — cell runs silently, user sees nothing:
-
-```python
-@app.cell
-def fetch_meetings(fetch_all, DATAINDEX, my_id):
-    meetings = fetch_all(f"{DATAINDEX}/query", {
-        "entity_types": "meeting",
-        "contact_ids": str(my_id),
-    })
-    return (meetings,)
-```
-
-**Good** — cell shows a count so the user knows it worked:
-
-```python
-@app.cell
-def fetch_meetings(fetch_all, DATAINDEX, my_id, mo):
-    meetings = fetch_all(f"{DATAINDEX}/query", {
-        "entity_types": "meeting",
-        "contact_ids": str(my_id),
-    })
-    mo.md(f"**Fetched {len(meetings)} meetings**")
-    return (meetings,)
-```
-
-### Example: transform cell with table preview
-
-**Bad** — builds DataFrame but doesn't display it:
-
-```python
-@app.cell
-def build_table(meetings, pl):
-    _rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings]
-    meeting_df = pl.DataFrame(_rows)
-    return (meeting_df,)
-```
-
-**Good** — the build cell shows a `mo.md()` count, and a **separate display cell** renders the DataFrame as an interactive table:
-
-```python
-@app.cell
-def build_table(meetings, pl, mo):
-    _rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings]
-    meeting_df = pl.DataFrame(_rows).sort("date")
-    mo.md(f"### Meetings ({len(meeting_df)} results)")
-    return (meeting_df,)
-
-@app.cell
-def show_meeting_table(meeting_df):
-    meeting_df  # Renders as interactive sortable table
-```
-
-### Separate display cells for DataFrames
-
-When a cell builds a DataFrame, use **two cells**: one that builds and returns it (with a `mo.md()` summary), and a standalone display cell that renders it as a table. This keeps the build logic clean and gives the user an interactive table they can sort and filter in the marimo UI.
-
-```python
-# Cell 1: build and return the DataFrame, show a count
-@app.cell
-def build_sentiment_table(analyzed_meetings, pl, mo):
-    _rows = [...]
-    sentiment_df = pl.DataFrame(_rows).sort("date", descending=True)
-    mo.md(f"### Sentiment Analysis ({len(sentiment_df)} meetings)")
-    return (sentiment_df,)
-
-# Cell 2: standalone display — just the DataFrame, nothing else
-@app.cell
-def show_sentiment_table(sentiment_df):
-    sentiment_df
-```
-
-This pattern makes every result inspectable. The build cell's `mo.md()` gives a quick count/heading; the display cell lets the user explore the full data interactively.
-
-### Utility cells (no preview needed)
-
-Config, setup, and helper cells that only define constants or functions don't need previews:
-
-```python
-@app.cell
-def config():
-    BASE = "http://localhost:42000"
-    CONTACTDB = f"{BASE}/contactdb-api"
-    DATAINDEX = f"{BASE}/dataindex/api/v1"
-    return (CONTACTDB, DATAINDEX,)
-
-@app.cell
-def helpers(client):
-    def fetch_all(url, params):
-        ...
-    return (fetch_all,)
-```
-
-## Tips
-
-- Use `marimo edit` during development to see cell outputs interactively
-- Make raw API responses the last expression in a cell to inspect their structure
-- Use `polars` over `pandas` for better performance and stricter typing
-- Set `timeout=30` on httpx clients — the httpx default is 5 seconds, and some queries over large date ranges take longer
-- Name cells descriptively — function names appear in the marimo sidebar