diff --git a/docs/company-context.md b/docs/company-context.md deleted file mode 100644 index 414022b..0000000 --- a/docs/company-context.md +++ /dev/null @@ -1,43 +0,0 @@ -# Company Context - -## About Monadical - -Monadical is a software consultancy founded in 2016. The company operates across multiple locations: Montreal and Vancouver (Canada), and Medellin and Cali (Colombia). The team builds internal products alongside client work. - -### Internal Products - -- **Reflector** — Meeting recording and transcription tool (produces meeting entities in DataIndex) -- **GreyHaven / InternalAI platform** — A local-first platform that aggregates personal data and resolves contacts for automation and analysis - -## Communication Tools - -| Tool | Role | Data in DataIndex? | -|------------|-----------------------------|---------------------| -| Zulip | Primary internal chat | Yes (connector: `zulip`) | -| Fastmail/Email | External communication | Yes (connector: `mbsync_email`) | -| Calendar | Scheduling (ICS feeds) | Yes (connector: `ics_calendar`) | -| Reflector | Meeting recordings | Yes (connector: `reflector`) | -| HedgeDoc | Collaborative documents | Yes (connector: `hedgedoc`) | - -## How the company works - -We use Zulip as our main hub for communication. Zulip has channels (top level) and topics (low level). The expected behavior differs from channel to channel. - -### Zulip channels - -Here is a list of Zulip stream prefixes, with context on how the company is organized: - -- InternalAI (zulip:stream:193) is about this specific platform. -- Leads (zulip:stream:78) is where we talk about our leads/clients. We usually create one topic per lead/client - So if you are searching for information about a client, always check whether a related topic exists that matches the client or company name. -- Checkins (zulip:stream:24) usually has one topic per employee.
This is where employees post what they did or plan to do over a given period, or simply a status update. Not everybody uses it regularly. -- Devcap (zulip:stream:156) is where we discuss investments and due diligence before investing. One topic per company. -- General (zulip:stream:21) is where we discuss various company-wide or service-related subjects. -- Engineering (zulip:stream:25) is where we discuss engineering issues, services, and new tools to try -- Learning (zulip:stream:31) is where we share links about new tools, ideas, or things to learn about -- Reflector (zulip:stream:155) is a dedicated stream about Reflector development and usage -- GreyHaven is split across multiple streams: branding (zulip:stream:206), leads specific to GreyHaven (zulip:stream:208) with one topic per lead, and marketing (zulip:stream:212) - -### Meeting and Calendar - -Some people in the company have a dedicated room for their meetings in Reflector. This shows up as `room_name` in the `meeting` entity. -For someone like Max, DataIndex has calendar information, and most of his calendar events have a related meeting in Reflector. However, there is no direct link between a calendar event and a Reflector meeting, so the two have to be correlated to figure out which meeting an event refers to. diff --git a/docs/connectors-and-sources.md b/docs/connectors-and-sources.md deleted file mode 100644 index 029e604..0000000 --- a/docs/connectors-and-sources.md +++ /dev/null @@ -1,99 +0,0 @@ -# Connectors and Data Sources - -Each connector ingests data from an external source into DataIndex. Connectors run periodic background syncs to keep data fresh. - -Use `list_connectors()` at runtime to see which connectors are actually configured — not all connectors below may be active in every deployment.
- -## Connector → Entity Type Mapping - -| Connector ID | Entity Types Produced | Description | -|------------------|-----------------------------------------------------------------|----------------------------------| -| `reflector` | `meeting` | Meeting recordings + transcripts | -| `ics_calendar` | `calendar_event` | ICS calendar feed events | -| `mbsync_email` | `email` | Email via mbsync IMAP sync | -| `zulip` | `conversation`, `conversation_message`, `threaded_conversation` | Zulip chat streams and topics | -| `babelfish` | `conversation_message`, `threaded_conversation` | Chat translation bridge | -| `hedgedoc` | `document` | HedgeDoc collaborative documents | -| `contactdb` | `contact` | Synced from ContactDB (static) | -| `browser_history`| `webpage` | Browser extension page visits | -| `api_document` | `document` | API-ingested documents (static) | - -## Per-Connector Details - -### `reflector` — Meeting Recordings - -Ingests meetings from Reflector, Monadical's meeting recording tool. - -- **Entity type:** `meeting` -- **Key fields:** `transcript`, `summary`, `participants`, `start_time`, `end_time`, `room_name` -- **Use cases:** Find meetings someone attended, search meeting transcripts, get summaries -- **Tip:** Filter with `contact_ids` to find meetings involving specific people. The `transcript` field contains speaker-diarized text. - -### `ics_calendar` — Calendar Events - -Parses ICS calendar feeds (Google Calendar, Outlook, etc.). - -- **Entity type:** `calendar_event` -- **Key fields:** `start_time`, `end_time`, `attendees`, `location`, `description`, `calendar_name` -- **Use cases:** Check upcoming events, find events with specific attendees, review past schedule -- **Tip:** Multiple calendar feeds may be configured as separate connectors (e.g., `personal_calendar`, `work_calendar`). Use `list_connectors()` to discover them. - -### `mbsync_email` — Email - -Syncs email via mbsync (IMAP). 
- -- **Entity type:** `email` -- **Key fields:** `text_content`, `from_contact_id`, `to_contact_ids`, `cc_contact_ids`, `thread_id`, `has_attachments` -- **Use cases:** Find emails from/to someone, search email content, track email threads -- **Tip:** Use `from_contact_id` and `to_contact_ids` with `contact_ids` filter. For thread grouping, use the `thread_id` field. - -### `zulip` — Chat - -Ingests Zulip streams, topics, and messages. - -- **Entity types:** - - `conversation` — A Zulip stream/channel with recent messages - - `conversation_message` — Individual chat messages - - `threaded_conversation` — A topic thread within a stream -- **Key fields:** `message`, `mentioned_contact_ids`, `recent_messages` -- **Use cases:** Find discussions about a topic, track who said what, find @-mentions -- **Tip:** Use `threaded_conversation` to find topic-level discussions. Use `conversation_message` with `mentioned_contact_ids` to find messages that mention specific people. - -### `babelfish` — Translation Bridge - -Ingests translated chat messages from the Babelfish service. - -- **Entity types:** `conversation_message`, `threaded_conversation` -- **Use cases:** Similar to Zulip but for translated cross-language conversations -- **Tip:** Query alongside `zulip` connector for complete conversation coverage. - -### `hedgedoc` — Collaborative Documents - -Syncs documents from HedgeDoc (collaborative markdown editor). - -- **Entity type:** `document` -- **Key fields:** `content`, `description`, `url`, `revision_id` -- **Use cases:** Find documents by content, track document revisions -- **Tip:** Use `search()` for semantic document search rather than `query_entities` text filter. - -### `contactdb` — Contact Sync (Static) - -Mirrors contacts from ContactDB into DataIndex for unified search. - -- **Entity type:** `contact` -- **Note:** This is a read-only mirror. Use ContactDB MCP tools directly for contact operations. 
- -### `browser_history` — Browser Extension (Static) - -Captures visited webpages from a browser extension. - -- **Entity type:** `webpage` -- **Key fields:** `url`, `visit_time`, `text_content` -- **Use cases:** Find previously visited pages, search page content - -### `api_document` — API Documents (Static) - -Documents ingested via the REST API (e.g., uploaded PDFs, imported files). - -- **Entity type:** `document` -- **Note:** These are ingested via `POST /api/v1/ingest/documents`, not periodic sync. diff --git a/docs/contactdb-api.md b/docs/contactdb-api.md deleted file mode 100644 index 93b7102..0000000 --- a/docs/contactdb-api.md +++ /dev/null @@ -1,154 +0,0 @@ -# ContactDB API Reference - -ContactDB is the people directory. It stores contacts, their platform identities, relationships, notes, and links. Every person across all data sources resolves to a single ContactDB `contact_id`. - -**Base URL:** `http://localhost:42000/contactdb-api/` (direct) or `http://caddy/contactdb-api/` (via greywall sandbox) - -## Core Entities - -### Contact - -The central entity — represents a person. - -| Field | Type | Description | -|----------------------|---------------------|------------------------------------------------| -| `id` | int | Unique contact ID | -| `name` | string | Display name | -| `emails` | EmailField[] | `{type, value, preferred}` | -| `phones` | PhoneField[] | `{type, value, preferred}` | -| `bio` | string? | Short biography | -| `avatar_url` | string? | Profile image URL | -| `personal_info` | PersonalInfo | Birthday, partner, children, role, company, location, how_we_met | -| `interests` | string[] | Topics of interest | -| `values` | string[] | Personal values | -| `tags` | string[] | User-assigned tags | -| `profile_description`| string? 
| Extended description | -| `is_placeholder` | bool | Auto-created stub (not yet fully resolved) | -| `is_service_account` | bool | Non-human account (bot, no-reply) | -| `stats` | ContactStats | Interaction statistics (see below) | -| `enrichment_data` | dict | Data from enrichment providers | -| `platform_identities`| PlatformIdentity[] | Identities on various platforms | -| `created_at` | datetime | When created | -| `updated_at` | datetime | Last modified | -| `merged_into_id` | int? | If merged, target contact ID | -| `deleted_at` | datetime? | Soft-delete timestamp | - -### ContactStats - -| Field | Type | Description | -|--------------------------|---------------|--------------------------------------| -| `total_messages` | int | Total messages across platforms | -| `platforms_count` | int | Number of platforms active on | -| `last_interaction_at` | string? | ISO datetime of last interaction | -| `interaction_count_30d` | int | Interactions in last 30 days | -| `interaction_count_90d` | int | Interactions in last 90 days | -| `hotness` | HotnessScore? | Composite engagement score (0-100) | - -### PlatformIdentity - -Links a contact to a specific platform account. - -| Field | Type | Description | -|--------------------|-----------|------------------------------------------| -| `id` | int | Identity record ID | -| `contact_id` | int | Parent contact | -| `source` | string | Data provenance (e.g., `dataindex_zulip`)| -| `platform` | string | Platform name (e.g., `email`, `zulip`) | -| `platform_user_id` | string | User ID on that platform | -| `display_name` | string? | Name shown on that platform | -| `avatar_url` | string? | Platform-specific avatar | -| `bio` | string? | Platform-specific bio | -| `extra_data` | dict | Additional platform-specific data | -| `first_seen_at` | datetime | When first observed | -| `last_seen_at` | datetime | When last observed | - -### Relationship - -Tracks connections between contacts. 
- -| Field | Type | Description | -|------------------------|-----------|--------------------------------------| -| `id` | int | Relationship ID | -| `from_contact_id` | int | Source contact | -| `to_contact_id` | int | Target contact | -| `relationship_type` | string | Type (e.g., "colleague", "client") | -| `since_date` | date? | When relationship started | -| `relationship_metadata`| dict | Additional metadata | - -### Note - -Free-text notes attached to a contact. - -| Field | Type | Description | -|--------------|----------|----------------------| -| `id` | int | Note ID | -| `contact_id` | int | Parent contact | -| `content` | string | Note text | -| `created_by` | string | Who wrote it | -| `created_at` | datetime | When created | - -### Link - -External URLs associated with a contact. - -| Field | Type | Description | -|--------------|----------|--------------------------| -| `id` | int | Link ID | -| `contact_id` | int | Parent contact | -| `type` | string | Link type (e.g., "github", "linkedin") | -| `label` | string | Display label | -| `url` | string | URL | - -## REST Endpoints - -### GET `/api/contacts` — List/search contacts - -Primary way to find contacts. Returns `{contacts: [...], total, limit, offset}`. - -**Query parameters:** - -| Parameter | Type | Description | -|------------------------|---------------|----------------------------------------------| -| `search` | string? | Search in name and bio | -| `is_placeholder` | bool? | Filter by placeholder status | -| `is_service_account` | bool? | Filter by service account status | -| `sort_by` | string? | `"hotness"`, `"name"`, or `"updated_at"` | -| `min_hotness` | float? | Minimum hotness score (0-100) | -| `max_hotness` | float? | Maximum hotness score (0-100) | -| `platforms` | string[]? | Contacts with ALL specified platforms (AND) | -| `last_interaction_from`| string? | ISO datetime lower bound | -| `last_interaction_to` | string? 
| ISO datetime upper bound | -| `limit` | int | Max results (1-100, default 50) | -| `offset` | int | Pagination offset (default 0) | - -### GET `/api/contacts/me` — Get self contact - -Returns the platform operator's own contact record. **Call this first** in most workflows to get your own `contact_id`. - -### GET `/api/contacts/{id}` — Get contact by ID - -Get full details for a single contact by numeric ID. - -### GET `/api/contacts/by-email/{email}` — Get contact by email - -Look up a contact by email address. - -### Other Endpoints - -| Method | Path | Description | -|--------|-----------------------------------------|----------------------------------| -| POST | `/api/contacts` | Create contact | -| PUT | `/api/contacts/{id}` | Update contact | -| DELETE | `/api/contacts/{id}` | Delete contact | -| POST | `/api/contacts/merge` | Merge two contacts | -| GET | `/api/contacts/{id}/relationships` | List relationships | -| GET | `/api/contacts/{id}/notes` | List notes | -| GET | `/api/contacts/{id}/links` | List links | -| GET | `/api/platform-identities/contacts/{id}`| List platform identities | - -## Usage Pattern - -1. **Start with `GET /api/contacts/me`** to get the operator's contact ID -2. **Search by name** with `GET /api/contacts?search=Alice` -3. **Use contact IDs** from results as filters in DataIndex queries (`contact_ids` parameter) -4. **Paginate** large result sets with `offset` increments diff --git a/docs/dataindex-api.md b/docs/dataindex-api.md deleted file mode 100644 index aa946f8..0000000 --- a/docs/dataindex-api.md +++ /dev/null @@ -1,218 +0,0 @@ -# DataIndex API Reference - -DataIndex aggregates data from all connected sources (email, calendar, Zulip, meetings, documents) into a unified query interface. Every piece of data is an **entity** with a common base structure plus type-specific fields. 
- -**Base URL:** `http://localhost:42000/dataindex/api/v1/` (direct) or `http://caddy/dataindex/api/v1/` (via greywall sandbox) - -## Entity Types - -All entities share these base fields: - -| Field | Type | Description | -|----------------------|-------------|---------------------------------------------| -| `id` | string | Format: `connector_name:native_id` | -| `entity_type` | string | One of the types below | -| `timestamp` | datetime | When the entity occurred | -| `contact_ids` | string[] | ContactDB IDs of people involved | -| `connector_id` | string | Which connector produced this | -| `title` | string? | Display title | -| `parent_id` | string? | Parent entity (e.g., thread for a message) | -| `raw_data` | dict | Original source data (excluded by default) | - -### `calendar_event` - -From ICS calendar feeds. - -| Field | Type | Description | -|-----------------------|-------------|--------------------------------| -| `start_time` | datetime? | Event start | -| `end_time` | datetime? | Event end | -| `all_day` | bool | All-day event flag | -| `description` | string? | Event description | -| `location` | string? | Event location | -| `attendees` | dict[] | Attendee list | -| `organizer_contact_id`| string? | ContactDB ID of organizer | -| `status` | string? | Event status | -| `calendar_name` | string? | Source calendar name | -| `meeting_url` | string? | Video call link | - -### `meeting` - -From Reflector (recorded meetings with transcripts). - -| Field | Type | Description | -|--------------------|---------------------|-----------------------------------| -| `start_time` | datetime? | Meeting start | -| `end_time` | datetime? | Meeting end | -| `participants` | MeetingParticipant[]| People in the meeting | -| `meeting_platform` | string? | Platform (e.g., "jitsi") | -| `transcript` | string? | Full transcript text | -| `summary` | string? | AI-generated summary | -| `meeting_url` | string? | Meeting link | -| `recording_url` | string? 
| Recording link | -| `location` | string? | Physical location | -| `room_name` | string? | Virtual room name (also indicates meeting location — see below) | - -**MeetingParticipant** fields: `display_name`, `contact_id?`, `platform_user_id?`, `email?`, `speaker?` - -> **`room_name` as location indicator:** The `room_name` field often encodes where the meeting took place (e.g., a Jitsi room name like `standup-office-bogota`). Use it to infer the meeting location when `location` is not set. - -> **Participant and contact coverage is incomplete.** Meeting data comes from Reflector, which only tracks users who are logged into the Reflector platform. This means: -> -> - **`contact_ids`** only contains ContactDB IDs for Reflector-logged participants who were matched to a known contact. It will often be a **subset** of the actual attendees — do not assume it is the full list. -> - **`participants`** is more complete than `contact_ids` but still only includes people detected by Reflector. Not all participants have accounts or could be identified — some attendees may be entirely absent from this list. -> - **`contact_id` within a participant** may be `null` if the person was detected but couldn't be matched to a ContactDB entry. -> -> **Consequence for queries:** Filtering meetings by `contact_ids` will **miss meetings** where the person attended but wasn't logged into Reflector or wasn't resolved. To get better coverage, combine multiple strategies: -> -> 1. Filter by `contact_ids` for resolved participants -> 2. Search `participants[].display_name` client-side for name matches -> 3. Use `POST /search` with the person's name to search meeting transcripts and summaries - -### `email` - -From mbsync email sync. - -| Field | Type | Description | -|--------------------|-----------|--------------------------------------| -| `thread_id` | string? | Email thread grouping | -| `text_content` | string? | Plain text body | -| `html_content` | string? 
| HTML body | -| `snippet` | string? | Preview snippet | -| `from_contact_id` | string? | Sender's ContactDB ID | -| `to_contact_ids` | string[] | Recipient ContactDB IDs | -| `cc_contact_ids` | string[] | CC recipient ContactDB IDs | -| `has_attachments` | bool | Has attachments flag | -| `attachments` | dict[] | Attachment metadata | - -### `conversation` - -A Zulip stream/channel. - -| Field | Type | Description | -|--------------------|---------|----------------------------------------| -| `recent_messages` | dict[] | Recent messages in the conversation | - -### `conversation_message` - -A single message in a Zulip conversation. - -| Field | Type | Description | -|-------------------------|-----------|-----------------------------------| -| `message` | string? | Message text content | -| `mentioned_contact_ids` | string[] | ContactDB IDs of mentioned people | - -### `threaded_conversation` - -A Zulip topic thread (group of messages under a topic). - -| Field | Type | Description | -|--------------------|---------|----------------------------------------| -| `recent_messages` | dict[] | Recent messages in the thread | - -### `document` - -From HedgeDoc, API ingestion, or other document sources. - -| Field | Type | Description | -|----------------|-----------|------------------------------| -| `content` | string? | Document body text | -| `description` | string? | Document description | -| `mimetype` | string? | MIME type | -| `url` | string? | Source URL | -| `revision_id` | string? | Revision identifier | - -### `webpage` - -From browser history extension. - -| Field | Type | Description | -|----------------|-----------|------------------------------| -| `url` | string | Page URL | -| `visit_time` | datetime | When visited | -| `text_content` | string? | Page text content | - -## REST Endpoints - -### GET `/api/v1/query` — Exhaustive Filtered Enumeration - -Use when you need **all** entities matching specific criteria. Supports pagination. 
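A minimal sketch of the offset-based pagination this endpoint supports, assuming each response carries `items` and `total` as in the response format documented for `/query`. The names `iter_entities` and `fetch_page` are hypothetical helpers, not part of the DataIndex API. `fetch_page` stands in for whatever HTTP call the caller makes, and the in-memory `fake_fetch` exists only so the loop can be exercised without a server.

```python
from typing import Any, Callable, Iterator

# A page is the decoded JSON body of one /query response
# (assumed shape: {"items": [...], "total": <int>, ...}).
Page = dict[str, Any]


def iter_entities(
    fetch_page: Callable[[int, int], Page],
    limit: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield every entity by advancing `offset` until it reaches `total`.

    `fetch_page(offset, limit)` is a hypothetical callable supplied by the
    caller; it must return one decoded page.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page.get("items", [])
        yield from items
        offset += limit
        # Stop once past the reported total, or when a page comes back empty
        # (guards against a total that changes while iterating).
        if not items or offset >= page.get("total", 0):
            break


def fake_fetch(offset: int, limit: int) -> Page:
    """In-memory stand-in for the HTTP call: 120 fake meetings."""
    data = [{"id": f"reflector:{i}", "entity_type": "meeting"} for i in range(120)]
    return {"items": data[offset:offset + limit], "total": len(data)}


all_meetings = list(iter_entities(fake_fetch, limit=50))
print(len(all_meetings))  # 120, fetched as pages of 50, 50, and 20
```

With a real client, `fetch_page` could wrap a call such as `client.get(f"{DATAINDEX}/query", params={"entity_types": "meeting", "limit": limit, "offset": offset}).json()`, where `client` and `DATAINDEX` are the names used in the notebook examples in this documentation.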
- -**When to use:** "List all meetings since January", "Get all emails from Alice", "Count calendar events this week" - -**Query parameters:** - -| Parameter | Type | Description | -|------------------|---------------|------------------------------------------------| -| `entity_types` | string (repeat) | Filter by type — repeat param for multiple: `?entity_types=email&entity_types=meeting` | -| `contact_ids` | string | Comma-separated ContactDB IDs: `"1,42"` | -| `connector_ids` | string | Comma-separated connector IDs: `"zulip,reflector"` | -| `date_from` | string | ISO datetime lower bound (UTC if no timezone) | -| `date_to` | string | ISO datetime upper bound | -| `search` | string? | Text filter on content fields | -| `parent_id` | string? | Filter by parent entity | -| `thread_id` | string? | Filter emails by thread ID | -| `room_name` | string? | Filter meetings by room name | -| `limit` | int | Max results per page (default 50) | -| `offset` | int | Pagination offset (default 0) | -| `sort_by` | string | `"timestamp"` (default), `"title"`, `"contact_activity"`, etc. | -| `sort_order` | string | `"desc"` (default) or `"asc"` | -| `include_raw_data`| bool | Include raw_data field (default false) | - -**Response format:** - -```json -{ - "items": [...], - "total": 152, - "page": 1, - "size": 50, - "pages": 4 -} -``` - -**Pagination:** loop with offset increments until `offset >= total`. See [notebook-patterns.md] for a reusable helper. - -### POST `/api/v1/search` — Semantic Search - -Use when you need **relevant** results for a natural-language question. Returns ranked text chunks. No pagination — set a higher `limit` instead. 
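A small sketch of how a request body for this endpoint might be assembled. The field names are the ones listed in this reference. The helper `build_search_body` is hypothetical, and leaving unset filters out of the body is an assumption about what the endpoint accepts, not documented behavior.

```python
from typing import Any, Optional


def build_search_body(
    search_text: str,
    *,
    entity_types: Optional[list[str]] = None,
    contact_ids: Optional[list[str]] = None,
    connector_ids: Optional[list[str]] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    limit: int = 20,
) -> dict[str, Any]:
    """Build a JSON-serializable body for POST /search (hypothetical helper)."""
    body: dict[str, Any] = {"search_text": search_text, "limit": limit}
    optional = {
        "entity_types": entity_types,
        "contact_ids": contact_ids,
        "connector_ids": connector_ids,
        "date_from": date_from,
        "date_to": date_to,
    }
    # Assumption: filters that are not set can simply be left out of the body.
    body.update({key: value for key, value in optional.items() if value})
    return body


search_body = build_search_body(
    "product roadmap decisions",
    entity_types=["meeting", "threaded_conversation"],
    date_from="2025-01-01T00:00:00Z",
    limit=50,  # no pagination on this endpoint, so ask for more up front
)
print(search_body)
```

The resulting dict can be passed as the `json=` argument of a POST, in the same way the notebook examples in this documentation call `client.post(f"{DATAINDEX}/search", json={...})`.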
- -**When to use:** "What was discussed about the product roadmap?", "Find conversations about hiring" - -**Request body (JSON):** - -```json -{ - "search_text": "product roadmap decisions", - "entity_types": ["meeting", "threaded_conversation"], - "contact_ids": ["1", "42"], - "date_from": "2025-01-01T00:00:00Z", - "date_to": "2025-06-01T00:00:00Z", - "connector_ids": ["reflector", "zulip"], - "limit": 20 -} -``` - -**Response:** `{results: [...chunks], total_count}` — each chunk has `entity_ids`, `entity_type`, `connector_id`, `content`, `timestamp`. - -### GET `/api/v1/entities/{id}` — Get Entity by ID - -Retrieve full details of a single entity. The `entity_id` format is `connector_name:native_id`. - -### GET `/api/v1/connectors/status` — Connector Status - -Get sync status for all connectors (last sync time, entity count, health). - -## Common Query Recipes - -| Question | entity_type + connector_id | -|---------------------------------------|------------------------------------------| -| Meetings I attended | `meeting` + `reflector`, with your contact_id | -| Upcoming calendar events | `calendar_event` + `ics_calendar`, date_from=now | -| Emails from someone | `email` + `mbsync_email`, with their contact_id | -| Zulip threads about a topic | `threaded_conversation` + `zulip`, search="topic" | -| All documents | `document` + `hedgedoc` | -| Chat messages mentioning someone | `conversation_message` + `zulip`, with contact_id | -| What was discussed about X? | Use `POST /search` with `search_text` | - -[notebook-patterns.md]: ./notebook-patterns.md diff --git a/docs/notebook-patterns.md b/docs/notebook-patterns.md deleted file mode 100644 index 17ee1ed..0000000 --- a/docs/notebook-patterns.md +++ /dev/null @@ -1,802 +0,0 @@ -# Marimo Notebook Patterns - -This guide covers how to create [marimo](https://marimo.io) notebooks for data analysis against the InternalAI platform APIs. 
Marimo notebooks are plain `.py` files with reactive cells — no `.ipynb` format, no Jupyter dependency. - -## Marimo Basics - -A marimo notebook is a Python file with `@app.cell` decorated functions. Each cell returns values as a tuple, and other cells receive them as function parameters — marimo builds a reactive DAG automatically. - -```python -import marimo -app = marimo.App() - -@app.cell -def cell_one(): - x = 42 - return (x,) - -@app.cell -def cell_two(x): - # Re-runs automatically when x changes - result = x * 2 - return (result,) -``` - -**Key rules:** -- Cells declare dependencies via function parameters -- Cells return values as tuples: `return (var1, var2,)` -- The **last expression at the top level** of a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees). Expressions inside `if`/`else`/`for` blocks do **not** count — see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level) below -- Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below) -- No manual execution order; the DAG determines it -- **Variable names must be unique across cells.** Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them **private** to that cell — marimo ignores `_`-prefixed names. -- **All imports must go in the `setup` cell.** Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place **all** imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`. 
- -### Cell Variable Scoping — Example - -This is the **most common mistake**. Any variable assigned at the top level of a cell (not inside a `def` or comprehension) is tracked by marimo. If two cells assign the same name, the notebook refuses to run. - -**BROKEN** — `resp` is defined at top level in both cells: - -```python -# Cell A -@app.cell -def search_meetings(client, DATAINDEX): - resp = client.post(f"{DATAINDEX}/search", json={...}) # defines 'resp' - resp.raise_for_status() - results = resp.json()["results"] - return (results,) - -# Cell B -@app.cell -def fetch_details(client, DATAINDEX, results): - resp = client.get(f"{DATAINDEX}/entities/{results[0]}") # also defines 'resp' → ERROR - meeting = resp.json() - return (meeting,) -``` - -> **Error:** `MultipleDefinitionError: variable 'resp' is defined in multiple cells` - -**FIXED** — prefix cell-local variables with `_`: - -```python -# Cell A -@app.cell -def search_meetings(client, DATAINDEX): - _resp = client.post(f"{DATAINDEX}/search", json={...}) # _resp is cell-private - _resp.raise_for_status() - results = _resp.json()["results"] - return (results,) - -# Cell B -@app.cell -def fetch_details(client, DATAINDEX, results): - _resp = client.get(f"{DATAINDEX}/entities/{results[0]}") # _resp is cell-private, no conflict - meeting = _resp.json() - return (meeting,) -``` - -**Rule of thumb:** if a variable is only used within the cell to compute a return value, prefix it with `_`. Only leave names unprefixed if another cell needs to receive them. - -> **Note:** Variables inside nested `def` functions are naturally local and don't need `_` prefixes — e.g., `resp` inside a `def fetch_all(...)` helper is fine because it's scoped to the function, not the cell. - -### Cell Output Must Be at the Top Level - -Marimo only renders the **last expression at the top level** of a cell as rich output. An expression buried inside an `if`/`else`, `for`, `try`, or any other block is **not** displayed — it's silently discarded. 
 - -**BROKEN** — `_df` inside the `if` branch is never rendered, and `mo.md()` inside `if`/`else` is also discarded: - -```python -@app.cell -def show_results(results, mo, pl): - if results: - _df = pl.DataFrame(results) - mo.md(f"**Found {len(results)} results**") - _df # Inside an if block — marimo does NOT display this - else: - mo.md("**No results found**") # Also inside a block — NOT displayed - return -``` - -**FIXED** — split into separate cells. Each cell displays exactly **one thing** at the top level: - -```python -# Cell 1: build the data, return it -@app.cell -def build_results(results, pl): - results_df = pl.DataFrame(results) if results else None - return (results_df,) - -# Cell 2: heading — mo.md() is the top-level expression (use ternary for conditional text) -@app.cell -def show_results_heading(results_df, mo): - mo.md(f"**Found {len(results_df)} results**" if results_df is not None else "**No results found**") - -# Cell 3: table — DataFrame is the top-level expression -@app.cell -def show_results_table(results_df): - results_df # Top-level expression — marimo renders this as interactive table -``` - -**Rules:** -- Each cell should display **one thing** — either `mo.md()` OR a DataFrame, never both -- `mo.md()` must be a **top-level expression**, not inside `if`/`else`/`for`/`try` blocks -- Build conditional text using variables or ternary expressions, then call `mo.md(_text)` at the top level -- For DataFrames, use a standalone display cell: `def show_table(df): df` - -### Async Cells - -When a cell uses `await` (e.g., for `llm_call` or `asyncio.gather`), you **must** declare it as `async def`: - -```python -@app.cell -async def analyze(meetings, llm_call, ResponseModel, asyncio): - async def _score(meeting): - return await llm_call(prompt=..., response_model=ResponseModel) - - results = await asyncio.gather(*[_score(_m) for _m in meetings]) - return (results,) -``` - -Note that `asyncio` is imported in the `setup` cell and received here as a parameter
— never `import asyncio` inside individual cells. - -If you write `await` in a non-async cell, marimo cannot parse the cell and saves it as an `_unparsable_cell` string literal — the cell won't run, and you'll see `SyntaxError: 'return' outside function` or similar errors. See [Fixing `_unparsable_cell`](#fixing-_unparsable_cell) below. - -### Cells That Define Classes Must Return Them - -If a cell defines Pydantic models (or any class) that other cells need, it **must** return them: - -```python -# BaseModel and Field are imported in the setup cell and received as parameters -@app.cell -def models(BaseModel, Field): - class MeetingSentiment(BaseModel): - overall_sentiment: str - sentiment_score: int = Field(description="Score from -10 to +10") - - class FrustrationExtraction(BaseModel): - has_frustrations: bool - frustrations: list[dict] - - return MeetingSentiment, FrustrationExtraction # Other cells receive these as parameters -``` - -A bare `return` (or no return) means those classes are invisible to the rest of the notebook. - -### Fixing `_unparsable_cell` - -When marimo can't parse a cell into a proper `@app.cell` function, it saves the raw code as `app._unparsable_cell("...", name="cell_name")`. These cells **won't run** and show errors like `SyntaxError: 'return' outside function`. - -**Common causes:** -1. Using `await` without making the cell `async def` -2. Using `return` in code that marimo failed to wrap into a function (usually a side effect of cause 1) - -**How to fix:** Convert the `_unparsable_cell` string back into a proper `@app.cell` decorated function: - -```python -# BROKEN — saved as _unparsable_cell because of top-level await -app._unparsable_cell(""" -results = await asyncio.gather(...) -return results -""", name="my_cell") - -# FIXED — proper async cell function (asyncio imported in setup, received as parameter) -@app.cell -async def my_cell(some_dependency, asyncio): - results = await asyncio.gather(...) 
- return (results,) -``` - -**Key differences to note when converting:** -- Wrap the code in an `async def` function (if it uses `await`) -- Add cell dependencies as function parameters (including imports like `asyncio`) -- Return values as tuples: `return (var,)` not `return var` -- Prefix cell-local variables with `_` -- Never add `import` statements inside the cell — all imports belong in `setup` - -### Inline Dependencies with PEP 723 - -Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies: - -```python -# /// script -# requires-python = ">=3.12" -# dependencies = [ -# "marimo", -# "httpx", -# "polars", -# "mirascope[openai]", -# "pydantic", -# "python-dotenv", -# ] -# /// -``` - -### Checking Notebooks Before Running - -Always run `marimo check` before opening or running a notebook. It catches common issues — duplicate variable definitions, `_unparsable_cell` blocks, branch expressions that won't display, and more — without needing to start the full editor: - -```bash -uvx marimo check notebook.py # Check a single notebook -uvx marimo check workflows/ # Check all notebooks in a directory -uvx marimo check --fix notebook.py # Auto-fix fixable issues -``` - -**Run this after every edit.** A clean `marimo check` (no output, exit code 0) means the notebook is structurally valid. Any errors must be fixed before running. - -### Running Notebooks - -```bash -uvx marimo edit notebook.py # Interactive editor (best for development) -uvx marimo run notebook.py # Read-only web app -uv run notebook.py # Script mode (terminal output) -``` - -### Inspecting Cell Outputs - -In `marimo edit`, the last top-level expression of every cell (not its `return` tuple) is displayed as rich output below the cell. 
This is the primary way to introspect API responses: - -- **Dicts/lists** render as collapsible JSON trees — click to expand nested fields -- **Polars/Pandas DataFrames** render as interactive sortable tables -- **Strings** render as plain text - -To inspect a raw API response, just make it the last expression: - -```python -@app.cell -def inspect_response(client, DATAINDEX): - _resp = client.get(f"{DATAINDEX}/query", params={ - "entity_types": "meeting", "limit": 2, - }) - _resp.json() # This gets displayed as a collapsible JSON tree -``` - -To inspect an intermediate value alongside other work, combine the pieces into a single last expression with `mo.vstack` and `mo.accordion`: - -```python -@app.cell -def debug_meetings(meetings, mo): - # One top-level expression: a separate mo.md() placed above it would be discarded - mo.vstack([ - mo.md(f"**Count:** {len(meetings)}"), - # Show first item structure for inspection - mo.accordion({"First meeting raw": mo.json(meetings[0])}) if meetings else mo.md("_No meetings to inspect_"), - ]) -``` - -## Notebook Skeleton - -Every notebook against InternalAI follows this structure: - -```python -# /// script -# requires-python = ">=3.12" -# dependencies = [ -# "marimo", -# "httpx", -# "polars", -# "mirascope[openai]", -# "pydantic", -# "python-dotenv", -# ] -# /// - -import marimo -app = marimo.App() - -@app.cell -def params(): - """User parameters — edit these to change the workflow's behavior.""" - SEARCH_TERMS = ["greyhaven"] - DATE_FROM = "2026-01-01T00:00:00Z" - DATE_TO = "2026-02-01T00:00:00Z" - TARGET_PERSON = None # Set to a name like "Alice" to filter by person, or None for all - return DATE_FROM, DATE_TO, SEARCH_TERMS, TARGET_PERSON - -@app.cell -def config(): - BASE = "http://localhost:42000" - CONTACTDB = f"{BASE}/contactdb-api" - DATAINDEX = f"{BASE}/dataindex/api/v1" - return (CONTACTDB, DATAINDEX,) - -@app.cell -def setup(): - from dotenv import load_dotenv - load_dotenv(".env") # Load .env from the project root - - import asyncio # All imports go here — never import inside other cells - import httpx - import marimo as mo - import polars as pl - from pydantic import BaseModel, Field - client = 
httpx.Client(timeout=30) - return (asyncio, client, mo, pl, BaseModel, Field,) - -# --- your IN / ETL / OUT cells here --- - -if __name__ == "__main__": - app.run() -``` - -> **`load_dotenv(".env")`** reads the `.env` file explicitly by name. This makes `LLM_API_KEY` and other env vars available to `os.getenv()` calls in `lib/llm.py` without requiring the shell to have them pre-set. Always include `python-dotenv` in PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell. - -**The `params` cell must always be the first cell** after `app = marimo.App()`. It contains all user-configurable constants (search terms, date ranges, target names, etc.) as plain Python values. This way the user can tweak the workflow by editing a single cell at the top — no need to hunt through the code for hardcoded values. - -## Pagination Helper - -The DataIndex `GET /query` endpoint paginates with `limit` and `offset`. Always paginate — result sets can be large. - -```python -@app.cell -def helpers(client): - def fetch_all(url, params): - """Fetch all pages from a paginated DataIndex endpoint.""" - all_items = [] - limit = params.get("limit", 50) - params = {**params, "limit": limit, "offset": 0} - while True: - resp = client.get(url, params=params) - resp.raise_for_status() - data = resp.json() - all_items.extend(data["items"]) - if params["offset"] + limit >= data["total"]: - break - params["offset"] += limit - return all_items - - def resolve_contact(name, contactdb_url): - """Find a contact by name and return the first matching contact record (a dict with `id` and `name`).""" - resp = client.get(f"{contactdb_url}/api/contacts", params={"search": name}) - resp.raise_for_status() - contacts = resp.json()["contacts"] - if not contacts: - raise ValueError(f"No contact found for '{name}'") - return contacts[0] - - return (fetch_all, resolve_contact,) -``` - -## Pattern 1: Emails Involving a Specific Person - -Emails have `from_contact_id`, `to_contact_ids`, and `cc_contact_ids`. 
The query API's `contact_ids` filter matches entities where the contact appears in **any** of these roles. - -```python -@app.cell -def find_person(resolve_contact, CONTACTDB): - target = resolve_contact("Alice", CONTACTDB) - target_id = target["id"] - target_name = target["name"] - return (target_id, target_name,) - -@app.cell -def fetch_emails(fetch_all, DATAINDEX, target_id): - emails = fetch_all(f"{DATAINDEX}/query", { - "entity_types": "email", - "contact_ids": str(target_id), - "date_from": "2025-01-01T00:00:00Z", - "sort_order": "desc", - }) - return (emails,) - -@app.cell -def email_table(emails, target_id, target_name, pl): - email_df = pl.DataFrame([{ - "date": e["timestamp"][:10], - "subject": e.get("title", "(no subject)"), - "direction": ( - "sent" if str(target_id) == str(e.get("from_contact_id")) - else "received" - ), - "snippet": (e.get("snippet") or e.get("text_content") or "")[:100], - } for e in emails]) - return (email_df,) - -@app.cell -def show_emails(email_df, target_name, mo): - mo.md(f"## Emails involving {target_name} ({len(email_df)} total)") - -@app.cell -def display_email_table(email_df): - email_df # Renders as interactive table in marimo edit -``` - -## Pattern 2: Meetings with a Specific Participant - -Meetings have a `participants` list where each entry may or may not have a resolved `contact_id`. The query API's `contact_ids` filter only matches **resolved** participants. - -**Strategy:** Query by `contact_ids` to get meetings with resolved participants, then optionally do a client-side check on `participants[].display_name` or `transcript` for unresolved ones. - -> **Always include `room_name` in meeting tables.** The `room_name` field contains the virtual room name (e.g., `standup-office-bogota`) and often indicates where the meeting took place. It's useful context when `title` is generic or missing — include it as a column alongside `title`. 
- -```python -@app.cell -def fetch_meetings(fetch_all, DATAINDEX, target_id, my_id): - # Get meetings where the target appears in contact_ids - resolved_meetings = fetch_all(f"{DATAINDEX}/query", { - "entity_types": "meeting", - "contact_ids": str(target_id), - "date_from": "2025-01-01T00:00:00Z", - }) - return (resolved_meetings,) - -@app.cell -def meeting_table(resolved_meetings, target_name, pl): - _rows = [] - for _m in resolved_meetings: - _participants = _m.get("participants", []) - _names = [_p["display_name"] for _p in _participants] - _rows.append({ - "date": (_m.get("start_time") or _m["timestamp"])[:10], - "title": _m.get("title", "Untitled"), - "room_name": _m.get("room_name", ""), - "participants": ", ".join(_names), - "has_transcript": _m.get("transcript") is not None, - "has_summary": _m.get("summary") is not None, - }) - meeting_df = pl.DataFrame(_rows) - return (meeting_df,) -``` - -To also find meetings where the person was present but **not resolved** (guest), search the transcript: - -```python -@app.cell -def search_unresolved(client, DATAINDEX, target_name): - # Semantic search for the person's name in meeting transcripts - _resp = client.post(f"{DATAINDEX}/search", json={ - "search_text": target_name, - "entity_types": ["meeting"], - "limit": 50, - }) - _resp.raise_for_status() - transcript_hits = _resp.json()["results"] - return (transcript_hits,) -``` - -## Pattern 3: Calendar Events → Meeting Correlation - -Calendar events and meetings are separate entities from different connectors. To find which calendar events had a corresponding recorded meeting, match by time overlap. 
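If true interval overlap is wanted rather than a start-time window, the check can be sketched as below. This is a minimal sketch, not part of the DataIndex API: `intervals_overlap` and `parse_iso` are hypothetical helper names, and it assumes meeting entities expose an `end_time` next to `start_time`, which this guide does not confirm. In a notebook, `datetime` would be imported once in `setup` rather than at the top of a cell.

```python
from datetime import datetime


def parse_iso(value):
    """Parse an ISO 8601 timestamp such as '2025-01-06T15:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def intervals_overlap(a_start, a_end, b_start, b_end):
    """Return True when [a_start, a_end) and [b_start, b_end) share any time."""
    return a_start < b_end and b_start < a_end


# Calendar event from 15:00 to 15:30 (illustrative values only)
event = (parse_iso("2025-01-06T15:00:00Z"), parse_iso("2025-01-06T15:30:00Z"))
# Recording that started 20 minutes late: still overlaps the event
late_meeting = (parse_iso("2025-01-06T15:20:00Z"), parse_iso("2025-01-06T16:00:00Z"))
# Recording that starts exactly when the event ends: no overlap
next_meeting = (parse_iso("2025-01-06T15:30:00Z"), parse_iso("2025-01-06T16:00:00Z"))

overlaps_late = intervals_overlap(*event, *late_meeting)
overlaps_next = intervals_overlap(*event, *next_meeting)
```

The cells that follow use a simpler heuristic instead: a meeting matches when its start falls within 15 minutes of the calendar event's start, which needs no meeting end time.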
- -```python -@app.cell -def fetch_calendar_and_meetings(fetch_all, DATAINDEX, my_id): - events = fetch_all(f"{DATAINDEX}/query", { - "entity_types": "calendar_event", - "contact_ids": str(my_id), - "date_from": "2025-01-01T00:00:00Z", - "sort_by": "timestamp", - "sort_order": "asc", - }) - meetings = fetch_all(f"{DATAINDEX}/query", { - "entity_types": "meeting", - "contact_ids": str(my_id), - "date_from": "2025-01-01T00:00:00Z", - }) - return (events, meetings,) - -@app.cell -def correlate(events, meetings, pl): - def _parse_dt(s): - # Import inside the `_`-prefixed helper so `datetime` never becomes a top-level notebook name - from datetime import datetime - if not s: - return None - return datetime.fromisoformat(s.replace("Z", "+00:00")) - - # Index meetings by start_time for matching - _meeting_by_time = {} - for _m in meetings: - _start = _parse_dt(_m.get("start_time")) - if _start: - _meeting_by_time[_start] = _m - - _rows = [] - for _ev in events: - _ev_start = _parse_dt(_ev.get("start_time")) - _ev_end = _parse_dt(_ev.get("end_time")) - if not _ev_start: - continue - - # Find meeting within 15-min window of calendar event start - _matched = None - for _m_start, _m in _meeting_by_time.items(): - if abs((_m_start - _ev_start).total_seconds()) < 900: - _matched = _m - break - - _rows.append({ - "date": _ev_start.strftime("%Y-%m-%d"), - "time": _ev_start.strftime("%H:%M"), - "event_title": _ev.get("title", "(untitled)"), - "has_recording": _matched is not None, - "meeting_title": _matched.get("title", "") if _matched else "", - "attendee_count": len(_ev.get("attendees", [])), - }) - - calendar_df = pl.DataFrame(_rows) - return (calendar_df,) -``` - -## Pattern 4: Full Interaction Timeline for a Person - -Combine emails, meetings, and Zulip messages into a single chronological view. 
- -```python -@app.cell -def fetch_all_interactions(fetch_all, DATAINDEX, target_id): - all_entities = fetch_all(f"{DATAINDEX}/query", { - "contact_ids": str(target_id), - "date_from": "2025-01-01T00:00:00Z", - "sort_by": "timestamp", - "sort_order": "desc", - }) - return (all_entities,) - -@app.cell -def interaction_timeline(all_entities, target_name, pl): - _rows = [] - for _e in all_entities: - _etype = _e["entity_type"] - _summary = "" - if _etype == "email": - _summary = _e.get("snippet") or _e.get("title") or "" - elif _etype == "meeting": - _summary = _e.get("summary") or _e.get("title") or "" - elif _etype == "conversation_message": - _summary = (_e.get("message") or "")[:120] - elif _etype == "threaded_conversation": - _summary = _e.get("title") or "" - elif _etype == "calendar_event": - _summary = _e.get("title") or "" - else: - _summary = _e.get("title") or _e["entity_type"] - - _rows.append({ - "date": _e["timestamp"][:10], - "type": _etype, - "source": _e["connector_id"], - "summary": _summary[:120], - }) - - timeline_df = pl.DataFrame(_rows) - return (timeline_df,) - -@app.cell -def show_timeline(timeline_df, target_name, mo): - mo.md(f"## Interaction Timeline: {target_name} ({len(timeline_df)} events)") - -@app.cell -def display_timeline(timeline_df): - timeline_df -``` - -## Pattern 5: LLM Filtering with `lib.llm` - -When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the `llm_call` helper from `workflows/lib`. It sends each item to an LLM and parses the response into a typed Pydantic model. - -**Prerequisites:** Copy `.env.example` to `.env` and fill in your `LLM_API_KEY`. Add `mirascope`, `pydantic`, and `python-dotenv` to the notebook's PEP 723 dependencies. 
- -```python -# /// script -# requires-python = ">=3.12" -# dependencies = [ -# "marimo", -# "httpx", -# "polars", -# "mirascope[openai]", -# "pydantic", -# "python-dotenv", -# ] -# /// -``` - -### Setup cell — load `.env` and import `llm_call` - -```python -@app.cell -def setup(): - from dotenv import load_dotenv - load_dotenv(".env") # Makes LLM_API_KEY available to lib/llm.py - - import asyncio - import httpx - import marimo as mo - import polars as pl - from pydantic import BaseModel, Field - from lib.llm import llm_call - client = httpx.Client(timeout=30) - return (asyncio, client, llm_call, mo, pl, BaseModel, Field,) -``` - -### Define a response model - -Create a Pydantic model that describes the structured output you want from the LLM: - -```python -@app.cell -def models(BaseModel, Field): - - class RelevanceScore(BaseModel): - relevant: bool - reason: str - score: int # 0-10 - - return (RelevanceScore,) -``` - -### Filter entities through the LLM - -Iterate over fetched entities and call `llm_call` for each one. Since `llm_call` is async, use `asyncio.gather` to process items concurrently: - -```python -@app.cell -async def llm_filter(meetings, llm_call, RelevanceScore, pl, mo, asyncio): - _topic = "Greyhaven" - - async def _score(meeting): - _text = meeting.get("summary") or meeting.get("title") or "" - _result = await llm_call( - prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}", - response_model=RelevanceScore, - system_prompt="Score the relevance of this meeting to the given topic. 
Set relevant=true if score >= 5.", - ) - return {**meeting, "llm_relevant": _result.relevant, "llm_reason": _result.reason, "llm_score": _result.score} - - scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings]) - relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]] - - mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'") - return (relevant_meetings,) -``` - -### Tips for LLM filtering - -- **Keep prompts short** — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity. -- **Use structured output** — always pass a `response_model` so you get typed fields back, not free-text. -- **Batch wisely** — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits. -- **Cache results** — LLM calls are slow and cost money. If iterating on a notebook, consider storing scored results in a cell variable so you don't re-score on every edit. - -## Do / Don't — Quick Reference for LLM Agents - -When generating marimo notebooks, follow these rules strictly. Violations cause `MultipleDefinitionError` at runtime. - -### Do - -- **Prefix cell-local variables with `_`** — `_resp`, `_rows`, `_m`, `_data`, `_chunk`. Marimo ignores `_`-prefixed names so they won't clash across cells. -- **Put all imports in the `setup` cell** and pass them as cell parameters: `def my_cell(client, mo, pl, asyncio):`. Never `import` inside other cells — even `import asyncio` in two async cells causes `MultipleDefinitionError`. -- **Give returned DataFrames unique names** — `email_df`, `meeting_df`, `timeline_df`. Never use a bare `df` that might collide with another cell. -- **Return only values other cells need** — everything else should be `_`-prefixed and stays private to the cell. -- **Import stdlib modules in `setup` too** — even `from datetime import datetime` creates a top-level name. 
If two cells both import `datetime`, marimo errors. Import it once in `setup` and receive it as a parameter, or use it inside a `_`-prefixed helper function where it's naturally scoped. -- **Every non-utility cell must show a preview** — see the "Cell Output Previews" section below. -- **Use separate display cells for DataFrames** — the build cell returns the DataFrame and shows a `mo.md()` count/heading; a standalone display cell (e.g., `def show_table(df): df`) renders it as an interactive table the user can sort and filter. -- **Include `room_name` when listing meetings** — the virtual room name provides useful context about where the meeting took place (e.g., `standup-office-bogota`). Show it as a column alongside `title`. -- **Keep cell output expressions at the top level** — if a cell conditionally displays a DataFrame, initialize `_output = None` before the `if`/`else`, assign inside the branches, then put `_output` as the last top-level expression. Expressions inside `if`/`else`/`for` blocks are silently ignored by marimo. -- **Put all user parameters in a `params` cell as the first cell** — date ranges, search terms, target names, limits. Never hardcode these values deeper in the notebook. -- **Declare cells as `async def` when using `await`** — `@app.cell` followed by `async def cell_name(...)`. This includes cells using `asyncio.gather`, `await llm_call(...)`, or any async API. -- **Return classes/models from cells that define them** — if a cell defines `class MyModel(BaseModel)`, return it so other cells can use it as a parameter: `return (MyModel,)`. -- **Use `python-dotenv` to load `.env`** — add `python-dotenv` to PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell (before importing `lib.llm`). This ensures `LLM_API_KEY` and other env vars are available without requiring them to be pre-set in the shell. 
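The `_output = None` rule from the list above can be sketched in plain Python. This is only an illustration of the control flow, not marimo code: `build_output` is a hypothetical stand-in for a cell body, and the dicts stand in for what a notebook would normally produce with `pl.DataFrame(...)` or `mo.md(...)`. In a real cell, the last line would be the bare expression `_output` at the top level instead of a `return`.

```python
def build_output(results):
    """Hypothetical stand-in for a marimo cell body that displays one thing."""
    _output = None  # 1. Initialize before branching
    if results:
        # 2. Assign inside the branch (a notebook would use pl.DataFrame(results))
        _output = {"kind": "table", "rows": list(results)}
    else:
        # A notebook would use mo.md("**No results found**") here
        _output = {"kind": "text", "text": "**No results found**"}
    # 3. Single exit point outside every branch; in a cell this is the bare `_output`
    return _output


empty_case = build_output([])
filled_case = build_output([{"title": "Weekly sync"}])
```

Because the value is only assigned inside the branches and surfaced once at the end, nothing depends on an expression nested in `if`/`else` being rendered.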
- -### Don't - -- **Don't define the same variable name in two cells** — even `resp = ...` in cell A and `resp = ...` in cell B is a fatal error. -- **Don't `import` inside non-setup cells** — every `import X` defines a top-level variable `X`. If two cells both `import asyncio`, marimo raises `MultipleDefinitionError` and refuses to run. Put all imports in the `setup` cell and receive them as function parameters. -- **Don't use generic top-level names** like `df`, `rows`, `resp`, `data`, `result` — either prefix with `_` or give them a unique descriptive name. -- **Don't return temporary variables** — if `_rows` is only used to build a DataFrame, keep it `_`-prefixed and only return the DataFrame. -- **Don't use `await` in a non-async cell** — this causes marimo to save the cell as `_unparsable_cell` (a string literal that won't execute). Always use `async def` for cells that call async functions. -- **Don't define classes in a cell without returning them** — a bare `return` or no return makes classes invisible to the DAG. Other cells can't receive them as parameters. -- **Don't put display expressions inside `if`/`else`/`for` blocks** — marimo only renders the last top-level expression. A DataFrame inside an `if` branch is silently discarded. Use the `_output = None` pattern instead (see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level)). - -## Cell Output Previews - -Every cell that fetches, transforms, or produces data **must display a preview** so the user can validate results at each step. The only exceptions are **utility cells** (config, setup, helpers) that only define constants or functions. - -Think from the user's perspective: when they open the notebook in `marimo edit`, each cell should tell them something useful — a count, a sample, a summary. Silent cells that do work but show nothing are hard to debug and validate. 
- -### What to show - -| Cell type | What to preview | -|-----------|----------------| -| API fetch (list of items) | `mo.md(f"**Fetched {len(items)} meetings**")` | -| DataFrame build | The DataFrame itself as last expression (renders as interactive table) | -| Scalar result | `mo.md(f"**Contact:** {name} (id={contact_id})")` | -| Search / filter | `mo.md(f"**{len(hits)} results** matching '{term}'")` | -| Final output | Full DataFrame or `mo.md()` summary as last expression | - -### Example: fetch cell with preview - -**Bad** — cell runs silently, user sees nothing: - -```python -@app.cell -def fetch_meetings(fetch_all, DATAINDEX, my_id): - meetings = fetch_all(f"{DATAINDEX}/query", { - "entity_types": "meeting", - "contact_ids": str(my_id), - }) - return (meetings,) -``` - -**Good** — cell shows a count so the user knows it worked: - -```python -@app.cell -def fetch_meetings(fetch_all, DATAINDEX, my_id, mo): - meetings = fetch_all(f"{DATAINDEX}/query", { - "entity_types": "meeting", - "contact_ids": str(my_id), - }) - mo.md(f"**Fetched {len(meetings)} meetings**") - return (meetings,) -``` - -### Example: transform cell with table preview - -**Bad** — builds DataFrame but doesn't display it: - -```python -@app.cell -def build_table(meetings, pl): - _rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings] - meeting_df = pl.DataFrame(_rows) - return (meeting_df,) -``` - -**Good** — the build cell shows a `mo.md()` count, and a **separate display cell** renders the DataFrame as an interactive table: - -```python -@app.cell -def build_table(meetings, pl, mo): - _rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings] - meeting_df = pl.DataFrame(_rows).sort("date") - mo.md(f"### Meetings ({len(meeting_df)} results)") - return (meeting_df,) - -@app.cell -def show_meeting_table(meeting_df): - meeting_df # Renders as interactive sortable table -``` - -### Separate display cells for DataFrames - -When a 
cell builds a DataFrame, use **two cells**: one that builds and returns it (with a `mo.md()` summary), and a standalone display cell that renders it as a table. This keeps the build logic clean and gives the user an interactive table they can sort and filter in the marimo UI. - -```python -# Cell 1: build and return the DataFrame, show a count -@app.cell -def build_sentiment_table(analyzed_meetings, pl, mo): - _rows = [...] - sentiment_df = pl.DataFrame(_rows).sort("date", descending=True) - mo.md(f"### Sentiment Analysis ({len(sentiment_df)} meetings)") - return (sentiment_df,) - -# Cell 2: standalone display — just the DataFrame, nothing else -@app.cell -def show_sentiment_table(sentiment_df): - sentiment_df -``` - -This pattern makes every result inspectable. The `mo.md()` cell gives a quick count/heading; the display cell lets the user explore the full data interactively. - -### Utility cells (no preview needed) - -Config, setup, and helper cells that only define constants or functions don't need previews: - -```python -@app.cell -def config(): - BASE = "http://localhost:42000" - CONTACTDB = f"{BASE}/contactdb-api" - DATAINDEX = f"{BASE}/dataindex/api/v1" - return CONTACTDB, DATAINDEX - -@app.cell -def helpers(client): - def fetch_all(url, params): - ... - return (fetch_all,) -``` - -## Tips - -- Use `marimo edit` during development to see cell outputs interactively -- Make raw API responses the last expression in a cell to inspect their structure -- Use `polars` over `pandas` for better performance and type safety -- Set `timeout=30` on httpx clients — some queries over large date ranges are slow -- Name cells descriptively — function names appear in the marimo sidebar