Add __pycache__ to gitignore

Add room_name to meeting table examples in docs
Include room_name as a column in meeting DataFrames — it shows the virtual room name and helps identify meeting location when title is generic or missing.
2026-02-10 19:49:23 -06:00 · 2026-02-10 19:48:52 -06:00 · 2026-02-10 19:45:04 -06:00 · 2026-02-10 19:30:14 -06:00 · 2026-02-10 19:25:53 -06:00 · 2026-02-10 18:32:20 -06:00
8 changed files with 427 additions and 78 deletions
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,3 @@
+LLM_API_URL=https://litellm-notrack.app.monadical.io
+LLM_MODEL=GLM-4.5-Air-FP8-dev
+LLM_API_KEY=xxxxx
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,3 @@
+.env
+MYSELF.md
+__pycache__/
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -6,12 +6,24 @@ The InternalAI platform aggregates company data from email, calendars, Zulip cha

 | I need to...                                | Read                          |
 |---------------------------------------------|-------------------------------|
+| Know who the user is and what they care about | [MYSELF.md]                 |
 | Understand the company and its tools        | [company-context.md]          |
 | Look up people, contacts, relationships     | [contactdb-api.md]            |
 | Query emails, meetings, chats, documents    | [dataindex-api.md]            |
 | Know which connector provides what data     | [connectors-and-sources.md]   |
 | Create a marimo analysis notebook           | [notebook-patterns.md]        |

+## About the User
+
+If `MYSELF.md` exists in the project root, **read it first** before starting any workflow. It contains the user's name, role, team, frequent collaborators, and preferences. Use this context to:
+
+- Address the user by name in notebook markdown
+- Default `TARGET_PERSON` or filter values to people they work with
+- Scope date ranges and topics to their stated interests
+- Tailor output format to their preferences
+
+If `MYSELF.md` does not exist, ask the user to copy `MYSELF.example.md` to `MYSELF.md` and fill it in, or proceed without personalization.
+
 ## API Base URLs

 | Service    | Swagger UI                                        | OpenAPI JSON                           |
@@ -53,6 +65,10 @@ Also create a notebook when the user asks to "create a workflow", "write a workf

 If you're unsure whether a question is simple enough to answer directly or needs a notebook, **ask the user**.

+### Always create a new workflow
+
+When the user requests a workflow, **always create a new notebook file**. Do **not** modify or re-run an existing workflow unless the user explicitly asks you to (e.g., "update workflow 001", "fix the sentiment notebook", "re-run the existing analysis"). Each new request gets its own sequentially numbered file — even if it covers a similar topic to an earlier workflow.
+
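Review note (not part of the diff): the sequential-numbering rule above could be sketched as follows. This is a minimal illustration that assumes the `<NNN>_<topic>_<scope>.py` convention used in this repo; `next_workflow_path` is a hypothetical helper and does not exist in the codebase.

```python
import re
from pathlib import Path

# Hypothetical helper, shown only to illustrate the numbering rule.
_PREFIX = re.compile(r"^(\d{3})_")


def next_workflow_path(workflows_dir: Path, topic: str, scope: str) -> Path:
    """Return a new path numbered one above the highest existing NNN_ prefix."""
    numbers = [
        int(m.group(1))
        for p in workflows_dir.glob("*.py")
        if (m := _PREFIX.match(p.name))
    ]
    # An empty directory (or one without numbered files) starts at 001.
    next_number = max(numbers, default=0) + 1
    return workflows_dir / f"{next_number:03d}_{topic}_{scope}.py"
```

Because the number is derived from the files already present, a new request never reuses or overwrites an earlier workflow.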
 ### File naming and location

 All notebooks go in the **`workflows/`** directory. Use a sequential number prefix so workflows stay ordered by creation:
@@ -87,6 +103,16 @@ Before writing any notebook, **always propose a plan first** and get the user's

 Only proceed to implementation after the user confirms the plan.

+### Validate before delivering
+
+After writing or editing a notebook, **always run `uvx marimo check`** to verify it has no structural errors (duplicate variables, undefined names, branch expressions, etc.):
+
+```bash
+uvx marimo check workflows/NNN_topic_scope.py
+```
+
+A clean check (no output, exit code 0) means the notebook is valid. Fix any errors before delivering the notebook to the user.
+
 ### Steps

 1. **Identify people** — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /api/contacts/me`.
@@ -130,12 +156,14 @@ print(f"Found {len(emails)} emails involving Alice")

 ## File Index

+- [MYSELF.md] — User identity, role, collaborators, and preferences (gitignored, copy from `MYSELF.example.md`)
 - [company-context.md] — Business context, team structure, vocabulary
 - [contactdb-api.md] — ContactDB entities and REST endpoints
 - [dataindex-api.md] — DataIndex entity types, query modes, REST endpoints
 - [connectors-and-sources.md] — Connector-to-entity-type mapping
 - [notebook-patterns.md] — Marimo notebook patterns and common API workflows

+[MYSELF.md]: ./MYSELF.md
 [company-context.md]: ./docs/company-context.md
 [contactdb-api.md]: ./docs/contactdb-api.md
 [dataindex-api.md]: ./docs/dataindex-api.md
--- a/MYSELF.example.md
+++ b/MYSELF.example.md
@@ -0,0 +1,28 @@
+# About Me
+
+Copy this file to `MYSELF.md` and fill in your details. The agent reads it to personalize workflows and understand your role. `MYSELF.md` is gitignored — it stays local and private.
+
+## Identity
+
+- **Name:** Your Name
+- **Role:** e.g. Engineering Lead, Product Manager, Designer
+- **Contact ID:** Your contact ID from ContactDB (saves the agent a `GET /api/contacts/me` call)
+
+## What I work on
+
+Brief description of your current projects, responsibilities, or focus areas. This helps the agent scope queries — e.g., if you work on Greyhaven, the agent can default to filtering meetings/emails related to that project.
+
+## People I work with frequently
+
+List the names of people you interact with most. The agent can use these to suggest relevant filters or default `TARGET_PERSON` values in workflows.
+
+- Alice — role or context
+- Bob — role or context
+
+## Preferences
+
+Any preferences for how you want workflows or analysis structured:
+
+- **Default date range:** e.g. "last 30 days", "current quarter"
+- **Preferred output format:** e.g. "tables with counts", "timeline view"
+- **Topics of interest:** e.g. "hiring", "client feedback", "sprint blockers"
--- a/README.md
+++ b/README.md
@@ -4,16 +4,11 @@ A documentation and pattern library that gives LLM agents the context they need

 The goal is to use [opencode](https://opencode.ai) (or any LLM-powered coding tool) to iteratively create [marimo](https://marimo.io) notebook workflows that query and analyze company data.

-## Getting Started
+## Setup

-### Prerequisites
-
-- [opencode](https://opencode.ai) installed
-- Access to the InternalAI platform (ContactDB + DataIndex running locally, accessible via http://localhost:42000)
-
-### Configuring opencode with LiteLLM
-
-To use models through LiteLLM, add the following to `~/.config/opencode/config.json`:
+1. Install [opencode](https://opencode.ai)
+2. Make sure InternalAI is running locally (ContactDB + DataIndex accessible via http://localhost:42000)
+3. Configure LiteLLM — add to `~/.config/opencode/config.json`:

 ```json
 {
@@ -38,83 +33,58 @@ To use models through LiteLLM, add the following to `~/.config/opencode/config.j

 Replace `xxxxx` with your actual LiteLLM API key.

-### Running opencode
-
-From the project root:
+4. **Set up your profile** — copy the example and fill in your name, role, and contact ID so the agent can personalize workflows:

 ```bash
-opencode
+cp MYSELF.example.md MYSELF.md
 ```

-opencode will pick up `AGENTS.md` automatically and use it as the entry point to understand the project, the available APIs, and how to write workflows.
+5. **(Optional) LLM filtering in workflows** — if your workflows need to classify or score entities via an LLM, copy `.env.example` to `.env` and fill in your key:

-## How AGENTS.md Works
-
-`AGENTS.md` is the routing guide for LLM agents. It is structured as follows:
-
-1. **Purpose statement** — Explains that the agent's job is to build marimo notebooks that analyze company data.
-
-2. **Documentation routing table** — Directs the agent to the right file depending on the topic:
-
-   | Topic | File |
-   |-------|------|
-   | Company context, tools, connectors overview | `docs/company-context.md` |
-   | People, contacts, relationships | `docs/contactdb-api.md` |
-   | Querying emails, meetings, chats, docs | `docs/dataindex-api.md` |
-   | Connector-to-entity-type mappings | `docs/connectors-and-sources.md` |
-   | Notebook creation patterns and templates | `docs/notebook-patterns.md` |
-
-3. **API base URLs** — ContactDB and DataIndex endpoints (both via Caddy proxy and direct).
-
-4. **Common query translation table** — Maps natural-language questions (e.g. "Who am I?", "Recent meetings") to the corresponding API calls.
-
-5. **Workflow rules** — When to create a notebook vs. answer inline, naming conventions, and the requirement to propose a plan before implementing.
-
-## Workflow
-
-### How it works
-
-1. **Ask a question in opencode** — Describe what you want to analyze (e.g. "Show me all meetings about Greyhaven in January").
-
-2. **Agent reads AGENTS.md** — opencode picks up the routing guide and navigates to the relevant docs to understand the APIs.
-
-3. **Agent proposes a plan** — Before writing code, the agent outlines: Goal, Data Sources, Algorithm, and Output Format.
-
-4. **Agent creates a marimo notebook** — A `.py` file is written to `workflows/` following the naming convention `<NNN>_<topic>_<scope>.py`.
-
-5. **Iterate** — Run the notebook with `marimo edit workflows/<name>.py`, review the output, and ask the agent to refine.
-
-### Workflow output format
-
-Workflows are [marimo notebooks](https://marimo.io) — plain Python files with `@app.cell` decorators. They typically follow this structure:
-
-- **params cell** — User-editable parameters (search terms, date ranges, contact names)
-- **config cell** — API base URLs
-- **setup cell** — Shared imports (`httpx`, `polars`, `marimo`)
-- **data cells** — Fetch and transform data from ContactDB / DataIndex
-- **output cells** — Tables, charts, or markdown summaries
-
-### Naming convention
-
-```
-workflows/<NNN>_<topic>_<scope>.py
+```bash
+cp .env.example .env
 ```

-Examples:
-- `001_greyhaven_meetings_january.py`
-- `002_email_activity_q1.py`
+The `workflows/lib` module provides an `llm_call` helper (using [mirascope](https://mirascope.io)) for structured LLM calls — see Pattern 5 in `docs/notebook-patterns.md`.
+
+## Quickstart
+
+1. Run `opencode` from the project root
+2. Ask it to create a workflow, e.g.: *"Create a workflow that shows all meetings about Greyhaven in January"*
+3. The agent reads `AGENTS.md`, proposes a plan, and generates a notebook like `workflows/001_greyhaven_meetings_january.py`
+4. Run it: `uvx marimo edit workflows/001_greyhaven_meetings_january.py`
+5. Iterate — review the output in marimo, go back to opencode and ask for refinements
+
+## How AGENTS.md is Structured
+
+`AGENTS.md` is the entry point that opencode reads automatically. It routes the agent to the right documentation:
+
+| Topic | File |
+|-------|------|
+| Your identity, role, preferences | `MYSELF.md` (copy from `MYSELF.example.md`) |
+| Company context, tools, connectors | `docs/company-context.md` |
+| People, contacts, relationships | `docs/contactdb-api.md` |
+| Querying emails, meetings, chats, docs | `docs/dataindex-api.md` |
+| Connector-to-entity-type mappings | `docs/connectors-and-sources.md` |
+| Notebook templates and patterns | `docs/notebook-patterns.md` |
+
+It also includes API base URLs, a translation table mapping natural-language questions to API calls, and rules for when/how to create workflow notebooks.

 ## Project Structure

 ```
 internalai-agent/
 ├── AGENTS.md                        # LLM agent routing guide (entry point)
-├── README.md
+├── MYSELF.example.md                # User profile template (copy to MYSELF.md)
+├── .env.example                     # LLM credentials template
 ├── docs/
 │   ├── company-context.md           # Monadical org, tools, key concepts
 │   ├── contactdb-api.md             # ContactDB REST API reference
 │   ├── dataindex-api.md             # DataIndex REST API reference
 │   ├── connectors-and-sources.md    # Connector → entity type mappings
 │   └── notebook-patterns.md         # Marimo notebook templates and patterns
-└── workflows/                       # Generated analysis notebooks go here
+└── workflows/
+    └── lib/                         # Shared helpers for notebooks
+        ├── __init__.py
+        └── llm.py                   # llm_call() — structured LLM calls via mirascope
 ```
--- a/docs/notebook-patterns.md
+++ b/docs/notebook-patterns.md
@@ -25,11 +25,11 @@ def cell_two(x):
 **Key rules:**
 - Cells declare dependencies via function parameters
 - Cells return values as tuples: `return (var1, var2,)`
-- The **last expression** in a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees)
+- The **last expression at the top level** of a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees). Expressions inside `if`/`else`/`for` blocks do **not** count — see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level) below
 - Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below)
 - No manual execution order; the DAG determines it
 - **Variable names must be unique across cells.** Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them **private** to that cell — marimo ignores `_`-prefixed names.
-- **Import shared modules once** in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` in multiple cells — that defines `mo` twice. Instead, import it once in `setup` and receive it via `def my_cell(mo):`.
+- **All imports must go in the `setup` cell.** Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place **all** imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`.

 ### Cell Variable Scoping — Example

@@ -79,6 +79,111 @@ def fetch_details(client, DATAINDEX, results):

 > **Note:** Variables inside nested `def` functions are naturally local and don't need `_` prefixes — e.g., `resp` inside a `def fetch_all(...)` helper is fine because it's scoped to the function, not the cell.

+### Cell Output Must Be at the Top Level
+
+Marimo only renders the **last expression at the top level** of a cell as rich output. An expression buried inside an `if`/`else`, `for`, `try`, or any other block is **not** displayed — it's silently discarded.
+
+**BROKEN** — `_df` inside the `if` branch is never rendered:
+
+```python
+@app.cell
+def show_results(results, mo, pl):
+    if results:
+        _df = pl.DataFrame(results)
+        mo.md(f"**Found {len(results)} results**")
+        _df  # Inside an if block — marimo does NOT display this
+    else:
+        mo.md("**No results found**")
+    return
+```
+
+**FIXED** — assign inside the branches, display at the top level:
+
+```python
+@app.cell
+def show_results(results, mo, pl):
+    _output = None
+    if results:
+        # Stack the heading and the table so both are part of the rendered value
+        _output = mo.vstack([
+            mo.md(f"**Found {len(results)} results**"),
+            pl.DataFrame(results),
+        ])
+    else:
+        _output = mo.md("**No results found**")
+    _output  # Top-level last expression — marimo renders this
+    return
+```
+
+**Rule of thumb:** initialize a `_output = None` variable before any conditional, assign the displayable value inside the branches, then put `_output` as the last top-level expression. A bare `mo.md()` call inside a branch is discarded like any other nested expression, so assign it to `_output` (or combine several elements with `mo.vstack`) instead. When `_output` is still `None`, marimo shows nothing.
+
+### Async Cells
+
+When a cell uses `await` (e.g., for `llm_call` or `asyncio.gather`), you **must** declare it as `async def`:
+
+```python
+@app.cell
+async def analyze(meetings, llm_call, ResponseModel, asyncio):
+    async def _score(meeting):
+        return await llm_call(prompt=..., response_model=ResponseModel)
+
+    results = await asyncio.gather(*[_score(_m) for _m in meetings])
+    return (results,)
+```
+
+Note that `asyncio` is imported in the `setup` cell and received here as a parameter — never `import asyncio` inside individual cells.
+
+If you write `await` in a non-async cell, marimo cannot parse the cell and saves it as an `_unparsable_cell` string literal — the cell won't run, and you'll see `SyntaxError: 'return' outside function` or similar errors. See [Fixing `_unparsable_cell`](#fixing-_unparsable_cell) below.
+
+### Cells That Define Classes Must Return Them
+
+If a cell defines Pydantic models (or any class) that other cells need, it **must** return them:
+
+```python
+# BaseModel and Field are imported in the setup cell and received as parameters
+@app.cell
+def models(BaseModel, Field):
+    class MeetingSentiment(BaseModel):
+        overall_sentiment: str
+        sentiment_score: int = Field(description="Score from -10 to +10")
+
+    class FrustrationExtraction(BaseModel):
+        has_frustrations: bool
+        frustrations: list[dict]
+
+    return MeetingSentiment, FrustrationExtraction  # Other cells receive these as parameters
+```
+
+A bare `return` (or no return) means those classes are invisible to the rest of the notebook.
+
+### Fixing `_unparsable_cell`
+
+When marimo can't parse a cell into a proper `@app.cell` function, it saves the raw code as `app._unparsable_cell("...", name="cell_name")`. These cells **won't run** and show errors like `SyntaxError: 'return' outside function`.
+
+**Common causes:**
+1. Using `await` without making the cell `async def`
+2. Using `return` in code that marimo failed to wrap into a function (usually a side effect of cause 1)
+
+**How to fix:** Convert the `_unparsable_cell` string back into a proper `@app.cell` decorated function:
+
+```python
+# BROKEN — saved as _unparsable_cell because of top-level await
+app._unparsable_cell("""
+results = await asyncio.gather(...)
+return results
+""", name="my_cell")
+
+# FIXED — proper async cell function (asyncio imported in setup, received as parameter)
+@app.cell
+async def my_cell(some_dependency, asyncio):
+    results = await asyncio.gather(...)
+    return (results,)
+```
+
+**Key differences to note when converting:**
+- Wrap the code in an `async def` function (if it uses `await`)
+- Add cell dependencies as function parameters (including imports like `asyncio`)
+- Return values as tuples: `return (var,)` not `return var`
+- Prefix cell-local variables with `_`
+- Never add `import` statements inside the cell — all imports belong in `setup`
+
 ### Inline Dependencies with PEP 723

 Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:
@@ -90,10 +195,25 @@ Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:
 #     "marimo",
 #     "httpx",
 #     "polars",
+#     "mirascope[openai]",
+#     "pydantic",
+#     "python-dotenv",
 # ]
 # ///
 ```

+### Checking Notebooks Before Running
+
+Always run `marimo check` before opening or running a notebook. It catches common issues — duplicate variable definitions, `_unparsable_cell` blocks, branch expressions that won't display, and more — without needing to start the full editor:
+
+```bash
+uvx marimo check notebook.py           # Check a single notebook
+uvx marimo check workflows/            # Check all notebooks in a directory
+uvx marimo check --fix notebook.py     # Auto-fix fixable issues
+```
+
+**Run this after every edit.** A clean `marimo check` (no output, exit code 0) means the notebook is structurally valid. Any errors must be fixed before running.
+
 ### Running Notebooks

 ```bash
@@ -142,6 +262,9 @@ Every notebook against InternalAI follows this structure:
 #     "marimo",
 #     "httpx",
 #     "polars",
+#     "mirascope[openai]",
+#     "pydantic",
+#     "python-dotenv",
 # ]
 # ///

@@ -166,11 +289,16 @@ def config():

@app.cell
 def setup():
+    from dotenv import load_dotenv
+    load_dotenv(".env")  # Load .env from the project root
+
+    import asyncio  # All imports go here — never import inside other cells
    import httpx
    import marimo as mo
    import polars as pl
+    from pydantic import BaseModel, Field
    client = httpx.Client(timeout=30)
-    return (client, mo, pl,)
+    return (asyncio, client, mo, pl, BaseModel, Field,)

 # --- your IN / ETL / OUT cells here ---

@@ -178,6 +306,8 @@ if __name__ == "__main__":
    app.run()
 ```

+> **`load_dotenv(".env")`** reads the `.env` file explicitly by name, resolved relative to the current working directory, so launch notebooks from the project root. This makes `LLM_API_KEY` and other env vars available to `os.getenv()` calls in `lib/llm.py` without requiring the shell to have them pre-set. Always include `python-dotenv` in PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell, before anything imports `lib.llm`.
+
 **The `params` cell must always be the first cell** after `app = marimo.App()`. It contains all user-configurable constants (search terms, date ranges, target names, etc.) as plain Python values. This way the user can tweak the workflow by editing a single cell at the top — no need to hunt through the code for hardcoded values.

 ## Pagination Helper
@@ -264,6 +394,8 @@ Meetings have a `participants` list where each entry may or may not have a resol

 **Strategy:** Query by `contact_ids` to get meetings with resolved participants, then optionally do a client-side check on `participants[].display_name` or `transcript` for unresolved ones.

+> **Always include `room_name` in meeting tables.** The `room_name` field contains the virtual room name (e.g., `standup-office-bogota`) and often indicates where the meeting took place. It's useful context when `title` is generic or missing — include it as a column alongside `title`.
+
 ```python
@app.cell
 def fetch_meetings(fetch_all, DATAINDEX, target_id, my_id):
@@ -283,7 +415,8 @@ def meeting_table(resolved_meetings, target_name, pl):
        _names = [_p["display_name"] for _p in _participants]
        _rows.append({
            "date": (_m.get("start_time") or _m["timestamp"])[:10],
-            "title": _m.get("title", _m.get("room_name", "Untitled")),
+            "title": _m.get("title", "Untitled"),
+            "room_name": _m.get("room_name", ""),
            "participants": ", ".join(_names),
            "has_transcript": _m.get("transcript") is not None,
            "has_summary": _m.get("summary") is not None,
@@ -425,6 +558,92 @@ def display_timeline(timeline_df):
    timeline_df
 ```

+## Pattern 5: LLM Filtering with `lib.llm`
+
+When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the `llm_call` helper from `workflows/lib`. It sends each item to an LLM and parses the response into a typed Pydantic model.
+
+**Prerequisites:** Copy `.env.example` to `.env` and fill in your `LLM_API_KEY`. Add `mirascope[openai]`, `pydantic`, and `python-dotenv` to the notebook's PEP 723 dependencies.
+
+```python
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+#     "marimo",
+#     "httpx",
+#     "polars",
+#     "mirascope[openai]",
+#     "pydantic",
+#     "python-dotenv",
+# ]
+# ///
+```
+
+### Setup cell — load `.env` and import `llm_call`
+
+```python
+@app.cell
+def setup():
+    from dotenv import load_dotenv
+    load_dotenv(".env")  # Makes LLM_API_KEY available to lib/llm.py
+
+    import asyncio
+    import httpx
+    import marimo as mo
+    import polars as pl
+    from pydantic import BaseModel, Field
+    from lib.llm import llm_call
+    client = httpx.Client(timeout=30)
+    return (asyncio, client, llm_call, mo, pl, BaseModel, Field,)
+```
+
+### Define a response model
+
+Create a Pydantic model that describes the structured output you want from the LLM:
+
+```python
+@app.cell
+def models(BaseModel, Field):
+
+    class RelevanceScore(BaseModel):
+        relevant: bool
+        reason: str
+        score: int  # 0-10
+
+    return (RelevanceScore,)
+```
+
+### Filter entities through the LLM
+
+Iterate over fetched entities and call `llm_call` for each one. Since `llm_call` is async, use `asyncio.gather` to process items concurrently:
+
+```python
+@app.cell
+async def llm_filter(meetings, llm_call, RelevanceScore, pl, mo, asyncio):
+    _topic = "Greyhaven"
+
+    async def _score(meeting):
+        _text = meeting.get("summary") or meeting.get("title") or ""
+        _result = await llm_call(
+            prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}",
+            response_model=RelevanceScore,
+            system_prompt="Score the relevance of this meeting to the given topic. Set relevant=true if score >= 5.",
+        )
+        return {**meeting, "llm_relevant": _result.relevant, "llm_reason": _result.reason, "llm_score": _result.score}
+
+    scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings])
+    relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]]
+
+    mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'")
+    return (relevant_meetings,)
+```
+
+### Tips for LLM filtering
+
+- **Keep prompts short** — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity.
+- **Use structured output** — always pass a `response_model` so you get typed fields back, not free-text.
+- **Batch wisely** — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits.
+- **Cache results** — LLM calls are slow and cost money. If iterating on a notebook, consider storing scored results in a cell variable so you don't re-score on every edit.
+
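Review note (not part of the diff): the "Batch wisely" tip could be implemented roughly as below. This is a sketch under stated assumptions: `gather_in_chunks` is a hypothetical helper, and the default `chunk_size` of 20 is an arbitrary placeholder, not a documented limit of the LiteLLM endpoint.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_in_chunks(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    chunk_size: int = 20,  # Assumed default; tune to the endpoint's rate limit
) -> list[R]:
    """Run `worker` over `items` with at most `chunk_size` calls in flight."""
    results: list[R] = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Each chunk completes before the next starts; input order is preserved.
        results.extend(await asyncio.gather(*(worker(item) for item in chunk)))
    return results
```

In the `llm_filter` cell this would stand in for the single `asyncio.gather` call, for example `scored_meetings = await gather_in_chunks(meetings, _score)`. A slow item delays the rest of its chunk, which is the price of keeping the sketch simple.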
 ## Do / Don't — Quick Reference for LLM Agents

 When generating marimo notebooks, follow these rules strictly. Violations cause `MultipleDefinitionError` at runtime.
@@ -432,20 +651,28 @@ When generating marimo notebooks, follow these rules strictly. Violations cause
 ### Do

 - **Prefix cell-local variables with `_`** — `_resp`, `_rows`, `_m`, `_data`, `_chunk`. Marimo ignores `_`-prefixed names so they won't clash across cells.
-- **Import shared modules once in `setup`** and pass them as cell parameters: `def my_cell(client, mo, pl):`.
+- **Put all imports in the `setup` cell** and pass them as cell parameters: `def my_cell(client, mo, pl, asyncio):`. Never `import` inside other cells — even `import asyncio` in two async cells causes `MultipleDefinitionError`.
 - **Give returned DataFrames unique names** — `email_df`, `meeting_df`, `timeline_df`. Never use a bare `df` that might collide with another cell.
 - **Return only values other cells need** — everything else should be `_`-prefixed and stays private to the cell.
-- **Use `from datetime import datetime` inside the cell** that needs it (stdlib imports are fine inline since they're `_`-safe inside functions, but avoid assigning them to non-`_` names if another cell does the same).
+- **Import stdlib modules in `setup` too** — even `from datetime import datetime` creates a top-level name. If two cells both import `datetime`, marimo errors. Import it once in `setup` and receive it as a parameter, or use it inside a `_`-prefixed helper function where it's naturally scoped.
 - **Every non-utility cell must show a preview** — see the "Cell Output Previews" section below.
+- **Use separate display cells for DataFrames** — the build cell returns the DataFrame and shows a `mo.md()` count/heading; a standalone display cell (e.g., `def show_table(df): df`) renders it as an interactive table the user can sort and filter.
+- **Include `room_name` when listing meetings** — the virtual room name provides useful context about where the meeting took place (e.g., `standup-office-bogota`). Show it as a column alongside `title`.
+- **Keep cell output expressions at the top level** — if a cell conditionally displays a DataFrame, initialize `_output = None` before the `if`/`else`, assign inside the branches, then put `_output` as the last top-level expression. Expressions inside `if`/`else`/`for` blocks are silently ignored by marimo.
 - **Put all user parameters in a `params` cell as the first cell** — date ranges, search terms, target names, limits. Never hardcode these values deeper in the notebook.
+- **Declare cells as `async def` when using `await`** — `@app.cell` followed by `async def cell_name(...)`. This includes cells using `asyncio.gather`, `await llm_call(...)`, or any async API.
+- **Return classes/models from cells that define them** — if a cell defines `class MyModel(BaseModel)`, return it so other cells can use it as a parameter: `return (MyModel,)`.
+- **Use `python-dotenv` to load `.env`** — add `python-dotenv` to PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell (before importing `lib.llm`). This ensures `LLM_API_KEY` and other env vars are available without requiring them to be pre-set in the shell.

 ### Don't

 - **Don't define the same variable name in two cells** — even `resp = ...` in cell A and `resp = ...` in cell B is a fatal error.
-- **Don't `import marimo as mo` in multiple cells** — this defines `mo` twice. Import it once in `setup`, then receive it via `def my_cell(mo):`.
+- **Don't `import` inside non-setup cells** — every `import X` defines a top-level variable `X`. If two cells both `import asyncio`, marimo raises `MultipleDefinitionError` and refuses to run. Put all imports in the `setup` cell and receive them as function parameters.
 - **Don't use generic top-level names** like `df`, `rows`, `resp`, `data`, `result` — either prefix with `_` or give them a unique descriptive name.
 - **Don't return temporary variables** — if `_rows` is only used to build a DataFrame, keep it `_`-prefixed and only return the DataFrame.
-- **Don't use `import X` at the top level of multiple cells** for the same module — the module variable name would be duplicated. Import once in `setup` or use `_`-prefixed local imports (`_json = __import__("json")`).
+- **Don't use `await` in a non-async cell** — this causes marimo to save the cell as `_unparsable_cell` (a string literal that won't execute). Always use `async def` for cells that call async functions.
+- **Don't define classes in a cell without returning them** — a bare `return` or no return makes classes invisible to the DAG. Other cells can't receive them as parameters.
+- **Don't put display expressions inside `if`/`else`/`for` blocks** — marimo only renders the last top-level expression. A DataFrame inside an `if` branch is silently discarded. Use the `_output = None` pattern instead (see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level)).

 ## Cell Output Previews

@@ -502,7 +729,7 @@ def build_table(meetings, pl):
    return (meeting_df,)
 ```

-**Good** — DataFrame is the last expression, so marimo renders it as an interactive table:
+**Good** — the build cell shows a `mo.md()` count, and a **separate display cell** renders the DataFrame as an interactive table:

 ```python
@app.cell
@@ -517,6 +744,27 @@ def show_meeting_table(meeting_df):
    meeting_df  # Renders as interactive sortable table
 ```

+### Separate display cells for DataFrames
+
+When a cell builds a DataFrame, use **two cells**: one that builds and returns it (with a `mo.md()` summary), and a standalone display cell that renders it as a table. This keeps the build logic clean and gives the user an interactive table they can sort and filter in the marimo UI.
+
+```python
+# Cell 1: build and return the DataFrame, show a count
+@app.cell
+def build_sentiment_table(analyzed_meetings, pl, mo):
+    _rows = [...]
+    sentiment_df = pl.DataFrame(_rows).sort("date", descending=True)
+    mo.md(f"### Sentiment Analysis ({len(sentiment_df)} meetings)")
+    return (sentiment_df,)
+
+# Cell 2: standalone display — just the DataFrame, nothing else
+@app.cell
+def show_sentiment_table(sentiment_df):
+    sentiment_df
+```
+
+This pattern makes every result inspectable. The `mo.md()` cell gives a quick count/heading; the display cell lets the user explore the full data interactively.
+
 ### Utility cells (no preview needed)

 Config, setup, and helper cells that only define constants or functions don't need previews:
--- a/workflows/lib/__init__.py
+++ b/workflows/lib/__init__.py
@@ -0,0 +1,5 @@
+"""Library modules for contact analysis workbooks."""
+
+from lib.llm import llm_call
+
+__all__ = ["llm_call"]
--- a/workflows/lib/llm.py
+++ b/workflows/lib/llm.py
@@ -0,0 +1,64 @@
+"""Simple LLM helper for workbooks using Mirascope v2."""
+
+import os
+import re
+from typing import TypeVar
+
+from mirascope import llm
+from pydantic import BaseModel
+
+T = TypeVar("T", bound=BaseModel)
+
+# Configure from environment (defaults match .env.example)
+_api_key = os.getenv("LLM_API_KEY", "")
+_base_url = os.getenv("LLM_API_URL", "https://litellm-notrack.app.monadical.io")
+_model = os.getenv("LLM_MODEL", "GLM-4.5-Air-FP8-dev")
+
+# Register our LiteLLM endpoint as an OpenAI-compatible provider
+_base = (_base_url or "").rstrip("/")
+llm.register_provider(
+    "openai",
+    scope="litellm/",
+    base_url=_base if _base.endswith("/v1") else f"{_base}/v1",
+    api_key=_api_key,
+)
+
+
+def _sanitize_json(text: str) -> str:
+    """Replace control characters (U+0000–U+001F) that break JSON parsing.
+
+    Some LLMs emit literal newlines/tabs inside JSON string values, which
+    is invalid per the JSON spec. Each run is replaced with a single space.
+    """
+    return re.sub(r"[\x00-\x1f]+", " ", text)
+
+
+async def llm_call(
+    prompt: str,
+    response_model: type[T],
+    system_prompt: str = "You are a helpful assistant.",
+    model: str | None = None,
+) -> T:
+    """Make a structured LLM call.
+
+    Args:
+        prompt: The user prompt
+        response_model: Pydantic model for structured output
+        system_prompt: System instructions
+        model: Override the default model
+
+    Returns:
+        Parsed response matching the response_model schema
+    """
+    use_model = model or _model
+
+    @llm.call(f"litellm/{use_model}", format=response_model)
+    async def _call() -> str:
+        return f"{system_prompt}\n\n{prompt}"
+
+    response = await _call()
+    try:
+        return response.parse()
+    except Exception:
+        # Fallback: sanitize control characters and parse manually
+        return response_model.model_validate_json(_sanitize_json(response.content))
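
The fallback path in `llm_call` can be illustrated with the standard library alone. The sketch below copies the regex from `_sanitize_json` and uses `json.loads` as a stand-in for Pydantic's `model_validate_json` (an assumption made to keep the example dependency-free). The sample payload is invented for illustration.

```python
import json
import re


def sanitize_json(text: str) -> str:
    """Replace each run of control characters (U+0000-U+001F) with one space."""
    return re.sub(r"[\x00-\x1f]+", " ", text)


# Invented payload: the "\n" here is a real newline character, so the JSON
# string value contains a raw control character, which the JSON spec forbids.
raw = '{"summary": "line one\nline two",\n "score": 3}'

try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False  # the strict parser rejects the raw newline

# After sanitizing, the same payload parses.
parsed = json.loads(sanitize_json(raw))

# A properly escaped newline (backslash followed by "n") is left untouched.
escaped = sanitize_json('{"summary": "a\\nb"}')
```

One trade-off of this approach: line breaks the model emitted as raw newlines are flattened to spaces, so multi-line values lose their formatting, while correctly escaped `\n` sequences survive unchanged.
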
Author	SHA1	Message	Date
Mathieu Virbel	0a306d847c	Add __pycache__ to gitignore	2026-02-10 19:49:23 -06:00
Mathieu Virbel	f641cc267e	Add room_name to meeting table examples in docs Include room_name as a column in meeting DataFrames — it shows the virtual room name and helps identify meeting location when title is generic or missing.	2026-02-10 19:48:52 -06:00
Mathieu Virbel	46dfebd05f	Update docs and fix LLM JSON parsing - Use load_dotenv(".env") explicitly in all doc examples - Move pydantic imports (BaseModel, Field) to setup cell in all examples - Add separate display cell pattern for DataFrame inspection - Fix LLM control character error: sanitize JSON before Pydantic parsing - Remove debug print from llm.py	2026-02-10 19:45:04 -06:00
Mathieu Virbel	8eb1fb87a7	Add MYSELF.md user profile for agent personalization - Add MYSELF.example.md template with identity, role, collaborators, and preferences sections - Add MYSELF.md to .gitignore (contains personal info) - Reference MYSELF.md in AGENTS.md routing table, new "About the User" section, and file index - Add setup step and routing entry in README.md	2026-02-10 19:30:14 -06:00
Mathieu Virbel	d04aa26f31	Update marimo notebook docs with lessons from workflow debugging - Add rules: all imports in setup cell, cell output at top level, async cells need async def, return classes from model cells, use python-dotenv for .env loading - Add marimo check validation step to AGENTS.md and notebook-patterns.md - Add "always create new workflow" rule to AGENTS.md - Add new doc sections: Cell Output Must Be at the Top Level, Async Cells, Cells That Define Classes, Fixing _unparsable_cell, Checking Notebooks Before Running - Update all code examples to follow new import/output rules - Update workflows/lib/llm.py for mirascope v2 API	2026-02-10 19:25:53 -06:00
Mathieu Virbel	439e9db0a4	Add LLM filtering pattern, .env.example, and workflows/lib - Add .env.example with LLM_API_URL, LLM_MODEL, LLM_API_KEY - Add .gitignore to exclude .env - Add Pattern 5 (LLM filtering) to notebook-patterns.md - Track workflows/lib with llm_call helper using mirascope - Update README with LLM setup step and updated project structure	2026-02-10 18:32:20 -06:00
Mathieu Virbel	a17cf63d2f	Add concise README with setup, quickstart, and AGENTS.md overview	2026-02-10 18:24:55 -06:00