Compare commits

...

7 Commits

Author SHA1 Message Date
0a306d847c Add __pycache__ to gitignore 2026-02-10 19:49:23 -06:00
f641cc267e Add room_name to meeting table examples in docs
Include room_name as a column in meeting DataFrames — it shows the
virtual room name and helps identify meeting location when title is
generic or missing.
2026-02-10 19:48:52 -06:00
46dfebd05f Update docs and fix LLM JSON parsing
- Use load_dotenv(".env") explicitly in all doc examples
- Move pydantic imports (BaseModel, Field) to setup cell in all examples
- Add separate display cell pattern for DataFrame inspection
- Fix LLM control character error: sanitize JSON before Pydantic parsing
- Remove debug print from llm.py
2026-02-10 19:45:04 -06:00
8eb1fb87a7 Add MYSELF.md user profile for agent personalization
- Add MYSELF.example.md template with identity, role, collaborators,
  and preferences sections
- Add MYSELF.md to .gitignore (contains personal info)
- Reference MYSELF.md in AGENTS.md routing table, new "About the User"
  section, and file index
- Add setup step and routing entry in README.md
2026-02-10 19:30:14 -06:00
d04aa26f31 Update marimo notebook docs with lessons from workflow debugging
- Add rules: all imports in setup cell, cell output at top level,
  async cells need async def, return classes from model cells,
  use python-dotenv for .env loading
- Add marimo check validation step to AGENTS.md and notebook-patterns.md
- Add "always create new workflow" rule to AGENTS.md
- Add new doc sections: Cell Output Must Be at the Top Level,
  Async Cells, Cells That Define Classes, Fixing _unparsable_cell,
  Checking Notebooks Before Running
- Update all code examples to follow new import/output rules
- Update workflows/lib/llm.py for mirascope v2 API
2026-02-10 19:25:53 -06:00
439e9db0a4 Add LLM filtering pattern, .env.example, and workflows/lib
- Add .env.example with LLM_API_URL, LLM_MODEL, LLM_API_KEY
- Add .gitignore to exclude .env
- Add Pattern 5 (LLM filtering) to notebook-patterns.md
- Track workflows/lib with llm_call helper using mirascope
- Update README with LLM setup step and updated project structure
2026-02-10 18:32:20 -06:00
a17cf63d2f Add concise README with setup, quickstart, and AGENTS.md overview 2026-02-10 18:24:55 -06:00
8 changed files with 427 additions and 78 deletions

3
.env.example Normal file

@@ -0,0 +1,3 @@
LLM_API_URL=https://litellm-notrack.app.monadical.io
LLM_MODEL=GLM-4.5-Air-FP8-dev
LLM_API_KEY=xxxxx

3
.gitignore vendored Normal file

@@ -0,0 +1,3 @@
.env
MYSELF.md
__pycache__/


@@ -6,12 +6,24 @@ The InternalAI platform aggregates company data from email, calendars, Zulip cha
| I need to... | Read |
|---------------------------------------------|-------------------------------|
| Know who the user is and what they care about | [MYSELF.md] |
| Understand the company and its tools | [company-context.md] |
| Look up people, contacts, relationships | [contactdb-api.md] |
| Query emails, meetings, chats, documents | [dataindex-api.md] |
| Know which connector provides what data | [connectors-and-sources.md] |
| Create a marimo analysis notebook | [notebook-patterns.md] |
## About the User
If `MYSELF.md` exists in the project root, **read it first** before starting any workflow. It contains the user's name, role, team, frequent collaborators, and preferences. Use this context to:
- Address the user by name in notebook markdown
- Default `TARGET_PERSON` or filter values to people they work with
- Scope date ranges and topics to their stated interests
- Tailor output format to their preferences
If `MYSELF.md` does not exist, ask the user to copy `MYSELF.example.md` to `MYSELF.md` and fill it in, or proceed without personalization.
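The existence check described above could be sketched as follows. This is only an illustration: the helper name `load_user_profile` is hypothetical and not part of this repository, and it assumes the profile lives at `MYSELF.md` in the project root as stated.

```python
from pathlib import Path
from typing import Optional


def load_user_profile(project_root: str = ".") -> Optional[str]:
    """Return the text of MYSELF.md, or None when the file is absent."""
    # Hypothetical helper: the profile is expected in the project root.
    profile_path = Path(project_root) / "MYSELF.md"
    if not profile_path.is_file():
        # Caller can suggest copying MYSELF.example.md or continue unpersonalized.
        return None
    return profile_path.read_text(encoding="utf-8")
```

A `None` result corresponds to the fallback path: ask the user to create the file, or proceed without personalization.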
## API Base URLs
| Service | Swagger UI | OpenAPI JSON |
@@ -53,6 +65,10 @@ Also create a notebook when the user asks to "create a workflow", "write a workf
If you're unsure whether a question is simple enough to answer directly or needs a notebook, **ask the user**.
### Always create a new workflow
When the user requests a workflow, **always create a new notebook file**. Do **not** modify or re-run an existing workflow unless the user explicitly asks you to (e.g., "update workflow 001", "fix the sentiment notebook", "re-run the existing analysis"). Each new request gets its own sequentially numbered file — even if it covers a similar topic to an earlier workflow.
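One way to pick the next sequential file name is sketched below. The function name `next_workflow_name` is hypothetical, and the sketch assumes existing notebooks use a three-digit prefix followed by `_<topic>_<scope>.py`, as in `workflows/001_greyhaven_meetings_january.py`.

```python
import re
from pathlib import Path


def next_workflow_name(workflows_dir: str, topic: str, scope: str) -> str:
    """Return the next `<NNN>_<topic>_<scope>.py` file name for a directory."""
    # Assumption: numbered notebooks start with exactly three digits and "_".
    pattern = re.compile(r"^(\d{3})_.*\.py$")
    numbers = [
        int(m.group(1))
        for p in Path(workflows_dir).glob("*.py")
        if (m := pattern.match(p.name))
    ]
    # Never reuse a number: take the highest existing prefix and add one.
    return f"{max(numbers, default=0) + 1:03d}_{topic}_{scope}.py"
```

Using the highest existing prefix rather than the file count keeps numbering monotonic even if an earlier workflow was deleted.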
### File naming and location
All notebooks go in the **`workflows/`** directory. Use a sequential number prefix so workflows stay ordered by creation:
@@ -87,6 +103,16 @@ Before writing any notebook, **always propose a plan first** and get the user's
Only proceed to implementation after the user confirms the plan.
### Validate before delivering
After writing or editing a notebook, **always run `uvx marimo check`** to verify it has no structural errors (duplicate variables, undefined names, branch expressions, etc.):
```bash
uvx marimo check workflows/NNN_topic_scope.py
```
A clean check (no output, exit code 0) means the notebook is valid. Fix any errors before delivering the notebook to the user.
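If the validation step needs to be scripted, a thin wrapper along these lines may help. It is a sketch only: `notebook_is_valid` is a hypothetical name, and the only behavior it relies on is the one stated above, that a clean check exits with code 0.

```python
import subprocess


def notebook_is_valid(path: str, run=subprocess.run) -> bool:
    """Run `uvx marimo check` on a notebook; True means a clean check."""
    # `run` is injectable so the wrapper can be exercised without uvx installed.
    result = run(["uvx", "marimo", "check", path], capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the reported problems so they can be fixed before delivery.
        print(result.stdout or result.stderr)
    return result.returncode == 0
```

For example, `notebook_is_valid("workflows/NNN_topic_scope.py")` would gate delivery on the check passing.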
### Steps
1. **Identify people** — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /api/contacts/me`.
@@ -130,12 +156,14 @@ print(f"Found {len(emails)} emails involving Alice")
## File Index
- [MYSELF.md] — User identity, role, collaborators, and preferences (gitignored, copy from `MYSELF.example.md`)
- [company-context.md] — Business context, team structure, vocabulary
- [contactdb-api.md] — ContactDB entities and REST endpoints
- [dataindex-api.md] — DataIndex entity types, query modes, REST endpoints
- [connectors-and-sources.md] — Connector-to-entity-type mapping
- [notebook-patterns.md] — Marimo notebook patterns and common API workflows
[MYSELF.md]: ./MYSELF.md
[company-context.md]: ./docs/company-context.md
[contactdb-api.md]: ./docs/contactdb-api.md
[dataindex-api.md]: ./docs/dataindex-api.md

28
MYSELF.example.md Normal file

@@ -0,0 +1,28 @@
# About Me
Copy this file to `MYSELF.md` and fill in your details. The agent reads it to personalize workflows and understand your role. `MYSELF.md` is gitignored — it stays local and private.
## Identity
- **Name:** Your Name
- **Role:** e.g. Engineering Lead, Product Manager, Designer
- **Contact ID:** Your contact ID from ContactDB. Recording it here saves the agent a `GET /api/contacts/me` call.
## What I work on
Brief description of your current projects, responsibilities, or focus areas. This helps the agent scope queries — e.g., if you work on GreyHaven, the agent can default to filtering meetings/emails related to that project.
## People I work with frequently
List the names of people you interact with most. The agent can use these to suggest relevant filters or default `TARGET_PERSON` values in workflows.
- Alice — role or context
- Bob — role or context
## Preferences
Any preferences for how you want workflows or analysis structured:
- **Default date range:** e.g. "last 30 days", "current quarter"
- **Preferred output format:** e.g. "tables with counts", "timeline view"
- **Topics of interest:** e.g. "hiring", "client feedback", "sprint blockers"

108
README.md

@@ -4,16 +4,11 @@ A documentation and pattern library that gives LLM agents the context they need
The goal is to use [opencode](https://opencode.ai) (or any LLM-powered coding tool) to iteratively create [marimo](https://marimo.io) notebook workflows that query and analyze company data. The goal is to use [opencode](https://opencode.ai) (or any LLM-powered coding tool) to iteratively create [marimo](https://marimo.io) notebook workflows that query and analyze company data.
## Getting Started ## Setup
### Prerequisites 1. Install [opencode](https://opencode.ai)
2. Make sure InternalAI is running locally (ContactDB + DataIndex accessible via http://localhost:42000)
- [opencode](https://opencode.ai) installed 3. Configure LiteLLM — add to `~/.config/opencode/config.json`:
- Access to the InternalAI platform (ContactDB + DataIndex running locally, accessible via http://localhost:42000)
### Configuring opencode with LiteLLM
To use models through LiteLLM, add the following to `~/.config/opencode/config.json`:
```json ```json
{ {
@@ -38,83 +33,58 @@ To use models through LiteLLM, add the following to `~/.config/opencode/config.j
Replace `xxxxx` with your actual LiteLLM API key. Replace `xxxxx` with your actual LiteLLM API key.
### Running opencode 4. **Set up your profile** — copy the example and fill in your name, role, and contact ID so the agent can personalize workflows:
From the project root:
```bash ```bash
opencode cp MYSELF.example.md MYSELF.md
``` ```
opencode will pick up `AGENTS.md` automatically and use it as the entry point to understand the project, the available APIs, and how to write workflows. 5. **(Optional) LLM filtering in workflows** — if your workflows need to classify or score entities via an LLM, copy `.env.example` to `.env` and fill in your key:
## How AGENTS.md Works ```bash
cp .env.example .env
`AGENTS.md` is the routing guide for LLM agents. It is structured as follows:
1. **Purpose statement** — Explains that the agent's job is to build marimo notebooks that analyze company data.
2. **Documentation routing table** — Directs the agent to the right file depending on the topic:
| Topic | File |
|-------|------|
| Company context, tools, connectors overview | `docs/company-context.md` |
| People, contacts, relationships | `docs/contactdb-api.md` |
| Querying emails, meetings, chats, docs | `docs/dataindex-api.md` |
| Connector-to-entity-type mappings | `docs/connectors-and-sources.md` |
| Notebook creation patterns and templates | `docs/notebook-patterns.md` |
3. **API base URLs** — ContactDB and DataIndex endpoints (both via Caddy proxy and direct).
4. **Common query translation table** — Maps natural-language questions (e.g. "Who am I?", "Recent meetings") to the corresponding API calls.
5. **Workflow rules** — When to create a notebook vs. answer inline, naming conventions, and the requirement to propose a plan before implementing.
## Workflow
### How it works
1. **Ask a question in opencode** — Describe what you want to analyze (e.g. "Show me all meetings about Greyhaven in January").
2. **Agent reads AGENTS.md** — opencode picks up the routing guide and navigates to the relevant docs to understand the APIs.
3. **Agent proposes a plan** — Before writing code, the agent outlines: Goal, Data Sources, Algorithm, and Output Format.
4. **Agent creates a marimo notebook** — A `.py` file is written to `workflows/` following the naming convention `<NNN>_<topic>_<scope>.py`.
5. **Iterate** — Run the notebook with `marimo edit workflows/<name>.py`, review the output, and ask the agent to refine.
### Workflow output format
Workflows are [marimo notebooks](https://marimo.io) — plain Python files with `@app.cell` decorators. They typically follow this structure:
- **params cell** — User-editable parameters (search terms, date ranges, contact names)
- **config cell** — API base URLs
- **setup cell** — Shared imports (`httpx`, `polars`, `marimo`)
- **data cells** — Fetch and transform data from ContactDB / DataIndex
- **output cells** — Tables, charts, or markdown summaries
### Naming convention
```
workflows/<NNN>_<topic>_<scope>.py
``` ```
Examples: The `workflows/lib` module provides an `llm_call` helper (using [mirascope](https://mirascope.io)) for structured LLM calls — see Pattern 5 in `docs/notebook-patterns.md`.
- `001_greyhaven_meetings_january.py`
- `002_email_activity_q1.py` ## Quickstart
1. Run `opencode` from the project root
2. Ask it to create a workflow, e.g.: *"Create a workflow that shows all meetings about Greyhaven in January"*
3. The agent reads `AGENTS.md`, proposes a plan, and generates a notebook like `workflows/001_greyhaven_meetings_january.py`
4. Run it: `uvx marimo edit workflows/001_greyhaven_meetings_january.py`
5. Iterate — review the output in marimo, go back to opencode and ask for refinements
## How AGENTS.md is Structured
`AGENTS.md` is the entry point that opencode reads automatically. It routes the agent to the right documentation:
| Topic | File |
|-------|------|
| Your identity, role, preferences | `MYSELF.md` (copy from `MYSELF.example.md`) |
| Company context, tools, connectors | `docs/company-context.md` |
| People, contacts, relationships | `docs/contactdb-api.md` |
| Querying emails, meetings, chats, docs | `docs/dataindex-api.md` |
| Connector-to-entity-type mappings | `docs/connectors-and-sources.md` |
| Notebook templates and patterns | `docs/notebook-patterns.md` |
It also includes API base URLs, a translation table mapping natural-language questions to API calls, and rules for when/how to create workflow notebooks.
## Project Structure ## Project Structure
``` ```
internalai-agent/ internalai-agent/
├── AGENTS.md # LLM agent routing guide (entry point) ├── AGENTS.md # LLM agent routing guide (entry point)
├── README.md ├── MYSELF.example.md # User profile template (copy to MYSELF.md)
├── .env.example # LLM credentials template
├── docs/ ├── docs/
│ ├── company-context.md # Monadical org, tools, key concepts │ ├── company-context.md # Monadical org, tools, key concepts
│ ├── contactdb-api.md # ContactDB REST API reference │ ├── contactdb-api.md # ContactDB REST API reference
│ ├── dataindex-api.md # DataIndex REST API reference │ ├── dataindex-api.md # DataIndex REST API reference
│ ├── connectors-and-sources.md # Connector → entity type mappings │ ├── connectors-and-sources.md # Connector → entity type mappings
│ └── notebook-patterns.md # Marimo notebook templates and patterns │ └── notebook-patterns.md # Marimo notebook templates and patterns
└── workflows/ # Generated analysis notebooks go here └── workflows/
└── lib/ # Shared helpers for notebooks
├── __init__.py
└── llm.py # llm_call() — structured LLM calls via mirascope
``` ```


@@ -25,11 +25,11 @@ def cell_two(x):
**Key rules:** **Key rules:**
- Cells declare dependencies via function parameters - Cells declare dependencies via function parameters
- Cells return values as tuples: `return (var1, var2,)` - Cells return values as tuples: `return (var1, var2,)`
- The **last expression** in a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees) - The **last expression at the top level** of a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees). Expressions inside `if`/`else`/`for` blocks do **not** count — see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level) below
- Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below) - Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below)
- No manual execution order; the DAG determines it - No manual execution order; the DAG determines it
- **Variable names must be unique across cells.** Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them **private** to that cell — marimo ignores `_`-prefixed names. - **Variable names must be unique across cells.** Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them **private** to that cell — marimo ignores `_`-prefixed names.
- **Import shared modules once** in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` in multiple cells — that defines `mo` twice. Instead, import it once in `setup` and receive it via `def my_cell(mo):`. - **All imports must go in the `setup` cell.** Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place **all** imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`.
### Cell Variable Scoping — Example ### Cell Variable Scoping — Example
@@ -79,6 +79,111 @@ def fetch_details(client, DATAINDEX, results):
> **Note:** Variables inside nested `def` functions are naturally local and don't need `_` prefixes — e.g., `resp` inside a `def fetch_all(...)` helper is fine because it's scoped to the function, not the cell.
### Cell Output Must Be at the Top Level
Marimo only renders the **last expression at the top level** of a cell as rich output. An expression buried inside an `if`/`else`, `for`, `try`, or any other block is **not** displayed — it's silently discarded.
**BROKEN:** `_df` inside the `if` branch is never rendered:
```python
@app.cell
def show_results(results, mo, pl):
    if results:
        _df = pl.DataFrame(results)
        mo.md(f"**Found {len(results)} results**")
        _df  # Inside an if block — marimo does NOT display this
    else:
        mo.md("**No results found**")
    return
```
**FIXED:** assign inside the branches, display at the top level:
```python
@app.cell
def show_results(results, mo, pl):
    _output = None
    if results:
        _output = pl.DataFrame(results)
        mo.md(f"**Found {len(results)} results**")
    else:
        mo.md("**No results found**")
    _output  # Top-level last expression — marimo renders this
    return
```
**Rule of thumb:** initialize a `_output = None` variable before any conditional, assign the displayable value inside the branches, then put `_output` as the last top-level expression. When it's `None` (e.g., the `else` path), marimo shows nothing — which is fine since the `mo.md()` already provides feedback.
### Async Cells
When a cell uses `await` (e.g., for `llm_call` or `asyncio.gather`), you **must** declare it as `async def`:
```python
@app.cell
async def analyze(meetings, llm_call, ResponseModel, asyncio):
    async def _score(meeting):
        return await llm_call(prompt=..., response_model=ResponseModel)

    results = await asyncio.gather(*[_score(_m) for _m in meetings])
    return (results,)
```
Note that `asyncio` is imported in the `setup` cell and received here as a parameter — never `import asyncio` inside individual cells.
If you write `await` in a non-async cell, marimo cannot parse the cell and saves it as an `_unparsable_cell` string literal — the cell won't run, and you'll see `SyntaxError: 'return' outside function` or similar errors. See [Fixing `_unparsable_cell`](#fixing-_unparsable_cell) below.
### Cells That Define Classes Must Return Them
If a cell defines Pydantic models (or any class) that other cells need, it **must** return them:
```python
# BaseModel and Field are imported in the setup cell and received as parameters
@app.cell
def models(BaseModel, Field):
    class MeetingSentiment(BaseModel):
        overall_sentiment: str
        sentiment_score: int = Field(description="Score from -10 to +10")

    class FrustrationExtraction(BaseModel):
        has_frustrations: bool
        frustrations: list[dict]

    return MeetingSentiment, FrustrationExtraction  # Other cells receive these as parameters
```
A bare `return` (or no return) means those classes are invisible to the rest of the notebook.
### Fixing `_unparsable_cell`
When marimo can't parse a cell into a proper `@app.cell` function, it saves the raw code as `app._unparsable_cell("...", name="cell_name")`. These cells **won't run** and show errors like `SyntaxError: 'return' outside function`.
**Common causes:**
1. Using `await` without making the cell `async def`
2. Using `return` in code that marimo failed to wrap into a function (usually a side effect of cause 1)
**How to fix:** Convert the `_unparsable_cell` string back into a proper `@app.cell` decorated function:
```python
# BROKEN — saved as _unparsable_cell because of top-level await
app._unparsable_cell("""
results = await asyncio.gather(...)
return results
""", name="my_cell")

# FIXED — proper async cell function (asyncio imported in setup, received as parameter)
@app.cell
async def my_cell(some_dependency, asyncio):
    results = await asyncio.gather(...)
    return (results,)
```
**Key differences to note when converting:**
- Wrap the code in an `async def` function (if it uses `await`)
- Add cell dependencies as function parameters (including imports like `asyncio`)
- Return values as tuples: `return (var,)` not `return var`
- Prefix cell-local variables with `_`
- Never add `import` statements inside the cell — all imports belong in `setup`
### Inline Dependencies with PEP 723 ### Inline Dependencies with PEP 723
Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies: Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:
@@ -90,10 +195,25 @@ Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:
# "marimo", # "marimo",
# "httpx", # "httpx",
# "polars", # "polars",
# "mirascope[openai]",
# "pydantic",
# "python-dotenv",
# ] # ]
# /// # ///
``` ```
### Checking Notebooks Before Running
Always run `marimo check` before opening or running a notebook. It catches common issues — duplicate variable definitions, `_unparsable_cell` blocks, branch expressions that won't display, and more — without needing to start the full editor:
```bash
uvx marimo check notebook.py # Check a single notebook
uvx marimo check workflows/ # Check all notebooks in a directory
uvx marimo check --fix notebook.py # Auto-fix fixable issues
```
**Run this after every edit.** A clean `marimo check` (no output, exit code 0) means the notebook is structurally valid. Any errors must be fixed before running.
### Running Notebooks ### Running Notebooks
```bash ```bash
@@ -142,6 +262,9 @@ Every notebook against InternalAI follows this structure:
# "marimo", # "marimo",
# "httpx", # "httpx",
# "polars", # "polars",
# "mirascope[openai]",
# "pydantic",
# "python-dotenv",
# ] # ]
# /// # ///
@@ -166,11 +289,16 @@ def config():
@app.cell @app.cell
def setup(): def setup():
from dotenv import load_dotenv
load_dotenv(".env") # Load .env from the project root
import asyncio # All imports go here — never import inside other cells
import httpx import httpx
import marimo as mo import marimo as mo
import polars as pl import polars as pl
from pydantic import BaseModel, Field
client = httpx.Client(timeout=30) client = httpx.Client(timeout=30)
return (client, mo, pl,) return (asyncio, client, mo, pl, BaseModel, Field,)
# --- your IN / ETL / OUT cells here --- # --- your IN / ETL / OUT cells here ---
@@ -178,6 +306,8 @@ if __name__ == "__main__":
app.run() app.run()
``` ```
> **`load_dotenv(".env")`** reads the `.env` file explicitly by name. This makes `LLM_API_KEY` and other env vars available to `os.getenv()` calls in `lib/llm.py` without requiring the shell to have them pre-set. Always include `python-dotenv` in PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell.
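A quick sanity check after `load_dotenv(".env")` can catch a missing key before any LLM call is made. The sketch below is an assumption-laden illustration: `missing_llm_vars` is a hypothetical helper, and the three variable names are taken from `.env.example`.

```python
import os

# Names taken from .env.example; adjust if the template changes.
REQUIRED_LLM_VARS = ("LLM_API_URL", "LLM_MODEL", "LLM_API_KEY")


def missing_llm_vars(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    # Default to the process environment, which load_dotenv() populates.
    env = os.environ if env is None else env
    return [name for name in REQUIRED_LLM_VARS if not env.get(name)]
```

An empty list means all three values are available to `os.getenv()`; anything else names what still needs to be filled in.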
**The `params` cell must always be the first cell** after `app = marimo.App()`. It contains all user-configurable constants (search terms, date ranges, target names, etc.) as plain Python values. This way the user can tweak the workflow by editing a single cell at the top — no need to hunt through the code for hardcoded values.
## Pagination Helper ## Pagination Helper
@@ -264,6 +394,8 @@ Meetings have a `participants` list where each entry may or may not have a resol
**Strategy:** Query by `contact_ids` to get meetings with resolved participants, then optionally do a client-side check on `participants[].display_name` or `transcript` for unresolved ones.
> **Always include `room_name` in meeting tables.** The `room_name` field contains the virtual room name (e.g., `standup-office-bogota`) and often indicates where the meeting took place. It's useful context when `title` is generic or missing — include it as a column alongside `title`.
```python ```python
@app.cell @app.cell
def fetch_meetings(fetch_all, DATAINDEX, target_id, my_id): def fetch_meetings(fetch_all, DATAINDEX, target_id, my_id):
@@ -283,7 +415,8 @@ def meeting_table(resolved_meetings, target_name, pl):
_names = [_p["display_name"] for _p in _participants] _names = [_p["display_name"] for _p in _participants]
_rows.append({ _rows.append({
"date": (_m.get("start_time") or _m["timestamp"])[:10], "date": (_m.get("start_time") or _m["timestamp"])[:10],
"title": _m.get("title", _m.get("room_name", "Untitled")), "title": _m.get("title", "Untitled"),
"room_name": _m.get("room_name", ""),
"participants": ", ".join(_names), "participants": ", ".join(_names),
"has_transcript": _m.get("transcript") is not None, "has_transcript": _m.get("transcript") is not None,
"has_summary": _m.get("summary") is not None, "has_summary": _m.get("summary") is not None,
@@ -425,6 +558,92 @@ def display_timeline(timeline_df):
timeline_df timeline_df
``` ```
## Pattern 5: LLM Filtering with `lib.llm`
When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the `llm_call` helper from `workflows/lib`. It sends each item to an LLM and parses the response into a typed Pydantic model.
**Prerequisites:** Copy `.env.example` to `.env` and fill in your `LLM_API_KEY`. Add `mirascope[openai]`, `pydantic`, and `python-dotenv` to the notebook's PEP 723 dependencies.
```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "marimo",
# "httpx",
# "polars",
# "mirascope[openai]",
# "pydantic",
# "python-dotenv",
# ]
# ///
```
### Setup cell — load `.env` and import `llm_call`
```python
@app.cell
def setup():
    from dotenv import load_dotenv
    load_dotenv(".env")  # Makes LLM_API_KEY available to lib/llm.py

    import asyncio
    import httpx
    import marimo as mo
    import polars as pl
    from pydantic import BaseModel, Field
    from lib.llm import llm_call

    client = httpx.Client(timeout=30)
    return (asyncio, client, llm_call, mo, pl, BaseModel, Field,)
```
### Define a response model
Create a Pydantic model that describes the structured output you want from the LLM:
```python
@app.cell
def models(BaseModel, Field):
    class RelevanceScore(BaseModel):
        relevant: bool
        reason: str
        score: int  # 0-10

    return (RelevanceScore,)
```
### Filter entities through the LLM
Iterate over fetched entities and call `llm_call` for each one. Since `llm_call` is async, use `asyncio.gather` to process items concurrently:
```python
@app.cell
async def llm_filter(meetings, llm_call, RelevanceScore, pl, mo, asyncio):
    _topic = "Greyhaven"

    async def _score(meeting):
        _text = meeting.get("summary") or meeting.get("title") or ""
        _result = await llm_call(
            prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}",
            response_model=RelevanceScore,
            system_prompt="Score the relevance of this meeting to the given topic. Set relevant=true if score >= 5.",
        )
        return {**meeting, "llm_relevant": _result.relevant, "llm_reason": _result.reason, "llm_score": _result.score}

    scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings])
    relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]]

    mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'")
    return (relevant_meetings,)
```
### Tips for LLM filtering
- **Keep prompts short** — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity.
- **Use structured output** — always pass a `response_model` so you get typed fields back, not free-text.
- **Batch wisely** — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits.
- **Cache results** — LLM calls are slow and cost money. If iterating on a notebook, consider storing scored results in a cell variable so you don't re-score on every edit.
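The "batch wisely" tip can be sketched as a small helper that awaits one chunk at a time instead of sending everything at once. `gather_in_chunks` is a hypothetical name and the default `chunk_size=20` is an arbitrary assumption; tune it to whatever rate limit applies.

```python
import asyncio


async def gather_in_chunks(items, worker, chunk_size=20):
    """Await worker(item) for every item, at most chunk_size at a time."""
    results = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Each chunk runs concurrently; the next chunk starts only when it finishes.
        results.extend(await asyncio.gather(*(worker(item) for item in chunk)))
    return results
```

Inside an async cell this would replace the single `asyncio.gather` call, for example `scored_meetings = await gather_in_chunks(meetings, _score)`. Result order matches input order, since `asyncio.gather` preserves it within each chunk.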
## Do / Don't — Quick Reference for LLM Agents
When generating marimo notebooks, follow these rules strictly. Violations cause `MultipleDefinitionError` at runtime.
@@ -432,20 +651,28 @@ When generating marimo notebooks, follow these rules strictly. Violations cause
 ### Do
 - **Prefix cell-local variables with `_`** — `_resp`, `_rows`, `_m`, `_data`, `_chunk`. Marimo ignores `_`-prefixed names so they won't clash across cells.
-- **Import shared modules once in `setup`** and pass them as cell parameters: `def my_cell(client, mo, pl):`.
+- **Put all imports in the `setup` cell** and pass them as cell parameters: `def my_cell(client, mo, pl, asyncio):`. Never `import` inside other cells — even `import asyncio` in two async cells causes `MultipleDefinitionError`.
 - **Give returned DataFrames unique names** — `email_df`, `meeting_df`, `timeline_df`. Never use a bare `df` that might collide with another cell.
 - **Return only values other cells need** — everything else should be `_`-prefixed and stays private to the cell.
-- **Use `from datetime import datetime` inside the cell** that needs it (stdlib imports are fine inline since they're `_`-safe inside functions, but avoid assigning them to non-`_` names if another cell does the same).
+- **Import stdlib modules in `setup` too** — even `from datetime import datetime` creates a top-level name. If two cells both import `datetime`, marimo errors. Import it once in `setup` and receive it as a parameter, or use it inside a `_`-prefixed helper function where it's naturally scoped.
 - **Every non-utility cell must show a preview** — see the "Cell Output Previews" section below.
+- **Use separate display cells for DataFrames** — the build cell returns the DataFrame and shows a `mo.md()` count/heading; a standalone display cell (e.g., `def show_table(df): df`) renders it as an interactive table the user can sort and filter.
+- **Include `room_name` when listing meetings** — the virtual room name provides useful context about where the meeting took place (e.g., `standup-office-bogota`). Show it as a column alongside `title`.
+- **Keep cell output expressions at the top level** — if a cell conditionally displays a DataFrame, initialize `_output = None` before the `if`/`else`, assign inside the branches, then put `_output` as the last top-level expression. Expressions inside `if`/`else`/`for` blocks are silently ignored by marimo.
 - **Put all user parameters in a `params` cell as the first cell** — date ranges, search terms, target names, limits. Never hardcode these values deeper in the notebook.
+- **Declare cells as `async def` when using `await`** — `@app.cell` followed by `async def cell_name(...)`. This includes cells using `asyncio.gather`, `await llm_call(...)`, or any async API.
+- **Return classes/models from cells that define them** — if a cell defines `class MyModel(BaseModel)`, return it so other cells can use it as a parameter: `return (MyModel,)`.
+- **Use `python-dotenv` to load `.env`** — add `python-dotenv` to PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell (before importing `lib.llm`). This ensures `LLM_API_KEY` and other env vars are available without requiring them to be pre-set in the shell.
 ### Don't
 - **Don't define the same variable name in two cells** — even `resp = ...` in cell A and `resp = ...` in cell B is a fatal error.
-- **Don't `import marimo as mo` in multiple cells** — this defines `mo` twice. Import it once in `setup`, then receive it via `def my_cell(mo):`.
+- **Don't `import` inside non-setup cells** — every `import X` defines a top-level variable `X`. If two cells both `import asyncio`, marimo raises `MultipleDefinitionError` and refuses to run. Put all imports in the `setup` cell and receive them as function parameters.
 - **Don't use generic top-level names** like `df`, `rows`, `resp`, `data`, `result` — either prefix with `_` or give them a unique descriptive name.
 - **Don't return temporary variables** — if `_rows` is only used to build a DataFrame, keep it `_`-prefixed and only return the DataFrame.
-- **Don't use `import X` at the top level of multiple cells** for the same module — the module variable name would be duplicated. Import once in `setup` or use `_`-prefixed local imports (`_json = __import__("json")`).
+- **Don't use `await` in a non-async cell** — this causes marimo to save the cell as `_unparsable_cell` (a string literal that won't execute). Always use `async def` for cells that call async functions.
+- **Don't define classes in a cell without returning them** — a bare `return` or no return makes classes invisible to the DAG. Other cells can't receive them as parameters.
+- **Don't put display expressions inside `if`/`else`/`for` blocks** — marimo only renders the last top-level expression. A DataFrame inside an `if` branch is silently discarded. Use the `_output = None` pattern instead (see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level)).
 ## Cell Output Previews
@@ -502,7 +729,7 @@ def build_table(meetings, pl):
     return (meeting_df,)
 ```
-**Good** — DataFrame is the last expression, so marimo renders it as an interactive table:
+**Good** — the build cell shows a `mo.md()` count, and a **separate display cell** renders the DataFrame as an interactive table:
 ```python
 @app.cell
@@ -517,6 +744,27 @@ def show_meeting_table(meeting_df):
     meeting_df  # Renders as interactive sortable table
 ```
+### Separate display cells for DataFrames
+When a cell builds a DataFrame, use **two cells**: one that builds and returns it (with a `mo.md()` summary), and a standalone display cell that renders it as a table. This keeps the build logic clean and gives the user an interactive table they can sort and filter in the marimo UI.
+```python
+# Cell 1: build and return the DataFrame, show a count
+@app.cell
+def build_sentiment_table(analyzed_meetings, pl, mo):
+    _rows = [...]
+    sentiment_df = pl.DataFrame(_rows).sort("date", descending=True)
+    mo.md(f"### Sentiment Analysis ({len(sentiment_df)} meetings)")
+    return (sentiment_df,)
+# Cell 2: standalone display — just the DataFrame, nothing else
+@app.cell
+def show_sentiment_table(sentiment_df):
+    sentiment_df
+```
+This pattern makes every result inspectable. The `mo.md()` cell gives a quick count/heading; the display cell lets the user explore the full data interactively.
 ### Utility cells (no preview needed)
 Config, setup, and helper cells that only define constants or functions don't need previews:


@@ -0,0 +1,5 @@
"""Library modules for contact analysis workbooks."""

from lib.llm import llm_call

__all__ = ["llm_call"]

workflows/lib/llm.py Normal file

@@ -0,0 +1,64 @@
"""Simple LLM helper for workbooks using Mirascope v2."""

import os
import re
from typing import TypeVar

from mirascope import llm
from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

# Configure from environment (defaults match .env.example)
_api_key = os.getenv("LLM_API_KEY", "")
_base_url = os.getenv("LLM_API_URL", "https://litellm-notrack.app.monadical.io")
_model = os.getenv("LLM_MODEL", "GLM-4.5-Air-FP8-dev")

# Register our LiteLLM endpoint as an OpenAI-compatible provider
_base = (_base_url or "").rstrip("/")
llm.register_provider(
    "openai",
    scope="litellm/",
    base_url=_base if _base.endswith("/v1") else f"{_base}/v1",
    api_key=_api_key,
)


def _sanitize_json(text: str) -> str:
    """Strip control characters (U+0000–U+001F) that break JSON parsing.

    Some LLMs emit literal newlines/tabs inside JSON string values,
    which is invalid per the JSON spec. Replace them with spaces.
    """
    return re.sub(r"[\x00-\x1f]+", " ", text)


async def llm_call(
    prompt: str,
    response_model: type[T],
    system_prompt: str = "You are a helpful assistant.",
    model: str | None = None,
) -> T:
    """Make a structured LLM call.

    Args:
        prompt: The user prompt
        response_model: Pydantic model for structured output
        system_prompt: System instructions
        model: Override the default model

    Returns:
        Parsed response matching the response_model schema
    """
    use_model = model or _model

    @llm.call(f"litellm/{use_model}", format=response_model)
    async def _call() -> str:
        return f"{system_prompt}\n\n{prompt}"

    response = await _call()
    try:
        return response.parse()
    except Exception:
        # Fallback: sanitize control characters and parse manually
        return response_model.model_validate_json(_sanitize_json(response.content))
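
Reviewer note: a minimal, standalone sketch of why the fallback in `llm_call` sanitizes before validating. It copies the regex from `_sanitize_json` into a hypothetical `sanitize_json` helper and uses the stdlib `json` module as a stand-in for Pydantic's JSON validation; the sample payload is invented for illustration and is not taken from a real LLM response.

```python
import json
import re


def sanitize_json(text: str) -> str:
    """Replace each run of control characters (U+0000 to U+001F) with one space."""
    return re.sub(r"[\x00-\x1f]+", " ", text)


# Invented payload: the \n below is a literal newline inside a JSON string value.
raw = '{"summary": "line one\nline two"}'

try:
    json.loads(raw)
    strict_parse_failed = False
except json.JSONDecodeError:
    # A strict JSON parser rejects unescaped control characters inside strings.
    strict_parse_failed = True

# After sanitizing, the same payload parses; the newline has become a space.
parsed = json.loads(sanitize_json(raw))
```

The replacement is lossy: line breaks inside string values become spaces. Properly escaped sequences (a backslash followed by `n`) are left alone, since they contain no raw control characters.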