Marimo Notebook Patterns
This guide covers how to create marimo notebooks for data analysis against the InternalAI platform APIs. Marimo notebooks are plain .py files with reactive cells — no .ipynb format, no Jupyter dependency.
Marimo Basics
A marimo notebook is a Python file with @app.cell decorated functions. Each cell returns values as a tuple, and other cells receive them as function parameters — marimo builds a reactive DAG automatically.
import marimo
app = marimo.App()
@app.cell
def cell_one():
x = 42
return (x,)
@app.cell
def cell_two(x):
# Re-runs automatically when x changes
result = x * 2
return (result,)
Key rules:
- Cells declare dependencies via function parameters
- Cells return values as tuples: `return (var1, var2,)`
- The last expression at the top level of a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees). Expressions inside `if`/`else`/`for` blocks do not count — see Cell Output Must Be at the Top Level below
- Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below)
- No manual execution order; the DAG determines it
- Variable names must be unique across cells. Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them private to that cell — marimo ignores `_`-prefixed names.
- All imports must go in the `setup` cell. Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place all imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`.
Cell Variable Scoping — Example
This is the most common mistake. Any variable assigned at the top level of a cell (not inside a def or comprehension) is tracked by marimo. If two cells assign the same name, the notebook refuses to run.
BROKEN — resp is defined at top level in both cells:
# Cell A
@app.cell
def search_meetings(client, DATAINDEX):
resp = client.post(f"{DATAINDEX}/search", json={...}) # defines 'resp'
resp.raise_for_status()
results = resp.json()["results"]
return (results,)
# Cell B
@app.cell
def fetch_details(client, DATAINDEX, results):
resp = client.get(f"{DATAINDEX}/entities/{results[0]}") # also defines 'resp' → ERROR
meeting = resp.json()
return (meeting,)
Error:
MultipleDefinitionError: variable 'resp' is defined in multiple cells
FIXED — prefix cell-local variables with _:
# Cell A
@app.cell
def search_meetings(client, DATAINDEX):
_resp = client.post(f"{DATAINDEX}/search", json={...}) # _resp is cell-private
_resp.raise_for_status()
results = _resp.json()["results"]
return (results,)
# Cell B
@app.cell
def fetch_details(client, DATAINDEX, results):
_resp = client.get(f"{DATAINDEX}/entities/{results[0]}") # _resp is cell-private, no conflict
meeting = _resp.json()
return (meeting,)
Rule of thumb: if a variable is only used within the cell to compute a return value, prefix it with _. Only leave names unprefixed if another cell needs to receive them.
Note: Variables inside nested `def` functions are naturally local and don't need `_` prefixes — e.g., `resp` inside a `def fetch_all(...)` helper is fine because it's scoped to the function, not the cell.
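The nested-def rule can be checked outside marimo with a plain Python sketch. The stub client below is hypothetical (standing in for `httpx.Client`) just to make the example runnable:

```python
# Hypothetical stub standing in for an httpx.Client, to make this runnable.
class StubClient:
    def get(self, url, params=None):
        class Resp:
            def raise_for_status(self):
                pass

            def json(self):
                return {"items": ["a", "b"]}

        return Resp()


def fetch_items(client):
    # 'resp' is local to this function, not to any cell, so marimo's
    # DAG never tracks it — no underscore prefix needed here.
    resp = client.get("/query")
    resp.raise_for_status()
    return resp.json()["items"]


print(fetch_items(StubClient()))  # → ['a', 'b']
```

Because `resp` only exists while `fetch_items` runs, two different cells can each contain a helper that uses the name `resp` without any `MultipleDefinitionError`.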
Cell Output Must Be at the Top Level
Marimo only renders the last expression at the top level of a cell as rich output. An expression buried inside an if/else, for, try, or any other block is not displayed — it's silently discarded.
BROKEN — _df inside the if branch is never rendered:
@app.cell
def show_results(results, mo, pl):
if results:
_df = pl.DataFrame(results)
mo.md(f"**Found {len(results)} results**")
_df # Inside an if block — marimo does NOT display this
else:
mo.md("**No results found**")
return
FIXED — assign inside the branches, display at the top level:
@app.cell
def show_results(results, mo, pl):
    _output = None
    if results:
        _output = pl.DataFrame(results)
    else:
        _output = mo.md("**No results found**")
    _output  # Top-level last expression — marimo renders this
    return
Rule of thumb: initialize a _output = None variable before any conditional, assign the displayable value inside the branches, then put _output as the last top-level expression. Anything you want shown — a DataFrame or an mo.md() message — must flow through that top-level expression; an mo.md() call buried inside a branch is discarded just like any other non-top-level expression.
Async Cells
When a cell uses await (e.g., for llm_call or asyncio.gather), you must declare it as async def:
@app.cell
async def analyze(meetings, llm_call, ResponseModel, asyncio):
async def _score(meeting):
return await llm_call(prompt=..., response_model=ResponseModel)
results = await asyncio.gather(*[_score(_m) for _m in meetings])
return (results,)
Note that asyncio is imported in the setup cell and received here as a parameter — never import asyncio inside individual cells.
If you write await in a non-async cell, marimo cannot parse the cell and saves it as an _unparsable_cell string literal — the cell won't run, and you'll see SyntaxError: 'return' outside function or similar errors. See Fixing _unparsable_cell below.
Cells That Define Classes Must Return Them
If a cell defines Pydantic models (or any class) that other cells need, it must return them:
# BaseModel and Field are imported in the setup cell and received as parameters
@app.cell
def models(BaseModel, Field):
class MeetingSentiment(BaseModel):
overall_sentiment: str
sentiment_score: int = Field(description="Score from -10 to +10")
class FrustrationExtraction(BaseModel):
has_frustrations: bool
frustrations: list[dict]
return MeetingSentiment, FrustrationExtraction # Other cells receive these as parameters
A bare return (or no return) means those classes are invisible to the rest of the notebook.
Fixing _unparsable_cell
When marimo can't parse a cell into a proper @app.cell function, it saves the raw code as app._unparsable_cell("...", name="cell_name"). These cells won't run and show errors like SyntaxError: 'return' outside function.
Common causes:
- Using `await` without making the cell `async def`
- Using `return` in code that marimo failed to wrap into a function (usually a side effect of cause 1)
How to fix: Convert the _unparsable_cell string back into a proper @app.cell decorated function:
# BROKEN — saved as _unparsable_cell because of top-level await
app._unparsable_cell("""
results = await asyncio.gather(...)
return results
""", name="my_cell")
# FIXED — proper async cell function (asyncio imported in setup, received as parameter)
@app.cell
async def my_cell(some_dependency, asyncio):
results = await asyncio.gather(...)
return (results,)
Key differences to note when converting:
- Wrap the code in an `async def` function (if it uses `await`)
- Add cell dependencies as function parameters (including imports like `asyncio`)
- Return values as tuples: `return (var,)` not `return var`
- Prefix cell-local variables with `_`
- Never add `import` statements inside the cell — all imports belong in `setup`
Inline Dependencies with PEP 723
Use PEP 723 /// script metadata so uv run auto-installs dependencies:
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "marimo",
# "httpx",
# "polars",
# "mirascope[openai]",
# "pydantic",
# "python-dotenv",
# ]
# ///
Checking Notebooks Before Running
Always run marimo check before opening or running a notebook. It catches common issues — duplicate variable definitions, _unparsable_cell blocks, branch expressions that won't display, and more — without needing to start the full editor:
uvx marimo check notebook.py # Check a single notebook
uvx marimo check workflows/ # Check all notebooks in a directory
uvx marimo check --fix notebook.py # Auto-fix fixable issues
Run this after every edit. A clean marimo check (no output, exit code 0) means the notebook is structurally valid. Any errors must be fixed before running.
Running Notebooks
uvx marimo edit notebook.py # Interactive editor (best for development)
uvx marimo run notebook.py # Read-only web app
uv run notebook.py # Script mode (terminal output)
Inspecting Cell Outputs
In marimo edit, every cell's return value is displayed as rich output below the cell. This is the primary way to introspect API responses:
- Dicts/lists render as collapsible JSON trees — click to expand nested fields
- Polars/Pandas DataFrames render as interactive sortable tables
- Strings render as plain text
To inspect a raw API response, just make it the last expression:
@app.cell
def inspect_response(client, DATAINDEX):
_resp = client.get(f"{DATAINDEX}/query", params={
"entity_types": "meeting", "limit": 2,
})
_resp.json() # This gets displayed as a collapsible JSON tree
To inspect an intermediate value alongside other work, use mo.accordion or return it:
@app.cell
def debug_meetings(meetings, mo):
mo.md(f"**Count:** {len(meetings)}")
# Show first item structure for inspection
mo.accordion({"First meeting raw": mo.json(meetings[0])}) if meetings else None
Notebook Skeleton
Every notebook against InternalAI follows this structure:
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "marimo",
# "httpx",
# "polars",
# "mirascope[openai]",
# "pydantic",
# "python-dotenv",
# ]
# ///
import marimo
app = marimo.App()
@app.cell
def params():
"""User parameters — edit these to change the workflow's behavior."""
SEARCH_TERMS = ["greyhaven"]
DATE_FROM = "2026-01-01T00:00:00Z"
DATE_TO = "2026-02-01T00:00:00Z"
TARGET_PERSON = None # Set to a name like "Alice" to filter by person, or None for all
return DATE_FROM, DATE_TO, SEARCH_TERMS, TARGET_PERSON
@app.cell
def config():
BASE = "http://localhost:42000"
CONTACTDB = f"{BASE}/contactdb-api"
DATAINDEX = f"{BASE}/dataindex/api/v1"
return (CONTACTDB, DATAINDEX,)
@app.cell
def setup():
from dotenv import load_dotenv
load_dotenv(".env") # Load .env from the project root
import asyncio # All imports go here — never import inside other cells
import httpx
import marimo as mo
import polars as pl
from pydantic import BaseModel, Field
client = httpx.Client(timeout=30)
return (asyncio, client, mo, pl, BaseModel, Field,)
# --- your IN / ETL / OUT cells here ---
if __name__ == "__main__":
app.run()
`load_dotenv(".env")` reads the `.env` file explicitly by name. This makes `LLM_API_KEY` and other env vars available to `os.getenv()` calls in `lib/llm.py` without requiring the shell to have them pre-set. Always include `python-dotenv` in PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell.
The params cell must always be the first cell after app = marimo.App(). It contains all user-configurable constants (search terms, date ranges, target names, etc.) as plain Python values. This way the user can tweak the workflow by editing a single cell at the top — no need to hunt through the code for hardcoded values.
Pagination Helper
The DataIndex GET /query endpoint paginates with limit and offset. Always paginate — result sets can be large.
@app.cell
def helpers(client):
def fetch_all(url, params):
"""Fetch all pages from a paginated DataIndex endpoint."""
all_items = []
limit = params.get("limit", 50)
params = {**params, "limit": limit, "offset": 0}
while True:
resp = client.get(url, params=params)
resp.raise_for_status()
data = resp.json()
all_items.extend(data["items"])
if params["offset"] + limit >= data["total"]:
break
params["offset"] += limit
return all_items
    def resolve_contact(name, contactdb_url):
        """Find a contact by name, return the first matching contact record."""
resp = client.get(f"{contactdb_url}/api/contacts", params={"search": name})
resp.raise_for_status()
contacts = resp.json()["contacts"]
if not contacts:
raise ValueError(f"No contact found for '{name}'")
return contacts[0]
return (fetch_all, resolve_contact,)
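The offset arithmetic in the pagination loop is easy to get wrong, so here is a standalone variant of fetch_all (client passed explicitly rather than captured from the cell, served by a hypothetical stub) that can be run outside marimo as a sanity check:

```python
# Hypothetical stand-in serving {"items": [...], "total": N} pages,
# mimicking the DataIndex GET /query response shape.
class PagedStub:
    def __init__(self, total=6):
        self._all = list(range(total))

    def get(self, url, params=None):
        page = self._all[params["offset"]:params["offset"] + params["limit"]]
        payload = {"items": page, "total": len(self._all)}

        class Resp:
            def raise_for_status(self):
                pass

            def json(self, _p=payload):
                return _p

        return Resp()


def fetch_all(client, url, params):
    """Fetch all pages from a paginated endpoint (standalone variant)."""
    all_items = []
    limit = params.get("limit", 50)
    params = {**params, "limit": limit, "offset": 0}
    while True:
        resp = client.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()
        all_items.extend(data["items"])
        # Stop once the next offset would be past the reported total
        if params["offset"] + limit >= data["total"]:
            break
        params["offset"] += limit
    return all_items


print(fetch_all(PagedStub(), "/query", {"limit": 2}))  # → [0, 1, 2, 3, 4, 5]
```

Three pages of two items each come back in order, and the loop terminates exactly when `offset + limit` reaches `total`.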
Pattern 1: Emails Involving a Specific Person
Emails have from_contact_id, to_contact_ids, and cc_contact_ids. The query API's contact_ids filter matches entities where the contact appears in any of these roles.
@app.cell
def find_person(resolve_contact, CONTACTDB):
target = resolve_contact("Alice", CONTACTDB)
target_id = target["id"]
target_name = target["name"]
return (target_id, target_name,)
@app.cell
def fetch_emails(fetch_all, DATAINDEX, target_id):
emails = fetch_all(f"{DATAINDEX}/query", {
"entity_types": "email",
"contact_ids": str(target_id),
"date_from": "2025-01-01T00:00:00Z",
"sort_order": "desc",
})
return (emails,)
@app.cell
def email_table(emails, target_id, target_name, pl):
email_df = pl.DataFrame([{
"date": e["timestamp"][:10],
"subject": e.get("title", "(no subject)"),
"direction": (
"sent" if str(target_id) == str(e.get("from_contact_id"))
else "received"
),
"snippet": (e.get("snippet") or e.get("text_content") or "")[:100],
} for e in emails])
return (email_df,)
@app.cell
def show_emails(email_df, target_name, mo):
mo.md(f"## Emails involving {target_name} ({len(email_df)} total)")
@app.cell
def display_email_table(email_df):
email_df # Renders as interactive table in marimo edit
Pattern 2: Meetings with a Specific Participant
Meetings have a participants list where each entry may or may not have a resolved contact_id. The query API's contact_ids filter only matches resolved participants.
Strategy: Query by contact_ids to get meetings with resolved participants, then optionally do a client-side check on participants[].display_name or transcript for unresolved ones.
@app.cell
def fetch_meetings(fetch_all, DATAINDEX, target_id, my_id):
# Get meetings where the target appears in contact_ids
resolved_meetings = fetch_all(f"{DATAINDEX}/query", {
"entity_types": "meeting",
"contact_ids": str(target_id),
"date_from": "2025-01-01T00:00:00Z",
})
return (resolved_meetings,)
@app.cell
def meeting_table(resolved_meetings, target_name, pl):
_rows = []
for _m in resolved_meetings:
_participants = _m.get("participants", [])
_names = [_p["display_name"] for _p in _participants]
_rows.append({
"date": (_m.get("start_time") or _m["timestamp"])[:10],
"title": _m.get("title", _m.get("room_name", "Untitled")),
"participants": ", ".join(_names),
"has_transcript": _m.get("transcript") is not None,
"has_summary": _m.get("summary") is not None,
})
meeting_df = pl.DataFrame(_rows)
return (meeting_df,)
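The client-side check on participants[].display_name mentioned in the strategy above can be sketched roughly like this (field names follow the participant shape described in this section — verify against the actual API response):

```python
def match_unresolved(meetings, target_name):
    """Return meetings where target_name appears as a participant
    display name without a resolved contact_id."""
    needle = target_name.lower()
    hits = []
    for m in meetings:
        for p in m.get("participants", []):
            resolved = p.get("contact_id") is not None
            name = (p.get("display_name") or "").lower()
            if not resolved and needle in name:
                hits.append(m)
                break  # one matching participant is enough
    return hits


meetings = [
    {"title": "Sync", "participants": [{"display_name": "Alice Smith", "contact_id": None}]},
    {"title": "1:1", "participants": [{"display_name": "Bob", "contact_id": 7}]},
]
print([m["title"] for m in match_unresolved(meetings, "alice")])  # → ['Sync']
```

Substring matching on lowercase names is deliberately loose — it catches "Alice Smith" when searching for "alice", but will also catch any other participant whose name contains the needle.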
To also find meetings where the person was present but not resolved (guest), search the transcript:
@app.cell
def search_unresolved(client, DATAINDEX, target_name):
# Semantic search for the person's name in meeting transcripts
_resp = client.post(f"{DATAINDEX}/search", json={
"search_text": target_name,
"entity_types": ["meeting"],
"limit": 50,
})
_resp.raise_for_status()
transcript_hits = _resp.json()["results"]
return (transcript_hits,)
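If you want to combine the contact_ids results with the transcript hits into one list, deduplicate by entity id — assuming each result carries a stable "id" field (an assumption; check the actual response shape):

```python
def merge_unique(*result_sets, key="id"):
    """Combine result lists, keeping the first occurrence of each id."""
    seen = set()
    merged = []
    for results in result_sets:
        for item in results:
            k = item.get(key)
            if k in seen:
                continue  # already collected from an earlier result set
            seen.add(k)
            merged.append(item)
    return merged


resolved = [{"id": 1, "title": "Kickoff"}, {"id": 2, "title": "Retro"}]
hits = [{"id": 2, "title": "Retro"}, {"id": 3, "title": "Guest call"}]
print([m["id"] for m in merge_unique(resolved, hits)])  # → [1, 2, 3]
```

Order of arguments matters: items from earlier result sets win, so pass the richer (fully hydrated) list first.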
Pattern 3: Calendar Events → Meeting Correlation
Calendar events and meetings are separate entities from different connectors. To find which calendar events had a corresponding recorded meeting, match by time overlap.
@app.cell
def fetch_calendar_and_meetings(fetch_all, DATAINDEX, my_id):
events = fetch_all(f"{DATAINDEX}/query", {
"entity_types": "calendar_event",
"contact_ids": str(my_id),
"date_from": "2025-01-01T00:00:00Z",
"sort_by": "timestamp",
"sort_order": "asc",
})
meetings = fetch_all(f"{DATAINDEX}/query", {
"entity_types": "meeting",
"contact_ids": str(my_id),
"date_from": "2025-01-01T00:00:00Z",
})
return (events, meetings,)
@app.cell
def correlate(events, meetings, pl):
    def _parse_dt(s):
        # Function-scoped import: names created inside a def are local,
        # so this can't collide with imports in other cells
        from datetime import datetime
        if not s:
            return None
        return datetime.fromisoformat(s.replace("Z", "+00:00"))
# Index meetings by start_time for matching
_meeting_by_time = {}
for _m in meetings:
_start = _parse_dt(_m.get("start_time"))
if _start:
_meeting_by_time[_start] = _m
_rows = []
for _ev in events:
_ev_start = _parse_dt(_ev.get("start_time"))
_ev_end = _parse_dt(_ev.get("end_time"))
if not _ev_start:
continue
# Find meeting within 15-min window of calendar event start
_matched = None
for _m_start, _m in _meeting_by_time.items():
if abs((_m_start - _ev_start).total_seconds()) < 900:
_matched = _m
break
_rows.append({
"date": _ev_start.strftime("%Y-%m-%d"),
"time": _ev_start.strftime("%H:%M"),
"event_title": _ev.get("title", "(untitled)"),
"has_recording": _matched is not None,
"meeting_title": _matched.get("title", "") if _matched else "",
"attendee_count": len(_ev.get("attendees", [])),
})
calendar_df = pl.DataFrame(_rows)
return (calendar_df,)
Pattern 4: Full Interaction Timeline for a Person
Combine emails, meetings, and Zulip messages into a single chronological view.
@app.cell
def fetch_all_interactions(fetch_all, DATAINDEX, target_id):
all_entities = fetch_all(f"{DATAINDEX}/query", {
"contact_ids": str(target_id),
"date_from": "2025-01-01T00:00:00Z",
"sort_by": "timestamp",
"sort_order": "desc",
})
return (all_entities,)
@app.cell
def interaction_timeline(all_entities, target_name, pl):
_rows = []
for _e in all_entities:
_etype = _e["entity_type"]
_summary = ""
if _etype == "email":
_summary = _e.get("snippet") or _e.get("title") or ""
elif _etype == "meeting":
_summary = _e.get("summary") or _e.get("title") or ""
elif _etype == "conversation_message":
_summary = (_e.get("message") or "")[:120]
elif _etype == "threaded_conversation":
_summary = _e.get("title") or ""
elif _etype == "calendar_event":
_summary = _e.get("title") or ""
else:
_summary = _e.get("title") or _e["entity_type"]
_rows.append({
"date": _e["timestamp"][:10],
"type": _etype,
"source": _e["connector_id"],
"summary": _summary[:120],
})
timeline_df = pl.DataFrame(_rows)
return (timeline_df,)
@app.cell
def show_timeline(timeline_df, target_name, mo):
mo.md(f"## Interaction Timeline: {target_name} ({len(timeline_df)} events)")
@app.cell
def display_timeline(timeline_df):
timeline_df
Pattern 5: LLM Filtering with lib.llm
When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the llm_call helper from workflows/lib. It sends each item to an LLM and parses the response into a typed Pydantic model.
Prerequisites: Copy .env.example to .env and fill in your LLM_API_KEY. Add mirascope, pydantic, and python-dotenv to the notebook's PEP 723 dependencies.
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "marimo",
# "httpx",
# "polars",
# "mirascope[openai]",
# "pydantic",
# "python-dotenv",
# ]
# ///
Setup cell — load .env and import llm_call
@app.cell
def setup():
from dotenv import load_dotenv
load_dotenv(".env") # Makes LLM_API_KEY available to lib/llm.py
import asyncio
import httpx
import marimo as mo
import polars as pl
from pydantic import BaseModel, Field
from lib.llm import llm_call
client = httpx.Client(timeout=30)
return (asyncio, client, llm_call, mo, pl, BaseModel, Field,)
Define a response model
Create a Pydantic model that describes the structured output you want from the LLM:
@app.cell
def models(BaseModel, Field):
class RelevanceScore(BaseModel):
relevant: bool
reason: str
score: int # 0-10
return (RelevanceScore,)
Filter entities through the LLM
Iterate over fetched entities and call llm_call for each one. Since llm_call is async, use asyncio.gather to process items concurrently:
@app.cell
async def llm_filter(meetings, llm_call, RelevanceScore, pl, mo, asyncio):
_topic = "Greyhaven"
async def _score(meeting):
_text = meeting.get("summary") or meeting.get("title") or ""
_result = await llm_call(
prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}",
response_model=RelevanceScore,
system_prompt="Score the relevance of this meeting to the given topic. Set relevant=true if score >= 5.",
)
return {**meeting, "llm_relevant": _result.relevant, "llm_reason": _result.reason, "llm_score": _result.score}
scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings])
relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]]
mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'")
return (relevant_meetings,)
Tips for LLM filtering
- Keep prompts short — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity.
- Use structured output — always pass a `response_model` so you get typed fields back, not free-text.
- Batch wisely — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits.
- Cache results — LLM calls are slow and cost money. If iterating on a notebook, consider storing scored results in a cell variable so you don't re-score on every edit.
Do / Don't — Quick Reference for LLM Agents
When generating marimo notebooks, follow these rules strictly. Violations cause MultipleDefinitionError at runtime.
Do
- Prefix cell-local variables with `_` — `_resp`, `_rows`, `_m`, `_data`, `_chunk`. Marimo ignores `_`-prefixed names so they won't clash across cells.
- Put all imports in the `setup` cell and pass them as cell parameters: `def my_cell(client, mo, pl, asyncio):`. Never `import` inside other cells — even `import asyncio` in two async cells causes `MultipleDefinitionError`.
- Give returned DataFrames unique names — `email_df`, `meeting_df`, `timeline_df`. Never use a bare `df` that might collide with another cell.
- Return only values other cells need — everything else should be `_`-prefixed and stays private to the cell.
- Import stdlib modules in `setup` too — even `from datetime import datetime` creates a top-level name. If two cells both import `datetime`, marimo errors. Import it once in `setup` and receive it as a parameter, or use it inside a `_`-prefixed helper function where it's naturally scoped.
- Every non-utility cell must show a preview — see the "Cell Output Previews" section below.
- Use separate display cells for DataFrames — the build cell returns the DataFrame and shows a `mo.md()` count/heading; a standalone display cell (e.g., `def show_table(df): df`) renders it as an interactive table the user can sort and filter.
- Keep cell output expressions at the top level — if a cell conditionally displays a DataFrame, initialize `_output = None` before the `if`/`else`, assign inside the branches, then put `_output` as the last top-level expression. Expressions inside `if`/`else`/`for` blocks are silently ignored by marimo.
- Put all user parameters in a `params` cell as the first cell — date ranges, search terms, target names, limits. Never hardcode these values deeper in the notebook.
- Declare cells as `async def` when using `await` — `@app.cell` followed by `async def cell_name(...)`. This includes cells using `asyncio.gather`, `await llm_call(...)`, or any async API.
- Return classes/models from cells that define them — if a cell defines `class MyModel(BaseModel)`, return it so other cells can use it as a parameter: `return (MyModel,)`.
- Use `python-dotenv` to load `.env` — add `python-dotenv` to PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell (before importing `lib.llm`). This ensures `LLM_API_KEY` and other env vars are available without requiring them to be pre-set in the shell.
Don't
- Don't define the same variable name in two cells — even `resp = ...` in cell A and `resp = ...` in cell B is a fatal error.
- Don't `import` inside non-setup cells — every `import X` defines a top-level variable `X`. If two cells both `import asyncio`, marimo raises `MultipleDefinitionError` and refuses to run. Put all imports in the `setup` cell and receive them as function parameters.
- Don't use generic top-level names like `df`, `rows`, `resp`, `data`, `result` — either prefix with `_` or give them a unique descriptive name.
- Don't return temporary variables — if `_rows` is only used to build a DataFrame, keep it `_`-prefixed and only return the DataFrame.
- Don't use `await` in a non-async cell — this causes marimo to save the cell as `_unparsable_cell` (a string literal that won't execute). Always use `async def` for cells that call async functions.
- Don't define classes in a cell without returning them — a bare `return` or no return makes classes invisible to the DAG. Other cells can't receive them as parameters.
- Don't put display expressions inside `if`/`else`/`for` blocks — marimo only renders the last top-level expression. A DataFrame inside an `if` branch is silently discarded. Use the `_output = None` pattern instead (see Cell Output Must Be at the Top Level).
Cell Output Previews
Every cell that fetches, transforms, or produces data must display a preview so the user can validate results at each step. The only exceptions are utility cells (config, setup, helpers) that only define constants or functions.
Think from the user's perspective: when they open the notebook in marimo edit, each cell should tell them something useful — a count, a sample, a summary. Silent cells that do work but show nothing are hard to debug and validate.
What to show
| Cell type | What to preview |
|---|---|
| API fetch (list of items) | mo.md(f"**Fetched {len(items)} meetings**") |
| DataFrame build | The DataFrame itself as last expression (renders as interactive table) |
| Scalar result | mo.md(f"**Contact:** {name} (id={contact_id})") |
| Search / filter | mo.md(f"**{len(hits)} results** matching '{term}'") |
| Final output | Full DataFrame or mo.md() summary as last expression |
Example: fetch cell with preview
Bad — cell runs silently, user sees nothing:
@app.cell
def fetch_meetings(fetch_all, DATAINDEX, my_id):
meetings = fetch_all(f"{DATAINDEX}/query", {
"entity_types": "meeting",
"contact_ids": str(my_id),
})
return (meetings,)
Good — cell shows a count so the user knows it worked:
@app.cell
def fetch_meetings(fetch_all, DATAINDEX, my_id, mo):
meetings = fetch_all(f"{DATAINDEX}/query", {
"entity_types": "meeting",
"contact_ids": str(my_id),
})
mo.md(f"**Fetched {len(meetings)} meetings**")
return (meetings,)
Example: transform cell with table preview
Bad — builds DataFrame but doesn't display it:
@app.cell
def build_table(meetings, pl):
_rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings]
meeting_df = pl.DataFrame(_rows)
return (meeting_df,)
Good — the build cell shows a mo.md() count, and a separate display cell renders the DataFrame as an interactive table:
@app.cell
def build_table(meetings, pl, mo):
_rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings]
meeting_df = pl.DataFrame(_rows).sort("date")
mo.md(f"### Meetings ({len(meeting_df)} results)")
return (meeting_df,)
@app.cell
def show_meeting_table(meeting_df):
meeting_df # Renders as interactive sortable table
Separate display cells for DataFrames
When a cell builds a DataFrame, use two cells: one that builds and returns it (with a mo.md() summary), and a standalone display cell that renders it as a table. This keeps the build logic clean and gives the user an interactive table they can sort and filter in the marimo UI.
# Cell 1: build and return the DataFrame, show a count
@app.cell
def build_sentiment_table(analyzed_meetings, pl, mo):
_rows = [...]
sentiment_df = pl.DataFrame(_rows).sort("date", descending=True)
mo.md(f"### Sentiment Analysis ({len(sentiment_df)} meetings)")
return (sentiment_df,)
# Cell 2: standalone display — just the DataFrame, nothing else
@app.cell
def show_sentiment_table(sentiment_df):
sentiment_df
This pattern makes every result inspectable. The mo.md() cell gives a quick count/heading; the display cell lets the user explore the full data interactively.
Utility cells (no preview needed)
Config, setup, and helper cells that only define constants or functions don't need previews:
@app.cell
def config():
BASE = "http://localhost:42000"
CONTACTDB = f"{BASE}/contactdb-api"
DATAINDEX = f"{BASE}/dataindex/api/v1"
return CONTACTDB, DATAINDEX
@app.cell
def helpers(client):
def fetch_all(url, params):
...
return (fetch_all,)
Tips
- Use `marimo edit` during development to see cell outputs interactively
- Make raw API responses the last expression in a cell to inspect their structure
- Use `polars` over `pandas` for better performance and type safety
- Set `timeout=30` on httpx clients — some queries over large date ranges are slow
- Name cells descriptively — function names appear in the marimo sidebar