---
name: workflow
description: Create a marimo notebook for data analysis. Use when the request involves analysis over time periods, large data volumes, or when the user asks to "create a workflow".
disable-model-invocation: true
argument-hint: topic
---

# Workflow — Create a Marimo Notebook

## When to create a marimo notebook

Any request that involves analysis over a period of time (e.g., "meetings this month", "emails since January", "interaction trends") is likely to return a large volume of data — too much to process inline. In these cases, always produce a marimo notebook (a `.py` file following the patterns in the notebook-patterns skill).

Also create a notebook when the user asks to "create a workflow", "write a workflow", or "build an analysis".

If you're unsure whether a question is simple enough to answer directly or needs a notebook, ask the user.

## Always create a new workflow

When the user requests a workflow, always create a new notebook file. Do not modify or re-run an existing workflow unless the user explicitly asks you to (e.g., "update workflow 001", "fix the sentiment notebook", "re-run the existing analysis"). Each new request gets its own sequentially numbered file — even if it covers a similar topic to an earlier workflow.

## File naming and location

All notebooks go in the `workflows/` directory. Use a sequential number prefix so workflows stay ordered by creation:

```
workflows/<NNN>_<topic>_<scope>.py
```

- `<NNN>` — zero-padded sequence number (`001`, `002`, …). Look at existing files in `workflows/` to determine the next number.
- `<topic>` — what is being analyzed, in snake_case (e.g., `greyhaven_meetings`, `alice_emails`, `hiring_discussions`)
- `<scope>` — time range or qualifier (e.g., `january`, `q1_2026`, `last_30d`, `all_time`)

Examples:

```
workflows/001_greyhaven_meetings_january.py
workflows/002_alice_emails_q1_2026.py
workflows/003_hiring_discussions_last_30d.py
workflows/004_team_interaction_timeline_all_time.py
```

Before creating a new workflow, list existing files in workflows/ to find the highest number and increment it.
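
The numbering logic above can be sketched as a small helper (the function name and signature are illustrative, not part of this skill):

```python
import re


def next_workflow_name(existing: list[str], topic: str, scope: str) -> str:
    """Given the filenames already in workflows/, build the next
    zero-padded workflow filename following <NNN>_<topic>_<scope>.py."""
    numbers = [
        int(m.group(1))
        for name in existing
        if (m := re.match(r"(\d{3})_", name))
    ]
    # Highest existing number plus one; start at 001 for an empty directory.
    return f"{max(numbers, default=0) + 1:03d}_{topic}_{scope}.py"
```

For example, with `001_...` and `002_...` already present, a request about Alice's Q1 emails would yield `003_alice_emails_q1_2026.py`.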

## Plan before you implement

Before writing any notebook, always propose a plan first and get the user's approval. The plan should describe:

1. Goal — What question are we answering?
2. Data sources — Which entity types and API endpoints will be used?
3. Algorithm / ETL steps — Step-by-step description of the data pipeline: what gets fetched, how it's filtered, joined, or aggregated, and what the final output looks like.
4. Output format — Table columns, charts, or summary statistics the user will see.

Only proceed to implementation after the user confirms the plan.
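
For instance, a plan for "find all emails involving Alice since January" might look like this (illustrative only; adapt the sections to the actual request):

```
Plan: Alice's emails since January
1. Goal — How many emails involved Alice since January, and what were they about?
2. Data sources — ContactDB (resolve "Alice" to a contact_id); DataIndex email
   entities via GET /query.
3. Algorithm / ETL steps — Resolve Alice's contact_id; page through GET /query
   with entity_types=email and a date_from of January 1; group results by month
   and sender.
4. Output format — A table of emails (date, sender, subject) plus a per-month
   count summary.
```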

## Validate before delivering

After writing or editing a notebook, always run `uvx marimo check` to verify it has no structural errors (duplicate variable definitions, undefined names, dependency cycles, etc.):

```
uvx marimo check workflows/NNN_topic_scope.py
```

A clean check (no output, exit code 0) means the notebook is valid. Fix any errors before delivering the notebook to the user.

## Steps

1. Identify people — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /api/contacts/me`.
2. Find data — Use DataIndex `GET /query` (exhaustive, paginated) or `POST /search` (semantic, ranked) with `contact_ids`, `entity_types`, `date_from`/`date_to`, and `connector_ids` filters.
3. Analyze — For simple answers, process the API response directly. For complex multi-step analysis, build a marimo notebook (see the notebook-patterns skill for detailed patterns).

## Quick Example (Python)

"Find all emails involving Alice since January"

```python
import httpx

CONTACTDB = "http://localhost:42000/contactdb-api"
DATAINDEX = "http://localhost:42000/dataindex/api/v1"
client = httpx.Client(timeout=30)

# 1. Resolve "Alice" to a contact_id
resp = client.get(f"{CONTACTDB}/api/contacts", params={"search": "Alice"})
alice_id = resp.json()["contacts"][0]["id"]  # e.g. 42

# 2. Fetch all emails involving Alice (with pagination)
emails = []
offset = 0
while True:
    resp = client.get(f"{DATAINDEX}/query", params={
        "entity_types": "email",
        "contact_ids": str(alice_id),
        "date_from": "2025-01-01T00:00:00Z",
        "limit": 50,
        "offset": offset,
    })
    data = resp.json()
    emails.extend(data["items"])
    if offset + 50 >= data["total"]:
        break
    offset += 50

print(f"Found {len(emails)} emails involving Alice")
```