feat: migrate to skills-based approach

2026-02-19 11:36:32 -06:00
parent 42a9b5289f
commit eefac81e57
15 changed files with 2565 additions and 224 deletions

AGENTS.md

@@ -2,16 +2,19 @@
The InternalAI platform aggregates company data from email, calendars, Zulip chat, meetings, and documents into two core APIs. These docs give LLM agents the context they need to build programmatic workflows — typically as marimo notebooks — that answer analytical questions about people and their interactions.
As an agent, assume you're running within our greywall sandbox.
## Routing Table
| I need to... | Read |
|---------------------------------------------|-------------------------------|
| Know who the user is and what they care about | [MYSELF.md] |
| Understand the company and its tools | [company-context.md] |
| Look up people, contacts, relationships | [contactdb-api.md] |
| Query emails, meetings, chats, documents | [dataindex-api.md] |
| Know which connector provides what data | [connectors-and-sources.md] |
| Create a marimo analysis notebook | [notebook-patterns.md] |
| I need to... | Read |
|---------------------------------------------|---------------------------------------------------|
| Know who the user is and what they care about | [MYSELF.md] |
| Understand the company and its tools | [company skill] |
| Look up people, contacts, relationships | [contactdb skill] |
| Query emails, meetings, chats, documents | [dataindex skill] |
| Know which connector provides what data | [connectors skill] |
| Create a marimo analysis notebook | [workflow skill] + [notebook-patterns skill] |
| Build a weekly checkout | [checkout skill] |
## About the User
@@ -28,8 +31,8 @@ If `MYSELF.md` does not exist, ask the user to copy `MYSELF.example.md` to `MYSE
| Service | Swagger UI | OpenAPI JSON |
|------------|---------------------------------------------------|----------------------------------------|
| ContactDB | `http://localhost:42000/contactdb-api/docs` | `/contactdb-api/openapi.json` |
| DataIndex | `http://localhost:42000/dataindex/docs` | `/dataindex/openapi.json` |
| ContactDB | `http://localhost:42000/contactdb-api/docs` (direct), or `http://caddy/contactdb-api/docs` (via greywall sandbox) | `/contactdb-api/openapi.json` |
| DataIndex | `http://localhost:42000/dataindex/docs` (direct), or `http://caddy/dataindex/docs` (via greywall sandbox) | `/dataindex/openapi.json` |
## Common Questions → API Calls
@@ -51,117 +54,22 @@ Use this table to translate natural language questions into API calls. The base
**Key pattern:** Any question about "me" / "my" / "I" requires calling `GET /contactdb-api/api/contacts/me` first to get your `contact_id`, then using that ID in subsequent DataIndex queries.
## Workflow
### When to create a marimo notebook
Any request that involves **analysis over a period of time** (e.g., "meetings this month", "emails since January", "interaction trends") is likely to return a **large volume of data** — too much to process inline. In these cases, **always produce a marimo notebook** (a `.py` file following the patterns in [notebook-patterns.md]).
Also create a notebook when the user asks to "create a workflow", "write a workflow", or "build an analysis".
If you're unsure whether a question is simple enough to answer directly or needs a notebook, **ask the user**.
### Always create a new workflow
When the user requests a workflow, **always create a new notebook file**. Do **not** modify or re-run an existing workflow unless the user explicitly asks you to (e.g., "update workflow 001", "fix the sentiment notebook", "re-run the existing analysis"). Each new request gets its own sequentially numbered file — even if it covers a similar topic to an earlier workflow.
### File naming and location
All notebooks go in the **`workflows/`** directory. Use a sequential number prefix so workflows stay ordered by creation:
```
workflows/<NNN>_<topic>_<scope>.py
```
- `<NNN>` — zero-padded sequence number (`001`, `002`, …). Look at existing files in `workflows/` to determine the next number.
- `<topic>` — what is being analyzed, in snake_case (e.g., `greyhaven_meetings`, `alice_emails`, `hiring_discussions`)
- `<scope>` — time range or qualifier (e.g., `january`, `q1_2026`, `last_30d`, `all_time`)
**Examples:**
```
workflows/001_greyhaven_meetings_january.py
workflows/002_alice_emails_q1_2026.py
workflows/003_hiring_discussions_last_30d.py
workflows/004_team_interaction_timeline_all_time.py
```
**Before creating a new workflow**, list existing files in `workflows/` to find the highest number and increment it.
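The numbering step can be sketched as a small helper (the `NNN_` filename pattern is the one defined above; the helper itself is illustrative, not part of the platform):

```python
import re
from pathlib import Path

def next_workflow_number(workflows_dir: str = "workflows") -> str:
    """Scan workflows/ for NNN_*.py files and return the next zero-padded prefix."""
    highest = 0
    for path in Path(workflows_dir).glob("*.py"):
        match = re.match(r"(\d{3})_", path.name)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"{highest + 1:03d}"
```

On a directory containing `001_...py` and `003_...py` this returns `"004"`; on an empty directory it returns `"001"`.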
### Plan before you implement
Before writing any notebook, **always propose a plan first** and get the user's approval. The plan should describe:
1. **Goal** — What question are we answering?
2. **Data sources** — Which entity types and API endpoints will be used?
3. **Algorithm / ETL steps** — Step-by-step description of the data pipeline: what gets fetched, how it's filtered, joined, or aggregated, and what the final output looks like.
4. **Output format** — Table columns, charts, or summary statistics the user will see.
Only proceed to implementation after the user confirms the plan.
### Validate before delivering
After writing or editing a notebook, **always run `uvx marimo check`** to verify it has no structural errors (duplicate variables, undefined names, branch expressions, etc.):
```bash
uvx marimo check workflows/NNN_topic_scope.py
```
A clean check (no output, exit code 0) means the notebook is valid. Fix any errors before delivering the notebook to the user.
### Steps
1. **Identify people** — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /api/contacts/me`.
2. **Find data** — Use DataIndex `GET /query` (exhaustive, paginated) or `POST /search` (semantic, ranked) with `contact_ids`, `entity_types`, `date_from`/`date_to`, `connector_ids` filters.
3. **Analyze** — For simple answers, process the API response directly. For complex multi-step analysis, build a marimo notebook (see [notebook-patterns.md]).
### Quick Example (Python)
> "Find all emails involving Alice since January"
```python
import httpx

CONTACTDB = "http://localhost:42000/contactdb-api"
DATAINDEX = "http://localhost:42000/dataindex/api/v1"

client = httpx.Client(timeout=30)

# 1. Resolve "Alice" to a contact_id
resp = client.get(f"{CONTACTDB}/api/contacts", params={"search": "Alice"})
alice_id = resp.json()["contacts"][0]["id"]  # e.g. 42

# 2. Fetch all emails involving Alice (with pagination)
emails = []
offset = 0
while True:
    resp = client.get(f"{DATAINDEX}/query", params={
        "entity_types": "email",
        "contact_ids": str(alice_id),
        "date_from": "2025-01-01T00:00:00Z",
        "limit": 50,
        "offset": offset,
    })
    data = resp.json()
    emails.extend(data["items"])
    if offset + 50 >= data["total"]:
        break
    offset += 50

print(f"Found {len(emails)} emails involving Alice")
```
## File Index
- [MYSELF.md] — User identity, role, collaborators, and preferences (gitignored, copy from `MYSELF.example.md`)
- [company-context.md] — Business context, team structure, vocabulary
- [contactdb-api.md] — ContactDB entities and REST endpoints
- [dataindex-api.md] — DataIndex entity types, query modes, REST endpoints
- [connectors-and-sources.md] — Connector-to-entity-type mapping
- [notebook-patterns.md] — Marimo notebook patterns and common API workflows
- [company skill] — Business context, team structure, vocabulary
- [contactdb skill] — ContactDB entities and REST endpoints
- [dataindex skill] — DataIndex entity types, query modes, REST endpoints
- [connectors skill] — Connector-to-entity-type mapping
- [workflow skill] — How to create marimo analysis notebooks
- [notebook-patterns skill] — Marimo notebook patterns and common API workflows
- [checkout skill] — Weekly review builder
[MYSELF.md]: ./MYSELF.md
[company-context.md]: ./docs/company-context.md
[contactdb-api.md]: ./docs/contactdb-api.md
[dataindex-api.md]: ./docs/dataindex-api.md
[connectors-and-sources.md]: ./docs/connectors-and-sources.md
[notebook-patterns.md]: ./docs/notebook-patterns.md
[company skill]: ./.agents/skills/company/SKILL.md
[contactdb skill]: ./.agents/skills/contactdb/SKILL.md
[dataindex skill]: ./.agents/skills/dataindex/SKILL.md
[connectors skill]: ./.agents/skills/connectors/SKILL.md
[workflow skill]: ./.agents/skills/workflow/SKILL.md
[notebook-patterns skill]: ./.agents/skills/notebook-patterns/SKILL.md
[checkout skill]: ./.agents/skills/checkout/SKILL.md