feat: migrate to skills-based approach
52
.agents/skills/checkout/SKILL.md
Normal file
@@ -0,0 +1,52 @@
---
name: checkout
description: Build a weekly checkout/review covering Sunday through today. Gathers meetings, emails, Zulip conversations, and Gitea activity, then produces a structured summary.
disable-model-invocation: true
---

# Weekly Review Builder

Build my weekly checkout covering Sunday through today.

1. **Get my identity** with `contactdb_get_me` to obtain my contact_id
2. **Determine date range**: Sunday to today (use `date -d "last sunday" +%Y-%m-%d`)
3. **Gather activity in parallel**:
   - **Dataindex**: Launch **one subagent per day** (Sunday through today). Each subagent should query `dataindex_query_entities` for that specific day with my contact_id, looking for meetings, calendar events, emails, and documents, then return a day-by-day summary.
   - **Threaded Conversations**: Launch **one subagent per day** (Sunday through today). Each subagent should:
     1. Query `dataindex_query_entities` with entity_type `threaded_conversation` for that specific day with my contact_id
     2. For each conversation found, fetch all `conversation_message` entities using the conversation ID as the parent_id filter
     3. Return the messages I participated in, with context
   - **Gitea**: Launch one subagent to run `~/bin/gitea-activity -s START -e END` and extract commits, PRs (opened/merged/approved), and repositories worked on
4. **Query dataindex directly** for the full week as a backup to ensure nothing is missed
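The date-range computation in step 2 can be sketched in shell (GNU `date`; note that when run *on* a Sunday, `last sunday` resolves to the previous week's Sunday, so adjust if today should start the window):

```shell
# Review window: most recent Sunday through today (GNU date).
START=$(date -d "last sunday" +%Y-%m-%d)
END=$(date +%Y-%m-%d)
echo "Reviewing $START .. $END"

# Step 3's Gitea subagent would then run (not invoked here):
#   ~/bin/gitea-activity -s "$START" -e "$END"
```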
**Build the checkout with this structure:**

```
# Weekly Review: [Date Range]

****Objectives****
- List 2-3 high-level goals for the week based on the main themes of work

****Major Achievements****
- Bullet points of concrete deliverables, grouped by theme
- Focus on shipped features, solved problems, infrastructure built

****Code Activity****
- Stats line: X commits across Y repositories, Z PRs total (N merged, M open)
- **New Repositories**: `[name](url)` - brief description
- **Pull Requests Merged**: `[#N Title](url)` - one per line with descriptive title
- **Pull Requests Opened (not merged)**: `[#N](url)` - include status if known (approved, draft, etc.)

****Team Interactions****
- **Meeting Type (Nx)**: Brief description of purpose/outcome
  With: Key participants
- **Notable conversations**: Date, participants, main subject discussed
```

**Rules:**

- Use `****Title****` format for section headers (not ##)
- All PRs and repositories must be markdown links `[name](url)`
- List merged PRs first, then open/unmerged ones
- Only include meaningful interactions (skip routine standups unless notable decisions were made)
- No "who am I" header, no summary section at the end
- Focus on outcomes and business value, not just activity volume
49
.agents/skills/company/SKILL.md
Normal file
@@ -0,0 +1,49 @@
---
name: company
description: Monadical company context. Use when you need to understand the organization structure, Zulip stream layout, communication tools, meeting/calendar relationships, or internal product names.
user-invocable: false
---

# Company Context

## About Monadical

Monadical is a software consultancy founded in 2016. The company operates across multiple locations: Montreal and Vancouver (Canada), and Medellin and Cali (Colombia). The team builds internal products alongside client work.

### Internal Products

- **Reflector** — Meeting recording and transcription tool (produces meeting entities in DataIndex)
- **GreyHaven / InternalAI platform** — A local-first platform that aggregates personal data and resolves contacts to enable automation and analysis

## Communication Tools

| Tool | Role | Data in DataIndex? |
|----------------|-----------------------------|--------------------------------|
| Zulip | Primary internal chat | Yes (connector: `zulip`) |
| Fastmail/Email | External communication | Yes (connector: `mbsync_email`) |
| Calendar | Scheduling (ICS feeds) | Yes (connector: `ics_calendar`) |
| Reflector | Meeting recordings | Yes (connector: `reflector`) |
| HedgeDoc | Collaborative documents | Yes (connector: `hedgedoc`) |

## How the Company Works

We use Zulip as our main communication hub. Zulip has channels (top level) and topics (one level below). Different behavior should be adopted depending on the channel.

### Zulip channels

Here is a list of Zulip streams with context on how the company is organized:

- InternalAI (zulip:stream:193) is about this specific platform.
- Leads (zulip:stream:78) is where we talk about our leads/clients. We usually create one topic per lead/client, so if you are searching for information about a client, always check whether a related topic exists matching the client or company name.
- Checkins (zulip:stream:24) usually has one topic per employee. This is where an employee indicates what they did or will do during a period of time, or posts status updates. Not everybody uses the system on a regular basis.
- Devcap (zulip:stream:156) is where we discuss our investments and due diligence before investing. One topic per company.
- General (zulip:stream:21) is where we talk about various topics, company-wide matters, or services.
- Engineering (zulip:stream:25) is where we discuss engineering issues, services, and new tools to try.
- Learning (zulip:stream:31) is where we share links about new tools, ideas, or things to learn.
- Reflector (zulip:stream:155) is a dedicated stream for Reflector development and usage.
- GreyHaven is split across multiple streams: branding (zulip:stream:206), GreyHaven-specific leads (zulip:stream:208) with one topic per lead, and marketing (zulip:stream:212).

### Meeting and Calendar

Some people in the company have a dedicated Reflector room for their meetings. This shows up in the `room_name` field of the `meeting` entity.
For people like Max, DataIndex has calendar information, and most of his calendar events have a related meeting recorded in Reflector. However, there is no direct link between calendar entries and Reflector meetings: you have to correlate them yourself to figure out which meeting corresponds to a given event.
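Since there is no direct link, the calendar-to-meeting correlation has to be done by time proximity. A minimal sketch (field names follow the dataindex skill; `match_meeting` and the 15-minute tolerance are assumptions, not platform features):

```python
from datetime import datetime, timedelta

def match_meeting(event, meetings, tolerance_minutes=15):
    """Pair a calendar_event with the Reflector meeting whose start time is
    closest to the event's, within a tolerance window. Returns None if no
    meeting starts close enough."""
    ev_start = datetime.fromisoformat(event["start_time"])
    best, best_delta = None, timedelta(minutes=tolerance_minutes)
    for m in meetings:
        if not m.get("start_time"):
            continue  # some meetings lack timing data
        delta = abs(datetime.fromisoformat(m["start_time"]) - ev_start)
        if delta <= best_delta:
            best, best_delta = m, delta
    return best
```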
105
.agents/skills/connectors/SKILL.md
Normal file
@@ -0,0 +1,105 @@
---
name: connectors
description: Reference for all data connectors and their entity type mappings. Use when determining which connector produces which entity types, understanding connector-specific fields, or choosing the right data source for a query.
user-invocable: false
---

# Connectors and Data Sources

Each connector ingests data from an external source into DataIndex. Connectors run periodic background syncs to keep data fresh.

Use `list_connectors()` at runtime to see which connectors are actually configured — not all connectors below may be active in every deployment.

## Connector → Entity Type Mapping

| Connector ID | Entity Types Produced | Description |
|------------------|-----------------------------------------------------------------|----------------------------------|
| `reflector` | `meeting` | Meeting recordings + transcripts |
| `ics_calendar` | `calendar_event` | ICS calendar feed events |
| `mbsync_email` | `email` | Email via mbsync IMAP sync |
| `zulip` | `conversation`, `conversation_message`, `threaded_conversation` | Zulip chat streams and topics |
| `babelfish` | `conversation_message`, `threaded_conversation` | Chat translation bridge |
| `hedgedoc` | `document` | HedgeDoc collaborative documents |
| `contactdb` | `contact` | Synced from ContactDB (static) |
| `browser_history`| `webpage` | Browser extension page visits |
| `api_document` | `document` | API-ingested documents (static) |
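As a sketch of how this mapping drives queries: the DataIndex `GET /query` endpoint (base URL from the dataindex skill) takes `entity_types` as a repeated parameter and `connector_ids` as a comma-separated string. `build_query_url` below is a hypothetical helper, not part of the platform:

```python
import urllib.parse

# Base URL via Caddy, per the dataindex skill.
DATAINDEX = "http://localhost:42000/dataindex/api/v1"

def build_query_url(entity_types, connector_ids=None, **params):
    """Build a GET /query URL: entity_types repeats, connector_ids is
    a single comma-separated value."""
    q = [("entity_types", t) for t in entity_types]
    if connector_ids:
        q.append(("connector_ids", ",".join(connector_ids)))
    q.extend(params.items())
    return f"{DATAINDEX}/query?" + urllib.parse.urlencode(q)

# e.g. Zulip topic threads since a given date:
url = build_query_url(["threaded_conversation"], ["zulip"],
                      date_from="2025-01-06T00:00:00Z")
```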
## Per-Connector Details

### `reflector` — Meeting Recordings

Ingests meetings from Reflector, Monadical's meeting recording tool.

- **Entity type:** `meeting`
- **Key fields:** `transcript`, `summary`, `participants`, `start_time`, `end_time`, `room_name`
- **Use cases:** Find meetings someone attended, search meeting transcripts, get summaries
- **Tip:** Filter with `contact_ids` to find meetings involving specific people. The `transcript` field contains speaker-diarized text.

### `ics_calendar` — Calendar Events

Parses ICS calendar feeds (Google Calendar, Outlook, etc.).

- **Entity type:** `calendar_event`
- **Key fields:** `start_time`, `end_time`, `attendees`, `location`, `description`, `calendar_name`
- **Use cases:** Check upcoming events, find events with specific attendees, review past schedule
- **Tip:** Multiple calendar feeds may be configured as separate connectors (e.g., `personal_calendar`, `work_calendar`). Use `list_connectors()` to discover them.

### `mbsync_email` — Email

Syncs email via mbsync (IMAP).

- **Entity type:** `email`
- **Key fields:** `text_content`, `from_contact_id`, `to_contact_ids`, `cc_contact_ids`, `thread_id`, `has_attachments`
- **Use cases:** Find emails from/to someone, search email content, track email threads
- **Tip:** Use `from_contact_id` and `to_contact_ids` with the `contact_ids` filter. For thread grouping, use the `thread_id` field.

### `zulip` — Chat

Ingests Zulip streams, topics, and messages.

- **Entity types:**
  - `conversation` — A Zulip stream/channel with recent messages
  - `conversation_message` — Individual chat messages
  - `threaded_conversation` — A topic thread within a stream
- **Key fields:** `message`, `mentioned_contact_ids`, `recent_messages`
- **Use cases:** Find discussions about a topic, track who said what, find @-mentions
- **Tip:** Use `threaded_conversation` to find topic-level discussions. Use `conversation_message` with `mentioned_contact_ids` to find messages that mention specific people.

### `babelfish` — Translation Bridge

Ingests translated chat messages from the Babelfish service.

- **Entity types:** `conversation_message`, `threaded_conversation`
- **Use cases:** Similar to Zulip but for translated cross-language conversations
- **Tip:** Query alongside the `zulip` connector for complete conversation coverage.

### `hedgedoc` — Collaborative Documents

Syncs documents from HedgeDoc (collaborative markdown editor).

- **Entity type:** `document`
- **Key fields:** `content`, `description`, `url`, `revision_id`
- **Use cases:** Find documents by content, track document revisions
- **Tip:** Use `search()` for semantic document search rather than the `query_entities` text filter.

### `contactdb` — Contact Sync (Static)

Mirrors contacts from ContactDB into DataIndex for unified search.

- **Entity type:** `contact`
- **Note:** This is a read-only mirror. Use ContactDB MCP tools directly for contact operations.

### `browser_history` — Browser Extension (Static)

Captures visited webpages from a browser extension.

- **Entity type:** `webpage`
- **Key fields:** `url`, `visit_time`, `text_content`
- **Use cases:** Find previously visited pages, search page content

### `api_document` — API Documents (Static)

Documents ingested via the REST API (e.g., uploaded PDFs, imported files).

- **Entity type:** `document`
- **Note:** These are ingested via `POST /api/v1/ingest/documents`, not periodic sync.
160
.agents/skills/contactdb/SKILL.md
Normal file
@@ -0,0 +1,160 @@
---
name: contactdb
description: ContactDB REST API reference. Use when resolving people to contact_ids, searching contacts by name/email, or accessing relationships, notes, and platform identities.
user-invocable: false
---

# ContactDB API Reference

ContactDB is the people directory. It stores contacts, their platform identities, relationships, notes, and links. Every person across all data sources resolves to a single ContactDB `contact_id`.

**Base URL:** `http://localhost:42000/contactdb-api` (via Caddy) or `http://localhost:42800` (direct)

## Core Entities

### Contact

The central entity — represents a person.

| Field | Type | Description |
|----------------------|---------------------|------------------------------------------------|
| `id` | int | Unique contact ID |
| `name` | string | Display name |
| `emails` | EmailField[] | `{type, value, preferred}` |
| `phones` | PhoneField[] | `{type, value, preferred}` |
| `bio` | string? | Short biography |
| `avatar_url` | string? | Profile image URL |
| `personal_info` | PersonalInfo | Birthday, partner, children, role, company, location, how_we_met |
| `interests` | string[] | Topics of interest |
| `values` | string[] | Personal values |
| `tags` | string[] | User-assigned tags |
| `profile_description`| string? | Extended description |
| `is_placeholder` | bool | Auto-created stub (not yet fully resolved) |
| `is_service_account` | bool | Non-human account (bot, no-reply) |
| `stats` | ContactStats | Interaction statistics (see below) |
| `enrichment_data` | dict | Data from enrichment providers |
| `platform_identities`| PlatformIdentity[] | Identities on various platforms |
| `created_at` | datetime | When created |
| `updated_at` | datetime | Last modified |
| `merged_into_id` | int? | If merged, target contact ID |
| `deleted_at` | datetime? | Soft-delete timestamp |

### ContactStats

| Field | Type | Description |
|--------------------------|---------------|--------------------------------------|
| `total_messages` | int | Total messages across platforms |
| `platforms_count` | int | Number of platforms active on |
| `last_interaction_at` | string? | ISO datetime of last interaction |
| `interaction_count_30d` | int | Interactions in last 30 days |
| `interaction_count_90d` | int | Interactions in last 90 days |
| `hotness` | HotnessScore? | Composite engagement score (0-100) |

### PlatformIdentity

Links a contact to a specific platform account.

| Field | Type | Description |
|--------------------|-----------|------------------------------------------|
| `id` | int | Identity record ID |
| `contact_id` | int | Parent contact |
| `source` | string | Data provenance (e.g., `dataindex_zulip`)|
| `platform` | string | Platform name (e.g., `email`, `zulip`) |
| `platform_user_id` | string | User ID on that platform |
| `display_name` | string? | Name shown on that platform |
| `avatar_url` | string? | Platform-specific avatar |
| `bio` | string? | Platform-specific bio |
| `extra_data` | dict | Additional platform-specific data |
| `first_seen_at` | datetime | When first observed |
| `last_seen_at` | datetime | When last observed |

### Relationship

Tracks connections between contacts.

| Field | Type | Description |
|------------------------|-----------|--------------------------------------|
| `id` | int | Relationship ID |
| `from_contact_id` | int | Source contact |
| `to_contact_id` | int | Target contact |
| `relationship_type` | string | Type (e.g., "colleague", "client") |
| `since_date` | date? | When relationship started |
| `relationship_metadata`| dict | Additional metadata |

### Note

Free-text notes attached to a contact.

| Field | Type | Description |
|--------------|----------|----------------------|
| `id` | int | Note ID |
| `contact_id` | int | Parent contact |
| `content` | string | Note text |
| `created_by` | string | Who wrote it |
| `created_at` | datetime | When created |

### Link

External URLs associated with a contact.

| Field | Type | Description |
|--------------|----------|----------------------------------------|
| `id` | int | Link ID |
| `contact_id` | int | Parent contact |
| `type` | string | Link type (e.g., "github", "linkedin") |
| `label` | string | Display label |
| `url` | string | URL |
## REST Endpoints

### GET `/api/contacts` — List/search contacts

Primary way to find contacts. Returns `{contacts: [...], total, limit, offset}`.

**Query parameters:**

| Parameter | Type | Description |
|------------------------|---------------|----------------------------------------------|
| `search` | string? | Search in name and bio |
| `is_placeholder` | bool? | Filter by placeholder status |
| `is_service_account` | bool? | Filter by service account status |
| `sort_by` | string? | `"hotness"`, `"name"`, or `"updated_at"` |
| `min_hotness` | float? | Minimum hotness score (0-100) |
| `max_hotness` | float? | Maximum hotness score (0-100) |
| `platforms` | string[]? | Contacts with ALL specified platforms (AND) |
| `last_interaction_from`| string? | ISO datetime lower bound |
| `last_interaction_to` | string? | ISO datetime upper bound |
| `limit` | int | Max results (1-100, default 50) |
| `offset` | int | Pagination offset (default 0) |

### GET `/api/contacts/me` — Get self contact

Returns the platform operator's own contact record. **Call this first** in most workflows to get your own `contact_id`.

### GET `/api/contacts/{id}` — Get contact by ID

Get full details for a single contact by numeric ID.

### GET `/api/contacts/by-email/{email}` — Get contact by email

Look up a contact by email address.

### Other Endpoints

| Method | Path | Description |
|--------|-----------------------------------------|----------------------------------|
| POST | `/api/contacts` | Create contact |
| PUT | `/api/contacts/{id}` | Update contact |
| DELETE | `/api/contacts/{id}` | Delete contact |
| POST | `/api/contacts/merge` | Merge two contacts |
| GET | `/api/contacts/{id}/relationships` | List relationships |
| GET | `/api/contacts/{id}/notes` | List notes |
| GET | `/api/contacts/{id}/links` | List links |
| GET | `/api/platform-identities/contacts/{id}`| List platform identities |

## Usage Pattern

1. **Start with `GET /api/contacts/me`** to get the operator's contact ID
2. **Search by name** with `GET /api/contacts?search=Alice`
3. **Use contact IDs** from results as filters in DataIndex queries (`contact_ids` parameter)
4. **Paginate** large result sets with `offset` increments
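A minimal sketch of that pattern over the REST API (stdlib only; `get_json` and `pick_contact_id` are hypothetical helpers, not part of ContactDB):

```python
import json
import urllib.parse
import urllib.request

# Base URL via Caddy, per this skill.
CONTACTDB = "http://localhost:42000/contactdb-api"

def get_json(path, **params):
    """GET a ContactDB endpoint and decode its JSON response."""
    url = f"{CONTACTDB}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params, doseq=True)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def pick_contact_id(payload, name):
    """Choose a matching contact id from a GET /api/contacts response."""
    for c in payload.get("contacts", []):
        if name.lower() in c.get("name", "").lower():
            return c["id"]
    return None

# Typical flow (not executed here):
#   me = get_json("/api/contacts/me")                       # step 1
#   found = get_json("/api/contacts", search="Alice", limit=5)
#   alice_id = pick_contact_id(found, "Alice")              # step 3: DataIndex filter
```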
223
.agents/skills/dataindex/SKILL.md
Normal file
@@ -0,0 +1,223 @@
---
name: dataindex
description: DataIndex REST API reference. Use when querying unified data (emails, meetings, calendar events, Zulip conversations, documents) via GET /query, POST /search, or GET /entities/{id}.
user-invocable: false
---

# DataIndex API Reference

DataIndex aggregates data from all connected sources (email, calendar, Zulip, meetings, documents) into a unified query interface. Every piece of data is an **entity** with a common base structure plus type-specific fields.

**Base URL:** `http://localhost:42000/dataindex/api/v1` (via Caddy) or `http://localhost:42180/api/v1` (direct)

## Entity Types

All entities share these base fields:

| Field | Type | Description |
|----------------------|-------------|---------------------------------------------|
| `id` | string | Format: `connector_name:native_id` |
| `entity_type` | string | One of the types below |
| `timestamp` | datetime | When the entity occurred |
| `contact_ids` | string[] | ContactDB IDs of people involved |
| `connector_id` | string | Which connector produced this |
| `title` | string? | Display title |
| `parent_id` | string? | Parent entity (e.g., thread for a message) |
| `raw_data` | dict | Original source data (excluded by default) |

### `calendar_event`

From ICS calendar feeds.

| Field | Type | Description |
|-----------------------|-------------|--------------------------------|
| `start_time` | datetime? | Event start |
| `end_time` | datetime? | Event end |
| `all_day` | bool | All-day event flag |
| `description` | string? | Event description |
| `location` | string? | Event location |
| `attendees` | dict[] | Attendee list |
| `organizer_contact_id`| string? | ContactDB ID of organizer |
| `status` | string? | Event status |
| `calendar_name` | string? | Source calendar name |
| `meeting_url` | string? | Video call link |

### `meeting`

From Reflector (recorded meetings with transcripts).

| Field | Type | Description |
|--------------------|---------------------|-----------------------------------|
| `start_time` | datetime? | Meeting start |
| `end_time` | datetime? | Meeting end |
| `participants` | MeetingParticipant[]| People in the meeting |
| `meeting_platform` | string? | Platform (e.g., "jitsi") |
| `transcript` | string? | Full transcript text |
| `summary` | string? | AI-generated summary |
| `meeting_url` | string? | Meeting link |
| `recording_url` | string? | Recording link |
| `location` | string? | Physical location |
| `room_name` | string? | Virtual room name (also indicates meeting location — see below) |

**MeetingParticipant** fields: `display_name`, `contact_id?`, `platform_user_id?`, `email?`, `speaker?`

> **`room_name` as location indicator:** The `room_name` field often encodes where the meeting took place (e.g., a Jitsi room name like `standup-office-bogota`). Use it to infer the meeting location when `location` is not set.

> **Participant and contact coverage is incomplete.** Meeting data comes from Reflector, which only tracks users who are logged into the Reflector platform. This means:
>
> - **`contact_ids`** only contains ContactDB IDs for Reflector-logged participants who were matched to a known contact. It will often be a **subset** of the actual attendees — do not assume it is the full list.
> - **`participants`** is more complete than `contact_ids` but still only includes people detected by Reflector. Not all participants have accounts or could be identified — some attendees may be entirely absent from this list.
> - **`contact_id` within a participant** may be `null` if the person was detected but couldn't be matched to a ContactDB entry.
>
> **Consequence for queries:** Filtering meetings by `contact_ids` will **miss meetings** where the person attended but wasn't logged into Reflector or wasn't resolved. To get better coverage, combine multiple strategies:
>
> 1. Filter by `contact_ids` for resolved participants
> 2. Search `participants[].display_name` client-side for name matches
> 3. Use `POST /search` with the person's name to search meeting transcripts and summaries
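Strategy 2, the client-side pass over `participants[].display_name`, can be sketched as (`meetings_mentioning` is a hypothetical helper operating on already-fetched `meeting` entities):

```python
def meetings_mentioning(person_name, meetings):
    """Keep meetings where the person appears in participants[].display_name,
    even when their contact_id was never resolved."""
    name = person_name.lower()
    return [
        m for m in meetings
        if any(name in (p.get("display_name") or "").lower()
               for p in m.get("participants", []))
    ]
```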
### `email`

From mbsync email sync.

| Field | Type | Description |
|--------------------|-----------|--------------------------------------|
| `thread_id` | string? | Email thread grouping |
| `text_content` | string? | Plain text body |
| `html_content` | string? | HTML body |
| `snippet` | string? | Preview snippet |
| `from_contact_id` | string? | Sender's ContactDB ID |
| `to_contact_ids` | string[] | Recipient ContactDB IDs |
| `cc_contact_ids` | string[] | CC recipient ContactDB IDs |
| `has_attachments` | bool | Has attachments flag |
| `attachments` | dict[] | Attachment metadata |

### `conversation`

A Zulip stream/channel.

| Field | Type | Description |
|--------------------|---------|----------------------------------------|
| `recent_messages` | dict[] | Recent messages in the conversation |

### `conversation_message`

A single message in a Zulip conversation.

| Field | Type | Description |
|-------------------------|-----------|-----------------------------------|
| `message` | string? | Message text content |
| `mentioned_contact_ids` | string[] | ContactDB IDs of mentioned people |

### `threaded_conversation`

A Zulip topic thread (group of messages under a topic).

| Field | Type | Description |
|--------------------|---------|----------------------------------------|
| `recent_messages` | dict[] | Recent messages in the thread |

### `document`

From HedgeDoc, API ingestion, or other document sources.

| Field | Type | Description |
|----------------|-----------|------------------------------|
| `content` | string? | Document body text |
| `description` | string? | Document description |
| `mimetype` | string? | MIME type |
| `url` | string? | Source URL |
| `revision_id` | string? | Revision identifier |

### `webpage`

From the browser history extension.

| Field | Type | Description |
|----------------|-----------|------------------------------|
| `url` | string | Page URL |
| `visit_time` | datetime | When visited |
| `text_content` | string? | Page text content |
## REST Endpoints

### GET `/api/v1/query` — Exhaustive Filtered Enumeration

Use when you need **all** entities matching specific criteria. Supports pagination.

**When to use:** "List all meetings since January", "Get all emails from Alice", "Count calendar events this week"

**Query parameters:**

| Parameter | Type | Description |
|-------------------|-----------------|------------------------------------------------|
| `entity_types` | string (repeat) | Filter by type — repeat the param for multiple: `?entity_types=email&entity_types=meeting` |
| `contact_ids` | string | Comma-separated ContactDB IDs: `"1,42"` |
| `connector_ids` | string | Comma-separated connector IDs: `"zulip,reflector"` |
| `date_from` | string | ISO datetime lower bound (UTC if no timezone) |
| `date_to` | string | ISO datetime upper bound |
| `search` | string? | Text filter on content fields |
| `parent_id` | string? | Filter by parent entity |
| `id_prefix` | string? | Filter entities by ID prefix (e.g., `zulip:stream:155`) |
| `thread_id` | string? | Filter emails by thread ID |
| `room_name` | string? | Filter meetings by room name |
| `limit` | int | Max results per page (default 50) |
| `offset` | int | Pagination offset (default 0) |
| `sort_by` | string | `"timestamp"` (default), `"title"`, `"contact_activity"`, etc. |
| `sort_order` | string | `"desc"` (default) or `"asc"` |
| `include_raw_data`| bool | Include the raw_data field (default false) |

**Response format:**

```json
{
  "items": [...],
  "total": 152,
  "page": 1,
  "size": 50,
  "pages": 4
}
```

**Pagination:** loop with offset increments until `offset >= total`. See the [notebook-patterns skill](.agents/skills/notebook-patterns/SKILL.md) for a reusable helper.
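A minimal sketch of that loop (stdlib only; `query_all` and `next_offset` are hypothetical helpers under the base URL above, not part of the API):

```python
import json
import urllib.parse
import urllib.request

DATAINDEX = "http://localhost:42000/dataindex/api/v1"

def next_offset(offset, page):
    """Advance past the items just received; None means we are done."""
    offset += len(page["items"])
    if not page["items"] or offset >= page["total"]:
        return None
    return offset

def query_all(**filters):
    """Page through GET /query until every matching entity is collected."""
    items, offset = [], 0
    while offset is not None:
        q = dict(filters, limit=50, offset=offset)
        url = f"{DATAINDEX}/query?" + urllib.parse.urlencode(q, doseq=True)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        items.extend(page["items"])
        offset = next_offset(offset, page)
    return items

# e.g. query_all(entity_types="meeting", connector_ids="reflector")
```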
### POST `/api/v1/search` — Semantic Search

Use when you need **relevant** results for a natural-language question. Returns ranked text chunks. No pagination — set a higher `limit` instead.

**When to use:** "What was discussed about the product roadmap?", "Find conversations about hiring"

**Request body (JSON):**

```json
{
  "search_text": "product roadmap decisions",
  "entity_types": ["meeting", "threaded_conversation"],
  "contact_ids": ["1", "42"],
  "date_from": "2025-01-01T00:00:00Z",
  "date_to": "2025-06-01T00:00:00Z",
  "connector_ids": ["reflector", "zulip"],
  "limit": 20
}
```

**Response:** `{results: [...chunks], total_count}` — each chunk has `entity_ids`, `entity_type`, `connector_id`, `content`, `timestamp`.

### GET `/api/v1/entities/{id}` — Get Entity by ID

Retrieve full details of a single entity. The `entity_id` format is `connector_name:native_id`.

### GET `/api/v1/connectors/status` — Connector Status

Get sync status for all connectors (last sync time, entity count, health).

## Common Query Recipes

| Question | entity_type + connector_id |
|---------------------------------------|------------------------------------------|
| Meetings I attended | `meeting` + `reflector`, with your contact_id |
| Upcoming calendar events | `calendar_event` + `ics_calendar`, date_from=now |
| Emails from someone | `email` + `mbsync_email`, with their contact_id |
| Zulip threads about a topic | `threaded_conversation` + `zulip`, search="topic" |
| All documents | `document` + `hedgedoc` |
| Chat messages mentioning someone | `conversation_message` + `zulip`, with contact_id |
| What was discussed about X? | Use `POST /search` with `search_text` |
808
.agents/skills/notebook-patterns/SKILL.md
Normal file
@@ -0,0 +1,808 @@
---
name: notebook-patterns
description: Marimo notebook patterns for InternalAI data analysis. Use when creating or editing marimo notebooks — covers cell scoping, async cells, pagination helpers, analysis patterns, and do/don't rules.
user-invocable: false
---

# Marimo Notebook Patterns

This guide covers how to create [marimo](https://marimo.io) notebooks for data analysis against the InternalAI platform APIs. Marimo notebooks are plain `.py` files with reactive cells — no `.ipynb` format, no Jupyter dependency.

## Marimo Basics

A marimo notebook is a Python file with `@app.cell` decorated functions. Each cell returns values as a tuple, and other cells receive them as function parameters — marimo builds a reactive DAG automatically.

```python
import marimo

app = marimo.App()


@app.cell
def cell_one():
    x = 42
    return (x,)


@app.cell
def cell_two(x):
    # Re-runs automatically when x changes
    result = x * 2
    return (result,)
```

**Key rules:**

- Cells declare dependencies via function parameters
- Cells return values as tuples: `return (var1, var2,)`
- The **last expression at the top level** of a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees). Expressions inside `if`/`else`/`for` blocks do **not** count — see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level) below
- Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below)
- No manual execution order; the DAG determines it
- **Variable names must be unique across cells.** Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them **private** to that cell — marimo ignores `_`-prefixed names.
- **All imports must go in the `setup` cell.** Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place **all** imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`.
|
||||
- **All imports must go in the `setup` cell.** Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place **all** imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`.
|
||||
|
||||
### Cell Variable Scoping — Example

This is the **most common mistake**. Any variable assigned at the top level of a cell (not inside a `def` or comprehension) is tracked by marimo. If two cells assign the same name, the notebook refuses to run.

**BROKEN** — `resp` is defined at top level in both cells:

```python
# Cell A
@app.cell
def search_meetings(client, DATAINDEX):
    resp = client.post(f"{DATAINDEX}/search", json={...})  # defines 'resp'
    resp.raise_for_status()
    results = resp.json()["results"]
    return (results,)

# Cell B
@app.cell
def fetch_details(client, DATAINDEX, results):
    resp = client.get(f"{DATAINDEX}/entities/{results[0]}")  # also defines 'resp' → ERROR
    meeting = resp.json()
    return (meeting,)
```

> **Error:** `MultipleDefinitionError: variable 'resp' is defined in multiple cells`

**FIXED** — prefix cell-local variables with `_`:

```python
# Cell A
@app.cell
def search_meetings(client, DATAINDEX):
    _resp = client.post(f"{DATAINDEX}/search", json={...})  # _resp is cell-private
    _resp.raise_for_status()
    results = _resp.json()["results"]
    return (results,)

# Cell B
@app.cell
def fetch_details(client, DATAINDEX, results):
    _resp = client.get(f"{DATAINDEX}/entities/{results[0]}")  # _resp is cell-private, no conflict
    meeting = _resp.json()
    return (meeting,)
```

**Rule of thumb:** if a variable is only used within the cell to compute a return value, prefix it with `_`. Only leave names unprefixed if another cell needs to receive them.

> **Note:** Variables inside nested `def` functions are naturally local and don't need `_` prefixes — e.g., `resp` inside a `def fetch_all(...)` helper is fine because it's scoped to the function, not the cell.
### Cell Output Must Be at the Top Level

Marimo only renders the **last expression at the top level** of a cell as rich output. An expression buried inside an `if`/`else`, `for`, `try`, or any other block is **not** displayed — it's silently discarded.

**BROKEN** — `_df` inside the `if` branch is never rendered, and `mo.md()` inside `if`/`else` is also discarded:

```python
@app.cell
def show_results(results, pl, mo):
    if results:
        _df = pl.DataFrame(results)
        mo.md(f"**Found {len(results)} results**")
        _df  # Inside an if block — marimo does NOT display this
    else:
        mo.md("**No results found**")  # Also inside a block — NOT displayed
    return
```

**FIXED** — split into separate cells. Each cell displays exactly **one thing** at the top level:

```python
# Cell 1: build the data, return it
@app.cell
def build_results(results, pl):
    results_df = pl.DataFrame(results) if results else None
    return (results_df,)

# Cell 2: heading — mo.md() is the top-level expression (use ternary for conditional text)
@app.cell
def show_results_heading(results_df, mo):
    mo.md(f"**Found {len(results_df)} results**" if results_df is not None else "**No results found**")

# Cell 3: table — DataFrame is the top-level expression
@app.cell
def show_results_table(results_df):
    results_df  # Top-level expression — marimo renders this as interactive table
```

**Rules:**
- Each cell should display **one thing** — either `mo.md()` OR a DataFrame, never both
- `mo.md()` must be a **top-level expression**, not inside `if`/`else`/`for`/`try` blocks
- Build conditional text using variables or ternary expressions, then call `mo.md(_text)` at the top level
- For DataFrames, use a standalone display cell: `def show_table(df): df`
### Async Cells

When a cell uses `await` (e.g., for `llm_call` or `asyncio.gather`), you **must** declare it as `async def`:

```python
@app.cell
async def analyze(meetings, llm_call, ResponseModel, asyncio):
    async def _score(meeting):
        return await llm_call(prompt=..., response_model=ResponseModel)

    results = await asyncio.gather(*[_score(_m) for _m in meetings])
    return (results,)
```

Note that `asyncio` is imported in the `setup` cell and received here as a parameter — never `import asyncio` inside individual cells.

If you write `await` in a non-async cell, marimo cannot parse the cell and saves it as an `_unparsable_cell` string literal — the cell won't run, and you'll see `SyntaxError: 'return' outside function` or similar errors. See [Fixing `_unparsable_cell`](#fixing-_unparsable_cell) below.
### Cells That Define Classes Must Return Them

If a cell defines Pydantic models (or any class) that other cells need, it **must** return them:

```python
# BaseModel and Field are imported in the setup cell and received as parameters
@app.cell
def models(BaseModel, Field):
    class MeetingSentiment(BaseModel):
        overall_sentiment: str
        sentiment_score: int = Field(description="Score from -10 to +10")

    class FrustrationExtraction(BaseModel):
        has_frustrations: bool
        frustrations: list[dict]

    return (MeetingSentiment, FrustrationExtraction,)  # Other cells receive these as parameters
```

A bare `return` (or no return) means those classes are invisible to the rest of the notebook.
### Fixing `_unparsable_cell`

When marimo can't parse a cell into a proper `@app.cell` function, it saves the raw code as `app._unparsable_cell("...", name="cell_name")`. These cells **won't run** and show errors like `SyntaxError: 'return' outside function`.

**Common causes:**
1. Using `await` without making the cell `async def`
2. Using `return` in code that marimo failed to wrap into a function (usually a side effect of cause 1)

**How to fix:** Convert the `_unparsable_cell` string back into a proper `@app.cell` decorated function:

```python
# BROKEN — saved as _unparsable_cell because of top-level await
app._unparsable_cell("""
results = await asyncio.gather(...)
return results
""", name="my_cell")

# FIXED — proper async cell function (asyncio imported in setup, received as parameter)
@app.cell
async def my_cell(some_dependency, asyncio):
    results = await asyncio.gather(...)
    return (results,)
```

**Key differences to note when converting:**
- Wrap the code in an `async def` function (if it uses `await`)
- Add cell dependencies as function parameters (including imports like `asyncio`)
- Return values as tuples: `return (var,)` not `return var`
- Prefix cell-local variables with `_`
- Never add `import` statements inside the cell — all imports belong in `setup`
### Inline Dependencies with PEP 723

Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "marimo",
#     "httpx",
#     "polars",
#     "mirascope[openai]",
#     "pydantic",
#     "python-dotenv",
# ]
# ///
```

### Checking Notebooks Before Running

Always run `marimo check` before opening or running a notebook. It catches common issues — duplicate variable definitions, `_unparsable_cell` blocks, branch expressions that won't display, and more — without needing to start the full editor:

```bash
uvx marimo check notebook.py       # Check a single notebook
uvx marimo check workflows/        # Check all notebooks in a directory
uvx marimo check --fix notebook.py # Auto-fix fixable issues
```

**Run this after every edit.** A clean `marimo check` (no output, exit code 0) means the notebook is structurally valid. Any errors must be fixed before running.
### Running Notebooks

```bash
uvx marimo edit notebook.py # Interactive editor (best for development)
uvx marimo run notebook.py  # Read-only web app
uv run notebook.py          # Script mode (terminal output)
```

### Inspecting Cell Outputs

In `marimo edit`, every cell's output (its last top-level expression) is displayed as rich output below the cell. This is the primary way to introspect API responses:

- **Dicts/lists** render as collapsible JSON trees — click to expand nested fields
- **Polars/Pandas DataFrames** render as interactive sortable tables
- **Strings** render as plain text

To inspect a raw API response, just make it the last expression:

```python
@app.cell
def inspect_response(client, DATAINDEX):
    _resp = client.get(f"{DATAINDEX}/query", params={
        "entity_types": "meeting", "limit": 2,
    })
    _resp.json()  # This gets displayed as a collapsible JSON tree
```

To inspect an intermediate value alongside other output, combine the pieces into a single top-level expression (e.g. with `mo.vstack` and `mo.accordion`) — a bare `mo.md()` on its own line would be silently discarded:

```python
@app.cell
def debug_meetings(meetings, mo):
    # One top-level expression carries both the count and the inspector
    mo.vstack([
        mo.md(f"**Count:** {len(meetings)}"),
        # Show first item structure for inspection
        mo.accordion({"First meeting raw": mo.json(meetings[0])}) if meetings else mo.md("_no meetings_"),
    ])
```
## Notebook Skeleton

Every notebook against InternalAI follows this structure:

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "marimo",
#     "httpx",
#     "polars",
#     "mirascope[openai]",
#     "pydantic",
#     "python-dotenv",
# ]
# ///

import marimo
app = marimo.App()

@app.cell
def params():
    """User parameters — edit these to change the workflow's behavior."""
    SEARCH_TERMS = ["greyhaven"]
    DATE_FROM = "2026-01-01T00:00:00Z"
    DATE_TO = "2026-02-01T00:00:00Z"
    TARGET_PERSON = None  # Set to a name like "Alice" to filter by person, or None for all
    return (DATE_FROM, DATE_TO, SEARCH_TERMS, TARGET_PERSON,)

@app.cell
def config():
    BASE = "http://localhost:42000"
    CONTACTDB = f"{BASE}/contactdb-api"
    DATAINDEX = f"{BASE}/dataindex/api/v1"
    return (CONTACTDB, DATAINDEX,)

@app.cell
def setup():
    from dotenv import load_dotenv
    load_dotenv(".env")  # Load .env from the project root

    import asyncio  # All imports go here — never import inside other cells
    import httpx
    import marimo as mo
    import polars as pl
    from pydantic import BaseModel, Field
    client = httpx.Client(timeout=30)
    return (asyncio, client, mo, pl, BaseModel, Field,)

# --- your IN / ETL / OUT cells here ---

if __name__ == "__main__":
    app.run()
```

> **`load_dotenv(".env")`** reads the `.env` file explicitly by name. This makes `LLM_API_KEY` and other env vars available to `os.getenv()` calls in `lib/llm.py` without requiring the shell to have them pre-set. Always include `python-dotenv` in PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell.

**The `params` cell must always be the first cell** after `app = marimo.App()`. It contains all user-configurable constants (search terms, date ranges, target names, etc.) as plain Python values. This way the user can tweak the workflow by editing a single cell at the top — no need to hunt through the code for hardcoded values.
## Pagination Helper

The DataIndex `GET /query` endpoint paginates with `limit` and `offset`. Always paginate — result sets can be large.

```python
@app.cell
def helpers(client):
    def fetch_all(url, params):
        """Fetch all pages from a paginated DataIndex endpoint."""
        all_items = []
        limit = params.get("limit", 50)
        params = {**params, "limit": limit, "offset": 0}
        while True:
            resp = client.get(url, params=params)
            resp.raise_for_status()
            data = resp.json()
            all_items.extend(data["items"])
            if params["offset"] + limit >= data["total"]:
                break
            params["offset"] += limit
        return all_items

    def resolve_contact(name, contactdb_url):
        """Find a contact by name, return their contact record."""
        resp = client.get(f"{contactdb_url}/api/contacts", params={"search": name})
        resp.raise_for_status()
        contacts = resp.json()["contacts"]
        if not contacts:
            raise ValueError(f"No contact found for '{name}'")
        return contacts[0]

    return (fetch_all, resolve_contact,)
```
## Pattern 1: Emails Involving a Specific Person

Emails have `from_contact_id`, `to_contact_ids`, and `cc_contact_ids`. The query API's `contact_ids` filter matches entities where the contact appears in **any** of these roles.

```python
@app.cell
def find_person(resolve_contact, CONTACTDB):
    target = resolve_contact("Alice", CONTACTDB)
    target_id = target["id"]
    target_name = target["name"]
    return (target_id, target_name,)

@app.cell
def fetch_emails(fetch_all, DATAINDEX, target_id):
    emails = fetch_all(f"{DATAINDEX}/query", {
        "entity_types": "email",
        "contact_ids": str(target_id),
        "date_from": "2025-01-01T00:00:00Z",
        "sort_order": "desc",
    })
    return (emails,)

@app.cell
def email_table(emails, target_id, pl):
    email_df = pl.DataFrame([{
        "date": e["timestamp"][:10],
        "subject": e.get("title", "(no subject)"),
        "direction": (
            "sent" if str(target_id) == str(e.get("from_contact_id"))
            else "received"
        ),
        "snippet": (e.get("snippet") or e.get("text_content") or "")[:100],
    } for e in emails])
    return (email_df,)

@app.cell
def show_emails(email_df, target_name, mo):
    mo.md(f"## Emails involving {target_name} ({len(email_df)} total)")

@app.cell
def display_email_table(email_df):
    email_df  # Renders as interactive table in marimo edit
```
## Pattern 2: Meetings with a Specific Participant

Meetings have a `participants` list where each entry may or may not have a resolved `contact_id`. The query API's `contact_ids` filter only matches **resolved** participants.

**Strategy:** Query by `contact_ids` to get meetings with resolved participants, then optionally do a client-side check on `participants[].display_name` or `transcript` for unresolved ones.

> **Always include `room_name` in meeting tables.** The `room_name` field contains the virtual room name (e.g., `standup-office-bogota`) and often indicates where the meeting took place. It's useful context when `title` is generic or missing — include it as a column alongside `title`.

```python
@app.cell
def fetch_meetings(fetch_all, DATAINDEX, target_id):
    # Get meetings where the target appears in contact_ids
    resolved_meetings = fetch_all(f"{DATAINDEX}/query", {
        "entity_types": "meeting",
        "contact_ids": str(target_id),
        "date_from": "2025-01-01T00:00:00Z",
    })
    return (resolved_meetings,)

@app.cell
def meeting_table(resolved_meetings, pl):
    _rows = []
    for _m in resolved_meetings:
        _participants = _m.get("participants", [])
        _names = [_p["display_name"] for _p in _participants]
        _rows.append({
            "date": (_m.get("start_time") or _m["timestamp"])[:10],
            "title": _m.get("title", "Untitled"),
            "room_name": _m.get("room_name", ""),
            "participants": ", ".join(_names),
            "has_transcript": _m.get("transcript") is not None,
            "has_summary": _m.get("summary") is not None,
        })
    meeting_df = pl.DataFrame(_rows)
    return (meeting_df,)
```
To also find meetings where the person was present but **not resolved** (guest), search the transcript:

```python
@app.cell
def search_unresolved(client, DATAINDEX, target_name):
    # Semantic search for the person's name in meeting transcripts
    _resp = client.post(f"{DATAINDEX}/search", json={
        "search_text": target_name,
        "entity_types": ["meeting"],
        "limit": 50,
    })
    _resp.raise_for_status()
    transcript_hits = _resp.json()["results"]
    return (transcript_hits,)
```
## Pattern 3: Calendar Events → Meeting Correlation

Calendar events and meetings are separate entities from different connectors. To find which calendar events had a corresponding recorded meeting, match by time overlap.

```python
@app.cell
def fetch_calendar_and_meetings(fetch_all, DATAINDEX, my_id):
    events = fetch_all(f"{DATAINDEX}/query", {
        "entity_types": "calendar_event",
        "contact_ids": str(my_id),
        "date_from": "2025-01-01T00:00:00Z",
        "sort_by": "timestamp",
        "sort_order": "asc",
    })
    meetings = fetch_all(f"{DATAINDEX}/query", {
        "entity_types": "meeting",
        "contact_ids": str(my_id),
        "date_from": "2025-01-01T00:00:00Z",
    })
    return (events, meetings,)

@app.cell
def correlate(events, meetings, pl):
    # Cell-local import: safe only while no other cell imports datetime —
    # otherwise move it to the setup cell (see the import rules above)
    from datetime import datetime

    def _parse_dt(s):
        if not s:
            return None
        return datetime.fromisoformat(s.replace("Z", "+00:00"))

    # Index meetings by start_time for matching
    _meeting_by_time = {}
    for _m in meetings:
        _start = _parse_dt(_m.get("start_time"))
        if _start:
            _meeting_by_time[_start] = _m

    _rows = []
    for _ev in events:
        _ev_start = _parse_dt(_ev.get("start_time"))
        if not _ev_start:
            continue

        # Find meeting within 15-min window of calendar event start
        _matched = None
        for _m_start, _m in _meeting_by_time.items():
            if abs((_m_start - _ev_start).total_seconds()) < 900:
                _matched = _m
                break

        _rows.append({
            "date": _ev_start.strftime("%Y-%m-%d"),
            "time": _ev_start.strftime("%H:%M"),
            "event_title": _ev.get("title", "(untitled)"),
            "has_recording": _matched is not None,
            "meeting_title": _matched.get("title", "") if _matched else "",
            "attendee_count": len(_ev.get("attendees", [])),
        })

    calendar_df = pl.DataFrame(_rows)
    return (calendar_df,)
```
## Pattern 4: Full Interaction Timeline for a Person

Combine emails, meetings, and Zulip messages into a single chronological view.

```python
@app.cell
def fetch_all_interactions(fetch_all, DATAINDEX, target_id):
    all_entities = fetch_all(f"{DATAINDEX}/query", {
        "contact_ids": str(target_id),
        "date_from": "2025-01-01T00:00:00Z",
        "sort_by": "timestamp",
        "sort_order": "desc",
    })
    return (all_entities,)

@app.cell
def interaction_timeline(all_entities, pl):
    _rows = []
    for _e in all_entities:
        _etype = _e["entity_type"]
        _summary = ""
        if _etype == "email":
            _summary = _e.get("snippet") or _e.get("title") or ""
        elif _etype == "meeting":
            _summary = _e.get("summary") or _e.get("title") or ""
        elif _etype == "conversation_message":
            _summary = (_e.get("message") or "")[:120]
        elif _etype == "threaded_conversation":
            _summary = _e.get("title") or ""
        elif _etype == "calendar_event":
            _summary = _e.get("title") or ""
        else:
            _summary = _e.get("title") or _e["entity_type"]

        _rows.append({
            "date": _e["timestamp"][:10],
            "type": _etype,
            "source": _e["connector_id"],
            "summary": _summary[:120],
        })

    timeline_df = pl.DataFrame(_rows)
    return (timeline_df,)

@app.cell
def show_timeline(timeline_df, target_name, mo):
    mo.md(f"## Interaction Timeline: {target_name} ({len(timeline_df)} events)")

@app.cell
def display_timeline(timeline_df):
    timeline_df
```
## Pattern 5: LLM Filtering with `lib.llm`

When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the `llm_call` helper from `workflows/lib`. It sends each item to an LLM and parses the response into a typed Pydantic model.

**Prerequisites:** Copy `.env.example` to `.env` and fill in your `LLM_API_KEY`. Add `mirascope`, `pydantic`, and `python-dotenv` to the notebook's PEP 723 dependencies.

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "marimo",
#     "httpx",
#     "polars",
#     "mirascope[openai]",
#     "pydantic",
#     "python-dotenv",
# ]
# ///
```
### Setup cell — load `.env` and import `llm_call`

```python
@app.cell
def setup():
    from dotenv import load_dotenv
    load_dotenv(".env")  # Makes LLM_API_KEY available to lib/llm.py

    import asyncio
    import httpx
    import marimo as mo
    import polars as pl
    from pydantic import BaseModel, Field
    from lib.llm import llm_call
    client = httpx.Client(timeout=30)
    return (asyncio, client, llm_call, mo, pl, BaseModel, Field,)
```

### Define a response model

Create a Pydantic model that describes the structured output you want from the LLM:

```python
@app.cell
def models(BaseModel, Field):
    class RelevanceScore(BaseModel):
        relevant: bool
        reason: str
        score: int  # 0-10

    return (RelevanceScore,)
```
### Filter entities through the LLM

Iterate over fetched entities and call `llm_call` for each one. Since `llm_call` is async, use `asyncio.gather` to process items concurrently:

```python
@app.cell
async def llm_filter(meetings, llm_call, RelevanceScore, mo, asyncio):
    _topic = "Greyhaven"

    async def _score(meeting):
        _text = meeting.get("summary") or meeting.get("title") or ""
        _result = await llm_call(
            prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}",
            response_model=RelevanceScore,
            system_prompt="Score the relevance of this meeting to the given topic. Set relevant=true if score >= 5.",
        )
        return {**meeting, "llm_relevant": _result.relevant, "llm_reason": _result.reason, "llm_score": _result.score}

    scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings])
    relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]]

    mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'")
    return (relevant_meetings,)
```
### Tips for LLM filtering

- **Keep prompts short** — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity.
- **Use structured output** — always pass a `response_model` so you get typed fields back, not free-text.
- **Batch wisely** — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits.
- **Cache results** — LLM calls are slow and cost money. If iterating on a notebook, consider storing scored results in a cell variable so you don't re-score on every edit.
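
The chunked-batch tip can be sketched as plain Python (the `_score` coroutine is a stand-in for an `llm_call`-based scorer; the chunk size is an arbitrary example):

```python
import asyncio

async def _score(item):
    # Placeholder for an llm_call-based scorer — returns the item plus a score
    await asyncio.sleep(0)
    return {**item, "score": 1}

async def gather_in_chunks(items, chunk_size=20):
    """Run _score over items with at most chunk_size concurrent calls per batch."""
    results = []
    for i in range(0, len(items), chunk_size):
        batch = items[i:i + chunk_size]
        results.extend(await asyncio.gather(*[_score(x) for x in batch]))
    return results

scored = asyncio.run(gather_in_chunks([{"id": n} for n in range(45)], chunk_size=20))
```

Inside a marimo cell you would `await gather_in_chunks(...)` in an `async def` cell instead of calling `asyncio.run`.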

## Do / Don't — Quick Reference for LLM Agents

When generating marimo notebooks, follow these rules strictly. Violations cause `MultipleDefinitionError` at runtime.

### Do

- **Prefix cell-local variables with `_`** — `_resp`, `_rows`, `_m`, `_data`, `_chunk`. Marimo ignores `_`-prefixed names so they won't clash across cells.
- **Put all imports in the `setup` cell** and pass them as cell parameters: `def my_cell(client, mo, pl, asyncio):`. Never `import` inside other cells — even `import asyncio` in two async cells causes `MultipleDefinitionError`.
- **Give returned DataFrames unique names** — `email_df`, `meeting_df`, `timeline_df`. Never use a bare `df` that might collide with another cell.
- **Return only values other cells need** — everything else should be `_`-prefixed and stays private to the cell.
- **Import stdlib modules in `setup` too** — even `from datetime import datetime` creates a top-level name. If two cells both import `datetime`, marimo errors. Import it once in `setup` and receive it as a parameter, or use it inside a `_`-prefixed helper function where it's naturally scoped.
- **Every non-utility cell must show a preview** — see the "Cell Output Previews" section below.
- **Use separate display cells for DataFrames** — the build cell returns the DataFrame and shows a `mo.md()` count/heading; a standalone display cell (e.g., `def show_table(df): df`) renders it as an interactive table the user can sort and filter.
- **Include `room_name` when listing meetings** — the virtual room name provides useful context about where the meeting took place (e.g., `standup-office-bogota`). Show it as a column alongside `title`.
- **Keep cell output expressions at the top level** — if a cell conditionally displays a DataFrame, initialize `_output = None` before the `if`/`else`, assign inside the branches, then put `_output` as the last top-level expression. Expressions inside `if`/`else`/`for` blocks are silently ignored by marimo.
- **Put all user parameters in a `params` cell as the first cell** — date ranges, search terms, target names, limits. Never hardcode these values deeper in the notebook.
- **Declare cells as `async def` when using `await`** — `@app.cell` followed by `async def cell_name(...)`. This includes cells using `asyncio.gather`, `await llm_call(...)`, or any async API.
- **Return classes/models from cells that define them** — if a cell defines `class MyModel(BaseModel)`, return it so other cells can use it as a parameter: `return (MyModel,)`.
- **Use `python-dotenv` to load `.env`** — add `python-dotenv` to PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell (before importing `lib.llm`). This ensures `LLM_API_KEY` and other env vars are available without requiring them to be pre-set in the shell.

### Don't

- **Don't define the same variable name in two cells** — even `resp = ...` in cell A and `resp = ...` in cell B is a fatal error.
- **Don't `import` inside non-setup cells** — every `import X` defines a top-level variable `X`. If two cells both `import asyncio`, marimo raises `MultipleDefinitionError` and refuses to run. Put all imports in the `setup` cell and receive them as function parameters.
- **Don't use generic top-level names** like `df`, `rows`, `resp`, `data`, `result` — either prefix with `_` or give them a unique descriptive name.
- **Don't return temporary variables** — if `_rows` is only used to build a DataFrame, keep it `_`-prefixed and only return the DataFrame.
- **Don't use `await` in a non-async cell** — this causes marimo to save the cell as `_unparsable_cell` (a string literal that won't execute). Always use `async def` for cells that call async functions.
- **Don't define classes in a cell without returning them** — a bare `return` or no return makes classes invisible to the DAG. Other cells can't receive them as parameters.
- **Don't put display expressions inside `if`/`else`/`for` blocks** — marimo only renders the last top-level expression. A DataFrame inside an `if` branch is silently discarded. Use the `_output = None` pattern instead (see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level)).
|
||||
|
||||
## Cell Output Previews
|
||||
|
||||
Every cell that fetches, transforms, or produces data **must display a preview** so the user can validate results at each step. The only exceptions are **utility cells** (config, setup, helpers) that only define constants or functions.
|
||||
|
||||
Think from the user's perspective: when they open the notebook in `marimo edit`, each cell should tell them something useful — a count, a sample, a summary. Silent cells that do work but show nothing are hard to debug and validate.
|
||||
|
||||
### What to show

| Cell type | What to preview |
|-----------|----------------|
| API fetch (list of items) | `mo.md(f"**Fetched {len(items)} meetings**")` |
| DataFrame build | The DataFrame itself as last expression (renders as interactive table) |
| Scalar result | `mo.md(f"**Contact:** {name} (id={contact_id})")` |
| Search / filter | `mo.md(f"**{len(hits)} results** matching '{term}'")` |
| Final output | Full DataFrame or `mo.md()` summary as last expression |
### Example: fetch cell with preview

**Bad** — cell runs silently, user sees nothing:

```python
@app.cell
def fetch_meetings(fetch_all, DATAINDEX, my_id):
    meetings = fetch_all(f"{DATAINDEX}/query", {
        "entity_types": "meeting",
        "contact_ids": str(my_id),
    })
    return (meetings,)
```

**Good** — cell shows a count so the user knows it worked:

```python
@app.cell
def fetch_meetings(fetch_all, DATAINDEX, my_id, mo):
    meetings = fetch_all(f"{DATAINDEX}/query", {
        "entity_types": "meeting",
        "contact_ids": str(my_id),
    })
    mo.md(f"**Fetched {len(meetings)} meetings**")
    return (meetings,)
```
### Example: transform cell with table preview

**Bad** — builds DataFrame but doesn't display it:

```python
@app.cell
def build_table(meetings, pl):
    _rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings]
    meeting_df = pl.DataFrame(_rows)
    return (meeting_df,)
```

**Good** — the build cell shows a `mo.md()` count, and a **separate display cell** renders the DataFrame as an interactive table:

```python
@app.cell
def build_table(meetings, pl, mo):
    _rows = [{"date": _m["timestamp"][:10], "title": _m.get("title", "")} for _m in meetings]
    meeting_df = pl.DataFrame(_rows).sort("date")
    mo.md(f"### Meetings ({len(meeting_df)} results)")
    return (meeting_df,)


@app.cell
def show_meeting_table(meeting_df):
    meeting_df  # Renders as interactive sortable table
```
### Separate display cells for DataFrames

When a cell builds a DataFrame, use **two cells**: one that builds and returns it (with a `mo.md()` summary), and a standalone display cell that renders it as a table. This keeps the build logic clean and gives the user an interactive table they can sort and filter in the marimo UI.

```python
# Cell 1: build and return the DataFrame, show a count
@app.cell
def build_sentiment_table(analyzed_meetings, pl, mo):
    _rows = [...]
    sentiment_df = pl.DataFrame(_rows).sort("date", descending=True)
    mo.md(f"### Sentiment Analysis ({len(sentiment_df)} meetings)")
    return (sentiment_df,)


# Cell 2: standalone display — just the DataFrame, nothing else
@app.cell
def show_sentiment_table(sentiment_df):
    sentiment_df
```

This pattern makes every result inspectable. The `mo.md()` cell gives a quick count/heading; the display cell lets the user explore the full data interactively.
### Utility cells (no preview needed)

Config, setup, and helper cells that only define constants or functions don't need previews:

```python
@app.cell
def config():
    BASE = "http://localhost:42000"
    CONTACTDB = f"{BASE}/contactdb-api"
    DATAINDEX = f"{BASE}/dataindex/api/v1"
    return CONTACTDB, DATAINDEX


@app.cell
def helpers(client):
    def fetch_all(url, params):
        ...
    return (fetch_all,)
```

## Tips

- Use `marimo edit` during development to see cell outputs interactively
- Make raw API responses the last expression in a cell to inspect their structure
- Use `polars` over `pandas` for better performance and type safety
- Set `timeout=30` on httpx clients — some queries over large date ranges are slow
- Name cells descriptively — function names appear in the marimo sidebar
364
.agents/skills/project-history/SKILL.md
Normal file
@@ -0,0 +1,364 @@
---
name: project-history
description: Build initial historical timeline for a project. Queries all datasources and creates week-by-week analysis files up to a sync date. Requires project-init to have been run first (datasources.md must exist).
disable-model-invocation: true
argument-hint: [project-name] [date-from] [date-to]
---

# Build Project History

**When to use:** After `/project-init` has been run and the user has reviewed `datasources.md`. This skill gathers historical data and builds the week-by-week timeline.

**Precondition:** `projects/$0/datasources.md` must exist. If it doesn't, run `/project-init $0` first.
## Step 1: Read Datasources

Read `projects/$0/datasources.md` to determine:
- Which Zulip stream IDs and search terms to query
- Which git repository to clone/pull
- Which meeting room names to filter by
- Which entity types to prioritize
## Step 2: Gather Historical Data

Query data for the period `$1` to `$2`.

### A. Query Zulip

For each PRIMARY stream in datasources.md:

```
# Paginate through all threaded conversations
GET /api/v1/query
  entity_types=threaded_conversation
  connector_ids=zulip
  date_from=$1
  date_to=$2
  search={project-search-term}
  limit=100
  offset=0
```
### B. Clone/Pull Git Repository

```bash
# First time
git clone --depth 200 {url} ./tmp/$0-clone
# Or if already cloned
cd ./tmp/$0-clone && git pull

# Extract commit history for the period
git log --since="$1" --until="$2" --format="%H|%an|%ae|%ad|%s" --date=short
git log --since="$1" --until="$2" --format="%an" | sort | uniq -c | sort -rn
```

### C. Query Meeting Recordings

For each PRIMARY meeting room in datasources.md:

```
GET /api/v1/query
  entity_types=meeting
  date_from=$1
  date_to=$2
  room_name={room-name}
  limit=100
```

Also do a semantic search for broader coverage:

```
POST /api/v1/search
  search_text={project-name}
  entity_types=["meeting"]
  date_from=$1
  date_to=$2
  limit=50
```
## Step 3: Analyze by Week

For each week in the period, create a week file. Group the gathered data into calendar weeks (Monday–Sunday).

For each week, analyze:

1. **Key Decisions** — Strategic choices, architecture changes, vendor selections, security responses
2. **Technical Work** — Features developed, bug fixes, infrastructure changes, merges/PRs
3. **Team Activity** — Who was active, new people, departures, role changes
4. **Blockers** — Issues, delays, dependencies

### Week file template

**File:** `projects/$0/timeline/{year-month}/week-{n}.md`

```markdown
# $0 - Week {n}, {Month} {Year}

**Period:** {date-range}
**Status:** [Active/Quiet/Blocked]

## Key Decisions

### Decision Title
- **Decision:** What was decided
- **Date:** {date}
- **Who:** {decision-makers}
- **Impact:** Why it matters
- **Context:** Background

## Technical Work

- [{Date}] {Description} - {Who}

## Team Activity

### Core Contributors
- **Name:** Focus area

### Occasional Contributors
- Name: What they contributed

## Git Activity

**Commits:** {count}
**Focus Areas:**
- Area 1

**Key Commits:**
- Hash: Description (Author)

## Zulip Activity

**Active Streams:**
- Stream: Topics discussed

## Current Blockers

1. Blocker description

## Milestones Reached

If any milestones were completed this week, document with business objective:
- **Milestone:** What was achieved
- **Business Objective:** WHY this matters (search for this in discussions, PRs, meetings)
- **Impact:** Quantifiable results if available

## Next Week Focus

- Priority 1

## Notes

- Context and observations
- Always try to capture the WHY behind decisions and milestones
```
### Categorization principles

**Key Decisions:**
- Technology migrations
- Architecture changes
- Vendor switches
- Security incidents
- Strategic pivots

**Technical Work:**
- Feature implementations
- Bug fixes
- Infrastructure changes
- Refactoring

**Skip Unless Meaningful:**
- Routine check-ins
- Minor documentation updates
- Social chat

### Contributor types

**Core Contributors:** Regular commits (multiple per week), active in technical discussions, making architectural decisions, reviewing PRs.

**Occasional Contributors:** Sporadic commits, topic-specific involvement, testing/QA, feedback only.
## Step 4: Create/Update Timeline Index

**File:** `projects/$0/timeline/index.md`

```markdown
# $0 Timeline Index

## {Year}

### {Quarter}
- [Month Week 1](./{year}-{month}/week-1.md)
- [Month Week 2](./{year}-{month}/week-2.md)

## Key Milestones

| Date | Milestone | Business Objective | Status |
|------|-----------|-------------------|--------|
| Mar 2025 | SQLite → PostgreSQL migration | Improve query performance (107ms→27ms) and enable concurrent access for scaling | Complete |
| Jul 2025 | Chakra UI 3 migration | Modernize UI component library and improve accessibility | Complete |

## Summary by Quarter

### Q{X} {Year}
- **Milestone 1:** What happened + Business objective
- **Milestone 2:** What happened + Business objective
```
## Step 5: Create Project Dashboard (project.md)

**File:** `projects/$0/project.md`

Create the **living document** — the entry point showing current status:

```markdown
# $0 Project

**One-liner:** [Brief description]
**Status:** [Active/On Hold/Deprecated]
**Last Updated:** [Date]

---

## This Week's Focus

### Primary Objective
[What the team is working on right now - from the most recent week]

### Active Work
- [From recent commits and discussions]

### Blockers
- [Any current blockers]

---

## Last Week's Focus

### Delivered
- ✅ [What was completed]

### Decisions Made
- [Key decisions from last week]

---

## Team

### Core Contributors (Active)
| Name | Focus | Availability |
|------|-------|--------------|
| [From git analysis] | [Area] | Full-time/Part-time |

### Occasional Contributors
- [Name] - [Role]

---

## Milestones

### In Progress 🔄
| Milestone | Target | Business Objective |
|-----------|--------|-------------------|
| [Active milestones from the data] | [Date] | [WHY this matters] |

### Recently Completed ✅
| Milestone | Date | Business Objective |
|-----------|------|-------------------|
| [Recently completed] | [Date] | [WHY this mattered] |

### Lost in Sight / Paused ⏸️
| Milestone | Status | Reason |
|-----------|--------|--------|
| [If any] | Paused | [Why] |

---

## Recent Decisions

### Week [N] (Current)
- **[Decision]** - [Context from data]

---

## Quick Links

- [📊 Timeline](./timeline/index.md) - Week-by-week history
- [📋 Background](./background.md) - Project architecture
- [🔌 Data Sources](./datasources.md) - How to gather information

---

*This is a living document. It reflects the current state and changes frequently.*
```

**Fill in from the analyzed data:**
- Team members from git contributors
- Current focus from the most recent week's activity
- Milestones from major features/deployments found in the data
- Recent decisions from meeting transcripts and Zulip discussions
## Step 6: Update Sync State

Update `projects/$0/sync-state.md`:

```markdown
# Sync State

status: history_complete
created_at: {original date}
last_sync_date: $2
initial_history_from: $1
initial_history_to: $2
```
## Common Patterns

### Security Incident
```markdown
### Security Incident: {CVE-ID}
- **Discovered:** {date}
- **Severity:** CRITICAL/HIGH/MEDIUM
- **Who:** {discoverers}
- **Impact:** {description}
- **Actions:**
  1. Immediate fix
  2. Secrets rotated
  3. Monitoring added
```
### Technology Migration
```markdown
### Migration: {Old} -> {New}
- **Decision:** {date}
- **Who:** {decision-makers}
- **Timeline:** {duration}
- **Rationale:** {why} ← Always include the business objective
- **Status:** Complete/In Progress/Planned
```

**Important:** When documenting any milestone or decision, always search for and include the WHY:
- Performance improvements (quantify if possible: "reduced from X to Y")
- Business capabilities enabled ("allows concurrent access for scaling")
- User experience improvements ("improves accessibility")
- Risk mitigation ("addresses security vulnerability")
- Cost reduction ("eliminates cloud dependency")

Look for this context in: meeting recordings, Zulip planning threads, PR descriptions, release notes.
### Team Change
```markdown
### Team: {Name} {Joined/Left/Role Change}
- **Date:** {date}
- **From:** {old role} (if applicable)
- **To:** {new role}
- **Impact:** {on project}
```
## Key Rules

- **Link to sources**: Always reference commit hashes, PR numbers, Zulip topic names, meeting dates
- **Be explicit about exclusions**: Document what streams/sources you're NOT analyzing and why
- **Write once**: Week files are historical records — don't modify them after creation
- **Paginate all queries**: Result sets can be large, so always loop through all pages
264
.agents/skills/project-init/SKILL.md
Normal file
@@ -0,0 +1,264 @@
---
name: project-init
description: Initialize a new project analysis. Creates directory structure, discovers relevant data sources (Zulip streams, git repos, meeting rooms), and writes datasources.md, background.md skeleton, and sync-state.md.
disable-model-invocation: true
argument-hint: [project-name]
---

# Initialize Project Analysis

**When to use:** Starting analysis of a new project. This skill sets up the project structure and discovers data sources. It does NOT gather historical data — use `/project-history` for that after reviewing the datasources.
## Step 1: Create Project Structure

```bash
mkdir -p projects/$0/timeline
```
## Step 2: Discover and Document Data Sources

Investigate what data sources exist for this project. Use the [connectors skill](../connectors/SKILL.md) and [company skill](../company/SKILL.md) for reference.

### Discovery process

1. **Zulip streams**: Search DataIndex for `threaded_conversation` entities matching the project name. Note which stream IDs appear. Cross-reference with the company skill's Zulip channel list to identify primary vs. secondary streams.
2. **Git repositories**: Ask the user for the repository URL, or search Gitea/GitHub if accessible.
3. **Meeting rooms**: Search DataIndex for `meeting` entities matching the project name. Note which `room_name` values appear — these are the relevant meeting rooms.
4. **Search terms**: Identify the project name, key technologies, and domain-specific terms that surface relevant data.
5. **Entity type priority**: Determine which entity types are most relevant (typically `threaded_conversation`, `meeting`, and possibly `email`).

### Write datasources.md

**File:** `projects/$0/datasources.md`

````markdown
# $0 - Data Sources

## Zulip Streams

### PRIMARY Streams (Analyze All)
| Stream ID | Name | Topics | Priority | What to Look For |
|-----------|------|--------|----------|------------------|
| XXX | stream-name | N topics | CRITICAL | Development discussions |

### SECONDARY Streams (Selective)
| Stream ID | Name | Topics to Analyze | Context |
|-----------|------|-------------------|---------|
| YYY | integration-stream | specific-topic | Integration work |

### EXCLUDE
- stream-id-1: reason
- stream-id-2: reason

## Git Repository

**URL:** https://...

**Commands:**
```
git clone {url} ./tmp/$0-clone
cd ./tmp/$0-clone
git log --format="%H|%an|%ae|%ad|%s" --date=short > commits.csv
git log --format="%an|%ae" | sort | uniq -c | sort -rn
```

## Meeting Rooms

### PRIMARY
- room-name: Project-specific discussions

### SECONDARY (Context Only)
- allhands: General updates

### EXCLUDE
- personal-rooms: Other projects

## Search Terms

### Primary
- project-name
- key-technology-1

### Technical
- architecture-term-1

## Entity Types Priority
1. threaded_conversation (Zulip)
2. meeting (recordings)
3. [Exclude: calendar, email, document if not relevant]
````
|
||||
## Step 3: Create Project Dashboard (Living Document)
|
||||
|
||||
**File:** `projects/$0/project.md`
|
||||
|
||||
This is the **entry point** — the living document showing current status.
|
||||
|
||||
```markdown
# $0 Project

**One-liner:** [Brief description]
**Status:** [Active/On Hold/Deprecated]
**Repository:** URL
**Last Updated:** [Date]

---

## This Week's Focus

### Primary Objective
[What the team is working on right now]

### Active Work
- [Current task 1]
- [Current task 2]

### Blockers
- [Any blockers]

---

## Last Week's Focus

### Delivered
- ✅ [What was completed]

### Decisions Made
- [Key decisions from last week]

---

## Team

### Core Contributors (Active)
| Name | Focus | Availability |
|------|-------|--------------|
| [Name] | [Area] | Full-time/Part-time |

### Occasional Contributors
- [Name] - [Role]

---

## Milestones

### In Progress 🔄
| Milestone | Target | Business Objective |
|-----------|--------|-------------------|
| [Name] | [Date] | [WHY this matters] |

### Recently Completed ✅
| Milestone | Date | Business Objective |
|-----------|------|-------------------|
| [Name] | [Date] | [WHY this mattered] |

### Lost in Sight / Paused ⏸️
| Milestone | Status | Reason |
|-----------|--------|--------|
| [Name] | Paused | [Why paused] |

---

## Recent Decisions

### Week [N] (Current)
- **[Decision]** - [Context]

### Week [N-1]
- **[Decision]** - [Context]

---

## Quick Links

- [📊 Timeline](./timeline/index.md) - Week-by-week history
- [📋 Background](./background.md) - Project architecture and details
- [🔌 Data Sources](./datasources.md) - How to gather information
- [⚙️ Sync State](./sync-state.md) - Last sync information

---

*This is a living document. It reflects the current state and changes frequently.*
```
## Step 4: Create Background Skeleton

**File:** `projects/$0/background.md`

Static/architecture information that rarely changes.
```markdown
# $0 - Background

**Type:** [Web app/Mobile app/Library/Service]
**Repository:** URL

## What is $0?

[Brief description of what the project does]

## Architecture

### Components
- Component 1 - Purpose
- Component 2 - Purpose

### Technology Stack
- Technology 1 - Usage
- Technology 2 - Usage

## Data Sources

See: [datasources.md](./datasources.md)

## Timeline Structure

Weekly timeline files are organized in the `timeline/` directory.

## How This Project Is Updated

1. Gather Data: Query Zulip, Git, meetings
2. Update Timeline: Create week-by-week entries
3. Update Project Dashboard: Refresh [project.md](./project.md)

For current status, see: [project.md](./project.md)
```
## Step 5: Create Timeline Index

**File:** `projects/$0/timeline/index.md`

```markdown
# $0 Timeline Index

## Key Milestones

| Date | Milestone | Status |
|------|-----------|--------|
| [To be filled by project-history] | | |

## Summary by Quarter

[To be filled by project-history]
```
## Step 6: Initialize Sync State

**File:** `projects/$0/sync-state.md`

```markdown
# Sync State

status: initialized
created_at: [today's date]
last_sync_date: null
initial_history_from: null
initial_history_to: null
```
## Done

After this skill completes, the user should:
1. **Review `datasources.md`** — confirm the streams, repos, and meeting rooms are correct
2. **Edit `background.md`** — fill in any known project details
3. **Run `/project-history $0 [date-from] [date-to]`** — to build the initial historical timeline
344
.agents/skills/project-sync/SKILL.md
Normal file
@@ -0,0 +1,344 @@
---
name: project-sync
description: Sync a project timeline using subagents for parallelism. Splits work by week and datasource to stay within context limits. Handles both first-time and incremental syncs.
disable-model-invocation: true
argument-hint: [project-name]
---

# Project Sync

**When to use:** Keep a project timeline up to date. Works whether the project has been synced before or not.

**Precondition:** `projects/$0/datasources.md` must exist. If it doesn't, run `/project-init $0` first.
## Architecture: Coordinator + Subagents

This skill is designed for **subagent execution** to stay within context limits. The main agent acts as a **coordinator** that delegates data-intensive work to subagents.

```
Coordinator
├── Phase 1: Gather (parallel subagents, one per datasource)
│   ├── Subagent: Zulip    → writes tmp/$0-sync/zulip.md
│   ├── Subagent: Git      → writes tmp/$0-sync/git.md
│   └── Subagent: Meetings → writes tmp/$0-sync/meetings.md
│
├── Phase 2: Synthesize (parallel subagents, one per week)
│   ├── Subagent: Week 1 → writes timeline/{year-month}/week-{n}.md
│   ├── Subagent: Week 2 → writes timeline/{year-month}/week-{n}.md
│   └── ...
│
└── Phase 3: Finalize (coordinator directly)
    ├── timeline/index.md (add links to new weeks)
    ├── project.md (update living document)
    └── sync-state.md (update sync status)
```

---
## Coordinator Steps

### Step 1: Determine Sync Range

Check whether `projects/$0/sync-state.md` exists.

**Case A — First sync (no sync-state.md):**
Default range is **last 12 months through today**. If the user provided explicit dates as extra arguments (`$1`, `$2`), use those instead.

**Case B — Incremental sync (sync-state.md exists):**
Read `last_sync_date` from `projects/$0/sync-state.md`. Range is `last_sync_date` to today.

### Step 2: Read Datasources

Read `projects/$0/datasources.md` to determine:
- Zulip stream IDs and search terms
- Git repository URL
- Meeting room names
- Entity types to prioritize

### Step 3: Prepare Scratch Directory

```bash
mkdir -p tmp/$0-sync
```

This directory holds intermediate outputs from Phase 1 subagents. It is ephemeral — delete it after the sync completes.

### Step 4: Compute Week Boundaries

Split the sync range into ISO calendar weeks (Monday–Sunday). Produce a list of `(week_number, week_start, week_end, year_month)` tuples. This list drives Phase 2.

---
## Phase 1: Gather Data (parallel subagents)

Launch **one subagent per datasource**, all in parallel. Each subagent covers the **full sync range** and writes its output to a scratch file. The output must be organized by week so Phase 2 subagents can consume it.

### Subagent: Zulip

**Input:** Sync range, PRIMARY stream IDs and search terms from datasources.md.

**Important:** `threaded_conversation` entities only contain the **last 50 messages** in a topic. To get complete message history for a week, you must query `conversation_message` entities.

**Task:** Two-step process for each PRIMARY stream:

**Step 1:** List all thread IDs in the stream using `id_prefix`:
```
GET /api/v1/query
  entity_types=threaded_conversation
  connector_ids=zulip
  id_prefix=zulip:stream:{stream_id}
  limit=100
  offset=0
```

This returns all thread entities (e.g., `zulip:stream:155:topic_name`). Save these IDs.

**Step 2:** For each week in the sync range, query messages from each thread:
```
GET /api/v1/query
  entity_types=conversation_message
  connector_ids=zulip
  parent_id={thread_id}   # e.g., zulip:stream:155:standalone
  date_from={week_start}
  date_to={week_end}
  limit=100
  offset=0
```

Paginate through all messages for each thread/week combination.

**Output:** Write `tmp/$0-sync/zulip.md` with results grouped by week:

```markdown
## Week {n} ({week_start} to {week_end})

### Stream: {stream_name}
- **Topic:** {topic} ({date}, {message_count} messages, {participant_count} participants)
  {brief summary or key quote}
```
### Subagent: Git

**Input:** Sync range, git repository URL from datasources.md.

**Task:**

**Important:** Git commands may fail due to gitconfig permission issues. Use a temporary HOME directory, and re-export it before each command since subagent commands may run in separate shell sessions:

```bash
# Set a temporary HOME to avoid gitconfig permission issues
export HOME=$(pwd)/.tmp-home
mkdir -p "$HOME" ./tmp

# Clone if needed, pull if exists
if [ -d ./tmp/$0-clone ]; then
    export HOME=$(pwd)/.tmp-home && cd ./tmp/$0-clone && git pull
else
    export HOME=$(pwd)/.tmp-home && git clone --depth 500 {url} ./tmp/$0-clone
    cd ./tmp/$0-clone
fi

# Get commits in the date range
export HOME=$(pwd)/.tmp-home && git log --since="{range_start}" --until="{range_end}" --format="%H|%an|%ae|%ad|%s" --date=short

# Get contributor statistics
export HOME=$(pwd)/.tmp-home && git log --since="{range_start}" --until="{range_end}" --format="%an" | sort | uniq -c | sort -rn
```

**Output:** Write `tmp/$0-sync/git.md` with results grouped by week:

```markdown
## Week {n} ({week_start} to {week_end})

**Commits:** {count}
**Contributors:** {name} ({count}), {name} ({count})

### Key Commits
- `{short_hash}` {subject} — {author} ({date})
```
### Subagent: Meetings

**Input:** Sync range, meeting room names from datasources.md.

**Task:** For each PRIMARY room, query meetings and run semantic search:

```
GET /api/v1/query
  entity_types=meeting
  date_from={range_start}
  date_to={range_end}
  room_name={room-name}
  limit=100

POST /api/v1/search
  search_text={project-name}
  entity_types=["meeting"]
  date_from={range_start}
  date_to={range_end}
  limit=50
```

**Output:** Write `tmp/$0-sync/meetings.md` with results grouped by week:

```markdown
## Week {n} ({week_start} to {week_end})

### Meeting: {title} ({date}, {room})
**Participants:** {names}
**Summary:** {brief summary}
**Key points:**
- {point}
```

---
## Phase 2: Synthesize Week Files (parallel subagents)

After all Phase 1 subagents complete, launch **one subagent per week**, all in parallel. Each produces a single week file.

### Subagent: Week {n}

**Input:** The relevant `## Week {n}` sections extracted from each of:
- `tmp/$0-sync/zulip.md`
- `tmp/$0-sync/git.md`
- `tmp/$0-sync/meetings.md`

Pass only the sections for this specific week — do NOT pass the full files.

**Task:** Merge and analyze the data from all three sources. Categorize into:

1. **Key Decisions** — Technology migrations, architecture changes, vendor switches, security incidents, strategic pivots
2. **Technical Work** — Feature implementations, bug fixes, infrastructure changes
3. **Team Activity** — Core vs. occasional contributors, role changes
4. **Blockers** — Issues, delays, dependencies

**Milestones:** When documenting milestones, capture BOTH:
- **WHAT** — The technical achievement (e.g., "PostgreSQL migration")
- **WHY** — The business objective (e.g., "to improve query performance from 107ms to 27ms and enable concurrent access for scaling")

Search for business objectives in: meeting discussions about roadmap, Zulip threads about planning, PR descriptions, release notes, and any "why are we doing this" conversations.

**Skip unless meaningful:** Routine check-ins, minor documentation updates, social chat.

**Output:** Write `projects/$0/timeline/{year-month}/week-{n}.md` using the week file template from [project-history](../project-history/SKILL.md). Also return a **3-5 line summary** to the coordinator for use in Phase 3.

Create the month directory first if needed: `mkdir -p projects/$0/timeline/{year-month}`

---
## Phase 3: Finalize (coordinator directly)
|
||||
|
||||
The coordinator collects the summaries returned by all Phase 2 subagents. These summaries are small enough to fit in the coordinator's context.
|
||||
|
||||
### Step 5: Update Timeline Index
|
||||
|
||||
Add links to new week files in `projects/$0/timeline/index.md`. Append entries under the appropriate year/quarter sections. Update milestones if any were reached.
|
||||
|
||||
### Step 6: Update Project Dashboard (project.md)
|
||||
|
||||
**File:** `projects/$0/project.md`
|
||||
|
||||
This is the **living document** — update it with current status from the week summaries:
|
||||
|
||||
**Update these sections:**
|
||||
|
||||
1. **This Week's Focus** - What the team is actively working on now
|
||||
2. **Last Week's Focus** - What was completed in the most recent week
|
||||
3. **Team** - Current contributors and their focus areas
|
||||
4. **Milestones** - Update status and add new ones with business objectives
|
||||
5. **Recent Decisions** - Key decisions from the last 2-3 weeks
|
||||
|
||||
**Milestone Format:**
|
||||
```markdown
|
||||
### In Progress 🔄
|
||||
| Milestone | Target | Business Objective |
|
||||
|-----------|--------|-------------------|
|
||||
| Standalone deployment | Feb 2026 | Enable non-developers to self-host without complex setup |
|
||||
|
||||
### Recently Completed ✅
|
||||
| Milestone | Date | Business Objective |
|
||||
|-----------|------|-------------------|
|
||||
| PostgreSQL migration | Mar 2025 | Improve performance (107ms→27ms) and enable scaling |
|
||||
|
||||
### Lost in Sight / Paused ⏸️
|
||||
| Milestone | Status | Reason |
|
||||
|-----------|--------|--------|
|
||||
| Feature X | Paused | Resources reallocated to higher priority |
|
||||
```
|
||||
|
||||
**Note:** Milestones in this company change frequently — update status (in progress/done/paused) as needed.
|
||||
|
||||
### Step 7: Update Sync State
|
||||
|
||||
Create or update `projects/$0/sync-state.md`:
|
||||
|
||||
**First sync (Case A):**
|
||||
|
||||
```markdown
|
||||
# Sync State
|
||||
|
||||
status: synced
|
||||
created_at: {today's date}
|
||||
last_sync_date: {today's date}
|
||||
initial_history_from: {range_start}
|
||||
initial_history_to: {range_end}
|
||||
last_incremental_sync: {today's date}
|
||||
```
|
||||
|
||||
**Incremental sync (Case B):**
|
||||
|
||||
```markdown
|
||||
# Sync State
|
||||
|
||||
status: synced
|
||||
created_at: {original value}
|
||||
last_sync_date: {today's date}
|
||||
initial_history_from: {original value}
|
||||
initial_history_to: {original value}
|
||||
last_incremental_sync: {today's date}
|
||||
```
|
||||
|
||||
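As an illustrative sketch (the helper names are hypothetical), the Case A/B rule amounts to: on an incremental sync only the two sync timestamps move, while `created_at` and the `initial_history_*` bounds keep their original values:

```python
from datetime import date

def read_sync_state(text: str) -> dict:
    """Parse the simple 'key: value' lines of sync-state.md."""
    state = {}
    for line in text.splitlines():
        if ": " in line and not line.startswith("#"):
            key, value = line.split(": ", 1)
            state[key.strip()] = value.strip()
    return state

def incremental_update(state: dict) -> dict:
    """Case B: refresh only the sync timestamps; history bounds stay fixed."""
    today = date.today().isoformat()
    return {**state, "last_sync_date": today, "last_incremental_sync": today}

# Hypothetical existing sync-state.md content
existing = """# Sync State

status: synced
created_at: 2025-01-05
last_sync_date: 2025-02-01
initial_history_from: 2024-10-01
initial_history_to: 2025-01-04
last_incremental_sync: 2025-02-01
"""
state = incremental_update(read_sync_state(existing))
print(state["initial_history_from"])  # → 2024-10-01 (unchanged original value)
```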
### Step 8: Cleanup

```bash
rm -rf tmp/$0-sync
```

### Step 9: Summary Report

Output a brief summary:

```markdown
## Sync Summary: {Date}

### Period Covered
{range_start} to {range_end}

### Key Changes
1. Decision: {brief description}
2. Feature: {what was built}
3. Team: {who joined/left}

### Metrics
- {n} new commits
- {n} active contributors
- {n} weeks analyzed
- {n} new Zulip threads
- {n} meetings recorded

### Current Status
[Status description]
```

---

## Key Rules

- **Link to sources**: Always reference commit hashes, PR numbers, Zulip topic names, meeting dates
- **Be explicit about exclusions**: Document what you're NOT analyzing and why
- **Write once**: Week files are historical records — don't modify existing ones, only create new ones
- **Paginate all queries**: Always loop through all pages of results
- **Distinguish contributor types**: Core (regular activity) vs. occasional (sporadic)
- **Subagent isolation**: Each subagent should be self-contained. Pass only the data it needs — never the full scratch files
- **Fail gracefully**: If a datasource subagent fails (e.g., git clone errors, API down), the coordinator should continue with available data and note the gap in the summary

105
.agents/skills/workflow/SKILL.md
Normal file
@@ -0,0 +1,105 @@
---
name: workflow
description: Create a marimo notebook for data analysis. Use when the request involves analysis over time periods, large data volumes, or when the user asks to "create a workflow".
disable-model-invocation: true
argument-hint: [topic]
---

# Workflow — Create a Marimo Notebook

## When to create a marimo notebook

Any request that involves **analysis over a period of time** (e.g., "meetings this month", "emails since January", "interaction trends") is likely to return a **large volume of data** — too much to process inline. In these cases, **always produce a marimo notebook** (a `.py` file following the patterns in the [notebook-patterns skill](.agents/skills/notebook-patterns/SKILL.md)).

Also create a notebook when the user asks to "create a workflow", "write a workflow", or "build an analysis".

If you're unsure whether a question is simple enough to answer directly or needs a notebook, **ask the user**.

## Always create a new workflow

When the user requests a workflow, **always create a new notebook file**. Do **not** modify or re-run an existing workflow unless the user explicitly asks you to (e.g., "update workflow 001", "fix the sentiment notebook", "re-run the existing analysis"). Each new request gets its own sequentially numbered file — even if it covers a similar topic to an earlier workflow.

## File naming and location

All notebooks go in the **`workflows/`** directory. Use a sequential number prefix so workflows stay ordered by creation:

```
workflows/<NNN>_<topic>_<scope>.py
```

- `<NNN>` — zero-padded sequence number (`001`, `002`, …). Look at existing files in `workflows/` to determine the next number.
- `<topic>` — what is being analyzed, in snake_case (e.g., `greyhaven_meetings`, `alice_emails`, `hiring_discussions`)
- `<scope>` — time range or qualifier (e.g., `january`, `q1_2026`, `last_30d`, `all_time`)

**Examples:**

```
workflows/001_greyhaven_meetings_january.py
workflows/002_alice_emails_q1_2026.py
workflows/003_hiring_discussions_last_30d.py
workflows/004_team_interaction_timeline_all_time.py
```

**Before creating a new workflow**, list existing files in `workflows/` to find the highest number and increment it.
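The highest-number scan can be sketched as follows (the helper name is illustrative, and the demo uses a throwaway directory in place of `workflows/`):

```python
import re
import tempfile
from pathlib import Path

def next_workflow_number(workflow_dir: Path) -> str:
    """Find the highest NNN_ prefix among existing notebooks; 001 if none."""
    numbers = [
        int(m.group(1))
        for p in workflow_dir.glob("*.py")
        if (m := re.match(r"(\d{3})_", p.name))
    ]
    return f"{max(numbers, default=0) + 1:03d}"

# Demo against a temporary directory standing in for workflows/
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "001_greyhaven_meetings_january.py").touch()
    (d / "002_alice_emails_q1_2026.py").touch()
    print(next_workflow_number(d))  # → 003
```

Files without a three-digit prefix are ignored, so stray scratch files in the directory don't break the sequence.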
## Plan before you implement

Before writing any notebook, **always propose a plan first** and get the user's approval. The plan should describe:

1. **Goal** — What question are we answering?
2. **Data sources** — Which entity types and API endpoints will be used?
3. **Algorithm / ETL steps** — Step-by-step description of the data pipeline: what gets fetched, how it's filtered, joined, or aggregated, and what the final output looks like.
4. **Output format** — Table columns, charts, or summary statistics the user will see.

Only proceed to implementation after the user confirms the plan.

## Validate before delivering

After writing or editing a notebook, **always run `uvx marimo check`** to verify it has no structural errors (duplicate variables, undefined names, branch expressions, etc.):

```bash
uvx marimo check workflows/NNN_topic_scope.py
```

A clean check (no output, exit code 0) means the notebook is valid. Fix any errors before delivering the notebook to the user.

## Steps

1. **Identify people** — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /api/contacts/me`.
2. **Find data** — Use DataIndex `GET /query` (exhaustive, paginated) or `POST /search` (semantic, ranked) with `contact_ids`, `entity_types`, `date_from`/`date_to`, `connector_ids` filters.
3. **Analyze** — For simple answers, process the API response directly. For complex multi-step analysis, build a marimo notebook (see the [notebook-patterns skill](.agents/skills/notebook-patterns/SKILL.md) for detailed patterns).

## Quick Example (Python)

> "Find all emails involving Alice since January"

```python
import httpx

CONTACTDB = "http://localhost:42000/contactdb-api"
DATAINDEX = "http://localhost:42000/dataindex/api/v1"
client = httpx.Client(timeout=30)

# 1. Resolve "Alice" to a contact_id
resp = client.get(f"{CONTACTDB}/api/contacts", params={"search": "Alice"})
alice_id = resp.json()["contacts"][0]["id"]  # e.g. 42

# 2. Fetch all emails involving Alice (with pagination)
emails = []
offset = 0
while True:
    resp = client.get(f"{DATAINDEX}/query", params={
        "entity_types": "email",
        "contact_ids": str(alice_id),
        "date_from": "2025-01-01T00:00:00Z",
        "limit": 50,
        "offset": offset,
    })
    data = resp.json()
    emails.extend(data["items"])
    if offset + 50 >= data["total"]:
        break
    offset += 50

print(f"Found {len(emails)} emails involving Alice")
```