Compare commits: 25 commits, b18ee3b564...mathieu/sk
.agents/skills/checkout/SKILL.md (new file, 52 lines)
@@ -0,0 +1,52 @@
---
name: checkout
description: Build a weekly checkout/review covering Sunday through today. Gathers meetings, emails, Zulip conversations, and Gitea activity, then produces a structured summary.
disable-model-invocation: true
---

# Weekly Review Builder

Build my weekly checkout covering Sunday through today.

1. **Get my identity** with `contactdb_get_me` to obtain my contact_id
2. **Determine date range**: Sunday to today (use `date -d "last sunday" +%Y-%m-%d`)
3. **Gather activity in parallel**:
   - **Dataindex**: Launch **one subagent per day** (Sunday through today). Each subagent should query `dataindex_query_entities` for that specific day with my contact_id, looking for meetings, calendar events, emails, and documents. Return a day-by-day summary.
   - **Threaded Conversations**: Launch **one subagent per day** (Sunday through today). Each subagent should:
     1. Query `dataindex_query_entities` for entity_type `threaded_conversation` for that specific day with my contact_id
     2. For each conversation found, fetch all `conversation_message` entities using the conversation ID as the parent_id filter
     3. Return messages I participated in, with context
   - **Gitea**: Launch one subagent to run `~/bin/gitea-activity -s START -e END` and extract commits, PRs (opened/merged/approved), and repositories worked on
4. **Query dataindex directly** for the full week as a backup to ensure nothing is missed
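Step 2's range computation can be sketched in shell (assumes GNU `date`; the `gitea-activity` invocation from step 3 is shown commented, since that helper lives outside this repo):

```shell
# Compute the Sunday-to-today range for the weekly checkout (GNU date).
START=$(date -d "last sunday" +%Y-%m-%d)
END=$(date +%Y-%m-%d)
echo "Range: $START .. $END"
# Step 3 then passes these to the Gitea helper:
# ~/bin/gitea-activity -s "$START" -e "$END"
```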

**Build the checkout with this structure:**

```
# Weekly Review: [Date Range]

## Objectives
- List 2-3 high-level goals for the week based on the main themes of work

****Major Achievements****
- Bullet points of concrete deliverables, grouped by theme
- Focus on shipped features, solved problems, and infrastructure built

****Code Activity****
- Stats line: X commits across Y repositories, Z PRs total (N merged, M open)
- **New Repositories**: `[name](url)` - brief description
- **Pull Requests Merged**: `[#N Title](url)` - one per line with a descriptive title
- **Pull Requests Opened (not merged)**: `[#N](url)` - include status if known (approved, draft, etc.)

****Team Interactions****
- **Meeting Type (Nx)**: Brief description of purpose/outcome
  With: Key participants
- **Notable conversations**: Date, participants, main subject discussed
```

**Rules:**
- Use `****Title****` format for section headers (not ##)
- All PRs and repositories must be markdown links `[name](url)`
- List merged PRs first, then open/unmerged ones
- Only include meaningful interactions (skip routine standups unless notable decisions were made)
- No "who am I" header, no summary section at the end
- Focus on outcomes and business value, not just activity volume
@@ -1,3 +1,9 @@
---
name: company
description: Monadical company context. Use when you need to understand the organization structure, Zulip stream layout, communication tools, meeting/calendar relationships, or internal product names.
user-invocable: false
---

# Company Context

## About Monadical

@@ -1,3 +1,9 @@
---
name: connectors
description: Reference for all data connectors and their entity type mappings. Use when determining which connector produces which entity types, understanding connector-specific fields, or choosing the right data source for a query.
user-invocable: false
---

# Connectors and Data Sources

Each connector ingests data from an external source into DataIndex. Connectors run periodic background syncs to keep data fresh.

@@ -1,3 +1,9 @@
---
name: contactdb
description: ContactDB REST API reference. Use when resolving people to contact_ids, searching contacts by name/email, or accessing relationships, notes, and platform identities.
user-invocable: false
---

# ContactDB API Reference

ContactDB is the people directory. It stores contacts, their platform identities, relationships, notes, and links. Every person across all data sources resolves to a single ContactDB `contact_id`.

@@ -1,3 +1,9 @@
---
name: dataindex
description: DataIndex REST API reference. Use when querying unified data (emails, meetings, calendar events, Zulip conversations, documents) via GET /query, POST /search, or GET /entities/{id}.
user-invocable: false
---

# DataIndex API Reference

DataIndex aggregates data from all connected sources (email, calendar, Zulip, meetings, documents) into a unified query interface. Every piece of data is an **entity** with a common base structure plus type-specific fields.

@@ -151,6 +157,7 @@ Use when you need **all** entities matching specific criteria. Supports pagination.
| `date_to` | string | ISO datetime upper bound |
| `search` | string? | Text filter on content fields |
| `parent_id` | string? | Filter by parent entity |
| `id_prefix` | string? | Filter entities by ID prefix (e.g., `zulip:stream:155`) |
| `thread_id` | string? | Filter emails by thread ID |
| `room_name` | string? | Filter meetings by room name |
| `limit` | int | Max results per page (default 50) |

@@ -171,7 +178,7 @@ Use when you need **all** entities matching specific criteria. Supports pagination.
}
```

-**Pagination:** loop with offset increments until `offset >= total`. See [notebook-patterns.md] for a reusable helper.
+**Pagination:** loop with offset increments until `offset >= total`. See the [notebook-patterns skill](.agents/skills/notebook-patterns/SKILL.md) for a reusable helper.
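The offset loop can be sketched in Python; `fetch_page` is a hypothetical stand-in for whatever client function issues the GET /query call, and the `results`/`total` response keys are assumed to match the paginated response shape:

```python
def fetch_all(fetch_page, limit=50):
    """Collect every matching entity by paging until offset >= total."""
    results, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        results.extend(page["results"])
        offset += limit
        if offset >= page["total"]:
            return results
```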

### POST `/api/v1/search` — Semantic Search

@@ -214,5 +221,3 @@ Get sync status for all connectors (last sync time, entity count, health).
| All documents | `document` + `hedgedoc` |
| Chat messages mentioning someone | `conversation_message` + `zulip`, with contact_id |
| What was discussed about X? | Use `POST /search` with `search_text` |
-
-[notebook-patterns.md]: ./notebook-patterns.md
.agents/skills/find-investor-meetings/SKILL.md (new file, 311 lines)
@@ -0,0 +1,311 @@
---
name: find-investor-meetings
description: Analyze reflector meeting transcripts to identify conversations with potential investors. Extracts investor identity, implied next steps, and key quotes. Generates a markdown report with findings.
user-invocable: true
argument-hint: [start-date]
---

# Find Investor Meetings

Analyze reflector meeting transcripts to identify conversations with potential investors. For each investor meeting found, extract who the investor was, the implied next steps, and direct quotes.

**When to use:** You need to review your fundraising history, track investor conversations, or identify which meetings were with potential investors.

---

## Prerequisites

- DataIndex API access
- Reflector connector enabled

---

## Workflow

### Step 1: Get User Identity

First, get the current user's contact_id so we can filter for meetings they participated in.

```python
contactdb_get_me()
```

Extract the `id` field from the response - this is your `contact_id`.

### Step 2: Query Reflector Meetings

Fetch all reflector meetings from the specified start date to today, filtered to only include meetings where you participated.

```
GET /dataindex/api/v1/query
    ?entity_types=meeting
    &connector_ids=reflector
    &contact_ids={your_contact_id}
    &date_from={start-date}
    &date_to={today}
    &limit=100
    &sort_by=timestamp
    &sort_order=asc
```

**Note:** The `contact_ids` filter ensures we only analyze meetings you actually participated in.

### Step 3: Launch Parallel Subagents

For each meeting found, launch one subagent to analyze the transcript. Subagents run in parallel for efficiency.

**Subagent Task Template:**

````
Your task is to analyze a reflector meeting transcript to determine whether it was with a potential investor.

Meeting to analyze:
- entity_id: reflector:{meeting_id}
- title: {title}
- date: {date}
- room: {room_name}
- participants: {participant_list}

Fetch the full transcript using:
```python
dataindex_get_entity_by_id(
    entity_id="reflector:{meeting_id}",
    include_raw_data=true,
    max_content_length=null
)
```

**Analyze for investor meeting indicators:**

Direct indicators:
- Terms: investment, investor, fundraising, valuation, term sheet, due diligence, funding, equity, stake, capital, VC, venture capital, angel investor, seed round, Series A, pitch deck, runway, burn rate, traction, market size

Indirect indicators (implied investment interest):
- Questions about: scalability, revenue model, market opportunity, competitive advantage, team background, customer traction, unit economics, go-to-market strategy
- Discussion of: financial projections, growth metrics, exit strategy, ownership percentage, board composition
- Mentions of: portfolio companies, investment thesis, follow-on funding, pro-rata rights

**Return JSON:**
{
  "is_investor_meeting": true/false,
  "confidence": "high/medium/low",
  "investor_name": "Name/Organization or null",
  "investor_type": "VC/Angel/Strategic/Unknown or null",
  "summary": "1-paragraph summary of who the investor was and their interest",
  "implied_next_steps": "What was implied as the next action after the call",
  "investor_quote": "Direct quote showing investor interest, concern, or feedback (use actual words from transcript)",
  "meeting_url": "https://reflector.monadical.com/transcripts/{meeting_id}"
}

If not an investor meeting, return:
{
  "is_investor_meeting": false,
  "confidence": "high",
  "investor_name": null,
  "investor_type": null,
  "summary": "Brief explanation of what this meeting was about",
  "implied_next_steps": null,
  "investor_quote": null,
  "meeting_url": "https://reflector.monadical.com/transcripts/{meeting_id}"
}

**Important:**
- Only mark as an investor meeting if there are clear indicators (high confidence)
- Internal discussions ABOUT fundraising preparation are NOT investor meetings
- Max/founders reporting on investor activities to the team is NOT an investor meeting
- The investor must be an external party, not an internal team member
- Extract actual quotes from the transcript, don't paraphrase
````

### Step 4: Compile Results

Collect all subagent responses and categorize:

**Investor Meetings:** Meetings where `is_investor_meeting: true`
**Non-Investor Meetings:** Meetings where `is_investor_meeting: false`
**Inconclusive:** Meetings with `confidence: low` - flag for manual review
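The three-way split above is mechanical; a minimal sketch, assuming each subagent returns a dict with the JSON fields from step 3:

```python
def categorize(responses):
    """Split subagent JSON responses into investor / non-investor / inconclusive."""
    buckets = {"investor": [], "non_investor": [], "inconclusive": []}
    for r in responses:
        if r["confidence"] == "low":
            buckets["inconclusive"].append(r)  # flag for manual review
        elif r["is_investor_meeting"]:
            buckets["investor"].append(r)
        else:
            buckets["non_investor"].append(r)
    return buckets
```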

### Step 5: Generate Report

Write findings to `investor-meetings-analysis-{start-date}-to-{today}.md`.

**Report Structure:**

```markdown
# Investor Meeting Analysis

**Analysis Period:** {start-date} to {today}
**Total Meetings Analyzed:** {count}
**Investor Meetings Found:** {count}

---

## Summary

[Brief summary of findings - 2-3 sentences]

### Key Insights

[If investor meetings found:]
- Most recent investor conversation: {date} with {investor_name}
- Investor types: {breakdown by VC/Angel/Strategic}
- Common themes: {recurring topics across meetings}
- Next steps status: {summary of follow-up actions}

[If no investor meetings found:]
- No external investor conversations were captured in this date range
- Consider: meetings may have occurred on other platforms (Zoom, Google Meet), in person, or in reflector rooms not covered
- {number} meetings were internal preparation for fundraising

---

## Investor Meetings

### {Date}: {Investor Name} ({Investor Type})

**Meeting:** [{Title}]({meeting_url})
**Date:** {date}
**Room:** {room_name}

**Summary:**
{1-paragraph summary of who the investor was and what their interest/feedback was}

**Implied Next Steps:**
{What was discussed as the next action after this call}

**Key Quote:**
> "{investor_quote}"

---

[Repeat for each investor meeting found]

---

## Internal Fundraising Preparation

The following meetings focused on preparing for investor outreach but were not actual investor conversations:

| Date | Title | Activity |
|------|-------|----------|
| {date} | [{title}]({url}) | {brief description of preparation activity} |

---

## All Meetings Analyzed

| Date | Title | Room | Investor Meeting? | Notes |
|------|-------|------|-------------------|-------|
| {date} | [{title}]({url}) | {room} | {Yes/No} | {brief note} |

---

*Analysis completed: {date}*
*Data Source: Reflector meeting transcripts via DataIndex API*
```

---

## Key Rules

1. **Only external investor meetings count** - Internal discussions about fundraising, pitch practice sessions, and team reports about investor activities are NOT investor meetings

2. **High confidence threshold** - Only mark as an investor meeting with clear indicators. "Mentions investor" in the context of CRM development is NOT an investor meeting

3. **Extract actual quotes** - Use the investor's exact words from the transcript, not paraphrases

4. **Include context** - When investor meetings are NOT found, document the preparation activities that were happening (CRM building, demo prep, pitch practice)

5. **Be specific about gaps** - If no investor meetings are found, explain likely reasons (other platforms, in-person only, reflector not recording)

---

## Example Output

When investor meetings are found:

```markdown
## Investor Meetings

### 2025-12-15: Acme Ventures (VC)

**Meeting:** [GreyHaven Platform Demo](https://reflector.monadical.com/transcripts/abc123)
**Date:** December 15, 2025
**Room:** demo-room

**Summary:**
Acme Ventures, a Series A-focused venture capital firm, expressed strong interest in GreyHaven's data sovereignty platform. The partner was particularly impressed by the local-first architecture and enterprise security features. They asked detailed questions about go-to-market strategy and current traction.

**Implied Next Steps:**
Send detailed financial projections and customer pipeline by end of week. Schedule a follow-up call with their technical partner for an architecture review.

**Key Quote:**
> "This is exactly the kind of infrastructure play we're looking for. The timing is right with all the data privacy regulations coming online. Can you get us those projections by Friday?"

---

### 2025-11-28: Sarah Chen (Angel)

**Meeting:** [Initial Introduction](https://reflector.monadical.com/transcripts/def456)
**Date:** November 28, 2025
**Room:** max-office

**Summary:**
Sarah Chen, former CTO of DataCorp and an active angel investor, met with Max for an introductory conversation. She's interested in the space and wanted to understand the technical differentiation from Palantir and other competitors.

**Implied Next Steps:**
Send the product demo video and technical architecture overview. She will intro us to two portfolio companies that might be pilot customers.

**Key Quote:**
> "I've been looking for something in this space. Your approach to data sovereignty is novel. Send me the demo and let's talk again after the holidays."
```

When NO investor meetings are found:

```markdown
## Summary

After analyzing 47 reflector meetings from December 1, 2025 to January 29, 2026, **no meetings with external investors were identified**. All meetings were internal team discussions.

### Key Finding: Preparation for Investor Meetings

While no actual investor meetings were captured, several meetings focused on preparing for investor outreach:

1. **Max practicing his GreyHaven pitch** (Jan 6, 2026) - Internal all-hands where Max rehearsed his investor presentation
2. **Demo preparation discussions** (Jan 26-29, 2026) - Multiple meetings about creating demo materials for investor presentations
3. **CRM pipeline development** (Dec 17-19, 2025) - Building systems to track investor leads and fundraising progress

### Why No Investor Meetings?

Actual investor meetings may have occurred:
- On other platforms (Zoom, Google Meet, Microsoft Teams)
- In person without reflector recording
- In reflector rooms not captured in the query
- Before the specified date range
```

---

## Dependencies

- [dataindex skill](../dataindex/SKILL.md) - Meeting queries and transcript retrieval
- [contactdb skill](../contactdb/SKILL.md) - Participant name resolution

---

## Usage Examples

**Default (last 90 days):**
```
/find-investor-meetings
```

**Specific start date:**
```
/find-investor-meetings 2025-12-01
```

**Full year analysis:**
```
/find-investor-meetings 2025-01-01
```
.agents/skills/init-greyhaven/SKILL.md (new file, 248 lines)
@@ -0,0 +1,248 @@
---
name: init-greyhaven
description: Build comprehensive Greyhaven business documentation by analyzing Reflector meeting transcripts and Zulip threads from the last 4 months via the DataIndex API, using parallel subagents to extract business context, positioning, product details, and open items from authoritative sources (Corey Gallon, Max, Jordan). Only Zulip threads and Reflector transcripts are used.
user-invocable: true
---

# Initialize Greyhaven Business Context

## What I do

- Query DataIndex for all-hands meetings and Greyhaven-related Zulip threads from the last 4 months (only Zulip threads and Reflector transcripts)
- Launch parallel subagents to analyze transcripts and threads for Greyhaven/data-sovereignty content
- Synthesize findings into a structured business document covering vision, positioning, product, marketing, and open items
- Prioritize statements from **Corey Gallon**, **Max**, and **Jordan (jordan@monadical.com)**

## When to use me

Use this when you need to create or update comprehensive company documentation for Greyhaven, extract business context from meetings and Zulip threads, or synthesize information across multiple data sources into a single reference document.

## Prerequisites

- DataIndex API access (meeting and threaded_conversation entities)
- ContactDB for resolving participant names
- Subagent capability for parallel analysis

## Data Sources (Explicit Connector Requirements)

This skill ONLY uses these two connectors - do not query any others:

| Connector | Entity Type | Use For |
|-----------|-------------|---------|
| `reflector` | `meeting` | Meeting recordings with transcripts |
| `zulip` | `threaded_conversation` | Zulip topic threads |

**⚠️ NEVER use:** `ics_calendar`, `mbsync_email`, `hedgedoc`, `babelfish`, `browser_history`, or other connectors.

## Workflow

### Step 1: Query Reflector all-hands meetings

**REQUIRED QUERY PATTERN:**
```
GET /dataindex/api/v1/query
    ?entity_types=meeting
    &connector_ids=reflector
    &room_name=allhands
    &date_from={4_months_ago}
    &date_to={today}
    &limit=100
```

**Entity type:** `meeting`
**Connector:** `reflector` (MUST be specified)
**Room filter:** `room_name=allhands` (MUST be specified - only all-hands meetings)
**Date range:** Last 4 months (120 days)

**Key fields to extract:**
- `transcript` - Full meeting transcript text
- `summary` - AI-generated summary
- `room_name` - Should be "allhands" (verified by the filter)
- `participants` - List of attendees
- `start_time`, `end_time` - Meeting timestamps

**⚠️ CRITICAL:**
- Only use `connector_ids=reflector`
- MUST filter by `room_name=allhands` - this ensures we only get company all-hands meetings
- Never use other meeting sources or other room types

### Step 2: Fetch full meeting transcripts

For each meeting, fetch the complete entity:

```
GET /dataindex/api/v1/entities/{meeting_id}
    ?include_raw_data=false
```

If a transcript exceeds 200KB, save it to a temp file and pass the file path to subagents.
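One way to sketch the 200KB handoff in Python (the helper name, threshold constant, and temp-file naming are illustrative, not part of the skill's API):

```python
import os
import tempfile

def stage_transcript(transcript: str, threshold: int = 200_000) -> str:
    """Return the transcript inline if it is small enough, else write it
    to a temp file and return the path for subagents to read."""
    if len(transcript.encode("utf-8")) <= threshold:
        return transcript
    fd, path = tempfile.mkstemp(prefix="transcript-", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(transcript)
    return path
```

Step 9's cleanup then removes any files this creates.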

### Step 3: Analyze meetings in parallel

Launch one subagent per meeting to scan for Greyhaven/data-sovereignty mentions.

**Subagent prompt template:**
```
Analyze this meeting transcript for mentions of "greyhaven", "data sovereignty",
or related topics (data privacy, data ownership, data control, local data storage).

Meeting: {meeting_id} | Date: {timestamp} | Title: {title}
Transcript: {transcript_content}

Return JSON:
{
  "is_relevant": true/false,
  "quotes": [{"timestamp": "HH:MM:SS", "speaker": "Name", "quote": "exact text"}],
  "explanation": "Why this is relevant"
}
```

Launch all subagents simultaneously.

### Step 4: Query Zulip threaded conversations

**REQUIRED QUERY PATTERN:**
```
GET /dataindex/api/v1/query
    ?entity_types=threaded_conversation
    &connector_ids=zulip
    &date_from={4_months_ago}
    &date_to={today}
    &limit=100
```

**Entity type:** `threaded_conversation` (NOT `conversation` or `conversation_message`)
**Connector:** `zulip` (MUST be specified)
**Date range:** Last 4 months (120 days)

**Why `threaded_conversation`:**
- Represents a **Zulip topic thread** (grouped messages under a topic)
- Contains `recent_messages` with full thread content
- Better for analysis than individual `conversation_message` entities

**High-priority streams to search:**
- `[greyhaven] marketing` (stream 212)
- `[greyhaven] branding` (stream 206)
- `[greyhaven] leads` (stream 208)
- `InternalAI` (stream 193)
- `engineering` - selfhostyour.tech topic
- `bizdev + marketing`

**⚠️ CRITICAL:** Only use `connector_ids=zulip` and `entity_types=threaded_conversation`. Never use other chat connectors like `babelfish`.

### Step 5: Filter threads by relevance

Launch a subagent to rank threads and return the 20-30 most relevant, scored by Greyhaven/data-sovereignty content and message activity.

### Step 6: Fetch and analyze top threads

For each high-priority thread, fetch the full content via `GET /dataindex/api/v1/entities/{thread_id}?include_raw_data=false`, then launch parallel subagents to extract:

1. Substantive Greyhaven mentions
2. Data sovereignty content
3. Key decisions or debates
4. Quotes from Max, Corey Gallon, or Jordan Wimmer

### Step 7: Synthesize into categories

Launch four specialized subagents in parallel, each focused on one domain:

| Subagent | Extracts |
|----------|----------|
| Business Context & Vision | Origin, problem statement, target market/ICP, vision, fundraising status, value proposition |
| Positioning & Marketing | Market positioning, go-to-market strategy, marketing channels, branding, investor comms |
| Product & Technical | Architecture, features, projects built on Greyhaven, user stories, data sovereignty concepts |
| Open Items & Risks | Unresolved decisions, action items, risks, areas of agreement/disagreement among stakeholders |

All subagents should prioritize statements from Corey Gallon, Max, and Jordan.

### Step 8: Compile documentation

Create `Greyhaven_Company_Documentation.md` with this structure:

```markdown
# Greyhaven Company Documentation

## Executive Summary
## 1. Business Context & Origin
## 2. Vision & Philosophy
## 3. Market Positioning
## 4. Product Architecture
## 5. Product Portfolio
## 6. Projects Built on Greyhaven
## 7. User Stories & Use Cases
## 8. Go-to-Market Strategy
## 9. Branding & Identity
## 10. Open Discussions & Debates
## 11. Unresolved Items
## 12. Risks & Challenges
## 13. Key Stakeholder Perspectives
## 14. Action Items & Next Steps
## 15. Appendix
```

**Writing guidelines:**
- Include direct quotes with attribution
- Maintain a neutral tone while capturing debates
- Include timestamps where available
- Cross-reference multiple sources for key claims

### Step 9: Clean up

Remove any temporary JSON or transcript files created during analysis.

## Error handling

- **Large transcripts (>200KB):** Save to a file, pass the path to the subagent
- **Missing transcript:** Skip the meeting with a note
- **Empty thread:** Mark as low relevance, skip
- **API failures:** Retry with backoff, continue with available data

## Query Patterns Summary

**EXACT API CALLS TO MAKE:**

### For Reflector all-hands meetings:
```python
dataindex_query_entities(
    entity_types="meeting",
    connector_ids="reflector",
    room_name="allhands",             # MUST filter for all-hands only
    date_from="2024-10-19T00:00:00Z", # 4 months ago
    date_to="2025-02-19T23:59:59Z",   # today
    limit=100
)
```

Then for each meeting ID:
```python
dataindex_get_entity_by_id(
    entity_id="reflector:{meeting_id}",
    max_content_length=null  # Get full transcript
)
```

### For Zulip threads:
```python
dataindex_query_entities(
    entity_types="threaded_conversation", # MUST be threaded_conversation
    connector_ids="zulip",                # MUST be zulip
    date_from="2024-10-19T00:00:00Z",     # 4 months ago
    date_to="2025-02-19T23:59:59Z",       # today
    limit=100
)
```

Then for each thread ID:
```python
dataindex_get_entity_by_id(
    entity_id="zulip:{thread_id}",
    max_content_length=null  # Get full thread content
)
```

## Dependencies

- [dataindex](../dataindex/SKILL.md) — entity querying
- [contactdb](../contactdb/SKILL.md) — participant resolution
- [notebook-patterns](../notebook-patterns/SKILL.md) — pagination helpers (if needed)
.agents/skills/internalai-business-context/SKILL.md (new file, 406 lines)
@@ -0,0 +1,406 @@
---
name: internalai-business-context
description: Generate business context updates for InternalAI meetings. Fetches the latest InternalAI meeting (or a specified meeting ID), analyzes participant contributions, researches business context via DataIndex, and creates a concise business update document with 1-week priorities. Requires Greyhaven_Company_Documentation.md in the repo root.
user-invocable: true
argument-hint: [meeting_id]
---

# InternalAI Business Context Update

Generate a business context update document for InternalAI meetings, connecting technical discussions to Greyhaven's strategic mission.

## What I do

- Fetch the latest InternalAI meeting transcript (or use a specified meeting ID)
- Extract participant talking points and work items
- Research business context for each participant via DataIndex (last 30 days)
- Synthesize findings into a concise business update document
- Map work to Greyhaven's strategic pillars and revenue goals

## When to use me

Use this skill after InternalAI meetings to create business-focused updates that:
- Connect technical work to business value
- Show strategic alignment with Greyhaven's mission
- Highlight immediate priorities (1 week ahead)
- Provide context for stakeholders (Max, Jordan, investors)

## Prerequisites

- Greyhaven_Company_Documentation.md in the repository root
- DataIndex API access for meeting transcripts and business context
- If the Greyhaven doc is missing, run `/init-greyhaven` first

## Workflow

### Step 1: Check for Greyhaven Documentation

```bash
# Check if Greyhaven_Company_Documentation.md exists in the repo root
ls -la Greyhaven_Company_Documentation.md
```

**If missing:** Run `/init-greyhaven` to generate it, then retry this skill.

### Step 2: Fetch Meeting Data

**Option A: Use a specified meeting ID**
```python
dataindex_get_entity_by_id(
    entity_id="reflector:{meeting_id}",
    include_raw_data=true,
    max_content_length=null
)
```

**Option B: Fetch the latest InternalAI meeting (default)**
```python
# Query for the latest InternalAI meeting
dataindex_query_entities(
    entity_types="meeting",
    connector_ids="reflector",
    search="internalai",
    limit=1,
    sort_by="timestamp",
    sort_order="desc"
)

# Then fetch the full transcript
dataindex_get_entity_by_id(
    entity_id="reflector:{latest_meeting_id}",
    include_raw_data=true,
    max_content_length=null
)
```

**Extract from the meeting entity:**
- `raw_data.transcript` - Full conversation text
- `participants` - List of attendees with contact_ids
- `title` - Meeting title
- `timestamp` - Meeting date
- `id` - Meeting ID for reference links (extract the UUID portion)

### Step 3: Parse Meeting into JSON Structure

Create structured JSON with participant talking points:

```json
{
  "meeting": {
    "title": "Meeting Title",
    "date": "2026-02-18",
    "room": "internalai",
    "participants": ["Name1", "Name2"],
    "meeting_id": "reflector:xxx"
  },
  "participants": {
    "ParticipantName": {
      "business_value": ["talking point 1", "talking point 2"],
      "things_done": ["completed item 1"],
      "things_in_progress": ["current work 1"],
      "things_to_do": ["planned item 1"],
      "key_references": ["reference 1"]
    }
  }
}
```

**Parsing approach:**
1. Segment the transcript by speaker (lines starting with "SpeakerName: ")
2. Extract business outcomes from each segment
3. Categorize into: completed, in-progress, planned
4. Identify key technical references mentioned
|
||||
|
||||
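Step 1 of the parsing approach can be sketched in plain Python. This is an illustrative helper, not part of the DataIndex API, and the regex assumes simple "SpeakerName: text" turns (a continuation line that itself contains a colon would be misread as a new speaker):

```python
import re
from collections import defaultdict

def segment_by_speaker(transcript: str) -> dict[str, list[str]]:
    """Group transcript lines by speaker, assuming "SpeakerName: text" turns."""
    segments = defaultdict(list)
    current = None
    for line in transcript.splitlines():
        match = re.match(r"^([A-Za-z .'-]+):\s*(.*)$", line)
        if match:
            # New speaker turn
            current = match.group(1).strip()
            segments[current].append(match.group(2))
        elif current and line.strip():
            # Continuation of the current speaker's turn
            segments[current].append(line.strip())
    return dict(segments)
```

Each speaker's segments can then be fed to steps 2-4 (outcome extraction and categorization) one speaker at a time.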
### Step 4: Research Business Context (Parallel Subagents)

For each participant, launch parallel subagents to search DataIndex.

**Subagent task template:**
```
Research business context for {participant_name} from meeting on {date}.

Meeting context: {meeting_title} - {brief_description_of_discussion}

Search DataIndex for the last 30 days (from {date_minus_30} to {date}) to find:
- What projects this person is working on
- What business problems they're solving
- Recent decisions or priorities
- Related work from other team members

**Search Command to Use:**
Use dataindex_search with these parameters:
- query: [your search terms]
- limit: 10
- date_from: {date_minus_30} (ISO format)
- date_to: {date} (ISO format)

**Maximum 10 searches per participant.** Choose your search queries strategically based on the meeting topics discussed.

**URL Construction Helpers:**

For Zulip references (from threaded_conversation entities):
- Base: https://zulip.monadical.com/#narrow/channel/
- Format: https://zulip.monadical.com/#narrow/channel/{stream_id}-{stream_name}/topic/{topic_name}/with/{first_message_id}
- stream_id: From entity.connector_metadata.stream_id or parse from entity.id
- stream_name: From entity.title (extract stream name before the dash)
- topic_name: From entity.connector_metadata.topic or parse from entity.title
- message_id: Use the first message ID from entity.recent_messages[0].id

For Reflector references (from meeting entities):
- Base: https://reflector.monadical.com/transcripts/
- Format: https://reflector.monadical.com/transcripts/{transcript_id}
- transcript_id: Extract from meeting.entity_id (remove "reflector:" prefix)

**What to return:**
For each participant, provide:
1. Key themes from their work (2-3 themes)
2. Business value of each theme (1 sentence)
3. Strategic alignment with Greyhaven (1 sentence)
4. Direct URLs to relevant sources (use formats above)

Return as structured JSON:
{
  "participant": "Name",
  "research_date_range": "{date_minus_30} to {date}",
  "key_themes": [
    {
      "theme": "Theme name",
      "business_value": "Why this matters commercially",
      "strategic_alignment": "How this supports Greyhaven's mission",
      "references": [
        {
          "type": "zulip|reflector",
          "title": "Brief description",
          "url": "Full URL"
        }
      ]
    }
  ]
}
```

**Launch all subagents simultaneously** - one per participant.

**Search command example for subagents:**
```python
# Example search - subagent adapts query based on context
dataindex_search(
    query="{participant_name} {topic}",
    limit=10,
    date_from="2026-01-18T00:00:00Z",
    date_to="2026-02-18T23:59:59Z"
)
```

### Step 5: Read Greyhaven Business Context

```bash
# Read Greyhaven documentation
cat Greyhaven_Company_Documentation.md
```

Extract key context:
- Three Pillars: Human-centered design, Local-first sovereignty, Rapid prototyping
- Other Three Pillars of Data Sovereignty: Choice, Control, Clarity
- Current positioning: "Palantir for SMEs"
- Key stakeholders: Max (Founder), Jordan (BizDev), Corey Gallon (Strategic Advisor)
- Immediate business priorities: Fundraising, first paying customer (Electra), enterprise readiness

### Step 6: Generate Business Context Document

**Output file:** `internalai_context_YYYY-MM-DD.md` (in repo root)

**Document structure:**

```markdown
# InternalAI Business Context Update

[2-3 sentences total business impact across all participants]

---

## Participant Updates

### {Participant Name}

**What They've Done:**
[2-3 sentences on completed work with business outcomes]

**Immediate Priorities (Next 7 Days):**
- [Priority 1]
- [Priority 2]

**Strategic Context:**
[1-2 sentences connecting work to Greyhaven's mission and pillars]

**References:**
- [Title](https://zulip.monadical.com/#narrow/channel/{stream_id}-{stream}/topic/{topic}/with/{message_id})
- [Title](https://reflector.monadical.com/transcripts/{transcript_id})

---

**Revenue Enablement:**
[How this work supports fundraising, customer acquisition, or retention]

**Market Positioning:**
[How this supports "Palantir for SMEs" or data sovereignty messaging]

---

## Reference Links

### Key Meetings
- [Meeting Title](https://reflector.monadical.com/transcripts/{transcript_id})

### Zulip Threads
- [Thread Title](https://zulip.monadical.com/#narrow/channel/{stream_id}-{stream}/topic/{topic}/with/{message_id})

---

*Document generated from meeting transcript and DataIndex research*
```

**Content guidelines:**
- **Per participant:** Max 2 short paragraphs (What They've Done + Strategic Context)
- **Immediate priorities:** Only 1 week ahead (not 2 weeks)
- **Business language:** No technical jargon, focus on outcomes
- **URL formats:**
  - Zulip: `https://zulip.monadical.com/#narrow/channel/{stream_id}-{stream_name}/topic/{topic_name}/with/{message_id}`
  - Reflector: `https://reflector.monadical.com/transcripts/{transcript_id}`

### Step 7: Generate Meeting JSON

Create `{date}_meeting_analysis.json` with parsed talking points (saved by default):

```json
{
  "meeting": {
    "title": "...",
    "date": "...",
    "participants": ["..."],
    "meeting_url": "https://reflector.monadical.com/transcripts/{id}"
  },
  "participants": {
    "Name": {
      "business_value": [...],
      "things_done": [...],
      "things_in_progress": [...],
      "things_to_do": [...],
      "key_references": [...]
    }
  },
  "decisions_made": [...],
  "open_questions": [...]
}
```

### Step 8: Output Summary

Display to user:
```
✅ Business Context Update Complete

📄 Documents generated:
- internalai_context_{date}.md
- {date}_meeting_analysis.json

👥 Participants covered: {names}

🎯 Key business themes:
- {theme 1}
- {theme 2}
```

## Commands Summary

**Pre-flight check:**
```bash
ls -la Greyhaven_Company_Documentation.md || echo "Run /init-greyhaven first"
```

**Fetch meeting (latest):**
```python
dataindex_query_entities(
    entity_types="meeting",
    connector_ids="reflector",
    search="internalai",
    limit=1,
    sort_by="timestamp",
    sort_order="desc"
)
```

**Fetch meeting (by ID):**
```python
dataindex_get_entity_by_id(
    entity_id="reflector:{meeting_id}",
    include_raw_data=true,
    max_content_length=null
)
```

**Research business context (per participant - max 10 searches):**
```python
# Launch parallel subagents, one per participant
# Each subagent runs up to 10 dataindex_search queries
# Subagent constructs queries based on meeting context
```

**Generate output:**
```bash
# Write to: internalai_context_YYYY-MM-DD.md
# Write to: YYYY-MM-DD_meeting_analysis.json
```

## URL Construction Reference

### Zulip URLs

From `threaded_conversation` entity:
```
https://zulip.monadical.com/#narrow/channel/{stream_id}-{stream_name}/topic/{topic_name}/with/{message_id}
```

**Field mapping:**
- `stream_id`: entity.connector_metadata.stream_id
- `stream_name`: Parse from entity.title (before dash)
- `topic_name`: entity.connector_metadata.topic
- `message_id`: entity.recent_messages[0].id

### Reflector URLs

From `meeting` entity:
```
https://reflector.monadical.com/transcripts/{transcript_id}
```

**Field mapping:**
- `transcript_id`: entity.entity_id.replace("reflector:", "")

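The field mappings above can be folded into two small helpers. A sketch, assuming entity dicts shaped as described in this section (real stream and topic names may need URL-encoding, which the sketch omits):

```python
def zulip_url(entity: dict) -> str:
    """Zulip narrow URL from a threaded_conversation entity (see field mapping above)."""
    meta = entity["connector_metadata"]
    stream_name = entity["title"].split(" - ")[0].strip()  # stream name before the dash
    message_id = entity["recent_messages"][0]["id"]
    return (
        "https://zulip.monadical.com/#narrow/channel/"
        f"{meta['stream_id']}-{stream_name}/topic/{meta['topic']}/with/{message_id}"
    )

def reflector_url(entity: dict) -> str:
    """Reflector transcript URL from a meeting entity."""
    transcript_id = entity["entity_id"].replace("reflector:", "")
    return f"https://reflector.monadical.com/transcripts/{transcript_id}"
```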
## Error Handling

- **Greyhaven doc missing:** Prompt user to run `/init-greyhaven`
- **No InternalAI meetings found:** Check room_name filter, suggest checking reflector connector status
- **Empty transcript:** Mark meeting as processed but note limited content
- **Subagent failures:** Continue with available data, note gaps in output
- **Large transcripts (>200KB):** Save to temp file, pass path to subagents

## Dependencies

- [dataindex](../dataindex/SKILL.md) — meeting queries and business context search
- [init-greyhaven](../init-greyhaven/SKILL.md) — Greyhaven business context generation (if needed)
- [contactdb](../contactdb/SKILL.md) — participant name resolution

## Example Usage

**Default (latest meeting):**
```
/internalai-business-context
```

**Specific meeting:**
```
/internalai-business-context aecfd2e9-990f-4f25-b746-eb14ddae7494
```

**Output location:**
- `{repo_root}/internalai_context_2026-02-18.md`
- `{repo_root}/2026-02-18_meeting_analysis.json`

@@ -1,3 +1,9 @@
---
name: notebook-patterns
description: Marimo notebook patterns for InternalAI data analysis. Use when creating or editing marimo notebooks — covers cell scoping, async cells, pagination helpers, analysis patterns, and do/don't rules.
user-invocable: false
---

# Marimo Notebook Patterns

This guide covers how to create [marimo](https://marimo.io) notebooks for data analysis against the InternalAI platform APIs. Marimo notebooks are plain `.py` files with reactive cells — no `.ipynb` format, no Jupyter dependency.
@@ -25,11 +31,11 @@ def cell_two(x):
**Key rules:**
- Cells declare dependencies via function parameters
- Cells return values as tuples: `return (var1, var2,)`
- The **last expression at the top level** of a cell is displayed as rich output in the marimo UI (dataframes render as tables, dicts as collapsible trees). Expressions inside `if`/`else`/`for` blocks do **not** count — see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level) below
- Use `mo.md("# heading")` for formatted markdown output (import `mo` once in setup — see below)
- No manual execution order; the DAG determines it
- **Variable names must be unique across cells.** Every variable assigned at the top level of a cell is tracked by marimo's DAG. If two cells both define `resp`, marimo raises `MultipleDefinitionError` and refuses to run. Prefix cell-local variables with `_` (e.g., `_resp`, `_rows`, `_data`) to make them **private** to that cell — marimo ignores `_`-prefixed names.
- **All imports must go in the `setup` cell.** Every `import` statement creates a top-level variable (e.g., `import asyncio` defines `asyncio`). If two cells both `import asyncio`, marimo raises `MultipleDefinitionError`. Place **all** imports in a single setup cell and pass them as cell parameters. Do NOT `import marimo as mo` or `import asyncio` in multiple cells — import once in `setup`, then receive via `def my_cell(mo, asyncio):`.

### Cell Variable Scoping — Example

@@ -79,6 +85,120 @@ def fetch_details(client, DATAINDEX, results):

> **Note:** Variables inside nested `def` functions are naturally local and don't need `_` prefixes — e.g., `resp` inside a `def fetch_all(...)` helper is fine because it's scoped to the function, not the cell.

### Cell Output Must Be at the Top Level

Marimo only renders the **last expression at the top level** of a cell as rich output. An expression buried inside an `if`/`else`, `for`, `try`, or any other block is **not** displayed — it's silently discarded.

**BROKEN** — `_df` inside the `if` branch is never rendered, and `mo.md()` inside `if`/`else` is also discarded:

```python
@app.cell
def show_results(results, mo, pl):
    if results:
        _df = pl.DataFrame(results)
        mo.md(f"**Found {len(results)} results**")
        _df  # Inside an if block — marimo does NOT display this
    else:
        mo.md("**No results found**")  # Also inside a block — NOT displayed
    return
```

**FIXED** — split into separate cells. Each cell displays exactly **one thing** at the top level:

```python
# Cell 1: build the data, return it
@app.cell
def build_results(results, pl):
    results_df = pl.DataFrame(results) if results else None
    return (results_df,)

# Cell 2: heading — mo.md() is the top-level expression (use ternary for conditional text)
@app.cell
def show_results_heading(results_df, mo):
    mo.md(f"**Found {len(results_df)} results**" if results_df is not None else "**No results found**")

# Cell 3: table — DataFrame is the top-level expression
@app.cell
def show_results_table(results_df):
    results_df  # Top-level expression — marimo renders this as interactive table
```

**Rules:**
- Each cell should display **one thing** — either `mo.md()` OR a DataFrame, never both
- `mo.md()` must be a **top-level expression**, not inside `if`/`else`/`for`/`try` blocks
- Build conditional text using variables or ternary expressions, then call `mo.md(_text)` at the top level
- For DataFrames, use a standalone display cell: `def show_table(df): df`

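When splitting into separate cells is more ceremony than the analysis needs, the same constraint can also be satisfied in one cell with a single-output pattern: assign one `_output` variable inside the branches, then name it once at the top level. A sketch of such a cell (not a standalone script):

```python
@app.cell
def show_results_single(results, pl, mo):
    _output = None  # one display value, assigned in branches
    if results:
        _output = pl.DataFrame(results)
    else:
        _output = mo.md("**No results found**")
    _output  # last TOP-LEVEL expression, so marimo renders it
```
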
### Async Cells

When a cell uses `await` (e.g., for `llm_call` or `asyncio.gather`), you **must** declare it as `async def`:

```python
@app.cell
async def analyze(meetings, llm_call, ResponseModel, asyncio):
    async def _score(meeting):
        return await llm_call(prompt=..., response_model=ResponseModel)

    results = await asyncio.gather(*[_score(_m) for _m in meetings])
    return (results,)
```

Note that `asyncio` is imported in the `setup` cell and received here as a parameter — never `import asyncio` inside individual cells.

If you write `await` in a non-async cell, marimo cannot parse the cell and saves it as an `_unparsable_cell` string literal — the cell won't run, and you'll see `SyntaxError: 'return' outside function` or similar errors. See [Fixing `_unparsable_cell`](#fixing-_unparsable_cell) below.

### Cells That Define Classes Must Return Them

If a cell defines Pydantic models (or any class) that other cells need, it **must** return them:

```python
# BaseModel and Field are imported in the setup cell and received as parameters
@app.cell
def models(BaseModel, Field):
    class MeetingSentiment(BaseModel):
        overall_sentiment: str
        sentiment_score: int = Field(description="Score from -10 to +10")

    class FrustrationExtraction(BaseModel):
        has_frustrations: bool
        frustrations: list[dict]

    return MeetingSentiment, FrustrationExtraction  # Other cells receive these as parameters
```

A bare `return` (or no return) means those classes are invisible to the rest of the notebook.

### Fixing `_unparsable_cell`

When marimo can't parse a cell into a proper `@app.cell` function, it saves the raw code as `app._unparsable_cell("...", name="cell_name")`. These cells **won't run** and show errors like `SyntaxError: 'return' outside function`.

**Common causes:**
1. Using `await` without making the cell `async def`
2. Using `return` in code that marimo failed to wrap into a function (usually a side effect of cause 1)

**How to fix:** Convert the `_unparsable_cell` string back into a proper `@app.cell` decorated function:

```python
# BROKEN — saved as _unparsable_cell because of top-level await
app._unparsable_cell("""
results = await asyncio.gather(...)
return results
""", name="my_cell")

# FIXED — proper async cell function (asyncio imported in setup, received as parameter)
@app.cell
async def my_cell(some_dependency, asyncio):
    results = await asyncio.gather(...)
    return (results,)
```

**Key differences to note when converting:**
- Wrap the code in an `async def` function (if it uses `await`)
- Add cell dependencies as function parameters (including imports like `asyncio`)
- Return values as tuples: `return (var,)` not `return var`
- Prefix cell-local variables with `_`
- Never add `import` statements inside the cell — all imports belong in `setup`

### Inline Dependencies with PEP 723

Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:
@@ -90,10 +210,25 @@ Use PEP 723 `/// script` metadata so `uv run` auto-installs dependencies:
#     "marimo",
#     "httpx",
#     "polars",
#     "mirascope[openai]",
#     "pydantic",
#     "python-dotenv",
# ]
# ///
```

### Checking Notebooks Before Running

Always run `marimo check` before opening or running a notebook. It catches common issues — duplicate variable definitions, `_unparsable_cell` blocks, branch expressions that won't display, and more — without needing to start the full editor:

```bash
uvx marimo check notebook.py        # Check a single notebook
uvx marimo check workflows/         # Check all notebooks in a directory
uvx marimo check --fix notebook.py  # Auto-fix fixable issues
```

**Run this after every edit.** A clean `marimo check` (no output, exit code 0) means the notebook is structurally valid. Any errors must be fixed before running.

### Running Notebooks

```bash
@@ -142,6 +277,9 @@ Every notebook against InternalAI follows this structure:
#     "marimo",
#     "httpx",
#     "polars",
#     "mirascope[openai]",
#     "pydantic",
#     "python-dotenv",
# ]
# ///

@@ -166,11 +304,16 @@ def config():

@app.cell
def setup():
    from dotenv import load_dotenv
    load_dotenv(".env")  # Load .env from the project root

    import asyncio  # All imports go here — never import inside other cells
    import httpx
    import marimo as mo
    import polars as pl
    from pydantic import BaseModel, Field
    client = httpx.Client(timeout=30)
    return (asyncio, client, mo, pl, BaseModel, Field,)

# --- your IN / ETL / OUT cells here ---

@@ -178,6 +321,8 @@ if __name__ == "__main__":
    app.run()
```

> **`load_dotenv(".env")`** reads the `.env` file explicitly by name. This makes `LLM_API_KEY` and other env vars available to `os.getenv()` calls in `lib/llm.py` without requiring the shell to have them pre-set. Always include `python-dotenv` in PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell.

**The `params` cell must always be the first cell** after `app = marimo.App()`. It contains all user-configurable constants (search terms, date ranges, target names, etc.) as plain Python values. This way the user can tweak the workflow by editing a single cell at the top — no need to hunt through the code for hardcoded values.

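A minimal `params` cell following this rule might look like the sketch below; the constant names and values are illustrative:

```python
@app.cell
def params():
    # All user-tunable values live here, at the very top of the notebook
    SEARCH_TERM = "greyhaven"
    DATE_FROM = "2026-01-18"
    DATE_TO = "2026-02-18"
    MAX_RESULTS = 50
    return (SEARCH_TERM, DATE_FROM, DATE_TO, MAX_RESULTS,)
```
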
## Pagination Helper
@@ -264,6 +409,8 @@ Meetings have a `participants` list where each entry may or may not have a resol

**Strategy:** Query by `contact_ids` to get meetings with resolved participants, then optionally do a client-side check on `participants[].display_name` or `transcript` for unresolved ones.

> **Always include `room_name` in meeting tables.** The `room_name` field contains the virtual room name (e.g., `standup-office-bogota`) and often indicates where the meeting took place. It's useful context when `title` is generic or missing — include it as a column alongside `title`.

```python
@app.cell
def fetch_meetings(fetch_all, DATAINDEX, target_id, my_id):
@@ -283,7 +430,8 @@ def meeting_table(resolved_meetings, target_name, pl):
        _names = [_p["display_name"] for _p in _participants]
        _rows.append({
            "date": (_m.get("start_time") or _m["timestamp"])[:10],
            "title": _m.get("title", "Untitled"),
            "room_name": _m.get("room_name", ""),
            "participants": ", ".join(_names),
            "has_transcript": _m.get("transcript") is not None,
            "has_summary": _m.get("summary") is not None,
@@ -425,6 +573,92 @@ def display_timeline(timeline_df):
    timeline_df
```

## Pattern 5: LLM Filtering with `lib.llm`

When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the `llm_call` helper from `workflows/lib`. It sends each item to an LLM and parses the response into a typed Pydantic model.

**Prerequisites:** Copy `.env.example` to `.env` and fill in your `LLM_API_KEY`. Add `mirascope`, `pydantic`, and `python-dotenv` to the notebook's PEP 723 dependencies.

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "marimo",
#     "httpx",
#     "polars",
#     "mirascope[openai]",
#     "pydantic",
#     "python-dotenv",
# ]
# ///
```

### Setup cell — load `.env` and import `llm_call`

```python
@app.cell
def setup():
    from dotenv import load_dotenv
    load_dotenv(".env")  # Makes LLM_API_KEY available to lib/llm.py

    import asyncio
    import httpx
    import marimo as mo
    import polars as pl
    from pydantic import BaseModel, Field
    from lib.llm import llm_call
    client = httpx.Client(timeout=30)
    return (asyncio, client, llm_call, mo, pl, BaseModel, Field,)
```

### Define a response model

Create a Pydantic model that describes the structured output you want from the LLM:

```python
@app.cell
def models(BaseModel, Field):

    class RelevanceScore(BaseModel):
        relevant: bool
        reason: str
        score: int  # 0-10

    return (RelevanceScore,)
```

### Filter entities through the LLM

Iterate over fetched entities and call `llm_call` for each one. Since `llm_call` is async, use `asyncio.gather` to process items concurrently:

```python
@app.cell
async def llm_filter(meetings, llm_call, RelevanceScore, pl, mo, asyncio):
    _topic = "Greyhaven"

    async def _score(meeting):
        _text = meeting.get("summary") or meeting.get("title") or ""
        _result = await llm_call(
            prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}",
            response_model=RelevanceScore,
            system_prompt="Score the relevance of this meeting to the given topic. Set relevant=true if score >= 5.",
        )
        return {**meeting, "llm_relevant": _result.relevant, "llm_reason": _result.reason, "llm_score": _result.score}

    scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings])
    relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]]

    mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'")
    return (relevant_meetings,)
```

### Tips for LLM filtering

- **Keep prompts short** — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity.
- **Use structured output** — always pass a `response_model` so you get typed fields back, not free-text.
- **Batch wisely** — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits.
- **Cache results** — LLM calls are slow and cost money. If iterating on a notebook, consider storing scored results in a cell variable so you don't re-score on every edit.

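The "batch wisely" advice can be implemented with a small chunking helper; `gather_chunked` is an illustrative name, not part of `lib.llm`:

```python
import asyncio

async def gather_chunked(coros, chunk_size: int = 20) -> list:
    """Await coroutines in fixed-size batches instead of all at once.

    Keeps at most chunk_size requests in flight, which helps stay under
    provider rate limits; results keep their input order.
    """
    results = []
    for i in range(0, len(coros), chunk_size):
        results.extend(await asyncio.gather(*coros[i:i + chunk_size]))
    return results

async def _demo():
    async def _double(x):
        return x * 2
    return await gather_chunked([_double(i) for i in range(5)], chunk_size=2)

print(asyncio.run(_demo()))  # [0, 2, 4, 6, 8]
```

Inside a notebook, replace the bare `asyncio.gather` call in a cell like `llm_filter` with `gather_chunked` to cap concurrency.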
## Do / Don't — Quick Reference for LLM Agents
|
||||
|
||||
When generating marimo notebooks, follow these rules strictly. Violations cause `MultipleDefinitionError` at runtime.
|
||||
@@ -432,20 +666,28 @@ When generating marimo notebooks, follow these rules strictly. Violations cause
|
||||
### Do
|
||||
|
||||
- **Prefix cell-local variables with `_`** — `_resp`, `_rows`, `_m`, `_data`, `_chunk`. Marimo ignores `_`-prefixed names so they won't clash across cells.
|
||||
- **Import shared modules once in `setup`** and pass them as cell parameters: `def my_cell(client, mo, pl):`.
|
||||
- **Put all imports in the `setup` cell** and pass them as cell parameters: `def my_cell(client, mo, pl, asyncio):`. Never `import` inside other cells — even `import asyncio` in two async cells causes `MultipleDefinitionError`.
|
||||
- **Give returned DataFrames unique names** — `email_df`, `meeting_df`, `timeline_df`. Never use a bare `df` that might collide with another cell.
|
||||
- **Return only values other cells need** — everything else should be `_`-prefixed and stays private to the cell.
|
||||
- **Use `from datetime import datetime` inside the cell** that needs it (stdlib imports are fine inline since they're `_`-safe inside functions, but avoid assigning them to non-`_` names if another cell does the same).
|
||||
- **Import stdlib modules in `setup` too** — even `from datetime import datetime` creates a top-level name. If two cells both import `datetime`, marimo errors. Import it once in `setup` and receive it as a parameter, or use it inside a `_`-prefixed helper function where it's naturally scoped.
|
||||
- **Every non-utility cell must show a preview** — see the "Cell Output Previews" section below.
|
||||
- **Use separate display cells for DataFrames** — the build cell returns the DataFrame and shows a `mo.md()` count/heading; a standalone display cell (e.g., `def show_table(df): df`) renders it as an interactive table the user can sort and filter.
|
||||
- **Include `room_name` when listing meetings** — the virtual room name provides useful context about where the meeting took place (e.g., `standup-office-bogota`). Show it as a column alongside `title`.
|
||||
- **Keep cell output expressions at the top level** — if a cell conditionally displays a DataFrame, initialize `_output = None` before the `if`/`else`, assign inside the branches, then put `_output` as the last top-level expression. Expressions inside `if`/`else`/`for` blocks are silently ignored by marimo.
- **Put all user parameters in a `params` cell as the first cell** — date ranges, search terms, target names, limits. Never hardcode these values deeper in the notebook.
- **Declare cells as `async def` when using `await`** — `@app.cell` followed by `async def cell_name(...)`. This includes cells using `asyncio.gather`, `await llm_call(...)`, or any async API.
- **Return classes/models from cells that define them** — if a cell defines `class MyModel(BaseModel)`, return it so other cells can use it as a parameter: `return (MyModel,)`.
- **Use `python-dotenv` to load `.env`** — add `python-dotenv` to the PEP 723 dependencies and call `load_dotenv(".env")` early in the setup cell (before importing `lib.llm`). This ensures `LLM_API_KEY` and other env vars are available without requiring them to be pre-set in the shell.

### Don't

- **Don't define the same variable name in two cells** — even `resp = ...` in cell A and `resp = ...` in cell B is a fatal error.
- **Don't `import marimo as mo` in multiple cells** — this defines `mo` twice. Import it once in `setup`, then receive it via `def my_cell(mo):`.
- **Don't `import` inside non-setup cells** — every `import X` defines a top-level variable `X`. If two cells both `import asyncio`, marimo raises `MultipleDefinitionError` and refuses to run. Put all imports in the `setup` cell and receive them as function parameters.
- **Don't use generic top-level names** like `df`, `rows`, `resp`, `data`, `result` — either prefix with `_` or give them a unique descriptive name.
- **Don't return temporary variables** — if `_rows` is only used to build a DataFrame, keep it `_`-prefixed and only return the DataFrame.
- **Don't use `import X` at the top level of multiple cells** for the same module — the module variable name would be duplicated. Import once in `setup` or use `_`-prefixed local imports (`_json = __import__("json")`).
- **Don't use `await` in a non-async cell** — this causes marimo to save the cell as `_unparsable_cell` (a string literal that won't execute). Always use `async def` for cells that call async functions.
- **Don't define classes in a cell without returning them** — a bare `return` or no return makes classes invisible to the DAG. Other cells can't receive them as parameters.
- **Don't put display expressions inside `if`/`else`/`for` blocks** — marimo only renders the last top-level expression. A DataFrame inside an `if` branch is silently discarded. Use the `_output = None` pattern instead (see [Cell Output Must Be at the Top Level](#cell-output-must-be-at-the-top-level)).
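The `_output = None` rule can be sketched as follows. The marimo decorator plumbing is omitted so the pattern stands alone; in a real cell the final line would be a bare `_output` expression rather than a `return`:

```python
# Hypothetical cell body: expressions inside if/else are never rendered,
# so route the display value through a top-level _output variable.
def cell_body(meeting_df, show_table):
    _output = None
    if show_table:
        _output = meeting_df
    else:
        _output = "No meetings matched the filters"
    return _output  # in an actual cell: a bare `_output` as the last expression
```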
## Cell Output Previews
**Good** — the build cell shows a `mo.md()` count, and a **separate display cell** renders the DataFrame as an interactive table:
```python
@app.cell
def show_meeting_table(meeting_df):
    meeting_df  # Renders as interactive sortable table
```

### Separate display cells for DataFrames

When a cell builds a DataFrame, use **two cells**: one that builds and returns it (with a `mo.md()` summary), and a standalone display cell that renders it as a table. This keeps the build logic clean and gives the user an interactive table they can sort and filter in the marimo UI.

```python
# Cell 1: build and return the DataFrame, show a count
@app.cell
def build_sentiment_table(analyzed_meetings, pl, mo):
    _rows = [...]
    sentiment_df = pl.DataFrame(_rows).sort("date", descending=True)
    mo.md(f"### Sentiment Analysis ({len(sentiment_df)} meetings)")
    return (sentiment_df,)


# Cell 2: standalone display — just the DataFrame, nothing else
@app.cell
def show_sentiment_table(sentiment_df):
    sentiment_df
```

This pattern makes every result inspectable. The `mo.md()` cell gives a quick count/heading; the display cell lets the user explore the full data interactively.

### Utility cells (no preview needed)

Config, setup, and helper cells that only define constants or functions don't need previews:
364
.agents/skills/project-history/SKILL.md
Normal file
---
name: project-history
description: Build initial historical timeline for a project. Queries all datasources and creates week-by-week analysis files up to a sync date. Requires project-init to have been run first (datasources.md must exist).
disable-model-invocation: true
argument-hint: [project-name] [date-from] [date-to]
---

# Build Project History

**When to use:** After `/project-init` has been run and the user has reviewed `datasources.md`. This skill gathers historical data and builds the week-by-week timeline.

**Precondition:** `projects/$0/datasources.md` must exist. If it doesn't, run `/project-init $0` first.

## Step 1: Read Datasources

Read `projects/$0/datasources.md` to determine:
- Which Zulip stream IDs and search terms to query
- Which git repository to clone/pull
- Which meeting room names to filter by
- Which entity types to prioritize

## Step 2: Gather Historical Data

Query data for the period `$1` to `$2`.

### A. Query Zulip

For each PRIMARY stream in datasources.md:

```
# Paginate through all threaded conversations
GET /api/v1/query
entity_types=threaded_conversation
connector_ids=zulip
date_from=$1
date_to=$2
search={project-search-term}
limit=100
offset=0
```
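The limit/offset pagination used above can be sketched as a loop that drains every page. `fetch_page` is a hypothetical stand-in for whatever HTTP client actually issues the `GET /api/v1/query` call:

```python
def query_all(fetch_page, limit=100):
    """Drain a paginated endpoint such as GET /api/v1/query.

    fetch_page(offset, limit) must return one page as a list; a page
    shorter than `limit` signals the end. Sketch only; the real request
    is whatever client the agent uses.
    """
    results, offset = [], 0
    while True:
        items = fetch_page(offset, limit)
        results.extend(items)
        if len(items) < limit:  # short page means no more data
            return results
        offset += limit
```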
### B. Clone/Pull Git Repository

```bash
# First time
git clone --depth 200 {url} ./tmp/$0-clone
# Or if already cloned
cd ./tmp/$0-clone && git pull

# Extract commit history for the period
git log --since="$1" --until="$2" --format="%H|%an|%ae|%ad|%s" --date=short
git log --since="$1" --until="$2" --format="%an" | sort | uniq -c | sort -rn
```
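Parsing the pipe-delimited `git log` output is mostly a `split`, with one subtlety: commit subjects can themselves contain `|`, so split at most four times. A sketch (the record layout mirrors the `%H|%an|%ae|%ad|%s` format above):

```python
from collections import Counter

def parse_git_log(lines):
    """Split `git log --format="%H|%an|%ae|%ad|%s"` lines into records."""
    commits = []
    for line in lines:
        # maxsplit=4 keeps '|' characters inside the subject intact
        sha, author, email, date, subject = line.split("|", 4)
        commits.append({"sha": sha, "author": author, "email": email,
                        "date": date, "subject": subject})
    return commits

def contributor_counts(commits):
    # Equivalent of `git log --format="%an" | sort | uniq -c | sort -rn`
    return Counter(c["author"] for c in commits).most_common()
```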
### C. Query Meeting Recordings

For each PRIMARY meeting room in datasources.md:

```
GET /api/v1/query
entity_types=meeting
date_from=$1
date_to=$2
room_name={room-name}
limit=100
```

Also do a semantic search for broader coverage:

```
POST /api/v1/search
search_text={project-name}
entity_types=["meeting"]
date_from=$1
date_to=$2
limit=50
```
## Step 3: Analyze by Week

For each week in the period, create a week file. Group the gathered data into calendar weeks (Monday-Sunday).

For each week, analyze:

1. **Key Decisions** — Strategic choices, architecture changes, vendor selections, security responses
2. **Technical Work** — Features developed, bug fixes, infrastructure changes, merges/PRs
3. **Team Activity** — Who was active, new people, departures, role changes
4. **Blockers** — Issues, delays, dependencies
### Week file template

**File:** `projects/$0/timeline/{year-month}/week-{n}.md`

```markdown
# $0 - Week {n}, {Month} {Year}

**Period:** {date-range}
**Status:** [Active/Quiet/Blocked]

## Key Decisions

### Decision Title
- **Decision:** What was decided
- **Date:** {date}
- **Who:** {decision-makers}
- **Impact:** Why it matters
- **Context:** Background

## Technical Work

- [{Date}] {Description} - {Who}

## Team Activity

### Core Contributors
- **Name:** Focus area

### Occasional Contributors
- Name: What they contributed

## Git Activity

**Commits:** {count}
**Focus Areas:**
- Area 1

**Key Commits:**
- Hash: Description (Author)

## Zulip Activity

**Active Streams:**
- Stream: Topics discussed

## Current Blockers

1. Blocker description

## Milestones Reached

If any milestones were completed this week, document with business objective:
- **Milestone:** What was achieved
- **Business Objective:** WHY this matters (search for this in discussions, PRs, meetings)
- **Impact:** Quantifiable results if available

## Next Week Focus

- Priority 1

## Notes

- Context and observations
- Always try to capture the WHY behind decisions and milestones
```
### Categorization principles

**Key Decisions:**
- Technology migrations
- Architecture changes
- Vendor switches
- Security incidents
- Strategic pivots

**Technical Work:**
- Feature implementations
- Bug fixes
- Infrastructure changes
- Refactoring

**Skip Unless Meaningful:**
- Routine check-ins
- Minor documentation updates
- Social chat

### Contributor types

**Core Contributors:** Regular commits (multiple per week), active in technical discussions, making architectural decisions, reviewing PRs.

**Occasional Contributors:** Sporadic commits, topic-specific involvement, testing/QA, feedback only.
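One possible way to mechanize the core-vs-occasional split. The 2-commits-per-active-week threshold is an assumption, not a documented rule, and discussion/review activity should still be weighed by hand as described above:

```python
def classify_contributors(weekly_commits, core_threshold=2):
    """Split contributors into core vs. occasional.

    weekly_commits: {name: [commits_in_week_1, commits_in_week_2, ...]}.
    Heuristic sketch: "core" means averaging core_threshold+ commits
    per active week.
    """
    core, occasional = [], []
    for name, weeks in weekly_commits.items():
        active = [w for w in weeks if w > 0]
        avg = sum(active) / len(active) if active else 0
        (core if avg >= core_threshold else occasional).append(name)
    return core, occasional
```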
## Step 4: Create/Update Timeline Index

**File:** `projects/$0/timeline/index.md`

```markdown
# $0 Timeline Index

## {Year}

### {Quarter}
- [Month Week 1](./{year-month}/week-1.md)
- [Month Week 2](./{year-month}/week-2.md)

## Key Milestones

| Date | Milestone | Business Objective | Status |
|------|-----------|-------------------|--------|
| Mar 2025 | SQLite → PostgreSQL migration | Improve query performance (107ms→27ms) and enable concurrent access for scaling | Complete |
| Jul 2025 | Chakra UI 3 migration | Modernize UI component library and improve accessibility | Complete |

## Summary by Quarter

### Q{X} {Year}
- **Milestone 1:** What happened + Business objective
- **Milestone 2:** What happened + Business objective
```
## Step 5: Create Project Dashboard (project.md)

**File:** `projects/$0/project.md`

Create the **living document** — the entry point showing current status:

```markdown
# $0 Project

**One-liner:** [Brief description]
**Status:** [Active/On Hold/Deprecated]
**Last Updated:** [Date]

---

## This Week's Focus

### Primary Objective
[What the team is working on right now - from the most recent week]

### Active Work
- [From recent commits and discussions]

### Blockers
- [Any current blockers]

---

## Last Week's Focus

### Delivered
- ✅ [What was completed]

### Decisions Made
- [Key decisions from last week]

---

## Team

### Core Contributors (Active)
| Name | Focus | Availability |
|------|-------|--------------|
| [From git analysis] | [Area] | Full-time/Part-time |

### Occasional Contributors
- [Name] - [Role]

---

## Milestones

### In Progress 🔄
| Milestone | Target | Business Objective |
|-----------|--------|-------------------|
| [Active milestones from the data] | [Date] | [WHY this matters] |

### Recently Completed ✅
| Milestone | Date | Business Objective |
|-----------|------|-------------------|
| [Recently completed] | [Date] | [WHY this mattered] |

### Lost from Sight / Paused ⏸️
| Milestone | Status | Reason |
|-----------|--------|--------|
| [If any] | Paused | [Why] |

---

## Recent Decisions

### Week [N] (Current)
- **[Decision]** - [Context from data]

---

## Quick Links

- [📊 Timeline](./timeline/index.md) - Week-by-week history
- [📋 Background](./background.md) - Project architecture
- [🔌 Data Sources](./datasources.md) - How to gather information

---

*This is a living document. It reflects the current state and changes frequently.*
```

**Fill in from the analyzed data:**
- Team members from git contributors
- Current focus from the most recent week's activity
- Milestones from major features/deployments found in the data
- Recent decisions from meeting transcripts and Zulip discussions
## Step 6: Update Sync State

Update `projects/$0/sync-state.md`:

```markdown
# Sync State

status: history_complete
created_at: {original date}
last_sync_date: $2
initial_history_from: $1
initial_history_to: $2
```
## Common Patterns

### Security Incident
```markdown
### Security Incident: {CVE-ID}
- **Discovered:** {date}
- **Severity:** CRITICAL/HIGH/MEDIUM
- **Who:** {discoverers}
- **Impact:** {description}
- **Actions:**
  1. Immediate fix
  2. Secrets rotated
  3. Monitoring added
```

### Technology Migration
```markdown
### Migration: {Old} -> {New}
- **Decision:** {date}
- **Who:** {decision-makers}
- **Timeline:** {duration}
- **Rationale:** {why} ← Always include the business objective
- **Status:** Complete/In Progress/Planned
```

**Important:** When documenting any milestone or decision, always search for and include the WHY:
- Performance improvements (quantify if possible: "reduced from X to Y")
- Business capabilities enabled ("allows concurrent access for scaling")
- User experience improvements ("improves accessibility")
- Risk mitigation ("addresses security vulnerability")
- Cost reduction ("eliminates cloud dependency")

Look for this context in: meeting recordings, Zulip planning threads, PR descriptions, release notes.

### Team Change
```markdown
### Team: {Name} {Joined/Left/Role Change}
- **Date:** {date}
- **From:** {old role} (if applicable)
- **To:** {new role}
- **Impact:** {on project}
```

## Key Rules

- **Link to sources**: Always reference commit hashes, PR numbers, Zulip topic names, and meeting dates
- **Be explicit about exclusions**: Document which streams/sources you're NOT analyzing and why
- **Write once**: Week files are historical records — don't modify them after creation
- **Paginate all queries**: Result sets can be large; always loop through all pages
311
.agents/skills/project-init/SKILL.md
Normal file
---
name: project-init
description: Initialize a new project analysis. Creates directory structure, discovers relevant data sources (Zulip streams, git repos, meeting rooms), and writes datasources.md, background.md skeleton, and sync-state.md.
disable-model-invocation: true
argument-hint: [project-name]
---

# Initialize Project Analysis

**When to use:** Starting analysis of a new project. This skill sets up the project structure and discovers data sources. It does NOT gather historical data — use `/project-history` for that after reviewing the datasources.

## Step 1: Create Project Structure

```bash
mkdir -p projects/$0/timeline
```

## Step 2: Discover and Document Data Sources

Investigate what data sources exist for this project. Use the [connectors skill](../connectors/SKILL.md) and [company skill](../company/SKILL.md) for reference.

### Discovery process

1. **Zulip streams**: Search DataIndex for `threaded_conversation` entities matching the project name. Note which stream IDs appear. Cross-reference with the company skill's Zulip channel list to identify primary vs. secondary streams.
2. **Git repositories**: Ask the user for the repository URL, or search Gitea/GitHub if accessible.
3. **Meeting rooms**: Search DataIndex for `meeting` entities matching the project name. Note which `room_name` values appear — these are the relevant meeting rooms.
4. **Search terms**: Identify the project name, key technologies, and domain-specific terms that surface relevant data.
5. **Entity type priority**: Determine which entity types are most relevant (typically `threaded_conversation`, `meeting`, and possibly `email`).
### Write datasources.md

**File:** `projects/$0/datasources.md`

````markdown
# $0 - Data Sources

## Zulip Streams

### PRIMARY Streams (Analyze All)
| Stream ID | Name | Topics | Priority | What to Look For |
|-----------|------|--------|----------|------------------|
| XXX | stream-name | N topics | CRITICAL | Development discussions |

### SECONDARY Streams (Selective)
| Stream ID | Name | Topics to Analyze | Context |
|-----------|------|-------------------|---------|
| YYY | integration-stream | specific-topic | Integration work |

### EXCLUDE
- stream-id-1: reason
- stream-id-2: reason

## Git Repository

**URL:** https://...

**Commands:**
```
git clone {url} ./tmp/$0-clone
cd ./tmp/$0-clone
git log --format="%H|%an|%ae|%ad|%s" --date=short > commits.csv
git log --format="%an|%ae" | sort | uniq -c | sort -rn
```

## Meeting Rooms

### PRIMARY
- room-name: Project-specific discussions

### SECONDARY (Context Only)
- allhands: General updates

### EXCLUDE
- personal-rooms: Other projects

## Search Terms

### Primary
- project-name
- key-technology-1

### Technical
- architecture-term-1

### Deadline & Timeline Tracking
- deadline
- due date
- target date
- ETA
- "by when"
- "ship by"
- milestone
- launch date
- release date
- "when will"
- "when is"
- pushed back
- extended
- slipped
- delayed
- ahead of schedule
- behind schedule

## Entity Types Priority
1. threaded_conversation (Zulip)
2. meeting (recordings)
3. calendar_event (for deadline tracking)
4. [Exclude: email, document if not relevant]

## Deadline Discovery

During initial data gathering, search for deadline-related discussions:
- **Direct mentions**: "deadline", "due date", "target", "ETA", "launch date"
- **Timeline discussions**: When delivery dates are discussed or negotiated
- **Changes**: Deadline extensions, delays, accelerations
- **Commitments**: Statements like "we'll ship by X", "targeting Y for release"

Track these with dates, sources, and any context about rationale for changes.
````
## Step 3: Create Project Dashboard (Living Document)

**File:** `projects/$0/project.md`

This is the **entry point** — the living document showing current status.

```markdown
# $0 Project

**One-liner:** [Brief description]
**Status:** [Active/On Hold/Deprecated]
**Repository:** URL
**Last Updated:** [Date]

---

## This Week's Focus

### Primary Objective
[What the team is working on right now]

### Active Work
- [Current task 1]
- [Current task 2]

### Blockers
- [Any blockers]

---

## Last Week's Focus

### Delivered
- ✅ [What was completed]

### Decisions Made
- [Key decisions from last week]

---

## Team

### Core Contributors (Active)
| Name | Focus | Availability |
|------|-------|--------------|
| [Name] | [Area] | Full-time/Part-time |

### Occasional Contributors
- [Name] - [Role]

---

## Milestones

### In Progress 🔄
| Milestone | Target | Business Objective |
|-----------|--------|-------------------|
| [Name] | [Date] | [WHY this matters] |

### Recently Completed ✅
| Milestone | Date | Business Objective |
|-----------|------|-------------------|
| [Name] | [Date] | [WHY this mattered] |

### Lost from Sight / Paused ⏸️
| Milestone | Status | Reason |
|-----------|--------|--------|
| [Name] | Paused | [Why paused] |

---

## Deadline History

Track when deadlines were discussed, committed to, or changed. This shows the evolution of project timeline expectations.

### Current Commitments
| Deliverable | Current Target | Source | Confidence |
|-------------|---------------|--------|------------|
| [Feature/Milestone] | [Date] | [Meeting/Thread] | [High/Medium/Low] |

### Timeline Evolution
| Date | Change | Previous | New | Reason | Source |
|------|--------|----------|-----|--------|--------|
| [Date] | Extended | [Old Date] | [New Date] | [Why changed] | [Thread/Meeting] |
| [Date] | Committed | - | [Date] | [Initial commitment context] | [Thread/Meeting] |

---

## Recent Decisions

### Week [N] (Current)
- **[Decision]** - [Context]

### Week [N-1]
- **[Decision]** - [Context]

---

## Quick Links

- [📊 Timeline](./timeline/index.md) - Week-by-week history
- [📋 Background](./background.md) - Project architecture and details
- [🔌 Data Sources](./datasources.md) - How to gather information
- [⚙️ Sync State](./sync-state.md) - Last sync information

---

*This is a living document. It reflects the current state and changes frequently.*
```
## Step 4: Create Background Skeleton

**File:** `projects/$0/background.md`

Static/architecture information that rarely changes.

```markdown
# $0 - Background

**Type:** [Web app/Mobile app/Library/Service]
**Repository:** URL

## What is $0?

[Brief description of what the project does]

## Architecture

### Components
- Component 1 - Purpose
- Component 2 - Purpose

### Technology Stack
- Technology 1 - Usage
- Technology 2 - Usage

## Data Sources

See: [datasources.md](./datasources.md)

## Timeline Structure

Weekly timeline files are organized in the `timeline/` directory.

## How This Project Is Updated

1. Gather Data: Query Zulip, Git, meetings
2. Update Timeline: Create week-by-week entries
3. Update Project Dashboard: Refresh [project.md](./project.md)

For current status, see: [project.md](./project.md)
```
## Step 5: Create Timeline Index

**File:** `projects/$0/timeline/index.md`

```markdown
# $0 Timeline Index

## Key Milestones

| Date | Milestone | Status |
|------|-----------|--------|
| [To be filled by project-history] | | |

## Summary by Quarter

[To be filled by project-history]
```

## Step 6: Initialize Sync State

**File:** `projects/$0/sync-state.md`

```markdown
# Sync State

status: initialized
created_at: [today's date]
last_sync_date: null
initial_history_from: null
initial_history_to: null
```

## Done

After this skill completes, the user should:
1. **Review `datasources.md`** — confirm the streams, repos, and meeting rooms are correct
2. **Edit `background.md`** — fill in any known project details
3. **Run `/project-history $0 [date-from] [date-to]`** — to build the initial historical timeline
385
.agents/skills/project-sync/SKILL.md
Normal file
---
name: project-sync
description: Sync a project timeline using subagents for parallelism. Splits work by week and datasource to stay within context limits. Handles both first-time and incremental syncs.
disable-model-invocation: true
argument-hint: [project-name]
---

# Project Sync

**When to use:** Keep a project timeline up to date. Works whether the project has been synced before or not.

**Precondition:** `projects/$0/datasources.md` must exist. If it doesn't, run `/project-init $0` first.

## Architecture: Coordinator + Subagents

This skill is designed for **subagent execution** to stay within context limits. The main agent acts as a **coordinator** that delegates data-intensive work to subagents.

```
Coordinator
├── Phase 1: Gather (parallel subagents, one per datasource)
│   ├── Subagent: Zulip    → writes tmp/$0-sync/zulip.md
│   ├── Subagent: Git      → writes tmp/$0-sync/git.md
│   └── Subagent: Meetings → writes tmp/$0-sync/meetings.md
│
├── Phase 2: Synthesize (parallel subagents, one per week)
│   ├── Subagent: Week 1 → writes timeline/{year-month}/week-{n}.md
│   ├── Subagent: Week 2 → writes timeline/{year-month}/week-{n}.md
│   └── ...
│
└── Phase 3: Finalize (coordinator directly)
    ├── timeline/index.md (add links to new weeks)
    ├── project.md (update living document)
    └── sync-state.md (update sync status)
```

---
## Coordinator Steps

### Step 1: Determine Sync Range

Check whether `projects/$0/sync-state.md` exists.

**Case A — First sync (no sync-state.md):**
Default range is **last 12 months through today**. If the user provided explicit dates as extra arguments (`$1`, `$2`), use those instead.

**Case B — Incremental sync (sync-state.md exists):**
Read `last_sync_date` from `projects/$0/sync-state.md`. Range is `last_sync_date` to today.
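The range selection could look like this. It assumes `sync-state.md` keeps `last_sync_date: YYYY-MM-DD` on its own line, as in the template, and treats the 12-month default as 365 days:

```python
from datetime import date, timedelta

def sync_range(state_text, today, arg_from=None, arg_to=None):
    """Pick the sync window (Step 1).

    state_text is the contents of projects/$0/sync-state.md, or None
    when the file doesn't exist. Sketch only; field names follow the
    sync-state template.
    """
    if arg_from and arg_to:                    # explicit dates win
        return arg_from, arg_to
    if state_text is not None:                 # Case B: incremental sync
        for line in state_text.splitlines():
            if line.startswith("last_sync_date:"):
                value = line.split(":", 1)[1].strip()
                if value != "null":
                    return date.fromisoformat(value), today
    return today - timedelta(days=365), today  # Case A: first sync
```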
### Step 2: Read Datasources

Read `projects/$0/datasources.md` to determine:
- Zulip stream IDs and search terms
- Git repository URL
- Meeting room names
- Entity types to prioritize

### Step 3: Prepare Scratch Directory

```bash
mkdir -p tmp/$0-sync
```

This directory holds intermediate outputs from Phase 1 subagents. It is ephemeral — delete it after the sync completes.

### Step 4: Compute Week Boundaries

Split the sync range into ISO calendar weeks (Monday–Sunday). Produce a list of `(week_number, week_start, week_end, year_month)` tuples. This list drives Phase 2.
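The boundary math might be sketched like this, clamping the first and last weeks to the sync range:

```python
from datetime import date, timedelta

def week_tuples(start: date, end: date):
    """Split [start, end] into ISO calendar weeks (Monday-Sunday).

    Returns (week_number, week_start, week_end, year_month) tuples.
    Sketch of the boundary math only.
    """
    out = []
    cursor = start - timedelta(days=start.weekday())  # rewind to Monday
    while cursor <= end:
        week_end = cursor + timedelta(days=6)         # the following Sunday
        out.append((
            cursor.isocalendar()[1],                  # ISO week number
            max(cursor, start),                       # clamp to range start
            min(week_end, end),                       # clamp to range end
            f"{cursor.year}-{cursor.month:02d}",      # directory name, e.g. "2025-03"
        ))
        cursor = week_end + timedelta(days=1)
    return out
```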
---

## Phase 1: Gather Data (parallel subagents)

Launch **one subagent per datasource**, all in parallel. Each subagent covers the **full sync range** and writes its output to a scratch file. The output must be organized by week so Phase 2 subagents can consume it.

### Subagent: Zulip

**Input:** Sync range, PRIMARY stream IDs and search terms from datasources.md.

**Important:** `threaded_conversation` entities only contain the **last 50 messages** in a topic. To get complete message history for a week, you must query `conversation_message` entities.

**Task:** Two-step process for each PRIMARY stream:

**Step 1:** List all thread IDs in the stream using `id_prefix`:
```
GET /api/v1/query
entity_types=threaded_conversation
connector_ids=zulip
id_prefix=zulip:stream:{stream_id}
limit=100
offset=0
```

This returns all thread entities (e.g., `zulip:stream:155:topic_name`). Save these IDs.

**Step 2:** For each week in the sync range, query messages from each thread:
```
GET /api/v1/query
entity_types=conversation_message
connector_ids=zulip
parent_id={thread_id}   # e.g., zulip:stream:155:standalone
date_from={week_start}
date_to={week_end}
limit=100
offset=0
```

Paginate through all messages for each thread/week combination.

**Output:** Write `tmp/$0-sync/zulip.md` with results grouped by week:

```markdown
## Week {n} ({week_start} to {week_end})

### Stream: {stream_name}
- **Topic:** {topic} ({date}, {message_count} messages, {participant_count} participants)
  {brief summary or key quote}
```
### Subagent: Git

**Input:** Sync range, git repository URL from datasources.md.

**Important:** Git commands may fail due to gitconfig permission issues. Use a temporary HOME directory throughout.

**Task:**

```bash
# Set a temporary HOME to avoid gitconfig permission issues
export HOME=$(pwd)/.tmp-home
mkdir -p ./tmp

# Clone if needed, pull if exists
if [ -d ./tmp/$0-clone ]; then
  export HOME=$(pwd)/.tmp-home && cd ./tmp/$0-clone && git pull
else
  export HOME=$(pwd)/.tmp-home && git clone --depth 500 {url} ./tmp/$0-clone
  cd ./tmp/$0-clone
fi

# Get commits in the date range
export HOME=$(pwd)/.tmp-home && git log --since="{range_start}" --until="{range_end}" --format="%H|%an|%ae|%ad|%s" --date=short

# Get contributor statistics
export HOME=$(pwd)/.tmp-home && git log --since="{range_start}" --until="{range_end}" --format="%an" | sort | uniq -c | sort -rn
```

**Output:** Write `tmp/$0-sync/git.md` with results grouped by week:

```markdown
## Week {n} ({week_start} to {week_end})

**Commits:** {count}
**Contributors:** {name} ({count}), {name} ({count})

### Key Commits
- `{short_hash}` {subject} — {author} ({date})
```
### Subagent: Meetings

**Input:** Sync range, meeting room names from datasources.md.

**Task:** For each PRIMARY room, query meetings and run a semantic search:

```
GET /api/v1/query
entity_types=meeting
date_from={range_start}
date_to={range_end}
room_name={room-name}
limit=100

POST /api/v1/search
search_text={project-name}
entity_types=["meeting"]
date_from={range_start}
date_to={range_end}
limit=50
```

**Output:** Write `tmp/$0-sync/meetings.md` with results grouped by week:

```markdown
## Week {n} ({week_start} to {week_end})

### Meeting: {title} ({date}, {room})
**Participants:** {names}
**Summary:** {brief summary}
**Key points:**
- {point}
```

---
## Phase 2: Synthesize Week Files (parallel subagents)

After all Phase 1 subagents complete, launch **one subagent per week**, all in parallel. Each produces a single week file.

### Subagent: Week {n}

**Input:** The relevant `## Week {n}` sections extracted from each of:
- `tmp/$0-sync/zulip.md`
- `tmp/$0-sync/git.md`
- `tmp/$0-sync/meetings.md`

Pass only the sections for this specific week — do NOT pass the full files.
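Extracting one week's section from a scratch file can be done with a single regex over the `## Week {n}` headers that Phase 1 writes. A sketch:

```python
import re

def extract_week_section(scratch_text: str, week_n: int) -> str:
    """Pull one `## Week {n} (...)` section out of a Phase 1 scratch file
    so a Phase 2 subagent sees only its own week. Assumes the scratch
    files use the `## Week {n}` headers shown above.
    """
    # (?ms): ^ matches line starts, . matches newlines inside the section
    pattern = rf"(?ms)^## Week {week_n} .*?(?=^## Week |\Z)"
    match = re.search(pattern, scratch_text)
    return match.group(0).rstrip() if match else ""
```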
**Task:** Merge and analyze the data from all three sources. Categorize into:
|
||||
|
||||
1. **Key Decisions** — Technology migrations, architecture changes, vendor switches, security incidents, strategic pivots
|
||||
2. **Technical Work** — Feature implementations, bug fixes, infrastructure changes
|
||||
3. **Team Activity** — Core vs. occasional contributors, role changes
|
||||
4. **Blockers** — Issues, delays, dependencies
|
||||
5. **Deadline Discussions** — Target dates, commitments, timeline changes
|
||||
|
||||
**Milestones:** When documenting milestones, capture BOTH:
|
||||
- **WHAT** — The technical achievement (e.g., "PostgreSQL migration")
|
||||
- **WHY** — The business objective (e.g., "to improve query performance from 107ms to 27ms and enable concurrent access for scaling")
|
||||
|
||||
Search for business objectives in: meeting discussions about roadmap, Zulip threads about planning, PR descriptions, release notes, and any "why are we doing this" conversations.
|
||||
|
||||
**Deadlines & Timeline Changes:** Search all data sources for:
|
||||
- **Explicit deadlines**: "deadline is", "due by", "target date", "ship by X", "launch date"
|
||||
- **ETAs and estimates**: "ETA", "expected by", "should be done"
|
||||
- **Changes**: "pushed back", "extended", "delayed", "moved up", "ahead of schedule", "slipped"
|
||||
- **Commitments**: Agreements on when something will be delivered
|
||||
- **Uncertainty**: "not sure when", "TBD", "need to figure out timeline"
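
A simple keyword pass over message text can surface candidates for this search. The pattern list below mirrors the phrases above; it is a first-pass filter, not a substitute for reading the context:

```python
import re

# Phrases from the checklist above, compiled into one case-insensitive pattern.
DEADLINE_RE = re.compile(
    r"deadline is|due by|target date|ship by|launch date"
    r"|\bETA\b|expected by|should be done"
    r"|pushed back|extended|delayed|moved up|ahead of schedule|slipped"
    r"|not sure when|\bTBD\b|need to figure out timeline",
    re.IGNORECASE,
)

def deadline_mentions(messages):
    """Return (index, text) pairs for messages that mention a deadline phrase."""
    return [(i, m) for i, m in enumerate(messages) if DEADLINE_RE.search(m)]
```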

For each deadline discussion found, record:

- What deliverable/milestone is being discussed
- The date mentioned (if any)
- Whether it's a new commitment, change, or removal
- The source (which thread/meeting/commit)
- Any context about why the timeline changed

**Skip unless meaningful:** Routine check-ins, minor documentation updates, social chat.

**Output:** Write `projects/$0/timeline/{year-month}/week-{n}.md` using the week file template from [project-history](../project-history/SKILL.md). Also return a **3-5 line summary** to the coordinator for use in Phase 3.

Create the month directory first if needed: `mkdir -p projects/$0/timeline/{year-month}`

---

## Phase 3: Finalize (coordinator directly)

The coordinator collects the summaries returned by all Phase 2 subagents. These summaries are small enough to fit in the coordinator's context.

### Step 5: Update Timeline Index

Add links to new week files in `projects/$0/timeline/index.md`. Append entries under the appropriate year/quarter sections. Update milestones if any were reached.

### Step 6: Update Project Dashboard (project.md)

**File:** `projects/$0/project.md`

This is the **living document** — update it with current status from the week summaries:

**Update these sections:**

1. **This Week's Focus** - What the team is actively working on now
2. **Last Week's Focus** - What was completed in the most recent week
3. **Team** - Current contributors and their focus areas
4. **Milestones** - Update status and add new ones with business objectives
5. **Recent Decisions** - Key decisions from the last 2-3 weeks
6. **Deadline History** - Track timeline discussions, commitments, and changes

**Deadline Tracking:**

- Scan all week summaries for deadline-related discussions
- Add new entries to the "Timeline Evolution" table showing changes
- Update "Current Commitments" with latest target dates
- Note when deadlines are mentioned without specific dates (uncertainty)
- Capture the reasoning behind timeline changes when available

**Deadline History Format:**

```markdown
## Deadline History

### Current Commitments
| Deliverable | Current Target | Source | Confidence |
|-------------|---------------|--------|------------|
| Feature X | Mar 15, 2026 | Sprint planning meeting | High |
| Beta release | Q2 2026 | Roadmap discussion | Medium |

### Timeline Evolution
| Date | Change | Previous | New | Reason | Source |
|------|--------|----------|-----|--------|--------|
| Feb 10 | Extended | Feb 28 | Mar 15 | Additional testing needed | #dev channel |
| Jan 15 | Committed | - | Feb 28 | Initial sprint commitment | Sprint kickoff |
```

**Milestone Format:**

```markdown
### In Progress 🔄
| Milestone | Target | Business Objective |
|-----------|--------|-------------------|
| Standalone deployment | Feb 2026 | Enable non-developers to self-host without complex setup |

### Recently Completed ✅
| Milestone | Date | Business Objective |
|-----------|------|-------------------|
| PostgreSQL migration | Mar 2025 | Improve performance (107ms→27ms) and enable scaling |

### Dropped / Paused ⏸️
| Milestone | Status | Reason |
|-----------|--------|--------|
| Feature X | Paused | Resources reallocated to higher priority |
```

**Note:** Milestones in this company change frequently — update status (in progress/done/paused) as needed.

### Step 7: Update Sync State

Create or update `projects/$0/sync-state.md`:

**First sync (Case A):**

```markdown
# Sync State

status: synced
created_at: {today's date}
last_sync_date: {today's date}
initial_history_from: {range_start}
initial_history_to: {range_end}
last_incremental_sync: {today's date}
```

**Incremental sync (Case B):**

```markdown
# Sync State

status: synced
created_at: {original value}
last_sync_date: {today's date}
initial_history_from: {original value}
initial_history_to: {original value}
last_incremental_sync: {today's date}
```

### Step 8: Cleanup

```bash
rm -rf tmp/$0-sync
```

### Step 9: Summary Report

Output a brief summary:

```markdown
## Sync Summary: {Date}

### Period Covered
{range_start} to {range_end}

### Key Changes
1. Decision: {brief description}
2. Feature: {what was built}
3. Team: {who joined/left}
4. Timeline: {deadline changes or commitments made}

### Metrics
- {n} new commits
- {n} active contributors
- {n} weeks analyzed
- {n} new Zulip threads
- {n} meetings recorded

### Current Status
[Status description]
```

---

## Key Rules

- **Link to sources**: Always reference commit hashes, PR numbers, Zulip topic names, meeting dates
- **Be explicit about exclusions**: Document what you're NOT analyzing and why
- **Write once**: Week files are historical records — don't modify existing ones, only create new ones
- **Paginate all queries**: Always loop through all pages of results
- **Distinguish contributor types**: Core (regular activity) vs. occasional (sporadic)
- **Subagent isolation**: Each subagent should be self-contained. Pass only the data it needs — never the full scratch files
- **Fail gracefully**: If a datasource subagent fails (e.g., git clone errors, API down), the coordinator should continue with available data and note the gap in the summary
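
The "paginate all queries" rule fits in one helper. This sketch assumes the limit/offset response shape (`items`, `total`) used by DataIndex elsewhere in these docs:

```python
def paginate(fetch, page_size=100):
    """Collect every page from fetch(limit, offset) -> {"items": [...], "total": n}."""
    items, offset = [], 0
    while True:
        data = fetch(page_size, offset)
        items.extend(data["items"])
        if offset + page_size >= data["total"]:
            break
        offset += page_size
    return items
```

Wrap the actual HTTP call (e.g., an `httpx.Client.get` on `/query`) in the `fetch` callable so every query path shares the same loop.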

357 .agents/skills/self-onboarding/SKILL.md (Normal file)
@@ -0,0 +1,357 @@

---
name: self-onboarding
description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
user-invocable: true
---

# Self-Onboarding Skill

This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.

## ⚠️ MANDATORY EXECUTION PLAN

**ALWAYS follow these phases in order. Do NOT skip phases or combine them.**

```
PHASE 1: Identity Resolution (Sequential)
  └─→ Get contact_id via contactdb_get_me()
  └─→ Extract: name, email, role, location, contact_id

PHASE 2: Historical Data Gathering (Parallel Subagents)
  └─→ Launch 12 subagents (1 per month, querying all entity types)
  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
  └─→ Wait for ALL subagents to complete
  └─→ Collect and synthesize findings

PHASE 3: Initial Synthesis & Generation (Sequential)
  └─→ Read MYSELF.example.md
  └─→ Generate initial MYSELF.md draft

PHASE 4: Deep Dive & Refinement (Parallel Subagents)
  └─→ Launch 7 subagents for background discovery
  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
  └─→ Wait for ALL subagents to complete
  └─→ Enhance MYSELF.md with findings

PHASE 5: Final Delivery (Sequential)
  └─→ Review and polish MYSELF.md
  └─→ Deliver to user
```

**Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)

## When to Use

Use this skill when:
- A new team member joins and needs to create their `MYSELF.md`
- Someone wants to update their existing `MYSELF.md` with fresh data
- You need to understand a person's work patterns, collaborators, and preferences

## Prerequisites

Before starting, ensure:
1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
3. You have access to the MYSELF.example.md template

## Process Overview

The onboarding process consists of 5 phases:

1. **Identity Resolution** - Get the person's contact record
2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
3. **Initial Synthesis** - Create initial MYSELF.md draft
4. **Deep Dive & Refinement** - Search for specific personal details and enhance
5. **Final Delivery** - Review and deliver completed MYSELF.md

## Phase 1: Identity Resolution

Get the person's identity from ContactDB:

```python
# Get self
contactdb_get_me()

# Or search by name
contactdb_query_contacts(search="Person Name")
```

**Extract key info:**
- Name, email, role, location
- Contact ID (needed for all subsequent queries)
- Platform identities (Zulip, email, Reflector)
- Stats (hotness score, interaction counts)

## Phase 2: Historical Data Gathering

**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**

Launch parallel subagents to query all entity types for each monthly time range.

**Mandatory approach (NO EXCEPTIONS):**
- Time range: Past 12 months (or since joining)
- One subagent per month that queries ALL entity types
- Total: 12 subagents (one for each month)

**Why subagents are required:**
- Each monthly query is independent and can run in parallel
- Direct queries would take too long and exceed context limits
- Subagents aggregate data per time period, making synthesis easier
- This is the ONLY way to get comprehensive historical coverage

**Benefits of 1 subagent per month:**
- Holistic view of each month across all channels
- Cross-channel context (e.g., a meeting follows up on a Zulip discussion)
- Simpler to implement and debug
- Results pre-aggregated by time period

**Subagent task structure:**

```
Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.

For each entity type (threaded_conversation, conversation_message, meeting, email):
Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100

Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
```
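
The task template above expands to four `/query` calls per monthly window. A sketch of building them (the constant names are assumptions; the URL shape comes from the template):

```python
BASE = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]

def monthly_query_urls(contact_id, date_from, date_to):
    """One /query URL per entity type for a single monthly window."""
    return [
        f"{BASE}?entity_types={et}&contact_ids={contact_id}"
        f"&date_from={date_from}&date_to={date_to}&limit=100"
        for et in ENTITY_TYPES
    ]
```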

**Example time ranges (monthly):**
- 2025-02-19 to 2025-03-19
- 2025-03-19 to 2025-04-19
- ... (continue for 12 months)
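
Generating the twelve windows takes only stdlib date arithmetic. A sketch (the day-of-month is clamped to 28 to avoid invalid dates, which only matters for anchor days after the 28th):

```python
from datetime import date

def monthly_ranges(end: date, months: int = 12):
    """months windows of ~1 month each, ending at end, oldest first."""
    ranges = []
    y, m, d = end.year, end.month, min(end.day, 28)
    for _ in range(months):
        py, pm = (y, m - 1) if m > 1 else (y - 1, 12)
        ranges.append((date(py, pm, d).isoformat(), date(y, m, d).isoformat()))
        y, m = py, pm
    return list(reversed(ranges))
```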

## Phase 3: Initial Synthesis & Generation

After gathering all data:

1. **Summarize findings:**
   - Total activity counts per entity type
   - Most active time periods
   - Key projects/topics
   - Frequent collaborators

2. **Read MYSELF.example.md** to understand the template structure

3. **Generate initial MYSELF.md** with:
   - Identity section (from ContactDB)
   - Work areas (from meeting topics, Zulip streams)
   - Collaborators (from meeting participants, message contacts)
   - Basic preferences (inferred from activity patterns)

## Phase 4: Deep Dive & Refinement

**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**

**Launch 7 parallel subagents** to search for background information across all categories.

Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.

### Discovery Categories (One Subagent Per Category)

**1. Personal Interests & Hobbies**
```
Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences
```

**2. Work Schedule & Availability**
```
Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
```

**3. Professional Values & Goals**
```
Search: "{Name} values goals mission purpose why they work career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements
```

**4. Communication & Workflow Preferences**
```
Search: "{Name} workflow tools preferences how they like to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences
```

**5. Background & Career History**
```
Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey
```

**6. Daily Life & Routines**
```
Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle
```

**7. Technical Preferences**
```
Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
```
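
The seven searches share one payload shape, so they can be generated rather than hand-written. A sketch (function name is an assumption; the terms are adapted from the categories above):

```python
CATEGORY_TERMS = {
    "interests": "hobbies interests personal life outside work sports books travel music games cooking",
    "schedule": "schedule availability hours timezone meeting time preference morning afternoon",
    "values": "values goals mission purpose why they work career objective philosophy",
    "workflow": "workflow tools preferences how they like to work communication style feedback",
    "background": "background career history previous roles education transition story experience",
    "daily_life": "daily routine family married children commute work-life balance personal context",
    "technical": "tools development workflow process methodology architecture decisions technical approach",
}

def discovery_payloads(name, date_from, date_to):
    """One POST /search body per discovery category, name-first as required."""
    return {
        cat: {
            "search_text": f"{name} {terms}",
            "date_from": date_from,
            "date_to": date_to,
            "limit": 20,
        }
        for cat, terms in CATEGORY_TERMS.items()
    }
```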

### Subagent Task Template

```
Search DataIndex for background information about {Name}.

API Call:
POST /dataindex/api/v1/search
{
  "search_text": "{Name} {category-specific search terms}",
  "date_from": "{12_months_ago}",
  "date_to": "{today}",
  "limit": 20
}

Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights
```

**Why parallel subagents:**
- Each search is independent - perfect for parallelization
- Reduces execution time from minutes to seconds
- Comprehensive coverage without overwhelming the main agent
- Gathers rich context for personalizing the MYSELF.md

**Critical: Use name-based search**
- Always include the person's full name in the search query
- Do NOT rely on contact_id filtering for semantic search
- Personal details appear in transcripts where names are mentioned
- contact_id filters work for exact queries but fail for RAG/semantic retrieval

When searching for personal details, use the person's full name in the query:

```python
# GOOD - Uses name in search text
dataindex_search(
    query="Mathieu Virbel hobbies interests personal life outside work",
    date_from="2025-02-19T00:00:00Z",
    date_to="2026-02-19T00:00:00Z",
    limit=20
)

# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
    query="hobbies interests personal life",
    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
    limit=20
)
```

**Key Insight:**
- Semantic search works best with full context in the query text
- contact_id filtering works for exact entity matching but not for RAG retrieval
- Personal details often appear in meeting transcripts where names are mentioned

## Output: MYSELF.md Structure

The final document should include:

```markdown
# About Me

## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)

## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities

## People I work with frequently
- List of key collaborators with context

## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values

## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
```

## Tips for Quality Results

1. **Be thorough in Phase 2** - More historical data = better insights
2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
4. **Ask follow-up questions** - Users often want to discover unexpected things
5. **Search by name, not ID** - Critical for finding personal context
6. **Synthesize meeting transcripts** - They contain rich personal details
7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
8. **Update over time** - MYSELF.md should evolve as the person does

## Common Mistakes to Avoid

**❌ DON'T query DataIndex directly in Phase 2 or 4**
- Direct queries miss the monthly breakdown
- You won't get comprehensive historical coverage
- Context limits will truncate results

**❌ DON'T launch 48 subagents (12 months × 4 entity types)**
- Use 12 subagents (1 per month) instead
- Each monthly subagent queries all 4 entity types
- Simpler coordination and better cross-channel context

**❌ DON'T skip Phase 2 and go straight to Phase 4**
- You need historical context before doing deep searches
- The monthly aggregation reveals patterns you can't see otherwise

**❌ DON'T use contact_id filtering for semantic searches**
- RAG/semantic search requires the person's name in the query text
- contact_id filters only work for exact entity matching

**✅ ALWAYS use the Task tool to launch subagents**
- This is the only way to achieve true parallelism
- Each subagent gets its own context window
- Results can be aggregated after all complete

## Example Usage

```
User: "Help me create my MYSELF.md"

Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
   - Personal interests & hobbies
   - Work schedule & availability
   - Professional values & goals
   - Communication & workflow preferences
   - Background & career history
   - Daily life & routines
   - Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document
```

**Total subagents launched:** 12 (historical) + 7 (discovery) = 19 parallel tasks

## Files

- `MYSELF.example.md` - Template file to copy and fill
- `MYSELF.md` - Generated output (gitignored, personal to each user)

105 .agents/skills/workflow/SKILL.md (Normal file)
@@ -0,0 +1,105 @@

---
name: workflow
description: Create a marimo notebook for data analysis. Use when the request involves analysis over time periods, large data volumes, or when the user asks to "create a workflow".
disable-model-invocation: true
argument-hint: [topic]
---

# Workflow — Create a Marimo Notebook

## When to create a marimo notebook

Any request that involves **analysis over a period of time** (e.g., "meetings this month", "emails since January", "interaction trends") is likely to return a **large volume of data** — too much to process inline. In these cases, **always produce a marimo notebook** (a `.py` file following the patterns in the [notebook-patterns skill](.agents/skills/notebook-patterns/SKILL.md)).

Also create a notebook when the user asks to "create a workflow", "write a workflow", or "build an analysis".

If you're unsure whether a question is simple enough to answer directly or needs a notebook, **ask the user**.

## Always create a new workflow

When the user requests a workflow, **always create a new notebook file**. Do **not** modify or re-run an existing workflow unless the user explicitly asks you to (e.g., "update workflow 001", "fix the sentiment notebook", "re-run the existing analysis"). Each new request gets its own sequentially numbered file — even if it covers a similar topic to an earlier workflow.

## File naming and location

All notebooks go in the **`workflows/`** directory. Use a sequential number prefix so workflows stay ordered by creation:

```
workflows/<NNN>_<topic>_<scope>.py
```

- `<NNN>` — zero-padded sequence number (`001`, `002`, …). Look at existing files in `workflows/` to determine the next number.
- `<topic>` — what is being analyzed, in snake_case (e.g., `greyhaven_meetings`, `alice_emails`, `hiring_discussions`)
- `<scope>` — time range or qualifier (e.g., `january`, `q1_2026`, `last_30d`, `all_time`)

**Examples:**

```
workflows/001_greyhaven_meetings_january.py
workflows/002_alice_emails_q1_2026.py
workflows/003_hiring_discussions_last_30d.py
workflows/004_team_interaction_timeline_all_time.py
```

**Before creating a new workflow**, list existing files in `workflows/` to find the highest number and increment it.
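
Finding the next sequence number can be scripted rather than eyeballed. A sketch (the helper name is an assumption):

```python
import re
from pathlib import Path

def next_workflow_number(workflows_dir="workflows"):
    """Scan NNN_*.py files and return the next zero-padded prefix."""
    nums = [
        int(m.group(1))
        for p in Path(workflows_dir).glob("*.py")
        if (m := re.match(r"(\d{3})_", p.name))
    ]
    return f"{max(nums) + 1 if nums else 1:03d}"
```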

## Plan before you implement

Before writing any notebook, **always propose a plan first** and get the user's approval. The plan should describe:

1. **Goal** — What question are we answering?
2. **Data sources** — Which entity types and API endpoints will be used?
3. **Algorithm / ETL steps** — Step-by-step description of the data pipeline: what gets fetched, how it's filtered, joined, or aggregated, and what the final output looks like.
4. **Output format** — Table columns, charts, or summary statistics the user will see.

Only proceed to implementation after the user confirms the plan.

## Validate before delivering

After writing or editing a notebook, **always run `uvx marimo check`** to verify it has no structural errors (duplicate variables, undefined names, branch expressions, etc.):

```bash
uvx marimo check workflows/NNN_topic_scope.py
```

A clean check (no output, exit code 0) means the notebook is valid. Fix any errors before delivering the notebook to the user.

## Steps

1. **Identify people** — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /api/contacts/me`.
2. **Find data** — Use DataIndex `GET /query` (exhaustive, paginated) or `POST /search` (semantic, ranked) with `contact_ids`, `entity_types`, `date_from`/`date_to`, `connector_ids` filters.
3. **Analyze** — For simple answers, process the API response directly. For complex multi-step analysis, build a marimo notebook (see the [notebook-patterns skill](.agents/skills/notebook-patterns/SKILL.md) for detailed patterns).

## Quick Example (Python)

> "Find all emails involving Alice since January"

```python
import httpx

CONTACTDB = "http://localhost:42000/contactdb-api"
DATAINDEX = "http://localhost:42000/dataindex/api/v1"
client = httpx.Client(timeout=30)

# 1. Resolve "Alice" to a contact_id
resp = client.get(f"{CONTACTDB}/api/contacts", params={"search": "Alice"})
alice_id = resp.json()["contacts"][0]["id"]  # e.g. 42

# 2. Fetch all emails involving Alice (with pagination)
emails = []
offset = 0
while True:
    resp = client.get(f"{DATAINDEX}/query", params={
        "entity_types": "email",
        "contact_ids": str(alice_id),
        "date_from": "2025-01-01T00:00:00Z",
        "limit": 50,
        "offset": offset,
    })
    data = resp.json()
    emails.extend(data["items"])
    if offset + 50 >= data["total"]:
        break
    offset += 50

print(f"Found {len(emails)} emails involving Alice")
```

3 .env.example (Normal file)
@@ -0,0 +1,3 @@

LLM_API_URL=https://litellm-notrack.app.monadical.io
LLM_MODEL=Kimi-K2.5-sandbox
LLM_API_KEY=xxxxx

5 .gitignore (vendored, Normal file)
@@ -0,0 +1,5 @@

.env
MYSELF.md
__pycache__/
workflows/*.py
projects

149 AGENTS.md
@@ -2,26 +2,38 @@

The InternalAI platform aggregates company data from email, calendars, Zulip chat, meetings, and documents into two core APIs. These docs give LLM agents the context they need to build programmatic workflows — typically as marimo notebooks — that answer analytical questions about people and their interactions.

As an agent, assume you're running within our greywall sandbox.

## Routing Table

| I need to... | Read |
|---------------------------------------------|---------------------------------------------------|
| Understand the company and its tools | [company-context.md] |
| Look up people, contacts, relationships | [contactdb-api.md] |
| Query emails, meetings, chats, documents | [dataindex-api.md] |
| Know which connector provides what data | [connectors-and-sources.md] |
| Create a marimo analysis notebook | [notebook-patterns.md] |
| Know who the user is and what they care about | [MYSELF.md] |
| Understand the company and its tools | [company skill] |
| Look up people, contacts, relationships | [contactdb skill] |
| Query emails, meetings, chats, documents | [dataindex skill] |
| Know which connector provides what data | [connectors skill] |
| Create a marimo analysis notebook | [workflow skill] + [notebook-patterns skill] |
| Build a weekly checkout | [checkout skill] |
| Create my MYSELF.md from historical data | [self-onboarding skill] |

## About the User

If `MYSELF.md` exists in the project root, **read it first** before starting any workflow. It contains the user's name, role, team, frequent collaborators, and preferences. Use this context to:

- Address the user by name in notebook markdown
- Default `TARGET_PERSON` or filter values to people they work with
- Scope date ranges and topics to their stated interests
- Tailor output format to their preferences

If `MYSELF.md` does not exist, suggest running `/self-onboarding` to generate it automatically from historical data. Alternatively, the user can copy `MYSELF.example.md` to `MYSELF.md` and fill it in manually.

## API Base URLs

| Service | Swagger UI | OpenAPI JSON |
|------------|---------------------------------------------------|----------------------------------------|
| ContactDB | `http://localhost:42000/contactdb-api/docs` | `/contactdb-api/openapi.json` |
| DataIndex | `http://localhost:42000/dataindex/docs` | `/dataindex/openapi.json` |
| ContactDB | `http://localhost:42000/contactdb-api/docs` (direct), or `http://caddy/contactdb-api/docs` (via greywall sandbox) | `/contactdb-api/openapi.json` |
| DataIndex | `http://localhost:42000/dataindex/docs` (direct), or `http://caddy/dataindex/docs` (via greywall sandbox) | `/dataindex/openapi.json` |

Direct backend ports (without Caddy proxy):
- ContactDB: `http://localhost:42800`
- DataIndex: `http://localhost:42180`
|
||||
|
||||
## Common Questions → API Calls
|
||||
|
||||
@@ -43,101 +55,24 @@ Use this table to translate natural language questions into API calls. The base
|
||||
|
||||
**Key pattern:** Any question about "me" / "my" / "I" requires calling `GET /contactdb-api/api/contacts/me` first to get your `contact_id`, then using that ID in subsequent DataIndex queries.
|
||||
|
||||
## Workflow
|
||||
|
||||
### When to create a marimo notebook
|
||||
|
||||
Any request that involves **analysis over a period of time** (e.g., "meetings this month", "emails since January", "interaction trends") is likely to return a **large volume of data** — too much to process inline. In these cases, **always produce a marimo notebook** (a `.py` file following the patterns in [notebook-patterns.md]).
|
||||
|
||||
Also create a notebook when the user asks to "create a workflow", "write a workflow", or "build an analysis".
|
||||
|
||||
If you're unsure whether a question is simple enough to answer directly or needs a notebook, **ask the user**.
|
||||
|
||||
### File naming and location

All notebooks go in the **`workflows/`** directory. Use a sequential number prefix so workflows stay ordered by creation:

```
workflows/<NNN>_<topic>_<scope>.py
```

- `<NNN>` — zero-padded sequence number (`001`, `002`, …). Look at existing files in `workflows/` to determine the next number.
- `<topic>` — what is being analyzed, in snake_case (e.g., `greyhaven_meetings`, `alice_emails`, `hiring_discussions`)
- `<scope>` — time range or qualifier (e.g., `january`, `q1_2026`, `last_30d`, `all_time`)

**Examples:**

```
workflows/001_greyhaven_meetings_january.py
workflows/002_alice_emails_q1_2026.py
workflows/003_hiring_discussions_last_30d.py
workflows/004_team_interaction_timeline_all_time.py
```

**Before creating a new workflow**, list existing files in `workflows/` to find the highest number and increment it.
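This lookup can be sketched as a small helper (a sketch only; the function name is illustrative, not part of the repo):

```python
import re
from pathlib import Path


def next_workflow_number(workflows_dir: str = "workflows") -> str:
    """Return the next zero-padded <NNN> prefix, e.g. '005'."""
    numbers = [
        int(m.group(1))
        for p in Path(workflows_dir).glob("*.py")
        if (m := re.match(r"(\d{3})_", p.name))
    ]
    return f"{max(numbers, default=0) + 1:03d}"
```

An empty `workflows/` directory yields `001`, matching the convention above.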
### Plan before you implement

Before writing any notebook, **always propose a plan first** and get the user's approval. The plan should describe:

1. **Goal** — What question are we answering?
2. **Data sources** — Which entity types and API endpoints will be used?
3. **Algorithm / ETL steps** — Step-by-step description of the data pipeline: what gets fetched, how it's filtered, joined, or aggregated, and what the final output looks like.
4. **Output format** — Table columns, charts, or summary statistics the user will see.

Only proceed to implementation after the user confirms the plan.
### Steps

1. **Identify people** — Use ContactDB to resolve names/emails to `contact_id` values. For "me"/"my" questions, always start with `GET /api/contacts/me`.
2. **Find data** — Use DataIndex `GET /query` (exhaustive, paginated) or `POST /search` (semantic, ranked) with `contact_ids`, `entity_types`, `date_from`/`date_to`, `connector_ids` filters.
3. **Analyze** — For simple answers, process the API response directly. For complex multi-step analysis, build a marimo notebook (see [notebook-patterns.md]).
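The offset pagination behind step 2 follows a standard loop; a generic sketch, with a stub standing in for the real HTTP call (`fetch_page` and its stub are illustrative names):

```python
def fetch_all(fetch_page, limit=50):
    """Collect every item from an offset-paginated endpoint.

    fetch_page(limit, offset) must return {"items": [...], "total": N},
    the response shape DataIndex GET /query uses.
    """
    items, offset = [], 0
    while True:
        page = fetch_page(limit, offset)
        items.extend(page["items"])
        if offset + limit >= page["total"]:
            break
        offset += limit
    return items


# Exercise the loop against an in-memory stub of 120 records
data = list(range(120))
stub = lambda limit, offset: {"items": data[offset:offset + limit], "total": len(data)}
collected = fetch_all(stub)
```

Swapping the stub for an `httpx` call gives the exhaustive-query pattern used throughout the workflows.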
### Quick Example (Python)

> "Find all emails involving Alice since January"

```python
import httpx

CONTACTDB = "http://localhost:42000/contactdb-api"
DATAINDEX = "http://localhost:42000/dataindex/api/v1"
client = httpx.Client(timeout=30)

# 1. Resolve "Alice" to a contact_id
resp = client.get(f"{CONTACTDB}/api/contacts", params={"search": "Alice"})
alice_id = resp.json()["contacts"][0]["id"]  # e.g. 42

# 2. Fetch all emails involving Alice (with pagination)
emails = []
offset = 0
while True:
    resp = client.get(f"{DATAINDEX}/query", params={
        "entity_types": "email",
        "contact_ids": str(alice_id),
        "date_from": "2025-01-01T00:00:00Z",
        "limit": 50,
        "offset": offset,
    })
    data = resp.json()
    emails.extend(data["items"])
    if offset + 50 >= data["total"]:
        break
    offset += 50

print(f"Found {len(emails)} emails involving Alice")
```
## File Index

- [company-context.md] — Business context, team structure, vocabulary
- [contactdb-api.md] — ContactDB entities and REST endpoints
- [dataindex-api.md] — DataIndex entity types, query modes, REST endpoints
- [connectors-and-sources.md] — Connector-to-entity-type mapping
- [notebook-patterns.md] — Marimo notebook patterns and common API workflows
- [MYSELF.md] — User identity, role, collaborators, and preferences (gitignored, copy from `MYSELF.example.md`)
- [company skill] — Business context, team structure, vocabulary
- [contactdb skill] — ContactDB entities and REST endpoints
- [dataindex skill] — DataIndex entity types, query modes, REST endpoints
- [connectors skill] — Connector-to-entity-type mapping
- [workflow skill] — How to create marimo analysis notebooks
- [notebook-patterns skill] — Marimo notebook patterns and common API workflows
- [checkout skill] — Weekly review builder
- [self-onboarding skill] — Generate a personalized MYSELF.md from 12 months of historical activity

[company-context.md]: ./docs/company-context.md
[contactdb-api.md]: ./docs/contactdb-api.md
[dataindex-api.md]: ./docs/dataindex-api.md
[connectors-and-sources.md]: ./docs/connectors-and-sources.md
[notebook-patterns.md]: ./docs/notebook-patterns.md
[MYSELF.md]: ./MYSELF.md
[company skill]: ./.agents/skills/company/SKILL.md
[contactdb skill]: ./.agents/skills/contactdb/SKILL.md
[dataindex skill]: ./.agents/skills/dataindex/SKILL.md
[connectors skill]: ./.agents/skills/connectors/SKILL.md
[workflow skill]: ./.agents/skills/workflow/SKILL.md
[notebook-patterns skill]: ./.agents/skills/notebook-patterns/SKILL.md
[checkout skill]: ./.agents/skills/checkout/SKILL.md
[self-onboarding skill]: ./.agents/skills/self-onboarding/SKILL.md

215 README.md
@@ -1,19 +1,58 @@
# InternalAI Agent
# InternalAI Workspace

A documentation and pattern library that gives LLM agents the context they need to build data analysis workflows against Monadical's internal systems — ContactDB (people directory) and DataIndex (unified data from email, calendar, Zulip, meetings, documents).
Agent-assisted workspace to work on your own data with InternalAI (ContactDB / DataIndex).

The goal is to use [opencode](https://opencode.ai) (or any LLM-powered coding tool) to iteratively create [marimo](https://marimo.io) notebook workflows that query and analyze company data.
## Things you can do
## Getting Started
- **Onboard yourself** — `can you onboard me?` creates your `MYSELF.md`
- **Weekly checkout** — `create my checkout of last week` builds a summary from your activity
- **Data analysis** — `create a workflow that searches all meetings since 2024 where Max is listed as a participant (not a contactdb), and output as csv` creates a marimo notebook in `workflows/`
- **Init a project** — `create the creatrix project` creates `projects/creatrix/` with base information
- **Sync a project** — `sync the creatrix project` runs a full 1-year analysis on the first run, then incremental syncs afterward, producing a live `project.md` document

## Setup

### Prerequisites

- [opencode](https://opencode.ai) installed
- Access to the InternalAI platform (ContactDB + DataIndex running locally, accessible via `http://localhost:42000`)
- [Greywall](https://gitea.app.monadical.io/monadical/greywall) installed — verify with `greywall --version`
- [OpenCode](https://opencode.ai) installed as a native binary (not a wrapper via bun/npm/pnpm)
### Configuring opencode with LiteLLM
### Greywall sandbox template

To use models through LiteLLM, add the following to `~/.config/opencode/config.json`:
Run OpenCode in learning mode so Greywall can observe which files it reads and writes:

```
greywall --learning -- opencode
```

Interact briefly, then exit OpenCode. Greywall generates a sandbox template based on the observed filesystem access. Edit the template if needed.
### MCP configuration

Add the ContactDB and DataIndex MCP servers:

```
greywall -- opencode mcp add
```

Run the command twice with these settings:

| Name | Type | URL | OAuth |
|------|------|-----|-------|
| `contactdb` | Remote MCP | `http://caddy/contactdb-api/mcp/` | No |
| `dataindex` | Remote MCP | `http://caddy/dataindex/mcp/` | No |

Verify the servers are registered:

```
greywall -- opencode mcp list
```

Then open your proxy at `http://localhost:42000/proxy` and allow access to Caddy.
### LiteLLM provider

Add a `litellm` provider in `opencode.json`:

```json
{
@@ -23,12 +62,12 @@ To use models through LiteLLM, add the following to `~/.config/opencode/config.j
      "npm": "@ai-sdk/openai-compatible",
      "name": "Litellm",
      "options": {
        "baseURL": "https://litellm.app.monadical.io",
        "apiKey": "xxxxx"
        "baseURL": "https://litellm-notrack.app.monadical.io",
        "apiKey": "sk-xxxxx"
      },
      "models": {
        "Kimi-K2.5-dev": {
          "name": "Kimi-K2.5-dev"
        "Kimi-K2.5-sandbox": {
          "name": "Kimi-K2.5-sandbox"
        }
      }
    }
@@ -36,85 +75,93 @@ To use models through LiteLLM, add the following to `~/.config/opencode/config.j
}
```
Replace `xxxxx` with your actual LiteLLM API key.
Replace `apiKey` with your own key (check 1Password for "litellm notrack").

### Running opencode
## Usage

From the project root:

```bash
opencode
```

opencode will pick up `AGENTS.md` automatically and use it as the entry point to understand the project, the available APIs, and how to write workflows.
## How AGENTS.md Works

`AGENTS.md` is the routing guide for LLM agents. It is structured as follows:

1. **Purpose statement** — Explains that the agent's job is to build marimo notebooks that analyze company data.

2. **Documentation routing table** — Directs the agent to the right file depending on the topic:

   | Topic | File |
   |-------|------|
   | Company context, tools, connectors overview | `docs/company-context.md` |
   | People, contacts, relationships | `docs/contactdb-api.md` |
   | Querying emails, meetings, chats, docs | `docs/dataindex-api.md` |
   | Connector-to-entity-type mappings | `docs/connectors-and-sources.md` |
   | Notebook creation patterns and templates | `docs/notebook-patterns.md` |

3. **API base URLs** — ContactDB and DataIndex endpoints (both via Caddy proxy and direct).

4. **Common query translation table** — Maps natural-language questions (e.g. "Who am I?", "Recent meetings") to the corresponding API calls.

5. **Workflow rules** — When to create a notebook vs. answer inline, naming conventions, and the requirement to propose a plan before implementing.
## Workflow

### How it works

1. **Ask a question in opencode** — Describe what you want to analyze (e.g. "Show me all meetings about Greyhaven in January").

2. **Agent reads AGENTS.md** — opencode picks up the routing guide and navigates to the relevant docs to understand the APIs.

3. **Agent proposes a plan** — Before writing code, the agent outlines: Goal, Data Sources, Algorithm, and Output Format.

4. **Agent creates a marimo notebook** — A `.py` file is written to `workflows/` following the naming convention `<NNN>_<topic>_<scope>.py`.

5. **Iterate** — Run the notebook with `marimo edit workflows/<name>.py`, review the output, and ask the agent to refine.
### Workflow output format

Workflows are [marimo notebooks](https://marimo.io) — plain Python files with `@app.cell` decorators. They typically follow this structure:

- **params cell** — User-editable parameters (search terms, date ranges, contact names)
- **config cell** — API base URLs
- **setup cell** — Shared imports (`httpx`, `polars`, `marimo`)
- **data cells** — Fetch and transform data from ContactDB / DataIndex
- **output cells** — Tables, charts, or markdown summaries
### Naming convention
Start OpenCode inside the Greywall sandbox:

```
workflows/<NNN>_<topic>_<scope>.py
greywall -- opencode
```

Examples:
- `001_greyhaven_meetings_january.py`
- `002_email_activity_q1.py`
### First-run checklist

## Project Structure
1. Select the Kimi K2.5 model under litellm in `/models` — type "hello" to confirm it responds (if not, check the proxy)
2. Test ContactDB access — ask "who am I?" (should trigger `get_me`)
3. Test DataIndex access — ask "what was my last meeting about?"
## Skills

Skills are agent instructions stored in `.agents/skills/`. They follow the [Agent Skills](https://agentskills.io) standard (same structure as `.claude/skills/`). Some are invoked by the user via `/name`, others are background knowledge the agent loads automatically when relevant.

### Task Skills (user-invoked)

These are workflows you trigger explicitly. The agent will not run them on its own.

| Skill | Invocation | Purpose |
|-------|-----------|---------|
| **project-init** | `/project-init [name]` | Set up a new project: create directory structure, discover data sources (Zulip streams, git repos, meeting rooms), write `datasources.md` and `background.md` skeleton. Stops before gathering data so you can review the sources. |
| **project-history** | `/project-history [name] [from] [to]` | Build the initial timeline for a project. Queries all datasources for a date range, creates week-by-week analysis files, builds the timeline index, and synthesizes the background. Requires `project-init` first. |
| **project-sync** | `/project-sync [name]` | Incremental update of a project timeline. Reads the last sync date from `sync-state.md`, fetches new data through today, creates new week files, and refreshes the timeline and background. |
| **checkout** | `/checkout` | Build a weekly review (Sunday through today). Gathers meetings, emails, Zulip conversations, and Gitea activity, then produces a structured checkout summary. |
| **workflow** | `/workflow [topic]` | Create a marimo notebook for data analysis. Use for any request involving analysis over time periods or large data volumes. |
| **self-onboarding** | `/self-onboarding` | Generate a personalized `MYSELF.md` by analyzing 12 months of historical activity (meetings, emails, Zulip, calendar). Runs 19 parallel subagents to build a comprehensive profile. |
### Reference Skills (agent-loaded automatically)

These provide background knowledge the agent loads when relevant. They don't appear in the `/` menu.

| Skill | What the agent learns |
|-------|----------------------|
| **connectors** | Which data connectors exist and what entity types they produce (reflector, zulip, email, calendar, etc.) |
| **dataindex** | How to query the DataIndex REST API (`GET /query`, `POST /search`, `GET /entities/{id}`) |
| **contactdb** | How to resolve people to contact IDs via the ContactDB REST API |
| **company** | Monadical org structure, Zulip channel layout, communication tools, meeting/calendar relationships |
| **notebook-patterns** | Marimo notebook rules: cell scoping, async patterns, pagination helpers, analysis templates |
## Project Tracking

Project analysis files live in `projects/`. See [projects/README.md](projects/README.md) for the directory structure and categorization guidelines.

**Typical workflow:**

```
internalai-agent/
├── AGENTS.md                      # LLM agent routing guide (entry point)
├── README.md
├── docs/
│   ├── company-context.md         # Monadical org, tools, key concepts
│   ├── contactdb-api.md           # ContactDB REST API reference
│   ├── dataindex-api.md           # DataIndex REST API reference
│   ├── connectors-and-sources.md  # Connector → entity type mappings
│   └── notebook-patterns.md       # Marimo notebook templates and patterns
└── workflows/                     # Generated analysis notebooks go here
/project-init myproject                           # 1. Discover sources, create skeleton
# Review datasources.md, adjust if needed
/project-history myproject 2025-06-01 2026-02-17  # 2. Backfill history
# ... time passes ...
/project-sync myproject                           # 3. Incremental update
```
Each project produces:

```
projects/{name}/
├── datasources.md       # Where to find data (Zulip streams, git repos, meeting rooms)
├── background.md        # Living doc: current status, team, architecture
├── sync-state.md        # Tracks last sync date for incremental updates
└── timeline/
    ├── index.md         # Navigation and milestones
    └── {year-month}/
        └── week-{n}.md  # One week of history (write-once)
```
## Data Analysis Workflows

Analysis notebooks live in `workflows/`. Each is a marimo `.py` file.

```
/workflow meetings-with-alice   # Creates workflows/NNN_meetings_with_alice.py
```

See the [workflow skill](.agents/skills/workflow/SKILL.md) for naming conventions and the [notebook-patterns skill](.agents/skills/notebook-patterns/SKILL.md) for marimo coding rules.
## Data Sources

All data flows through two APIs:

- **DataIndex** (`localhost:42000/dataindex/api/v1` direct, `http://caddy/dataindex/api/v1` via greywall sandbox) — unified query interface for all entity types
- **ContactDB** (`localhost:42000/contactdb-api` direct, `http://caddy/contactdb-api/` via greywall sandbox) — people directory, resolves names/emails to contact IDs

Connectors that feed DataIndex: `reflector` (meetings), `zulip` (chat), `mbsync_email` (email), `ics_calendar` (calendar), `hedgedoc` (documents), `browser_history` (web pages), `babelfish` (translations).
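For quick reference, that connector-to-category mapping can be expressed as a small lookup table (the dict and helper are illustrative, not part of the repo; categories are as listed above):

```python
# Connector id → data category, per the list above
CONNECTOR_CATEGORIES = {
    "reflector": "meetings",
    "zulip": "chat",
    "mbsync_email": "email",
    "ics_calendar": "calendar",
    "hedgedoc": "documents",
    "browser_history": "web pages",
    "babelfish": "translations",
}


def connectors_for(category: str) -> list[str]:
    """Return the connector ids that produce the given data category."""
    return [c for c, cat in CONNECTOR_CATEGORIES.items() if cat == category]
```

This is the kind of table a workflow can use to turn "meetings" into a `connector_ids` filter value.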
5 workflows/lib/__init__.py Normal file
@@ -0,0 +1,5 @@
"""Library modules for contact analysis workbooks."""

from lib.llm import llm_call

__all__ = ["llm_call"]
71 workflows/lib/llm.py Normal file
@@ -0,0 +1,71 @@
"""Simple LLM helper for workbooks using Mirascope v2."""

import os
import re
from typing import TypeVar

from mirascope import llm
from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

# Configure from environment (defaults match .env.example)
_api_key = os.getenv("LLM_API_KEY", "")
_base_url = os.getenv("LLM_API_URL", "https://litellm-notrack.app.monadical.io")
_model = os.getenv("LLM_MODEL", "GLM-4.5-Air-FP8-dev")

# Register our LiteLLM endpoint as an OpenAI-compatible provider
_base = (_base_url or "").rstrip("/")
llm.register_provider(
    "openai",
    scope="litellm/",
    base_url=_base if _base.endswith("/v1") else f"{_base}/v1",
    api_key=_api_key,
)


def _sanitize_json(text: str) -> str:
    """Strip control characters (U+0000–U+001F) that break JSON parsing.

    Some LLMs emit literal newlines/tabs inside JSON string values,
    which is invalid per the JSON spec. Replace them with spaces.
    """
    return re.sub(r"[\x00-\x1f]+", " ", text)


async def llm_call(
    prompt: str,
    response_model: type[T],
    system_prompt: str = "You are a helpful assistant.",
    model: str | None = None,
) -> T:
    """Make a structured LLM call.

    Args:
        prompt: The user prompt
        response_model: Pydantic model for structured output
        system_prompt: System instructions
        model: Override the default model

    Returns:
        Parsed response matching the response_model schema
    """
    use_model = model or _model

    @llm.call(f"litellm/{use_model}", format=response_model)
    async def _call() -> str:
        return f"{system_prompt}\n\n{prompt}"

    response = await _call()
    try:
        return response.parse()
    except Exception:
        # Fallback: extract content and parse manually.
        # response.content could be a string or a list of Text objects.
        content = response.content
        if isinstance(content, list):
            # Extract text from each chunk of the list
            content = "".join(
                chunk.text if hasattr(chunk, "text") else str(chunk)
                for chunk in content
            )
        elif not isinstance(content, str):
            content = str(content)
        return response_model.model_validate_json(_sanitize_json(content))
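The sanitization step in the fallback path can be exercised in isolation with the standard library (a standalone sketch, independent of Mirascope; `sanitize_json` mirrors the module's `_sanitize_json`):

```python
import json
import re


def sanitize_json(text: str) -> str:
    """Replace control characters (U+0000–U+001F) with spaces."""
    return re.sub(r"[\x00-\x1f]+", " ", text)


# A literal newline inside a JSON string value is invalid per the spec...
raw = '{"summary": "line one\nline two"}'
try:
    json.loads(raw)
    raw_parses = True
except json.JSONDecodeError:
    raw_parses = False

# ...but the sanitized text parses cleanly
cleaned = json.loads(sanitize_json(raw))
```

This is why the fallback sanitizes before calling `model_validate_json`: the raw model output can fail strict JSON parsing even when the payload is otherwise well-formed.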