From f2a8eaff3cbe27dc0ba4a0cf37785280b5960c78 Mon Sep 17 00:00:00 2001
From: Mathieu Virbel <mat@meltingrocks.com>
Date: Fri, 20 Feb 2026 15:12:29 -0600
Subject: [PATCH] feat: add self-onboarding skill

---
 .agents/skills/self-onboarding/SKILL.md | 357 ++++++++++++++++++++++++
 README.md                               |   1 +
 2 files changed, 358 insertions(+)
 create mode 100644 .agents/skills/self-onboarding/SKILL.md

diff --git a/.agents/skills/self-onboarding/SKILL.md b/.agents/skills/self-onboarding/SKILL.md
new file mode 100644
index 0000000..c530624
--- /dev/null
+++ b/.agents/skills/self-onboarding/SKILL.md
@@ -0,0 +1,357 @@
+---
+name: self-onboarding
+description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
+user-invocable: true
+---
+
+# Self-Onboarding Skill
+
+This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.
+
+## ⚠️ MANDATORY EXECUTION PLAN
+
+**ALWAYS follow these phases in order. Do NOT skip phases or combine them.**
+
+```
+PHASE 1: Identity Resolution (Sequential)
+  └─→ Get contact_id via contactdb_get_me()
+  └─→ Extract: name, email, role, location, contact_id
+
+PHASE 2: Historical Data Gathering (Parallel Subagents)
+  └─→ Launch 12 subagents (1 per month, querying all entity types)
+  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
+  └─→ Wait for ALL subagents to complete
+  └─→ Collect and synthesize findings
+
+PHASE 3: Initial Synthesis & Generation (Sequential)
+  └─→ Read MYSELF.example.md
+  └─→ Generate initial MYSELF.md draft
+
+PHASE 4: Deep Dive & Refinement (Parallel Subagents)
+  └─→ Launch 7 subagents for background discovery
+  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
+  └─→ Wait for ALL subagents to complete
+  └─→ Enhance MYSELF.md with findings
+
+PHASE 5: Final Delivery (Sequential)
+  └─→ Review and polish MYSELF.md
+  └─→ Deliver to user
+```
+
+**Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)
+
+## When to Use
+
+Use this skill when:
+- A new team member joins and needs to create their `MYSELF.md`
+- Someone wants to update their existing `MYSELF.md` with fresh data
+- You need to understand a person's work patterns, collaborators, and preferences
+
+## Prerequisites
+
+Before starting, ensure:
+1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
+2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
+3. You have access to the MYSELF.example.md template
+
+## Process Overview
+
+The onboarding process consists of 5 phases:
+
+1. **Identity Resolution** - Get the person's contact record
+2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
+3. **Initial Synthesis** - Create initial MYSELF.md draft
+4. **Deep Dive & Refinement** - Search for specific personal details and enhance
+5. **Final Delivery** - Review and deliver completed MYSELF.md
+
+## Phase 1: Identity Resolution
+
+Get the person's identity from ContactDB:
+
+```python
+# Get self
+contactdb_get_me()
+
+# Or search by name
+contactdb_query_contacts(search="Person Name")
+```
+
+**Extract key info:**
+- Name, email, role, location
+- Contact ID (needed for all subsequent queries)
+- Platform identities (Zulip, email, Reflector)
+- Stats (hotness score, interaction counts)
+
+## Phase 2: Historical Data Gathering
+
+**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**
+
+Launch parallel subagents to query all entity types for each monthly time range.
+
+**Mandatory approach (NO EXCEPTIONS):**
+- Time range: Past 12 months (or since joining)
+- One subagent per month that queries ALL entity types
+- Total: 12 subagents (one for each month)
+
+**Why subagents are required:**
+- Each monthly query is independent and can run in parallel
+- Direct queries would take too long and exceed context limits
+- Subagents aggregate data per time period, making synthesis easier
+- This is the ONLY way to get comprehensive historical coverage
+
+**Benefits of 1 subagent per month:**
+- Holistic view of each month across all channels
+- Cross-channel context (e.g., meeting follows up on Zulip discussion)
+- Simpler to implement and debug
+- Results pre-aggregated by time period
+
+**Subagent task structure:**
+
+```
+Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.
+
+For each entity type (threaded_conversation, conversation_message, meeting, email):
+  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100
+
+Synthesize findings across all channels and return a monthly summary with:
+1. Total activity counts per entity type
+2. Key topics/projects discussed
+3. Notable patterns and themes
+4. Collaborators involved
+5. Work areas/projects identified
+```
+
+**Example time ranges (monthly):**
+- 2025-02-19 to 2025-03-19
+- 2025-03-19 to 2025-04-19
+- ... (continue for 12 months)
+
+## Phase 3: Initial Synthesis & Generation
+
+After gathering all data:
+
+1. **Summarize findings:**
+   - Total activity counts per entity type
+   - Most active time periods
+   - Key projects/topics
+   - Frequent collaborators
+
+2. **Read MYSELF.example.md** to understand the template structure
+
+3. **Generate initial MYSELF.md** with:
+   - Identity section (from ContactDB)
+   - Work areas (from meeting topics, Zulip streams)
+   - Collaborators (from meeting participants, message contacts)
+   - Basic preferences (inferred from activity patterns)
+
+## Phase 4: Deep Dive & Refinement
+
+**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**
+
+**Launch 7 parallel subagents** to search for background information across all categories.
+
+Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.
+
+### Discovery Categories (One Subagent Per Category)
+
+**1. Personal Interests & Hobbies**
+```
+Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
+Look for: recreational activities, interests, entertainment preferences
+```
+
+**2. Work Schedule & Availability**
+```
+Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
+Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
+```
+
+**3. Professional Values & Goals**
+```
+Search: "{Name} values goals mission purpose why they work career objective philosophy"
+Look for: motivations, career aspirations, professional beliefs, purpose statements
+```
+
+**4. Communication & Workflow Preferences**
+```
+Search: "{Name} workflow tools preferences how they like to work communication style feedback"
+Look for: preferred tools, work methodologies, communication patterns, feedback preferences
+```
+
+**5. Background & Career History**
+```
+Search: "{Name} background career history previous roles education transition story experience"
+Look for: prior jobs, education, career changes, professional journey
+```
+
+**6. Daily Life & Routines**
+```
+Search: "{Name} daily routine family married children commute work-life balance personal context"
+Look for: family situation, daily schedule, personal commitments, lifestyle
+```
+
+**7. Technical Preferences**
+```
+Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
+Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
+```
+
+### Subagent Task Template
+
+```
+Search DataIndex for background information about {Name}.
+
+API Call:
+POST http://localhost:42000/dataindex/api/v1/search
+{
+  "search_text": "{Name} {category-specific search terms}",
+  "date_from": "{12_months_ago}",
+  "date_to": "{today}",
+  "limit": 20
+}
+
+Extract and return:
+- Specific details found (quotes if available)
+- Patterns or recurring themes
+- Context about personal/professional life
+- Any notable insights
+```
+
+**Why parallel subagents:**
+- Each search is independent - perfect for parallelization
+- Reduces wall-clock time, because the 7 searches run concurrently rather than one after another
+- Comprehensive coverage without overwhelming the main agent
+- Gathers rich context for personalizing the MYSELF.md
+
+**Critical: Use name-based search**
+- Always include the person's full name in the search query
+- Do NOT rely on contact_id filtering for semantic search
+- Personal details appear in transcripts where names are mentioned
+- contact_id filters work for exact queries but fail for RAG/semantic retrieval
+
+When searching for personal details, use the person's full name in the query:
+
+```python
+# GOOD - Uses name in search text
+dataindex_search(
+    query="Mathieu Virbel hobbies interests personal life outside work",
+    date_from="2025-02-19T00:00:00Z",
+    date_to="2026-02-19T00:00:00Z",
+    limit=20
+)
+
+# BAD - Only filters by contact_id (won't find personal context)
+dataindex_search(
+    query="hobbies interests personal life",
+    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
+    limit=20
+)
+```
+
+**Key Insight:**
+- Semantic search works best with full context in the query text
+- contact_id filtering works for exact entity matching but not for RAG retrieval
+- Personal details often appear in meeting transcripts where names are mentioned
+
+## Output: MYSELF.md Structure
+
+The final document should include:
+
+```markdown
+# About Me
+
+## Identity
+- Name, Role, Contact ID, Email, Location
+- Family status (if discovered)
+
+## What I work on
+- Primary projects with descriptions
+- Client work
+- Additional responsibilities
+
+## People I work with frequently
+- List of key collaborators with context
+
+## Personal Context (if discovered)
+- Background/career history
+- Daily schedule & constraints
+- Interests & values
+
+## Preferences
+- Work style
+- Default date ranges
+- Output formats
+- Topics of interest
+- Communication patterns
+- Tools & workflow
+- Security/privacy stance
+- Current learning areas
+- Known challenges
+```
+
+## Tips for Quality Results
+
+1. **Be thorough in Phase 2** - More historical data = better insights
+2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
+3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
+4. **Ask follow-up questions** - Users often want to discover unexpected things
+5. **Search by name, not ID** - Critical for finding personal context
+6. **Synthesize meeting transcripts** - They contain rich personal details
+7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
+8. **Update over time** - MYSELF.md should evolve as the person does
+
+## Common Mistakes to Avoid
+
+**❌ DON'T query DataIndex directly in Phase 2 or 4**
+- Direct queries miss the monthly breakdown
+- You won't get comprehensive historical coverage
+- Context limits will truncate results
+
+**❌ DON'T launch 48 subagents (12 months × 4 entity types)**
+- Use 12 subagents (1 per month) instead
+- Each monthly subagent queries all 4 entity types
+- Simpler coordination and better cross-channel context
+
+**❌ DON'T skip Phase 2 and go straight to Phase 4**
+- You need historical context before doing deep searches
+- The monthly aggregation reveals patterns you can't see otherwise
+
+**❌ DON'T use contact_id filtering for semantic searches**
+- RAG/semantic search requires the person's name in the query text
+- contact_id filters only work for exact entity matching
+
+**✅ ALWAYS use the Task tool to launch subagents**
+- This is the only way to achieve true parallelism
+- Each subagent gets its own context window
+- Results can be aggregated after all complete
+
+## Example Usage
+
+```
+User: "Help me create my MYSELF.md"
+
+Agent:
+1. Gets user's identity via contactdb_get_me()
+2. Discovers contact_id = 4, name = "Mathieu Virbel"
+3. Launches 12 subagents for historical data (1 per month, all entity types)
+4. Gathers summaries from all subagents
+5. Generates initial MYSELF.md
+6. Launches 7 parallel subagents for background discovery:
+   - Personal interests & hobbies
+   - Work schedule & availability
+   - Professional values & goals
+   - Communication & workflow preferences
+   - Background & career history
+   - Daily life & routines
+   - Technical preferences
+7. Gathers all search results
+8. Updates MYSELF.md with rich personal context
+9. Delivers final document
+```
+
+**Total subagents launched:** 12 (historical) + 7 (discovery) = 19 tasks, run as two parallel batches
+
+## Files
+
+- `MYSELF.example.md` - Template file to copy and fill
+- `MYSELF.md` - Generated output (gitignored, personal to each user)
diff --git a/README.md b/README.md
index 756be32..c3f1fb5 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ These are workflows you trigger explicitly. The agent will not run them on its o
 | **project-sync** | `/project-sync [name]` | Incremental update of a project timeline. Reads the last sync date from `sync-state.md`, fetches new data through today, creates new week files, and refreshes the timeline and background. |
 | **checkout** | `/checkout` | Build a weekly review (Sunday through today). Gathers meetings, emails, Zulip conversations, and Gitea activity, then produces a structured checkout summary. |
 | **workflow** | `/workflow [topic]` | Create a marimo notebook for data analysis. Use for any request involving analysis over time periods or large data volumes. |
+| **self-onboarding** | `/self-onboarding` | Generate a personalized `MYSELF.md` by analyzing 12 months of historical activity (meetings, emails, Zulip, calendar). Runs 19 subagents in two parallel batches (12 monthly, 7 discovery) to build a comprehensive profile. |
 
 ### Reference Skills (agent-loaded automatically)