feat: add self-onboarding skill

2026-02-20 15:12:29 -06:00
parent fc3bdf86ba
commit f2a8eaff3c
2 changed files with 358 additions and 0 deletions


@@ -0,0 +1,357 @@
---
name: self-onboarding
description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
user-invocable: true
---
# Self-Onboarding Skill
This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.
## ⚠️ MANDATORY EXECUTION PLAN
**ALWAYS follow these phases in order. Do NOT skip phases or combine them.**
```
PHASE 1: Identity Resolution (Sequential)
└─→ Get contact_id via contactdb_get_me()
└─→ Extract: name, email, role, location, contact_id
PHASE 2: Historical Data Gathering (Parallel Subagents)
└─→ Launch 12 subagents (1 per month, querying all entity types)
└─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
└─→ Wait for ALL subagents to complete
└─→ Collect and synthesize findings
PHASE 3: Initial Synthesis & Generation (Sequential)
└─→ Read MYSELF.example.md
└─→ Generate initial MYSELF.md draft
PHASE 4: Deep Dive & Refinement (Parallel Subagents)
└─→ Launch 7 subagents for background discovery
└─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
└─→ Wait for ALL subagents to complete
└─→ Enhance MYSELF.md with findings
PHASE 5: Final Delivery (Sequential)
└─→ Review and polish MYSELF.md
└─→ Deliver to user
```
**Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)
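The fan-out and wait pattern in the plan above can be sketched as follows. This is an illustration only: `run_subagent`, `run_batch`, and `onboarding` are hypothetical names, and the placeholder coroutine stands in for a real subagent, which the agent would launch through the Task tool rather than through `asyncio`.

```python
import asyncio

async def run_subagent(task: str) -> str:
    # Hypothetical placeholder: a real subagent is launched via the Task tool.
    await asyncio.sleep(0)
    return f"summary for {task}"

async def run_batch(tasks: list[str]) -> list[str]:
    # Start every task concurrently and wait for ALL of them to finish.
    return await asyncio.gather(*(run_subagent(t) for t in tasks))

async def onboarding() -> dict[str, list[str]]:
    months = [f"month-{i:02d}" for i in range(1, 13)]
    categories = [
        "interests", "schedule", "values", "workflow",
        "background", "daily life", "technical prefs",
    ]
    phase2 = await run_batch(months)      # Phase 2: 12 monthly subagents
    # Phase 3 (drafting MYSELF.md) would happen here, sequentially.
    phase4 = await run_batch(categories)  # Phase 4: 7 discovery subagents
    return {"phase2": phase2, "phase4": phase4}

result = asyncio.run(onboarding())
```

The two batches are awaited one after the other, so Phase 4 never starts before every Phase 2 subagent has returned.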
## When to Use
Use this skill when:
- A new team member joins and needs to create their `MYSELF.md`
- Someone wants to update their existing `MYSELF.md` with fresh data
- You need to understand a person's work patterns, collaborators, and preferences
## Prerequisites
Before starting, ensure:
1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
3. You have access to the MYSELF.example.md template
## Process Overview
The onboarding process consists of 5 phases:
1. **Identity Resolution** - Get the person's contact record
2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
3. **Initial Synthesis** - Create initial MYSELF.md draft
4. **Deep Dive & Refinement** - Search for specific personal details and enhance
5. **Final Delivery** - Review and deliver completed MYSELF.md
## Phase 1: Identity Resolution
Get the person's identity from ContactDB:
```python
# Get self
contactdb_get_me()
# Or search by name
contactdb_query_contacts(search="Person Name")
```
**Extract key info:**
- Name, email, role, location
- Contact ID (needed for all subsequent queries)
- Platform identities (Zulip, email, Reflector)
- Stats (hotness score, interaction counts)
## Phase 2: Historical Data Gathering
**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**
Launch parallel subagents to query all entity types for each monthly time range.
**Mandatory approach (NO EXCEPTIONS):**
- Time range: Past 12 months (or since joining)
- One subagent per month that queries ALL entity types
- Total: 12 subagents (one for each month)
**Why subagents are required:**
- Each monthly query is independent and can run in parallel
- Direct queries would take too long and exceed context limits
- Subagents aggregate data per time period, making synthesis easier
- This is the ONLY way to get comprehensive historical coverage
**Benefits of 1 subagent per month:**
- Holistic view of each month across all channels
- Cross-channel context (e.g., meeting follows up on Zulip discussion)
- Simpler to implement and debug
- Results pre-aggregated by time period
**Subagent task structure:**
```
Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.
For each entity type (threaded_conversation, conversation_message, meeting, email):
Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100
Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
```
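As a rough sketch, a monthly subagent could assemble its four query URLs like this. The endpoint and parameter names come from the task structure above; the helper name `build_query_urls` is hypothetical, and the response format is not assumed here.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]

def build_query_urls(contact_id: int, date_from: str, date_to: str, limit: int = 100) -> list[str]:
    """Return one query URL per entity type for a single time range."""
    urls = []
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls

urls = build_query_urls(4, "2025-02-19", "2025-03-19")
```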
**Example time ranges (monthly):**
- 2025-02-19 to 2025-03-19
- 2025-03-19 to 2025-04-19
- ... (continue for 12 months)
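The twelve ranges can be generated rather than typed out by hand. The sketch below uses only the standard library; `shift_months` and `monthly_ranges` are hypothetical helper names, and clamping the day to the end of shorter months is an assumption about how edge cases such as the 31st should be handled.

```python
import calendar
from datetime import date

def shift_months(d: date, months: int) -> date:
    """Move a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def monthly_ranges(end: date, months: int = 12) -> list[tuple[str, str]]:
    """Return (date_from, date_to) pairs, oldest first, ending at `end`."""
    ranges = []
    for i in range(months, 0, -1):
        start = shift_months(end, -i)
        stop = shift_months(end, -(i - 1))
        ranges.append((start.isoformat(), stop.isoformat()))
    return ranges

ranges = monthly_ranges(date(2026, 2, 19))
```

Each pair feeds one monthly subagent as its `date_from` and `date_to`.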
## Phase 3: Initial Synthesis & Generation
After gathering all data:
1. **Summarize findings:**
- Total activity counts per entity type
- Most active time periods
- Key projects/topics
- Frequent collaborators
2. **Read MYSELF.example.md** to understand the template structure
3. **Generate initial MYSELF.md** with:
- Identity section (from ContactDB)
- Work areas (from meeting topics, Zulip streams)
- Collaborators (from meeting participants, message contacts)
- Basic preferences (inferred from activity patterns)
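The summarizing step could look roughly like the sketch below. The shape of each monthly summary (`month`, `counts`, `collaborators`, `topics`) is an assumption made for illustration; the skill does not prescribe a return format for subagents, and `summarize` is a hypothetical helper.

```python
from collections import Counter

def summarize(monthly_summaries: list[dict]) -> dict:
    """Merge per-month subagent summaries into overall totals and rankings."""
    totals = Counter()
    collaborators = Counter()
    topics = Counter()
    activity_by_month = {}
    for summary in monthly_summaries:
        totals.update(summary["counts"])                # add per-entity-type counts
        collaborators.update(summary["collaborators"])  # count mentions per person
        topics.update(summary["topics"])
        activity_by_month[summary["month"]] = sum(summary["counts"].values())
    return {
        "totals": dict(totals),
        "most_active_month": max(activity_by_month, key=activity_by_month.get, default=None),
        "top_collaborators": [name for name, _ in collaborators.most_common(5)],
        "top_topics": [topic for topic, _ in topics.most_common(5)],
    }

# Hypothetical sample data, for illustration only.
sample = [
    {"month": "2025-03", "counts": {"meeting": 3, "email": 2},
     "collaborators": ["Alice", "Bob"], "topics": ["roadmap"]},
    {"month": "2025-04", "counts": {"meeting": 5, "email": 4},
     "collaborators": ["Alice"], "topics": ["roadmap", "hiring"]},
]
overview = summarize(sample)
```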
## Phase 4: Deep Dive & Refinement
**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**
**Launch 7 parallel subagents** to search for background information across all categories.
Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.
### Discovery Categories (One Subagent Per Category)
**1. Personal Interests & Hobbies**
```
Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences
```
**2. Work Schedule & Availability**
```
Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
```
**3. Professional Values & Goals**
```
Search: "{Name} values goals mission purpose why they work career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements
```
**4. Communication & Workflow Preferences**
```
Search: "{Name} workflow tools preferences how they like to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences
```
**5. Background & Career History**
```
Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey
```
**6. Daily Life & Routines**
```
Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle
```
**7. Technical Preferences**
```
Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
```
### Subagent Task Template
```
Search DataIndex for background information about {Name}.
API Call:
POST /dataindex/api/v1/search
{
"search_text": "{Name} {category-specific search terms}",
"date_from": "{12_months_ago}",
"date_to": "{today}",
"limit": 20
}
Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights
```
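One way to produce the seven request bodies from this template is sketched below. The payload fields mirror the template above and the search terms mirror the discovery categories; the category keys, the helper `build_search_payloads`, and the sample name are hypothetical.

```python
import json

# Category keys are illustrative; the terms follow the seven discovery categories.
CATEGORY_TERMS = {
    "interests": "hobbies interests personal life outside work sports books travel music games cooking",
    "schedule": "schedule availability hours timezone meeting time preference morning afternoon",
    "values": "values goals mission purpose why they work career objective philosophy",
    "workflow": "workflow tools preferences how they like to work communication style feedback",
    "background": "background career history previous roles education transition story experience",
    "daily_life": "daily routine family married children commute work-life balance personal context",
    "technical": "tools development workflow process methodology architecture decisions technical approach",
}

def build_search_payloads(name: str, date_from: str, date_to: str, limit: int = 20) -> dict[str, dict]:
    """Return one search payload per category, with the full name in the search text."""
    return {
        category: {
            "search_text": f"{name} {terms}",
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        for category, terms in CATEGORY_TERMS.items()
    }

payloads = build_search_payloads("Jane Doe", "2025-02-19T00:00:00Z", "2026-02-19T00:00:00Z")
body = json.dumps(payloads["interests"])  # JSON body for one subagent's POST request
```

Each subagent receives exactly one of these payloads, so the name always travels in the search text rather than in a `contact_id` filter.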
**Why parallel subagents:**
- Each search is independent - perfect for parallelization
- Cuts wall-clock time, since the seven searches run concurrently instead of one after another
- Comprehensive coverage without overwhelming the main agent
- Gathers rich context for personalizing the MYSELF.md
**Critical: Use name-based search**
- Always include the person's full name in the search query
- Do NOT rely on contact_id filtering for semantic search
- Personal details appear in transcripts where names are mentioned
- contact_id filters work for exact queries but fail for RAG/semantic retrieval
When searching for personal details, use the person's full name in the query:
```python
# GOOD - Uses name in search text
dataindex_search(
query="Mathieu Virbel hobbies interests personal life outside work",
date_from="2025-02-19T00:00:00Z",
date_to="2026-02-19T00:00:00Z",
limit=20
)
# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
query="hobbies interests personal life",
contact_ids=[4], # RAG/semantic search doesn't work well with contact_id
limit=20
)
```
**Key Insight:**
- Semantic search works best with full context in the query text
- contact_id filtering works for exact entity matching but not for RAG retrieval
- Personal details often appear in meeting transcripts where names are mentioned
## Output: MYSELF.md Structure
The final document should include:
```markdown
# About Me
## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)
## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities
## People I work with frequently
- List of key collaborators with context
## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values
## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
```
## Tips for Quality Results
1. **Be thorough in Phase 2** - More historical data = better insights
2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
4. **Ask follow-up questions** - Users often want to discover unexpected things
5. **Search by name, not ID** - Critical for finding personal context
6. **Synthesize meeting transcripts** - They contain rich personal details
7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
8. **Update over time** - MYSELF.md should evolve as the person does
## Common Mistakes to Avoid
**❌ DON'T query DataIndex directly in Phase 2 or 4**
- Direct queries miss the monthly breakdown
- You won't get comprehensive historical coverage
- Context limits will truncate results
**❌ DON'T launch 48 subagents (12 months × 4 entity types)**
- Use 12 subagents (1 per month) instead
- Each monthly subagent queries all 4 entity types
- Simpler coordination and better cross-channel context
**❌ DON'T skip Phase 2 and go straight to Phase 4**
- You need historical context before doing deep searches
- The monthly aggregation reveals patterns you can't see otherwise
**❌ DON'T use contact_id filtering for semantic searches**
- RAG/semantic search requires the person's name in the query text
- contact_id filters only work for exact entity matching
**✅ ALWAYS use the Task tool to launch subagents**
- This is the only way to achieve true parallelism
- Each subagent gets its own context window
- Results can be aggregated after all complete
## Example Usage
```
User: "Help me create my MYSELF.md"
Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
- Personal interests & hobbies
- Work schedule & availability
- Professional values & goals
- Communication & workflow preferences
- Background & career history
- Daily life & routines
- Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document
```
**Total subagents launched:** 12 (historical) + 7 (discovery) = 19, run as two parallel batches
## Files
- `MYSELF.example.md` - Template file to copy and fill
- `MYSELF.md` - Generated output (gitignored, personal to each user)


@@ -17,6 +17,7 @@ These are workflows you trigger explicitly. The agent will not run them on its o
| **project-sync** | `/project-sync [name]` | Incremental update of a project timeline. Reads the last sync date from `sync-state.md`, fetches new data through today, creates new week files, and refreshes the timeline and background. |
| **checkout** | `/checkout` | Build a weekly review (Sunday through today). Gathers meetings, emails, Zulip conversations, and Gitea activity, then produces a structured checkout summary. |
| **workflow** | `/workflow [topic]` | Create a marimo notebook for data analysis. Use for any request involving analysis over time periods or large data volumes. |
| **self-onboarding** | `/self-onboarding` | Generate a personalized `MYSELF.md` by analyzing 12 months of historical activity (meetings, emails, Zulip, calendar). Runs 19 subagents in two parallel batches (12 monthly, then 7 discovery) to build a comprehensive profile. |
### Reference Skills (agent-loaded automatically)