| name | description | user-invocable |
|---|---|---|
| self-onboarding | Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events). | true |
Self-Onboarding Skill
This skill helps new team members create a comprehensive MYSELF.md file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.
⚠️ MANDATORY EXECUTION PLAN
ALWAYS follow these phases in order. Do NOT skip phases or combine them.
PHASE 1: Identity Resolution (Sequential)
└─→ Get contact_id via contactdb_get_me()
└─→ Extract: name, email, role, location, contact_id
PHASE 2: Historical Data Gathering (Parallel Subagents)
└─→ Launch 12 subagents (1 per month, querying all entity types)
└─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
└─→ Wait for ALL subagents to complete
└─→ Collect and synthesize findings
PHASE 3: Initial Synthesis & Generation (Sequential)
└─→ Read MYSELF.example.md
└─→ Generate initial MYSELF.md draft
PHASE 4: Deep Dive & Refinement (Parallel Subagents)
└─→ Launch 7 subagents for background discovery
└─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
└─→ Wait for ALL subagents to complete
└─→ Enhance MYSELF.md with findings
PHASE 5: Final Delivery (Sequential)
└─→ Review and polish MYSELF.md
└─→ Deliver to user
Total subagents required: 19 (12 for Phase 2 + 7 for Phase 4)
When to Use
Use this skill when:
- A new team member joins and needs to create their MYSELF.md
- Someone wants to update their existing MYSELF.md with fresh data
- You need to understand a person's work patterns, collaborators, and preferences
Prerequisites
Before starting, ensure:
- The person has a ContactDB record (use contactdb_get_me or contactdb_query_contacts)
- They have historical data in the system (meetings, Zulip messages, emails, etc.)
- You have access to the MYSELF.example.md template
Process Overview
The onboarding process consists of 5 phases:
- Identity Resolution - Get the person's contact record
- Historical Data Gathering - Query all entity types across 12 monthly periods
- Initial Synthesis - Create initial MYSELF.md draft
- Deep Dive & Refinement - Search for specific personal details and enhance
- Final Delivery - Review and deliver completed MYSELF.md
Phase 1: Identity Resolution
Get the person's identity from ContactDB:
# Get self
contactdb_get_me()
# Or search by name
contactdb_query_contacts(search="Person Name")
Extract key info:
- Name, email, role, location
- Contact ID (needed for all subsequent queries)
- Platform identities (Zulip, email, Reflector)
- Stats (hotness score, interaction counts)
Phase 2: Historical Data Gathering
⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.
Launch parallel subagents to query all entity types for each monthly time range.
Mandatory approach (NO EXCEPTIONS):
- Time range: Past 12 months (or since joining)
- One subagent per month that queries ALL entity types
- Total: 12 subagents (one for each month)
Why subagents are required:
- Each monthly query is independent and can run in parallel
- Direct queries would take too long and exceed context limits
- Subagents aggregate data per time period, making synthesis easier
- This is the ONLY way to get comprehensive historical coverage
Benefits of 1 subagent per month:
- Holistic view of each month across all channels
- Cross-channel context (e.g., meeting follows up on Zulip discussion)
- Simpler to implement and debug
- Results pre-aggregated by time period
Subagent task structure:
Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.
For each entity type (threaded_conversation, conversation_message, meeting, email):
Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100
Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
Example time ranges (monthly):
- 2025-02-19 to 2025-03-19
- 2025-03-19 to 2025-04-19
- ... (continue for 12 months)
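The monthly windows and per-entity query URLs can be generated rather than written by hand. The following is a minimal sketch, assuming the query endpoint and parameters shown in the task structure above; the helper names `add_months`, `monthly_windows`, and `build_month_queries` are hypothetical and not part of any existing tool. Consecutive windows share a boundary date, as in the example ranges, so whether `date_to` is inclusive on the API side is something to verify.

```python
import calendar
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def monthly_windows(start: date, count: int = 12) -> list[tuple[str, str]]:
    """Return `count` consecutive (date_from, date_to) pairs, one per subagent."""
    return [
        (add_months(start, i).isoformat(), add_months(start, i + 1).isoformat())
        for i in range(count)
    ]


def build_month_queries(contact_id: int, date_from: str, date_to: str,
                        limit: int = 100) -> dict[str, str]:
    """Return one query URL per entity type for a single monthly window."""
    urls = {}
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        urls[entity_type] = f"{BASE_URL}?{urlencode(params)}"
    return urls
```

Each `(date_from, date_to)` pair from `monthly_windows(date(2025, 2, 19))` would be handed to one subagent, which then issues the four URLs from `build_month_queries` for its window.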
Phase 3: Initial Synthesis & Generation
After gathering all data:
1. Summarize findings:
   - Total activity counts per entity type
   - Most active time periods
   - Key projects/topics
   - Frequent collaborators
2. Read MYSELF.example.md to understand the template structure
3. Generate initial MYSELF.md with:
   - Identity section (from ContactDB)
   - Work areas (from meeting topics, Zulip streams)
   - Collaborators (from meeting participants, message contacts)
   - Basic preferences (inferred from activity patterns)
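The summarization step can be sketched as a simple merge over the monthly summaries. This is only an illustration: the summary shape used here (`period`, `counts`, `collaborators`, `topics`) is an assumption about what each subagent returns, and `synthesize` is a hypothetical helper name.

```python
from collections import Counter


def synthesize(monthly_summaries: list[dict]) -> dict:
    """Merge per-month subagent summaries into overall totals and rankings."""
    totals = Counter()          # activity count per entity type
    collaborators = Counter()   # number of months each collaborator appears in
    topics = Counter()          # number of months each topic appears in
    activity_by_period = {}

    for summary in monthly_summaries:
        totals.update(summary["counts"])
        collaborators.update(summary["collaborators"])
        topics.update(summary["topics"])
        activity_by_period[summary["period"]] = sum(summary["counts"].values())

    most_active = (
        max(activity_by_period, key=activity_by_period.get)
        if activity_by_period else None
    )
    return {
        "totals": dict(totals),
        "most_active_period": most_active,
        "top_collaborators": [name for name, _ in collaborators.most_common(5)],
        "top_topics": [topic for topic, _ in topics.most_common(5)],
    }
```

The resulting dictionary covers the four findings listed above and can feed directly into the initial draft.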
Phase 4: Deep Dive & Refinement
⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.
Launch 7 parallel subagents to search for background information across all categories.
Each subagent searches using the person's full name in the query text (not contact_id filtering) and returns findings for one category.
Discovery Categories (One Subagent Per Category)
1. Personal Interests & Hobbies
Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences
2. Work Schedule & Availability
Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
3. Professional Values & Goals
Search: "{Name} values goals mission purpose why they work career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements
4. Communication & Workflow Preferences
Search: "{Name} workflow tools preferences how they like to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences
5. Background & Career History
Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey
6. Daily Life & Routines
Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle
7. Technical Preferences
Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
Subagent Task Template
Search DataIndex for background information about {Name}.
API Call:
POST /dataindex/api/v1/search
{
"search_text": "{Name} {category-specific search terms}",
"date_from": "{12_months_ago}",
"date_to": "{today}",
"limit": 20
}
Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights
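One way to prepare the seven request bodies is to build them from a category table. The sketch below assumes the payload fields from the template above; the search terms are abbreviated from the category list, the timestamp format is an assumption, and `build_search_payloads` is a hypothetical helper name.

```python
from datetime import date

# Abbreviated search terms, one entry per discovery category.
CATEGORY_TERMS = {
    "interests": "hobbies interests personal life outside work",
    "schedule": "schedule availability hours timezone meeting time preference",
    "values": "values goals mission purpose career objective philosophy",
    "workflow": "workflow tools preferences communication style feedback",
    "background": "background career history previous roles education",
    "daily_life": "daily routine family commute work-life balance",
    "technical": "tools development workflow methodology architecture decisions",
}


def build_search_payloads(full_name: str, today: date, limit: int = 20) -> dict[str, dict]:
    """Return one search payload per category, covering the past 12 months."""
    try:
        start = today.replace(year=today.year - 1)
    except ValueError:
        # today is Feb 29 and the previous year has no such day
        start = today.replace(year=today.year - 1, day=28)

    def stamp(d: date) -> str:
        # Assumed format: midnight UTC as an ISO 8601 timestamp
        return f"{d.isoformat()}T00:00:00Z"

    return {
        category: {
            "search_text": f"{full_name} {terms}",  # name goes in the query text
            "date_from": stamp(start),
            "date_to": stamp(today),
            "limit": limit,
        }
        for category, terms in CATEGORY_TERMS.items()
    }
```

Each value in the returned dictionary would be the body of one POST to /dataindex/api/v1/search, issued by the subagent responsible for that category.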
Why parallel subagents:
- Each search is independent - perfect for parallelization
- Reduces execution time from minutes to seconds
- Comprehensive coverage without overwhelming the main agent
- Gathers rich context for personalizing the MYSELF.md
Critical: Use name-based search
- Always include the person's full name in the search query
- Do NOT rely on contact_id filtering for semantic search
- Personal details appear in transcripts where names are mentioned
- contact_id filters work for exact queries but fail for RAG/semantic retrieval
When searching for personal details, use the person's full name in the query:
# GOOD - Uses name in search text
dataindex_search(
query="Mathieu Virbel hobbies interests personal life outside work",
date_from="2025-02-19T00:00:00Z",
date_to="2026-02-19T00:00:00Z",
limit=20
)
# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
query="hobbies interests personal life",
contact_ids=[4], # RAG/semantic search doesn't work well with contact_id
limit=20
)
Key Insight:
- Semantic search works best with full context in the query text
- contact_id filtering works for exact entity matching but not for RAG retrieval
- Personal details often appear in meeting transcripts where names are mentioned
Output: MYSELF.md Structure
The final document should include:
# About Me
## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)
## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities
## People I work with frequently
- List of key collaborators with context
## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values
## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
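As an illustration of how the skeleton above could be assembled from gathered findings, here is a minimal sketch. The `profile` dictionary keys and the `render_myself` helper are hypothetical; the actual layout should follow MYSELF.example.md.

```python
# (heading, profile key) pairs in the order used by the structure above.
SECTIONS = [
    ("Identity", "identity"),
    ("What I work on", "work"),
    ("People I work with frequently", "collaborators"),
    ("Personal Context", "personal"),
    ("Preferences", "preferences"),
]


def render_myself(profile: dict[str, list[str]]) -> str:
    """Render a MYSELF.md skeleton with one bullet per finding."""
    lines = ["# About Me"]
    for heading, key in SECTIONS:
        items = profile.get(key, [])
        if key == "personal" and not items:
            continue  # Personal Context is only included if something was discovered
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines) + "\n"
```

Sections without findings still get a heading so they can be filled in later; only the optional Personal Context section is dropped when empty.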
Tips for Quality Results
- Be thorough in Phase 2 - More historical data = better insights
- Use parallel subagents - 12 monthly subagents run concurrently for speed
- Cross-channel synthesis - Monthly subagents see the full picture across all channels
- Ask follow-up questions - Users often want to discover unexpected things
- Search by name, not ID - Critical for finding personal context
- Synthesize meeting transcripts - They contain rich personal details
- Look for patterns - Timezone mentions, scheduling preferences, recurring topics
- Update over time - MYSELF.md should evolve as the person does
Common Mistakes to Avoid
❌ DON'T query DataIndex directly in Phase 2 or 4
- Direct queries miss the monthly breakdown
- You won't get comprehensive historical coverage
- Context limits will truncate results
❌ DON'T launch 48 subagents (12 months × 4 entity types)
- Use 12 subagents (1 per month) instead
- Each monthly subagent queries all 4 entity types
- Simpler coordination and better cross-channel context
❌ DON'T skip Phase 2 and go straight to Phase 4
- You need historical context before doing deep searches
- The monthly aggregation reveals patterns you can't see otherwise
❌ DON'T use contact_id filtering for semantic searches
- RAG/semantic search requires the person's name in the query text
- contact_id filters only work for exact entity matching
✅ ALWAYS use the Task tool to launch subagents
- This is the only way to achieve true parallelism
- Each subagent gets its own context window
- Results can be aggregated after all complete
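The launch-then-wait pattern can be illustrated with asyncio. This is an analogy only: the Task tool is not a Python API, and `run_subagent` below is a hypothetical stand-in that returns a placeholder string. What the sketch shows is the shape of the flow: start every task in a batch, wait for all of them, then move on to the next phase.

```python
import asyncio

DISCOVERY_CATEGORIES = [
    "interests", "schedule", "values", "workflow",
    "background", "daily life", "technical prefs",
]


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one subagent run."""
    await asyncio.sleep(0)  # yield control, as a real call would while waiting
    return f"summary for {task}"


async def run_batch(tasks: list[str]) -> list[str]:
    """Run all tasks concurrently and return results in input order."""
    return await asyncio.gather(*(run_subagent(t) for t in tasks))


async def onboarding() -> tuple[list[str], list[str]]:
    # Phase 2: 12 monthly subagents, all awaited before continuing
    monthly = await run_batch([f"month {i + 1}" for i in range(12)])
    # Phase 3 (initial draft) would run here, after every monthly summary is in
    # Phase 4: 7 discovery subagents, again awaited as a batch
    discovery = await run_batch(DISCOVERY_CATEGORIES)
    return monthly, discovery
```

Calling `asyncio.run(onboarding())` returns 12 monthly results followed by 7 discovery results, matching the two batches in the execution plan.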
Example Usage
User: "Help me create my MYSELF.md"
Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
- Personal interests & hobbies
- Work schedule & availability
- Professional values & goals
- Communication & workflow preferences
- Background & career history
- Daily life & routines
- Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document
Total subagents launched: 12 (historical) + 7 (discovery) = 19 tasks, run as two parallel batches
Files
- MYSELF.example.md - Template file to copy and fill
- MYSELF.md - Generated output (gitignored, personal to each user)