Mathieu Virbel f2a8eaff3c feat: add self-onboarding skill

2026-02-20 15:12:29 -06:00

name	description	user-invocable
self-onboarding	Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).	true

Self-Onboarding Skill

This skill helps new team members create a comprehensive MYSELF.md file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.

⚠️ MANDATORY EXECUTION PLAN

ALWAYS follow these phases in order. Do NOT skip phases or combine them.

PHASE 1: Identity Resolution (Sequential)
  └─→ Get contact_id via contactdb_get_me()
  └─→ Extract: name, email, role, location, contact_id

PHASE 2: Historical Data Gathering (Parallel Subagents)
  └─→ Launch 12 subagents (1 per month, querying all entity types)
  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
  └─→ Wait for ALL subagents to complete
  └─→ Collect and synthesize findings

PHASE 3: Initial Synthesis & Generation (Sequential)
  └─→ Read MYSELF.example.md
  └─→ Generate initial MYSELF.md draft

PHASE 4: Deep Dive & Refinement (Parallel Subagents)
  └─→ Launch 7 subagents for background discovery
  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
  └─→ Wait for ALL subagents to complete
  └─→ Enhance MYSELF.md with findings

PHASE 5: Final Delivery (Sequential)
  └─→ Review and polish MYSELF.md
  └─→ Deliver to user

Total subagents required: 19 (12 for Phase 2 + 7 for Phase 4)
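The phase ordering above can be sketched as a small orchestration loop. This is only an illustration of the sequential/parallel structure: `run_subagent` is a hypothetical stand-in, and a real run would launch subagents through the Task tool rather than `asyncio`.

```python
import asyncio

async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one subagent; not the real Task tool."""
    await asyncio.sleep(0)  # yield control, simulating independent work
    return f"done: {task}"

async def run_plan() -> dict:
    # Phase 1 (sequential): identity must be resolved before anything else.
    identity = {"contact_id": 4, "name": "Mathieu Virbel"}

    # Phase 2 (parallel): one task per month, all awaited together.
    monthly = await asyncio.gather(
        *(run_subagent(f"month {i + 1}") for i in range(12))
    )

    # Phase 3 (sequential): the draft depends on every monthly summary.
    draft = f"draft from {len(monthly)} monthly summaries"

    # Phase 4 (parallel): one task per discovery category.
    categories = ["interests", "schedule", "values", "workflow",
                  "background", "daily life", "technical prefs"]
    discovery = await asyncio.gather(*(run_subagent(c) for c in categories))

    # Phase 5 (sequential): review and deliver.
    return {"identity": identity, "draft": draft,
            "subagents": len(monthly) + len(discovery)}

result = asyncio.run(run_plan())
print(result["subagents"])  # 19
```

Each `gather` call acts as the "wait for ALL subagents" barrier: the next phase does not start until every task in the batch has returned.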

When to Use

Use this skill when:

A new team member joins and needs to create their MYSELF.md
Someone wants to update their existing MYSELF.md with fresh data
You need to understand a person's work patterns, collaborators, and preferences

Prerequisites

Before starting, ensure:

The person has a ContactDB record (use contactdb_get_me or contactdb_query_contacts)
They have historical data in the system (meetings, Zulip messages, emails, etc.)
You have access to the MYSELF.example.md template

Process Overview

The onboarding process consists of 5 phases:

Identity Resolution - Get the person's contact record
Historical Data Gathering - Query all entity types across 12 monthly periods
Initial Synthesis - Create initial MYSELF.md draft
Deep Dive & Refinement - Search for specific personal details and enhance
Final Delivery - Review and deliver completed MYSELF.md

Phase 1: Identity Resolution

Get the person's identity from ContactDB:

# Get self
contactdb_get_me()

# Or search by name
contactdb_query_contacts(search="Person Name")

Extract key info:

Name, email, role, location
Contact ID (needed for all subsequent queries)
Platform identities (Zulip, email, Reflector)
Stats (hotness score, interaction counts)
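A minimal sketch of pulling these fields out of the contact record. The shape of the ContactDB response is not specified here, so the key names (`id`, `name`, `email`, `role`, `location`) and the sample email address are assumptions to adjust against the real payload.

```python
from dataclasses import dataclass

@dataclass
class Identity:
    contact_id: int
    name: str
    email: str
    role: str
    location: str

def extract_identity(record: dict) -> Identity:
    """Pull the Phase 1 fields out of a contact record (key names are assumed)."""
    missing = [key for key in ("id", "name", "email") if key not in record]
    if missing:
        # Fail early: every later query depends on the contact ID.
        raise ValueError(f"contact record is missing: {', '.join(missing)}")
    return Identity(
        contact_id=record["id"],
        name=record["name"],
        email=record["email"],
        role=record.get("role", ""),
        location=record.get("location", ""),
    )

# Placeholder record for illustration only.
sample = {"id": 4, "name": "Mathieu Virbel", "email": "mathieu@example.com"}
identity = extract_identity(sample)
```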

Phase 2: Historical Data Gathering

⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.

Launch parallel subagents to query all entity types for each monthly time range.

Mandatory approach (NO EXCEPTIONS):

Time range: Past 12 months (or since joining)
One subagent per month that queries ALL entity types
Total: 12 subagents (one for each month; fewer if the person joined less than 12 months ago)

Why subagents are required:

Each monthly query is independent and can run in parallel
Direct queries would take too long and exceed context limits
Subagents aggregate data per time period, making synthesis easier
This is the ONLY way to get comprehensive historical coverage

Benefits of 1 subagent per month:

Holistic view of each month across all channels
Cross-channel context (e.g., meeting follows up on Zulip discussion)
Simpler to implement and debug
Results pre-aggregated by time period

Subagent task structure:

Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.

For each entity type (threaded_conversation, conversation_message, meeting, email):
  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100

Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
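As a sketch, the four query URLs for one month can be built from the template above with the standard library. The helper name `build_query_urls` is illustrative; the base URL and parameter names are copied from the task structure.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]

def build_query_urls(contact_id: int, date_from: str, date_to: str,
                     limit: int = 100) -> list[str]:
    """Return one query URL per entity type for a single time window."""
    urls = []
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls

urls = build_query_urls(4, "2025-02-19", "2025-03-19")
```

Using `urlencode` rather than string concatenation keeps the URLs valid if a value ever contains characters that need escaping, such as the `:` in a full timestamp.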

Example time ranges (monthly):

2025-02-19 to 2025-03-19
2025-03-19 to 2025-04-19
... (continue for 12 months)
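The twelve windows can be generated rather than written by hand. A minimal standard-library sketch, assuming windows end on the same day of the month as the run date (clamped for short months); the function names are illustrative.

```python
import calendar
from datetime import date

def shift_months(day: date, months: int) -> date:
    """Move a date by whole months, clamping to the last day of short months."""
    index = day.year * 12 + (day.month - 1) + months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(day.day, last_day))

def monthly_windows(end: date, count: int = 12) -> list[tuple[date, date]]:
    """Return `count` consecutive (date_from, date_to) pairs ending at `end`, oldest first."""
    bounds = [shift_months(end, -i) for i in range(count, -1, -1)]
    return list(zip(bounds, bounds[1:]))

windows = monthly_windows(date(2026, 2, 19))
for date_from, date_to in windows[:2]:
    print(date_from.isoformat(), "to", date_to.isoformat())
# 2025-02-19 to 2025-03-19
# 2025-03-19 to 2025-04-19
```

Each pair becomes the `{date_from}` and `{date_to}` of one monthly subagent task.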

Phase 3: Initial Synthesis & Generation

After gathering all data:

1. Summarize findings:
   - Total activity counts per entity type
   - Most active time periods
   - Key projects/topics
   - Frequent collaborators
2. Read MYSELF.example.md to understand the template structure
3. Generate initial MYSELF.md with:
   - Identity section (from ContactDB)
   - Work areas (from meeting topics, Zulip streams)
   - Collaborators (from meeting participants, message contacts)
   - Basic preferences (inferred from activity patterns)
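The summarization step can be sketched as a merge over the monthly summaries. The summary shape used here (`month`, `counts`, `topics`, `collaborators`) is hypothetical, since subagents return free-form text unless told otherwise, and the sample names are placeholders.

```python
from collections import Counter

def merge_monthly_summaries(summaries: list[dict]) -> dict:
    """Combine per-month summaries into totals (summary keys are assumed)."""
    counts, topics, collaborators = Counter(), Counter(), Counter()
    activity_by_month = {}
    for summary in summaries:
        month_counts = summary.get("counts", {})
        counts.update(month_counts)
        topics.update(summary.get("topics", []))
        collaborators.update(summary.get("collaborators", []))
        activity_by_month[summary["month"]] = sum(month_counts.values())
    most_active = (max(activity_by_month, key=activity_by_month.get)
                   if activity_by_month else None)
    return {
        "counts": dict(counts),
        "most_active_month": most_active,
        "top_topics": [topic for topic, _ in topics.most_common(5)],
        "top_collaborators": [person for person, _ in collaborators.most_common(5)],
    }

# Placeholder data for illustration only.
summaries = [
    {"month": "2025-03", "counts": {"meeting": 3, "email": 10},
     "topics": ["onboarding"], "collaborators": ["Alice"]},
    {"month": "2025-04", "counts": {"meeting": 5, "email": 2},
     "topics": ["onboarding", "search"], "collaborators": ["Alice", "Bob"]},
]
merged = merge_monthly_summaries(summaries)
```

Asking each monthly subagent to return its summary in a fixed structure like this makes the merge mechanical instead of a second round of interpretation.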

Phase 4: Deep Dive & Refinement

⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.

Launch 7 parallel subagents to search for background information across all categories.

Each subagent searches using the person's full name in the query text (not contact_id filtering) and returns findings for one category.

Discovery Categories (One Subagent Per Category)

1. Personal Interests & Hobbies

Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences

2. Work Schedule & Availability

Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints

3. Professional Values & Goals

Search: "{Name} values goals mission purpose why they work career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements

4. Communication & Workflow Preferences

Search: "{Name} workflow tools preferences how they like to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences

5. Background & Career History

Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey

6. Daily Life & Routines

Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle

7. Technical Preferences

Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks

Subagent Task Template

Search DataIndex for background information about {Name}.

API Call:
POST /dataindex/api/v1/search
{
  "search_text": "{Name} {category-specific search terms}",
  "date_from": "{12_months_ago}",
  "date_to": "{today}",
  "limit": 20
}

Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights
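A sketch of filling in the request body per category. The payload fields come from the template above; the dictionary keys and the helper name are illustrative, and only three of the seven categories are shown, with search terms copied from the category list.

```python
import json

# Search terms copied from the discovery categories; keys are illustrative labels.
CATEGORY_TERMS = {
    "interests": "hobbies interests personal life outside work sports books travel music games cooking",
    "schedule": "schedule availability hours timezone meeting time preference morning afternoon",
    "background": "background career history previous roles education transition story experience",
}

def build_search_payload(name: str, category: str, date_from: str,
                         date_to: str, limit: int = 20) -> dict:
    """Build the body for POST /dataindex/api/v1/search for one category."""
    return {
        # The full name goes into the search text, not into a contact_id filter.
        "search_text": f"{name} {CATEGORY_TERMS[category]}",
        "date_from": date_from,
        "date_to": date_to,
        "limit": limit,
    }

payload = build_search_payload(
    "Mathieu Virbel", "interests",
    "2025-02-19T00:00:00Z", "2026-02-19T00:00:00Z",
)
print(json.dumps(payload, indent=2))
```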

Why parallel subagents:

Each search is independent - perfect for parallelization
Cuts wall-clock time, since the seven searches run concurrently instead of one after another
Comprehensive coverage without overwhelming the main agent
Gathers rich context for personalizing the MYSELF.md

Critical: Use name-based search

Always include the person's full name in the search query
Do NOT rely on contact_id filtering for semantic search
Personal details appear in transcripts where names are mentioned
contact_id filters work for exact queries but fail for RAG/semantic retrieval

When searching for personal details, use the person's full name in the query:

# GOOD - Uses name in search text
dataindex_search(
    query="Mathieu Virbel hobbies interests personal life outside work",
    date_from="2025-02-19T00:00:00Z",
    date_to="2026-02-19T00:00:00Z",
    limit=20
)

# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
    query="hobbies interests personal life",
    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
    limit=20
)

Key Insight:

Semantic search works best with full context in the query text
contact_id filtering works for exact entity matching but not for RAG retrieval
Personal details often appear in meeting transcripts where names are mentioned

Output: MYSELF.md Structure

The final document should include:

# About Me

## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)

## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities

## People I work with frequently
- List of key collaborators with context

## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values

## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
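One way to assemble this structure is to render sections from a mapping and skip any that came back empty, which handles the "if discovered" sections. This is a sketch under that assumption; the actual layout should follow MYSELF.example.md, and the sample values are placeholders apart from the name and contact ID used elsewhere in this document.

```python
def render_myself(sections: dict[str, list[str]]) -> str:
    """Render a MYSELF.md skeleton, omitting sections with no findings."""
    lines = ["# About Me", ""]
    for heading, items in sections.items():
        if not items:
            continue  # e.g. "Personal Context" when nothing was discovered
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)

document = render_myself({
    "Identity": ["Name: Mathieu Virbel", "Contact ID: 4"],
    "What I work on": ["Primary projects with descriptions"],
    "Personal Context": [],  # nothing discovered, so the section is dropped
    "Preferences": ["Work style", "Tools & workflow"],
})
print(document)
```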

Tips for Quality Results

Be thorough in Phase 2 - More historical data = better insights
Use parallel subagents - 12 monthly subagents run concurrently for speed
Cross-channel synthesis - Monthly subagents see the full picture across all channels
Ask follow-up questions - Users often want to discover unexpected things
Search by name, not ID - Critical for finding personal context
Synthesize meeting transcripts - They contain rich personal details
Look for patterns - Timezone mentions, scheduling preferences, recurring topics
Update over time - MYSELF.md should evolve as the person does

Common Mistakes to Avoid

❌ DON'T query DataIndex directly in Phase 2 or 4

Direct queries miss the monthly breakdown
You won't get comprehensive historical coverage
Context limits will truncate results

❌ DON'T launch 48 subagents (12 months × 4 entity types)

Use 12 subagents (1 per month) instead
Each monthly subagent queries all 4 entity types
Simpler coordination and better cross-channel context

❌ DON'T skip Phase 2 and go straight to Phase 4

You need historical context before doing deep searches
The monthly aggregation reveals patterns you can't see otherwise

❌ DON'T use contact_id filtering for semantic searches

RAG/semantic search requires the person's name in the query text
contact_id filters only work for exact entity matching

✅ ALWAYS use the Task tool to launch subagents

This is the only way to achieve true parallelism
Each subagent gets its own context window
Results can be aggregated after all complete

Example Usage

User: "Help me create my MYSELF.md"

Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
   - Personal interests & hobbies
   - Work schedule & availability
   - Professional values & goals
   - Communication & workflow preferences
   - Background & career history
   - Daily life & routines
   - Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document

Total subagents launched: 12 (historical) + 7 (discovery) = 19, run as two parallel batches (Phase 2, then Phase 4)

Files

MYSELF.example.md - Template file to copy and fill
MYSELF.md - Generated output (gitignored, personal to each user)
