---
name: self-onboarding
description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
user-invocable: true
---

# Self-Onboarding Skill

This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.

## ⚠️ MANDATORY EXECUTION PLAN

**ALWAYS follow these phases in order. Do NOT skip phases or combine them.**

```
PHASE 1: Identity Resolution (Sequential)
    └─→ Get contact_id via contactdb_get_me()
    └─→ Extract: name, email, role, location, contact_id

PHASE 2: Historical Data Gathering (Parallel Subagents)
    └─→ Launch 12 subagents (1 per month, querying all entity types)
    └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
    └─→ Wait for ALL subagents to complete
    └─→ Collect and synthesize findings

PHASE 3: Initial Synthesis & Generation (Sequential)
    └─→ Read MYSELF.example.md
    └─→ Generate initial MYSELF.md draft

PHASE 4: Deep Dive & Refinement (Parallel Subagents)
    └─→ Launch 7 subagents for background discovery
    └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
    └─→ Wait for ALL subagents to complete
    └─→ Enhance MYSELF.md with findings

PHASE 5: Final Delivery (Sequential)
    └─→ Review and polish MYSELF.md
    └─→ Deliver to user
```

**Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)

## When to Use

Use this skill when:

- A new team member joins and needs to create their `MYSELF.md`
- Someone wants to update their existing `MYSELF.md` with fresh data
- You need to understand a person's work patterns, collaborators, and preferences

## Prerequisites

Before starting, ensure:

1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
3. You have access to the `MYSELF.example.md` template

## Process Overview

The onboarding process consists of 5 phases:

1. **Identity Resolution** - Get the person's contact record
2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
3. **Initial Synthesis** - Create initial MYSELF.md draft
4. **Deep Dive & Refinement** - Search for specific personal details and enhance
5. **Final Delivery** - Review and deliver completed MYSELF.md

## Phase 1: Identity Resolution

Get the person's identity from ContactDB:

```python
# Look up the current user's own contact record
contactdb_get_me()

# Or search for a contact by name
contactdb_query_contacts(search="Person Name")
```

**Extract key info:**

- Name, email, role, location
- Contact ID (needed for all subsequent queries)
- Platform identities (Zulip, email, Reflector)
- Stats (hotness score, interaction counts)

## Phase 2: Historical Data Gathering

**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**

Launch parallel subagents to query all entity types for each monthly time range.

**Mandatory approach (NO EXCEPTIONS):**

- Time range: Past 12 months (or since joining)
- One subagent per month that queries ALL entity types
- Total: 12 subagents (one for each month)

**Why subagents are required:**

- Each monthly query is independent and can run in parallel
- Direct queries would take too long and exceed context limits
- Subagents aggregate data per time period, making synthesis easier
- This is the ONLY way to get comprehensive historical coverage

**Benefits of 1 subagent per month:**

- Holistic view of each month across all channels
- Cross-channel context (e.g., a meeting that follows up on a Zulip discussion)
- Simpler to implement and debug
- Results pre-aggregated by time period

**Subagent task structure:**

```
Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.

For each entity type (threaded_conversation, conversation_message, meeting, email):
  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100

Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
```

**Example time ranges (monthly):**

- 2025-02-19 to 2025-03-19
- 2025-03-19 to 2025-04-19
- ... (continue for 12 months)

## Phase 3: Initial Synthesis & Generation

After gathering all data:

1. **Summarize findings:**
   - Total activity counts per entity type
   - Most active time periods
   - Key projects/topics
   - Frequent collaborators

2. **Read MYSELF.example.md** to understand the template structure

3. **Generate initial MYSELF.md** with:
   - Identity section (from ContactDB)
   - Work areas (from meeting topics, Zulip streams)
   - Collaborators (from meeting participants, message contacts)
   - Basic preferences (inferred from activity patterns)

## Phase 4: Deep Dive & Refinement

**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**

**Launch 7 parallel subagents** to search for background information across all categories.

Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.

### Discovery Categories (One Subagent Per Category)

**1. Personal Interests & Hobbies**

```
Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences
```

**2. Work Schedule & Availability**

```
Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
```

**3. Professional Values & Goals**

```
Search: "{Name} values goals mission purpose why they work career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements
```

**4. Communication & Workflow Preferences**

```
Search: "{Name} workflow tools preferences how they like to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences
```

**5. Background & Career History**

```
Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey
```

**6. Daily Life & Routines**

```
Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle
```

**7. Technical Preferences**

```
Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
```

### Subagent Task Template

```
Search DataIndex for background information about {Name}.

API Call: POST /dataindex/api/v1/search
{
  "search_text": "{Name} {category-specific search terms}",
  "date_from": "{12_months_ago}",
  "date_to": "{today}",
  "limit": 20
}

Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights
```

**Why parallel subagents:**

- Each search is independent, which makes it well suited to parallelization
- Reduces wall-clock time compared with running the seven searches one after another
- Comprehensive coverage without overwhelming the main agent
- Gathers rich context for personalizing the MYSELF.md

**Critical: Use name-based search**

- Always include the person's full name in the search query
- Do NOT rely on contact_id filtering for semantic search
- Personal details appear in transcripts where names are mentioned
- contact_id filters work for exact queries but fail for RAG/semantic retrieval

When searching for personal details, use the person's full name in the query:

```python
# GOOD - Uses name in search text
dataindex_search(
    query="Mathieu Virbel hobbies interests personal life outside work",
    date_from="2025-02-19T00:00:00Z",
    date_to="2026-02-19T00:00:00Z",
    limit=20
)

# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
    query="hobbies interests personal life",
    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
    limit=20
)
```

**Key Insight:**

- Semantic search works best with full context in the query text
- contact_id filtering works for exact entity matching but not for RAG retrieval
- Personal details often appear in meeting transcripts where names are mentioned

## Output: MYSELF.md Structure

The final document should include:

```markdown
# About Me

## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)

## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities

## People I work with frequently
- List of key collaborators with context

## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values

## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
```

## Tips for Quality Results

1. **Be thorough in Phase 2** - More historical data = better insights
2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
4. **Ask follow-up questions** - Users often want to discover unexpected things
5. **Search by name, not ID** - Critical for finding personal context
6. **Synthesize meeting transcripts** - They contain rich personal details
7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
8. **Update over time** - MYSELF.md should evolve as the person does

## Common Mistakes to Avoid

**❌ DON'T query DataIndex directly in Phase 2 or 4**

- Direct queries miss the monthly breakdown
- You won't get comprehensive historical coverage
- Context limits will truncate results

**❌ DON'T launch 48 subagents (12 months × 4 entity types)**

- Use 12 subagents (1 per month) instead
- Each monthly subagent queries all 4 entity types
- Simpler coordination and better cross-channel context

**❌ DON'T skip Phase 2 and go straight to Phase 4**

- You need historical context before doing deep searches
- The monthly aggregation reveals patterns you can't see otherwise

**❌ DON'T use contact_id filtering for semantic searches**

- RAG/semantic search requires the person's name in the query text
- contact_id filters only work for exact entity matching

**✅ ALWAYS use the Task tool to launch subagents**

- This is the only way to achieve true parallelism
- Each subagent gets its own context window
- Results can be aggregated after all complete

## Example Usage

```
User: "Help me create my MYSELF.md"

Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
   - Personal interests & hobbies
   - Work schedule & availability
   - Professional values & goals
   - Communication & workflow preferences
   - Background & career history
   - Daily life & routines
   - Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document
```

**Total subagents launched:** 12 (historical) + 7 (discovery) = 19, run as two parallel batches (Phase 2, then Phase 4)

## Files

- `MYSELF.example.md` - Template file to copy and fill
- `MYSELF.md` - Generated output (gitignored, personal to each user)
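
## Sketch: Phase 2 Date Windows

Phase 2 asks for twelve monthly windows and, within each, one query per entity type. The sketch below shows one way the main agent might compute those windows and the matching query URLs before handing them to subagents. The helper names (`shift_months`, `monthly_windows`, `query_urls`) are hypothetical and not part of the skill. The `YYYY-MM-DD` date format is an assumption taken from the example time ranges. The code only builds URLs and makes no HTTP requests.

```python
from calendar import monthrange
from datetime import date
from urllib.parse import urlencode

# Endpoint and entity types as written in the Phase 2 subagent task structure.
BASE_URL = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]


def shift_months(d: date, months: int) -> date:
    """Move d by `months` calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(d.day, monthrange(year, month)[1])
    return date(year, month, day)


def monthly_windows(end: date, count: int = 12) -> list[tuple[date, date]]:
    """Return `count` consecutive (date_from, date_to) windows, oldest first, ending at `end`."""
    edges = [shift_months(end, -i) for i in range(count, -1, -1)]
    return list(zip(edges[:-1], edges[1:]))


def query_urls(contact_id: int, date_from: date, date_to: date) -> list[str]:
    """Build one query URL per entity type for a single monthly window."""
    urls = []
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            # Assumption: the API accepts plain ISO dates (YYYY-MM-DD).
            "date_from": date_from.isoformat(),
            "date_to": date_to.isoformat(),
            "limit": 100,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


windows = monthly_windows(date(2026, 2, 19))
for date_from, date_to in windows[:2]:
    print(date_from, "to", date_to)
```

With an end date of 2026-02-19, the first two windows printed are `2025-02-19 to 2025-03-19` and `2025-03-19 to 2025-04-19`, matching the example time ranges in Phase 2. Each of the 12 windows would go to its own subagent, which then issues the four URLs from `query_urls` for that window.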