internalai-agent/.agents/skills/self-onboarding/SKILL.md

---
name: self-onboarding
description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
user-invocable: true
---

# Self-Onboarding Skill

This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.

## ⚠️ MANDATORY EXECUTION PLAN

**ALWAYS follow these phases in order. Do NOT skip phases or combine them.**

```
PHASE 1: Identity Resolution (Sequential)
  └─→ Get contact_id via contactdb_get_me()
  └─→ Extract: name, email, role, location, contact_id

PHASE 2: Historical Data Gathering (Parallel Subagents)
  └─→ Launch 12 subagents (1 per month, querying all entity types)
  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
  └─→ Wait for ALL subagents to complete
  └─→ Collect and synthesize findings

PHASE 3: Initial Synthesis & Generation (Sequential)
  └─→ Read MYSELF.example.md
  └─→ Generate initial MYSELF.md draft

PHASE 4: Deep Dive & Refinement (Parallel Subagents)
  └─→ Launch 7 subagents for background discovery
  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
  └─→ Wait for ALL subagents to complete
  └─→ Enhance MYSELF.md with findings

PHASE 5: Final Delivery (Sequential)
  └─→ Review and polish MYSELF.md
  └─→ Deliver to user
```

**Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)

## When to Use

Use this skill when:
- A new team member joins and needs to create their `MYSELF.md`
- Someone wants to update their existing `MYSELF.md` with fresh data
- You need to understand a person's work patterns, collaborators, and preferences

## Prerequisites

Before starting, ensure:
1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
3. You have access to the MYSELF.example.md template

## Process Overview

The onboarding process consists of 5 phases:

1. **Identity Resolution** - Get the person's contact record
2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
3. **Initial Synthesis** - Create initial MYSELF.md draft
4. **Deep Dive & Refinement** - Search for specific personal details and enhance
5. **Final Delivery** - Review and deliver completed MYSELF.md

## Phase 1: Identity Resolution

Get the person's identity from ContactDB:

```python
# Get self
contactdb_get_me()

# Or search by name
contactdb_query_contacts(search="Person Name")
```

**Extract key info:**
- Name, email, role, location
- Contact ID (needed for all subsequent queries)
- Platform identities (Zulip, email, Reflector)
- Stats (hotness score, interaction counts)

## Phase 2: Historical Data Gathering

**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**

Launch parallel subagents to query all entity types for each monthly time range.

**Mandatory approach (NO EXCEPTIONS):**
- Time range: Past 12 months (or since joining)
- One subagent per month that queries ALL entity types
- Total: 12 subagents (one for each month)

**Why subagents are required:**
- Each monthly query is independent and can run in parallel
- Direct queries from the main agent would run one after another and risk exceeding its context limits
- Subagents aggregate data per time period, making synthesis easier
- This is the ONLY way to get comprehensive historical coverage

**Benefits of 1 subagent per month:**
- Holistic view of each month across all channels
- Cross-channel context (e.g., meeting follows up on Zulip discussion)
- Simpler to implement and debug
- Results pre-aggregated by time period

**Subagent task structure:**

```
Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.

For each entity type (threaded_conversation, conversation_message, meeting, email):
  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100

Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
```
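
As a rough sketch of the per-entity-type requests described in the task structure above, the snippet below builds one URL per entity type for a single monthly window. The base URL and parameter names are taken from the `GET` line above; the helper name `build_query_urls` is hypothetical, and the snippet only constructs URLs without sending any request.

```python
from urllib.parse import urlencode

# Base URL and parameter names come from the task structure above.
BASE_URL = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]


def build_query_urls(contact_id: int, date_from: str, date_to: str, limit: int = 100) -> dict[str, str]:
    """Return one query URL per entity type for a single monthly window (hypothetical helper)."""
    urls = {}
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        urls[entity_type] = f"{BASE_URL}?{urlencode(params)}"
    return urls


urls = build_query_urls(4, "2025-02-19", "2025-03-19")
for entity_type, url in urls.items():
    print(entity_type, url)
```

Each monthly subagent would issue these four requests and then write its summary from the combined results.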

**Example time ranges (monthly):**
- 2025-02-19 to 2025-03-19
- 2025-03-19 to 2025-04-19
- ... (continue for 12 months)
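
The twelve windows can be computed rather than written out by hand. The sketch below uses only the standard library; `shift_months` and `monthly_windows` are hypothetical helper names, and the day is clamped for short months (for example, the 31st becomes the 28th in February). Adjacent windows share a boundary date, as in the examples above; whether the API treats `date_to` as inclusive is not stated here, so deduplicate on the boundary day if needed.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Return d moved by `months` months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def monthly_windows(end: date, count: int = 12) -> list[tuple[str, str]]:
    """Return `count` consecutive (date_from, date_to) pairs ending at `end`, oldest first."""
    bounds = [shift_months(end, -i) for i in range(count, -1, -1)]
    return [(a.isoformat(), b.isoformat()) for a, b in zip(bounds, bounds[1:])]


windows = monthly_windows(date(2026, 2, 19))
for date_from, date_to in windows:
    print(date_from, "to", date_to)
```

Each `(date_from, date_to)` pair maps to one Phase 2 subagent.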

## Phase 3: Initial Synthesis & Generation

After gathering all data:

1. **Summarize findings:**
   - Total activity counts per entity type
   - Most active time periods
   - Key projects/topics
   - Frequent collaborators

2. **Read MYSELF.example.md** to understand the template structure

3. **Generate initial MYSELF.md** with:
   - Identity section (from ContactDB)
   - Work areas (from meeting topics, Zulip streams)
   - Collaborators (from meeting participants, message contacts)
   - Basic preferences (inferred from activity patterns)
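
The summarizing step can be sketched as a simple merge of the monthly summaries. The shape of each summary (`period`, `counts`, `collaborators`, `topics`) is an assumption made for illustration, not a format defined by this skill, and the sample values are made up.

```python
from collections import Counter


def synthesize(monthly_summaries: list[dict]) -> dict:
    """Merge per-month summaries (assumed shape) into totals for the initial draft."""
    totals: Counter = Counter()
    collaborators: Counter = Counter()
    topics: Counter = Counter()
    for summary in monthly_summaries:
        totals.update(summary.get("counts", {}))
        collaborators.update(summary.get("collaborators", []))
        topics.update(summary.get("topics", []))
    # The busiest month is the one with the highest combined count across entity types.
    busiest = max(
        monthly_summaries,
        key=lambda s: sum(s.get("counts", {}).values()),
        default=None,
    )
    return {
        "activity_totals": dict(totals),
        "most_active_period": busiest["period"] if busiest else None,
        "top_collaborators": [name for name, _ in collaborators.most_common(5)],
        "top_topics": [topic for topic, _ in topics.most_common(5)],
    }


# Made-up sample data, for illustration only.
sample = [
    {"period": "2025-02-19/2025-03-19", "counts": {"meeting": 3, "email": 10},
     "collaborators": ["Alice", "Bob"], "topics": ["onboarding"]},
    {"period": "2025-03-19/2025-04-19", "counts": {"meeting": 5, "email": 12},
     "collaborators": ["Alice"], "topics": ["onboarding", "search"]},
]
result = synthesize(sample)
print(result)
```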

## Phase 4: Deep Dive & Refinement

**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**

**Launch 7 parallel subagents** to search for background information across all categories.

Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.

### Discovery Categories (One Subagent Per Category)

**1. Personal Interests & Hobbies**
```
Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences
```

**2. Work Schedule & Availability**
```
Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
```

**3. Professional Values & Goals**
```
Search: "{Name} values goals mission purpose why they work career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements
```

**4. Communication & Workflow Preferences**
```
Search: "{Name} workflow tools preferences how they like to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences
```

**5. Background & Career History**
```
Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey
```

**6. Daily Life & Routines**
```
Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle
```

**7. Technical Preferences**
```
Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
```

### Subagent Task Template

```
Search DataIndex for background information about {Name}.

API Call:
POST /dataindex/api/v1/search
{
  "search_text": "{Name} {category-specific search terms}",
  "date_from": "{12_months_ago}",
  "date_to": "{today}",
  "limit": 20
}

Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights
```
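
To make the template concrete, the sketch below builds one request body per discovery category. The field names (`search_text`, `date_from`, `date_to`, `limit`) follow the template above; the category keys and the helper `build_search_payload` are hypothetical, and the search terms are adapted from the category list. Nothing is sent over the network.

```python
import json

# Search terms adapted from the seven discovery categories; the keys are illustrative.
CATEGORY_TERMS = {
    "interests": "hobbies interests personal life outside work sports books travel music games cooking",
    "schedule": "schedule availability hours timezone meeting time preference morning afternoon",
    "values": "values goals mission purpose why they work career objective philosophy",
    "workflow": "workflow tools preferences how they like to work communication style feedback",
    "background": "background career history previous roles education transition story experience",
    "daily_life": "daily routine family married children commute work-life balance personal context",
    "technical": "tools development workflow process methodology architecture decisions technical approach",
}


def build_search_payload(name: str, terms: str, date_from: str, date_to: str, limit: int = 20) -> dict:
    """Return the JSON body for one category search, with the full name in the query text."""
    return {
        "search_text": f"{name} {terms}",
        "date_from": date_from,
        "date_to": date_to,
        "limit": limit,
    }


payloads = {
    category: build_search_payload("Mathieu Virbel", terms, "2025-02-19T00:00:00Z", "2026-02-19T00:00:00Z")
    for category, terms in CATEGORY_TERMS.items()
}
print(json.dumps(payloads["interests"], indent=2))
```

Each payload corresponds to one Phase 4 subagent.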

**Why parallel subagents:**
- Each search is independent - perfect for parallelization
- Cuts total wall-clock time, since the seven searches run concurrently instead of one after another
- Comprehensive coverage without overwhelming the main agent
- Gathers rich context for personalizing the MYSELF.md

**Critical: Use name-based search**
- Always include the person's full name in the search query
- Do NOT rely on contact_id filtering for semantic search
- Personal details appear in transcripts where names are mentioned
- contact_id filters work for exact queries but fail for RAG/semantic retrieval

When searching for personal details, use the person's full name in the query:

```python
# GOOD - Uses name in search text
dataindex_search(
    query="Mathieu Virbel hobbies interests personal life outside work",
    date_from="2025-02-19T00:00:00Z",
    date_to="2026-02-19T00:00:00Z",
    limit=20
)

# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
    query="hobbies interests personal life",
    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
    limit=20
)
```

**Key Insight:**
- Semantic search works best with full context in the query text
- contact_id filtering works for exact entity matching but not for RAG retrieval
- Personal details often appear in meeting transcripts where names are mentioned

## Output: MYSELF.md Structure

The final document should include:

```markdown
# About Me

## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)

## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities

## People I work with frequently
- List of key collaborators with context

## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values

## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
```

## Tips for Quality Results

1. **Be thorough in Phase 2** - More historical data = better insights
2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
4. **Ask follow-up questions** - Users often want to discover unexpected things
5. **Search by name, not ID** - Critical for finding personal context
6. **Synthesize meeting transcripts** - They contain rich personal details
7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
8. **Update over time** - MYSELF.md should evolve as the person does

## Common Mistakes to Avoid

**❌ DON'T query DataIndex directly in Phase 2 or 4**
- Direct queries miss the monthly breakdown
- You won't get comprehensive historical coverage
- Context limits will truncate results

**❌ DON'T launch 48 subagents (12 months × 4 entity types)**
- Use 12 subagents (1 per month) instead
- Each monthly subagent queries all 4 entity types
- Simpler coordination and better cross-channel context

**❌ DON'T skip Phase 2 and go straight to Phase 4**
- You need historical context before doing deep searches
- The monthly aggregation reveals patterns you can't see otherwise

**❌ DON'T use contact_id filtering for semantic searches**
- RAG/semantic search requires the person's name in the query text
- contact_id filters only work for exact entity matching

**✅ ALWAYS use the Task tool to launch subagents**
- This is the only way to achieve true parallelism
- Each subagent gets its own context window
- Results can be aggregated after all complete
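
The fan-out and wait-for-all pattern looks roughly like the sketch below. This is an analogy only: the Task tool is invoked by the agent runtime, not from Python, and `run_subagent` is a hypothetical stand-in that returns a placeholder string instead of doing real work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one subagent; a real one is launched via the Task tool."""
    return f"summary for: {task}"


def fan_out(tasks: list[str]) -> list[str]:
    """Run every task concurrently and return results in task order once all have finished."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        # list() blocks until every task has completed, mirroring "wait for ALL subagents".
        return list(pool.map(run_subagent, tasks))


monthly_tasks = [f"month {i + 1} of 12" for i in range(12)]
summaries = fan_out(monthly_tasks)
print(len(summaries), "summaries collected")
```

Aggregation starts only after `fan_out` returns, which matches the rule that synthesis waits for every subagent in the batch.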

## Example Usage

```
User: "Help me create my MYSELF.md"

Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
   - Personal interests & hobbies
   - Work schedule & availability
   - Professional values & goals
   - Communication & workflow preferences
   - Background & career history
   - Daily life & routines
   - Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document
```

**Total subagents launched:** 12 (historical) + 7 (discovery) = 19, run as two parallel batches (Phase 2, then Phase 4)

## Files

- `MYSELF.example.md` - Template file to copy and fill
- `MYSELF.md` - Generated output (gitignored, personal to each user)