---
name: self-onboarding
description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
user-invocable: true
---
# Self-Onboarding Skill
This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.
## ⚠️ MANDATORY EXECUTION PLAN
**ALWAYS follow these phases in order. Do NOT skip phases or combine them.**
```
PHASE 1: Identity Resolution (Sequential)
  └─→ Get contact_id via contactdb_get_me()
  └─→ Extract: name, email, role, location, contact_id

PHASE 2: Historical Data Gathering (Parallel Subagents)
  └─→ Launch 12 subagents (1 per month, querying all entity types)
  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
  └─→ Wait for ALL subagents to complete
  └─→ Collect and synthesize findings

PHASE 3: Initial Synthesis & Generation (Sequential)
  └─→ Read MYSELF.example.md
  └─→ Generate initial MYSELF.md draft

PHASE 4: Deep Dive & Refinement (Parallel Subagents)
  └─→ Launch 7 subagents for background discovery
  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
  └─→ Wait for ALL subagents to complete
  └─→ Enhance MYSELF.md with findings

PHASE 5: Final Delivery (Sequential)
  └─→ Review and polish MYSELF.md
  └─→ Deliver to user
```
**Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)
## When to Use
Use this skill when:
- A new team member joins and needs to create their `MYSELF.md`
- Someone wants to update their existing `MYSELF.md` with fresh data
- You need to understand a person's work patterns, collaborators, and preferences
## Prerequisites
Before starting, ensure:
1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
3. You have access to the MYSELF.example.md template
## Process Overview
The onboarding process consists of 5 phases:
1. **Identity Resolution** - Get the person's contact record
2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
3. **Initial Synthesis** - Create initial MYSELF.md draft
4. **Deep Dive & Refinement** - Search for specific personal details and enhance
5. **Final Delivery** - Review and deliver completed MYSELF.md
## Phase 1: Identity Resolution
Get the person's identity from ContactDB:
```python
# Get self
contactdb_get_me()
# Or search by name
contactdb_query_contacts(search="Person Name")
```
**Extract key info:**
- Name, email, role, location
- Contact ID (needed for all subsequent queries)
- Platform identities (Zulip, email, Reflector)
- Stats (hotness score, interaction counts)
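
A minimal sketch of that extraction step, assuming `contactdb_get_me()` returns a dict-like record. The key names used here (`id`, `name`, `email`, `role`, `location`, `platform_identities`) and the `Identity`/`extract_identity` names are hypothetical; map them onto the real ContactDB schema before relying on this.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Identity:
    """Key fields reused by every later phase."""
    contact_id: int
    name: str
    email: Optional[str] = None
    role: Optional[str] = None
    location: Optional[str] = None
    platforms: dict[str, str] = field(default_factory=dict)


def extract_identity(record: dict[str, Any]) -> Identity:
    """Pull the Phase 1 fields out of a contact record.

    NOTE: the key names below are hypothetical; adapt them to ContactDB.
    """
    if "id" not in record or "name" not in record:
        raise ValueError("contact record needs at least an id and a name")
    return Identity(
        contact_id=int(record["id"]),
        name=record["name"],
        email=record.get("email"),
        role=record.get("role"),
        location=record.get("location"),
        platforms=dict(record.get("platform_identities") or {}),
    )
```

If the record does use these keys, `extract_identity(contactdb_get_me()).contact_id` is the value handed to every Phase 2 subagent.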
## Phase 2: Historical Data Gathering
**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**
Launch parallel subagents to query all entity types for each monthly time range.
**Mandatory approach (NO EXCEPTIONS):**
- Time range: Past 12 months (or since joining)
- One subagent per month that queries ALL entity types
- Total: 12 subagents (one for each month; fewer only if the person joined less than 12 months ago)
**Why subagents are required:**
- Each monthly query is independent and can run in parallel
- Direct queries would take too long and exceed context limits
- Subagents aggregate data per time period, making synthesis easier
- This is the ONLY way to get comprehensive historical coverage
**Benefits of 1 subagent per month:**
- Holistic view of each month across all channels
- Cross-channel context (e.g., meeting follows up on Zulip discussion)
- Simpler to implement and debug
- Results pre-aggregated by time period
**Subagent task structure:**
```
Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.

For each entity type (threaded_conversation, conversation_message, meeting, email):
  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100

Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
```
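
The per-month query loop can be sketched as below. The base URL, parameter names and `limit=100` come from the task structure above; the helper name `monthly_query_urls` is hypothetical. It only builds the URLs: it does not send requests, and it assumes 100 results per entity type per month are enough (no pagination is attempted).

```python
from __future__ import annotations

from urllib.parse import urlencode

# Entity types and endpoint taken from the subagent task structure above.
ENTITY_TYPES = ("threaded_conversation", "conversation_message", "meeting", "email")
BASE_URL = "http://localhost:42000/dataindex/api/v1/query"


def monthly_query_urls(contact_id: int, date_from: str, date_to: str,
                       limit: int = 100) -> list[str]:
    """Build one GET URL per entity type for a single monthly subagent."""
    urls = []
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        # urlencode percent-escapes values (e.g. ':' in full ISO timestamps).
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```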
**Example time ranges (monthly):**
- 2025-02-19 to 2025-03-19
- 2025-03-19 to 2025-04-19
- ... (continue for 12 months)
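
Assuming the windows are contiguous and anchored on today's day-of-month, as the examples suggest, they can be generated with the standard library only. `shift_months` and `monthly_windows` are hypothetical helpers; the optional `joined` date trims the list for people who joined less than 12 months ago, in which case fewer subagents are needed.

```python
from __future__ import annotations

import calendar
from datetime import date
from typing import Optional


def shift_months(d: date, months: int) -> date:
    """Move `d` by whole months, clamping the day (e.g. Mar 31 -> Feb 28)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def monthly_windows(end: date, months: int = 12,
                    joined: Optional[date] = None) -> list[tuple[date, date]]:
    """Contiguous (date_from, date_to) windows, oldest first, ending at `end`."""
    windows = []
    for i in range(months, 0, -1):
        start, stop = shift_months(end, -i), shift_months(end, -i + 1)
        if joined is not None:
            if stop <= joined:
                continue  # window closes before the person joined: no subagent needed
            start = max(start, joined)
        windows.append((start, stop))
    return windows
```

For instance, `monthly_windows(date(2026, 2, 19))` starts with `(2025-02-19, 2025-03-19)` and ends with `(2026-01-19, 2026-02-19)`, matching the ranges listed above; each pair maps to one subagent.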
## Phase 3: Initial Synthesis & Generation
After gathering all data:
1. **Summarize findings:**
   - Total activity counts per entity type
   - Most active time periods
   - Key projects/topics
   - Frequent collaborators
2. **Read MYSELF.example.md** to understand the template structure
3. **Generate initial MYSELF.md** with:
   - Identity section (from ContactDB)
   - Work areas (from meeting topics, Zulip streams)
   - Collaborators (from meeting participants, message contacts)
   - Basic preferences (inferred from activity patterns)
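
Step 1 amounts to merging the monthly summaries. A minimal sketch, assuming each subagent returns a dict with `month`, `counts`, `collaborators` and `topics` keys; that shape and the `synthesize` name are hypothetical, since subagents return free-form summaries that would first need to be normalized.

```python
from __future__ import annotations

from collections import Counter


def synthesize(monthly: list[dict], top: int = 10) -> dict:
    """Merge per-month subagent summaries into the Phase 3 overview.

    Each summary is assumed (hypothetically) to look like:
        {"month": "2025-03", "counts": {"meeting": 12, "email": 4},
         "collaborators": ["..."], "topics": ["..."]}
    """
    totals, collaborators, topics = Counter(), Counter(), Counter()
    activity_by_month = {}
    for summary in monthly:
        counts = summary.get("counts", {})
        totals.update(counts)  # adds per-entity-type counts across months
        activity_by_month[summary["month"]] = sum(counts.values())
        collaborators.update(summary.get("collaborators", []))  # one hit per mention
        topics.update(summary.get("topics", []))
    busiest = sorted(activity_by_month, key=activity_by_month.get, reverse=True)
    return {
        "totals": dict(totals),
        "most_active_months": busiest[:3],
        "frequent_collaborators": [name for name, _ in collaborators.most_common(top)],
        "key_topics": [topic for topic, _ in topics.most_common(top)],
    }
```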
## Phase 4: Deep Dive & Refinement
**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**
**Launch 7 parallel subagents** to search for background information across all categories.
Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.
### Discovery Categories (One Subagent Per Category)
**1. Personal Interests & Hobbies**
```
Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences
```
**2. Work Schedule & Availability**
```
Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
```
**3. Professional Values & Goals**
```
Search: "{Name} values goals mission purpose why he works career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements
```
**4. Communication & Workflow Preferences**
```
Search: "{Name} workflow tools preferences how he likes to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences
```
**5. Background & Career History**
```
Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey
```
**6. Daily Life & Routines**
```
Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle
```
**7. Technical Preferences**
```
Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
```
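
Building the seven search strings is mechanical. The sketch below copies the category terms verbatim from the list above and always prefixes the full name; the dictionary keys and the `discovery_queries` name are hypothetical labels.

```python
from __future__ import annotations

# Category-specific terms, copied from the seven discovery categories above.
CATEGORY_TERMS = {
    "interests": "hobbies interests personal life outside work sports books travel music games cooking",
    "schedule": "schedule availability hours timezone meeting time preference morning afternoon",
    "values": "values goals mission purpose why he works career objective philosophy",
    "workflow": "workflow tools preferences how he likes to work communication style feedback",
    "background": "background career history previous roles education transition story experience",
    "daily_life": "daily routine family married children commute work-life balance personal context",
    "technical": "tools development workflow process methodology architecture decisions technical approach",
}


def discovery_queries(full_name: str) -> dict[str, str]:
    """Return one search_text per discovery subagent, led by the full name."""
    name = " ".join(full_name.split())  # normalize stray whitespace
    if not name:
        raise ValueError("a full name is required: semantic search depends on it")
    return {category: f"{name} {terms}" for category, terms in CATEGORY_TERMS.items()}
```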
### Subagent Task Template
```
Search DataIndex for background information about {Name}.
API Call:
POST /dataindex/api/v1/search
{
  "search_text": "{Name} {category-specific search terms}",
  "date_from": "{12_months_ago}",
  "date_to": "{today}",
  "limit": 20
}
Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights
```
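
A sketch of the request body, assuming the endpoint accepts ISO 8601 UTC timestamps in the same `YYYY-MM-DDT00:00:00Z` form used in the Python example later in this section. `search_payload` is a hypothetical helper; it returns a dict ready for `json.dumps` and treats "12 months ago" as the same calendar day one year earlier.

```python
from __future__ import annotations

from datetime import date, datetime, time, timezone


def _iso_utc(d: date) -> str:
    """Midnight UTC in the `YYYY-MM-DDT00:00:00Z` form."""
    return datetime.combine(d, time.min, tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def search_payload(search_text: str, today: date, limit: int = 20) -> dict:
    """Request body for POST /dataindex/api/v1/search over the last 12 months."""
    try:
        start = today.replace(year=today.year - 1)
    except ValueError:  # Feb 29 has no counterpart in the previous year
        start = today.replace(year=today.year - 1, day=28)
    return {
        "search_text": search_text,
        "date_from": _iso_utc(start),
        "date_to": _iso_utc(today),
        "limit": limit,
    }
```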
**Why parallel subagents:**
- Each search is independent, so all seven can run in parallel
- Cuts wall-clock time, since the searches run concurrently instead of one after another
- Comprehensive coverage without overwhelming the main agent
- Gathers rich context for personalizing the MYSELF.md
**Critical: Use name-based search**
- Always include the person's full name in the search query
- Do NOT rely on contact_id filtering for semantic search
- Personal details appear in transcripts where names are mentioned
- contact_id filters work for exact queries but fail for RAG/semantic retrieval
When searching for personal details, use the person's full name in the query:
```python
# GOOD - Uses name in search text
dataindex_search(
    query="Mathieu Virbel hobbies interests personal life outside work",
    date_from="2025-02-19T00:00:00Z",
    date_to="2026-02-19T00:00:00Z",
    limit=20,
)

# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
    query="hobbies interests personal life",
    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
    limit=20,
)
```
**Key Insight:** semantic search works best when the query text carries the full context, starting with the person's name, because personal details often appear in meeting transcripts where that name is mentioned.
## Output: MYSELF.md Structure
The final document should include:
```markdown
# About Me
## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)
## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities
## People I work with frequently
- List of key collaborators with context
## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values
## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
```
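
The skeleton above can be rendered from collected findings with a few lines of Python. This is a hypothetical sketch (`render_myself` and its input shape are illustrative): it keeps the section order shown above and omits **Personal Context** when nothing was discovered. The real draft should still follow `MYSELF.example.md`.

```python
from __future__ import annotations

# Section order from the MYSELF.md structure above.
SECTION_ORDER = [
    "Identity",
    "What I work on",
    "People I work with frequently",
    "Personal Context",
    "Preferences",
]
OPTIONAL_SECTIONS = {"Personal Context"}  # included only "if discovered"


def render_myself(sections: dict[str, list[str]]) -> str:
    """Render findings as Markdown bullets under each section heading."""
    lines = ["# About Me", ""]
    for title in SECTION_ORDER:
        items = sections.get(title, [])
        if not items and title in OPTIONAL_SECTIONS:
            continue  # nothing discovered: leave the optional section out
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```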
## Tips for Quality Results
1. **Be thorough in Phase 2** - More historical data = better insights
2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
4. **Ask follow-up questions** - Users often want to discover unexpected things
5. **Search by name, not ID** - Critical for finding personal context
6. **Synthesize meeting transcripts** - They contain rich personal details
7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
8. **Update over time** - MYSELF.md should evolve as the person does
## Common Mistakes to Avoid
**❌ DON'T query DataIndex directly in Phase 2 or 4**
- Direct queries miss the monthly breakdown
- You won't get comprehensive historical coverage
- Context limits will truncate results
**❌ DON'T launch 48 subagents (12 months × 4 entity types)**
- Use 12 subagents (1 per month) instead
- Each monthly subagent queries all 4 entity types
- Simpler coordination and better cross-channel context
**❌ DON'T skip Phase 2 and go straight to Phase 4**
- You need historical context before doing deep searches
- The monthly aggregation reveals patterns you can't see otherwise
**❌ DON'T use contact_id filtering for semantic searches**
- RAG/semantic search requires the person's name in the query text
- contact_id filters only work for exact entity matching
**✅ ALWAYS use the Task tool to launch subagents**
- This is the only way to achieve true parallelism
- Each subagent gets its own context window
- Results can be aggregated after all complete
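
The Task tool is the agent's own mechanism, not a Python API. As an analogy only, the fan-out/fan-in pattern it provides looks like this with `concurrent.futures`: every task starts concurrently, and results are collected only once all of them have finished. `fan_out` is a hypothetical name.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fan_out(tasks: list[T], worker: Callable[[T], R], max_workers: int = 12) -> list[R]:
    """Run every task concurrently and return results once ALL have finished.

    Executor.map yields results in input order, so monthly summaries stay
    chronological. If any task raises, the exception surfaces here rather
    than being silently dropped.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, tasks))
```

Replacing threads with Task-tool subagents keeps the same shape: launch all 12 (or 7), then synthesize once the full set of results is back.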
## Example Usage
```
User: "Help me create my MYSELF.md"
Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
   - Personal interests & hobbies
   - Work schedule & availability
   - Professional values & goals
   - Communication & workflow preferences
   - Background & career history
   - Daily life & routines
   - Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document
```
**Total subagents launched:** 12 (historical) + 7 (discovery) = 19 tasks, run as two parallel batches (Phase 2, then Phase 4)
## Files
- `MYSELF.example.md` - Template file to copy and fill
- `MYSELF.md` - Generated output (gitignored, personal to each user)