internalai-agent/.agents/skills/self-onboarding/SKILL.md


name: self-onboarding
description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
user-invocable: true

Self-Onboarding Skill

This skill helps new team members create a comprehensive MYSELF.md file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.

⚠️ MANDATORY EXECUTION PLAN

ALWAYS follow these phases in order. Do NOT skip phases or combine them.

PHASE 1: Identity Resolution (Sequential)
  └─→ Get contact_id via contactdb_get_me()
  └─→ Extract: name, email, role, location, contact_id

PHASE 2: Historical Data Gathering (Parallel Subagents)
  └─→ Launch 12 subagents (1 per month, querying all entity types)
  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
  └─→ Wait for ALL subagents to complete
  └─→ Collect and synthesize findings

PHASE 3: Initial Synthesis & Generation (Sequential)
  └─→ Read MYSELF.example.md
  └─→ Generate initial MYSELF.md draft

PHASE 4: Deep Dive & Refinement (Parallel Subagents)
  └─→ Launch 7 subagents for background discovery
  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
  └─→ Wait for ALL subagents to complete
  └─→ Enhance MYSELF.md with findings

PHASE 5: Final Delivery (Sequential)
  └─→ Review and polish MYSELF.md
  └─→ Deliver to user

Total subagents required: 19 (12 for Phase 2 + 7 for Phase 4)

When to Use

Use this skill when:

  • A new team member joins and needs to create their MYSELF.md
  • Someone wants to update their existing MYSELF.md with fresh data
  • You need to understand a person's work patterns, collaborators, and preferences

Prerequisites

Before starting, ensure:

  1. The person has a ContactDB record (use contactdb_get_me or contactdb_query_contacts)
  2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
  3. You have access to the MYSELF.example.md template

Process Overview

The onboarding process consists of 5 phases:

  1. Identity Resolution - Get the person's contact record
  2. Historical Data Gathering - Query all entity types across 12 monthly periods
  3. Initial Synthesis & Generation - Create initial MYSELF.md draft
  4. Deep Dive & Refinement - Search for specific personal details and enhance
  5. Final Delivery - Review and deliver completed MYSELF.md

Phase 1: Identity Resolution

Get the person's identity from ContactDB:

# Get self
contactdb_get_me()

# Or search by name
contactdb_query_contacts(search="Person Name")

Extract key info:

  • Name, email, role, location
  • Contact ID (needed for all subsequent queries)
  • Platform identities (Zulip, email, Reflector)
  • Stats (hotness score, interaction counts)
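
The fields listed above can be collected into one small record that later phases reuse. This is a minimal sketch, assuming the contact record arrives as a dictionary; the key names (`id`, `name`, `email`, `role`, `location`, `platform_identities`) and the `Identity` class are hypothetical, since the ContactDB response shape is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    """Minimal identity record carried into later phases."""
    contact_id: int
    name: str
    email: str = ""
    role: str = ""
    location: str = ""
    platforms: dict = field(default_factory=dict)


def extract_identity(record: dict) -> Identity:
    # Every key name below is an assumption about the ContactDB response.
    if "id" not in record or "name" not in record:
        raise ValueError("contact record needs at least an id and a name")
    return Identity(
        contact_id=record["id"],
        name=record["name"],
        email=record.get("email", ""),
        role=record.get("role", ""),
        location=record.get("location", ""),
        platforms=record.get("platform_identities", {}),
    )


# Placeholder record, not real data.
me = extract_identity({"id": 4, "name": "Person Name", "email": "person@example.com"})
```

Failing early when the contact ID is missing keeps Phase 2 from launching subagents with an empty filter.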

Phase 2: Historical Data Gathering

⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.

Launch parallel subagents to query all entity types for each monthly time range.

Mandatory approach (NO EXCEPTIONS):

  • Time range: Past 12 months (or since joining)
  • One subagent per month that queries ALL entity types
  • Total: 12 subagents (one for each month)

Why subagents are required:

  • Each monthly query is independent and can run in parallel
  • Direct queries would take too long and exceed context limits
  • Subagents aggregate data per time period, making synthesis easier
  • This is the ONLY way to get comprehensive historical coverage

Benefits of 1 subagent per month:

  • Holistic view of each month across all channels
  • Cross-channel context (e.g., meeting follows up on Zulip discussion)
  • Simpler to implement and debug
  • Results pre-aggregated by time period

Subagent task structure:

Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.

For each entity type (threaded_conversation, conversation_message, meeting, email):
  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100

Synthesize findings across all channels and return a monthly summary with:
1. Total activity counts per entity type
2. Key topics/projects discussed
3. Notable patterns and themes
4. Collaborators involved
5. Work areas/projects identified
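
For illustration, the four query URLs a monthly subagent needs could be built as follows. This is a sketch only: the endpoint and parameter names come from the task structure above, while the helper name `build_query_urls` and the plain `YYYY-MM-DD` date format are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]


def build_query_urls(contact_id, date_from, date_to, limit=100):
    """Return one query URL per entity type for a single monthly window."""
    urls = {}
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        # urlencode converts non-string values with str() and escapes them.
        urls[entity_type] = f"{BASE_URL}?{urlencode(params)}"
    return urls


urls = build_query_urls(4, "2025-02-19", "2025-03-19")
```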

Example time ranges (monthly):

  • 2025-02-19 to 2025-03-19
  • 2025-03-19 to 2025-04-19
  • ... (continue for 12 months)
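
The twelve windows can be generated rather than typed by hand. The sketch below uses only the standard library; the helper names `shift_months` and `monthly_ranges` are illustrative. Adjacent windows share a boundary date, as in the examples above, and whether `date_to` is inclusive is not specified here, so items dated exactly on a boundary may need deduplication.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Move a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def monthly_ranges(end: date, months: int = 12):
    """Return (date_from, date_to) ISO string pairs, oldest window first."""
    bounds = [shift_months(end, -i) for i in range(months, -1, -1)]
    return [(a.isoformat(), b.isoformat()) for a, b in zip(bounds, bounds[1:])]


ranges = monthly_ranges(date(2026, 2, 19))
# ranges[0] is ("2025-02-19", "2025-03-19"); ranges[-1] ends on 2026-02-19.
```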

Phase 3: Initial Synthesis & Generation

After gathering all data:

  1. Summarize findings:

    • Total activity counts per entity type
    • Most active time periods
    • Key projects/topics
    • Frequent collaborators

  2. Read MYSELF.example.md to understand the template structure

  3. Generate initial MYSELF.md with:

    • Identity section (from ContactDB)
    • Work areas (from meeting topics, Zulip streams)
    • Collaborators (from meeting participants, message contacts)
    • Basic preferences (inferred from activity patterns)
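
The summarizing step can be sketched as a merge over the twelve monthly summaries. The summary shape used here (`month`, `counts`, `topics`, `collaborators`) is an assumption about what each subagent returns, and the sample data is made up for illustration.

```python
from collections import Counter


def synthesize(monthly_summaries):
    """Merge per-month summaries into totals for the initial draft."""
    totals = Counter()
    topics = Counter()
    collaborators = Counter()
    activity_by_month = {}
    for summary in monthly_summaries:
        counts = summary.get("counts", {})
        totals.update(counts)  # adds per-entity-type counts
        topics.update(summary.get("topics", []))
        collaborators.update(summary.get("collaborators", []))
        activity_by_month[summary["month"]] = sum(counts.values())
    ranked_months = sorted(activity_by_month, key=activity_by_month.get, reverse=True)
    return {
        "totals": dict(totals),
        "most_active_months": ranked_months[:3],
        "top_topics": [t for t, _ in topics.most_common(5)],
        "frequent_collaborators": [c for c, _ in collaborators.most_common(5)],
    }


example = synthesize([
    {"month": "2025-02", "counts": {"meeting": 3, "email": 10},
     "topics": ["onboarding"], "collaborators": ["Alice"]},
    {"month": "2025-03", "counts": {"meeting": 5, "email": 20},
     "topics": ["onboarding", "search"], "collaborators": ["Alice", "Bob"]},
])
```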

Phase 4: Deep Dive & Refinement

⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.

Launch 7 parallel subagents to search for background information across all categories.

Each subagent searches using the person's full name in the query text (not contact_id filtering) and returns findings for one category.

Discovery Categories (One Subagent Per Category)

1. Personal Interests & Hobbies

Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
Look for: recreational activities, interests, entertainment preferences

2. Work Schedule & Availability

Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints

3. Professional Values & Goals

Search: "{Name} values goals mission purpose why they work career objective philosophy"
Look for: motivations, career aspirations, professional beliefs, purpose statements

4. Communication & Workflow Preferences

Search: "{Name} workflow tools preferences how they like to work communication style feedback"
Look for: preferred tools, work methodologies, communication patterns, feedback preferences

5. Background & Career History

Search: "{Name} background career history previous roles education transition story experience"
Look for: prior jobs, education, career changes, professional journey

6. Daily Life & Routines

Search: "{Name} daily routine family married children commute work-life balance personal context"
Look for: family situation, daily schedule, personal commitments, lifestyle

7. Technical Preferences

Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
Look for: favorite tools, coding practices, technical philosophy, preferred frameworks

Subagent Task Template

Search DataIndex for background information about {Name}.

API Call:
POST http://localhost:42000/dataindex/api/v1/search
{
  "search_text": "{Name} {category-specific search terms}",
  "date_from": "{12_months_ago}",
  "date_to": "{today}",
  "limit": 20
}

Extract and return:
- Specific details found (quotes if available)
- Patterns or recurring themes
- Context about personal/professional life
- Any notable insights

Why parallel subagents:

  • Each search is independent - perfect for parallelization
  • Reduces wall-clock time because the seven searches run concurrently instead of one after another
  • Comprehensive coverage without overwhelming the main agent
  • Gathers rich context for personalizing the MYSELF.md

Critical: Use name-based search

  • Always include the person's full name in the search query
  • Do NOT rely on contact_id filtering for semantic search
  • Personal details appear in transcripts where names are mentioned
  • contact_id filters work for exact queries but fail for RAG/semantic retrieval

When searching for personal details, use the person's full name in the query:

# GOOD - Uses name in search text
dataindex_search(
    query="Mathieu Virbel hobbies interests personal life outside work",
    date_from="2025-02-19T00:00:00Z",
    date_to="2026-02-19T00:00:00Z",
    limit=20
)

# BAD - Only filters by contact_id (won't find personal context)
dataindex_search(
    query="hobbies interests personal life",
    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
    limit=20
)

Key Insight:

  • Semantic search works best with full context in the query text
  • contact_id filtering works for exact entity matching but not for RAG retrieval
  • Personal details often appear in meeting transcripts where names are mentioned

Output: MYSELF.md Structure

The final document should include:

# About Me

## Identity
- Name, Role, Contact ID, Email, Location
- Family status (if discovered)

## What I work on
- Primary projects with descriptions
- Client work
- Additional responsibilities

## People I work with frequently
- List of key collaborators with context

## Personal Context (if discovered)
- Background/career history
- Daily schedule & constraints
- Interests & values

## Preferences
- Work style
- Default date ranges
- Output formats
- Topics of interest
- Communication patterns
- Tools & workflow
- Security/privacy stance
- Current learning areas
- Known challenges
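
During the final review, a quick check can confirm that the draft contains the expected sections. This is a hedged sketch: it treats "Personal Context" as optional because the structure above marks it "(if discovered)", and the helper name `missing_sections` is illustrative.

```python
REQUIRED_SECTIONS = [
    "## Identity",
    "## What I work on",
    "## People I work with frequently",
    "## Preferences",
]


def missing_sections(markdown_text: str):
    """Return the required second-level headings that the draft lacks."""
    headings = {
        line.strip() for line in markdown_text.splitlines() if line.startswith("## ")
    }
    return [section for section in REQUIRED_SECTIONS if section not in headings]


draft = "# About Me\n\n## Identity\n- Name\n\n## Preferences\n- Work style\n"
gaps = missing_sections(draft)
```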

Tips for Quality Results

  1. Be thorough in Phase 2 - More historical data = better insights
  2. Use parallel subagents - 12 monthly subagents run concurrently for speed
  3. Cross-channel synthesis - Monthly subagents see the full picture across all channels
  4. Ask follow-up questions - Users often want to discover unexpected things
  5. Search by name, not ID - Critical for finding personal context
  6. Synthesize meeting transcripts - They contain rich personal details
  7. Look for patterns - Timezone mentions, scheduling preferences, recurring topics
  8. Update over time - MYSELF.md should evolve as the person does

Common Mistakes to Avoid

DON'T query DataIndex directly in Phase 2 or 4

  • Direct queries miss the monthly breakdown
  • You won't get comprehensive historical coverage
  • Context limits will truncate results

DON'T launch 48 subagents (12 months × 4 entity types)

  • Use 12 subagents (1 per month) instead
  • Each monthly subagent queries all 4 entity types
  • Simpler coordination and better cross-channel context

DON'T skip Phase 2 and go straight to Phase 4

  • You need historical context before doing deep searches
  • The monthly aggregation reveals patterns you can't see otherwise

DON'T use contact_id filtering for semantic searches

  • RAG/semantic search requires the person's name in the query text
  • contact_id filters only work for exact entity matching

ALWAYS use the Task tool to launch subagents

  • This is the only way to achieve true parallelism
  • Each subagent gets its own context window
  • Results can be aggregated after all complete
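
The launch-all-then-wait pattern can be illustrated with `asyncio`. This is an analogy only: the Task tool is not a Python API, and `run_subagent` below is a hypothetical stand-in that simply echoes its prompt.

```python
import asyncio


async def run_subagent(task_prompt: str) -> str:
    # Stand-in for a real subagent launch; it only echoes the prompt.
    await asyncio.sleep(0)
    return f"summary for: {task_prompt}"


async def fan_out(prompts):
    """Start every task at once and return only when all have finished."""
    return await asyncio.gather(*(run_subagent(p) for p in prompts))


prompts = [f"month {i}" for i in range(1, 13)]
results = asyncio.run(fan_out(prompts))
```

`asyncio.gather` returns results in the order the tasks were passed in, so each summary can be matched back to its month without extra bookkeeping.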

Example Usage

User: "Help me create my MYSELF.md"

Agent:
1. Gets user's identity via contactdb_get_me()
2. Discovers contact_id = 4, name = "Mathieu Virbel"
3. Launches 12 subagents for historical data (1 per month, all entity types)
4. Gathers summaries from all subagents
5. Generates initial MYSELF.md
6. Launches 7 parallel subagents for background discovery:
   - Personal interests & hobbies
   - Work schedule & availability
   - Professional values & goals
   - Communication & workflow preferences
   - Background & career history
   - Daily life & routines
   - Technical preferences
7. Gathers all search results
8. Updates MYSELF.md with rich personal context
9. Delivers final document

Total subagents launched: 12 (historical) + 7 (discovery) = 19 tasks, run as two parallel batches

Files

  • MYSELF.example.md - Template file to copy and fill
  • MYSELF.md - Generated output (gitignored, personal to each user)