feat: add self-onboarding skill

2026-02-20 15:12:29 -06:00
parent fc3bdf86ba
commit f2a8eaff3c
2 changed files with 358 additions and 0 deletions
--- a/.agents/skills/self-onboarding/SKILL.md
+++ b/.agents/skills/self-onboarding/SKILL.md
@@ -0,0 +1,357 @@
 ---
 name: self-onboarding
 description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
 user-invocable: true
 ---
 # Self-Onboarding Skill
 This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.
 ## ⚠️ MANDATORY EXECUTION PLAN
 **ALWAYS follow these phases in order. Do NOT skip phases or combine them.**
 ```
 PHASE 1: Identity Resolution (Sequential)
  └─→ Get contact_id via contactdb_get_me()
  └─→ Extract: name, email, role, location, contact_id
 PHASE 2: Historical Data Gathering (Parallel Subagents)
  └─→ Launch 12 subagents (1 per month, querying all entity types)
  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
  └─→ Wait for ALL subagents to complete
  └─→ Collect and synthesize findings
 PHASE 3: Initial Synthesis & Generation (Sequential)
  └─→ Read MYSELF.example.md
  └─→ Generate initial MYSELF.md draft
 PHASE 4: Deep Dive & Refinement (Parallel Subagents)
  └─→ Launch 7 subagents for background discovery
  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
  └─→ Wait for ALL subagents to complete
  └─→ Enhance MYSELF.md with findings
 PHASE 5: Final Delivery (Sequential)
  └─→ Review and polish MYSELF.md
  └─→ Deliver to user
 ```
 **Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)
 ## When to Use
 Use this skill when:
 - A new team member joins and needs to create their `MYSELF.md`
 - Someone wants to update their existing `MYSELF.md` with fresh data
 - You need to understand a person's work patterns, collaborators, and preferences
 ## Prerequisites
 Before starting, ensure:
 1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
 2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
 3. You have access to the MYSELF.example.md template
 ## Process Overview
 The onboarding process consists of 5 phases:
 1. **Identity Resolution** - Get the person's contact record
 2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
 3. **Initial Synthesis** - Create initial MYSELF.md draft
 4. **Deep Dive & Refinement** - Search for specific personal details and enhance
 5. **Final Delivery** - Review and deliver completed MYSELF.md
 ## Phase 1: Identity Resolution
 Get the person's identity from ContactDB:
 ```python
 # Get self
 contactdb_get_me()
 # Or search by name
 contactdb_query_contacts(search="Person Name")
 ```
 **Extract key info:**
 - Name, email, role, location
 - Contact ID (needed for all subsequent queries)
 - Platform identities (Zulip, email, Reflector)
 - Stats (hotness score, interaction counts)
 ## Phase 2: Historical Data Gathering
 **⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**
 Launch parallel subagents to query all entity types for each monthly time range.
 **Mandatory approach (NO EXCEPTIONS):**
 - Time range: Past 12 months (or since joining)
 - One subagent per month that queries ALL entity types
 - Total: 12 subagents (one for each month)
 **Why subagents are required:**
 - Each monthly query is independent and can run in parallel
 - Direct queries would take too long and exceed context limits
 - Subagents aggregate data per time period, making synthesis easier
 - This is the ONLY way to get comprehensive historical coverage
 **Benefits of 1 subagent per month:**
 - Holistic view of each month across all channels
 - Cross-channel context (e.g., meeting follows up on Zulip discussion)
 - Simpler to implement and debug
 - Results pre-aggregated by time period
 **Subagent task structure:**
 ```
 Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.
 For each entity type (threaded_conversation, conversation_message, meeting, email):
  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100
 Synthesize findings across all channels and return a monthly summary with:
 1. Total activity counts per entity type
 2. Key topics/projects discussed
 3. Notable patterns and themes
 4. Collaborators involved
 5. Work areas/projects identified
 ```
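The per-entity-type requests in the task structure above can be assembled mechanically. The following is a minimal sketch, assuming the endpoint and parameter names shown in the template; the helper name `build_query_urls` is hypothetical and is not part of the DataIndex API.

```python
from urllib.parse import urlencode

# Endpoint and entity types are taken from the subagent task structure above.
BASE_URL = "http://localhost:42000/dataindex/api/v1/query"
ENTITY_TYPES = ["threaded_conversation", "conversation_message", "meeting", "email"]


def build_query_urls(contact_id: int, date_from: str, date_to: str, limit: int = 100) -> dict[str, str]:
    """Return one query URL per entity type for a single monthly window.

    Hypothetical helper: it only formats the GET request from the template.
    """
    urls = {}
    for entity_type in ENTITY_TYPES:
        params = {
            "entity_types": entity_type,
            "contact_ids": contact_id,
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        urls[entity_type] = f"{BASE_URL}?{urlencode(params)}"
    return urls


urls = build_query_urls(contact_id=4, date_from="2025-02-19", date_to="2025-03-19")
print(urls["meeting"])
```

Each monthly subagent would issue these four requests and then summarize the combined results.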
 **Example time ranges (monthly):**
 - 2025-02-19 to 2025-03-19
 - 2025-03-19 to 2025-04-19
 - ... (continue for 12 months)
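The twelve windows can be generated rather than typed out. This is a minimal sketch using only the standard library; `shift_months` and `monthly_windows` are hypothetical helper names, and clamping the day to the end of shorter months is an assumption about how edge cases should be handled.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def monthly_windows(end: date, count: int = 12) -> list[tuple[str, str]]:
    """Return `count` consecutive (date_from, date_to) pairs ending at `end`, oldest first."""
    boundaries = [shift_months(end, -i) for i in range(count, -1, -1)]
    return [(a.isoformat(), b.isoformat()) for a, b in zip(boundaries, boundaries[1:])]


windows = monthly_windows(date(2026, 2, 19))
for date_from, date_to in windows[:2]:
    print(date_from, "to", date_to)
```

Adjacent windows share a boundary date, as in the example ranges above. Whether `date_to` is inclusive is not stated in the skill, so activity on a boundary day may be reported by two subagents and might need deduplicating during synthesis.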
 ## Phase 3: Initial Synthesis & Generation
 After gathering all data:
 1. **Summarize findings:**
   - Total activity counts per entity type
   - Most active time periods
   - Key projects/topics
   - Frequent collaborators
 2. **Read MYSELF.example.md** to understand the template structure
 3. **Generate initial MYSELF.md** with:
   - Identity section (from ContactDB)
   - Work areas (from meeting topics, Zulip streams)
   - Collaborators (from meeting participants, message contacts)
   - Basic preferences (inferred from activity patterns)
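The summarizing step can be sketched as a simple roll-up of the monthly summaries. The skill specifies what each subagent reports but not its format, so the dictionary shape, the sample values, and the `synthesize` helper below are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical shape of two monthly summaries; names and numbers are made up.
monthly_summaries = [
    {
        "period": "2025-02-19 to 2025-03-19",
        "counts": {"meeting": 12, "email": 30, "conversation_message": 85, "threaded_conversation": 9},
        "topics": ["onboarding", "data pipeline"],
        "collaborators": ["Alice", "Bob"],
    },
    {
        "period": "2025-03-19 to 2025-04-19",
        "counts": {"meeting": 8, "email": 41, "conversation_message": 60, "threaded_conversation": 5},
        "topics": ["data pipeline", "client demo"],
        "collaborators": ["Alice", "Carol"],
    },
]


def synthesize(summaries: list[dict]) -> dict:
    """Roll monthly summaries up into totals, top topics, and frequent collaborators."""
    totals, topics, collaborators = Counter(), Counter(), Counter()
    for summary in summaries:
        totals.update(summary["counts"])
        topics.update(summary["topics"])
        collaborators.update(summary["collaborators"])
    most_active = max(summaries, key=lambda s: sum(s["counts"].values()))
    return {
        "totals": dict(totals),
        "most_active_period": most_active["period"],
        "top_topics": topics.most_common(5),
        "top_collaborators": collaborators.most_common(5),
    }


overview = synthesize(monthly_summaries)
print(overview["most_active_period"])
```

The resulting overview maps directly onto the work areas and collaborators sections of the initial draft.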
 ## Phase 4: Deep Dive & Refinement
 **⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**
 **Launch 7 parallel subagents** to search for background information across all categories.
 Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.
 ### Discovery Categories (One Subagent Per Category)
 **1. Personal Interests & Hobbies**
 ```
 Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
 Look for: recreational activities, interests, entertainment preferences
 ```
 **2. Work Schedule & Availability**
 ```
 Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
 Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
 ```
 **3. Professional Values & Goals**
 ```
 Search: "{Name} values goals mission purpose why they work career objective philosophy"
 Look for: motivations, career aspirations, professional beliefs, purpose statements
 ```
 **4. Communication & Workflow Preferences**
 ```
 Search: "{Name} workflow tools preferences how they like to work communication style feedback"
 Look for: preferred tools, work methodologies, communication patterns, feedback preferences
 ```
 **5. Background & Career History**
 ```
 Search: "{Name} background career history previous roles education transition story experience"
 Look for: prior jobs, education, career changes, professional journey
 ```
 **6. Daily Life & Routines**
 ```
 Search: "{Name} daily routine family married children commute work-life balance personal context"
 Look for: family situation, daily schedule, personal commitments, lifestyle
 ```
 **7. Technical Preferences**
 ```
 Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
 Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
 ```
 ### Subagent Task Template
 ```
 Search DataIndex for background information about {Name}.
 API Call:
 POST /dataindex/api/v1/search
 {
  "search_text": "{Name} {category-specific search terms}",
  "date_from": "{12_months_ago}",
  "date_to": "{today}",
  "limit": 20
 }
 Extract and return:
 - Specific details found (quotes if available)
 - Patterns or recurring themes
 - Context about personal/professional life
 - Any notable insights
 ```
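The seven request bodies differ only in their search terms, so they can be built from one table. This is a minimal sketch, assuming the payload fields shown in the template above; the search terms are adapted from the discovery categories, and `CATEGORY_TERMS` and `build_search_payloads` are hypothetical names.

```python
import json

# Search terms adapted from the seven discovery categories.
CATEGORY_TERMS = {
    "interests": "hobbies interests personal life outside work sports books travel music games cooking",
    "schedule": "schedule availability hours timezone meeting time preference morning afternoon",
    "values": "values goals mission purpose why they work career objective philosophy",
    "workflow": "workflow tools preferences how they like to work communication style feedback",
    "background": "background career history previous roles education transition story experience",
    "daily_life": "daily routine family married children commute work-life balance personal context",
    "technical": "tools development workflow process methodology architecture decisions technical approach",
}


def build_search_payloads(name: str, date_from: str, date_to: str, limit: int = 20) -> dict[str, dict]:
    """Return one search payload per category, with the full name leading the query text."""
    return {
        category: {
            "search_text": f"{name} {terms}",
            "date_from": date_from,
            "date_to": date_to,
            "limit": limit,
        }
        for category, terms in CATEGORY_TERMS.items()
    }


payloads = build_search_payloads("Mathieu Virbel", "2025-02-19T00:00:00Z", "2026-02-19T00:00:00Z")
print(json.dumps(payloads["interests"], indent=2))
```

Each subagent would receive exactly one of these payloads along with the extraction instructions from the template.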
 **Why parallel subagents:**
 - Each search is independent - perfect for parallelization
 - Cuts wall-clock time, because the seven searches run concurrently instead of one after another
 - Comprehensive coverage without overwhelming the main agent
 - Gathers rich context for personalizing the MYSELF.md
 **Critical: Use name-based search**
 - Always include the person's full name in the search query
 - Do NOT rely on contact_id filtering for semantic search
 - Personal details appear in transcripts where names are mentioned
 - contact_id filters work for exact queries but fail for RAG/semantic retrieval
 When searching for personal details, use the person's full name in the query:
 ```python
 # GOOD - Uses name in search text
 dataindex_search(
    query="Mathieu Virbel hobbies interests personal life outside work",
    date_from="2025-02-19T00:00:00Z",
    date_to="2026-02-19T00:00:00Z",
    limit=20
 )
 # BAD - Only filters by contact_id (won't find personal context)
 dataindex_search(
    query="hobbies interests personal life",
    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
    limit=20
 )
 ```
 **Key Insight:**
 - Semantic search works best with full context in the query text
 - contact_id filtering works for exact entity matching but not for RAG retrieval
 - Personal details often appear in meeting transcripts where names are mentioned
 ## Output: MYSELF.md Structure
 The final document should include:
 ```markdown
 # About Me
 ## Identity
 - Name, Role, Contact ID, Email, Location
 - Family status (if discovered)
 ## What I work on
 - Primary projects with descriptions
 - Client work
 - Additional responsibilities
 ## People I work with frequently
 - List of key collaborators with context
 ## Personal Context (if discovered)
 - Background/career history
 - Daily schedule & constraints
 - Interests & values
 ## Preferences
 - Work style
 - Default date ranges
 - Output formats
 - Topics of interest
 - Communication patterns
 - Tools & workflow
 - Security/privacy stance
 - Current learning areas
 - Known challenges
 ```
 ## Tips for Quality Results
 1. **Be thorough in Phase 2** - More historical data = better insights
 2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
 3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
 4. **Ask follow-up questions** - Users often want to discover unexpected things
 5. **Search by name, not ID** - Critical for finding personal context
 6. **Synthesize meeting transcripts** - They contain rich personal details
 7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
 8. **Update over time** - MYSELF.md should evolve as the person does
 ## Common Mistakes to Avoid
 **❌ DON'T query DataIndex directly in Phase 2 or 4**
 - Direct queries miss the monthly breakdown
 - You won't get comprehensive historical coverage
 - Context limits will truncate results
 **❌ DON'T launch 48 subagents (12 months × 4 entity types)**
 - Use 12 subagents (1 per month) instead
 - Each monthly subagent queries all 4 entity types
 - Simpler coordination and better cross-channel context
 **❌ DON'T skip Phase 2 and go straight to Phase 4**
 - You need historical context before doing deep searches
 - The monthly aggregation reveals patterns you can't see otherwise
 **❌ DON'T use contact_id filtering for semantic searches**
 - RAG/semantic search requires the person's name in the query text
 - contact_id filters only work for exact entity matching
 **✅ ALWAYS use the Task tool to launch subagents**
 - This is the only way to achieve true parallelism
 - Each subagent gets its own context window
 - Results can be aggregated after all complete
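The fan-out and fan-in shape of Phases 2 and 4 can be modeled as follows. This sketch uses `asyncio` purely to illustrate the control flow; `run_subagent` is a hypothetical stand-in, and real subagents are launched through the Task tool, not through this code.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a subagent launched via the Task tool."""
    await asyncio.sleep(0)  # placeholder for the subagent's actual work
    return f"summary for {task}"


async def run_phase(tasks: list[str]) -> list[str]:
    """Launch every task concurrently and wait for ALL of them to finish."""
    return await asyncio.gather(*(run_subagent(task) for task in tasks))


async def main() -> tuple[list[str], list[str]]:
    months = [f"month-{i}" for i in range(1, 13)]
    categories = ["interests", "schedule", "values", "workflow",
                  "background", "daily life", "technical prefs"]
    phase2 = await run_phase(months)      # Phase 2: 12 subagents, one per month
    # Phase 3 (the initial draft) runs here, before any Phase 4 subagent starts.
    phase4 = await run_phase(categories)  # Phase 4: 7 subagents, one per category
    return phase2, phase4


phase2_results, phase4_results = asyncio.run(main())
print(len(phase2_results), len(phase4_results))
```

The point of the sketch is the barrier between phases: the second batch does not start until every result from the first has been collected.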
 ## Example Usage
 ```
 User: "Help me create my MYSELF.md"
 Agent:
 1. Gets user's identity via contactdb_get_me()
 2. Discovers contact_id = 4, name = "Mathieu Virbel"
 3. Launches 12 subagents for historical data (1 per month, all entity types)
 4. Gathers summaries from all subagents
 5. Generates initial MYSELF.md
 6. Launches 7 parallel subagents for background discovery:
   - Personal interests & hobbies
   - Work schedule & availability
   - Professional values & goals
   - Communication & workflow preferences
   - Background & career history
   - Daily life & routines
   - Technical preferences
 7. Gathers all search results
 8. Updates MYSELF.md with rich personal context
 9. Delivers final document
 ```
 **Total subagents launched:** 12 (historical) + 7 (discovery) = 19, run as two parallel batches
 ## Files
 - `MYSELF.example.md` - Template file to copy and fill
 - `MYSELF.md` - Generated output (gitignored, personal to each user)
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ These are workflows you trigger explicitly. The agent will not run them on its o
 | **project-sync** | `/project-sync [name]` | Incremental update of a project timeline. Reads the last sync date from `sync-state.md`, fetches new data through today, creates new week files, and refreshes the timeline and background. |
 | **checkout** | `/checkout` | Build a weekly review (Sunday through today). Gathers meetings, emails, Zulip conversations, and Gitea activity, then produces a structured checkout summary. |
 | **workflow** | `/workflow [topic]` | Create a marimo notebook for data analysis. Use for any request involving analysis over time periods or large data volumes. |
 | **self-onboarding** | `/self-onboarding` | Generate a personalized `MYSELF.md` by analyzing 12 months of historical activity (meetings, emails, Zulip, calendar). Runs 19 subagents in two parallel batches (12 monthly, 7 discovery) to build a comprehensive profile. |
 ### Reference Skills (agent-loaded automatically)