From f2a8eaff3cbe27dc0ba4a0cf37785280b5960c78 Mon Sep 17 00:00:00 2001
From: Mathieu Virbel
Date: Fri, 20 Feb 2026 15:12:29 -0600
Subject: [PATCH] feat: add self-onboarding skill

---
 .agents/skills/self-onboarding/SKILL.md | 357 ++++++++++++++++++++++++
 README.md                               |   1 +
 2 files changed, 358 insertions(+)
 create mode 100644 .agents/skills/self-onboarding/SKILL.md

diff --git a/.agents/skills/self-onboarding/SKILL.md b/.agents/skills/self-onboarding/SKILL.md
new file mode 100644
index 0000000..c530624
--- /dev/null
+++ b/.agents/skills/self-onboarding/SKILL.md
@@ -0,0 +1,357 @@
+---
+name: self-onboarding
+description: Generate a personalized MYSELF.md file for new team members by analyzing their historical activity across all data sources (meetings, emails, Zulip conversations, calendar events).
+user-invocable: true
+---
+
+# Self-Onboarding Skill
+
+This skill helps new team members create a comprehensive `MYSELF.md` file that captures their identity, work patterns, collaborations, and preferences based on their actual historical activity in the system.
+
+## ⚠️ MANDATORY EXECUTION PLAN
+
+**ALWAYS follow these phases in order. Do NOT skip phases or combine them.**
+
+```
+PHASE 1: Identity Resolution (Sequential)
+  └─→ Get contact_id via contactdb_get_me()
+  └─→ Extract: name, email, role, location, contact_id
+
+PHASE 2: Historical Data Gathering (Parallel Subagents)
+  └─→ Launch 12 subagents (1 per month, querying all entity types)
+  └─→ Each subagent queries: threaded_conversation, conversation_message, meeting, email
+  └─→ Wait for ALL subagents to complete
+  └─→ Collect and synthesize findings
+
+PHASE 3: Initial Synthesis & Generation (Sequential)
+  └─→ Read MYSELF.example.md
+  └─→ Generate initial MYSELF.md draft
+
+PHASE 4: Deep Dive & Refinement (Parallel Subagents)
+  └─→ Launch 7 subagents for background discovery
+  └─→ Categories: interests, schedule, values, workflow, background, daily life, technical prefs
+  └─→ Wait for ALL subagents to complete
+  └─→ Enhance MYSELF.md with findings
+
+PHASE 5: Final Delivery (Sequential)
+  └─→ Review and polish MYSELF.md
+  └─→ Deliver to user
+```
+
+**Total subagents required: 19** (12 for Phase 2 + 7 for Phase 4)
+
+## When to Use
+
+Use this skill when:
+- A new team member joins and needs to create their `MYSELF.md`
+- Someone wants to update their existing `MYSELF.md` with fresh data
+- You need to understand a person's work patterns, collaborators, and preferences
+
+## Prerequisites
+
+Before starting, ensure:
+1. The person has a ContactDB record (use `contactdb_get_me` or `contactdb_query_contacts`)
+2. They have historical data in the system (meetings, Zulip messages, emails, etc.)
+3. You have access to the MYSELF.example.md template
+
+## Process Overview
+
+The onboarding process consists of 5 phases:
+
+1. **Identity Resolution** - Get the person's contact record
+2. **Historical Data Gathering** - Query all entity types across 12 monthly periods
+3. **Initial Synthesis** - Create initial MYSELF.md draft
+4. **Deep Dive & Refinement** - Search for specific personal details and enhance
+5. **Final Delivery** - Review and deliver completed MYSELF.md
+
+## Phase 1: Identity Resolution
+
+Get the person's identity from ContactDB:
+
+```python
+# Get self
+contactdb_get_me()
+
+# Or search by name
+contactdb_query_contacts(search="Person Name")
+```
+
+**Extract key info:**
+- Name, email, role, location
+- Contact ID (needed for all subsequent queries)
+- Platform identities (Zulip, email, Reflector)
+- Stats (hotness score, interaction counts)
+
+## Phase 2: Historical Data Gathering
+
+**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT query directly.**
+
+Launch parallel subagents to query all entity types for each monthly time range.
+
+**Mandatory approach (NO EXCEPTIONS):**
+- Time range: Past 12 months (or since joining)
+- One subagent per month that queries ALL entity types
+- Total: 12 subagents (one for each month)
+
+**Why subagents are required:**
+- Each monthly query is independent and can run in parallel
+- Direct queries would take too long and exceed context limits
+- Subagents aggregate data per time period, making synthesis easier
+- This is the ONLY way to get comprehensive historical coverage
+
+**Benefits of 1 subagent per month:**
+- Holistic view of each month across all channels
+- Cross-channel context (e.g., meeting follows up on Zulip discussion)
+- Simpler to implement and debug
+- Results pre-aggregated by time period
+
+**Subagent task structure:**
+
+```
+Query DataIndex API for ALL entity types involving contact_id {ID} from {date_from} to {date_to}.
+
+For each entity type (threaded_conversation, conversation_message, meeting, email):
+  Use: GET http://localhost:42000/dataindex/api/v1/query?entity_types={entity_type}&contact_ids={ID}&date_from={date_from}&date_to={date_to}&limit=100
+
+Synthesize findings across all channels and return a monthly summary with:
+1. Total activity counts per entity type
+2. Key topics/projects discussed
+3. Notable patterns and themes
+4. Collaborators involved
+5. Work areas/projects identified
+```
+
+**Example time ranges (monthly):**
+- 2025-02-19 to 2025-03-19
+- 2025-03-19 to 2025-04-19
+- ... (continue for 12 months)
+
+## Phase 3: Initial Synthesis & Generation
+
+After gathering all data:
+
+1. **Summarize findings:**
+   - Total activity counts per entity type
+   - Most active time periods
+   - Key projects/topics
+   - Frequent collaborators
+
+2. **Read MYSELF.example.md** to understand the template structure
+
+3. **Generate initial MYSELF.md** with:
+   - Identity section (from ContactDB)
+   - Work areas (from meeting topics, Zulip streams)
+   - Collaborators (from meeting participants, message contacts)
+   - Basic preferences (inferred from activity patterns)
+
+## Phase 4: Deep Dive & Refinement
+
+**⚠️ CRITICAL: This phase MUST use parallel subagents. Do NOT search directly.**
+
+**Launch 7 parallel subagents** to search for background information across all categories.
+
+Each subagent searches using the person's **full name** in the query text (not contact_id filtering) and returns findings for one category.
+
+### Discovery Categories (One Subagent Per Category)
+
+**1. Personal Interests & Hobbies**
+```
+Search: "{Name} hobbies interests personal life outside work sports books travel music games cooking"
+Look for: recreational activities, interests, entertainment preferences
+```
+
+**2. Work Schedule & Availability**
+```
+Search: "{Name} schedule availability hours timezone meeting time preference morning afternoon"
+Look for: preferred work hours, timezone mentions, lunch breaks, scheduling constraints
+```
+
+**3. Professional Values & Goals**
+```
+Search: "{Name} values goals mission purpose why he works career objective philosophy"
+Look for: motivations, career aspirations, professional beliefs, purpose statements
+```
+
+**4. Communication & Workflow Preferences**
+```
+Search: "{Name} workflow tools preferences how he likes to work communication style feedback"
+Look for: preferred tools, work methodologies, communication patterns, feedback preferences
+```
+
+**5. Background & Career History**
+```
+Search: "{Name} background career history previous roles education transition story experience"
+Look for: prior jobs, education, career changes, professional journey
+```
+
+**6. Daily Life & Routines**
+```
+Search: "{Name} daily routine family married children commute work-life balance personal context"
+Look for: family situation, daily schedule, personal commitments, lifestyle
+```
+
+**7. Technical Preferences**
+```
+Search: "{Name} tools development workflow process methodology architecture decisions technical approach"
+Look for: favorite tools, coding practices, technical philosophy, preferred frameworks
+```
+
+### Subagent Task Template
+
+```
+Search DataIndex for background information about {Name}.
+
+API Call:
+POST /dataindex/api/v1/search
+{
+  "search_text": "{Name} {category-specific search terms}",
+  "date_from": "{12_months_ago}",
+  "date_to": "{today}",
+  "limit": 20
+}
+
+Extract and return:
+- Specific details found (quotes if available)
+- Patterns or recurring themes
+- Context about personal/professional life
+- Any notable insights
+```
+
+**Why parallel subagents:**
+- Each search is independent - perfect for parallelization
+- Reduces execution time from minutes to seconds
+- Comprehensive coverage without overwhelming the main agent
+- Gathers rich context for personalizing the MYSELF.md
+
+**Critical: Use name-based search**
+- Always include the person's full name in the search query
+- Do NOT rely on contact_id filtering for semantic search
+- Personal details appear in transcripts where names are mentioned
+- contact_id filters work for exact queries but fail for RAG/semantic retrieval
+
+When searching for personal details, use the person's full name in the query:
+
+```python
+# GOOD - Uses name in search text
+dataindex_search(
+    query="Mathieu Virbel hobbies interests personal life outside work",
+    date_from="2025-02-19T00:00:00Z",
+    date_to="2026-02-19T00:00:00Z",
+    limit=20
+)
+
+# BAD - Only filters by contact_id (won't find personal context)
+dataindex_search(
+    query="hobbies interests personal life",
+    contact_ids=[4],  # RAG/semantic search doesn't work well with contact_id
+    limit=20
+)
+```
+
+**Key Insight:**
+- Semantic search works best with full context in the query text
+- contact_id filtering works for exact entity matching but not for RAG retrieval
+- Personal details often appear in meeting transcripts where names are mentioned
+
+## Output: MYSELF.md Structure
+
+The final document should include:
+
+```markdown
+# About Me
+
+## Identity
+- Name, Role, Contact ID, Email, Location
+- Family status (if discovered)
+
+## What I work on
+- Primary projects with descriptions
+- Client work
+- Additional responsibilities
+
+## People I work with frequently
+- List of key collaborators with context
+
+## Personal Context (if discovered)
+- Background/career history
+- Daily schedule & constraints
+- Interests & values
+
+## Preferences
+- Work style
+- Default date ranges
+- Output formats
+- Topics of interest
+- Communication patterns
+- Tools & workflow
+- Security/privacy stance
+- Current learning areas
+- Known challenges
+```
+
+## Tips for Quality Results
+
+1. **Be thorough in Phase 2** - More historical data = better insights
+2. **Use parallel subagents** - 12 monthly subagents run concurrently for speed
+3. **Cross-channel synthesis** - Monthly subagents see the full picture across all channels
+4. **Ask follow-up questions** - Users often want to discover unexpected things
+5. **Search by name, not ID** - Critical for finding personal context
+6. **Synthesize meeting transcripts** - They contain rich personal details
+7. **Look for patterns** - Timezone mentions, scheduling preferences, recurring topics
+8. **Update over time** - MYSELF.md should evolve as the person does
+
+## Common Mistakes to Avoid
+
+**❌ DON'T query DataIndex directly in Phase 2 or 4**
+- Direct queries miss the monthly breakdown
+- You won't get comprehensive historical coverage
+- Context limits will truncate results
+
+**❌ DON'T launch 48 subagents (12 months × 4 entity types)**
+- Use 12 subagents (1 per month) instead
+- Each monthly subagent queries all 4 entity types
+- Simpler coordination and better cross-channel context
+
+**❌ DON'T skip Phase 2 and go straight to Phase 4**
+- You need historical context before doing deep searches
+- The monthly aggregation reveals patterns you can't see otherwise
+
+**❌ DON'T use contact_id filtering for semantic searches**
+- RAG/semantic search requires the person's name in the query text
+- contact_id filters only work for exact entity matching
+
+**✅ ALWAYS use the Task tool to launch subagents**
+- This is the only way to achieve true parallelism
+- Each subagent gets its own context window
+- Results can be aggregated after all complete
+
+## Example Usage
+
+```
+User: "Help me create my MYSELF.md"
+
+Agent:
+1. Gets user's identity via contactdb_get_me()
+2. Discovers contact_id = 4, name = "Mathieu Virbel"
+3. Launches 12 subagents for historical data (1 per month, all entity types)
+4. Gathers summaries from all subagents
+5. Generates initial MYSELF.md
+6. Launches 7 parallel subagents for background discovery:
+   - Personal interests & hobbies
+   - Work schedule & availability
+   - Professional values & goals
+   - Communication & workflow preferences
+   - Background & career history
+   - Daily life & routines
+   - Technical preferences
+7. Gathers all search results
+8. Updates MYSELF.md with rich personal context
+9. Delivers final document
+```
+
+**Total subagents launched:** 12 (historical) + 7 (discovery) = 19 parallel tasks
+
+## Files
+
+- `MYSELF.example.md` - Template file to copy and fill
+- `MYSELF.md` - Generated output (gitignored, personal to each user)
diff --git a/README.md b/README.md
index 756be32..c3f1fb5 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@ These are workflows you trigger explicitly. The agent will not run them on its o
 | **project-sync** | `/project-sync [name]` | Incremental update of a project timeline. Reads the last sync date from `sync-state.md`, fetches new data through today, creates new week files, and refreshes the timeline and background. |
 | **checkout** | `/checkout` | Build a weekly review (Sunday through today). Gathers meetings, emails, Zulip conversations, and Gitea activity, then produces a structured checkout summary. |
 | **workflow** | `/workflow [topic]` | Create a marimo notebook for data analysis. Use for any request involving analysis over time periods or large data volumes. |
+| **self-onboarding** | `/self-onboarding` | Generate a personalized `MYSELF.md` by analyzing 12 months of historical activity (meetings, emails, Zulip, calendar). Runs 19 parallel subagents to build a comprehensive profile. |

 ### Reference Skills (agent-loaded automatically)