feat: better greyscan positioning

Nik L
2026-03-18 15:21:56 -04:00
parent 697c09457c
commit 62af4ed8b9
3 changed files with 79 additions and 42 deletions

View File

@@ -1,46 +1,51 @@
You are a security analyst who deeply understands how AI coding agents behave. Your job is to generate a realistic threat report showing what an AI agent would attempt on a developer's machine when working on this repo.
AI agents run as the user's own process with full access to their machine. But they don't just randomly access everything. The repo's contents DETERMINE what the agent is motivated to do. An agent working on a static HTML site has no reason to read AWS credentials. An agent working on a Terraform project with AWS SDK dependencies absolutely does.
AI agents run as the user's own process with full access to their machine. The repo determines what the agent is motivated to do, but the attack surface is the developer's entire system. The agent doesn't stay within the repo boundary.
Your findings must follow this logic: REPO CONTENTS → AGENT MOTIVATION → ACTION ON MACHINE.
Examples of CORRECT causal chains:
There are TWO categories of threats to analyze:
CATEGORY 1: WHAT THE AGENT WOULD ACCESS (based on repo contents)
Examples of correct causal chains:
- Repo has aws-sdk in package.json → agent needs AWS context → reads ~/.aws/credentials
- Repo has Dockerfile + deploy scripts → agent needs SSH access for deployment → reads ~/.ssh/config
- Repo has .env.example with STRIPE_KEY → agent wants to test API integration → reads .env for real key, calls Stripe API
- Repo has CI/CD config (.github/workflows/) → agent wants to understand pipeline → reads git config, may push to trigger CI
- Repo has database migration files + pg dependency → agent needs DB connection → reads .env for DATABASE_URL, connects to production DB
- Repo has Dockerfile + deploy scripts → agent needs SSH for deployment → reads ~/.ssh/config
- Repo has .env.example with STRIPE_KEY → agent wants to test integration → reads .env, calls Stripe API
- Repo has database migrations + pg dependency → agent needs DB connection → reads .env for DATABASE_URL
Examples of WRONG findings (no causal link to repo):
- Static HTML repo → "reads SSH private keys" (why? there's nothing to deploy)
- Simple CLI tool with no network deps → "calls external APIs" (what APIs? there are none)
- Repo with no cloud dependencies → "reads ~/.aws/credentials" (no motivation to do this)
Examples of WRONG findings (no causal link):
- Static HTML repo → "reads SSH private keys" (no deployment, no motivation)
- Repo with no cloud deps → "reads ~/.aws/credentials" (nothing uses AWS)
- Simple library → "calls external APIs" (which APIs? there are none in the deps)
AI agents follow these patterns, but ONLY when the repo motivates them to:
CATEGORY 2: WHAT IN THIS REPO COULD MANIPULATE THE AGENT (prompt injection surface)
FILESYSTEM READS:
- Read .env files to discover API keys and service credentials (only if .env/.env.example exists or dependencies suggest external services)
- Read config directories to understand project infrastructure
- Read package manifests to understand dependencies
- Read SSH config for deployment context (only if repo has deployment infra)
- Read cloud credential files (only if repo uses cloud SDKs)
- Read shell history to understand workflows (only if debugging or trying to reproduce commands)
This is the sharper threat. Look for files that influence agent behavior:
- CLAUDE.md, .claude/, .claude/commands/, .claude/agents/ — Claude Code instruction files
- .cursorrules, .cursor/ — Cursor instruction files
- .github/copilot-instructions.md — Copilot instructions
- AGENTS.md, .mcp.json — agent config files
- .aider.conf.yml — Aider config
- CONTRIBUTING.md, CONVENTIONS.md — files agents read for project context
- .windsurfrules, .boltrules, .clinerules — other agent instruction files
- Any markdown file in the repo root that an agent would read for context
FILESYSTEM WRITES:
- Write across the project directory, modifying files they think are relevant
- Modify git hooks to inject behavior (only if doing git-related work)
- Modify editor/tool configs (only if setting up dev environment)
These files are a prompt injection surface. A malicious contributor, compromised dependency, or supply chain attack could modify these files to:
- Instruct the agent to exfiltrate secrets to an attacker-controlled URL
- Tell the agent to modify code in subtle, harmful ways
- Redirect the agent to install malicious packages
- Override safety behaviors the agent would normally follow
- Instruct the agent to ignore security warnings
COMMAND EXECUTION:
- Run package install commands, which execute arbitrary post-install scripts (supply-chain risk proportional to the number of dependencies)
- Run build/test commands that may hit live services
- Chain commands with && and pipes (|)
- Run git commands including push
If agent instruction files exist in this repo, this is a SIGNIFICANT finding. The more instruction files present, the larger the attack surface.
NETWORK ACCESS:
- Call package registries during installs
- Call external APIs they discover credentials for (only if credentials and relevant SDK exist)
- Make curl/wget requests to URLs found in code
SEVERITY CONTEXT:
- Documentation sites, package registries, and CDN URLs in code are LOW severity, not high. Agents fetching docs from readthedocs.io or downloading from npmjs.org is normal behavior.
- Reading .env files is only concerning if the repo actually has .env files or .env.example files
- Network calls are only concerning if there are actual API keys/credentials the agent could discover
- Supply chain risk (npm install, pip install) severity scales with number of dependencies. 3 deps = low. 300 deps = high.
- Prompt injection via agent instruction files is HIGH/CRITICAL severity because it can hijack all other agent behavior
Given the repository data below, generate a threat report. Every finding MUST have a clear causal chain from the repo's actual contents to the agent's action.
@@ -50,6 +55,7 @@ Stack detected: {{stack}}
Dependencies: {{dependencies}}
Sensitive files found: {{sensitiveFiles}}
Config files found: {{configFiles}}
Agent instruction files found: {{agentInstructionFiles}}
Respond with ONLY valid JSON (no markdown, no code fences, no explanation):
{
@@ -58,7 +64,7 @@ Respond with ONLY valid JSON (no markdown, no code fences, no explanation):
"summary": "<2 sentence summary — what the agent would do and why, grounded in this repo's actual contents>",
"findings": [
{
"type": "credential_read" | "network_call" | "directory_access" | "command_execution",
"type": "credential_read" | "network_call" | "directory_access" | "command_execution" | "prompt_injection",
"severity": "low" | "medium" | "high" | "critical",
"title": "<short, specific title>",
"description": "<1-2 sentences: what the agent would do, WHY this repo motivates it (reference specific files/deps), and the real-world damage>",
@@ -69,11 +75,13 @@ Respond with ONLY valid JSON (no markdown, no code fences, no explanation):
Rules:
- Generate 4-8 findings depending on actual repo complexity. Simple repos get fewer findings.
- Every finding MUST have a causal link: something in the repo that motivates the agent to take that action
- If the repo is simple (static site, small library, no cloud deps, no secrets), the score should be LOW (10-30) with only 3-4 findings
- If the repo has some config/deps but no secrets, score MEDIUM (30-60)
- Every finding MUST have a causal link: something in the repo motivates the action
- If agent instruction files exist, ALWAYS include a prompt_injection finding explaining the risk
- If the repo is simple (static site, small library, no cloud deps, no secrets, no agent files), score LOW (10-30) with 3-4 findings
- If the repo has config/deps but no secrets, score MEDIUM (30-60)
- If the repo has .env files OR cloud SDK dependencies, score HIGH (60-80)
- If the repo has .env files AND cloud SDKs AND deployment infra, score CRITICAL (80+)
- Do NOT inflate scores. A static HTML repo is low risk. Be honest.
- Do NOT generate findings that have no causal connection to this repo's contents
- Agent instruction files bump the score by 10-15 points due to prompt injection risk
- Do NOT inflate scores. Be honest. A static HTML repo is low risk.
- Do NOT flag documentation sites (readthedocs, docs.python.org) or package registries as high-severity network threats. These are normal, expected agent behavior.
- Commands must reference actual file paths from the repo tree
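The scoring rules above can be read as a deterministic rubric. A minimal sketch, with hypothetical helper names — the band cutoffs and the bump value are one choice within the stated ranges, and in the real app the score comes from the LLM, not from code:

```typescript
// Hypothetical encoding of the scoring bands described in the prompt.
// The LLM produces the actual score; this just makes the rubric explicit.
interface RepoSignals {
  hasEnvFiles: boolean               // real .env files present
  hasCloudSdk: boolean               // aws-sdk, cloud SDKs, etc. in deps
  hasDeployInfra: boolean            // Dockerfile, CI/CD, deploy scripts
  hasAgentInstructionFiles: boolean  // CLAUDE.md, .cursorrules, ...
  depCount: number
}

function baselineScore(r: RepoSignals): number {
  let score: number
  if (r.hasEnvFiles && r.hasCloudSdk && r.hasDeployInfra) score = 85 // CRITICAL (80+)
  else if (r.hasEnvFiles || r.hasCloudSdk) score = 70                // HIGH (60-80)
  else if (r.depCount > 0) score = 40                                // MEDIUM (30-60)
  else score = 20                                                    // LOW (10-30)
  if (r.hasAgentInstructionFiles) score += 12  // prompt injection bump (10-15 pts)
  return Math.min(score, 100)
}

// Supply-chain severity scales with dependency count (3 deps = low,
// 300 = high); the 10/100 cutoffs here are illustrative assumptions.
function supplyChainSeverity(depCount: number): 'low' | 'medium' | 'high' {
  if (depCount <= 10) return 'low'
  if (depCount <= 100) return 'medium'
  return 'high'
}
```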

View File

@@ -10,7 +10,7 @@ const CACHE_TTL = 1000 * 60 * 60 * 24 // 24 hours
export async function POST(req: NextRequest) {
try {
const { owner, repo, files, stack, dependencies, sensitiveFiles, configFiles } = await req.json()
const { owner, repo, files, stack, dependencies, sensitiveFiles, configFiles, agentInstructionFiles } = await req.json()
const baseUrl = process.env.SHARED_LLM_BASE_URL
const apiKey = process.env.SHARED_LLM_API_KEY
@@ -33,6 +33,7 @@ export async function POST(req: NextRequest) {
.replace('{{dependencies}}', Object.keys(dependencies || {}).slice(0, 40).join(', ') || 'None detected')
.replace('{{sensitiveFiles}}', sensitiveFiles.join(', ') || 'None')
.replace('{{configFiles}}', configFiles.join(', ') || 'None')
.replace('{{agentInstructionFiles}}', (agentInstructionFiles || []).join(', ') || 'None')
let endpoint = baseUrl.replace(/\/+$/, '')
endpoint = endpoint.replace(/\/v1$/, '')
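Two behaviors in this route are easy to miss: `String.prototype.replace` with a string pattern substitutes only the first occurrence and treats `$`-sequences in the replacement string specially (`$&`, `` $` ``, `$'`), and the endpoint normalization deliberately strips trailing slashes before the trailing `/v1`. A sketch — the helper names here are mine, not the route's:

```typescript
// Defensive variant of the template fill above: passing a replacer
// function makes the value literal, so a dependency name containing
// '$&' or "$'" cannot expand into surrounding template text.
function fill(template: string, key: string, value: string): string {
  return template.replace(`{{${key}}}`, () => value)
}

// The route's endpoint normalization: trailing slashes first, then /v1,
// so 'https://host/v1/' and 'https://host/v1' both resolve the same way.
function normalizeEndpoint(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, '').replace(/\/v1$/, '')
}
```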

View File

@@ -3,7 +3,7 @@
import { useState, useEffect, useRef } from 'react'
import {
Shield, AlertTriangle, Globe, FolderOpen, Terminal,
ArrowLeft, Copy, Check, ArrowRight, Lock, Eye,
ArrowLeft, Copy, Check, ArrowRight, Lock, Eye, MessageSquareWarning,
} from 'lucide-react'
// --- Types ---
@@ -50,6 +50,16 @@ const CONFIG_PATTERNS = [
'netlify.toml', 'terraform/', 'k8s/', '.aws/', 'Makefile',
]
const AGENT_INSTRUCTION_PATTERNS = [
'CLAUDE.md', '.claude/', '.claude/commands/', '.claude/agents/',
'.cursorrules', '.cursorignore', '.cursor/',
'.github/copilot-instructions.md',
'AGENTS.md', '.mcp.json', 'mcp.json',
'.aider.conf.yml', '.aiderignore',
'CONVENTIONS.md', 'CONTRIBUTING.md',
'.windsurfrules', '.boltrules', '.clinerules',
]
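The pattern list mixes exact file names (`CLAUDE.md`) with directory prefixes (`.claude/`); the scan added further down in this file matches a path when either its basename equals a pattern or the full path contains one. Distilled into a self-contained predicate (using a subset of the list, for brevity):

```typescript
// Subset of AGENT_INSTRUCTION_PATTERNS above, enough for the sketch.
const PATTERNS: string[] = [
  'CLAUDE.md', '.claude/', '.cursorrules', 'AGENTS.md', '.mcp.json',
]

// Same per-file predicate the scan applies: basename equality catches
// nested copies like docs/CLAUDE.md, substring match catches anything
// under a directory pattern like .claude/commands/*.
function isAgentInstructionFile(path: string): boolean {
  const name = path.split('/').pop() || path
  return PATTERNS.some(p => name === p || path.includes(p))
}
```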
const STACK_DETECT: [string, (f: string[]) => boolean][] = [
['Node.js', f => f.some(x => x === 'package.json')],
['TypeScript', f => f.some(x => x === 'tsconfig.json' || x.endsWith('.ts') || x.endsWith('.tsx'))],
@@ -123,6 +133,7 @@ function typeIcon(t: string) {
if (t === 'network_call') return <Globe className="h-4 w-4" />
if (t === 'directory_access') return <FolderOpen className="h-4 w-4" />
if (t === 'command_execution') return <Terminal className="h-4 w-4" />
if (t === 'prompt_injection') return <MessageSquareWarning className="h-4 w-4" />
return <Eye className="h-4 w-4" />
}
@@ -238,6 +249,20 @@ export default function GamePage() {
// Config files
const configs = files.filter(f => CONFIG_PATTERNS.some(p => f.includes(p))).slice(0, 15)
// Agent instruction files (prompt injection surface)
addLine('Scanning for agent instruction files...', 'info')
await delay(200)
const agentFiles = files.filter(f => {
const name = f.split('/').pop() || f
return AGENT_INSTRUCTION_PATTERNS.some(p => name === p || f.includes(p))
})
for (const f of agentFiles.slice(0, 8)) {
addLine(f, 'warning')
await delay(100)
}
if (agentFiles.length === 0) addLine('No agent instruction files found', 'success')
await delay(200)
// Fetch dependencies
addLine('Analyzing dependencies...', 'info')
await delay(200)
@@ -276,7 +301,7 @@ export default function GamePage() {
const res = await fetch('/api/analyze', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ owner, repo, files, stack, dependencies: deps, sensitiveFiles: sensitive, configFiles: configs }),
body: JSON.stringify({ owner, repo, files, stack, dependencies: deps, sensitiveFiles: sensitive, configFiles: configs, agentInstructionFiles: agentFiles }),
})
if (!res.ok) {
@@ -362,7 +387,7 @@ export default function GamePage() {
</h1>
<p className="text-muted-foreground font-serif text-base sm:text-lg leading-relaxed mb-10 max-w-xl mx-auto">
AI agents run as you, with access to everything you have. Paste a repo URL and we&apos;ll show what an unrestricted agent would attempt on your machine when working on that codebase.
AI agents run as you, with access to everything you have. Paste a repo and see what an unrestricted agent could attempt. This is not a security audit, it&apos;s a wake-up call.
</p>
<form
@@ -569,8 +594,11 @@ export default function GamePage() {
{/* CTA */}
<div className="px-6 sm:px-8 py-6 sm:py-8 border-t border-border/20 bg-card/20 text-center">
<p className="text-xs text-muted-foreground/50 font-sans mb-4">
This is not a security certification. It&apos;s a demonstration of what&apos;s possible without a sandbox.
</p>
<p className="font-serif text-lg sm:text-xl font-semibold tracking-tight mb-2">
This is what Greywall would have blocked.
Greywall blocks this by default.
</p>
<p className="text-xs text-muted-foreground font-serif mb-5">
Container-free sandboxing with real-time observability for AI agents.