diff --git a/app/api/analyze/prompt.txt b/app/api/analyze/prompt.txt
index cfabfaf..5d999f7 100644
--- a/app/api/analyze/prompt.txt
+++ b/app/api/analyze/prompt.txt
@@ -38,6 +38,8 @@ These files are a prompt injection surface. A malicious contributor, compromised
 - Override safety behaviors the agent would normally follow
 - Instruct the agent to ignore security warnings
 
+IMPORTANT: A sandbox like Greywall does NOT prevent prompt injection. The agent will still read these files and follow the instructions. What a sandbox does is contain the blast radius: even if the agent is hijacked, it can't exfiltrate data (network blocked), can't read secrets (filesystem denied), and can't run destructive commands (command blocked). Prompt injection findings should reflect this nuance: the risk is that the agent's behavior is manipulated, and without a sandbox the manipulated agent has unrestricted access.
+
 If agent instruction files exist in this repo, this is a SIGNIFICANT finding. The more instruction files present, the larger the attack surface.
 
 SEVERITY CONTEXT:
diff --git a/app/greyscan/page.tsx b/app/greyscan/page.tsx
index 2d4c492..e2c34d7 100644
--- a/app/greyscan/page.tsx
+++ b/app/greyscan/page.tsx
@@ -584,6 +584,11 @@ export default function GamePage() {
                         {finding.severity}
+                        {finding.type === 'prompt_injection' && (
+                          <span>
+                            · sandbox limits damage
+                          </span>
+                        )}
-                    Greywall blocks this by default.
+                    {report.findings.some(f => f.type === 'prompt_injection')
+                      ? 'Greywall limits what a hijacked agent can actually do.'
+                      : 'Greywall blocks this by default.'}
-                    Container-free sandboxing with real-time observability for AI agents.
+                    {report.findings.some(f => f.type === 'prompt_injection')
+                      ? 'A sandbox can\'t prevent prompt injection, but it ensures a hijacked agent can\'t read secrets, call APIs, or exfiltrate data.'
+                      : 'Container-free sandboxing with real-time observability for AI agents.'}
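The same `report.findings.some(f => f.type === 'prompt_injection')` scan is repeated in both page.tsx hunks. A minimal sketch of how the check and the two copy variants could be hoisted into one helper — `sandboxCopy`, `Finding`, and `Report` here are illustrative names, not part of the actual codebase:

```typescript
// Hypothetical refactor sketch: derive the prompt_injection flag once and
// return both strings, instead of duplicating the .some() scan per message.
type Finding = { type: string; severity: string };
type Report = { findings: Finding[] };

function sandboxCopy(report: Report): { headline: string; detail: string } {
  const hasPromptInjection = report.findings.some(
    (f) => f.type === 'prompt_injection'
  );
  return hasPromptInjection
    ? {
        headline: 'Greywall limits what a hijacked agent can actually do.',
        detail:
          "A sandbox can't prevent prompt injection, but it ensures a hijacked agent can't read secrets, call APIs, or exfiltrate data.",
      }
    : {
        headline: 'Greywall blocks this by default.',
        detail:
          'Container-free sandboxing with real-time observability for AI agents.',
      };
}
```

This keeps the two rendered strings in lockstep: if a third finding type ever needs its own messaging, only the helper changes, not each JSX ternary.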