feat: better framing tied to prompt injection

2026-03-18 15:35:34 -04:00
parent 616b3139e0
commit bb0ea229e4
2 changed files with 13 additions and 2 deletions
--- a/app/api/analyze/prompt.txt
+++ b/app/api/analyze/prompt.txt
@@ -38,6 +38,8 @@ These files are a prompt injection surface. A malicious contributor, compromised
 - Override safety behaviors the agent would normally follow
 - Instruct the agent to ignore security warnings

+IMPORTANT: A sandbox like Greywall does NOT prevent prompt injection. The agent will still read these files and may follow the instructions in them. What a sandbox does is contain the blast radius: even if the agent is hijacked, it can't exfiltrate data (network blocked), can't read secrets (filesystem access denied), and can't run destructive commands (commands blocked). Prompt injection findings should reflect this nuance: the risk is that the agent's behavior is manipulated, and without a sandbox the manipulated agent has unrestricted access.
+
 If agent instruction files exist in this repo, this is a SIGNIFICANT finding. The more instruction files present, the larger the attack surface.

 SEVERITY CONTEXT: