feat: better framing tied to prompt injection

2026-03-18 15:35:34 -04:00
parent 616b3139e0
commit bb0ea229e4
2 changed files with 13 additions and 2 deletions
--- a/app/api/analyze/prompt.txt
+++ b/app/api/analyze/prompt.txt
@@ -38,6 +38,8 @@ These files are a prompt injection surface. A malicious contributor, compromised
 - Override safety behaviors the agent would normally follow
 - Instruct the agent to ignore security warnings

+IMPORTANT: A sandbox like Greywall does NOT prevent prompt injection. The agent will still read these files and follow the instructions. What a sandbox does is contain the blast radius: even if the agent is hijacked, it can't exfiltrate data (network blocked), can't read secrets (filesystem denied), can't run destructive commands (command blocked). Prompt injection findings should reflect this nuance: the risk is that the agent's behavior is manipulated, and without a sandbox the manipulated agent has unrestricted access.
+
 If agent instruction files exist in this repo, this is a SIGNIFICANT finding. The more instruction files present, the larger the attack surface.

 SEVERITY CONTEXT:
--- a/app/greyscan/page.tsx
+++ b/app/greyscan/page.tsx
@@ -584,6 +584,11 @@ export default function GamePage() {
                            <span className={`text-[10px] font-sans font-medium uppercase tracking-wider ${severityColor(finding.severity)}`}>
                              {finding.severity}
                            </span>
+                            {finding.type === 'prompt_injection' && (
+                              <span className="text-[10px] font-sans text-muted-foreground/50 uppercase tracking-wider">
+                                &middot; sandbox limits damage
+                              </span>
+                            )}
                          </div>
                          <h3 className="text-sm font-sans font-medium text-foreground mb-1">
                            {finding.title}
@@ -606,10 +611,14 @@ export default function GamePage() {
                    This is not a security certification. It&apos;s a demonstration of what&apos;s possible without a sandbox.
                  </p>
                  <p className="font-serif text-lg sm:text-xl font-semibold tracking-tight mb-2">
-                    Greywall blocks this by default.
+                    {report.findings.some(f => f.type === 'prompt_injection')
+                      ? 'Greywall limits what a hijacked agent can actually do.'
+                      : 'Greywall blocks this by default.'}
                  </p>
                  <p className="text-xs text-muted-foreground font-serif mb-5">
-                    Container-free sandboxing with real-time observability for AI agents.
+                    {report.findings.some(f => f.type === 'prompt_injection')
+                      ? 'A sandbox can\'t prevent prompt injection, but it ensures a hijacked agent can\'t read secrets, call APIs, or exfiltrate data.'
+                      : 'Container-free sandboxing with real-time observability for AI agents.'}
                  </p>
                  <a
                    href="https://github.com/GreyhavenHQ/greywall"
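
Review note (a possible follow-up, not part of this commit): the last hunk evaluates `report.findings.some(f => f.type === 'prompt_injection')` twice, once for the headline and once for the tagline. A minimal sketch of computing both strings from a single check is shown below. The names `ctaCopy`, `CtaCopy`, `Finding`, and `Report` are hypothetical, and the interface shapes are assumptions inferred only from the fields this diff reads (`type`, `severity`, `title`, `findings`); the real types in `page.tsx` may differ.

```typescript
// Hypothetical shapes, inferred only from the fields this diff reads.
interface Finding {
  type: string;
  severity: string;
  title: string;
}

interface Report {
  findings: Finding[];
}

interface CtaCopy {
  headline: string;
  tagline: string;
}

// Hypothetical helper: evaluates the prompt-injection check once and
// returns both strings, so the headline and tagline cannot drift apart.
function ctaCopy(report: Report): CtaCopy {
  const hasPromptInjection = report.findings.some(
    (f) => f.type === "prompt_injection",
  );
  return hasPromptInjection
    ? {
        headline: "Greywall limits what a hijacked agent can actually do.",
        tagline:
          "A sandbox can't prevent prompt injection, but it ensures a hijacked agent can't read secrets, call APIs, or exfiltrate data.",
      }
    : {
        headline: "Greywall blocks this by default.",
        tagline:
          "Container-free sandboxing with real-time observability for AI agents.",
      };
}
```

In the component, the two `<p>` elements could then render `{copy.headline}` and `{copy.tagline}` from `const copy = ctaCopy(report)`. Using double-quoted strings also avoids the `\'` escapes in the current tagline.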