feat: better framing tied to prompt injection
@@ -38,6 +38,8 @@ These files are a prompt injection surface. A malicious contributor, compromised
 - Override safety behaviors the agent would normally follow
 - Instruct the agent to ignore security warnings
 
+IMPORTANT: A sandbox like Greywall does NOT prevent prompt injection. The agent will still read these files and follow the instructions. What a sandbox does is contain the blast radius: even if the agent is hijacked, it can't exfiltrate data (network blocked), can't read secrets (filesystem denied), can't run destructive commands (command blocked). Prompt injection findings should reflect this nuance: the risk is that the agent's behavior is manipulated, and without a sandbox the manipulated agent has unrestricted access.
+
 If agent instruction files exist in this repo, this is a SIGNIFICANT finding. The more instruction files present, the larger the attack surface.
 
 SEVERITY CONTEXT:
@@ -584,6 +584,11 @@ export default function GamePage() {
 <span className={`text-[10px] font-sans font-medium uppercase tracking-wider ${severityColor(finding.severity)}`}>
 {finding.severity}
 </span>
+{finding.type === 'prompt_injection' && (
+<span className="text-[10px] font-sans text-muted-foreground/50 uppercase tracking-wider">
+· sandbox limits damage
+</span>
+)}
 </div>
 <h3 className="text-sm font-sans font-medium text-foreground mb-1">
 {finding.title}
@@ -606,10 +611,14 @@ export default function GamePage() {
 This is not a security certification. It's a demonstration of what's possible without a sandbox.
 </p>
 <p className="font-serif text-lg sm:text-xl font-semibold tracking-tight mb-2">
-Greywall blocks this by default.
+{report.findings.some(f => f.type === 'prompt_injection')
+? 'Greywall limits what a hijacked agent can actually do.'
+: 'Greywall blocks this by default.'}
 </p>
 <p className="text-xs text-muted-foreground font-serif mb-5">
-Container-free sandboxing with real-time observability for AI agents.
+{report.findings.some(f => f.type === 'prompt_injection')
+? 'A sandbox can\'t prevent prompt injection, but it ensures a hijacked agent can\'t read secrets, call APIs, or exfiltrate data.'
+: 'Container-free sandboxing with real-time observability for AI agents.'}
 </p>
 <a
 href="https://github.com/GreyhavenHQ/greywall"
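
Review note: the last hunk evaluates the same `report.findings.some(...)` check in two separate JSX expressions. One possible way to compute the copy once is sketched below. This is a minimal sketch, not code from the commit: the `Finding` and `CtaCopy` interfaces and the `hasPromptInjection` and `ctaCopy` helpers are hypothetical names, and only the `type`, `severity`, and `title` fields are taken from the diff.

```typescript
// Hypothetical shape: only `type`, `severity`, and `title` appear in the diff.
interface Finding {
  type: string;
  severity: string;
  title: string;
}

// Hypothetical container for the two strings rendered in the call-to-action.
interface CtaCopy {
  headline: string;
  body: string;
}

// True when at least one finding is a prompt injection finding.
function hasPromptInjection(findings: Finding[]): boolean {
  return findings.some((f) => f.type === 'prompt_injection');
}

// Picks the call-to-action copy once, using the strings from the diff.
function ctaCopy(findings: Finding[]): CtaCopy {
  if (hasPromptInjection(findings)) {
    return {
      headline: 'Greywall limits what a hijacked agent can actually do.',
      body:
        "A sandbox can't prevent prompt injection, but it ensures a hijacked agent " +
        "can't read secrets, call APIs, or exfiltrate data.",
    };
  }
  return {
    headline: 'Greywall blocks this by default.',
    body: 'Container-free sandboxing with real-time observability for AI agents.',
  };
}
```

In the component, the two `<p>` elements could then render `copy.headline` and `copy.body` from a single `const copy = ctaCopy(report.findings)`, assuming `report.findings` is an array of objects with a `type` field as the diff suggests.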