This repository has been archived on 2026-03-13. You can view files and clone it. You cannot open issues or pull requests or push a commit.

Mathieu Virbel 4ea4592d75 docs: add macOS learning mode analysis with fs_usage approach

Document fs_usage as a viable alternative to strace for macOS
--learning mode. SIP blocks all dtrace-based tools (dtrace, dtruss,
opensnoop) even with sudo, but fs_usage uses the kdebug kernel
facility which is unaffected. Requires admin access only for the
passive monitor process — the sandboxed command stays unprivileged.

2026-02-22 19:07:30 -06:00

Greywall Sandboxing Architecture — Deep Analysis

Overview

Greywall is a sandboxing layer that wraps commands in restrictive environments. It blocks network access by default (allowlist-based), restricts filesystem operations, and controls command execution. It supports Linux (bubblewrap + seccomp + Landlock + eBPF) and macOS (sandbox-exec / Seatbelt SBPL profiles).

Part 1: How Components Work Together (Linux)

The Problem Each Component Solves

There are five distinct security concerns. No single Linux technology can address all of them, which is why Greywall composes multiple layers:

Security Concern	Technology	Why This One?
Process/network isolation	Bubblewrap (namespaces)	Only namespaces can create a truly isolated network stack and PID space
Filesystem visibility	Bubblewrap (mount namespace)	Only mount namespaces can make files literally invisible (not mounted = doesn't exist)
Filesystem access rights	Landlock (LSM)	Only a kernel LSM can enforce access rights that survive mount misconfiguration
Dangerous syscall blocking	Seccomp BPF	Only seccomp can block specific system calls (ptrace, mount, reboot)
Violation visibility	eBPF (bpftrace)	Only kernel tracing can observe denied operations across all layers

Why No Single Layer Is Sufficient

Why can't Bubblewrap do everything? Bubblewrap controls what's visible in the filesystem (mount-time). But once a file IS mounted, bwrap has no say in what operations are performed on it. A read-only bind mount prevents writes, but bwrap cannot block ptrace, mount, or reboot syscalls — those aren't filesystem operations. And if a mount is misconfigured (edge case with symlinks, race conditions), bwrap alone provides no fallback.

Why can't Seccomp do everything? Seccomp filters syscalls by number and (optionally) by raw argument values. It can block ptrace (syscall 101 on x86_64) or mount (165), but it cannot make path-based decisions. For openat(dirfd, pathname, flags), the filter sees only register values: pathname is a user-space pointer that seccomp BPF cannot dereference, so the filter never sees the path string, let alone whether it resolves to /home/user/.ssh/id_rsa or /tmp/safe.txt. Seccomp is blind to filesystem semantics.

Why can't Landlock do everything? As used here, Landlock is a filesystem LSM. It controls READ_FILE, WRITE_FILE, EXECUTE, MAKE_DIR, etc. It has no control over process tracing (ptrace), kernel module loading (init_module), or system control (reboot). Its network support is limited to TCP bind/connect port rules added in ABI v4 (Linux 6.7), which cannot replace an isolated network stack. Landlock also cannot isolate namespaces: it restricts access within the current namespace rather than creating new ones.

Why can't eBPF do everything? eBPF (as used here via bpftrace) is observation-only. It attaches to tracepoints at syscall exit and reads return values. It cannot block, modify, or deny any operation. Its purpose is to report violations after they've been caught by other layers. (eBPF can enforce policy via LSM hooks in newer kernels, but Greywall uses bpftrace for monitoring, not enforcement.)

Component Interaction Diagram

                    ┌─────────────────────────────────────────────────────────┐
                    │                    THREAT                               │
                    │  Sandboxed process attempts forbidden action            │
                    └──────────────┬──────────────────────────────────────────┘
                                   │
        ┌──────────────────────────┼──────────────────────────────┐
        │                          │                              │
        ▼                          ▼                              ▼
  ┌───────────┐          ┌──────────────┐              ┌──────────────────┐
  │ Filesystem │          │   Syscall    │              │    Network       │
  │  access    │          │   (ptrace,   │              │    connection    │
  │            │          │   mount...)  │              │                  │
  └─────┬─────┘          └──────┬───────┘              └────────┬─────────┘
        │                       │                               │
        ▼                       ▼                               ▼
  ┌───────────┐          ┌──────────────┐              ┌──────────────────┐
  │ Bubblewrap │          │   Seccomp    │              │   Bubblewrap     │
  │ Mount NS   │          │   BPF        │              │   Network NS     │
  │            │          │              │              │   (--unshare-net)│
  │ File not   │          │ Syscall #    │              │                  │
  │ mounted?   │          │ in blocklist?│              │ Isolated stack,  │
  │ → ENOENT   │          │ → EPERM      │              │ no host network  │
  └─────┬─────┘          └──────┬───────┘              └────────┬─────────┘
        │ (file IS mounted)     │                               │
        ▼                       │                               ▼
  ┌───────────┐                 │                      ┌──────────────────┐
  │ Landlock   │                 │                      │ tun2socks        │
  │ (kernel)   │                 │                      │ TUN device       │
  │            │                 │                      │                  │
  │ Has access │                 │                      │ All traffic →    │
  │ right?     │                 │                      │ SOCKS5 proxy     │
  │ → EACCES   │                 │                      │ (allowlist)      │
  └─────┬─────┘                 │                      └────────┬─────────┘
        │                       │                               │
        └───────────┬───────────┘───────────────────────────────┘
                    │
                    ▼
            ┌──────────────┐
            │   eBPF       │
            │   (bpftrace) │
            │              │
            │  Observes    │
            │  EACCES,     │
            │  EPERM,      │
            │  EROFS       │
            │  returns     │
            │              │
            │  → Logs to   │
            │    stderr    │
            └──────────────┘

Execution Timeline

This is the exact order of operations when greywall -- <command> runs:

PHASE 1: PRE-SANDBOX (on host)
─────────────────────────────────────────────────────────────────
1. Command blocking engine checks command against deny/allow lists
2. Environment sanitization strips LD_PRELOAD, LD_LIBRARY_PATH, etc.
3. ProxyBridge starts: socat creates Unix socket → external SOCKS5
4. DnsBridge starts: socat creates Unix socket → host DNS server
5. ReverseBridge starts: socat listens on exposed ports → Unix sockets
6. Seccomp BPF filter generated and written to temp file
7. Bubblewrap arguments assembled (mounts, namespaces, capabilities)
8. Inner bash script generated (network setup + command execution)

PHASE 2: SANDBOX CREATION (bwrap)
─────────────────────────────────────────────────────────────────
9.  bwrap creates new PID namespace (--unshare-pid)
10. bwrap creates new network namespace (--unshare-net)
11. bwrap sets up filesystem:
    - --tmpfs / (empty root) OR --ro-bind / / (read-only root)
    - System paths mounted read-only (/usr, /bin, /lib, /etc...)
    - CWD mounted read-write
    - /dev mounted with --dev-bind
    - /proc mounted fresh
    - /tmp as tmpfs
    - .env files masked with /dev/null bind mount
    - Protected files forced read-only
    - Unix sockets from bridges bind-mounted in
12. bwrap loads seccomp BPF filter (via fd 3 → --seccomp 3)
13. bwrap drops capabilities (except CAP_NET_ADMIN if proxy enabled)
14. bwrap executes inner bash script

PHASE 3: INNER SCRIPT (inside sandbox)
─────────────────────────────────────────────────────────────────
15. Script brings up loopback: ip link set lo up
16. Script creates TUN device: ip tuntap add dev tun0 mode tun
17. Script configures routing: ip route add default via 198.18.0.1
18. Script starts socat (localhost:18321 → proxy Unix socket)
19. Script starts tun2socks (TUN → SOCKS5 on localhost:18321)
20. Script configures DNS (socat relay or resolv.conf)
21. Script starts reverse bridge listeners (socat for each port)
22. Script waits 0.3s for services to initialize

PHASE 4: LANDLOCK APPLICATION (inside sandbox)
─────────────────────────────────────────────────────────────────
23. greywall re-executes: greywall --landlock-apply -- bash -c "<cmd>"
24. Reads config from GREYWALL_CONFIG_JSON env var
25. Sets PR_SET_NO_NEW_PRIVS (required for Landlock)
26. Creates Landlock ruleset (filesystem access rights bitmask)
27. Adds PATH_BENEATH rules for each allowed path
28. Enforces the ruleset with landlock_restrict_self() (irrevocable)
29. syscall.Exec() replaces process with user command

PHASE 5: COMMAND EXECUTION (fully sandboxed)
─────────────────────────────────────────────────────────────────
30. User command runs with ALL layers active simultaneously:
    - Network: isolated namespace, traffic through TUN → proxy
    - Filesystem: bwrap mounts (visibility) + Landlock (access rights)
    - Syscalls: seccomp BPF blocking dangerous calls
    - Environment: sanitized (no LD_PRELOAD etc.)

PHASE 6: MONITORING (parallel, on host)
─────────────────────────────────────────────────────────────────
31. eBPF monitor started AFTER sandbox process begins
32. bpftrace attaches to syscall exit tracepoints
33. Filters for EACCES/EPERM/EROFS returns from sandbox PIDs
34. Logs violations to stderr in real-time

PHASE 7: CLEANUP (on host, after command exits)
─────────────────────────────────────────────────────────────────
35. eBPF monitor stopped (bpftrace killed)
36. ReverseBridge stopped (socat killed, sockets removed)
37. DnsBridge stopped (socat killed, socket removed)
38. ProxyBridge stopped (socat killed, socket removed)
39. tun2socks binary removed from /tmp
40. Seccomp filter file removed
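
Step 7 and Phase 2 of the timeline can be sketched as a function that assembles the bubblewrap argument list. This is a minimal illustration, not Greywall's actual code: the `SandboxSpec` type and `BuildBwrapArgs` name are hypothetical, and only the tmpfs-root variant is shown. The flags themselves (`--unshare-pid`, `--unshare-net`, `--tmpfs`, `--ro-bind`, `--bind`, `--dev-bind`, `--proc`, `--seccomp`) are standard bwrap options.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// SandboxSpec is a hypothetical, simplified description of what to mount.
type SandboxSpec struct {
	Cwd         string   // working directory, mounted read-write
	ReadOnlySys []string // system paths mounted read-only
	MaskedFiles []string // files under Cwd hidden behind /dev/null (e.g. .env)
	SeccompFD   int      // inherited fd carrying the compiled seccomp filter
}

// BuildBwrapArgs returns bwrap arguments in mount order: empty root first,
// then system paths, then the working directory, then masks on top of it.
func BuildBwrapArgs(spec SandboxSpec) []string {
	args := []string{"--unshare-pid", "--unshare-net", "--tmpfs", "/"}
	for _, p := range spec.ReadOnlySys {
		args = append(args, "--ro-bind", p, p)
	}
	args = append(args,
		"--dev-bind", "/dev", "/dev",
		"--proc", "/proc",
		"--tmpfs", "/tmp",
		"--bind", spec.Cwd, spec.Cwd,
	)
	// Masks must come after the Cwd bind so they overlay the real files.
	for _, f := range spec.MaskedFiles {
		args = append(args, "--ro-bind", "/dev/null", filepath.Join(spec.Cwd, f))
	}
	return append(args, "--seccomp", strconv.Itoa(spec.SeccompFD))
}

func main() {
	spec := SandboxSpec{
		Cwd:         "/home/user/project",
		ReadOnlySys: []string{"/usr", "/bin", "/lib", "/etc"},
		MaskedFiles: []string{".env"},
		SeccompFD:   3,
	}
	fmt.Println(BuildBwrapArgs(spec))
}
```

Ordering matters here because bwrap applies mount operations in the order given, which is one source of the "mount order" misconfigurations that Landlock is meant to backstop.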

What Catches What — Attack Scenarios

Attack	1st defense	2nd defense	3rd defense	Reported by
Read `~/.ssh/id_rsa`	bwrap: file not mounted → `ENOENT`	Landlock: no `READ_FILE` right → `EACCES`	—	eBPF: logs `EACCES`
Write to `.env`	bwrap: masked with `/dev/null` bind → writes go nowhere	Landlock: no `WRITE_FILE` right → `EACCES`	—	eBPF: logs `EACCES`
`curl evil.com`	bwrap: `--unshare-net` → no host network	tun2socks: routes through proxy → proxy denies	—	eBPF: logs `ECONNREFUSED`
`ptrace(pid)`	seccomp: syscall 101 blocked → `EPERM`	—	—	eBPF: logs `EPERM`
`mount /dev/sda /mnt`	seccomp: syscall 165 blocked → `EPERM`	—	—	eBPF: logs `EPERM`
`reboot` (inside sandbox)	seccomp: syscall 169 blocked → `EPERM`	—	—	eBPF: logs `EPERM`
`LD_PRELOAD=evil.so cmd`	Env sanitization: `LD_PRELOAD` stripped before sandbox starts	—	—	—
`greywall -- "git push"`	Command blocker: denied before sandbox created (caller-side only)	—	—	—
`git push` (inside sandbox)	Network namespace: no outbound connectivity	tun2socks: proxy denies	—	eBPF: logs `ECONNREFUSED`
Move `.bashrc` elsewhere	bwrap: `.bashrc` mounted read-only → `EROFS`	Landlock: no `REMOVE_FILE` right	—	eBPF: logs `EROFS`
Create file in `/etc`	bwrap: `/etc` mounted `--ro-bind` → `EROFS`	Landlock: no `MAKE_REG` right → `EACCES`	—	eBPF: logs `EROFS`
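
The table above implies a rough mapping from the errno a sandboxed process receives to the layer that most plausibly produced it. The sketch below encodes that mapping as a heuristic. `LikelyLayer` is a hypothetical helper, not part of Greywall, and the mapping is only a hint: ordinary Unix permissions also return `EACCES` and `EPERM`.

```go
package main

import (
	"fmt"
	"syscall"
)

// LikelyLayer guesses which sandbox layer produced an errno, following the
// attack table. It is a debugging hint, not a definitive attribution.
func LikelyLayer(errno syscall.Errno) string {
	switch errno {
	case syscall.ENOENT:
		return "bwrap mount namespace (path not mounted)"
	case syscall.EROFS:
		return "bwrap read-only bind mount"
	case syscall.EACCES:
		return "Landlock"
	case syscall.EPERM:
		return "seccomp"
	case syscall.ECONNREFUSED:
		return "network namespace / proxy"
	default:
		return "unknown"
	}
}

func main() {
	for _, e := range []syscall.Errno{syscall.ENOENT, syscall.EACCES, syscall.EPERM, syscall.EROFS} {
		fmt.Printf("%-6d %s\n", int(e), LikelyLayer(e))
	}
}
```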

The Layering Principle

┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  LAYER 0: CALLER-SIDE PRE-FILTER (before sandbox)            │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Command blocking: rejects the TOP-LEVEL command        │  │
│  │   string passed to greywall by the caller (e.g., an   │  │
│  │   AI agent framework). Does NOT intercept commands     │  │
│  │   executed inside the sandbox by child processes.      │  │
│  │ Env sanitization: strip LD_PRELOAD, DYLD_* etc.       │  │
│  │   from the environment before launching the sandbox.   │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  LAYER 1: ISOLATION (bwrap namespaces)                       │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ What it does: Creates a separate world                 │  │
│  │ Filesystem: Only mounted files are visible             │  │
│  │ Network: Separate network stack, no host access        │  │
│  │ PID: Can't see/signal host processes                   │  │
│  │                                                        │  │
│  │ Analogy: Putting process in a room with only           │  │
│  │ selected items. Items not in the room don't exist.     │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  LAYER 2: ENFORCEMENT (seccomp + Landlock)                   │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Seccomp: "You cannot USE these capabilities"           │  │
│  │  → Blocks ptrace, mount, reboot, kernel module load    │  │
│  │  → Operates on syscall numbers (capability-based)      │  │
│  │  → Cannot make path-based decisions                    │  │
│  │                                                        │  │
│  │ Landlock: "You cannot ACCESS these paths/operations"   │  │
│  │  → Controls read/write/execute/create/delete per path  │  │
│  │  → Operates on filesystem paths (resource-based)       │  │
│  │  → Cannot block non-filesystem syscalls                │  │
│  │                                                        │  │
│  │ Together: Seccomp blocks dangerous capabilities,       │  │
│  │ Landlock restricts resource access. Orthogonal.        │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  LAYER 3: NETWORK CONTROL (tun2socks + bridges)              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ What it does: Routes ALL traffic through SOCKS5 proxy  │  │
│  │ TUN device captures every packet (transparent)         │  │
│  │ Proxy applies allowlist (domain/IP filtering)          │  │
│  │ DNS either bridged to host or forced through proxy     │  │
│  │                                                        │  │
│  │ Why not just bwrap --unshare-net?                      │  │
│  │ → That blocks ALL network. We need selective access.   │  │
│  │ → tun2socks re-enables controlled network via proxy.   │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  LAYER 4: OBSERVATION (eBPF)                                 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ What it does: Watches for denied operations            │  │
│  │ Doesn't block anything — reports what was blocked      │  │
│  │ Catches EACCES/EPERM/EROFS from ANY layer              │  │
│  │                                                        │  │
│  │ Why needed: Without monitoring, violations are silent. │  │
│  │ eBPF tells you WHAT was blocked and WHY.               │  │
│  │ Essential for debugging sandbox configs.               │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
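
The Layer 0 environment sanitization can be illustrated with a small filter over `KEY=VALUE` pairs. This is a sketch under assumptions: the blocked names below are only the ones this document mentions (`LD_PRELOAD`, `LD_LIBRARY_PATH`, `DYLD_*`), the real list is likely longer, and `SanitizeEnv` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// Names and prefixes to strip. Assumed subset; the document only says
// "LD_PRELOAD, LD_LIBRARY_PATH, etc." and "DYLD_*".
var (
	blockedNames    = map[string]bool{"LD_PRELOAD": true, "LD_LIBRARY_PATH": true}
	blockedPrefixes = []string{"DYLD_"}
)

// SanitizeEnv returns env without loader-injection variables.
func SanitizeEnv(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		name, _, _ := strings.Cut(kv, "=")
		if blockedNames[name] || hasBlockedPrefix(name) {
			continue
		}
		out = append(out, kv)
	}
	return out
}

func hasBlockedPrefix(name string) bool {
	for _, p := range blockedPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	env := []string{"PATH=/usr/bin", "LD_PRELOAD=evil.so", "DYLD_INSERT_LIBRARIES=x.dylib", "HOME=/home/user"}
	fmt.Println(SanitizeEnv(env))
}
```

Because this runs on the host before the sandbox starts, it only protects the top-level launch. A process inside the sandbox can still set these variables for its own children.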

Seccomp vs Landlock — Why Both?

These two enforcement layers are orthogonal — they protect against completely different threat classes:

                    Seccomp BPF                 Landlock
                    ───────────                 ────────
Domain:             Syscall numbers             Filesystem paths
Question asked:     "Is this syscall allowed?"  "Can this path be accessed?"
Granularity:        Per-syscall                 Per-path + per-operation
Can block ptrace:   YES                         NO (not a filesystem op)
Can block mount:    YES                         NO (not a filesystem op)
Can block reboot:   YES                         NO
Can block read():   NO (too common)             YES (per-path)
Can block write():  NO (too common)             YES (per-path)
Can block mkdir():  NO (too common)             YES (per-path)
Can block rename:   NO                          YES (REFER right, ABI v2+)
Runs at:            Syscall entry               VFS operation
Fallback:           Returns EPERM               Returns EACCES

Seccomp answers: "Should this process be able to call ptrace() at all?" Landlock answers: "Should this process be able to read /home/user/.ssh/id_rsa?"

They have zero overlap in what they protect. Using only one leaves an entire class of attacks unaddressed.
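
To make "operates on syscall numbers" concrete, here is a minimal sketch of a deny-list seccomp filter expressed as classic BPF instructions, plus a tiny interpreter for checking it. It is not Greywall's generator: `BuildDenyFilter` and `Evaluate` are hypothetical names, the filter is x86_64-only, and nothing here installs it in the kernel. The opcode and return-value constants are the standard Linux ones.

```go
package main

import "fmt"

// sockFilter mirrors the kernel's struct sock_filter.
type sockFilter struct {
	Code   uint16
	Jt, Jf uint8
	K      uint32
}

const (
	opLoadWordAbs = 0x20 // BPF_LD | BPF_W | BPF_ABS
	opJumpEqual   = 0x15 // BPF_JMP | BPF_JEQ | BPF_K
	opReturn      = 0x06 // BPF_RET | BPF_K

	retAllow = 0x7fff0000 // SECCOMP_RET_ALLOW
	retErrno = 0x00050000 // SECCOMP_RET_ERRNO
	retKill  = 0x80000000 // SECCOMP_RET_KILL_PROCESS
	ePerm    = 1

	offNr          = 0          // offsetof(struct seccomp_data, nr)
	offArch        = 4          // offsetof(struct seccomp_data, arch)
	auditArchX8664 = 0xc000003e // AUDIT_ARCH_X86_64
)

// BuildDenyFilter returns EPERM for each blocked syscall number and allows
// everything else. Note that it only ever compares integers.
func BuildDenyFilter(blocked []uint32) []sockFilter {
	prog := []sockFilter{
		{opLoadWordAbs, 0, 0, offArch},
		{opJumpEqual, 1, 0, auditArchX8664}, // arch matches: skip the kill
		{opReturn, 0, 0, retKill},
		{opLoadWordAbs, 0, 0, offNr},
	}
	for _, nr := range blocked {
		prog = append(prog,
			sockFilter{opJumpEqual, 0, 1, nr}, // no match: skip the deny
			sockFilter{opReturn, 0, 0, retErrno | ePerm},
		)
	}
	return append(prog, sockFilter{opReturn, 0, 0, retAllow})
}

// Evaluate interprets the filter for one (arch, nr) pair. It supports only
// the three opcodes emitted above.
func Evaluate(prog []sockFilter, arch, nr uint32) uint32 {
	var acc uint32
	for pc := 0; pc < len(prog); pc++ {
		switch ins := prog[pc]; ins.Code {
		case opLoadWordAbs:
			if ins.K == offArch {
				acc = arch
			} else {
				acc = nr
			}
		case opJumpEqual:
			if acc == ins.K {
				pc += int(ins.Jt)
			} else {
				pc += int(ins.Jf)
			}
		case opReturn:
			return ins.K
		}
	}
	return retKill
}

func main() {
	prog := BuildDenyFilter([]uint32{101, 165, 169}) // ptrace, mount, reboot
	fmt.Printf("ptrace: %#x\n", Evaluate(prog, auditArchX8664, 101))
	fmt.Printf("read:   %#x\n", Evaluate(prog, auditArchX8664, 0))
}
```

The filter has no instruction that could inspect a path, which is the structural reason a path-aware layer such as Landlock is needed alongside it.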

Bubblewrap vs Landlock — Why Both?

These two have overlapping filesystem protection, deliberately:

                    Bubblewrap mounts            Landlock
                    ─────────────────            ────────
Mechanism:          Mount-time visibility        Runtime access control
When applied:       Before process starts        After process starts
Failure mode:       File doesn't exist           File exists but EACCES
Granularity:        Whole directories             Individual files + operations
Symlink handling:   Resolves before mounting     Kernel handles
Race conditions:    Possible (mount vs access)   None (kernel-enforced)
Edge cases:         /etc/resolv.conf symlinks,   None known
                    cross-mount boundaries

Why Landlock exists as a backup to bwrap:

Mount misconfiguration: If bwrap accidentally makes a file visible (symlink edge case, mount order issue), Landlock still denies access
Operation granularity: bwrap can make a file visible but read-only. Landlock can additionally block EXECUTE, REMOVE, TRUNCATE separately
Defense in depth: Two independent mechanisms with different failure modes. Both must fail for access to be granted

Real example from the code: .env files are masked with --ro-bind /dev/null .env in bwrap. But the Landlock layer ALSO denies READ_FILE on those paths. If the bwrap mask somehow fails (race condition, mount order), Landlock catches it.
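
The "operation granularity" point can be illustrated with Landlock's access-right bitmask. The bit values below follow `<linux/landlock.h>`; the Go names, `HandledAccess`, and `Allowed` are hypothetical helpers for illustration and do not call the Landlock syscalls.

```go
package main

import "fmt"

// Landlock filesystem access bits, in the order of <linux/landlock.h>.
const (
	AccessExecute uint64 = 1 << iota
	AccessWriteFile
	AccessReadFile
	AccessReadDir
	AccessRemoveDir
	AccessRemoveFile
	AccessMakeChar
	AccessMakeDir
	AccessMakeReg
	AccessMakeSock
	AccessMakeFifo
	AccessMakeBlock
	AccessMakeSym
	AccessRefer    // added in ABI v2
	AccessTruncate // added in ABI v3
)

// ReadOnlyRights is what a read-only path rule might grant.
const ReadOnlyRights = AccessExecute | AccessReadFile | AccessReadDir

// HandledAccess returns the rights a ruleset can handle for a given ABI.
// Every handled right is denied unless a path rule grants it.
func HandledAccess(abi int) uint64 {
	mask := (AccessMakeSym << 1) - 1 // all ABI v1 rights
	if abi >= 2 {
		mask |= AccessRefer
	}
	if abi >= 3 {
		mask |= AccessTruncate
	}
	return mask
}

// Allowed reports whether every requested right is in the granted set.
func Allowed(granted, requested uint64) bool {
	return requested&^granted == 0
}

func main() {
	fmt.Printf("handled (ABI v3): %#x\n", HandledAccess(3))
	fmt.Println("read on read-only path: ", Allowed(ReadOnlyRights, AccessReadFile))
	fmt.Println("write on read-only path:", Allowed(ReadOnlyRights, AccessWriteFile))
	fmt.Println("read on path without a rule:", Allowed(0, AccessReadFile))
}
```

A path with no rule, such as a masked `.env` file, has an empty granted set, so every handled operation on it is denied regardless of what the mount layer exposes.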

Part 2: macOS Architecture

What macOS Has (Single-Layer Model)

macOS uses Apple's Seatbelt (sandbox-exec) — a single, comprehensive policy engine:

┌──────────────────────────────────────────────────────────────┐
│              User Command (sandboxed)                         │
├──────────────────────────────────────────────────────────────┤
│  Seatbelt SBPL profile (sandbox-exec)                        │
│                                                              │
│   ┌─ Filesystem rules ──────────────────────────────────┐    │
│   │  file-read-data, file-write-data, file-write-unlink │    │
│   │  (deny default) + explicit allows per path          │    │
│   └─────────────────────────────────────────────────────┘    │
│   ┌─ Network rules ─────────────────────────────────────┐    │
│   │  network-outbound, network-inbound, network-bind    │    │
│   │  IP:port based filtering                            │    │
│   └─────────────────────────────────────────────────────┘    │
│   ┌─ IPC rules ─────────────────────────────────────────┐    │
│   │  mach-lookup (XPC service allowlist)                │    │
│   │  ~20 essential macOS services whitelisted           │    │
│   └─────────────────────────────────────────────────────┘    │
│   ┌─ Hardware rules ────────────────────────────────────┐    │
│   │  iokit-open, iokit-get-properties                   │    │
│   │  sysctl-read, sysctl-write                          │    │
│   └─────────────────────────────────────────────────────┘    │
│   ┌─ Log tagging ───────────────────────────────────────┐    │
│   │  CMD64_<base64>_END_<session> in deny messages      │    │
│   │  Violations go to the unified log                   │    │
│   └─────────────────────────────────────────────────────┘    │
│                                                              │
├──────────────────────────────────────────────────────────────┤
│  Environment sanitization (DYLD_* stripped)                   │
├──────────────────────────────────────────────────────────────┤
│  Command blocking engine (shared with Linux)                 │
├──────────────────────────────────────────────────────────────┤
│  Host macOS kernel (MACF hooks enforce Seatbelt)             │
└──────────────────────────────────────────────────────────────┘

Seatbelt is enforced by MACF (Mandatory Access Control Framework) kernel hooks — the same infrastructure that enforces SIP. In some ways, it's architecturally similar to Linux's Landlock (kernel-level, path-based), but it covers more domains (network, IPC, hardware) in a single profile rather than requiring separate technologies.

What Seatbelt Already Covers (vs Linux Equivalent)

Linux component	macOS Seatbelt equivalent	Coverage
Landlock filesystem control	`file-read-data`, `file-write-data`, `file-write-unlink`	Equivalent
Seccomp (partially)	`process-exec`, `process-fork`, `signal`, `mach-lookup`	Partial — blocks at operation level, not raw syscall level
Network namespace rules	`network-outbound`, `network-inbound`, `network-bind`	Similar — IP/port filtering, but no true isolation
No equivalent on Linux	`mach-lookup` (XPC service control)	macOS-only — controls which system services are accessible
No equivalent on Linux	`iokit-open`, `sysctl-read/write`	macOS-only — hardware and kernel parameter access

Key insight: Seatbelt is not "weaker" than any single Linux component — it's a different architecture. It's one comprehensive policy engine instead of multiple composable ones. The trade-off is: less defense-in-depth (single point of failure) but broader coverage per layer (filesystem + network + IPC + hardware in one profile).

Part 3: Security Level Comparison

Per-Threat Security Rating

Threat	Linux	macOS	Gap
Read unauthorized files	●●●●●	●●●○○	Linux: 2 layers (bwrap+Landlock). macOS: 1 layer (Seatbelt). No backup if Seatbelt misconfigured
Write to protected files	●●●●●	●●●●○	Linux: 2 layers + .env masking. macOS: Seatbelt + unlink blocking. Close, but single layer
Exfiltrate data via network	●●●●●	●●○○○	Linux: namespace isolation + transparent proxy (catches ALL traffic). macOS: env var proxy (apps can ignore it)
DNS exfiltration	●●●●○	●○○○○	Linux: DNS bridge with filtering. macOS: no DNS control at all
Dangerous syscalls (ptrace, mount)	●●●●●	●●●○○	Linux: seccomp blocks 24 syscalls. macOS: SIP blocks some, Seatbelt blocks some process ops, but no explicit syscall filtering
Library injection (LD_PRELOAD)	●●●●●	●●●●●	Both strip dangerous env vars. macOS additionally has SIP preventing DYLD injection on system binaries
Process visibility	●●●●●	●○○○○	Linux: PID namespace. macOS: no PID isolation (can see all processes)
Command execution control	●●●●●	●●●●●	Same engine on both platforms
SSH command control	●●●●●	●●●●●	Same engine on both platforms
Protected file awareness (.env, .gitconfig)	●●●●●	●●●●●	Same lists, same protection on both
Violation detection/reporting	●●●●○	●●○○○	Linux: eBPF real-time monitoring. macOS: violations go to the unified log but are not captured programmatically
Config auto-generation (learning)	●●●●●	○○○○○	Linux: strace-based learning. macOS: not implemented. Two viable approaches: (1) Seatbelt `(allow (with report))` + `log stream` (no root), (2) `sudo fs_usage` (requires admin, more reliable — not dtrace-based, unaffected by SIP)
IPC control	●●○○○	●●●●●	macOS: explicit Mach IPC allowlist. Linux: no equivalent (D-Bus not filtered)
Sandbox escape via child process	●●●●●	●●●○○	Linux: namespace inherits to all children. macOS: sandbox-exec profile inherits, but no namespace boundary

Overall Security Posture

Linux:   ████████████████████░░  ~90% (multi-layer, defense in depth)
macOS:   ████████████░░░░░░░░░░  ~60% (single comprehensive layer, gaps in network/monitoring/learning)

The 30% gap breaks down as:

~10% Network control (no transparent proxy, no DNS filtering)
~8% No learning mode (can't auto-generate configs)
~5% No violation monitoring (can't see what's being blocked)
~4% No PID isolation (can see host processes)
~3% No defense-in-depth for filesystem (single Seatbelt layer)

Part 4: Closing the macOS Gap (Constraint: No Root, No Containers)

Hard Constraints

These are non-negotiable requirements for any macOS solution:

No root/sudo/admin access — must run as a regular user
No containerization — process runs on the host (macOS 26 Linux containers are out of scope)
SIP enabled — default macOS, cannot ask users to disable it
All traffic must be captured — same fail-closed behavior as Linux

Tool-by-Tool Privilege Audit

Every macOS tracing/filtering tool was evaluated against the "no root" constraint:

Tool	Requires root?	Requires admin?	Works as user?	Verdict
`sandbox-exec` (Seatbelt)	No	No	Yes	Current approach, works
`log stream` (violation monitoring)	No	No	Yes	Viable for monitoring
`fs_usage` (filesystem tracing)	Yes	—	No	Blocked without sudo. Viable for learning mode with admin access — not dtrace-based, unaffected by SIP. See "macOS Learning Mode via fs_usage" section.
`eslogger` (Endpoint Security)	Yes + Full Disk Access	—	No	Blocked
`dtrace` / `dtruss` / `opensnoop`	Yes + SIP blocks entirely	—	No	Blocked even with sudo — SIP disables the `syscall` dtrace provider system-wide, not just for SIP-protected binaries. All three tools are dtrace-based and fail identically.
`pfctl` / pf rules	Yes	—	No	Blocked
Network Extensions	No	Yes (system dialog)	No	Blocked
Endpoint Security framework	Yes + Apple-restricted entitlement	—	No	Blocked
TUN/TAP devices	Yes	—	No	Blocked
`DYLD_INSERT_LIBRARIES`	No	No	Partial	Works for non-hardened binaries only

Only three mechanisms work as a regular user: sandbox-exec, log stream, and DYLD_INSERT_LIBRARIES (with caveats).

1. Learning Mode

Why traditional tracing tools are all blocked

Tool	Why it fails without root
`fs_usage`	Requires root. Uses kernel tracing facility (`kdebug`). "Permission denied" without sudo.
`eslogger`	Requires root + Full Disk Access. Endpoint Security API enforces `ES_NEW_CLIENT_RESULT_ERR_NOT_PRIVILEGED`.
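`dtrace` / `dtruss`	Requires root, and with SIP enabled even root is blocked: per the privilege audit above, SIP disables the `syscall` dtrace provider system-wide, not only for SIP-protected system binaries (`/bin/sh`, `/usr/bin/env`).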
`opensnoop`	Requires root. Built on DTrace — same limitation.
Endpoint Security framework	Requires `com.apple.developer.endpoint-security.client` restricted entitlement (Apple approval, months-long process) AND root.
FileMonitor (Objective-See)	Built on Endpoint Security — same entitlement + root requirement.
FSEvents API	Works as user BUT has no per-process attribution — it reports filesystem changes without recording which process caused them. Useless for learning.
kqueue `EVFILT_VNODE`	Works as user for monitoring writes/deletes/renames, BUT cannot detect reads/opens (no `NOTE_OPEN` or `NOTE_ACCESS` flag). Also no per-process attribution.

There is no standalone filesystem tracing tool on macOS that works without root. Apple considers process tracing a privileged operation.

Viable approach: Seatbelt `(allow ... (with report))` + `log stream`

This is positive learning — like strace — not iterative reverse learning. The process runs once, to completion, and greywall captures what it accessed.

How it works:

The SBPL language supports a (with report) modifier on allow rules. By default, only denied operations are logged. Adding (with report) causes the sandbox kernel to also log permitted operations to the macOS unified log. These entries are readable via log stream without root.

; Learning profile — allow everything, report everything
(version 1)
(allow default)  ; permissive baseline

; Report all file reads
(allow file-read-data (with report) (subpath "/"))

; Report all file writes
(allow file-write-data (with report) (subpath "/"))
(allow file-write-create (with report) (subpath "/"))
(allow file-write-unlink (with report) (subpath "/"))

; Report all network
(allow network-outbound (with report))

On the host side, greywall captures the log stream:

log stream --style compact --predicate 'sender=="Sandbox"'
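
A possible shape for `GenerateLearningProfile()`, which the implementation requirements in this section name as planned. This is a sketch, not the project's code: it reproduces the learning profile shown above, and `SandboxExecArgs` is a hypothetical helper that builds the `sandbox-exec -p` invocation used in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// reportedFileOps are the file operations the learning profile reports.
var reportedFileOps = []string{
	"file-read-data",
	"file-write-data",
	"file-write-create",
	"file-write-unlink",
}

// GenerateLearningProfile returns a permissive SBPL profile that allows
// everything and asks the sandbox to report file and network operations.
func GenerateLearningProfile() string {
	var b strings.Builder
	b.WriteString("(version 1)\n(allow default)\n")
	for _, op := range reportedFileOps {
		fmt.Fprintf(&b, "(allow %s (with report) (subpath \"/\"))\n", op)
	}
	b.WriteString("(allow network-outbound (with report))\n")
	return b.String()
}

// SandboxExecArgs builds the argument list for sandbox-exec.
func SandboxExecArgs(profile, command string) []string {
	return []string{"-p", profile, "/bin/bash", "-c", command}
}

func main() {
	profile := GenerateLearningProfile()
	fmt.Print(profile)
	fmt.Println(len(SandboxExecArgs(profile, "make test")), "arguments for sandbox-exec")
}
```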

Single-pass learning flow:

greywall --learning -- <command>
    ↓
1. Generate a PERMISSIVE Seatbelt profile with (allow ... (with report)) on
   every operation category (file-read, file-write, network, etc.)
    ↓
2. Start `log stream` in background goroutine, filtering for Sandbox events
   with the session suffix tag
    ↓
3. Run: sandbox-exec -p '<learning-profile>' /bin/bash -c '<command>'
   → Process runs to completion normally (all operations allowed)
   → Every file read/write/network operation logged by the sandbox kernel
    ↓
4. After command exits, stop `log stream`, parse captured events
   → Extract file paths and operation types (read vs write)
   → Filter out system paths, temp paths, shared libraries
    ↓
5. Feed into existing CollapsePaths() and buildTemplate()
   (same platform-independent code as Linux learning.go)
    ↓
6. Save template to ~/.config/greywall/learned/<cmdname>.json
   → Auto-loaded on next run

Comparison with Linux strace approach:

Aspect	Linux (strace)	macOS (Seatbelt report)
Mechanism	ptrace syscall tracing	MACF kernel hooks + unified log
Runs as user?	Yes (ptrace same-UID)	Yes (sandbox-exec + log stream)
Single-pass?	Yes	Yes
Traces child processes?	Yes (`-f` flag)	Yes (sandbox profile inherits to children)
Read vs write distinction?	Yes (O_RDONLY vs O_WRONLY flags)	Yes (file-read-data vs file-write-data operations)
Path detail?	Full path in syscall args	Full path in log message
Interactive/TUI support?	Yes (foreground strace)	Yes (sandbox-exec preserves terminal)
Security during learning?	Relaxed (no seccomp/Landlock)	Relaxed (permissive profile)
Output format	Structured syscall log	Unified log text (needs parsing)

What this does NOT capture (limitations vs strace):

Exact open flags (O_CREAT, O_TRUNC, O_APPEND) — Seatbelt logs the operation type but not the libc-level flags
File descriptor numbers — not relevant for template generation
Syscall return values — Seatbelt doesn't log success/failure details for allowed operations

These limitations don't matter for template generation — the goal is "which paths were read, which were written," and Seatbelt (with report) provides exactly that.

Implementation requirements:

New file: learning_darwin.go (parallel to learning_linux.go)
CheckLearningAvailable() → verify sandbox-exec exists (always true on macOS)
GenerateLearningProfile() → SBPL with (allow ... (with report)) rules
ParseSandboxLog() → parse log stream output, extract paths + operation types, map to StraceResult{WritePaths, ReadPaths}
Rest of pipeline: reuse GenerateLearnedTemplate(), CollapsePaths(), buildTemplate() from learning.go
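
A rough sketch of what `ParseSandboxLog()` might look like. The log line shape is an assumption: this document does not show real `log stream` output, so the code assumes each relevant message ends in `allow <operation> <path>` or `deny(N) <operation> <path>`, and that shape would need checking against actual output before use. The `StraceResult` fields come from the list above; the type definition here is a stand-in.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// StraceResult is a stand-in for the shared type named in this section.
type StraceResult struct {
	WritePaths []string
	ReadPaths  []string
}

// ASSUMED message tail: "allow file-read-data /path" or
// "deny(1) file-write-data /path". Not verified against real output.
var sandboxEvent = regexp.MustCompile(`\b(?:allow|deny)(?:\(\d+\))? (file-[a-z*-]+) (/.*)$`)

// ParseSandboxLog extracts deduplicated read and write paths from log lines.
func ParseSandboxLog(lines []string) StraceResult {
	var res StraceResult
	seen := make(map[string]bool)
	for _, line := range lines {
		m := sandboxEvent.FindStringSubmatch(line)
		if m == nil {
			continue // not a file event in the assumed format
		}
		op, path := m[1], m[2]
		var kind string
		switch {
		case strings.HasPrefix(op, "file-write"):
			kind = "w"
		case strings.HasPrefix(op, "file-read"):
			kind = "r"
		default:
			continue
		}
		if seen[kind+path] {
			continue
		}
		seen[kind+path] = true
		if kind == "w" {
			res.WritePaths = append(res.WritePaths, path)
		} else {
			res.ReadPaths = append(res.ReadPaths, path)
		}
	}
	return res
}

func main() {
	// Hypothetical sample lines in the assumed format.
	res := ParseSandboxLog([]string{
		"Sandbox: cat(4242) allow file-read-data /Users/me/project/README.md",
		"Sandbox: tee(4243) allow file-write-data /Users/me/project/out.txt",
	})
	fmt.Printf("%+v\n", res)
}
```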

Alternative: Iterative reverse learning (fallback)

If the (with report) log entries prove too noisy or unreliable, a fallback approach exists:

Run with a restrictive sandbox profile
Capture denial messages via log stream --predicate 'sender=="Sandbox"'
Parse denials → discover what paths the process needs
Relax the config, run again
Repeat until no more denials

This is slower (multi-pass, process may fail early on first run) but doesn't depend on (with report) behavior.
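
The multi-pass loop can be sketched independently of the sandbox itself. In the illustration below, `RunFunc` and `LearnIteratively` are hypothetical names: `RunFunc` stands for "run the command under a profile allowing exactly these paths and return the paths that were denied", which in practice would wrap `sandbox-exec` and `log stream`.

```go
package main

import (
	"fmt"
	"sort"
)

// RunFunc runs the command with the given allowed paths and returns the
// paths that were denied during that run. Hypothetical interface.
type RunFunc func(allowed []string) (denied []string)

// LearnIteratively relaxes the allowlist until a run produces no new
// denials or maxPasses is reached. converged is false if passes ran out.
func LearnIteratively(run RunFunc, maxPasses int) (allowed []string, converged bool) {
	set := make(map[string]bool)
	for pass := 0; pass < maxPasses; pass++ {
		grew := false
		for _, p := range run(sortedKeys(set)) {
			if !set[p] {
				set[p] = true
				grew = true
			}
		}
		if !grew {
			return sortedKeys(set), true
		}
	}
	return sortedKeys(set), false
}

func sortedKeys(set map[string]bool) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Fake command that fails on the first missing path, as a real
	// process often does when an early open() is denied.
	needs := []string{"/etc/hosts", "/home/user/project"}
	run := func(allowed []string) []string {
		have := make(map[string]bool)
		for _, p := range allowed {
			have[p] = true
		}
		for _, p := range needs {
			if !have[p] {
				return []string{p}
			}
		}
		return nil
	}
	fmt.Println(LearnIteratively(run, 5))
}
```

The fake runner shows why this approach is slow: a process that exits on its first denial reveals only one path per pass, so the number of passes grows with the number of required paths.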

2. Violation Monitoring — WORKS Without Root

log stream is the one viable tool. It can filter sandbox violations without root:

log stream --style compact --predicate 'sender=="Sandbox"'

Greywall already generates a unique sessionSuffix per sandbox session (in macos.go). All Seatbelt deny rules include this suffix in the (with message ...) tag, so violations can be filtered to the current session.

Implementation plan:

Start log stream as a subprocess when --monitor flag is set
Filter predicate: sender=="Sandbox" AND eventMessage CONTAINS "<sessionSuffix>"
Parse output in a goroutine, extract violation type and path
Format and print to stderr (matching Linux eBPF output format)
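
As a rough sketch of the first two steps, the arguments for the `log stream` subprocess could be assembled like this. The function name `logStreamArgs` is hypothetical; the flags and predicate are the ones quoted above, and the escaping rule for the suffix is an assumption about how predicate string literals should be protected.

```go
package main

import (
	"fmt"
	"strings"
)

// logStreamArgs builds the argument list for `log stream`, restricted to
// Sandbox messages that carry the current session suffix.
func logStreamArgs(sessionSuffix string) []string {
	// Escape backslashes and double quotes so the suffix cannot break out
	// of the quoted literal inside the predicate (assumed escaping rule).
	esc := strings.NewReplacer(`\`, `\\`, `"`, `\"`).Replace(sessionSuffix)
	predicate := fmt.Sprintf(`sender=="Sandbox" AND eventMessage CONTAINS "%s"`, esc)
	return []string{"stream", "--style", "compact", "--predicate", predicate}
}
```

A caller would pass the result to `exec.Command("log", logStreamArgs(suffix)...)` and read the command's stdout in a goroutine.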

This is directly achievable and proven by Anthropic's sandbox-runtime.

3. DNS Filtering — IMPOSSIBLE as Transparent, PARTIAL via Seatbelt

The problem: On macOS, DNS resolution goes through mDNSResponder via a Unix domain socket (/private/var/run/mDNSResponder). There is no way to intercept or redirect DNS queries for a specific process without root.

Why a local DNS proxy doesn't work without root:

A user-level process CAN bind to a high port (e.g., 5353) and run a DNS proxy
But the sandboxed process uses the system resolver, which queries mDNSResponder via Unix socket, NOT via UDP to a configurable nameserver
You cannot override per-process DNS on macOS without elevated privileges (scutil --dns only displays the resolver configuration; changing it through scutil or networksetup requires root or admin rights)
Setting NAMESERVER env var does nothing — macOS programs use getaddrinfo() which goes through mDNSResponder

What Seatbelt CAN do:

Block the mDNSResponder Unix socket connection: (deny network-outbound (remote unix-socket (path-literal "/private/var/run/mDNSResponder"))) — this blocks ALL DNS, including resolution of allowed hosts
There is no Seatbelt rule to selectively allow DNS for specific domains

Realistic options:

Block all DNS via Seatbelt (deny the mDNSResponder socket) — nuclear option, breaks most programs
Allow all DNS (current behavior) — no filtering
DYLD_INSERT_LIBRARIES to intercept getaddrinfo() and filter at the libc level — works for non-hardened binaries only (see section 4)

Verdict: Per-domain DNS filtering is not possible on macOS as a regular user without DYLD_INSERT_LIBRARIES, and even then only for non-hardened binaries.

4. Transparent Network Proxy — PARTIAL via DYLD_INSERT_LIBRARIES

This is the most nuanced gap. There is exactly one mechanism that works without root: library interposition.

How it works:

┌─────────────────────────────────────────────────────┐
│  greywall sets up:                                   │
│  1. Local SOCKS5 proxy on localhost:PORT             │
│  2. redirect.dylib (intercepts connect/sendto/etc.)  │
│  3. Seatbelt profile: deny all network EXCEPT        │
│     localhost:PORT                                    │
│  4. DYLD_INSERT_LIBRARIES=redirect.dylib             │
├─────────────────────────────────────────────────────┤
│  sandbox-exec runs the target with:                  │
│  - Seatbelt profile active (kernel-enforced)         │
│  - redirect.dylib loaded (user-space interposition)  │
├─────────────────────────────────────────────────────┤
│  When target calls connect():                        │
│  - redirect.dylib intercepts → rewrites to proxy     │
│  - Seatbelt allows because dest is localhost:PORT    │
│                                                      │
│  If interposition fails (hardened binary):            │
│  - connect() goes to original destination             │
│  - Seatbelt BLOCKS because dest is not localhost:PORT │
│  - Result: EPERM → no network access (FAIL-CLOSED)  │
└─────────────────────────────────────────────────────┘

The key property is fail-closed: Even when DYLD interposition doesn't work, the Seatbelt network deny is kernel-enforced and cannot be bypassed. The process simply has no network access rather than unfiltered access.
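
The Seatbelt half of this design is a short profile fragment. The sketch below generates it in Go; `proxyOnlyNetworkRules` is a hypothetical helper, the rule syntax follows the `(remote ip "host:port")` form used elsewhere in this document, and it assumes that a later, more specific allow rule takes precedence over the earlier blanket deny.

```go
package main

import "fmt"

// proxyOnlyNetworkRules returns an SBPL fragment that denies all network
// operations and then re-allows outbound connections to the local proxy port.
func proxyOnlyNetworkRules(proxyPort int) (string, error) {
	if proxyPort < 1 || proxyPort > 65535 {
		return "", fmt.Errorf("invalid proxy port %d", proxyPort)
	}
	rules := "(deny network*)\n" +
		fmt.Sprintf("(allow network-outbound (remote ip \"localhost:%d\"))\n", proxyPort)
	return rules, nil
}
```

Because the deny rule does not depend on the dylib, a binary that ignores DYLD_INSERT_LIBRARIES still hits the kernel-enforced block.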

When DYLD_INSERT_LIBRARIES works:

Homebrew-installed binaries (/opt/homebrew/bin/*)
User-compiled binaries
Binaries without hardened runtime or __RESTRICT segment
Most CLI tools installed via package managers (npm global, pip, cargo, etc.)

When it does NOT work (dyld strips the variable):

System binaries (/usr/bin/*, /bin/*, /sbin/*) — SIP-protected
Mac App Store apps — hardened runtime
Binaries with CS_RESTRICT, CS_REQUIRE_LV, or CS_RUNTIME code signing flags
Binaries with __RESTRICT,__restrict segment

Execution order matters: DYLD_INSERT_LIBRARIES injection happens BEFORE Seatbelt profile enforcement. The dylib loads during dyld initialization, then sandbox-exec applies the profile via __mac_syscall. This is documented behavior and the basis of multiple CVEs.

What the redirect.dylib would need to intercept:

connect() — TCP connection establishment
sendto() — UDP (including DNS if not going through mDNSResponder)
getaddrinfo() — DNS resolution (for DNS filtering)
bind() — Prevent binding to non-localhost addresses

Implementation approach:

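Write a small C dylib that registers replacement functions through dyld's interposing mechanism: an array of {replacement, original} pointer pairs marked __attribute__((used)) and placed in the __DATA,__interpose section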
Dylib redirects all connect() calls to a local SOCKS5 proxy
Dylib intercepts getaddrinfo() for domain-based filtering
Embed the compiled dylib in the greywall binary (extract to temp at runtime, like tun2socks on Linux)
Seatbelt profile denies all network except localhost:PROXY_PORT

Comparison with Linux approach:

Aspect	Linux (tun2socks)	macOS (DYLD + Seatbelt)
Capture rate	100% (kernel TUN device)	Variable (depends on binary hardening)
Bypass possible?	No (namespace isolation)	Yes (hardened/system binaries)
Fail mode when bypass	N/A	Fail-closed (Seatbelt blocks all network)
Works for Go binaries?	Yes (kernel-level)	Untested. Go issues raw syscalls on Linux, but on macOS it has called libSystem since Go 1.12, so interposition may apply
Works for static binaries?	Yes (kernel-level)	No (no dynamic linker)
Root required?	No (bwrap is unprivileged)	No

5. Summary: What's Achievable

Linux feature	macOS equivalent	Works as user?	Coverage
Filesystem sandbox	Seatbelt SBPL	Yes	Full — kernel-enforced, same as Linux
Network deny	Seatbelt `(deny network*)`	Yes	Full — kernel-enforced
Network redirect	DYLD interposition + Seatbelt	Yes	Partial — non-hardened binaries only. Fail-closed for hardened.
DNS filtering	DYLD `getaddrinfo()` intercept	Yes	Partial — same binary restrictions as network redirect
Violation monitoring	`log stream` parsing	Yes	Full — captures all Seatbelt denials
Learning mode	Seatbelt `(allow (with report))` + `log stream`	Yes	Full for sandbox-exec scope — single-pass, positive learning
Env sanitization	Strip DYLD_*	Yes	Full — already implemented
Command blocking	Shared engine	Yes	Full — already implemented
Seccomp equivalent	Seatbelt operation-level rules	Yes	Partial — covers process/network/IPC but not raw syscall numbers
PID isolation	—	—	Impossible
Landlock equivalent	—	—	Impossible

6. Honest Assessment: The Fundamental macOS Limitation

The core problem is Apple's security architecture:

On Linux, sandboxing tools run as an unprivileged user because the kernel provides user namespaces, a mechanism explicitly designed for unprivileged process isolation. bwrap uses CLONE_NEWUSER to create sandboxes without root. strace uses ptrace, which is allowed between processes with the same UID (subject to the Yama ptrace_scope setting; tracing a direct child works under the common default settings).

On macOS, Apple's security model is: the OS protects the user FROM processes, not processes FROM each other. All the powerful mechanisms (Endpoint Security, Network Extensions, DTrace, pf) are designed for system administrators and enterprise MDM, not for user-level process isolation. Apple's answer to "how do I sandbox a process?" is "use the App Sandbox entitlement" — which requires being an app developer with code signing, not a CLI tool.

The result:

Filesystem control: Fully solved. sandbox-exec works as a user and is kernel-enforced.
Network control: Partially solved. Seatbelt deny is kernel-enforced (fail-closed), but transparent redirection only works for non-hardened binaries via DYLD interposition.
Observability: Mostly solved. log stream works for both violation monitoring and learning mode via Seatbelt's (with report) mechanism.
Process isolation: Not possible. No user namespaces on macOS.

There is no path to full parity with Linux without root privileges for network redirection and process isolation. But filesystem sandboxing, learning mode, and violation monitoring are all achievable as a regular user — the macOS gap is narrower than it first appears.

What Would Change With Root

For documentation completeness — if the "no root" constraint were relaxed:

Feature	Tool	Privilege needed	Improvement over user-level
Learning mode (structured)	`fs_usage`	`sudo`	More structured output, exact flags
Learning mode (JSON)	`eslogger`	`sudo` + Full Disk Access	Process tree tracking, JSON format
DNS filtering (system-level)	Local DNS proxy + `scutil`	`sudo`	System-level DNS redirect, works for all programs
Transparent proxy (all traffic)	`pf` rules	`sudo`	Catches 100% of traffic including hardened binaries
Transparent proxy (Apple-approved)	Network Extension	Admin (system dialog)	Officially supported, future-proof

macOS Learning Mode via `fs_usage` (with admin access)

Why dtrace-based tools are ALL blocked by SIP

A common misconception is that SIP only blocks tracing of SIP-protected binaries (those in /usr/bin/, /System/, etc.). In reality, SIP disables the syscall dtrace provider entirely, regardless of what process you're targeting. Even sudo dtrace fails:

$ sudo dtrace -n 'syscall::open*:entry /execname == "myapp"/ { printf("%s", copyinstr(arg0)); }'
dtrace: system integrity protection is on, some features will not be available
dtrace: failed to initialize dtrace: DTrace requires additional privileges

This eliminates all three commonly suggested tools:

Tool	Based on	Status with SIP
`dtrace` (custom scripts)	dtrace	Blocked — syscall provider disabled
`dtruss` (strace equivalent)	dtrace	Blocked — same reason
`opensnoop` (file open tracer)	dtrace	Blocked — same reason

Partially disabling SIP to re-enable dtrace (csrutil enable --without dtrace) requires rebooting into Recovery mode and is not a viable requirement for end users.

`fs_usage` — the only viable macOS tracing tool

fs_usage is not dtrace-based. It reads directly from the kernel's kdebug tracing facility, which SIP does not restrict. It works with just sudo:

sudo fs_usage -w -f filesys opencode

This was confirmed working on macOS with SIP enabled (tested February 2026).

Key properties:

Name-based filtering: fs_usage <processname> filters by comm name, not PID. Start it before the process exists — it catches events from the moment any process with that name spawns. No PID race condition.
Full path capture: Output includes resolved absolute paths for every filesystem operation.
Operation types: Distinguishes reads (open, read) from writes (open with write flags, write, mkdir, unlink, rename).
No SIP issues: Works for ALL binaries — system, Homebrew, user-compiled, hardened or not.
Pre-installed: Ships with every macOS installation.
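
The read/write distinction described above can be sketched as a small classifier. This is an illustration only: the operation names are the ones listed in this section, real fs_usage output may use other labels, and `classifyOperation` with its `openedForWrite` parameter is a hypothetical interface.

```go
package main

// accessKind says whether an operation counts as a read or a write
// for template generation.
type accessKind int

const (
	accessUnknown accessKind = iota
	accessRead
	accessWrite
)

// classifyOperation maps an operation name to an access kind.
// openedForWrite is only consulted for "open" and should be derived by the
// caller from whatever flag information the log line carries.
func classifyOperation(op string, openedForWrite bool) accessKind {
	switch op {
	case "write", "mkdir", "unlink", "rename":
		return accessWrite
	case "read":
		return accessRead
	case "open":
		if openedForWrite {
			return accessWrite
		}
		return accessRead
	default:
		// Unrecognized operations are left for the caller to ignore or log.
		return accessUnknown
	}
}
```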

Architecture: privilege separation

The critical design point: sudo is used only for the monitoring process, completely isolated from the sandboxed command. The sandboxed command never receives elevated privileges.

greywall (orchestrator, unprivileged)
├── sudo fs_usage -w -f filesys <cmdname>     ← privileged monitor (separate process)
│   └── reads kernel kdebug tracebuffer        ← passive observer, no interaction with sandbox
│   └── output piped to temp file
│
└── sandbox-exec -p '<permissive>' -- <cmd>    ← sandboxed command (unprivileged)
    └── runs as current user, no sudo
    └── no access to the monitor process

The monitor and the sandboxed command share no file descriptors, no IPC, no environment. The sudo elevation cannot leak to the sandbox.

Learning flow on macOS

greywall --learning -- <command>
    ↓
1. Prompt user for admin password (sudo)
    ↓
2. Start `sudo fs_usage -w -f filesys <cmdname>` in background
   → Output redirected to temp file
    ↓
3. Start `sandbox-exec -p '<permissive-profile>' -- <command>`
   → Permissive Seatbelt profile (allow default)
   → Runs as current user, no elevation
    ↓
4. Wait for sandboxed command to exit
    ↓
5. Kill fs_usage monitor, parse temp file
   → Extract file paths and operation types
   → Distinguish reads vs writes from fs_usage output format
   → Filter out system paths, temp paths, shared libraries
    ↓
6. Feed into existing CollapsePaths() and buildTemplate()
   → Same platform-independent pipeline as Linux
    ↓
7. Save template to ~/.config/greywall/learned/<cmdname>.json
   → Auto-loaded on next run
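
Steps 2 and 3 of this flow amount to two command lines that must be derived from the same user command. A minimal Go sketch, assuming hypothetical names `learningPlan` and `buildLearningPlan` and taking the permissive profile text as an input:

```go
package main

import "path/filepath"

// learningPlan holds the two argv slices used during a learning run.
type learningPlan struct {
	Monitor   []string // privileged, passive fs_usage monitor
	Sandboxed []string // unprivileged command under sandbox-exec
}

// buildLearningPlan derives both command lines from the user's command.
// fs_usage filters by process name, so only the base name is passed to it.
func buildLearningPlan(profile string, command []string) learningPlan {
	if len(command) == 0 {
		return learningPlan{}
	}
	name := filepath.Base(command[0])
	monitor := []string{"sudo", "fs_usage", "-w", "-f", "filesys", name}
	sandboxed := append([]string{"sandbox-exec", "-p", profile, "--"}, command...)
	return learningPlan{Monitor: monitor, Sandboxed: sandboxed}
}
```

The monitor would be started first with its stdout redirected to a temp file, and only the second argv runs the user's code, which keeps sudo away from the sandboxed command.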

Comparison: Linux strace vs macOS fs_usage for learning

Aspect	Linux (strace)	macOS (fs_usage)
Underlying mechanism	ptrace syscall tracing	kdebug kernel tracing facility
Privilege required	None (ptrace own child)	`sudo` (for kdebug access)
SIP/security restriction	N/A	No restriction (not dtrace-based)
Process filtering	Automatic (traces child)	Name-based (`fs_usage <name>`)
Catches from process start?	Yes (strace launches the command)	Yes (name filter, start monitor first)
Full paths?	Yes (in syscall arguments)	Yes (resolved in output)
Read vs write distinction?	Yes (O_RDONLY vs O_WRONLY/O_RDWR flags)	Yes (operation type in output)
Traces child processes?	Yes (`-f` flag)	Yes (all processes matching name)
Works for all binaries?	Yes	Yes (no SIP restriction)
Interactive/TUI support?	Yes (foreground strace)	Yes (sandbox-exec preserves terminal)
Output format	Structured syscall log	Text with operation, path, timing
Post-processing	`ParseStraceLog()`	New `ParseFsUsageLog()` needed

Implementation requirements

New file: learning_darwin.go (replace current stub that returns "learning mode is only available on Linux")
CheckFsUsageAvailable() → verify fs_usage exists (always true on macOS) and sudo access
StartFsUsageMonitor(cmdName string) → spawn sudo fs_usage -w -f filesys <cmdname>, redirect output to temp file
StopFsUsageMonitor() → kill the fs_usage process
ParseFsUsageLog(logPath string) → parse fs_usage output, extract paths + operation types, return *StraceResult{WritePaths, ReadPaths}
Rest of pipeline: reuse GenerateLearnedTemplate(), CollapsePaths(), buildTemplate() from learning.go
Manager changes: wrapCommandLearning() needs a macOS path that starts the monitor, runs sandbox-exec with permissive profile, then stops monitor

Open questions

Name collision: fs_usage <name> matches ALL processes with that name system-wide. If another process named opencode is running, its events would be captured too. Mitigation: post-filter by PID range and timing (events between sandbox start and stop), or warn the user.
sudo UX: How to handle the sudo prompt? Options: (a) prompt inline before learning starts, (b) use osascript for a macOS password dialog, (c) require pre-authentication (sudo -v) before running greywall.
fs_usage output format: The exact format varies slightly across macOS versions. Needs testing on macOS 13, 14, and 15 to ensure the parser is robust.
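
The timing half of the name-collision mitigation could look like the sketch below. It assumes the parser has already attached a timestamp to each event; `fsEvent`, its fields, and `filterByWindow` are hypothetical, and timestamps are plain Unix nanoseconds to keep the example dependency-free.

```go
package main

// fsEvent is a hypothetical parsed fs_usage record.
type fsEvent struct {
	UnixNano int64  // event time, nanoseconds since the Unix epoch
	Path     string // absolute path the operation touched
}

// filterByWindow keeps only events that fall between sandbox start and stop
// (inclusive), dropping activity from same-named processes outside the run.
func filterByWindow(events []fsEvent, startNano, stopNano int64) []fsEvent {
	var kept []fsEvent
	for _, e := range events {
		if e.UnixNano >= startNano && e.UnixNano <= stopNano {
			kept = append(kept, e)
		}
	}
	return kept
}
```

This does not help when a same-named process is active during the run, which is why a user-facing warning remains a useful fallback.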

Part 4b: Complete Functionality Table — Linux vs macOS

Every sandboxing capability, how it's implemented on each platform, and what to do about gaps.

Filesystem Control

Functionality	Linux	macOS (current)	macOS (proposed, no root)
Deny-by-default filesystem	bwrap `--tmpfs /` + selective `--ro-bind` mounts	Seatbelt `(deny default)` + explicit `(allow file-read-data (subpath ...))`	Already implemented. Equivalent.
Read control (system paths)	bwrap mounts `/usr`, `/bin`, `/lib`, `/etc` read-only	Seatbelt `(allow file-read-data (subpath "/usr"))` etc.	Already implemented. Equivalent.
Read control (user paths)	bwrap mounts specific home dirs read-only	Seatbelt `(allow file-read-data (subpath "~/.nvm"))` etc.	Already implemented. Equivalent.
Write control (CWD)	bwrap `--bind cwd cwd`	Seatbelt `(allow file-write* (subpath cwd))`	Already implemented. Equivalent.
Sensitive file masking (.env)	bwrap `--ro-bind /dev/null .env` (file is replaced with empty)	Seatbelt `(deny file-read-data (literal ".env"))` (file exists but is unreadable)	Already implemented. Slightly different: Linux hides the file, macOS blocks access. Both prevent data leakage.
Protected file read-only (.bashrc, .gitconfig)	bwrap `--ro-bind`	Seatbelt `(deny file-write* (literal ...))`	Already implemented. Equivalent.
Glob pattern matching	Landlock: expand globs, add `PATH_BENEATH` rules	Seatbelt: convert globs to regex via `GlobToRegex()`	Already implemented. Equivalent.
File movement blocking	Landlock `REFER` right (ABI v2+) blocks cross-dir renames	Seatbelt `(deny file-write-unlink)` with ancestor path blocking	Already implemented. macOS is actually more explicit here.
Symlink escape prevention	bwrap resolves symlinks before mounting	Seatbelt resolves at kernel level (MACF hooks)	Already implemented. Both handle this.
Kernel-level fs enforcement (defense-in-depth)	Landlock LSM: second layer behind bwrap mounts	Not possible: macOS has no LSM that unprivileged processes can configure	Gap: impossible. Seatbelt is the single enforcement layer. No kernel LSM available to regular users. If Seatbelt has a bug, there's no backup.
Filesystem visibility (mount namespace)	bwrap `--tmpfs /` — files literally don't exist in sandbox	Not possible — no mount namespace on macOS	Gap: impossible. Seatbelt denies access but files remain visible to `stat()` (metadata). `(allow file-read-metadata)` is required globally for path resolution.

Network Control

Functionality	Linux	macOS (current)	macOS (proposed, no root)
Network isolation (namespace)	bwrap `--unshare-net` — separate network stack	Not possible — no network namespace on macOS	Gap: impossible. Seatbelt denies network operations but the process shares the host network stack.
Block all outbound	bwrap network namespace (no interfaces)	Seatbelt `(deny network*)` — kernel-enforced via MACF	Already implemented. Equivalent effect.
Transparent proxy (all traffic)	tun2socks: TUN device captures all packets, routes through SOCKS5	Not implemented	Proposed: DYLD_INSERT_LIBRARIES with `connect()` interposition dylib + Seatbelt `(deny network-outbound)` except proxy port. Works for non-hardened binaries. Fail-closed: hardened binaries get zero network (Seatbelt blocks).
Proxy env var fallback	`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY` set when TUN unavailable	`HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY` set when proxy configured	Already implemented. Equivalent. Apps that respect env vars are proxied on both platforms.
DNS filtering	DnsBridge: socat relays DNS over Unix socket to host DNS server with filtering	Not implemented	Proposed: DYLD_INSERT_LIBRARIES intercepting `getaddrinfo()` for domain filtering. Same binary restrictions as network redirect. Alternative: block all DNS via Seatbelt (nuclear, breaks most programs).
Exposed ports (inbound)	ReverseBridge: socat on host forwards to Unix socket inside sandbox	Seatbelt `(allow network-bind (local ip "localhost:*"))` + `(allow network-inbound)`	Already implemented. Different mechanism, same effect.
Unix socket access	bwrap `--bind` mounts sockets into sandbox	Seatbelt `(allow network* (remote unix-socket (path-literal ...)))`	Already implemented. Equivalent.

Syscall/Operation Control

Functionality	Linux	macOS (current)	macOS (proposed, no root)
Block ptrace	seccomp BPF: `ptrace(101)` → EPERM	Seatbelt: no `process-info*` for other processes; SIP also blocks ptrace on system binaries	Partially covered. Seatbelt + SIP provide equivalent protection for most scenarios.
Block mount/umount	seccomp BPF: `mount(165)`, `umount2(166)` → EPERM	Not relevant — regular users cannot mount on macOS (no `mount` privilege)	No action needed. macOS doesn't allow user-level mounting.
Block reboot/shutdown	seccomp BPF: `reboot(169)` → EPERM + command blocker denies `reboot`	Command blocker denies `reboot`, `shutdown` etc.	Already implemented. Command blocker is shared. macOS also requires admin for reboot.
Block kernel module ops	seccomp BPF: `init_module`, `delete_module` → EPERM	Not relevant — kext loading requires root + SIP exemption on macOS	No action needed.
Block privilege escalation	seccomp BPF: `personality`, `userfaultfd`, `perf_event_open`, `bpf`, `keyctl`	Not relevant — these syscalls either don't exist or are already restricted by SIP on macOS	No action needed. macOS kernel restricts these at the platform level.
Mach IPC control	Not applicable (Linux has no Mach IPC)	Seatbelt `(allow mach-lookup ...)` — allowlist of ~20 essential XPC services	Already implemented. macOS-specific.
IOKit control	Not applicable	Seatbelt `(allow iokit-open ...)` — GPU, power management	Already implemented. macOS-specific.
sysctl control	Not implemented (could add via seccomp argument filtering)	Seatbelt `(allow sysctl-read ...)` — 50+ sysctls allowlisted	Already implemented. macOS is ahead here.

Process Isolation

Functionality	Linux	macOS (current)	macOS (proposed, no root)
PID namespace	bwrap `--unshare-pid` — can't see host processes	Not possible	Gap: impossible. No user namespace equivalent on macOS. Sandboxed process can see all host PIDs.
Session isolation	bwrap `--new-session` — detach from controlling terminal	Not possible	Gap: impossible. `sandbox-exec` does not detach sessions.
Capability dropping	bwrap drops caps; `--cap-add` for specific ones	Not applicable — macOS doesn't use Linux capabilities model	Different model. macOS uses entitlements, not capabilities.

Observability

Functionality	Linux	macOS (current)	macOS (proposed, no root)
Violation monitoring	eBPF (bpftrace): attaches to syscall exit tracepoints, reports EACCES/EPERM/EROFS	Not implemented	Proposed: `log stream` filtering for `sender=="Sandbox"` with session suffix. Works as user. Proven by Anthropic's sandbox-runtime. Infrastructure (session suffix, log tagging) already exists in greywall codebase.
Learning mode (positive)	strace: traces file-access syscalls, parses log, generates config template	Not implemented	Proposed: Seatbelt `(allow ... (with report))` + `log stream`. Permissive sandbox profile logs every permitted operation. Parse log → extract paths → generate template. Single-pass, positive learning, works as user.
Log tagging	Not implemented (eBPF shows PID but no command tag)	Seatbelt `(with message "CMD64_<base64>_END_<suffix>")` — per-session violation tags	Already implemented. macOS is ahead here.
Template auto-loading	Saved to `~/.config/greywall/learned/<cmd>.json`, loaded on next run	Not implemented	Reuse same mechanism — template format and auto-loading are platform-independent.

Caller-Side Pre-Filter

These operate BEFORE the sandbox is created. They filter the top-level command string submitted to greywall -- <command> by the caller (e.g., an AI agent framework). They do NOT intercept commands executed by child processes inside the sandbox — that's the job of the runtime layers above (seccomp blocks the reboot syscall, Seatbelt blocks filesystem operations, network namespace blocks connections, etc.).

Functionality	Linux	macOS (current)	macOS (proposed, no root)
Command blocking	Shared engine: deny/allow lists, shell parsing, nested expansion. Rejects `greywall -- "git push"` before sandbox starts.	Same	Already implemented. Shared code.
SSH policy	Shared engine: host patterns, remote command filtering	Same	Already implemented. Shared code.
Env sanitization (Linux)	Strip `LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, all `LD_*`	N/A on macOS	N/A.
Env sanitization (macOS)	N/A on Linux	Strip `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, all `DYLD_*`	Already implemented. Note: greywall must selectively NOT strip `DYLD_INSERT_LIBRARIES` when it's setting it for the redirect.dylib. The sanitization should strip user-provided DYLD vars but preserve greywall's own.
Dangerous file lists	Shared: `.gitconfig`, `.bashrc`, `.zshrc`, `.env*`, `.git/hooks`, `.vscode`, `.idea`	Same	Already implemented. Shared lists.
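
The DYLD exception noted in the env sanitization row can be handled by stripping first and injecting afterwards. A minimal sketch, with `sanitizeDarwinEnv` and `redirectDylib` as hypothetical names:

```go
package main

import "strings"

// sanitizeDarwinEnv drops every caller-supplied DYLD_* variable, then adds
// greywall's own DYLD_INSERT_LIBRARIES when a redirect dylib path is given.
// env uses the "KEY=value" form returned by os.Environ().
func sanitizeDarwinEnv(env []string, redirectDylib string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if strings.HasPrefix(kv, "DYLD_") {
			continue // never trust DYLD_* values from the caller
		}
		out = append(out, kv)
	}
	if redirectDylib != "" {
		out = append(out, "DYLD_INSERT_LIBRARIES="+redirectDylib)
	}
	return out
}
```

Ordering the two steps this way means a user-provided DYLD_INSERT_LIBRARIES can never be merged with, or mistaken for, the value greywall sets.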

Summary: Gap Status

Gap	Status	Reason
Learning mode	Solvable	Two approaches: (1) Seatbelt `(allow (with report))` + `log stream` — no root needed, (2) `sudo fs_usage` — requires admin, more reliable, not dtrace-based, unaffected by SIP.
Violation monitoring	Solvable	`log stream` with session suffix filtering. No root needed.
Transparent network proxy	Partially solvable	DYLD interposition for non-hardened binaries. Fail-closed via Seatbelt for the rest.
DNS filtering	Partially solvable	DYLD `getaddrinfo()` intercept for non-hardened binaries.
Kernel-level fs backup (Landlock)	Impossible	macOS has no LSM that unprivileged processes can configure.
PID namespace isolation	Impossible	No user namespaces on macOS.
Network namespace isolation	Impossible	No user namespaces on macOS.
Mount namespace (file visibility)	Impossible	No mount namespace on macOS.

Part 5: Detailed Component Reference

Linux Components

Bubblewrap (bwrap) — Namespace Isolation

Primary sandboxing primitive. Creates isolated namespaces for the sandboxed process.

Namespaces used:

Namespace	Flag	Purpose
Network	`--unshare-net`	Isolates network stack (no host network access)
PID	`--unshare-pid`	Process ID isolation
Session	`--new-session`	New session via setsid (not a namespace); detaches from the controlling terminal (disabled in learning mode)

Filesystem mounting — three modes:

Mode	Trigger	Root mount	Description
Deny-by-default	`defaultDenyRead: true` (default)	`--tmpfs /`	Empty root; system paths selectively mounted read-only. CWD mounted read-write.
Legacy	`defaultDenyRead: false`	`--ro-bind / /`	Entire root filesystem mounted read-only; specific paths overridden.
Learning	`--learning` flag	`--ro-bind / /`	Root read-only, home + CWD writable. Relaxed for strace tracing.

Special filesystem handling:

Path	Mount type	Reason
`/dev`	`--dev-bind`	Preserve host device permissions
`/proc`	`--proc`	Fresh procfs
`/tmp`	`--tmpfs`	Always writable, isolated from host
`/etc/resolv.conf`	Special cross-mount handling	May be a symlink crossing mount boundaries
`.env*` files	Empty file bind mount	Mask sensitive project files

Seccomp BPF — Syscall Filtering

BPF program generated and loaded at sandbox startup to block dangerous syscalls.

BPF program generated as raw bytecode (8 bytes per instruction)
Program loads the syscall number, compares against a blocklist
Blocked syscalls return SECCOMP_RET_ERRNO | EPERM (silent denial)
Unblocked syscalls return SECCOMP_RET_ALLOW
Filter passed to bwrap via file descriptor: exec 3<filter; bwrap --seccomp 3
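
The byte layout described above can be illustrated with a simplified generator. This is a sketch under stated assumptions, not greywall's generator: it emits little-endian classic BPF, omits the architecture check a production filter should perform first, and the names `appendInsn` and `buildSeccompFilter` are hypothetical. The opcode and return-value constants are the standard Linux ones.

```go
package main

import "encoding/binary"

const (
	bpfLdWAbs  = 0x20 // BPF_LD | BPF_W | BPF_ABS
	bpfJmpJeqK = 0x15 // BPF_JMP | BPF_JEQ | BPF_K
	bpfRetK    = 0x06 // BPF_RET | BPF_K

	seccompRetAllow = 0x7fff0000
	seccompRetErrno = 0x00050000
	eperm           = 1
)

// appendInsn encodes one sock_filter (u16 code, u8 jt, u8 jf, u32 k): 8 bytes.
func appendInsn(buf []byte, code uint16, jt, jf uint8, k uint32) []byte {
	var insn [8]byte
	binary.LittleEndian.PutUint16(insn[0:2], code)
	insn[2] = jt
	insn[3] = jf
	binary.LittleEndian.PutUint32(insn[4:8], k)
	return append(buf, insn[:]...)
}

// buildSeccompFilter returns a program that loads the syscall number,
// compares it against each blocked number, and returns ERRNO|EPERM on a
// match or ALLOW otherwise. It returns nil if the list is too long for
// 8-bit jump offsets.
func buildSeccompFilter(blocked []uint32) []byte {
	n := len(blocked)
	if n > 255 {
		return nil
	}
	buf := make([]byte, 0, 8*(n+3))
	buf = appendInsn(buf, bpfLdWAbs, 0, 0, 0) // seccomp_data.nr is at offset 0
	for i, nr := range blocked {
		// On match, skip the remaining comparisons and the ALLOW return.
		buf = appendInsn(buf, bpfJmpJeqK, uint8(n-i), 0, nr)
	}
	buf = appendInsn(buf, bpfRetK, 0, 0, seccompRetAllow)
	buf = appendInsn(buf, bpfRetK, 0, 0, seccompRetErrno|eperm)
	return buf
}
```

For a blocklist of n syscalls this layout produces n+3 instructions, so the file handed to bwrap is 8*(n+3) bytes long.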

Blocked syscalls (27 listed below):

Category	Syscalls
Process debugging/injection	`ptrace`, `process_vm_readv`, `process_vm_writev`
Kernel/privilege escalation	`personality`, `userfaultfd`, `perf_event_open`, `bpf`, `keyctl`, `add_key`, `request_key`
System control	`mount`, `umount2`, `pivot_root`, `swapon`, `swapoff`, `sethostname`, `setdomainname`
Kernel manipulation	`kexec_load`, `kexec_file_load`, `reboot`, `init_module`, `finit_module`, `delete_module`
System operations	`syslog`, `acct`, `ioperm` (x86_64), `iopl` (x86_64)

Landlock — Kernel Filesystem Access Control

Linux Security Module (LSM) providing fine-grained filesystem access control, available since Linux 5.13.

Greywall re-executes itself with --landlock-apply flag inside the sandbox
Config passed via GREYWALL_CONFIG_JSON environment variable
Ruleset created with SYS_LANDLOCK_CREATE_RULESET
Rules added for each allowed path with SYS_LANDLOCK_ADD_RULE (type PATH_BENEATH)
Ruleset applied with SYS_LANDLOCK_RESTRICT_SELF (irrevocable)
PR_SET_NO_NEW_PRIVS required via prctl()

Access rights controlled (ABI v1-v5):

Right	ABI	Description
`EXECUTE`	v1	Execute files
`READ_FILE`, `READ_DIR`	v1	Read files and list directories
`WRITE_FILE`	v1	Open files for writing
`MAKE_REG`, `MAKE_DIR`, `MAKE_SOCK`, `MAKE_FIFO`, `MAKE_SYM`	v1	Create filesystem objects
`REMOVE_FILE`, `REMOVE_DIR`	v1	Delete filesystem objects
`REFER`	v2	Cross-directory renames
`TRUNCATE`	v3	Truncate files
`IOCTL_DEV`	v5	Device ioctl operations
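
Because the set of rights grows with the ABI version, the handled-access mask has to be computed from the version the running kernel reports. The sketch below shows one way to do that; the bit positions follow the Landlock UAPI header (which also defines `MAKE_CHAR` and `MAKE_BLOCK`, not listed in the table above), while `handledAccessFS` is a hypothetical helper name.

```go
package main

// Filesystem access bits in Landlock UAPI order.
const (
	accessFSExecute uint64 = 1 << iota
	accessFSWriteFile
	accessFSReadFile
	accessFSReadDir
	accessFSRemoveDir
	accessFSRemoveFile
	accessFSMakeChar
	accessFSMakeDir
	accessFSMakeReg
	accessFSMakeSock
	accessFSMakeFifo
	accessFSMakeBlock
	accessFSMakeSym
	accessFSRefer    // ABI v2
	accessFSTruncate // ABI v3
	accessFSIoctlDev // ABI v5
)

// handledAccessFS returns the filesystem rights a ruleset may handle on a
// kernel reporting the given Landlock ABI version. ABI v4 added network
// rights only, so it contributes no filesystem bits.
func handledAccessFS(abi int) uint64 {
	if abi < 1 {
		return 0 // Landlock unavailable
	}
	mask := accessFSRefer - 1 // all thirteen v1 rights (bits 0 through 12)
	if abi >= 2 {
		mask |= accessFSRefer
	}
	if abi >= 3 {
		mask |= accessFSTruncate
	}
	if abi >= 5 {
		mask |= accessFSIoctlDev
	}
	return mask
}
```

Requesting a right the kernel does not know makes ruleset creation fail, so masking down to the supported set lets the same binary run across kernel versions.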

eBPF Monitoring — Violation Detection

Real-time monitoring of sandbox violations via bpftrace. Observation only — does not enforce.

bpftrace script generated with the sandbox PID
Tracepoints attached to syscall exit points: openat, unlinkat, mkdirat, connect
Filters for error codes: EACCES (-13), EPERM (-1), EROFS (-30), ECONNREFUSED (-111)
PID filtering: pid >= SANDBOX_PID to exclude system daemons
Violations formatted and printed to stderr
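
The filtering rules above are expressed in the bpftrace script itself, but the same logic can be written out in Go to make it concrete. This is an illustrative sketch only; `violationName` and `isSandboxViolation` are hypothetical names, and the error codes are the ones listed above.

```go
package main

// violationName maps a syscall return value to the errno name that the
// monitor treats as a sandbox violation. ok is false for any other value.
func violationName(ret int64) (name string, ok bool) {
	switch ret {
	case -13:
		return "EACCES", true
	case -1:
		return "EPERM", true
	case -30:
		return "EROFS", true
	case -111:
		return "ECONNREFUSED", true
	default:
		return "", false
	}
}

// isSandboxViolation applies the PID heuristic first (pid >= sandboxPID),
// then the error-code filter.
func isSandboxViolation(pid, sandboxPID int, ret int64) bool {
	if pid < sandboxPID {
		return false
	}
	_, ok := violationName(ret)
	return ok
}
```

The PID comparison is a heuristic rather than a guarantee: unrelated processes started after the sandbox also pass it.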

Requirements: CAP_BPF or root, plus bpftrace installed. Graceful fallback if unavailable.

Transparent Network Proxy (tun2socks + bridges)

Network traffic routed through SOCKS5 proxy via TUN device for allowlist-based filtering.

Sandboxed process → TUN device (198.18.0.0/15) → tun2socks → socat → Unix socket → host socat → external SOCKS5 proxy

Bridges (all use socat + Unix sockets to cross namespace boundary):

Bridge	Direction	Purpose
ProxyBridge	Sandbox → Host	SOCKS5 proxy access
DnsBridge	Sandbox → Host	DNS resolution
ReverseBridge	Host → Sandbox	Inbound connections to sandbox services

Environment Sanitization

Platform	Stripped variables	Risk
Linux	`LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, `LD_DEBUG`, all `LD_*`	Shared library injection
macOS	`DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, `DYLD_FRAMEWORK_PATH`, all `DYLD_*`	Dylib injection

macOS Components

Seatbelt / sandbox-exec — Profile-Based Sandbox

Uses Apple's built-in sandbox-exec command with SBPL (Sandbox Profile Language) profiles. Enforced by MACF kernel hooks.

Profile structure:

(deny default (with message "logTag")) — block everything by default
Essential process permissions (process-exec, process-fork, signal)
Mach IPC allowlist (~20 essential system services)
IOKit access (GPU memory, power management)
sysctl reads (50+ hardware/kernel parameters)
Filesystem read rules (system paths, CWD, user tooling)
Filesystem write rules (CWD, tmpdir, default write paths)
Mandatory deny rules (.env, .gitconfig, .bashrc, .git/hooks)
Network rules (proxy host:port or localhost binding)
PTY support (optional)

Network control modes:

Mode	Rules	Use case
Unrestricted	`(allow network*)`	Explicitly allowed
Full block	No network rules	Default (no proxy)
Local binding	`(allow network-bind (local ip "localhost:*"))`	Exposed ports
Proxy-based	`(allow network-outbound (remote ip "host:port"))`	External proxy access

Shared Components (Both Platforms)

Command Blocking Engine

Category	Commands
System control	`shutdown`, `reboot`, `halt`, `poweroff`, `init 0/6`, `systemctl poweroff/reboot/halt`
Kernel manipulation	`insmod`, `rmmod`, `modprobe`, `kexec`
Disk manipulation	`mkfs.*`, `fdisk`, `parted`, `dd if=`
Container escape	`docker run -v /:/`, `docker run --privileged`
Namespace escape	`chroot`, `unshare`, `nsenter`

Shell parsing splits on |, ||, &&, ;. Nested invocations (bash -c 'git push') are expanded.
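
The splitting step can be sketched as below. This is a deliberately naive illustration: it is not quote-aware, does not expand nested invocations, and `splitCommands` is a hypothetical name rather than the shared engine's function.

```go
package main

import "strings"

// splitCommands splits a shell command line on &&, ||, | and ; and returns
// the trimmed, non-empty segments. Two-character operators are listed before
// the single pipe so that "||" is not treated as two pipes.
func splitCommands(line string) []string {
	const sep = "\x00"
	r := strings.NewReplacer("&&", sep, "||", sep, "|", sep, ";", sep)
	var parts []string
	for _, seg := range strings.Split(r.Replace(line), sep) {
		if s := strings.TrimSpace(seg); s != "" {
			parts = append(parts, s)
		}
	}
	return parts
}
```

Each returned segment can then be checked against the deny and allow lists independently, so `git status && git push` is rejected because of its second segment.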

SSH Policy

Dedicated rules: allowed hosts (wildcards), denied hosts, allowed/denied remote commands, optional inheritance of global deny rules.

Dangerous File/Directory Protection

Category	Items
Dangerous files	`.gitconfig`, `.gitmodules`, `.bashrc`, `.bash_profile`, `.zshrc`, `.zprofile`, `.profile`, `.ripgreprc`, `.mcp.json`
Dangerous directories	`.vscode`, `.idea`, `.claude/commands`, `.claude/agents`
Sensitive project files	`.env`, `.env.local`, `.env.development`, `.env.production`, `.env.staging`, `.env.test`

Learning Mode (Linux-only, macOS planned)

Traces filesystem access patterns and generates configuration templates.

greywall --learning -- <command>
    → Relaxed sandbox (bwrap, no seccomp/Landlock)
    → strace traces file-access syscalls
    → Log parsed → paths extracted → collapsed → filtered
    → JSON template generated → saved to ~/.config/greywall/learned/
    → Auto-loaded on next run of same command

Why seccomp and Landlock are disabled in learning mode: strace uses ptrace(2) to trace syscalls. Seccomp blocks ptrace → strace can't attach. Since the goal is observability (not security), all enforcement layers except basic bwrap are disabled.

Part 6: Configuration Reference

{
  "extends": "base-config.json",
  "network": {
    "proxyUrl": "socks5://host:1080",
    "dnsAddr": "localhost:3153",
    "allowUnixSockets": ["/path/to.sock"],
    "allowAllUnixSockets": false,
    "allowLocalBinding": false,
    "allowLocalOutbound": null
  },
  "filesystem": {
    "defaultDenyRead": true,
    "allowRead": ["~/extra-data"],
    "denyRead": ["~/.ssh/id_*"],
    "allowWrite": ["."],
    "denyWrite": [],
    "allowGitConfig": false
  },
  "command": {
    "deny": ["git push", "npm publish"],
    "allow": ["git status"],
    "useDefaults": true
  },
  "ssh": {
    "allowedHosts": ["github.com"],
    "deniedHosts": [],
    "allowedCommands": ["git-upload-pack"],
    "deniedCommands": [],
    "allowAllCommands": false,
    "inheritDeny": false
  },
  "allowPty": false
}
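
For orientation, the configuration above maps naturally onto a Go struct. The sketch below is an assumption about shape, not greywall's actual types: the field names mirror the JSON keys, `parseConfig` is hypothetical, and it uses the standard `encoding/json` package, which unlike the jsonc dependency does not accept comments. Only the documented default (`defaultDenyRead: true`) is pre-populated.

```go
package main

import "encoding/json"

// Config mirrors the JSON keys shown above.
type Config struct {
	Extends string `json:"extends"`
	Network struct {
		ProxyURL            string   `json:"proxyUrl"`
		DNSAddr             string   `json:"dnsAddr"`
		AllowUnixSockets    []string `json:"allowUnixSockets"`
		AllowAllUnixSockets bool     `json:"allowAllUnixSockets"`
		AllowLocalBinding   bool     `json:"allowLocalBinding"`
		AllowLocalOutbound  *bool    `json:"allowLocalOutbound"` // null means unset
	} `json:"network"`
	Filesystem struct {
		DefaultDenyRead bool     `json:"defaultDenyRead"`
		AllowRead       []string `json:"allowRead"`
		DenyRead        []string `json:"denyRead"`
		AllowWrite      []string `json:"allowWrite"`
		DenyWrite       []string `json:"denyWrite"`
		AllowGitConfig  bool     `json:"allowGitConfig"`
	} `json:"filesystem"`
	Command struct {
		Deny        []string `json:"deny"`
		Allow       []string `json:"allow"`
		UseDefaults bool     `json:"useDefaults"`
	} `json:"command"`
	SSH struct {
		AllowedHosts     []string `json:"allowedHosts"`
		DeniedHosts      []string `json:"deniedHosts"`
		AllowedCommands  []string `json:"allowedCommands"`
		DeniedCommands   []string `json:"deniedCommands"`
		AllowAllCommands bool     `json:"allowAllCommands"`
		InheritDeny      bool     `json:"inheritDeny"`
	} `json:"ssh"`
	AllowPty bool `json:"allowPty"`
}

// parseConfig decodes a config document. Fields absent from the input keep
// the values set before decoding, which is how the default is applied.
func parseConfig(data []byte) (*Config, error) {
	cfg := &Config{}
	cfg.Filesystem.DefaultDenyRead = true
	if err := json.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```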

Runtime Dependencies

Dependency	Platform	Required	Purpose
bubblewrap (`bwrap`)	Linux	Yes	Namespace isolation
socat	Linux	Yes (if proxy/DNS)	Unix socket bridging
tun2socks	Linux	Embedded	Transparent network proxy
`ip` (iproute2)	Linux	Yes (if TUN)	TUN device setup
strace	Linux	Only for `--learning`	Filesystem access tracing
bpftrace	Linux	Optional	Violation monitoring
sandbox-exec	macOS	Yes (built-in)	Seatbelt sandbox
fs_usage	macOS	Only for `--learning` (requires `sudo`)	Filesystem access tracing (kdebug-based, not affected by SIP)

Go dependencies (4): doublestar (glob), cobra (CLI), jsonc (config), golang.org/x/sys (syscalls).
