diff --git a/analysis.md b/analysis.md new file mode 100644 index 0000000..c37d2e5 --- /dev/null +++ b/analysis.md @@ -0,0 +1,1151 @@ +# Greywall Sandboxing Architecture — Deep Analysis + +## Overview + +Greywall is a sandboxing layer that wraps commands in restrictive environments. It blocks network access by default (allowlist-based), restricts filesystem operations, and controls command execution. It supports **Linux** (bubblewrap + seccomp + Landlock + eBPF) and **macOS** (sandbox-exec / Seatbelt SBPL profiles). + +--- + +## Part 1: How Components Work Together (Linux) + +### The Problem Each Component Solves + +There are five distinct security concerns. No single Linux technology can address all of them, which is why Greywall composes multiple layers: + +| Security Concern | Technology | Why This One? | +|-----------------|------------|---------------| +| **Process/network isolation** | Bubblewrap (namespaces) | Only namespaces can create a truly isolated network stack and PID space | +| **Filesystem visibility** | Bubblewrap (mount namespace) | Only mount namespaces can make files literally invisible (not mounted = doesn't exist) | +| **Filesystem access rights** | Landlock (LSM) | Only a kernel LSM can enforce access rights that survive mount misconfiguration | +| **Dangerous syscall blocking** | Seccomp BPF | Only seccomp can block specific system calls (ptrace, mount, reboot) | +| **Violation visibility** | eBPF (bpftrace) | Only kernel tracing can observe denied operations across all layers | + +### Why No Single Layer Is Sufficient + +**Why can't Bubblewrap do everything?** +Bubblewrap controls *what's visible* in the filesystem (mount-time). But once a file IS mounted, bwrap has no say in what operations are performed on it. A read-only bind mount prevents writes, but bwrap cannot block `ptrace`, `mount`, or `reboot` syscalls — those aren't filesystem operations. 
And if a mount is misconfigured (edge case with symlinks, race conditions), bwrap alone provides no fallback. + +**Why can't Seccomp do everything?** +Seccomp filters syscalls by number and (optionally) argument values. It can block `ptrace(101)` or `mount(165)`, but it **cannot** make path-based decisions. Seccomp sees `openat(fd, "path", flags)` but cannot evaluate whether `"path"` resolves to `/home/user/.ssh/id_rsa` or `/tmp/safe.txt` — that would require following the kernel's path resolution logic inside a BPF program, which is impossible. Seccomp is blind to filesystem semantics. + +**Why can't Landlock do everything?** +Landlock is a filesystem-only LSM. It controls `READ_FILE`, `WRITE_FILE`, `EXECUTE`, `MAKE_DIR`, etc. But it has zero knowledge of network operations, process tracing (`ptrace`), kernel module loading (`init_module`), or system control (`reboot`). Landlock also cannot isolate namespaces — it restricts access within the current namespace, it doesn't create new ones. + +**Why can't eBPF do everything?** +eBPF (as used here via bpftrace) is **observation-only**. It attaches to tracepoints at syscall exit and reads return values. It cannot block, modify, or deny any operation. Its purpose is to report violations after they've been caught by other layers. (eBPF *can* enforce policy via LSM hooks in newer kernels, but Greywall uses bpftrace for monitoring, not enforcement.) + +### Component Interaction Diagram + +``` + ┌─────────────────────────────────────────────────────────┐ + │ THREAT │ + │ Sandboxed process attempts forbidden action │ + └──────────────┬──────────────────────────────────────────┘ + │ + ┌──────────────────────────┼──────────────────────────────┐ + │ │ │ + ▼ ▼ ▼ + ┌───────────┐ ┌──────────────┐ ┌──────────────────┐ + │ Filesystem │ │ Syscall │ │ Network │ + │ access │ │ (ptrace, │ │ connection │ + │ │ │ mount...) 
│ │ │ + └─────┬─────┘ └──────┬───────┘ └────────┬─────────┘ + │ │ │ + ▼ ▼ ▼ + ┌───────────┐ ┌──────────────┐ ┌──────────────────┐ + │ Bubblewrap │ │ Seccomp │ │ Bubblewrap │ + │ Mount NS │ │ BPF │ │ Network NS │ + │ │ │ │ │ (--unshare-net)│ + │ File not │ │ Syscall # │ │ │ + │ mounted? │ │ in blocklist?│ │ Isolated stack, │ + │ → ENOENT │ │ → EPERM │ │ no host network │ + └─────┬─────┘ └──────┬───────┘ └────────┬─────────┘ + │ (file IS mounted) │ │ + ▼ │ ▼ + ┌───────────┐ │ ┌──────────────────┐ + │ Landlock │ │ │ tun2socks │ + │ (kernel) │ │ │ TUN device │ + │ │ │ │ │ + │ Has access │ │ │ All traffic → │ + │ right? │ │ │ SOCKS5 proxy │ + │ → EACCES │ │ │ (allowlist) │ + └─────┬─────┘ │ └────────┬─────────┘ + │ │ │ + └───────────┬───────────┘───────────────────────────────┘ + │ + ▼ + ┌──────────────┐ + │ eBPF │ + │ (bpftrace) │ + │ │ + │ Observes │ + │ EACCES, │ + │ EPERM, │ + │ EROFS │ + │ returns │ + │ │ + │ → Logs to │ + │ stderr │ + └──────────────┘ +``` + +### Execution Timeline + +This is the exact order of operations when `greywall -- ` runs: + +``` +PHASE 1: PRE-SANDBOX (on host) +───────────────────────────────────────────────────────────────── +1. Command blocking engine checks command against deny/allow lists +2. Environment sanitization strips LD_PRELOAD, LD_LIBRARY_PATH, etc. +3. ProxyBridge starts: socat creates Unix socket → external SOCKS5 +4. DnsBridge starts: socat creates Unix socket → host DNS server +5. ReverseBridge starts: socat listens on exposed ports → Unix sockets +6. Seccomp BPF filter generated and written to temp file +7. Bubblewrap arguments assembled (mounts, namespaces, capabilities) +8. Inner bash script generated (network setup + command execution) + +PHASE 2: SANDBOX CREATION (bwrap) +───────────────────────────────────────────────────────────────── +9. bwrap creates new PID namespace (--unshare-pid) +10. bwrap creates new network namespace (--unshare-net) +11. 
bwrap sets up filesystem: + - --tmpfs / (empty root) OR --ro-bind / / (read-only root) + - System paths mounted read-only (/usr, /bin, /lib, /etc...) + - CWD mounted read-write + - /dev mounted with --dev-bind + - /proc mounted fresh + - /tmp as tmpfs + - .env files masked with /dev/null bind mount + - Protected files forced read-only + - Unix sockets from bridges bind-mounted in +12. bwrap loads seccomp BPF filter (via fd 3 → --seccomp 3) +13. bwrap drops capabilities (except CAP_NET_ADMIN if proxy enabled) +14. bwrap executes inner bash script + +PHASE 3: INNER SCRIPT (inside sandbox) +───────────────────────────────────────────────────────────────── +15. Script brings up loopback: ip link set lo up +16. Script creates TUN device: ip tuntap add dev tun0 mode tun +17. Script configures routing: ip route add default via 198.18.0.1 +18. Script starts socat (localhost:18321 → proxy Unix socket) +19. Script starts tun2socks (TUN → SOCKS5 on localhost:18321) +20. Script configures DNS (socat relay or resolv.conf) +21. Script starts reverse bridge listeners (socat for each port) +22. Script waits 0.3s for services to initialize + +PHASE 4: LANDLOCK APPLICATION (inside sandbox) +───────────────────────────────────────────────────────────────── +23. greywall re-executes: greywall --landlock-apply -- bash -c "" +24. Reads config from GREYWALL_CONFIG_JSON env var +25. Sets PR_SET_NO_NEW_PRIVS (required for Landlock) +26. Creates Landlock ruleset (filesystem access rights bitmask) +27. Adds PATH_BENEATH rules for each allowed path +28. Applies LANDLOCK_RESTRICT_SELF (irrevocable) +29. syscall.Exec() replaces process with user command + +PHASE 5: COMMAND EXECUTION (fully sandboxed) +───────────────────────────────────────────────────────────────── +30. 
User command runs with ALL layers active simultaneously: + - Network: isolated namespace, traffic through TUN → proxy + - Filesystem: bwrap mounts (visibility) + Landlock (access rights) + - Syscalls: seccomp BPF blocking dangerous calls + - Environment: sanitized (no LD_PRELOAD etc.) + +PHASE 6: MONITORING (parallel, on host) +───────────────────────────────────────────────────────────────── +31. eBPF monitor started AFTER sandbox process begins +32. bpftrace attaches to syscall exit tracepoints +33. Filters for EACCES/EPERM/EROFS returns from sandbox PIDs +34. Logs violations to stderr in real-time + +PHASE 7: CLEANUP (on host, after command exits) +───────────────────────────────────────────────────────────────── +35. eBPF monitor stopped (bpftrace killed) +36. ReverseBridge stopped (socat killed, sockets removed) +37. DnsBridge stopped (socat killed, socket removed) +38. ProxyBridge stopped (socat killed, socket removed) +39. tun2socks binary removed from /tmp +40. Seccomp filter file removed +``` + +### What Catches What — Attack Scenarios + +| Attack | 1st defense | 2nd defense | 3rd defense | Reported by | +|--------|-------------|-------------|-------------|-------------| +| Read `~/.ssh/id_rsa` | bwrap: file not mounted → `ENOENT` | Landlock: no `READ_FILE` right → `EACCES` | — | eBPF: logs `EACCES` | +| Write to `.env` | bwrap: masked with `/dev/null` bind → writes go nowhere | Landlock: no `WRITE_FILE` right → `EACCES` | — | eBPF: logs `EACCES` | +| `curl evil.com` | bwrap: `--unshare-net` → no host network | tun2socks: routes through proxy → proxy denies | — | eBPF: logs `ECONNREFUSED` | +| `ptrace(pid)` | seccomp: syscall 101 blocked → `EPERM` | — | — | eBPF: logs `EPERM` | +| `mount /dev/sda /mnt` | seccomp: syscall 165 blocked → `EPERM` | — | — | eBPF: logs `EPERM` | +| `reboot` (inside sandbox) | seccomp: syscall 169 blocked → `EPERM` | — | — | eBPF: logs `EPERM` | +| `LD_PRELOAD=evil.so cmd` | Env sanitization: `LD_PRELOAD` stripped before sandbox 
starts | — | — | — | +| `greywall -- "git push"` | Command blocker: denied before sandbox created (caller-side only) | — | — | — | +| `git push` (inside sandbox) | Network namespace: no outbound connectivity | tun2socks: proxy denies | — | eBPF: logs `ECONNREFUSED` | +| Move `.bashrc` elsewhere | bwrap: `.bashrc` mounted read-only → `EROFS` | Landlock: no `REMOVE_FILE` right | — | eBPF: logs `EROFS` | +| Create file in `/etc` | bwrap: `/etc` mounted `--ro-bind` → `EROFS` | Landlock: no `MAKE_REG` right → `EACCES` | — | eBPF: logs `EROFS` | + +### The Layering Principle + +``` +┌──────────────────────────────────────────────────────────────┐ +│ │ +│ LAYER 0: CALLER-SIDE PRE-FILTER (before sandbox) │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Command blocking: rejects the TOP-LEVEL command │ │ +│ │ string passed to greywall by the caller (e.g., an │ │ +│ │ AI agent framework). Does NOT intercept commands │ │ +│ │ executed inside the sandbox by child processes. │ │ +│ │ Env sanitization: strip LD_PRELOAD, DYLD_* etc. │ │ +│ │ from the environment before launching the sandbox. │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ LAYER 1: ISOLATION (bwrap namespaces) │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ What it does: Creates a separate world │ │ +│ │ Filesystem: Only mounted files are visible │ │ +│ │ Network: Separate network stack, no host access │ │ +│ │ PID: Can't see/signal host processes │ │ +│ │ │ │ +│ │ Analogy: Putting process in a room with only │ │ +│ │ selected items. Items not in the room don't exist. 
│ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ LAYER 2: ENFORCEMENT (seccomp + Landlock) │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Seccomp: "You cannot USE these capabilities" │ │ +│ │ → Blocks ptrace, mount, reboot, kernel module load │ │ +│ │ → Operates on syscall numbers (capability-based) │ │ +│ │ → Cannot make path-based decisions │ │ +│ │ │ │ +│ │ Landlock: "You cannot ACCESS these paths/operations" │ │ +│ │ → Controls read/write/execute/create/delete per path │ │ +│ │ → Operates on filesystem paths (resource-based) │ │ +│ │ → Cannot block non-filesystem syscalls │ │ +│ │ │ │ +│ │ Together: Seccomp blocks dangerous capabilities, │ │ +│ │ Landlock restricts resource access. Orthogonal. │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ LAYER 3: NETWORK CONTROL (tun2socks + bridges) │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ What it does: Routes ALL traffic through SOCKS5 proxy │ │ +│ │ TUN device captures every packet (transparent) │ │ +│ │ Proxy applies allowlist (domain/IP filtering) │ │ +│ │ DNS either bridged to host or forced through proxy │ │ +│ │ │ │ +│ │ Why not just bwrap --unshare-net? │ │ +│ │ → That blocks ALL network. We need selective access. │ │ +│ │ → tun2socks re-enables controlled network via proxy. │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ LAYER 4: OBSERVATION (eBPF) │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ What it does: Watches for denied operations │ │ +│ │ Doesn't block anything — reports what was blocked │ │ +│ │ Catches EACCES/EPERM/EROFS from ANY layer │ │ +│ │ │ │ +│ │ Why needed: Without monitoring, violations are silent. │ │ +│ │ eBPF tells you WHAT was blocked and WHY. │ │ +│ │ Essential for debugging sandbox configs. 
│ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────────────────────────────────┘ +``` + +### Seccomp vs Landlock — Why Both? + +These two enforcement layers are **orthogonal** — they protect against completely different threat classes: + +``` + Seccomp BPF Landlock + ─────────── ──────── +Domain: Syscall numbers Filesystem paths +Question asked: "Is this syscall allowed?" "Can this path be accessed?" +Granularity: Per-syscall Per-path + per-operation +Can block ptrace: YES NO (not a filesystem op) +Can block mount: YES NO (not a filesystem op) +Can block reboot: YES NO +Can block read(): NO (too common) YES (per-path) +Can block write(): NO (too common) YES (per-path) +Can block mkdir(): NO (too common) YES (per-path) +Can block rename: NO YES (REFER right, ABI v2+) +Runs at: Syscall entry VFS operation +Fallback: Returns EPERM Returns EACCES +``` + +Seccomp answers: "Should this process be able to call ptrace() at all?" +Landlock answers: "Should this process be able to read /home/user/.ssh/id_rsa?" + +They have zero overlap in what they protect. Using only one leaves an entire class of attacks unaddressed. + +### Bubblewrap vs Landlock — Why Both? + +These two have **overlapping** filesystem protection, deliberately: + +``` + Bubblewrap mounts Landlock + ───────────────── ──────── +Mechanism: Mount-time visibility Runtime access control +When applied: Before process starts After process starts +Failure mode: File doesn't exist File exists but EACCES +Granularity: Whole directories Individual files + operations +Symlink handling: Resolves before mounting Kernel handles +Race conditions: Possible (mount vs access) None (kernel-enforced) +Edge cases: /etc/resolv.conf symlinks, None known + cross-mount boundaries +``` + +**Why Landlock exists as a backup to bwrap:** +1. 
**Mount misconfiguration**: If bwrap accidentally makes a file visible (symlink edge case, mount order issue), Landlock still denies access +2. **Operation granularity**: bwrap can make a file visible but read-only. Landlock can additionally block `EXECUTE`, `REMOVE`, `TRUNCATE` separately +3. **Defense in depth**: Two independent mechanisms with different failure modes. Both must fail for access to be granted + +**Real example from the code**: `.env` files are masked with `--ro-bind /dev/null .env` in bwrap. But the Landlock layer ALSO denies `READ_FILE` on those paths. If the bwrap mask somehow fails (race condition, mount order), Landlock catches it. + +--- + +## Part 2: macOS Architecture + +### What macOS Has (Single-Layer Model) + +macOS uses Apple's Seatbelt (sandbox-exec) — a **single, comprehensive policy engine**: + +``` +┌──────────────────────────────────────────────────────────────┐ +│ User Command (sandboxed) │ +├──────────────────────────────────────────────────────────────┤ +│ Seatbelt SBPL profile (sandbox-exec) │ +│ │ +│ ┌─ Filesystem rules ──────────────────────────────────┐ │ +│ │ file-read-data, file-write-data, file-write-unlink │ │ +│ │ (deny default) + explicit allows per path │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ ┌─ Network rules ─────────────────────────────────────┐ │ +│ │ network-outbound, network-inbound, network-bind │ │ +│ │ IP:port based filtering │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ ┌─ IPC rules ─────────────────────────────────────────┐ │ +│ │ mach-lookup (XPC service allowlist) │ │ +│ │ ~20 essential macOS services whitelisted │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ ┌─ Hardware rules ────────────────────────────────────┐ │ +│ │ iokit-open, iokit-get-properties │ │ +│ │ sysctl-read, sysctl-write │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ ┌─ Log tagging ───────────────────────────────────────┐ │ +│ │ CMD64__END_ in deny 
messages │ │ +│ │ Violations go to system.log │ │ +│ └─────────────────────────────────────────────────────┘ │ +│ │ +├──────────────────────────────────────────────────────────────┤ +│ Environment sanitization (DYLD_* stripped) │ +├──────────────────────────────────────────────────────────────┤ +│ Command blocking engine (shared with Linux) │ +├──────────────────────────────────────────────────────────────┤ +│ Host macOS kernel (MACF hooks enforce Seatbelt) │ +└──────────────────────────────────────────────────────────────┘ +``` + +Seatbelt is enforced by MACF (Mandatory Access Control Framework) kernel hooks — the same infrastructure that enforces SIP. In some ways, it's architecturally similar to Linux's Landlock (kernel-level, path-based), but it covers more domains (network, IPC, hardware) in a single profile rather than requiring separate technologies. + +### What Seatbelt Already Covers (vs Linux Equivalent) + +| Linux component | macOS Seatbelt equivalent | Coverage | +|----------------|--------------------------|----------| +| Landlock filesystem control | `file-read-data`, `file-write-data`, `file-write-unlink` | Equivalent | +| Seccomp (partially) | `process-exec`, `process-fork`, `signal`, `mach-lookup` | Partial — blocks at operation level, not raw syscall level | +| Network namespace rules | `network-outbound`, `network-inbound`, `network-bind` | Similar — IP/port filtering, but no true isolation | +| No equivalent on Linux | `mach-lookup` (XPC service control) | macOS-only — controls which system services are accessible | +| No equivalent on Linux | `iokit-open`, `sysctl-read/write` | macOS-only — hardware and kernel parameter access | + +**Key insight**: Seatbelt is not "weaker" than any single Linux component — it's a **different architecture**. It's one comprehensive policy engine instead of multiple composable ones. 
The trade-off is: less defense-in-depth (single point of failure) but broader coverage per layer (filesystem + network + IPC + hardware in one profile). + +--- + +## Part 3: Security Level Comparison + +### Per-Threat Security Rating + +| Threat | Linux | macOS | Gap | +|--------|:-----:|:-----:|-----| +| **Read unauthorized files** | ●●●●● | ●●●○○ | Linux: 2 layers (bwrap+Landlock). macOS: 1 layer (Seatbelt). No backup if Seatbelt misconfigured | +| **Write to protected files** | ●●●●● | ●●●●○ | Linux: 2 layers + .env masking. macOS: Seatbelt + unlink blocking. Close, but single layer | +| **Exfiltrate data via network** | ●●●●● | ●●○○○ | Linux: namespace isolation + transparent proxy (catches ALL traffic). macOS: env var proxy (apps can ignore it) | +| **DNS exfiltration** | ●●●●○ | ●○○○○ | Linux: DNS bridge with filtering. macOS: no DNS control at all | +| **Dangerous syscalls (ptrace, mount)** | ●●●●● | ●●●○○ | Linux: seccomp blocks 24 syscalls. macOS: SIP blocks some, Seatbelt blocks some process ops, but no explicit syscall filtering | +| **Library injection (LD_PRELOAD)** | ●●●●● | ●●●●● | Both strip dangerous env vars. macOS additionally has SIP preventing DYLD injection on system binaries | +| **Process visibility** | ●●●●● | ●○○○○ | Linux: PID namespace. macOS: no PID isolation (can see all processes) | +| **Command execution control** | ●●●●● | ●●●●● | Same engine on both platforms | +| **SSH command control** | ●●●●● | ●●●●● | Same engine on both platforms | +| **Protected file awareness (.env, .gitconfig)** | ●●●●● | ●●●●● | Same lists, same protection on both | +| **Violation detection/reporting** | ●●●●○ | ●●○○○ | Linux: eBPF real-time monitoring. macOS: violations in system.log, not captured programmatically | +| **Config auto-generation (learning)** | ●●●●● | ○○○○○ | Linux: strace-based learning. macOS: not implemented. 
Two viable approaches: (1) Seatbelt `(allow (with report))` + `log stream` (no root), (2) `sudo fs_usage` (requires admin, more reliable — not dtrace-based, unaffected by SIP) | +| **IPC control** | ●●○○○ | ●●●●● | macOS: explicit Mach IPC allowlist. Linux: no equivalent (D-Bus not filtered) | +| **Sandbox escape via child process** | ●●●●● | ●●●○○ | Linux: namespace inherits to all children. macOS: sandbox-exec profile inherits, but no namespace boundary | + +### Overall Security Posture + +``` +Linux: ████████████████████░░ ~90% (multi-layer, defense in depth) +macOS: ████████████░░░░░░░░░░ ~60% (single comprehensive layer, gaps in network/monitoring/learning) +``` + +The 30% gap breaks down as: +- ~10% Network control (no transparent proxy, no DNS filtering) +- ~8% No learning mode (can't auto-generate configs) +- ~5% No violation monitoring (can't see what's being blocked) +- ~4% No PID isolation (can see host processes) +- ~3% No defense-in-depth for filesystem (single Seatbelt layer) + +--- + +## Part 4: Closing the macOS Gap (Constraint: No Root, No Containers) + +### Hard Constraints + +These are non-negotiable requirements for any macOS solution: + +1. **No root/sudo/admin access** — must run as a regular user +2. **No containerization** — process runs on the host (macOS 26 Linux containers are out of scope) +3. **SIP enabled** — default macOS, cannot ask users to disable it +4. **All traffic must be captured** — same fail-closed behavior as Linux + +### Tool-by-Tool Privilege Audit + +Every macOS tracing/filtering tool was evaluated against the "no root" constraint: + +| Tool | Requires root? | Requires admin? | Works as user? | Verdict | +|------|:--------------:|:---------------:|:--------------:|---------| +| `sandbox-exec` (Seatbelt) | No | No | **Yes** | Current approach, works | +| `log stream` (violation monitoring) | No | No | **Yes** | Viable for monitoring | +| `fs_usage` (filesystem tracing) | **Yes** | — | No | Blocked without sudo. 
**Viable for learning mode with admin access** — not dtrace-based, unaffected by SIP. See "macOS Learning Mode via fs_usage" section. | +| `eslogger` (Endpoint Security) | **Yes** + Full Disk Access | — | No | Blocked | +| `dtrace` / `dtruss` / `opensnoop` | **Yes** + SIP blocks entirely | — | No | **Blocked even with sudo** — SIP disables the `syscall` dtrace provider system-wide, not just for SIP-protected binaries. All three tools are dtrace-based and fail identically. | +| `pfctl` / pf rules | **Yes** | — | No | Blocked | +| Network Extensions | No | **Yes** (system dialog) | No | Blocked | +| Endpoint Security framework | **Yes** + Apple-restricted entitlement | — | No | Blocked | +| TUN/TAP devices | **Yes** | — | No | Blocked | +| `DYLD_INSERT_LIBRARIES` | No | No | **Partial** | Works for non-hardened binaries only | + +**Only three mechanisms work as a regular user:** `sandbox-exec`, `log stream`, and `DYLD_INSERT_LIBRARIES` (with caveats). + +--- + +### 1. Learning Mode + +#### Why traditional tracing tools are all blocked + +| Tool | Why it fails without root | +|------|---------------------------| +| `fs_usage` | Requires root. Uses kernel tracing facility (`kdebug`). "Permission denied" without sudo. | +| `eslogger` | Requires root + Full Disk Access. Endpoint Security API enforces `ES_NEW_CLIENT_RESULT_ERR_NOT_PRIVILEGED`. | +| `dtrace` / `dtruss` | Requires root. With SIP enabled, even root cannot trace SIP-protected system binaries (`/bin/sh`, `/usr/bin/env`). | +| `opensnoop` | Requires root. Built on DTrace — same limitation. | +| Endpoint Security framework | Requires `com.apple.developer.endpoint-security.client` restricted entitlement (Apple approval, months-long process) AND root. | +| FileMonitor (Objective-See) | Built on Endpoint Security — same entitlement + root requirement. | +| FSEvents API | Works as user BUT has **no per-process attribution** — it reports filesystem changes without recording which process caused them. 
Useless for learning. | +| kqueue `EVFILT_VNODE` | Works as user for monitoring writes/deletes/renames, BUT **cannot detect reads/opens** (no `NOTE_OPEN` or `NOTE_ACCESS` flag). Also no per-process attribution. | + +**There is no standalone filesystem tracing tool on macOS that works without root.** Apple considers process tracing a privileged operation. + +#### Viable approach: Seatbelt `(allow ... (with report))` + `log stream` + +**This is positive learning — like strace — not iterative reverse learning.** The process runs once, to completion, and greywall captures what it accessed. + +**How it works:** + +The SBPL language supports a `(with report)` modifier on `allow` rules. By default, only denied operations are logged. Adding `(with report)` causes the sandbox kernel to also log *permitted* operations to the macOS unified log. These entries are readable via `log stream` without root. + +```sbpl +; Learning profile — allow everything, report everything +(version 1) +(allow default) ; permissive baseline + +; Report all file reads +(allow file-read-data (with report) (subpath "/")) + +; Report all file writes +(allow file-write-data (with report) (subpath "/")) +(allow file-write-create (with report) (subpath "/")) +(allow file-write-unlink (with report) (subpath "/")) + +; Report all network +(allow network-outbound (with report)) +``` + +On the host side, greywall captures the log stream: +```bash +log stream --style compact --predicate 'sender=="Sandbox"' +``` + +**Single-pass learning flow:** + +``` +greywall --learning -- + ↓ +1. Generate a PERMISSIVE Seatbelt profile with (allow ... (with report)) on + every operation category (file-read, file-write, network, etc.) + ↓ +2. Start `log stream` in background goroutine, filtering for Sandbox events + with the session suffix tag + ↓ +3. 
Run: sandbox-exec -p '' /bin/bash -c '' + → Process runs to completion normally (all operations allowed) + → Every file read/write/network operation logged by the sandbox kernel + ↓ +4. After command exits, stop `log stream`, parse captured events + → Extract file paths and operation types (read vs write) + → Filter out system paths, temp paths, shared libraries + ↓ +5. Feed into existing CollapsePaths() and buildTemplate() + (same platform-independent code as Linux learning.go) + ↓ +6. Save template to ~/.config/greywall/learned/.json + → Auto-loaded on next run +``` + +**Comparison with Linux strace approach:** + +| Aspect | Linux (strace) | macOS (Seatbelt report) | +|--------|----------------|------------------------| +| Mechanism | ptrace syscall tracing | MACF kernel hooks + unified log | +| Runs as user? | Yes (ptrace same-UID) | Yes (sandbox-exec + log stream) | +| Single-pass? | Yes | Yes | +| Traces child processes? | Yes (`-f` flag) | Yes (sandbox profile inherits to children) | +| Read vs write distinction? | Yes (O_RDONLY vs O_WRONLY flags) | Yes (file-read-data vs file-write-data operations) | +| Path detail? | Full path in syscall args | Full path in log message | +| Interactive/TUI support? | Yes (foreground strace) | Yes (sandbox-exec preserves terminal) | +| Security during learning? | Relaxed (no seccomp/Landlock) | Relaxed (permissive profile) | +| Output format | Structured syscall log | Unified log text (needs parsing) | + +**What this does NOT capture** (limitations vs strace): +- Exact open flags (O_CREAT, O_TRUNC, O_APPEND) — Seatbelt logs the operation type but not the libc-level flags +- File descriptor numbers — not relevant for template generation +- Syscall return values — Seatbelt doesn't log success/failure details for allowed operations + +These limitations don't matter for template generation — the goal is "which paths were read, which were written," and Seatbelt `(with report)` provides exactly that. 
+ +**Implementation requirements:** +- New file: `learning_darwin.go` (parallel to `learning_linux.go`) +- `CheckLearningAvailable()` → verify `sandbox-exec` exists (always true on macOS) +- `GenerateLearningProfile()` → SBPL with `(allow ... (with report))` rules +- `ParseSandboxLog()` → parse `log stream` output, extract paths + operation types, map to `StraceResult{WritePaths, ReadPaths}` +- Rest of pipeline: reuse `GenerateLearnedTemplate()`, `CollapsePaths()`, `buildTemplate()` from `learning.go` + +#### Alternative: Iterative reverse learning (fallback) + +If the `(with report)` log entries prove too noisy or unreliable, a fallback approach exists: + +1. Run with a restrictive sandbox profile +2. Capture denial messages via `log stream --predicate 'sender=="Sandbox"'` +3. Parse denials → discover what paths the process needs +4. Relax the config, run again +5. Repeat until no more denials + +This is slower (multi-pass, process may fail early on first run) but doesn't depend on `(with report)` behavior. + +--- + +### 2. Violation Monitoring — WORKS Without Root + +**`log stream` is the one viable tool.** It can filter sandbox violations without root: + +```bash +log stream --style compact --predicate 'sender=="Sandbox"' +``` + +Greywall already generates a unique `sessionSuffix` per sandbox session (in `macos.go`). All Seatbelt deny rules include this suffix in the `(with message ...)` tag, so violations can be filtered to the current session. + +**Implementation plan:** +- Start `log stream` as a subprocess when `--monitor` flag is set +- Filter predicate: `sender=="Sandbox" AND eventMessage CONTAINS ""` +- Parse output in a goroutine, extract violation type and path +- Format and print to stderr (matching Linux eBPF output format) + +This is directly achievable and proven by Anthropic's sandbox-runtime. + +--- + +### 3. 
DNS Filtering — IMPOSSIBLE as Transparent, PARTIAL via Seatbelt + +**The problem:** On macOS, DNS resolution goes through `mDNSResponder` via a Unix domain socket (`/private/var/run/mDNSResponder`). There is no way to intercept or redirect DNS queries for a specific process without root. + +**Why a local DNS proxy doesn't work without root:** +- A user-level process CAN bind to a high port (e.g., 5353) and run a DNS proxy +- But the sandboxed process uses the system resolver, which queries `mDNSResponder` via Unix socket, NOT via UDP to a configurable nameserver +- You cannot override per-process DNS on macOS without root (`scutil --dns` requires root, `networksetup` requires admin) +- Setting `NAMESERVER` env var does nothing — macOS programs use `getaddrinfo()` which goes through `mDNSResponder` + +**What Seatbelt CAN do:** +- Block the `mDNSResponder` Unix socket connection: `(deny network-outbound (remote unix-socket (path-literal "/private/var/run/mDNSResponder")))` — this blocks ALL DNS, including resolution of allowed hosts +- There is no Seatbelt rule to selectively allow DNS for specific domains + +**Realistic options:** +1. **Block all DNS** via Seatbelt (deny the mDNSResponder socket) — nuclear option, breaks most programs +2. **Allow all DNS** (current behavior) — no filtering +3. **DYLD_INSERT_LIBRARIES** to intercept `getaddrinfo()` and filter at the libc level — works for non-hardened binaries only (see section 4) + +**Verdict:** Per-domain DNS filtering is not possible on macOS as a regular user without `DYLD_INSERT_LIBRARIES`, and even then only for non-hardened binaries. + +--- + +### 4. Transparent Network Proxy — PARTIAL via DYLD_INSERT_LIBRARIES + +This is the most nuanced gap. There is exactly one mechanism that works without root: library interposition. + +**How it works:** + +``` +┌─────────────────────────────────────────────────────┐ +│ greywall sets up: │ +│ 1. Local SOCKS5 proxy on localhost:PORT │ +│ 2. 
redirect.dylib (intercepts connect/sendto/etc.) │ +│ 3. Seatbelt profile: deny all network EXCEPT │ +│ localhost:PORT │ +│ 4. DYLD_INSERT_LIBRARIES=redirect.dylib │ +├─────────────────────────────────────────────────────┤ +│ sandbox-exec runs the target with: │ +│ - Seatbelt profile active (kernel-enforced) │ +│ - redirect.dylib loaded (user-space interposition) │ +├─────────────────────────────────────────────────────┤ +│ When target calls connect(): │ +│ - redirect.dylib intercepts → rewrites to proxy │ +│ - Seatbelt allows because dest is localhost:PORT │ +│ │ +│ If interposition fails (hardened binary): │ +│ - connect() goes to original destination │ +│ - Seatbelt BLOCKS because dest is not localhost:PORT │ +│ - Result: EPERM → no network access (FAIL-CLOSED) │ +└─────────────────────────────────────────────────────┘ +``` + +**The key property is fail-closed:** Even when DYLD interposition doesn't work, the Seatbelt network deny is kernel-enforced and cannot be bypassed. The process simply has no network access rather than unfiltered access. + +**When DYLD_INSERT_LIBRARIES works:** +- Homebrew-installed binaries (`/opt/homebrew/bin/*`) +- User-compiled binaries +- Binaries without hardened runtime or `__RESTRICT` segment +- Most CLI tools installed via package managers (npm global, pip, cargo, etc.) + +**When it does NOT work (dyld strips the variable):** +- System binaries (`/usr/bin/*`, `/bin/*`, `/sbin/*`) — SIP-protected +- Mac App Store apps — hardened runtime +- Binaries with `CS_RESTRICT`, `CS_REQUIRE_LV`, or `CS_RUNTIME` code signing flags +- Binaries with `__RESTRICT,__restrict` segment + +**Execution order matters:** `DYLD_INSERT_LIBRARIES` injection happens BEFORE Seatbelt profile enforcement. The dylib loads during `dyld` initialization, then `sandbox-exec` applies the profile via `__mac_syscall`. This is documented behavior and the basis of multiple CVEs. 
+ +**What the redirect.dylib would need to intercept:** +- `connect()` — TCP connection establishment +- `sendto()` — UDP (including DNS if not going through mDNSResponder) +- `getaddrinfo()` — DNS resolution (for DNS filtering) +- `bind()` — Prevent binding to non-localhost addresses + +**Implementation approach:** +- Write a small C dylib that registers its replacement functions as tuples in the `__DATA,__interpose` section (marked `__attribute__((used))` so the linker keeps them) +- Dylib redirects all `connect()` calls to a local SOCKS5 proxy +- Dylib intercepts `getaddrinfo()` for domain-based filtering +- Embed the compiled dylib in the greywall binary (extract to temp at runtime, like tun2socks on Linux) +- Seatbelt profile denies all network except `localhost:PROXY_PORT` + +**Comparison with Linux approach:** + +| Aspect | Linux (tun2socks) | macOS (DYLD + Seatbelt) | +|--------|-------------------|-------------------------| +| Capture rate | 100% (kernel TUN device) | Variable (depends on binary hardening) | +| Bypass possible? | No (namespace isolation) | Yes (hardened/system binaries) | +| Fail mode on bypass | N/A | Fail-closed (Seatbelt blocks all network) | +| Works for Go binaries? | Yes (kernel-level) | Needs verification (on macOS the Go runtime calls into libSystem rather than issuing raw syscalls, so interposition may apply) | +| Works for static binaries? | Yes (kernel-level) | No (no dynamic linker) | +| Root required? | No (bwrap is unprivileged) | No | + +--- + +### 5. Summary: What's Achievable + +| Linux feature | macOS equivalent | Works as user? | Coverage | +|---------------|-----------------|:--------------:|----------| +| **Filesystem sandbox** | Seatbelt SBPL | Yes | Full — kernel-enforced, same as Linux | +| **Network deny** | Seatbelt `(deny network*)` | Yes | Full — kernel-enforced | +| **Network redirect** | DYLD interposition + Seatbelt | Yes | Partial — non-hardened binaries only. Fail-closed for hardened. 
| +| **DNS filtering** | DYLD `getaddrinfo()` intercept | Yes | Partial — same binary restrictions as network redirect | +| **Violation monitoring** | `log stream` parsing | Yes | Full — captures all Seatbelt denials | +| **Learning mode** | Seatbelt `(allow (with report))` + `log stream` | Yes | Full for sandbox-exec scope — single-pass, positive learning | +| **Env sanitization** | Strip DYLD_* | Yes | Full — already implemented | +| **Command blocking** | Shared engine | Yes | Full — already implemented | +| **Seccomp equivalent** | Seatbelt operation-level rules | Yes | Partial — covers process/network/IPC but not raw syscall numbers | +| **PID isolation** | — | — | Impossible | +| **Landlock equivalent** | — | — | Impossible | + +### 6. Honest Assessment: The Fundamental macOS Limitation + +**The core problem is Apple's security architecture:** + +On Linux, sandboxing tools run as an unprivileged user because the kernel provides user namespaces — a mechanism explicitly designed for unprivileged process isolation. `bwrap` uses `CLONE_NEWUSER` to create sandboxes without root. `strace` uses `ptrace` which is allowed between processes with the same UID. + +On macOS, Apple's security model is: **the OS protects the user FROM processes, not processes FROM each other.** All the powerful mechanisms (Endpoint Security, Network Extensions, DTrace, pf) are designed for system administrators and enterprise MDM, not for user-level process isolation. Apple's answer to "how do I sandbox a process?" is "use the App Sandbox entitlement" — which requires being an app developer with code signing, not a CLI tool. + +The result: +- **Filesystem control**: Fully solved. `sandbox-exec` works as a user and is kernel-enforced. +- **Network control**: Partially solved. Seatbelt deny is kernel-enforced (fail-closed), but transparent redirection only works for non-hardened binaries via DYLD interposition. +- **Observability**: Mostly solved. 
`log stream` works for both violation monitoring and learning mode via Seatbelt's `(with report)` mechanism. +- **Process isolation**: Not possible. No user namespaces on macOS. + +**There is no path to full parity with Linux without root privileges** for network redirection and process isolation. But filesystem sandboxing, learning mode, and violation monitoring are all achievable as a regular user — the macOS gap is narrower than it first appears. + +### What Would Change With Root + +For documentation completeness — if the "no root" constraint were relaxed: + +| Feature | Tool | Privilege needed | Improvement over user-level | +|---------|------|-----------------|----------------------------| +| Learning mode (structured) | `fs_usage` | `sudo` | More structured output, exact flags | +| Learning mode (JSON) | `eslogger` | `sudo` + Full Disk Access | Process tree tracking, JSON format | +| DNS filtering (system-level) | Local DNS proxy + `scutil` | `sudo` | System-level DNS redirect, works for all programs | +| Transparent proxy (all traffic) | `pf` rules | `sudo` | Catches 100% of traffic including hardened binaries | +| Transparent proxy (Apple-approved) | Network Extension | Admin (system dialog) | Officially supported, future-proof | + +### macOS Learning Mode via `fs_usage` (with admin access) + +#### Why dtrace-based tools are ALL blocked by SIP + +A common misconception is that SIP only blocks tracing of SIP-protected binaries (those in `/usr/bin/`, `/System/`, etc.). In reality, **SIP disables the `syscall` dtrace provider entirely**, regardless of what process you're targeting. 
Even `sudo dtrace` fails: + +``` +$ sudo dtrace -n 'syscall::open*:entry /execname == "myapp"/ { printf("%s", copyinstr(arg0)); }' +dtrace: system integrity protection is on, some features will not be available +dtrace: failed to initialize dtrace: DTrace requires additional privileges +``` + +This eliminates **all three** commonly suggested tools: + +| Tool | Based on | Status with SIP | +|------|----------|----------------| +| `dtrace` (custom scripts) | dtrace | **Blocked** — syscall provider disabled | +| `dtruss` (strace equivalent) | dtrace | **Blocked** — same reason | +| `opensnoop` (file open tracer) | dtrace | **Blocked** — same reason | + +Lifting this restriction means partially disabling SIP (`csrutil enable --without dtrace`), which requires rebooting into recovery mode and is not a viable requirement for end users. + +#### `fs_usage` — the only viable macOS tracing tool + +`fs_usage` is **not dtrace-based**. It reads directly from the kernel's `kdebug` tracing facility, which SIP does not restrict. It works with just `sudo`: + +```bash +sudo fs_usage -w -f filesys opencode +``` + +This was confirmed working on macOS with SIP enabled (tested February 2026). + +**Key properties:** +- **Name-based filtering**: `fs_usage ` filters by comm name, not PID. Start it before the process exists — it catches events from the moment any process with that name spawns. No PID race condition. +- **Full path capture**: Output includes resolved absolute paths for every filesystem operation. +- **Operation types**: Distinguishes reads (`open`, `read`) from writes (`open` with write flags, `write`, `mkdir`, `unlink`, `rename`). +- **No SIP issues**: Works for ALL binaries — system, Homebrew, user-compiled, hardened or not. +- **Pre-installed**: Ships with every macOS installation. + +#### Architecture: privilege separation + +The critical design point: `sudo` is used **only** for the monitoring process, completely isolated from the sandboxed command. The sandboxed command never receives elevated privileges. 
+ +``` +greywall (orchestrator, unprivileged) +├── sudo fs_usage -w -f filesys ← privileged monitor (separate process) +│ └── reads kernel kdebug tracebuffer ← passive observer, no interaction with sandbox +│ └── output piped to temp file +│ +└── sandbox-exec -p '' -- ← sandboxed command (unprivileged) + └── runs as current user, no sudo + └── no access to the monitor process +``` + +The monitor and the sandboxed command share no file descriptors, no IPC, no environment. The sudo elevation cannot leak to the sandbox. + +#### Learning flow on macOS + +``` +greywall --learning -- + ↓ +1. Prompt user for admin password (sudo) + ↓ +2. Start `sudo fs_usage -w -f filesys ` in background + → Output redirected to temp file + ↓ +3. Start `sandbox-exec -p '' -- ` + → Permissive Seatbelt profile (allow default) + → Runs as current user, no elevation + ↓ +4. Wait for sandboxed command to exit + ↓ +5. Kill fs_usage monitor, parse temp file + → Extract file paths and operation types + → Distinguish reads vs writes from fs_usage output format + → Filter out system paths, temp paths, shared libraries + ↓ +6. Feed into existing CollapsePaths() and buildTemplate() + → Same platform-independent pipeline as Linux + ↓ +7. Save template to ~/.config/greywall/learned/.json + → Auto-loaded on next run +``` + +#### Comparison: Linux strace vs macOS fs_usage for learning + +| Aspect | Linux (strace) | macOS (fs_usage) | +|--------|----------------|------------------| +| Underlying mechanism | ptrace syscall tracing | kdebug kernel tracing facility | +| Privilege required | None (ptrace own child) | `sudo` (for kdebug access) | +| SIP/security restriction | N/A | No restriction (not dtrace-based) | +| Process filtering | Automatic (traces child) | Name-based (`fs_usage `) | +| Catches from process start? | Yes (strace launches the command) | Yes (name filter, start monitor first) | +| Full paths? | Yes (in syscall arguments) | Yes (resolved in output) | +| Read vs write distinction? 
| Yes (O_RDONLY vs O_WRONLY flags) | Yes (operation type in output) | +| Traces child processes? | Yes (`-f` flag) | Yes (all processes matching name) | +| Works for all binaries? | Yes | Yes (no SIP restriction) | +| Interactive/TUI support? | Yes (foreground strace) | Yes (sandbox-exec preserves terminal) | +| Output format | Structured syscall log | Text with operation, path, timing | +| Post-processing | `ParseStraceLog()` | New `ParseFsUsageLog()` needed | + +#### Implementation requirements + +- New file: `learning_darwin.go` (replace current stub that returns "learning mode is only available on Linux") +- `CheckFsUsageAvailable()` → verify `fs_usage` exists (always true on macOS) and `sudo` access +- `StartFsUsageMonitor(cmdName string)` → spawn `sudo fs_usage -w -f filesys `, redirect output to temp file +- `StopFsUsageMonitor()` → kill the fs_usage process +- `ParseFsUsageLog(logPath string)` → parse fs_usage output, extract paths + operation types, return `*StraceResult{WritePaths, ReadPaths}` +- Rest of pipeline: reuse `GenerateLearnedTemplate()`, `CollapsePaths()`, `buildTemplate()` from `learning.go` +- Manager changes: `wrapCommandLearning()` needs a macOS path that starts the monitor, runs sandbox-exec with permissive profile, then stops monitor + +#### Open questions + +- **Name collision**: `fs_usage ` matches ALL processes with that name system-wide. If another process named `opencode` is running, its events would be captured too. Mitigation: post-filter by PID range and timing (events between sandbox start and stop), or warn the user. +- **sudo UX**: How to handle the sudo prompt? Options: (a) prompt inline before learning starts, (b) use `osascript` for a macOS password dialog, (c) require pre-authentication (`sudo -v`) before running greywall. +- **fs_usage output format**: The exact format varies slightly across macOS versions. Needs testing on macOS 13, 14, and 15 to ensure the parser is robust. 
+ +--- + +## Part 4b: Complete Functionality Table — Linux vs macOS + +Every sandboxing capability, how it's implemented on each platform, and what to do about gaps. + +### Filesystem Control + +| Functionality | Linux | macOS (current) | macOS (proposed, no root) | +|--------------|-------|-----------------|---------------------------| +| **Deny-by-default filesystem** | bwrap `--tmpfs /` + selective `--ro-bind` mounts | Seatbelt `(deny default)` + explicit `(allow file-read-data (subpath ...))` | Already implemented. Equivalent. | +| **Read control (system paths)** | bwrap mounts `/usr`, `/bin`, `/lib`, `/etc` read-only | Seatbelt `(allow file-read-data (subpath "/usr"))` etc. | Already implemented. Equivalent. | +| **Read control (user paths)** | bwrap mounts specific home dirs read-only | Seatbelt `(allow file-read-data (subpath "~/.nvm"))` etc. | Already implemented. Equivalent. | +| **Write control (CWD)** | bwrap `--bind cwd cwd` | Seatbelt `(allow file-write* (subpath cwd))` | Already implemented. Equivalent. | +| **Sensitive file masking (.env)** | bwrap `--ro-bind /dev/null .env` (file is replaced with empty) | Seatbelt `(deny file-read-data (literal ".env"))` (file exists but is unreadable) | Already implemented. Slightly different: Linux hides the file, macOS blocks access. Both prevent data leakage. | +| **Protected file read-only (.bashrc, .gitconfig)** | bwrap `--ro-bind` | Seatbelt `(deny file-write* (literal ...))` | Already implemented. Equivalent. | +| **Glob pattern matching** | Landlock: expand globs, add `PATH_BENEATH` rules | Seatbelt: convert globs to regex via `GlobToRegex()` | Already implemented. Equivalent. | +| **File movement blocking** | Landlock `REFER` right (ABI v2+) blocks cross-dir renames | Seatbelt `(deny file-write-unlink)` with ancestor path blocking | Already implemented. macOS is actually more explicit here. 
| +| **Symlink escape prevention** | bwrap resolves symlinks before mounting | Seatbelt resolves at kernel level (MACF hooks) | Already implemented. Both handle this. | +| **Kernel-level fs enforcement (defense-in-depth)** | Landlock LSM — second layer behind bwrap mounts | Not possible — no user-space LSM on macOS | **Gap: impossible.** Seatbelt is the single enforcement layer. No kernel LSM available to regular users. If Seatbelt has a bug, there's no backup. | +| **Filesystem visibility (mount namespace)** | bwrap `--tmpfs /` — files literally don't exist in sandbox | Not possible — no mount namespace on macOS | **Gap: impossible.** Seatbelt denies access but files remain visible to `stat()` (metadata). `(allow file-read-metadata)` is required globally for path resolution. | + +### Network Control + +| Functionality | Linux | macOS (current) | macOS (proposed, no root) | +|--------------|-------|-----------------|---------------------------| +| **Network isolation (namespace)** | bwrap `--unshare-net` — separate network stack | Not possible — no network namespace on macOS | **Gap: impossible.** Seatbelt denies network operations but the process shares the host network stack. | +| **Block all outbound** | bwrap network namespace (no interfaces) | Seatbelt `(deny network*)` — kernel-enforced via MACF | Already implemented. Equivalent effect. | +| **Transparent proxy (all traffic)** | tun2socks: TUN device captures all packets, routes through SOCKS5 | Not implemented | **Proposed: DYLD_INSERT_LIBRARIES** with `connect()` interposition dylib + Seatbelt `(deny network-outbound)` except proxy port. Works for non-hardened binaries. **Fail-closed**: hardened binaries get zero network (Seatbelt blocks). | +| **Proxy env var fallback** | `HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY` set when TUN unavailable | `HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY` set when proxy configured | Already implemented. Equivalent. Apps that respect env vars are proxied on both platforms. 
| +| **DNS filtering** | DnsBridge: socat relays DNS over Unix socket to host DNS server with filtering | Not implemented | **Proposed: DYLD_INSERT_LIBRARIES** intercepting `getaddrinfo()` for domain filtering. Same binary restrictions as network redirect. Alternative: block all DNS via Seatbelt (nuclear, breaks most programs). | +| **Exposed ports (inbound)** | ReverseBridge: socat on host forwards to Unix socket inside sandbox | Seatbelt `(allow network-bind (local ip "localhost:*"))` + `(allow network-inbound)` | Already implemented. Different mechanism, same effect. | +| **Unix socket access** | bwrap `--bind` mounts sockets into sandbox | Seatbelt `(allow network* (remote unix-socket (path-literal ...)))` | Already implemented. Equivalent. | + +### Syscall/Operation Control + +| Functionality | Linux | macOS (current) | macOS (proposed, no root) | +|--------------|-------|-----------------|---------------------------| +| **Block ptrace** | seccomp BPF: `ptrace(101)` → EPERM | Seatbelt: no `process-info*` for other processes; SIP also blocks ptrace on system binaries | Partially covered. Seatbelt + SIP provide equivalent protection for most scenarios. | +| **Block mount/umount** | seccomp BPF: `mount(165)`, `umount2(166)` → EPERM | Not relevant — regular users cannot mount on macOS (no `mount` privilege) | No action needed. macOS doesn't allow user-level mounting. | +| **Block reboot/shutdown** | seccomp BPF: `reboot(169)` → EPERM + command blocker denies `reboot` | Command blocker denies `reboot`, `shutdown` etc. | Already implemented. Command blocker is shared. macOS also requires admin for reboot. | +| **Block kernel module ops** | seccomp BPF: `init_module`, `delete_module` → EPERM | Not relevant — kext loading requires root + SIP exemption on macOS | No action needed. 
| +| **Block privilege escalation** | seccomp BPF: `personality`, `userfaultfd`, `perf_event_open`, `bpf`, `keyctl` | Not relevant — these syscalls either don't exist or are already restricted by SIP on macOS | No action needed. macOS kernel restricts these at the platform level. | +| **Mach IPC control** | Not applicable (Linux has no Mach IPC) | Seatbelt `(allow mach-lookup ...)` — allowlist of ~20 essential XPC services | Already implemented. macOS-specific. | +| **IOKit control** | Not applicable | Seatbelt `(allow iokit-open ...)` — GPU, power management | Already implemented. macOS-specific. | +| **sysctl control** | Not implemented (could add via seccomp argument filtering) | Seatbelt `(allow sysctl-read ...)` — 50+ sysctls allowlisted | Already implemented. macOS is ahead here. | + +### Process Isolation + +| Functionality | Linux | macOS (current) | macOS (proposed, no root) | +|--------------|-------|-----------------|---------------------------| +| **PID namespace** | bwrap `--unshare-pid` — can't see host processes | Not possible | **Gap: impossible.** No user namespace equivalent on macOS. Sandboxed process can see all host PIDs. | +| **Session isolation** | bwrap `--new-session` — detach from controlling terminal | Not possible | **Gap: impossible.** `sandbox-exec` does not detach sessions. | +| **Capability dropping** | bwrap drops caps; `--cap-add` for specific ones | Not applicable — macOS doesn't use Linux capabilities model | Different model. macOS uses entitlements, not capabilities. | + +### Observability + +| Functionality | Linux | macOS (current) | macOS (proposed, no root) | +|--------------|-------|-----------------|---------------------------| +| **Violation monitoring** | eBPF (bpftrace): attaches to syscall exit tracepoints, reports EACCES/EPERM/EROFS | Not implemented | **Proposed: `log stream`** filtering for `sender=="Sandbox"` with session suffix. Works as user. Proven by Anthropic's sandbox-runtime. 
Infrastructure (session suffix, log tagging) already exists in greywall codebase. | +| **Learning mode (positive)** | strace: traces file-access syscalls, parses log, generates config template | Not implemented | **Proposed: Seatbelt `(allow ... (with report))`** + `log stream`. Permissive sandbox profile logs every permitted operation. Parse log → extract paths → generate template. Single-pass, positive learning, works as user. | +| **Log tagging** | Not implemented (eBPF shows PID but no command tag) | Seatbelt `(with message "CMD64__END_")` — per-session violation tags | Already implemented. macOS is ahead here. | +| **Template auto-loading** | Saved to `~/.config/greywall/learned/.json`, loaded on next run | Not implemented | Reuse same mechanism — template format and auto-loading are platform-independent. | + +### Caller-Side Pre-Filter + +These operate BEFORE the sandbox is created. They filter the top-level command string submitted to `greywall -- ` by the caller (e.g., an AI agent framework). They do NOT intercept commands executed by child processes inside the sandbox — that's the job of the runtime layers above (seccomp blocks the `reboot` syscall, Seatbelt blocks filesystem operations, network namespace blocks connections, etc.). + +| Functionality | Linux | macOS (current) | macOS (proposed, no root) | +|--------------|-------|-----------------|---------------------------| +| **Command blocking** | Shared engine: deny/allow lists, shell parsing, nested expansion. Rejects `greywall -- "git push"` before sandbox starts. | Same | Already implemented. Shared code. | +| **SSH policy** | Shared engine: host patterns, remote command filtering | Same | Already implemented. Shared code. | +| **Env sanitization (Linux)** | Strip `LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, all `LD_*` | N/A on macOS | N/A. | +| **Env sanitization (macOS)** | N/A on Linux | Strip `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, all `DYLD_*` | Already implemented. 
**Note:** greywall must selectively NOT strip `DYLD_INSERT_LIBRARIES` when it's setting it for the redirect.dylib. The sanitization should strip user-provided DYLD vars but preserve greywall's own. | +| **Dangerous file lists** | Shared: `.gitconfig`, `.bashrc`, `.zshrc`, `.env*`, `.git/hooks`, `.vscode`, `.idea` | Same | Already implemented. Shared lists. | + +### Summary: Gap Status + +| Gap | Status | Reason | +|-----|--------|--------| +| Learning mode | **Solvable** | Two approaches: (1) Seatbelt `(allow (with report))` + `log stream` — no root needed, (2) `sudo fs_usage` — requires admin, more reliable, not dtrace-based, unaffected by SIP. | +| Violation monitoring | **Solvable** | `log stream` with session suffix filtering. No root needed. | +| Transparent network proxy | **Partially solvable** | DYLD interposition for non-hardened binaries. Fail-closed via Seatbelt for the rest. | +| DNS filtering | **Partially solvable** | DYLD `getaddrinfo()` intercept for non-hardened binaries. | +| Kernel-level fs backup (Landlock) | **Impossible** | No user-space LSM on macOS. | +| PID namespace isolation | **Impossible** | No user namespaces on macOS. | +| Network namespace isolation | **Impossible** | No user namespaces on macOS. | +| Mount namespace (file visibility) | **Impossible** | No mount namespace on macOS. | + +--- + +## Part 5: Detailed Component Reference + +### Linux Components + +#### Bubblewrap (bwrap) — Namespace Isolation + +Primary sandboxing primitive. Creates isolated namespaces for the sandboxed process. 
+ +**Namespaces used:** + +| Namespace | Flag | Purpose | +|-----------|------|---------| +| Network | `--unshare-net` | Isolates network stack (no host network access) | +| PID | `--unshare-pid` | Process ID isolation | +| Session | `--new-session` | Detach from controlling terminal (disabled in learning mode) | + +**Filesystem mounting — three modes:** + +| Mode | Trigger | Root mount | Description | +|------|---------|------------|-------------| +| **Deny-by-default** | `defaultDenyRead: true` (default) | `--tmpfs /` | Empty root; system paths selectively mounted read-only. CWD mounted read-write. | +| **Legacy** | `defaultDenyRead: false` | `--ro-bind / /` | Entire root filesystem mounted read-only; specific paths overridden. | +| **Learning** | `--learning` flag | `--ro-bind / /` | Root read-only, home + CWD writable. Relaxed for strace tracing. | + +**Special filesystem handling:** + +| Path | Mount type | Reason | +|------|-----------|--------| +| `/dev` | `--dev-bind` | Preserve host device permissions | +| `/proc` | `--proc` | Fresh procfs | +| `/tmp` | `--tmpfs` | Always writable, isolated from host | +| `/etc/resolv.conf` | Special cross-mount handling | May be a symlink crossing mount boundaries | +| `.env*` files | Empty file bind mount | Mask sensitive project files | + +#### Seccomp BPF — Syscall Filtering + +BPF program generated and loaded at sandbox startup to block dangerous syscalls. + +1. BPF program generated as raw bytecode (8 bytes per instruction) +2. Program loads the syscall number, compares against a blocklist +3. Blocked syscalls return `SECCOMP_RET_ERRNO | EPERM` (silent denial) +4. Unblocked syscalls return `SECCOMP_RET_ALLOW` +5. Filter passed to bwrap via file descriptor: `exec 3= SANDBOX_PID` to exclude system daemons +5. Violations formatted and printed to stderr + +Requirements: `CAP_BPF` or root, plus `bpftrace` installed. Graceful fallback if unavailable. 
+ +#### Transparent Network Proxy (tun2socks + bridges) + +Network traffic routed through SOCKS5 proxy via TUN device for allowlist-based filtering. + +``` +Sandboxed process → TUN device (198.18.0.0/15) → tun2socks → socat → Unix socket → host socat → external SOCKS5 proxy +``` + +**Bridges (all use socat + Unix sockets to cross namespace boundary):** + +| Bridge | Direction | Purpose | +|--------|-----------|---------| +| ProxyBridge | Sandbox → Host | SOCKS5 proxy access | +| DnsBridge | Sandbox → Host | DNS resolution | +| ReverseBridge | Host → Sandbox | Inbound connections to sandbox services | + +#### Environment Sanitization + +| Platform | Stripped variables | Risk | +|----------|-------------------|------| +| Linux | `LD_PRELOAD`, `LD_LIBRARY_PATH`, `LD_AUDIT`, `LD_DEBUG`, all `LD_*` | Shared library injection | +| macOS | `DYLD_INSERT_LIBRARIES`, `DYLD_LIBRARY_PATH`, `DYLD_FRAMEWORK_PATH`, all `DYLD_*` | Dylib injection | + +### macOS Components + +#### Seatbelt / sandbox-exec — Profile-Based Sandbox + +Uses Apple's built-in `sandbox-exec` command with SBPL (Sandbox Profile Language) profiles. Enforced by MACF kernel hooks. + +**Profile structure:** +1. `(deny default (with message "logTag"))` — block everything by default +2. Essential process permissions (`process-exec`, `process-fork`, `signal`) +3. Mach IPC allowlist (~20 essential system services) +4. IOKit access (GPU memory, power management) +5. sysctl reads (50+ hardware/kernel parameters) +6. Filesystem read rules (system paths, CWD, user tooling) +7. Filesystem write rules (CWD, tmpdir, default write paths) +8. Mandatory deny rules (.env, .gitconfig, .bashrc, .git/hooks) +9. Network rules (proxy host:port or localhost binding) +10. 
PTY support (optional) + +**Network control modes:** + +| Mode | Rules | Use case | +|------|-------|----------| +| Unrestricted | `(allow network*)` | Explicitly allowed | +| Full block | No network rules | Default (no proxy) | +| Local binding | `(allow network-bind (local ip "localhost:*"))` | Exposed ports | +| Proxy-based | `(allow network-outbound (remote ip "host:port"))` | External proxy access | + +### Shared Components (Both Platforms) + +#### Command Blocking Engine + +| Category | Commands | +|----------|----------| +| System control | `shutdown`, `reboot`, `halt`, `poweroff`, `init 0/6`, `systemctl poweroff/reboot/halt` | +| Kernel manipulation | `insmod`, `rmmod`, `modprobe`, `kexec` | +| Disk manipulation | `mkfs.*`, `fdisk`, `parted`, `dd if=` | +| Container escape | `docker run -v /:/`, `docker run --privileged` | +| Namespace escape | `chroot`, `unshare`, `nsenter` | + +Shell parsing splits on `|`, `||`, `&&`, `;`. Nested invocations (`bash -c 'git push'`) are expanded. + +#### SSH Policy + +Dedicated rules: allowed hosts (wildcards), denied hosts, allowed/denied remote commands, optional inheritance of global deny rules. + +#### Dangerous File/Directory Protection + +| Category | Items | +|----------|-------| +| Dangerous files | `.gitconfig`, `.gitmodules`, `.bashrc`, `.bash_profile`, `.zshrc`, `.zprofile`, `.profile`, `.ripgreprc`, `.mcp.json` | +| Dangerous directories | `.vscode`, `.idea`, `.claude/commands`, `.claude/agents` | +| Sensitive project files | `.env`, `.env.local`, `.env.development`, `.env.production`, `.env.staging`, `.env.test` | + +### Learning Mode (Linux-only, macOS planned) + +Traces filesystem access patterns and generates configuration templates. 
+ +``` +greywall --learning -- + → Relaxed sandbox (bwrap, no seccomp/Landlock) + → strace traces file-access syscalls + → Log parsed → paths extracted → collapsed → filtered + → JSON template generated → saved to ~/.config/greywall/learned/ + → Auto-loaded on next run of same command +``` + +**Why seccomp and Landlock are disabled in learning mode:** strace uses `ptrace(2)` to trace syscalls. Seccomp blocks `ptrace` → strace can't attach. Since the goal is observability (not security), all enforcement layers except basic bwrap are disabled. + +--- + +## Part 6: Configuration Reference + +```jsonc +{ + "extends": "base-config.json", + "network": { + "proxyUrl": "socks5://host:1080", + "dnsAddr": "localhost:3153", + "allowUnixSockets": ["/path/to.sock"], + "allowAllUnixSockets": false, + "allowLocalBinding": false, + "allowLocalOutbound": null + }, + "filesystem": { + "defaultDenyRead": true, + "allowRead": ["~/extra-data"], + "denyRead": ["~/.ssh/id_*"], + "allowWrite": ["."], + "denyWrite": [], + "allowGitConfig": false + }, + "command": { + "deny": ["git push", "npm publish"], + "allow": ["git status"], + "useDefaults": true + }, + "ssh": { + "allowedHosts": ["github.com"], + "deniedHosts": [], + "allowedCommands": ["git-upload-pack"], + "deniedCommands": [], + "allowAllCommands": false, + "inheritDeny": false + }, + "allowPty": false +} +``` + +## Runtime Dependencies + +| Dependency | Platform | Required | Purpose | +|------------|----------|----------|---------| +| bubblewrap (`bwrap`) | Linux | Yes | Namespace isolation | +| socat | Linux | Yes (if proxy/DNS) | Unix socket bridging | +| tun2socks | Linux | Embedded | Transparent network proxy | +| `ip` (iproute2) | Linux | Yes (if TUN) | TUN device setup | +| strace | Linux | Only for `--learning` | Filesystem access tracing | +| bpftrace | Linux | Optional | Violation monitoring | +| sandbox-exec | macOS | Yes (built-in) | Seatbelt sandbox | +| fs_usage | macOS | Only for `--learning` (requires `sudo`) | 
Filesystem access tracing (kdebug-based, not affected by SIP) | + +**Go dependencies (4):** `doublestar` (glob), `cobra` (CLI), `jsonc` (config), `golang.org/x/sys` (syscalls).