diff --git a/REPORT.md b/REPORT.md deleted file mode 100644 index eca5092..0000000 --- a/REPORT.md +++ /dev/null @@ -1,476 +0,0 @@ -# Linux Security Enhancement Report - -This document summarizes the implementation of enhanced Linux sandboxing with seccomp, Landlock, and eBPF monitoring to achieve feature parity with macOS. - -## Executive Summary - -| Goal | Status | Notes | -|------|--------|-------| -| Seccomp syscall filtering | ✅ Complete | Blocks 27 dangerous syscalls (arch-aware) | -| Landlock filesystem control | ✅ Complete | Applied via embedded wrapper | -| Glob pattern expansion | ✅ Complete | Uses doublestar library | -| eBPF violation monitoring | ✅ Complete | PID-range filtered | -| `--linux-features` flag | ✅ Complete | Shows available kernel features | -| Graceful fallback | ✅ Complete | Auto-detects features | -| bwrap namespace isolation | ✅ Complete | Primary isolation mechanism | - -### Landlock Implementation - -Landlock is now **fully applied** to sandboxed processes via an embedded wrapper approach: - -1. **Config passing**: User config is serialized to JSON and passed via `FENCE_CONFIG_JSON` env var -2. **Command preservation**: User command is wrapped with `bash -c` to preserve shell semantics (e.g., `echo hi && ls`) -3. **Timing**: The wrapper applies Landlock restrictions, then `exec()`s the user command -4. **Defense in depth**: Both bwrap mounts AND Landlock kernel restrictions are enforced - -```text -bwrap runs → export FENCE_CONFIG_JSON=... → fence --landlock-apply -- bash -c "user command" - → applies Landlock (using config from env) → exec(bash -c "user command") -``` - -**Note**: Landlock network restrictions are disabled—network isolation is handled by bwrap's network namespace. 
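The wrapper flow described above (read config from the environment, apply restrictions, then `exec()` the user command) can be sketched roughly as follows. This is a minimal illustration, not fence's actual code: the `Config` struct only mirrors the settings keys that appear in this report's examples, and `applyLandlock` is a hypothetical placeholder for the real ruleset setup.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// Config mirrors only the settings keys shown in this report's examples.
// The real fence config type may differ (assumption).
type Config struct {
	Filesystem struct {
		AllowWrite []string `json:"allowWrite"`
	} `json:"filesystem"`
	Network struct {
		AllowedDomains []string `json:"allowedDomains"`
	} `json:"network"`
}

// parseConfig decodes the JSON carried in the FENCE_CONFIG_JSON env var.
func parseConfig(raw string) (Config, error) {
	var cfg Config
	if raw == "" {
		return cfg, fmt.Errorf("FENCE_CONFIG_JSON is empty")
	}
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

// applyLandlock is a hypothetical placeholder: a real wrapper would create
// a Landlock ruleset from cfg and restrict the current process here.
func applyLandlock(cfg Config) error {
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: wrapper <command> [args...]")
		return
	}
	cfg, err := parseConfig(os.Getenv("FENCE_CONFIG_JSON"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "config:", err)
		os.Exit(1)
	}
	if err := applyLandlock(cfg); err != nil {
		fmt.Fprintln(os.Stderr, "landlock:", err)
		os.Exit(1)
	}
	path, err := exec.LookPath(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup:", err)
		os.Exit(1)
	}
	// Exec replaces the process image; restrictions applied to this
	// process beforehand stay in force for the user command.
	if err := syscall.Exec(path, os.Args[1:], os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec:", err)
		os.Exit(1)
	}
}
```

Applying restrictions before `exec()` rather than in a parent process means the user command never runs unrestricted, which is the point of the timing step above.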
- -## Implementation Details - -### New Files Created - -| File | Purpose | -|------|---------| -| `internal/sandbox/linux_features.go` | Feature detection (kernel version, Landlock ABI, capabilities) | -| `internal/sandbox/linux_seccomp.go` | Seccomp BPF filter generation and violation monitoring | -| `internal/sandbox/linux_landlock.go` | Landlock ruleset management and glob expansion | -| `internal/sandbox/linux_ebpf.go` | eBPF-based filesystem monitoring via bpftrace | -| `docs/linux-security-features.md` | User documentation for Linux features | - -### Stub Files (for non-Linux builds) - -- `internal/sandbox/linux_features_stub.go` -- `internal/sandbox/linux_seccomp_stub.go` -- `internal/sandbox/linux_landlock_stub.go` -- `internal/sandbox/linux_ebpf_stub.go` -- `internal/sandbox/linux_stub.go` - -### Modified Files - -| File | Changes | -|------|---------| -| `internal/sandbox/linux.go` | Integrated all security layers, seccomp via fd, Landlock wrapper | -| `internal/sandbox/linux_landlock.go` | Added `ApplyLandlockFromConfig()`, optimized glob expansion | -| `internal/sandbox/manager.go` | Cleanup handler | -| `cmd/fence/main.go` | Landlock wrapper mode (`--landlock-apply`), reads config from `FENCE_CONFIG_JSON` | -| `ARCHITECTURE.md` | Updated platform comparison and monitoring docs | -| `docs/README.md` | Added link to new Linux docs | -| `go.mod` | Added `golang.org/x/sys` dependency | - -## Feature Parity Analysis - -### ✅ Fully Implemented - -| Feature | macOS | Linux | Notes | -|---------|-------|-------|-------| -| Subtree patterns (`dir/**`) | Seatbelt regex | Landlock PATH_BENEATH | Full parity | -| Fine-grained file ops | 5 categories | 13+ Landlock ops | Linux has more granularity | -| Network isolation | Syscall filtering | Network namespace | Linux is more complete | -| Dangerous syscall blocking | Implicit | 27 syscalls via seccomp | Full parity | -| Proxy-based domain filtering | ✅ | ✅ | Identical | - -### 🟡 Partially Implemented - -| Feature 
| macOS | Linux | Gap | -|---------|-------|-------|-----| -| Glob patterns (`**/.git/hooks`) | Native regex | doublestar library | Only protects existing files | -| Unix socket control | Path-based | bwrap namespace | Landlock has no socket path control | -| Violation monitoring | Always works | Needs CAP_BPF for FS | Documented workaround | - -### 🔴 Kernel Version Dependent - -| Feature | Required Kernel | Fallback | -|---------|-----------------|----------| -| Landlock | 5.13+ | bwrap mount-only restrictions | -| Landlock TRUNCATE | 6.2+ | No truncate control | -| Landlock network | 6.7+ | Uses network namespace instead | -| seccomp LOG | 4.14+ | Silent blocking | -| eBPF LSM | 4.15+ | No filesystem violation visibility | - -## Blocked Syscalls - -The following syscalls are blocked by the seccomp filter: - -```text -ptrace - Process debugging/injection -process_vm_readv - Read another process's memory -process_vm_writev - Write another process's memory -keyctl - Kernel keyring operations -add_key - Add key to keyring -request_key - Request key from keyring -personality - Change execution domain (ASLR bypass) -userfaultfd - User-space page fault (sandbox escape vector) -perf_event_open - Performance monitoring (info leak) -bpf - eBPF without CAP_BPF -kexec_load - Load new kernel -kexec_file_load - Load new kernel from file -reboot - Reboot system -syslog - Kernel log access -acct - Process accounting -mount - Mount filesystems -umount2 - Unmount filesystems -pivot_root - Change root filesystem -swapon - Enable swap -swapoff - Disable swap -sethostname - Change hostname -setdomainname - Change domain name -init_module - Load kernel module -finit_module - Load kernel module from file -delete_module - Unload kernel module -ioperm - I/O port permissions -iopl - I/O privilege level -``` - -## Testing Instructions - -### Prerequisites - -You need a Linux environment. Options: - -1. **Colima** (macOS): `colima ssh` - Uses Lima VM with default Ubuntu -2. 
**Docker**: `docker run -it --privileged ubuntu:24.04 bash` -3. **Native Linux**: Any distro with kernel 5.13+ recommended - -**Colima Note**: Running fence in Colima requires `sudo` because bwrap's network namespace setup (`--unshare-net`) needs `CAP_NET_ADMIN` which isn't available to unprivileged users in the VM. - -### Installing Dependencies - -```bash -# Ubuntu/Debian -sudo apt update && sudo apt install -y bubblewrap socat bpftrace - -# Fedora/RHEL -sudo dnf install -y bubblewrap socat bpftrace - -# Check kernel version -uname -r -``` - -### Building Fence for Linux - -```bash -# On macOS, cross-compile for Linux -# IMPORTANT: Match the target architecture! - -# Check your Colima architecture first: -colima ssh -- uname -m -# aarch64 = ARM64, x86_64 = amd64 - -# For Colima on Apple Silicon (M1/M2/M3) - uses ARM64: -cd /Users/jy/tusk/fence -GOOS=linux GOARCH=arm64 go build -o fence-linux ./cmd/fence - -# For Colima on Intel Macs or x86_64 VMs/containers: -GOOS=linux GOARCH=amd64 go build -o fence-linux ./cmd/fence -``` - -The binary is accessible via Colima's mount at `/Users/jy/tusk/fence/fence-linux`. - -**Note**: Using the wrong architecture will cause syscalls to fail with ENOSYS (function not implemented) due to Rosetta emulation limitations. 
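The architecture check above could also be scripted. Here is a minimal sketch, assuming a hypothetical helper `goarchFor` (not part of fence) that maps the output of `uname -m` to the `GOARCH` value used in the build commands:

```go
package main

import (
	"fmt"
	"strings"
)

// goarchFor is a hypothetical helper (not part of fence). It maps the
// machine name printed by `uname -m` to the GOARCH value to build for.
func goarchFor(machine string) (string, bool) {
	switch strings.TrimSpace(machine) {
	case "aarch64", "arm64":
		return "arm64", true
	case "x86_64", "amd64":
		return "amd64", true
	default:
		return "", false
	}
}

func main() {
	// Sample inputs; in practice the value would come from
	// `colima ssh -- uname -m`.
	for _, machine := range []string{"aarch64", "x86_64", "riscv64"} {
		arch, ok := goarchFor(machine)
		if !ok {
			fmt.Printf("%s: unrecognized, pick GOARCH manually\n", machine)
			continue
		}
		fmt.Printf("%s: GOOS=linux GOARCH=%s go build -o fence-linux ./cmd/fence\n", machine, arch)
	}
}
```

Returning a boolean instead of guessing a default keeps an unknown machine type from silently producing a binary for the wrong architecture.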
- -Or build natively on Linux: - -```bash -cd fence -go build -o fence ./cmd/fence -sudo cp fence /usr/local/bin/ -``` - -### Test 1: Feature Detection - -```bash -# Check kernel version (5.13+ for Landlock, 6.7+ for Landlock network) -uname -r -# Expected: 5.13+ (e.g., "6.8.0-39-generic") - -# Check seccomp availability -# Note: "Seccomp: 0" means no filter is active on THIS process (normal) -# The value will be 2 when a filter is applied -grep Seccomp /proc/self/status -# Expected: Seccomp: 0 (or 2 if already filtered) - -# Check Landlock is enabled in LSM chain -cat /sys/kernel/security/lsm -# Expected: should contain "landlock" (e.g., "lockdown,capability,landlock,yama,apparmor") -``` - -### Test 2: Basic Sandboxing (bwrap) - -```bash -# Note: Use sudo for all fence commands in Colima - -# Test basic sandboxing -sudo ./fence-linux echo "Hello from sandbox" -# Expected: Hello from sandbox - -# Test network isolation (network blocked by default) -sudo ./fence-linux -- curl -I https://example.com --fail 2>&1 | head -5 -# Expected: curl error (connection failed - network is blocked) - -# Test with allowed domain -echo '{"network":{"allowedDomains":["example.com"]}}' > /tmp/fence.json -sudo ./fence-linux --settings /tmp/fence.json -- curl -I https://example.com 2>&1 | head -5 -# Expected: HTTP/2 200 -``` - -### Test 3: Filesystem Restrictions - -```bash -# Note: Use sudo for all fence commands in Colima - -# Test 1: Write to read-only filesystem (should fail) -sudo ./fence-linux touch /etc/test.txt -# Expected: touch: cannot touch '/etc/test.txt': Read-only file system - -# Test 2: /tmp is an isolated writable tmpfs (succeeds but doesn't persist) -sudo ./fence-linux bash -c 'touch /tmp/sandbox-file && echo "File created:" && ls /tmp/sandbox-file' -# Expected: /tmp/sandbox-file (file exists inside sandbox) - -# Verify isolation: file doesn't exist on host after sandbox exits -ls /tmp/sandbox-file 2>&1 -# Expected: No such file or directory - -# Test 3: allowWrite to 
persist changes to host filesystem -echo '{"filesystem":{"allowWrite":["."]}}' > /tmp/fence.json -sudo ./fence-linux --settings /tmp/fence.json touch ./test-write.txt -ls ./test-write.txt -# Expected: ./test-write.txt exists (persisted to host) -rm ./test-write.txt # cleanup -``` - -**Note**: `/tmp` inside the sandbox is an isolated tmpfs. Apps can write to it normally, but changes don't persist after the sandbox exits and don't affect the host's `/tmp`. This is intentional for security. - -### Test 4: Glob Pattern Expansion - -```bash -# Create test structure in current directory (host-mounted) -mkdir -p ./test-project/.git/hooks -echo "dangerous" > ./test-project/.bashrc -echo "hook" > ./test-project/.git/hooks/pre-commit - -# Test that dangerous files are protected even with allowWrite -echo '{"filesystem":{"allowWrite":["./test-project"]}}' > /tmp/fence.json -sudo ./fence-linux --settings /tmp/fence.json bash -c 'echo "modified" > ./test-project/.bashrc' 2>&1 -# Expected: Permission denied or Read-only file system (mandatory protection) - -# Normal files should be writable -sudo ./fence-linux --settings /tmp/fence.json bash -c 'echo "safe content" > ./test-project/safe.txt' -cat ./test-project/safe.txt -# Expected: safe content - -# Cleanup -rm -rf ./test-project -``` - -### Test 5: Seccomp Syscall Blocking ✅ - -The seccomp filter blocks dangerous syscalls like `ptrace`, preventing process debugging/injection attacks. 
- -```bash -# Test ptrace blocking via strace -sudo ./fence-linux strace ls 2>&1 -# Expected output: -# strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Operation not permitted -# strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted -# strace: PTRACE_SETOPTIONS: Operation not permitted - -# Verify normal commands still work -sudo ./fence-linux ls /tmp -# Expected: Success (lists /tmp contents) -``` - -**Note**: The seccomp filter blocks 27 dangerous syscalls including: - -- `ptrace` - process debugging/injection -- `mount`/`umount2` - filesystem manipulation -- `bpf` - eBPF operations -- `kexec_load` - kernel replacement -- `init_module`/`delete_module` - kernel module loading -- And more (see `DangerousSyscalls` in source) - -### Test 6: Network Violation Monitoring - -```bash -# The -m flag shows NETWORK violations via the HTTP/SOCKS proxy -# Note: Seccomp syscall violations are blocked silently (see Known Limitations) - -echo '{"network":{"allowedDomains":[]}}' > /tmp/fence.json -sudo ./fence-linux -m --settings /tmp/fence.json bash -c 'curl -s https://example.com; echo done' 2>&1 -# Expected output includes network violation log: -# [fence:http] HH:MM:SS ✗ CONNECT 403 example.com https://example.com:443 - -# Filesystem violations appear in the command's own error output: -sudo ./fence-linux touch /etc/test-file 2>&1 -# Expected: touch: cannot touch '/etc/test-file': Read-only file system -``` - -### Test 7: Landlock Enforcement (kernel 5.13+) - -```bash -# Run fence with debug to see Landlock being applied via embedded wrapper -sudo ./fence-linux -d echo "test" 2>&1 | grep -i landlock -# Expected output (v4 on kernel 6.7+): -# [fence:linux] Available features: kernel X.Y, bwrap, seccomp+usernotif, landlock-v4, ... 
-# [fence:linux] Sandbox: bwrap(network,pid,fs), seccomp, landlock-v4(wrapper) -# [fence:landlock-wrapper] Applying Landlock restrictions -# [fence:landlock] Created ruleset (ABI v4, fd=N) -# [fence:landlock] Added rule: /usr (access=0xd) -# ... more rules ... -# [fence:landlock] Ruleset applied to process -# [fence:landlock] Applied restrictions (ABI v4) -# [fence:landlock-wrapper] Landlock restrictions applied -# [fence:landlock-wrapper] Exec: /usr/bin/echo [test] - -# Verify Landlock enforcement (path not in allowed list should fail) -sudo ./fence-linux touch /opt/testfile 2>&1 -# Expected: touch: cannot touch '/opt/testfile': Read-only file system -# (blocked by bwrap + Landlock defense in depth) -``` - -## Known Limitations - -### 1. Glob Patterns Only Protect Existing Files - -**Impact**: If a file matching `**/.bashrc` is created AFTER the sandbox starts, it won't be protected. - -**Implementation**: Optimized for Landlock's PATH_BENEATH semantics: - -- `dir/**` → returns just `dir` (Landlock covers descendants automatically, no walking) -- `**/pattern` → scoped to cwd only, **skips directories already covered by `dir/**` patterns** -- `**/dir/**` → finds dirs in cwd, returns them (PATH_BENEATH covers contents) - -**Performance optimization**: When processing `**/.bashrc` alongside `./node_modules/**`, the walker automatically skips `node_modules/` since it's already covered. This prevents O(100k files) walks in large directories. - -**Workaround**: This is consistent with macOS behavior (Seatbelt patterns also evaluated at sandbox creation). - -### 2. Landlock Audit Support Not Yet Upstream - -**Impact**: Landlock denials are invisible without eBPF tracing. - -**Future**: Kernel developers are working on `AUDIT_LANDLOCK` support. Once merged, violations will be visible via the audit subsystem. - -### 3. Seccomp Violations Are Silent - -**Impact**: Blocked syscalls (like `ptrace`) return EPERM but are not logged by fence's `-m` flag. 
- -**Reason**: Linux's `SECCOMP_RET_ERRNO` action silently returns an error. Logging would require the audit framework or `SECCOMP_RET_USER_NOTIF` (adds complexity). - -**Workaround**: Blocked syscalls still show errors in the program's output (e.g., strace shows "Operation not permitted"). - -### 4. Old Kernel Fallback Reduces Protection - -**Impact**: On kernels < 5.13, filesystem protection relies solely on bwrap mount restrictions. - -**Recommendation**: Use Ubuntu 22.04+, Debian 12+, or Fedora 38+ for full protection. - -## Fixed Implementation Gaps ✅ - -The following issues were identified and **fixed**: - -### 1. `StartLinuxMonitor()` Now Wired Up ✅ - -**Fix**: `main.go` now calls `StartLinuxMonitor()` after starting the sandboxed command. -When the `-m` flag is set, the eBPF monitor is started for the sandbox PID. - -### 2. `--linux-features` Flag Implemented ✅ - -**Fix**: Added `--linux-features` flag to CLI that calls `PrintLinuxFeatures()`. - -```bash -fence --linux-features -# Shows: Kernel version, bwrap, socat, seccomp, Landlock, eBPF status -``` - -### 3. eBPF Monitor Now Working ✅ - -**Fix**: The bpftrace script now correctly: - -- Monitors filesystem syscalls (openat, unlinkat, mkdirat) -- Monitors network syscalls (connect) -- Shows human-readable error messages (e.g., "Read-only file system") -- Example output: `[fence:ebpf] 16:35:27 ✗ open: Read-only file system (touch, pid=84398)` - -**Note**: Due to timing constraints, the monitor cannot filter by exact PID (bpftrace attaches after forks complete). It filters by PID range (`pid >= SANDBOX_PID`) instead, so some noise from other processes may still appear during monitoring. - -### SeccompMonitor: Removed (Not Feasible) - -**What we tried**: A `SeccompMonitor` that parsed dmesg/audit logs for seccomp violation events. - -**Why it doesn't work**: Our seccomp filter uses `SECCOMP_RET_ERRNO` to block syscalls with EPERM. This action is completely silent: it doesn't log to dmesg, audit, or anywhere else. 
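To make the `SECCOMP_RET_ERRNO` behavior concrete, here is a small sketch of how a filter return value that fails a syscall with EPERM is encoded. The constants are taken from the Linux `<linux/seccomp.h>` header. The helper `errnoAction` is a hypothetical name for illustration, and this is not a claim about how fence builds its filter.

```go
package main

import "fmt"

// Action values and data mask from the Linux UAPI header <linux/seccomp.h>.
const (
	seccompRetErrno uint32 = 0x00050000 // SECCOMP_RET_ERRNO: fail the syscall with an errno
	seccompRetLog   uint32 = 0x7ffc0000 // SECCOMP_RET_LOG: log, then allow the syscall
	seccompRetAllow uint32 = 0x7fff0000 // SECCOMP_RET_ALLOW: allow the syscall
	seccompRetData  uint32 = 0x0000ffff // SECCOMP_RET_DATA: low 16 bits carry the errno
	eperm           uint32 = 1          // EPERM on Linux
)

// errnoAction is a hypothetical helper: it builds the value a BPF filter
// returns to make the kernel fail the syscall with the given errno.
func errnoAction(errno uint32) uint32 {
	return seccompRetErrno | (errno & seccompRetData)
}

func main() {
	fmt.Printf("deny with EPERM: 0x%08x\n", errnoAction(eperm)) // prints "deny with EPERM: 0x00050001"
	fmt.Printf("log and allow:   0x%08x\n", seccompRetLog)
	fmt.Printf("allow:           0x%08x\n", seccompRetAllow)
}
```

The action and the errno share a single 32-bit return value, so the filter has no separate channel for reporting what it blocked. The caller only sees the errno.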
- -**Alternatives considered**: - -| Approach | Why it doesn't work | -|----------|---------------------| -| `SECCOMP_RET_LOG` | Logs but **allows** the syscall (defeats the purpose) | -| `SECCOMP_RET_KILL` | Logs but **kills** the process (too harsh) | -| `SECCOMP_RET_USER_NOTIF` | Complex supervisor architecture, adds latency to every blocked call | -| auditd integration | Requires audit daemon setup and root access | - -**Solution**: The eBPF monitor now handles syscall failure detection instead, which catches EPERM/EACCES errors regardless of their source. - -### Summary Table - -| Component | Status | Notes | -|-----------|--------|-------| -| Seccomp filter | ✅ Active | Blocks 27 dangerous syscalls | -| bwrap namespaces | ✅ Active | Primary fs/network isolation | -| Landlock rules | ✅ Active | Via embedded wrapper | -| eBPF Monitor | ✅ Active | PID-range filtered | -| `--linux-features` | ✅ Active | Shows kernel features | -| SeccompMonitor | ❌ Removed | Not feasible (ERRNO is silent) | - -## Performance Comparison - -| Metric | macOS | Linux | Notes | -|--------|-------|-------|-------| -| Startup latency | ~10ms | ~25-35ms | Extra time for seccomp/Landlock setup | -| Syscall overhead | ~1-3% | ~1-2% | seccomp is very efficient | -| Filesystem check | ~1-2% | ~1-3% | Landlock + bwrap mounts | -| Monitoring overhead | ~0% | ~1-2% | eBPF tracing when enabled | -| **Total runtime** | ~2-5% | ~3-7% | Comparable | - -## Recommendations - -1. **For CI/CD**: Use Ubuntu 22.04+ or Debian 12+ for kernel 5.15+ with Landlock v1 -2. **For Development**: Any recent distro works; Landlock recommended -3. 
**For Production**: Test on target kernel version; fallback is safe but less restrictive - -## Conclusion - -### What Works ✅ - -- **bwrap namespace isolation**: Primary mechanism for network, PID, and filesystem isolation -- **Landlock kernel restrictions**: Applied via embedded wrapper for defense-in-depth -- **Seccomp syscall filtering**: 27 dangerous syscalls blocked (architecture-aware for ARM64/x86_64) -- **Network violation monitoring**: `-m` flag shows blocked HTTP/SOCKS requests via proxy -- **eBPF filesystem monitoring**: `-m` flag with root shows filesystem access errors (PID-range filtered to reduce noise) -- **`--linux-features` flag**: Query available kernel features -- **Graceful fallback**: Auto-detects features, degrades safely on older kernels - -### Remaining Limitations - -1. **eBPF PID-range filtered**: The monitor filters events to `pid >= SANDBOX_PID`, which excludes pre-existing system processes. This isn't perfect (other new processes might still appear) but significantly reduces noise. - -2. **Seccomp violations are silent**: The filter uses `SECCOMP_RET_ERRNO`, which blocks syscalls with EPERM but doesn't log anywhere. Programs will show their own error messages (e.g., "Operation not permitted"). - -### Gap vs macOS - -On macOS, `-m` shows all violations via `log stream`. On Linux: - -- Network violations: ✅ Visible via proxy -- Filesystem violations: ✅ Visible via eBPF (PID-range filtered) -- Seccomp violations: ⚠️ Blocked but not logged (programs show errors) - -### What's Enforced - -**Three-layer enforcement is now active**: - -- **Network**: Completely isolated via bwrap network namespace + proxy filtering -- **Filesystem**: Defense-in-depth with bwrap read-only mounts + Landlock kernel restrictions -- **Dangerous syscalls**: Blocked via seccomp (returns EPERM)
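
The PID-range filter described under Remaining Limitations can be illustrated with a short sketch. In fence the predicate lives in the bpftrace script, so the Go code below is only a model of the `pid >= SANDBOX_PID` rule. The `Event` type, the sample PIDs, and the process names are hypothetical.

```go
package main

import "fmt"

// Event is a hypothetical, simplified monitor event (not fence's real type).
type Event struct {
	PID  int
	Comm string
	Msg  string
}

// filterByPIDRange keeps events whose PID is at or above the sandbox PID,
// mirroring the `pid >= SANDBOX_PID` rule described in the report.
func filterByPIDRange(events []Event, sandboxPID int) []Event {
	var kept []Event
	for _, e := range events {
		if e.PID >= sandboxPID {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	events := []Event{
		{PID: 412, Comm: "journald", Msg: "open: Permission denied"},         // pre-existing process: dropped
		{PID: 84398, Comm: "touch", Msg: "open: Read-only file system"},      // sandboxed process: kept
		{PID: 84500, Comm: "cron", Msg: "open: No such file or directory"},   // unrelated but newer: also kept
	}
	for _, e := range filterByPIDRange(events, 84390) {
		fmt.Printf("[sketch] %s (%s, pid=%d)\n", e.Msg, e.Comm, e.PID)
	}
}
```

The third sample event shows why the limitation remains: any process started after the sandbox gets a higher PID and passes the filter even though it is unrelated.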