mirror of
https://github.com/Monadical-SAS/reflector.git
synced 2026-03-21 22:56:47 +00:00
feat: add custom S3 endpoint support + Garage standalone storage
Add TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL setting to enable S3-compatible backends (Garage, MinIO). When set, the storage driver uses path-style addressing and routes all requests to the custom endpoint. When unset, AWS behavior is unchanged.

- AwsStorage: accept aws_endpoint_url, pass it to all 6 session.client() calls, configure path-style addressing and base_url
- Fix 4 direct AwsStorage constructions in Hatchet workflows to pass endpoint_url (they would have silently targeted the wrong endpoint)
- Standalone: add Garage service to docker-compose.standalone.yml; setup script initializes layout/bucket/key and writes credentials
- Fix compose_cmd() bug: the Mac path was missing the standalone yml
- garage.toml template with runtime secret generation via openssl
@@ -6,6 +6,23 @@
 # On Mac: Ollama runs natively (Metal GPU) — no profile needed, services here unused.

 services:
+  garage:
+    image: dxflrs/garage:v1.1.0
+    ports:
+      - "3900:3900" # S3 API
+      - "3903:3903" # Admin API
+    volumes:
+      - garage_data:/var/lib/garage/data
+      - garage_meta:/var/lib/garage/meta
+      - ./data/garage.toml:/etc/garage.toml:ro
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-sf", "http://localhost:3903/health"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+      start_period: 5s
+
   ollama:
     image: ollama/ollama:latest
     profiles: ["ollama-gpu"]
@@ -42,4 +59,6 @@ services:
       retries: 5

 volumes:
+  garage_data:
+  garage_meta:
   ollama_data:
@@ -59,20 +59,28 @@ Generates `server/.env` and `www/.env.local` with standalone defaults:

 If env files already exist, the script only updates LLM vars — it won't overwrite your customizations.

-### 3. Transcript storage (skip for standalone)
+### 3. Object storage (Garage)

-Production uses AWS S3 to persist processed audio. **Not needed for standalone live/WebRTC mode.**
+Standalone uses [Garage](https://garagehq.deuxfleurs.fr/) — a lightweight S3-compatible object store running in Docker. The setup script starts Garage, initializes the layout, creates a bucket and access key, and writes the credentials to `server/.env`.

-When `TRANSCRIPT_STORAGE_BACKEND` is unset (the default):
-- Audio stays on local disk at `DATA_DIR/{transcript_id}/audio.mp3`
-- The live pipeline skips the S3 upload step gracefully
-- Audio playback endpoint serves directly from disk
-- Post-processing (LLM summary, topics, title) works entirely from DB text
-- Diarization (speaker ID) is skipped — already disabled in standalone config (`DIARIZATION_ENABLED=false`)
-
-> **Future**: if file upload or audio persistence across restarts is needed, implement a filesystem storage backend (`storage_local.py`) using the existing `Storage` plugin architecture in `reflector/storage/base.py`. No MinIO required.
+**`server/.env`** — storage settings added by the script:
+
+| Variable | Value | Why |
+|----------|-------|-----|
+| `TRANSCRIPT_STORAGE_BACKEND` | `aws` | Uses the S3-compatible storage driver |
+| `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL` | `http://garage:3900` | Docker-internal Garage S3 API |
+| `TRANSCRIPT_STORAGE_AWS_BUCKET_NAME` | `reflector-media` | Created by the script |
+| `TRANSCRIPT_STORAGE_AWS_REGION` | `garage` | Must match Garage config |
+| `TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID` | *(auto-generated)* | Created by `garage key create` |
+| `TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY` | *(auto-generated)* | Created by `garage key create` |
+
+The `TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL` setting enables S3-compatible backends. When set, the storage driver uses path-style addressing and routes all requests to the custom endpoint. When unset (production AWS), behavior is unchanged.
+
+Garage config lives at `scripts/garage.toml` (mounted read-only into the container). Single-node, `replication_factor=1`.
+
+> **Note**: Presigned URLs embed the Garage Docker hostname (`http://garage:3900`). This is fine — the server proxies S3 responses to the browser. Modal GPU workers cannot reach internal Garage, but standalone doesn't use Modal.

-### 4. Transcription and diarization
+### 4. Transcription and diarization (NOT YET IMPLEMENTED)

 Standalone uses `TRANSCRIPT_BACKEND=whisper` for local CPU-based transcription. Diarization is disabled.
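The storage variables above map one-to-one onto the storage driver's constructor arguments. A hypothetical sketch of that mapping (the helper name is illustrative, not part of the codebase):

```python
def storage_kwargs_from_env(env: dict[str, str]) -> dict:
    """Map TRANSCRIPT_STORAGE_AWS_* variables to AwsStorage-style kwargs.
    A None endpoint means stock AWS; it is set for Garage/MinIO."""
    p = "TRANSCRIPT_STORAGE_AWS_"
    return {
        "aws_bucket_name": env[p + "BUCKET_NAME"],
        "aws_region": env.get(p + "REGION", "us-east-1"),
        "aws_access_key_id": env.get(p + "ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get(p + "SECRET_ACCESS_KEY"),
        "aws_endpoint_url": env.get(p + "ENDPOINT_URL"),
    }
```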
@@ -81,10 +89,10 @@ Standalone uses `TRANSCRIPT_BACKEND=whisper` for local CPU-based transcription.
 ### 5. Docker services

 ```bash
-docker compose up -d postgres redis server worker beat web
+docker compose up -d postgres redis garage server worker beat web
 ```

-All services start in a single command. No Hatchet in standalone mode — LLM processing (summaries, topics, titles) runs via Celery tasks.
+All services start in a single command. Garage is already started by step 3 but is included for idempotency. No Hatchet in standalone mode — LLM processing (summaries, topics, titles) runs via Celery tasks.

 ### 6. Database migrations
@@ -105,6 +113,7 @@ Verifies:
 | `web` | 3000 | Next.js frontend |
 | `postgres` | 5432 | PostgreSQL database |
 | `redis` | 6379 | Cache + Celery broker |
+| `garage` | 3900, 3903 | S3-compatible object storage (S3 API + admin API) |
 | `worker` | — | Celery worker (live pipeline post-processing) |
 | `beat` | — | Celery beat (scheduled tasks) |
@@ -121,7 +130,7 @@ These require external accounts and infrastructure that can't be scripted:

 - Step 1 (Ollama/LLM) — implemented
 - Step 2 (environment files) — implemented
-- Step 3 (transcript storage) — resolved: skip for live-only mode
+- Step 3 (object storage / Garage) — implemented (`docker-compose.standalone.yml` + `setup-standalone.sh`)
 - Step 4 (transcription/diarization) — in progress by another developer
 - Steps 5-7 (Docker, migrations, health) — implemented
 - **Unified script**: `scripts/setup-standalone.sh`
scripts/garage.toml (new file)
@@ -0,0 +1,17 @@
+metadata_dir = "/var/lib/garage/meta"
+data_dir = "/var/lib/garage/data"
+replication_factor = 1
+
+rpc_bind_addr = "[::]:3901"
+rpc_secret = "__GARAGE_RPC_SECRET__"
+
+[s3_api]
+api_bind_addr = "[::]:3900"
+s3_region = "garage"
+root_domain = ".s3.garage.localhost"
+
+[s3_web]
+bind_addr = "[::]:3902"
+
+[admin]
+api_bind_addr = "[::]:3903"
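The `__GARAGE_RPC_SECRET__` placeholder is filled at setup time. A Python equivalent of the script's `openssl rand -hex 32` + `sed` step, as a sketch (the function name is hypothetical):

```python
import secrets
from pathlib import Path

def render_garage_toml(template: Path, out: Path) -> str:
    """Fill the __GARAGE_RPC_SECRET__ placeholder with a fresh random secret.
    Returns the generated secret for inspection."""
    rpc_secret = secrets.token_hex(32)  # 32 random bytes as 64 hex chars
    out.write_text(template.read_text().replace("__GARAGE_RPC_SECRET__", rpc_secret))
    return rpc_secret
```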
@@ -69,13 +69,11 @@ env_set() {
 }

 compose_cmd() {
+  local compose_files="-f $ROOT_DIR/docker-compose.yml -f $ROOT_DIR/docker-compose.standalone.yml"
   if [[ "$OS" == "Linux" ]] && [[ -n "${OLLAMA_PROFILE:-}" ]]; then
-    docker compose -f "$ROOT_DIR/docker-compose.yml" \
-      -f "$ROOT_DIR/docker-compose.standalone.yml" \
-      --profile "$OLLAMA_PROFILE" \
-      "$@"
+    docker compose $compose_files --profile "$OLLAMA_PROFILE" "$@"
   else
-    docker compose -f "$ROOT_DIR/docker-compose.yml" "$@"
+    docker compose $compose_files "$@"
   fi
 }
@@ -177,8 +175,7 @@ AUTH_BACKEND=none
 # --- Transcription (local whisper) ---
 TRANSCRIPT_BACKEND=whisper

-# --- Storage (local disk, no S3) ---
-# TRANSCRIPT_STORAGE_BACKEND is intentionally unset — audio stays on local disk
+# --- Storage (set by step_storage, Garage S3-compatible) ---

 # --- Diarization (disabled, no backend available) ---
 DIARIZATION_ENABLED=false
@@ -201,10 +198,67 @@ ENVEOF
 }

 # =========================================================
-# Step 3: Generate www/.env.local
+# Step 3: Object storage (Garage)
+# =========================================================
+step_storage() {
+  info "Step 3: Object storage (Garage)"
+
+  # Generate garage.toml from template (fill in RPC secret)
+  GARAGE_TOML="$ROOT_DIR/scripts/garage.toml"
+  GARAGE_TOML_RUNTIME="$ROOT_DIR/data/garage.toml"
+  if [[ ! -f "$GARAGE_TOML_RUNTIME" ]]; then
+    mkdir -p "$ROOT_DIR/data"
+    RPC_SECRET=$(openssl rand -hex 32)
+    sed "s|__GARAGE_RPC_SECRET__|${RPC_SECRET}|" "$GARAGE_TOML" > "$GARAGE_TOML_RUNTIME"
+  fi
+
+  compose_cmd up -d garage
+
+  wait_for_url "http://localhost:3903/health" "Garage admin API"
+  echo ""
+
+  # Layout: get node ID, assign, apply (skip if already applied)
+  NODE_ID=$(compose_cmd exec -T garage /garage node id -q 2>/dev/null | tr -d '[:space:]')
+  LAYOUT_STATUS=$(compose_cmd exec -T garage /garage layout show 2>&1 || true)
+  if echo "$LAYOUT_STATUS" | grep -q "No nodes"; then
+    compose_cmd exec -T garage /garage layout assign "$NODE_ID" -c 1G -z dc1
+    compose_cmd exec -T garage /garage layout apply --version 1
+  fi
+
+  # Create bucket (idempotent — skip if exists)
+  if ! compose_cmd exec -T garage /garage bucket info reflector-media &>/dev/null; then
+    compose_cmd exec -T garage /garage bucket create reflector-media
+  fi
+
+  # Create key (idempotent — skip if exists)
+  KEY_OUTPUT=$(compose_cmd exec -T garage /garage key info --name reflector 2>&1 || true)
+  if echo "$KEY_OUTPUT" | grep -q "not found"; then
+    KEY_OUTPUT=$(compose_cmd exec -T garage /garage key create --name reflector)
+  fi
+
+  # Parse key ID and secret from output
+  KEY_ID=$(echo "$KEY_OUTPUT" | grep -i "key id" | awk '{print $NF}')
+  KEY_SECRET=$(echo "$KEY_OUTPUT" | grep -i "secret" | awk '{print $NF}')
+
+  # Grant bucket permissions (idempotent)
+  compose_cmd exec -T garage /garage bucket allow reflector-media --read --write --key reflector
+
+  # Set env vars
+  env_set "$SERVER_ENV" "TRANSCRIPT_STORAGE_BACKEND" "aws"
+  env_set "$SERVER_ENV" "TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL" "http://garage:3900"
+  env_set "$SERVER_ENV" "TRANSCRIPT_STORAGE_AWS_BUCKET_NAME" "reflector-media"
+  env_set "$SERVER_ENV" "TRANSCRIPT_STORAGE_AWS_REGION" "garage"
+  env_set "$SERVER_ENV" "TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID" "$KEY_ID"
+  env_set "$SERVER_ENV" "TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY" "$KEY_SECRET"
+
+  ok "Object storage ready (Garage)"
+}
+
+# =========================================================
+# Step 4: Generate www/.env.local
 # =========================================================
 step_www_env() {
-  info "Step 3: Generating www/.env.local"
+  info "Step 4: Generating www/.env.local"

   if [[ -f "$WWW_ENV" ]]; then
     ok "www/.env.local already exists — skipping"
@@ -232,22 +286,22 @@ ENVEOF
 }

 # =========================================================
-# Step 4: Start all services
+# Step 5: Start all services
 # =========================================================
 step_services() {
-  info "Step 4: Starting Docker services"
+  info "Step 5: Starting Docker services"

   # server runs alembic migrations on startup automatically (see runserver.sh)
-  compose_cmd up -d postgres redis server worker beat web
+  compose_cmd up -d postgres redis garage server worker beat web
   ok "Containers started"
   info "Server is running migrations (alembic upgrade head)..."
 }

 # =========================================================
-# Step 5: Health checks
+# Step 6: Health checks
 # =========================================================
 step_health() {
-  info "Step 5: Health checks"
+  info "Step 6: Health checks"

   wait_for_url "http://localhost:1250/health" "Server API" 60 3
   echo ""
@@ -292,6 +346,8 @@ main() {
   echo ""
   step_server_env
   echo ""
+  step_storage
+  echo ""
   step_www_env
   echo ""
   step_services
@@ -171,11 +171,13 @@ async def set_workflow_error_status(transcript_id: NonEmptyString) -> bool:

 def _spawn_storage():
     """Create fresh storage instance."""
+    # TODO: replace direct AwsStorage construction with get_transcripts_storage() factory
     return AwsStorage(
         aws_bucket_name=settings.TRANSCRIPT_STORAGE_AWS_BUCKET_NAME,
         aws_region=settings.TRANSCRIPT_STORAGE_AWS_REGION,
         aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID,
         aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY,
+        aws_endpoint_url=settings.TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL,
     )
@@ -49,11 +49,13 @@ async def pad_track(input: PaddingInput, ctx: Context) -> PadTrackResult:
     from reflector.settings import settings  # noqa: PLC0415
     from reflector.storage.storage_aws import AwsStorage  # noqa: PLC0415

+    # TODO: replace direct AwsStorage construction with get_transcripts_storage() factory
     storage = AwsStorage(
         aws_bucket_name=settings.TRANSCRIPT_STORAGE_AWS_BUCKET_NAME,
         aws_region=settings.TRANSCRIPT_STORAGE_AWS_REGION,
         aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID,
         aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY,
+        aws_endpoint_url=settings.TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL,
     )

     source_url = await storage.get_file_url(
@@ -60,6 +60,7 @@ async def pad_track(input: TrackInput, ctx: Context) -> PadTrackResult:

     try:
         # Create fresh storage instance to avoid aioboto3 fork issues
+        # TODO: replace direct AwsStorage construction with get_transcripts_storage() factory
         from reflector.settings import settings  # noqa: PLC0415
         from reflector.storage.storage_aws import AwsStorage  # noqa: PLC0415

@@ -68,6 +69,7 @@ async def pad_track(input: TrackInput, ctx: Context) -> PadTrackResult:
             aws_region=settings.TRANSCRIPT_STORAGE_AWS_REGION,
             aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID,
             aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY,
+            aws_endpoint_url=settings.TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL,
         )

         source_url = await storage.get_file_url(
@@ -159,6 +161,7 @@ async def transcribe_track(input: TrackInput, ctx: Context) -> TranscribeTrackRe
         raise ValueError("Missing padded_key from pad_track")

     # Presign URL on demand (avoids stale URLs on workflow replay)
+    # TODO: replace direct AwsStorage construction with get_transcripts_storage() factory
     from reflector.settings import settings  # noqa: PLC0415
     from reflector.storage.storage_aws import AwsStorage  # noqa: PLC0415

@@ -167,6 +170,7 @@ async def transcribe_track(input: TrackInput, ctx: Context) -> TranscribeTrackRe
         aws_region=settings.TRANSCRIPT_STORAGE_AWS_REGION,
         aws_access_key_id=settings.TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID,
         aws_secret_access_key=settings.TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY,
+        aws_endpoint_url=settings.TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL,
     )

     audio_url = await storage.get_file_url(
@@ -49,6 +49,7 @@ class Settings(BaseSettings):
     TRANSCRIPT_STORAGE_AWS_REGION: str = "us-east-1"
     TRANSCRIPT_STORAGE_AWS_ACCESS_KEY_ID: str | None = None
    TRANSCRIPT_STORAGE_AWS_SECRET_ACCESS_KEY: str | None = None
+    TRANSCRIPT_STORAGE_AWS_ENDPOINT_URL: str | None = None

     # Platform-specific recording storage (follows {PREFIX}_STORAGE_AWS_{CREDENTIAL} pattern)
     # Whereby storage configuration
@@ -53,6 +53,7 @@ class AwsStorage(Storage):
         aws_access_key_id: str | None = None,
         aws_secret_access_key: str | None = None,
         aws_role_arn: str | None = None,
+        aws_endpoint_url: str | None = None,
     ):
         if not aws_bucket_name:
             raise ValueError("Storage `aws_storage` require `aws_bucket_name`")
@@ -73,17 +74,26 @@ class AwsStorage(Storage):
         self._access_key_id = aws_access_key_id
         self._secret_access_key = aws_secret_access_key
         self._role_arn = aws_role_arn
+        self._endpoint_url = aws_endpoint_url

         self.aws_folder = ""
         if "/" in aws_bucket_name:
             self._bucket_name, self.aws_folder = aws_bucket_name.split("/", 1)
-        self.boto_config = Config(retries={"max_attempts": 3, "mode": "adaptive"})
+
+        config_kwargs: dict = {"retries": {"max_attempts": 3, "mode": "adaptive"}}
+        if aws_endpoint_url:
+            config_kwargs["s3"] = {"addressing_style": "path"}
+        self.boto_config = Config(**config_kwargs)
+
         self.session = aioboto3.Session(
             aws_access_key_id=aws_access_key_id,
             aws_secret_access_key=aws_secret_access_key,
             region_name=aws_region,
         )
-        self.base_url = f"https://{self._bucket_name}.s3.amazonaws.com/"
+        if aws_endpoint_url:
+            self.base_url = f"{aws_endpoint_url}/{self._bucket_name}/"
+        else:
+            self.base_url = f"https://{self._bucket_name}.s3.amazonaws.com/"

     # Implement credential properties
     @property
@@ -139,7 +149,9 @@ class AwsStorage(Storage):
         s3filename = f"{folder}/{filename}" if folder else filename
         logger.info(f"Uploading {filename} to S3 {actual_bucket}/{folder}")

-        async with self.session.client("s3", config=self.boto_config) as client:
+        async with self.session.client(
+            "s3", config=self.boto_config, endpoint_url=self._endpoint_url
+        ) as client:
             if isinstance(data, bytes):
                 await client.put_object(Bucket=actual_bucket, Key=s3filename, Body=data)
             else:
@@ -162,7 +174,9 @@ class AwsStorage(Storage):
         actual_bucket = bucket or self._bucket_name
         folder = self.aws_folder
         s3filename = f"{folder}/{filename}" if folder else filename
-        async with self.session.client("s3", config=self.boto_config) as client:
+        async with self.session.client(
+            "s3", config=self.boto_config, endpoint_url=self._endpoint_url
+        ) as client:
             presigned_url = await client.generate_presigned_url(
                 operation,
                 Params={"Bucket": actual_bucket, "Key": s3filename},
@@ -177,7 +191,9 @@ class AwsStorage(Storage):
         folder = self.aws_folder
         logger.info(f"Deleting {filename} from S3 {actual_bucket}/{folder}")
         s3filename = f"{folder}/{filename}" if folder else filename
-        async with self.session.client("s3", config=self.boto_config) as client:
+        async with self.session.client(
+            "s3", config=self.boto_config, endpoint_url=self._endpoint_url
+        ) as client:
             await client.delete_object(Bucket=actual_bucket, Key=s3filename)

     @handle_s3_client_errors("download")
@@ -186,7 +202,9 @@ class AwsStorage(Storage):
         folder = self.aws_folder
         logger.info(f"Downloading {filename} from S3 {actual_bucket}/{folder}")
         s3filename = f"{folder}/{filename}" if folder else filename
-        async with self.session.client("s3", config=self.boto_config) as client:
+        async with self.session.client(
+            "s3", config=self.boto_config, endpoint_url=self._endpoint_url
+        ) as client:
             response = await client.get_object(Bucket=actual_bucket, Key=s3filename)
             return await response["Body"].read()

@@ -201,7 +219,9 @@ class AwsStorage(Storage):
         logger.info(f"Listing objects from S3 {actual_bucket} with prefix '{s3prefix}'")

         keys = []
-        async with self.session.client("s3", config=self.boto_config) as client:
+        async with self.session.client(
+            "s3", config=self.boto_config, endpoint_url=self._endpoint_url
+        ) as client:
             paginator = client.get_paginator("list_objects_v2")
             async for page in paginator.paginate(Bucket=actual_bucket, Prefix=s3prefix):
                 if "Contents" in page:
@@ -227,7 +247,9 @@ class AwsStorage(Storage):
         folder = self.aws_folder
         logger.info(f"Streaming {filename} from S3 {actual_bucket}/{folder}")
         s3filename = f"{folder}/{filename}" if folder else filename
-        async with self.session.client("s3", config=self.boto_config) as client:
+        async with self.session.client(
+            "s3", config=self.boto_config, endpoint_url=self._endpoint_url
+        ) as client:
             await client.download_fileobj(
                 Bucket=actual_bucket, Key=s3filename, Fileobj=fileobj
             )
@@ -319,3 +319,51 @@ def test_aws_storage_constructor_rejects_mixed_auth():
             aws_secret_access_key="test-secret",
             aws_role_arn="arn:aws:iam::123456789012:role/test-role",
         )
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_custom_endpoint_url():
+    """Test that custom endpoint_url configures path-style addressing and passes endpoint to client."""
+    storage = AwsStorage(
+        aws_bucket_name="reflector-media",
+        aws_region="garage",
+        aws_access_key_id="GKtest",
+        aws_secret_access_key="secret",
+        aws_endpoint_url="http://garage:3900",
+    )
+    assert storage._endpoint_url == "http://garage:3900"
+    assert storage.boto_config.s3["addressing_style"] == "path"
+    assert storage.base_url == "http://garage:3900/reflector-media/"
+    # retries config preserved (merge, not replace)
+    assert storage.boto_config.retries["max_attempts"] == 3
+
+    mock_client = AsyncMock()
+    mock_client.put_object = AsyncMock()
+    mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+    mock_client.__aexit__ = AsyncMock(return_value=None)
+    mock_client.generate_presigned_url = AsyncMock(
+        return_value="http://garage:3900/reflector-media/test.txt"
+    )
+
+    with patch.object(
+        storage.session, "client", return_value=mock_client
+    ) as mock_session_client:
+        await storage.put_file("test.txt", b"data")
+        mock_session_client.assert_called_with(
+            "s3", config=storage.boto_config, endpoint_url="http://garage:3900"
+        )
+
+
+@pytest.mark.asyncio
+async def test_aws_storage_none_endpoint_url():
+    """Test that None endpoint preserves current AWS behavior."""
+    storage = AwsStorage(
+        aws_bucket_name="reflector-bucket",
+        aws_region="us-east-1",
+        aws_access_key_id="AKIAtest",
+        aws_secret_access_key="secret",
+    )
+    assert storage._endpoint_url is None
+    assert storage.base_url == "https://reflector-bucket.s3.amazonaws.com/"
+    # No s3 addressing_style override — boto_config should only have retries
+    assert not hasattr(storage.boto_config, "s3") or storage.boto_config.s3 is None
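The mocking technique in these tests (an async context manager returned by `session.client`, then asserting on the call arguments) generalizes beyond this codebase. A self-contained sketch with a stand-in class, not the real AwsStorage:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

class MiniStorage:
    """Stand-in for the AwsStorage call pattern: every S3 client is opened
    with the stored endpoint_url (None means stock AWS endpoint resolution)."""

    def __init__(self, session, endpoint_url=None):
        self.session = session
        self._endpoint_url = endpoint_url

    async def put_file(self, key, data):
        async with self.session.client("s3", endpoint_url=self._endpoint_url) as client:
            await client.put_object(Key=key, Body=data)

# Build a client mock usable as `async with ...`
mock_client = AsyncMock()
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=None)

session = MagicMock()
session.client = MagicMock(return_value=mock_client)

asyncio.run(MiniStorage(session, "http://garage:3900").put_file("a.txt", b"x"))
session.client.assert_called_once_with("s3", endpoint_url="http://garage:3900")
mock_client.put_object.assert_awaited_once_with(Key="a.txt", Body=b"x")
```

Asserting on `session.client`'s arguments rather than on network behavior is what lets the tests above verify the endpoint wiring without a running Garage.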