build: move to uv (#488)

* build: move to uv

* build: add packages declaration

* build: move to Python 3.12, as sentencepiece does not work on 3.13

* ci: remove pre-commit check; it will be done in another branch.

* ci: fix name checkout

* ci: update lock and dockerfile

* test: remove the event_loop fixture, not needed in Python 3.12

* test: update test because av now returns AudioFrames with 4096 samples instead of 1024

* build: stop using the FastAPI CLI, because there is no way to set a default port

I don't want to pass --port 1250 every time, so I went back to the
previous approach. I disabled auto-reload for production.

* ci: remove main.py

* test: fix quirk with httpx
2025-07-16 18:10:11 -06:00
committed by GitHub
parent 4895160181
commit 86ce68651f
16 changed files with 3316 additions and 4797 deletions

@@ -21,35 +21,26 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Install poetry
run: pipx install poetry
- name: Set up Python 3.x
uses: actions/setup-python@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
python-version: "3.11"
cache: "poetry"
cache-dependency-path: "server/poetry.lock"
- name: Install requirements
working-directory: ./server
run: |
poetry install --no-root
enable-cache: true
working-directory: server
- name: Test migrations from scratch
working-directory: ./server
working-directory: server
run: |
echo "Testing migrations from clean database..."
poetry run alembic upgrade head
uv run alembic upgrade head
echo "✅ Fresh migration successful"
- name: Test migration rollback and re-apply
working-directory: ./server
working-directory: server
run: |
echo "Testing rollback to base..."
poetry run alembic downgrade base
uv run alembic downgrade base
echo "✅ Rollback successful"
echo "Testing re-apply of all migrations..."
poetry run alembic upgrade head
uv run alembic upgrade head
echo "✅ Re-apply successful"

@@ -17,56 +17,22 @@ jobs:
ports:
- 6379:6379
steps:
- uses: actions/checkout@v3
- name: Install poetry
run: pipx install poetry
- name: Set up Python 3.x
uses: actions/setup-python@v4
- uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
python-version: "3.11"
cache: "poetry"
cache-dependency-path: "server/poetry.lock"
- name: Install requirements
run: |
cd server
poetry install --no-root
enable-cache: true
working-directory: server
- name: Tests
run: |
cd server
poetry run python -m pytest -v tests
formatting:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.x
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Validate formatting
run: |
pip install black
cd server
black --check reflector tests
linting:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.x
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Validate formatting
run: |
pip install ruff
cd server
ruff check reflector tests
uv run -m pytest -v tests
docker:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx

@@ -1 +0,0 @@
3.11.6

CLAUDE.md (new file)

@@ -0,0 +1,174 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Reflector is an AI-powered audio transcription and meeting analysis platform with real-time processing capabilities. The system consists of:
- **Frontend**: Next.js 14 React application (`www/`) with Chakra UI, real-time WebSocket integration
- **Backend**: Python FastAPI server (`server/`) with async database operations and background processing
- **Processing**: GPU-accelerated ML pipeline for transcription, diarization, summarization via Modal.com
- **Infrastructure**: Redis, PostgreSQL/SQLite, Celery workers, WebRTC streaming
## Development Commands
### Backend (Python) - `cd server/`
**Setup and Dependencies:**
```bash
# Install dependencies
uv sync
# Database migrations (first run or schema changes)
uv run alembic upgrade head
# Start services
docker compose up -d redis
```
**Development:**
```bash
# Start FastAPI server
uv run -m reflector.app --reload
# Start Celery worker for background tasks
uv run celery -A reflector.worker.app worker --loglevel=info
# Start Celery beat scheduler (optional, for cron jobs)
uv run celery -A reflector.worker.app beat
```
**Testing:**
```bash
# Run all tests with coverage
uv run pytest
# Run specific test file
uv run pytest tests/test_transcripts.py
# Run tests with verbose output
uv run pytest -v
```
**Process Audio Files:**
```bash
# Process local audio file manually
uv run python -m reflector.tools.process path/to/audio.wav
```
### Frontend (Next.js) - `cd www/`
**Setup:**
```bash
# Install dependencies
yarn install
# Copy configuration templates
cp .env_template .env
cp config-template.ts config.ts
```
**Development:**
```bash
# Start development server
yarn dev
# Generate TypeScript API client from OpenAPI spec
yarn openapi
# Lint code
yarn lint
# Format code
yarn format
# Build for production
yarn build
```
### Docker Compose (Full Stack)
```bash
# Start all services
docker compose up -d
# Start specific services
docker compose up -d redis server worker
```
## Architecture Overview
### Backend Processing Pipeline
The audio processing follows a modular pipeline architecture:
1. **Audio Input**: WebRTC streaming, file upload, or cloud recording ingestion
2. **Chunking**: Audio split into processable segments (`AudioChunkerProcessor`)
3. **Transcription**: Whisper or Modal.com GPU processing (`AudioTranscriptAutoProcessor`)
4. **Diarization**: Speaker identification (`AudioDiarizationAutoProcessor`)
5. **Text Processing**: Formatting, translation, topic detection
6. **Summarization**: AI-powered summaries and title generation
7. **Storage**: Database persistence with optional S3 backend
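The stages above can be pictured as processors that each transform an item and push the result to the next one. A minimal asyncio sketch of that push-based chaining, assuming a much simpler design than the real processors: `Stage`, `connect`, and `push` are hypothetical names (not reflector's actual processor API), and the toy chunker and transcriber only stand in for `AudioChunkerProcessor` and `AudioTranscriptAutoProcessor`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class Stage:
    """Hypothetical pipeline step: transform an item, record it, push it downstream."""

    def __init__(self, name: str, fn: Callable[[Any], Awaitable[Any]]) -> None:
        self.name = name
        self.fn = fn
        self.next: Optional["Stage"] = None
        self.outputs: list[Any] = []  # everything this stage emitted, for inspection

    def connect(self, other: "Stage") -> "Stage":
        """Link `other` after this stage and return it, so calls can be chained."""
        self.next = other
        return other

    async def push(self, item: Any) -> None:
        result = await self.fn(item)
        self.outputs.append(result)
        if self.next is not None:
            await self.next.push(result)


async def chunk(samples: list[int]) -> list[list[int]]:
    # Toy chunker: split the input into segments of at most 2 samples.
    return [samples[i : i + 2] for i in range(0, len(samples), 2)]


async def transcribe(chunks: list[list[int]]) -> str:
    # Toy transcriber: report the size of each segment.
    return " ".join(f"[{len(c)}]" for c in chunks)


async def main() -> str:
    head = Stage("chunker", chunk)
    tail = head.connect(Stage("transcriber", transcribe))
    await head.push([1, 2, 3, 4, 5])
    return tail.outputs[-1]


result = asyncio.run(main())
print(result)  # [2] [2] [1]
```

Further stages (diarization, summarization, storage) would be appended with more `connect` calls; the real pipeline also streams many items and runs GPU work remotely, which this sketch ignores.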
### Database Models
Core entities:
- `transcript`: Main table with processing results, summaries, topics, participants
- `meeting`: Live meeting sessions with consent management
- `room`: Virtual meeting spaces with configuration
- `recording`: Audio/video file metadata and processing status
### API Structure
All endpoints are prefixed with `/v1/`:
- `transcripts/` - CRUD operations for transcripts
- `transcripts_audio/` - Audio streaming and download
- `transcripts_webrtc/` - Real-time WebRTC endpoints
- `transcripts_websocket/` - WebSocket for live updates
- `meetings/` - Meeting lifecycle management
- `rooms/` - Virtual room management
### Frontend Architecture
- **App Router**: Next.js 14 with route groups for organization
- **State**: React Context pattern, no Redux
- **Real-time**: WebSocket integration for live transcription updates
- **Auth**: NextAuth.js with Authentik OAuth/OIDC provider
- **UI**: Chakra UI components with Tailwind CSS utilities
## Key Configuration
### Environment Variables
**Backend** (`server/.env`):
- `DATABASE_URL` - Database connection string
- `REDIS_URL` - Redis broker for Celery
- `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET` - Modal.com GPU processing
- `WHEREBY_API_KEY` - Video platform integration
- `REFLECTOR_AUTH_BACKEND` - Authentication method (none, fief, jwt)
**Frontend** (`www/.env`):
- `NEXTAUTH_URL`, `NEXTAUTH_SECRET` - Authentication configuration
- `NEXT_PUBLIC_REFLECTOR_API_URL` - Backend API endpoint
- `REFLECTOR_DOMAIN_CONFIG` - Feature flags and domain settings
## Testing Strategy
- **Backend**: pytest with async support, HTTP client mocking, audio processing tests
- **Frontend**: No test suite yet (Jest/React Testing Library would be a natural fit)
- **Coverage**: Backend maintains test coverage reports in `htmlcov/`
## GPU Processing
Modal.com integration for scalable ML processing:
- Deploy changes: `modal run server/gpu/path/to/model.py`
- Requires Modal account with `REFLECTOR_GPU_APIKEY` secret
- Falls back to local processing when Modal is unavailable
## Common Issues
- **Permissions**: The browser needs microphone access, granted in System Preferences (macOS)
- **Audio Routing**: Use BlackHole (Mac) for merging multiple audio sources
- **WebRTC**: Ensure proper CORS configuration for cross-origin streaming
- **Database**: Run `uv run alembic upgrade head` after pulling schema changes

@@ -106,25 +106,19 @@ yarn openapi
Start with `cd server`.
**Installation**
```bash
poetry install
```
**Run in development mode**
```bash
docker compose up -d redis
# on the first run, or if the schemas changed
poetry run alembic upgrade head
uv run alembic upgrade head
# start the worker
poetry run celery -A reflector.worker.app worker --loglevel=info
uv run celery -A reflector.worker.app worker --loglevel=info
# start the app
poetry run python -m reflector.app
uv run -m reflector.app --reload
```
Then fill `.env` with the omitted values (ask in Zulip).
@@ -134,7 +128,7 @@ Then fill `.env` with the omitted values (ask in Zulip).
For crontab (only healthcheck for now), start the celery beat (you don't need it on your local dev environment):
```bash
poetry run celery -A reflector.worker.app beat
uv run celery -A reflector.worker.app beat
```
### GPU models
@@ -153,5 +147,5 @@ To deploy llm changes to modal, you need:
You can manually process an audio file by calling the process tool:
```bash
poetry run python -m reflector.tools.process path/to/audio.wav
uv run python -m reflector.tools.process path/to/audio.wav
```

@@ -1 +1 @@
3.11.6
3.12

@@ -1,30 +1,25 @@
FROM python:3.11-slim as base
FROM python:3.12-slim
ENV PIP_DEFAULT_TIMEOUT=100 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
POETRY_VERSION=1.3.1
ENV PYTHONUNBUFFERED=1 \
UV_LINK_MODE=copy
# builder install base dependencies
FROM base AS builder
WORKDIR /tmp
RUN pip install "poetry==$POETRY_VERSION"
RUN python -m venv /venv
RUN apt-get update && apt-get install -y curl && apt-get clean
ADD https://astral.sh/uv/install.sh /uv-installer.sh
RUN sh /uv-installer.sh && rm /uv-installer.sh
ENV PATH="/root/.local/bin/:$PATH"
# install application dependencies
COPY pyproject.toml poetry.lock /tmp
RUN . /venv/bin/activate && poetry config virtualenvs.create false
RUN . /venv/bin/activate && poetry install --only main,aws --no-root --no-interaction --no-ansi
RUN mkdir -p /app
WORKDIR /app
COPY pyproject.toml uv.lock /app/
RUN touch README.md && env uv sync --compile-bytecode --locked
# bootstrap
FROM base AS final
COPY --from=builder /venv /venv
RUN mkdir -p /app
COPY reflector /app/reflector
COPY migrations /app/migrations
COPY images /app/images
COPY alembic.ini runserver.sh /app/
COPY images /app/images
COPY migrations /app/migrations
COPY reflector /app/reflector
WORKDIR /app
CMD ["./runserver.sh"]

server/poetry.lock (generated)

File diff suppressed because it is too large.

@@ -1,76 +1,82 @@
[tool.poetry]
name = "reflector-server"
[project]
name = "reflector"
version = "0.1.0"
description = ""
authors = ["Monadical team <ops@monadical.com>"]
authors = [{ name = "Monadical team", email = "ops@monadical.com" }]
requires-python = ">=3.11, <3.13"
readme = "README.md"
packages = []
dependencies = [
"aiohttp>=3.9.0",
"aiohttp-cors>=0.7.0",
"av>=10.0.0",
"requests>=2.31.0",
"aiortc>=1.5.0",
"sortedcontainers>=2.4.0",
"loguru>=0.7.0",
"pydantic-settings>=2.0.2",
"structlog>=23.1.0",
"uvicorn[standard]>=0.23.1",
"fastapi[standard]>=0.100.1",
"sentry-sdk[fastapi]>=1.29.2",
"httpx>=0.24.1",
"fastapi-pagination>=0.12.6",
"databases[aiosqlite, asyncpg]>=0.7.0",
"sqlalchemy<1.5",
"fief-client[fastapi]>=0.17.0",
"alembic>=1.11.3",
"nltk>=3.8.1",
"prometheus-fastapi-instrumentator>=6.1.0",
"sentencepiece>=0.1.99",
"protobuf>=4.24.3",
"profanityfilter>=2.0.6",
"celery>=5.3.4",
"redis>=5.0.1",
"python-jose[cryptography]>=3.3.0",
"python-multipart>=0.0.6",
"faster-whisper>=0.10.0",
"transformers>=4.36.2",
"black==24.1.1",
"jsonschema>=4.23.0",
"openai>=1.59.7",
]
[tool.poetry.dependencies]
python = "^3.11"
aiohttp = "^3.9.0"
aiohttp-cors = "^0.7.0"
av = "^10.0.0"
requests = "^2.31.0"
aiortc = "^1.5.0"
sortedcontainers = "^2.4.0"
loguru = "^0.7.0"
pydantic-settings = "^2.0.2"
structlog = "^23.1.0"
uvicorn = {extras = ["standard"], version = "^0.23.1"}
fastapi = "^0.100.1"
sentry-sdk = {extras = ["fastapi"], version = "^1.29.2"}
httpx = "^0.24.1"
fastapi-pagination = "^0.12.6"
databases = {extras = ["aiosqlite", "asyncpg"], version = "^0.7.0"}
sqlalchemy = "<1.5"
fief-client = {extras = ["fastapi"], version = "^0.17.0"}
alembic = "^1.11.3"
nltk = "^3.8.1"
prometheus-fastapi-instrumentator = "^6.1.0"
sentencepiece = "^0.1.99"
protobuf = "^4.24.3"
profanityfilter = "^2.0.6"
celery = "^5.3.4"
redis = "^5.0.1"
python-jose = {extras = ["cryptography"], version = "^3.3.0"}
python-multipart = "^0.0.6"
faster-whisper = "^0.10.0"
transformers = "^4.36.2"
black = "24.1.1"
jsonschema = "^4.23.0"
openai = "^1.59.7"
[dependency-groups]
dev = [
"black>=24.1.1",
"stamina>=23.1.0",
"pyinstrument>=4.6.1",
]
tests = [
"pytest-cov>=4.1.0",
"pytest-aiohttp>=1.0.4",
"pytest-asyncio>=0.21.1",
"pytest>=7.4.0",
"httpx-ws>=0.4.1",
"pytest-httpx>=0.23.1",
"pytest-celery>=0.0.0",
]
aws = ["aioboto3>=11.2.0"]
evaluation = [
"jiwer>=3.0.2",
"levenshtein>=0.21.1",
"tqdm>=4.66.0",
"pydantic>=2.1.1",
]
[tool.poetry.group.dev.dependencies]
black = "^24.1.1"
stamina = "^23.1.0"
pyinstrument = "^4.6.1"
[tool.poetry.group.tests.dependencies]
pytest-cov = "^4.1.0"
pytest-aiohttp = "^1.0.4"
pytest-asyncio = "^0.21.1"
pytest = "^7.4.0"
httpx-ws = "^0.4.1"
pytest-httpx = "^0.23.1"
pytest-celery = "^0.0.0"
[tool.poetry.group.aws.dependencies]
aioboto3 = "^11.2.0"
[tool.poetry.group.evaluation.dependencies]
jiwer = "^3.0.2"
levenshtein = "^0.21.1"
tqdm = "^4.66.0"
pydantic = "^2.1.1"
[tool.uv]
default-groups = [
"dev",
"tests",
"aws",
"evaluation",
]
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["reflector"]
[tool.coverage.run]
source = ["reflector"]

@@ -147,6 +147,10 @@ if settings.PROFILING:
if __name__ == "__main__":
import sys
import uvicorn
uvicorn.run("reflector.app:app", host="0.0.0.0", port=1250, reload=True)
should_reload = "--reload" in sys.argv
uvicorn.run("reflector.app:app", host="0.0.0.0", port=1250, reload=should_reload)
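The `__main__` block now enables auto-reload only when `--reload` appears on the command line, so the port stays fixed at 1250 while reload becomes opt-in. A minimal sketch of that check, pulled into a hypothetical helper `build_uvicorn_kwargs` so it can be exercised without starting a server; the helper name is illustrative, while the host, port, and flag come from the diff.

```python
import sys


def build_uvicorn_kwargs(argv: list[str]) -> dict:
    """Keyword arguments the __main__ block passes to uvicorn.run.

    Hypothetical helper mirroring `should_reload = "--reload" in sys.argv`.
    """
    return {
        "host": "0.0.0.0",
        "port": 1250,  # fixed default port, so no --port flag is needed
        "reload": "--reload" in argv,  # auto-reload only when explicitly requested
    }


if __name__ == "__main__":
    # The real app would call:
    # uvicorn.run("reflector.app:app", **build_uvicorn_kwargs(sys.argv))
    print(build_uvicorn_kwargs(sys.argv))
```

`uv run -m reflector.app --reload` yields `reload=True`, while `runserver.sh` calls `uv run -m reflector.app` without the flag, so production runs with `reload=False`. Keeping the application as the import string `"reflector.app:app"` matters: uvicorn can only reload an app given as an import string.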

@@ -1,2 +0,0 @@
poetry run python3 -m reflector.app

@@ -1,16 +1,12 @@
#!/bin/bash
if [ -f "/venv/bin/activate" ]; then
source /venv/bin/activate
fi
if [ "${ENTRYPOINT}" = "server" ]; then
alembic upgrade head
python -m reflector.app
uv run alembic upgrade head
uv run -m reflector.app
elif [ "${ENTRYPOINT}" = "worker" ]; then
celery -A reflector.worker.app worker --loglevel=info
uv run celery -A reflector.worker.app worker --loglevel=info
elif [ "${ENTRYPOINT}" = "beat" ]; then
celery -A reflector.worker.app beat --loglevel=info
uv run celery -A reflector.worker.app beat --loglevel=info
else
echo "Unknown command"
fi
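The script selects what to run from the `ENTRYPOINT` environment variable, and every command now goes through `uv run` instead of an activated `/venv`. A Python sketch of the same dispatch table, for illustration only (`COMMANDS` and `commands_for` are hypothetical names; the commands are copied from the script). One deliberate difference: the script's `else` branch echoes "Unknown command" and still exits with status 0, whereas this sketch raises for unknown values.

```python
from __future__ import annotations

# Commands runserver.sh runs for each ENTRYPOINT value, in order.
COMMANDS: dict[str, list[list[str]]] = {
    "server": [
        ["uv", "run", "alembic", "upgrade", "head"],
        ["uv", "run", "-m", "reflector.app"],  # no --reload: auto-reload stays off
    ],
    "worker": [
        ["uv", "run", "celery", "-A", "reflector.worker.app", "worker", "--loglevel=info"],
    ],
    "beat": [
        ["uv", "run", "celery", "-A", "reflector.worker.app", "beat", "--loglevel=info"],
    ],
}


def commands_for(entrypoint: str | None) -> list[list[str]]:
    """Return the commands for an ENTRYPOINT value, raising for unknown ones."""
    try:
        return COMMANDS[entrypoint or ""]
    except KeyError:
        raise ValueError(f"Unknown command: {entrypoint!r}") from None
```

A caller could read `os.environ.get("ENTRYPOINT")` and run each returned command with `subprocess.run(cmd, check=True)`, so a failing migration stops the server from starting.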

@@ -84,7 +84,7 @@ from unittest import mock
],
)
@pytest.mark.asyncio
async def test_processors_audio_diarization(event_loop, name, diarization, expected):
async def test_processors_audio_diarization(name, diarization, expected):
from reflector.processors.audio_diarization import AudioDiarizationProcessor
from reflector.processors.types import (
TitleSummaryWithId,

@@ -3,7 +3,6 @@ import pytest
@pytest.mark.asyncio
async def test_basic_process(
event_loop,
nltk,
dummy_transcript,
dummy_llm,
@@ -34,8 +33,8 @@ async def test_basic_process(
print(marks)
# validate the events
assert marks["TranscriptLinerProcessor"] == 4
assert marks["TranscriptTranslatorProcessor"] == 4
assert marks["TranscriptLinerProcessor"] == 1
assert marks["TranscriptTranslatorProcessor"] == 1
assert marks["TranscriptTopicDetectorProcessor"] == 1
assert marks["TranscriptFinalSummaryProcessor"] == 1
assert marks["TranscriptFinalTitleProcessor"] == 1
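The expected `TranscriptLinerProcessor` and `TranscriptTranslatorProcessor` counts drop from 4 to 1, matching the commit message: `av` now yields frames of 4096 samples instead of 1024, so the same audio arrives in a quarter as many frames. The arithmetic below assumes, hypothetically, that the test clip fits in one 4096-sample frame and that each decoded frame leads to one liner/translator event; the diff is consistent with this but does not show it.

```python
def frame_count(total_samples: int, samples_per_frame: int) -> int:
    """Frames needed to carry total_samples; the last frame may be partial."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_samples // samples_per_frame)


# Hypothetical clip length: the fixture's real length is not shown in the diff.
CLIP_SAMPLES = 4096

frames_before = frame_count(CLIP_SAMPLES, 1024)  # frame size the old test assumed
frames_after = frame_count(CLIP_SAMPLES, 4096)   # frame size av returns now
print(frames_before, frames_after)  # 4 1
```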

@@ -1,28 +1,27 @@
import asyncio
import pytest
import httpx
import pytest
from reflector.utils.retry import (
retry,
RetryTimeoutException,
RetryHTTPException,
RetryException,
RetryHTTPException,
RetryTimeoutException,
retry,
)
@pytest.mark.asyncio
async def test_retry_redirect(httpx_mock):
async def custom_response(request: httpx.Request):
if request.url.path == "/hello":
await asyncio.sleep(1)
return httpx.Response(
status_code=303, headers={"location": "https://test_url/redirected"}
httpx_mock.add_response(
url="https://test_url/hello",
status_code=303,
headers={"location": "https://test_url/redirected"},
)
httpx_mock.add_response(
url="https://test_url/redirected",
status_code=200,
json={"hello": "world"},
)
elif request.url.path == "/redirected":
return httpx.Response(status_code=200, json={"hello": "world"})
else:
raise Exception("Unexpected path")
httpx_mock.add_callback(custom_response)
async with httpx.AsyncClient() as client:
# timeout should not triggered, as it will end up ok
# even though the first request is a 303 and took more that 0.5
@@ -37,7 +36,7 @@ async def test_retry_redirect(httpx_mock):
@pytest.mark.asyncio
async def test_retry_httpx(httpx_mock):
# this code should be force a retry
httpx_mock.add_response(status_code=500)
httpx_mock.add_response(status_code=500, is_reusable=True)
async with httpx.AsyncClient() as client:
with pytest.raises(RetryTimeoutException):
await retry(client.get)("https://test_url", retry_timeout=0.1)
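`test_retry_httpx` expects `retry` to call the endpoint repeatedly until `retry_timeout` expires, so the single mocked 500 response must answer more than one request; as I understand recent pytest-httpx releases, a registered response is sent only once unless it is marked `is_reusable=True`, which is what this commit adds. The stdlib sketch below is a hypothetical stand-in for `reflector.utils.retry.retry` (its real implementation is not in this diff): only the call shape `retry(fn)(url, retry_timeout=...)` is taken from the test, a failure is modeled as an exception rather than a 500 status, and the attempt counter shows why one mocked response gets hit several times.

```python
import asyncio
import time


class RetryTimeoutError(Exception):
    """Raised when the wrapped call still fails after the deadline (hypothetical name)."""


def retry(fn):
    """Retry an async callable until it succeeds or retry_timeout seconds elapse."""

    async def wrapper(*args, retry_timeout: float = 1.0, delay: float = 0.01, **kwargs):
        deadline = time.monotonic() + retry_timeout
        while True:
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:  # any failure triggers another attempt
                if time.monotonic() >= deadline:
                    raise RetryTimeoutError(f"gave up after {retry_timeout}s") from exc
                await asyncio.sleep(delay)

    return wrapper


attempts = 0


async def always_500(url: str) -> None:
    """Plays the role of the mocked endpoint that answers every request with a 500."""
    global attempts
    attempts += 1
    raise RuntimeError(f"HTTP 500 from {url}")


async def main() -> None:
    try:
        await retry(always_500)("https://test_url", retry_timeout=0.1)
    except RetryTimeoutError:
        print(f"timed out after {attempts} attempts")


asyncio.run(main())
```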

server/uv.lock (generated)

File diff suppressed because it is too large.