Add LLM filtering pattern, .env.example, and workflows/lib
- Add .env.example with LLM_API_URL, LLM_MODEL, LLM_API_KEY
- Add .gitignore to exclude .env
- Add Pattern 5 (LLM filtering) to notebook-patterns.md
- Track workflows/lib with llm_call helper using mirascope
- Update README with LLM setup step and updated project structure
@@ -425,6 +425,89 @@ def display_timeline(timeline_df):
    timeline_df
```

## Pattern 5: LLM Filtering with `lib.llm`

When you need to classify, score, or extract structured information from each entity (e.g. "is this meeting about project X?", "rate the relevance of this email"), use the `llm_call` helper from `workflows/lib`. It sends each item to an LLM and parses the response into a typed Pydantic model.

**Prerequisites:** Copy `.env.example` to `.env` and fill in your `LLM_API_KEY`. Add `mirascope` and `pydantic` to the notebook's PEP 723 dependencies.
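
For reference, a minimal `.env` might look like this (the variable names come from `.env.example` in this commit; the values shown are placeholders, not real defaults):

```
LLM_API_URL=https://your-provider.example/v1
LLM_MODEL=your-model-name
LLM_API_KEY=replace-me
```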
```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "marimo",
#     "httpx",
#     "polars",
#     "mirascope",
#     "pydantic",
# ]
# ///
```

### Setup cell — import `llm_call`

```python
@app.cell
def setup():
    import httpx
    import marimo as mo
    import polars as pl

    from lib.llm import llm_call

    client = httpx.Client(timeout=30)
    return (client, llm_call, mo, pl)
```

### Define a response model

Create a Pydantic model that describes the structured output you want from the LLM:

```python
@app.cell
def models():
    from pydantic import BaseModel

    class RelevanceScore(BaseModel):
        relevant: bool
        reason: str
        score: int  # 0-10

    return (RelevanceScore,)
```
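
Because `llm_call` returns an instance of this model, downstream cells get typed attribute access rather than raw JSON. A quick sketch of what the parsed result behaves like (constructed by hand here for illustration, not via an actual LLM call):

```python
from pydantic import BaseModel


class RelevanceScore(BaseModel):
    relevant: bool
    reason: str
    score: int  # 0-10


# Pydantic validates the LLM's JSON payload into typed fields.
result = RelevanceScore.model_validate(
    {"relevant": True, "reason": "mentions the project by name", "score": 8}
)
print(result.relevant, result.score)  # → True 8
```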

### Filter entities through the LLM

Iterate over fetched entities and call `llm_call` for each one. Since `llm_call` is async, use `asyncio.gather` to process items concurrently:

```python
@app.cell
async def llm_filter(meetings, llm_call, RelevanceScore, pl, mo):
    import asyncio

    _topic = "Greyhaven"

    async def _score(meeting):
        _text = meeting.get("summary") or meeting.get("title") or ""
        _result = await llm_call(
            prompt=f"Is this meeting about '{_topic}'?\n\nMeeting: {_text}",
            response_model=RelevanceScore,
            system_prompt="Score the relevance of this meeting to the given topic. Set relevant=true if score >= 5.",
        )
        return {
            **meeting,
            "llm_relevant": _result.relevant,
            "llm_reason": _result.reason,
            "llm_score": _result.score,
        }

    scored_meetings = await asyncio.gather(*[_score(_m) for _m in meetings])
    relevant_meetings = [_m for _m in scored_meetings if _m["llm_relevant"]]

    mo.md(f"**LLM filter:** {len(relevant_meetings)}/{len(meetings)} meetings relevant to '{_topic}'")
    return (relevant_meetings,)
```
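
The fan-out-then-filter shape above can be exercised without a live LLM by stubbing the scorer. This sketch (the stub and the sample data are illustrative, not part of `lib.llm`) shows the same `asyncio.gather` pattern end to end:

```python
import asyncio


async def score_stub(meeting: dict) -> dict:
    # Stand-in for llm_call: pretend the LLM flags titles mentioning "Greyhaven".
    relevant = "greyhaven" in (meeting.get("title") or "").lower()
    return {**meeting, "llm_relevant": relevant}


async def filter_meetings(meetings: list[dict]) -> list[dict]:
    scored = await asyncio.gather(*[score_stub(m) for m in meetings])
    return [m for m in scored if m["llm_relevant"]]


meetings = [
    {"title": "Greyhaven kickoff"},
    {"title": "1:1 with Sam"},
    {"title": "Greyhaven retro"},
]
relevant = asyncio.run(filter_meetings(meetings))
print([m["title"] for m in relevant])  # → ['Greyhaven kickoff', 'Greyhaven retro']
```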

### Tips for LLM filtering

- **Keep prompts short** — only include the fields the LLM needs (title, summary, snippet), not the entire raw entity.
- **Use structured output** — always pass a `response_model` so you get typed fields back, not free text.
- **Batch wisely** — `asyncio.gather` sends all requests concurrently. For large datasets (100+ items), process in chunks to avoid rate limits.
- **Cache results** — LLM calls are slow and cost money. When iterating on a notebook, store scored results in a cell variable so you don't re-score on every edit.
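
One way to apply the "batch wisely" tip is a small chunking wrapper around `asyncio.gather`. The helper below is an illustrative sketch, not part of `lib.llm`, and the chunk size is a guess you should tune against your provider's rate limits:

```python
import asyncio


async def gather_in_chunks(coro_factories, chunk_size=20):
    """Run coroutines chunk_size at a time instead of all at once."""
    results = []
    for i in range(0, len(coro_factories), chunk_size):
        # Create coroutines lazily, only when their chunk is scheduled.
        chunk = [make() for make in coro_factories[i : i + chunk_size]]
        results.extend(await asyncio.gather(*chunk))
    return results


async def demo():
    async def work(n):
        return n * n

    factories = [lambda n=n: work(n) for n in range(50)]
    return await gather_in_chunks(factories, chunk_size=10)


squares = asyncio.run(demo())
print(squares[:5])  # → [0, 1, 4, 9, 16]
```

Passing zero-argument factories (rather than pre-created coroutines) ensures each request only starts when its chunk runs, which is what keeps the concurrency bounded.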
## Do / Don't — Quick Reference for LLM Agents

When generating marimo notebooks, follow these rules strictly. Violations cause `MultipleDefinitionError` at runtime.