How to Filter a DataFrame with an LLM
Here we show how to filter a pandas DataFrame by qualitative criteria, where standard filtering like `df[df['column'] == value]` won't work.
LLMs and LLM-powered web agents can evaluate qualitative criteria with high accuracy, but they can be expensive and difficult to orchestrate at scale. We provide a low-cost solution by handling the orchestration, batching, and consistency checking.
As a worked example, this guide filters 3,616 job postings for "remote-friendly, senior-level roles with disclosed salary" in under 10 minutes for $4.24.
| Metric | Value |
|---|---|
| Rows processed | 3,616 |
| Rows passing filter | 216 (6.0%) |
| Total cost | $4.24 |
| Time | 9.9 minutes |
| Cost per row | ~$0.0012 |
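The derived rows of the table follow directly from the raw counts. The check below recomputes them and adds a naive linear extrapolation for sizing your own run. `estimate_cost` is a hypothetical helper, and the assumption that cost scales linearly with row count is ours; real costs will vary with posting length and the criteria.

```python
ROWS = 3_616
PASSED = 216
COST_USD = 4.24
MINUTES = 9.9

cost_per_row = COST_USD / ROWS        # ≈ 0.00117 USD, about a tenth of a cent
pass_rate_pct = PASSED / ROWS * 100   # ≈ 5.97, reported as 6.0%
rows_per_minute = ROWS / MINUTES      # ≈ 365 rows per minute


def estimate_cost(n_rows: int) -> float:
    # Assumption: cost grows linearly at this run's average per-row rate
    return n_rows * cost_per_row


print(f"${estimate_cost(10_000):.2f}")  # prints $11.73
```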
In this example, we want to check job postings for three criteria:
- Remote-friendly
- Senior level
- Salary is disclosed
None of these can be checked reliably with string matching. For example:
```python
# Naive keyword filter: this also matches "No remote work available"
df[df['posting'].str.contains('remote', case=False)]
```
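To see the false positive concretely, here is the same naive filter run on three made-up postings (invented for illustration, not taken from the dataset). Only the first posting meets all three criteria, yet the substring match keeps every row.

```python
import pandas as pd

# Three made-up postings; only the first is remote, senior, and shows a salary
df = pd.DataFrame({"posting": [
    "Acme | Senior Backend Engineer | REMOTE (US) | $180k-$220k",
    "Initech | Engineer | Onsite only, no remote work available",
    "Globex | Staff Engineer | Remote-first | Competitive salary",
]})

# The naive filter: case-insensitive substring match on "remote"
naive = df[df["posting"].str.contains("remote", case=False)]
print(len(naive))  # prints 3: every row matches, including the onsite-only one
```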
What you need is a filter that understands whether a posting explicitly allows remote work, requires senior experience, and states a specific salary.
We use a dataset of 3,616 job postings from Hacker News "Who's Hiring" threads: a 10% sample of each month's posts from March 2020 through January 2026. Download hn_jobs.csv to follow along.
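The guide doesn't say how the 10% sample was drawn. If you want a comparable sample from your own scrape, one option is a stratified sample that takes the same fraction from every month. This is a sketch under assumptions: `sample_per_month` is a hypothetical helper, and it assumes a `date` column that pandas can parse.

```python
import pandas as pd


def sample_per_month(posts: pd.DataFrame, frac: float = 0.1, seed: int = 0) -> pd.DataFrame:
    # Hypothetical schema: a "date" column parseable as a timestamp
    month = pd.to_datetime(posts["date"]).dt.to_period("M")
    # Sample the same fraction within each month so every thread is represented
    return posts.groupby(month).sample(frac=frac, random_state=seed)
```

Fixing `random_state` keeps the sample reproducible between runs.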
```bash
pip install everyrow
export EVERYROW_API_KEY=your_key_here  # Get one at everyrow.io/api-key
```
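Because the key is supplied through an environment variable, a quick pre-flight check can catch a missing key before a long run starts. A minimal sketch using only the standard library: `require_api_key` is a hypothetical helper, and it only confirms the variable is set, not that the key is valid.

```python
import os


def require_api_key(var: str = "EVERYROW_API_KEY") -> str:
    # Hypothetical helper: fail fast with a clear message if the key is missing.
    # It checks presence only; an invalid key will still fail at request time.
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the screen")
    return key
```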
```python
import asyncio

import pandas as pd
from pydantic import BaseModel, Field
from everyrow.ops import screen

jobs = pd.read_csv("hn_jobs.csv")  # 3,616 job postings


class JobScreenResult(BaseModel):
    # Structured verdict produced for each row
    qualifies: bool = Field(description="True if meets ALL criteria")


async def main():
    # State the criteria in plain language; screen applies them to every row
    result = await screen(
        task="""
        A job posting qualifies if it meets ALL THREE criteria:
        1. Remote-friendly: Explicitly allows remote work, hybrid, WFH,
           distributed teams, or "work from anywhere".
        2. Senior-level: Title contains Senior/Staff/Lead/Principal/Architect,
           OR requires 5+ years experience, OR mentions "founding engineer".
        3. Salary disclosed: Specific compensation numbers are mentioned.
           "$150K-200K" qualifies. "Competitive" or "DOE" does not.
        """,
        input=jobs,
        response_model=JobScreenResult,
    )
    qualified = result.data  # only the rows that passed
    print(f"Qualified: {len(qualified)} of {len(jobs)}")
    return qualified


qualified_jobs = asyncio.run(main())
```
The screen operation evaluates each row against the natural-language criteria and returns only the rows that pass. Of the 3,616 postings, 216 (6.0%) qualified.
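To make "evaluate each row, return only the rows that pass" concrete, here is a local, offline sketch of those semantics. It is not how everyrow is implemented: `Verdict`, `toy_judge`, and `local_screen` are hypothetical names, and the crude keyword judge merely stands in for the per-row LLM call. Recording each criterion separately also makes it easy to see why a row failed.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Verdict:
    remote_friendly: bool
    senior_level: bool
    salary_disclosed: bool

    @property
    def qualifies(self) -> bool:
        # A row passes only if ALL THREE criteria hold, as in the task prompt
        return self.remote_friendly and self.senior_level and self.salary_disclosed


def toy_judge(posting: str) -> Verdict:
    # Hypothetical stand-in for the LLM call. These keyword checks are
    # deliberately crude and share the weaknesses of string matching.
    text = posting.lower()
    return Verdict(
        remote_friendly="remote" in text and "no remote" not in text,
        senior_level=any(w in text for w in ("senior", "staff", "lead", "principal")),
        salary_disclosed="$" in text,
    )


def local_screen(df: pd.DataFrame, judge, column: str = "posting") -> pd.DataFrame:
    # Judge every row independently, then keep only the rows that qualify
    mask = df[column].map(lambda text: judge(text).qualifies).astype(bool)
    return df[mask]
```

Replacing `toy_judge` with a function that calls a model is where the real work and cost lie; the orchestration, batching, and consistency checking around that call are what everyrow handles for you.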
Breaking the results down by posting year shows a steady rise in the pass rate:
| Year | Qualified | Total | Pass Rate |
|---|---|---|---|
| 2020 | 10 | 594 | 1.7% |
| 2021 | 27 | 1,033 | 2.6% |
| 2022 | 36 | 758 | 4.7% |
| 2023 | 39 | 412 | 9.5% |
| 2024 | 39 | 387 | 10.1% |
| 2025 | 59 | 406 | 14.5% |
| 2026 | 6 | 26 | 23.1% |
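In 2020, only 1.7% of the sampled postings met all three criteria; by 2025, the share had reached 14.5%. This suggests that more companies now combine remote work, upfront salary disclosure, and senior hiring in a single posting. The 2026 figure covers only January (26 postings), so it is too small a sample to read much into.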
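A year-by-year table like the one above can be produced by comparing group sizes in the input and the screened output. A minimal sketch, assuming both DataFrames carry a `year` column (for example, derived from a posting date) and that `result.data` keeps the input's columns; `pass_rate_by_year` is a hypothetical helper.

```python
import pandas as pd


def pass_rate_by_year(all_rows: pd.DataFrame, passed: pd.DataFrame,
                      year_col: str = "year") -> pd.DataFrame:
    # Count postings per year in the full input and in the screened output
    total = all_rows.groupby(year_col).size()
    qualified = passed.groupby(year_col).size().reindex(total.index, fill_value=0)
    table = pd.DataFrame({"qualified": qualified, "total": total})
    # Percentage of each year's postings that met all three criteria
    table["pass_rate_pct"] = (table["qualified"] / table["total"] * 100).round(1)
    return table
```

With the objects from the main example, this would be called as `pass_rate_by_year(jobs, qualified_jobs)`.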
Some examples of postings that passed:
- Bloomberg | Senior Software Engineer | Hybrid (NYC) | $160k - $240k USD + bonus
- KoBold Metals | Senior Infrastructure Engineer | Remote (USA) | $170k - $230k
- EnergyHub | Director of Engineering | Remote (US) | Salary $225k
- Gladly | Staff Software Engineer | Remote (US, Colombia) | $60k–$215k + Equity
Built with everyrow. See the screen documentation for more options, including batch-size tuning and async execution.