How to filter job postings with LLM agents
This notebook demonstrates using everyrow's screen() utility to filter job postings by semantic criteria that traditional regex/keyword matching struggles with.
Use Case: Filter job postings from a "Who's Hiring" thread to find only those that meet ALL of:
- Remote-friendly (explicitly allows remote/hybrid/distributed work)
- Senior-level (a Senior, Staff, Lead, Principal, or similar title, or requirements stating 5+ years of experience)
- Salary disclosed (specific compensation figures, not "competitive" or "DOE")
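Why everyrow? Traditional keyword matching achieves ~68% precision on this task, meaning roughly a third of the postings it flags do not actually meet all three criteria. Semantic screening with everyrow achieves >90% precision by reading each posting in context (for example, telling "fully remote" apart from "no remote options").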
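To make those failure modes concrete, here is a minimal sketch of a keyword filter of the kind compared against above. The exact baseline behind the ~68% figure is not shown in this notebook, so the regexes and sample postings below are hypothetical. The negated "no remote" posting and the junior role that merely mentions a "Lead Architect" both slip through, while a salary written without a dollar sign is missed.

```python
import re

# Hypothetical keyword rules for the three criteria (not the benchmark's baseline).
REMOTE_RE = re.compile(r"\b(remote|hybrid|distributed)\b", re.IGNORECASE)
SENIOR_RE = re.compile(
    r"\b(senior|staff|lead|principal|director|architect)\b|\b([5-9]|\d{2,})\+?\s*years",
    re.IGNORECASE,
)
SALARY_RE = re.compile(r"\$\s?\d")


def keyword_passes(posting: str) -> bool:
    """Return True if all three keyword rules match somewhere in the posting."""
    return bool(
        REMOTE_RE.search(posting)
        and SENIOR_RE.search(posting)
        and SALARY_RE.search(posting)
    )


# Hypothetical postings illustrating typical keyword-matching mistakes.
postings = {
    "negated_remote": "Senior Backend Engineer. On-site only, no remote. $180k-$210k.",
    "lead_not_seniority": "Junior Developer, reports to our Lead Architect. Remote OK. $70k.",
    "genuine_match": "Staff Engineer (8+ years). Fully remote. $200k-$240k base.",
    "salary_in_prose": "Senior SRE, hybrid. Comp: 190-220k USD.",
}

for name, text in postings.items():
    print(f"{name:20} keyword_passes={keyword_passes(text)}")
```

Only `genuine_match` should pass; the first two are false positives and the last is a false negative, which is exactly the kind of context a semantic screen is meant to handle.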
In [1]:
import asyncio
from dotenv import load_dotenv
load_dotenv()
import pandas as pd
from pydantic import BaseModel, Field
from everyrow import create_session
from everyrow.ops import screen
Load Job Posting Data
In [2]:
job_postings = pd.read_csv("../data/job_postings.csv")
print(f"Loaded {len(job_postings)} job postings")
job_postings.head()
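The analysis cells later in this notebook read `company`, `title`, and `location` from the screened output, which assumes screen() carries the input columns through to `result.data`. A minimal sanity check, sketched below, confirms the CSV has those columns before spending an API call; `sample` and its `description` column are hypothetical stand-ins for `job_postings`.

```python
import pandas as pd

# Columns read by the analysis cells later in the notebook.
REQUIRED_COLUMNS = {"company", "title", "location"}


def missing_columns(df: pd.DataFrame, required: set = REQUIRED_COLUMNS) -> set:
    """Return the required column names that df lacks (empty set if none)."""
    return set(required) - set(df.columns)


# Hypothetical stand-in for job_postings; in the notebook, pass job_postings instead.
sample = pd.DataFrame(
    {
        "company": ["Acme"],
        "title": ["Senior Engineer"],
        "location": ["Remote (US)"],
        "description": ["Fully remote, $180k-$210k, 6+ years of Go."],
    }
)

missing = missing_columns(sample)
if missing:
    raise ValueError(f"job_postings is missing columns: {sorted(missing)}")
print("All required columns present")
```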
Define Screening Schema
We use a Pydantic model to structure the screening output.
In [3]:
class JobScreeningResult(BaseModel):
    """Schema for job posting screening results."""

    passes: bool = Field(
        description="Whether the job posting meets ALL three criteria"
    )
    is_remote_friendly: bool = Field(
        description="Whether the posting explicitly allows remote/hybrid/distributed work"
    )
    is_senior_level: bool = Field(
        description="Whether the role is senior-level (5+ years or Senior/Staff/Lead/Principal title)"
    )
    has_salary_disclosed: bool = Field(
        description="Whether specific salary figures are provided (not 'competitive' or 'DOE')"
    )
    reasoning: str = Field(
        description="Brief explanation of the screening decision"
    )
Define Screening Task
In [4]:
SCREENING_TASK = """
Screen job postings to find roles that meet ALL THREE of the following criteria:
1. **Remote-friendly**: The posting explicitly allows remote, hybrid, distributed, or
   work-from-anywhere arrangements. "On-site only" or no mention of remote = fail.
2. **Senior-level**: The role is for experienced professionals. This means EITHER:
   - Title includes Senior, Staff, Lead, Principal, Director, or Architect
   - Requirements explicitly state 5+ years of experience
   Junior roles or roles requiring <5 years = fail.
3. **Salary disclosed**: The posting includes specific compensation figures (dollar amounts,
   salary ranges, or equivalent). Vague terms like "competitive", "DOE", "top of market",
   "TBD", or "equity only" = fail.
A posting only PASSES if it meets ALL THREE criteria.
"""
Run the Screening
In [5]:
async def run_screening():
    # Open a named everyrow session and print its URL
    async with create_session(name="Job Posting Screening") as session:
        print(f"Session URL: {session.get_url()}")
        result = await screen(
            session=session,
            task=SCREENING_TASK,
            input=job_postings,
            response_model=JobScreeningResult,
        )
        return result.data

# Jupyter supports top-level await, so the coroutine can be awaited directly
results_df = await run_screening()
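The cell above relies on Jupyter's support for top-level `await`. In a plain `.py` script that line is a SyntaxError, while inside a notebook `asyncio.run()` fails because the kernel's event loop is already running. A minimal sketch of the script-side pattern follows; `run_screening_stub()` is a hypothetical stand-in for `run_screening()` so the example runs without an API key.

```python
import asyncio


async def run_screening_stub() -> list[str]:
    """Hypothetical stand-in for run_screening(); sleeps instead of calling everyrow."""
    await asyncio.sleep(0.01)
    return ["posting-1", "posting-2"]


def main() -> list[str]:
    # A script has no running event loop, so asyncio.run() creates one,
    # runs the coroutine to completion, and closes the loop afterwards.
    return asyncio.run(run_screening_stub())


if __name__ == "__main__":
    print(main())
```

In a real script you would replace the stub with `run_screening()` and keep the notebook's imports and `load_dotenv()` call at module level.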
Analyze Results
In [6]:
# Filter to passing jobs (passes is a boolean column, so it works directly as a mask)
passing_jobs = results_df[results_df["passes"]]
print(f"\n{'='*60}")
print(f"RESULTS: {len(passing_jobs)} of {len(results_df)} jobs passed all criteria")
print(f"{'='*60}\n")
print("QUALIFIED POSTINGS:")
print("-" * 40)
for _, row in passing_jobs.iterrows():
    print(f" {row['company']:20} | {row['title']}")
    print(f" {row['location']}")
    print()
In [7]:
# Show breakdown
print("\nSCREENING SUMMARY:")
print(f" Total postings: {len(results_df)}")
print(f" Passed: {results_df['passes'].sum()}")
print(f" Failed: {(~results_df['passes']).sum()}")
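The summary above gives only the overall pass/fail split. A minimal pandas sketch can break failures down by criterion and flag rows where `passes` disagrees with the AND of the three flags. The `results` frame is hypothetical data shaped like `results_df`, and like the cell above it assumes boolean columns with no missing values.

```python
import pandas as pd

CRITERIA = ["is_remote_friendly", "is_senior_level", "has_salary_disclosed"]

# Hypothetical rows shaped like results_df; in the notebook, use results_df itself.
results = pd.DataFrame(
    {
        "company": ["A", "B", "C", "D"],
        "passes": [True, False, False, True],
        "is_remote_friendly": [True, False, True, True],
        "is_senior_level": [True, True, True, False],
        "has_salary_disclosed": [True, True, False, True],
    }
)

# Number of postings that failed each individual criterion.
failures_by_criterion = (~results[CRITERIA]).sum()
print(failures_by_criterion)

# Rows whose overall verdict contradicts the per-criterion flags.
recomputed = results[CRITERIA].all(axis=1)
inconsistent = results[results["passes"] != recomputed]
print(inconsistent[["company", "passes"] + CRITERIA])
```

A posting can fail more than one criterion, so the per-criterion counts need not sum to the number of failed postings. Any rows in `inconsistent` are worth reading alongside their `reasoning` text.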
In [8]:
# Show full results
results_df[["company", "title", "location", "passes"]]
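To check a precision figure like the one quoted in the introduction on your own data, hand-label a sample of postings and compare the labels with `passes`. A minimal sketch, where `human_passes` is a hypothetical column holding your own labels and the sample values are illustrative only:

```python
import pandas as pd


def precision(predicted: pd.Series, labeled: pd.Series) -> float:
    """Fraction of postings predicted to pass that a human also marked as passing."""
    predicted_positive = predicted.sum()
    if predicted_positive == 0:
        return float("nan")  # precision is undefined when nothing was predicted to pass
    true_positive = (predicted & labeled).sum()
    return float(true_positive / predicted_positive)


# Hypothetical hand labels for a small sample of screened postings.
sample = pd.DataFrame(
    {
        "passes": [True, True, True, False, True],
        "human_passes": [True, True, False, False, True],
    }
)
print(f"precision = {precision(sample['passes'], sample['human_passes']):.2f}")
```

Precision only counts the postings the screen returned; pair it with recall (true positives divided by all postings a human marked as passing) to see how many qualifying jobs were dropped.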