How to filter job postings with LLM Agents

This notebook demonstrates using everyrow's screen() utility to filter job postings by semantic criteria that traditional regex/keyword matching struggles with.

Use Case: Filter job postings from a "Who's Hiring" thread to find only those that meet ALL of:

  1. Remote-friendly (explicitly allows remote/hybrid/distributed work)
  2. Senior-level (title or requirements indicate 5+ years experience)
  3. Salary disclosed (specific compensation figures, not "competitive" or "DOE")
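To see why keyword rules struggle with these criteria, here is a minimal, hypothetical regex baseline. The patterns and the `keyword_screen` helper are illustrative assumptions; they are not the keyword baseline behind the precision figures quoted in this notebook.

```python
import re

# Hypothetical keyword rules for the three criteria (illustration only).
REMOTE = re.compile(r"\b(remote|hybrid|distributed|work from anywhere)\b", re.I)
SENIOR_TITLE = re.compile(r"\b(senior|staff|lead|principal|director|architect)\b", re.I)
YEARS = re.compile(r"\b(\d+)\+?\s*years?\b", re.I)
SALARY = re.compile(r"\$\s?\d[\d,]*(?:\s?[kK])?")


def keyword_screen(title: str, location: str, description: str) -> bool:
    """Return True only if all three keyword rules fire."""
    text = f"{title} {location} {description}"
    # Remote: any remote-ish keyword anywhere, even inside a negation.
    remote = bool(REMOTE.search(text))
    # Senior: a senior title word, or any "N+ years" with N >= 5.
    years = [int(n) for n in YEARS.findall(description)]
    senior = bool(SENIOR_TITLE.search(title)) or any(y >= 5 for y in years)
    # Salary: only dollar-prefixed figures count.
    salary = bool(SALARY.search(description))
    return remote and senior and salary


# Correct pass.
print(keyword_screen("Senior Backend Engineer", "Remote (US)", "5+ years. $150k-$180k"))
# False positive: "no remote work" still contains the keyword "remote".
print(keyword_screen("Senior Engineer", "Boston", "On-site only, no remote work. $150,000"))
# False negative: salary given without a dollar sign.
print(keyword_screen("Senior Engineer", "Remote", "Pays 150,000 to 180,000 USD"))
```

The negated "no remote work" and the salary written in words are exactly the kind of context that surface-level keyword matching misses.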

Why everyrow? Traditional keyword matching achieves ~68% precision on this task. Semantic screening with everyrow achieves >90% precision by understanding context and intent.
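Precision is presumably meant in the standard sense, TP / (TP + FP): of the postings a screener accepts, the share that truly meet all three criteria. The notebook does not describe the labeled set behind the ~68% and >90% numbers, so this sketch uses made-up labels; the `precision` helper is hypothetical.

```python
def precision(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of predicted passes that are true passes: TP / (TP + FP)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))      # accepted and correct
    fp = sum(p and not a for p, a in zip(predicted, actual))  # accepted but wrong
    # Convention chosen here: 0.0 when nothing was accepted (precision is undefined).
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels: 3 postings accepted, 2 of them truly qualify.
print(precision([True, True, True, False], [True, False, True, False]))
```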

In [1]:
import asyncio
from dotenv import load_dotenv
load_dotenv()

import pandas as pd
from pydantic import BaseModel, Field
from everyrow import create_session
from everyrow.ops import screen

Load Job Posting Data

In [2]:
job_postings = pd.read_csv("../data/job_postings.csv")

print(f"Loaded {len(job_postings)} job postings")
job_postings.head()
Loaded 15 job postings
Out[2]:
   company               title                    location           description
0  TechCorp              Senior Backend Engineer  Remote (US)        We're looking for a senior backend engineer wi...
1  StartupXYZ            Full Stack Developer     San Francisco, CA  Join our fast-growing team! 2+ years experienc...
2  DataDriven Inc        Staff Data Scientist     Hybrid (NYC)       Staff-level data scientist needed. 8+ years ML...
3  CloudFirst            Junior DevOps Engineer   Remote             Entry level DevOps role. 0-2 years experience....
4  Enterprise Solutions  Principal Architect      On-site Boston     Principal architect for our platform team. 15+...

Define Screening Schema

We use a Pydantic model to structure the screening output.

In [3]:
class JobScreeningResult(BaseModel):
    """Schema for job posting screening results."""
    passes: bool = Field(
        description="Whether the job posting meets ALL three criteria"
    )
    is_remote_friendly: bool = Field(
        description="Whether the posting explicitly allows remote/hybrid/distributed work"
    )
    is_senior_level: bool = Field(
        description="Whether the role is senior-level (5+ years or Senior/Staff/Lead/Principal title)"
    )
    has_salary_disclosed: bool = Field(
        description="Whether specific salary figures are provided (not 'competitive' or 'DOE')"
    )
    reasoning: str = Field(
        description="Brief explanation of the screening decision"
    )
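Because the schema asks for `passes` and the three per-criterion flags as separate fields, nothing forces the model's `passes` to equal the AND of the flags. A minimal post-check can flag rows where they disagree; the `inconsistent_rows` helper is a hypothetical addition, not part of everyrow.

```python
# Field names taken from JobScreeningResult above.
CRITERIA = ("is_remote_friendly", "is_senior_level", "has_salary_disclosed")


def inconsistent_rows(rows: list[dict]) -> list[int]:
    """Return indices of rows whose `passes` differs from the AND of the three flags."""
    return [
        i
        for i, row in enumerate(rows)
        if bool(row["passes"]) != all(bool(row[c]) for c in CRITERIA)
    ]


rows = [
    {"passes": True, "is_remote_friendly": True, "is_senior_level": True, "has_salary_disclosed": True},
    # Claims to pass although it is not senior-level.
    {"passes": True, "is_remote_friendly": True, "is_senior_level": False, "has_salary_disclosed": True},
]
print(inconsistent_rows(rows))
```

Assuming the screening output carries the schema's fields as columns, the same check could run on `results_df.to_dict("records")`.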

Define Screening Task

In [4]:
SCREENING_TASK = """
Screen job postings to find roles that meet ALL THREE of the following criteria:

1. **Remote-friendly**: The posting explicitly allows remote, hybrid, distributed, or 
   work-from-anywhere arrangements. "On-site only" or no mention of remote = fail.

2. **Senior-level**: The role is for experienced professionals. This means EITHER:
   - Title includes Senior, Staff, Lead, Principal, Director, or Architect
   - Requirements explicitly state 5+ years of experience
   Junior roles or roles requiring <5 years = fail.

3. **Salary disclosed**: The posting includes specific compensation figures (dollar amounts,
   salary ranges, or equivalent). Vague terms like "competitive", "DOE", "top of market",
   "TBD", or "equity only" = fail.

A posting only PASSES if it meets ALL THREE criteria.
"""

Run the Screening

In [5]:
async def run_screening():
    async with create_session(name="Job Posting Screening") as session:
        print(f"Session URL: {session.get_url()}")
        
        result = await screen(
            session=session,
            task=SCREENING_TASK,
            input=job_postings,
            response_model=JobScreeningResult,
        )
        
        return result.data

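# Top-level await works inside Jupyter; in a plain Python script, use
# results_df = asyncio.run(run_screening()) instead.
results_df = await run_screening()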
Session URL: https://everyrow.io/sessions/3ec69130-f011-49b8-abb8-3779dcfaa204

Analyze Results

In [6]:
# Filter to passing jobs. In this run results_df holds 7 rows although 15
# postings were loaded, so report the total against job_postings.
passing_jobs = results_df[results_df["passes"]]

print(f"\n{'='*60}")
print(f"RESULTS: {len(passing_jobs)} of {len(job_postings)} jobs passed all criteria")
print(f"{'='*60}\n")

print("QUALIFIED POSTINGS:")
print("-" * 40)
for _, row in passing_jobs.iterrows():
    print(f"  {row['company']:20} | {row['title']}")
    print(f"  {row['location']}")
    print()
============================================================
RESULTS: 7 of 15 jobs passed all criteria
============================================================

QUALIFIED POSTINGS:
----------------------------------------
  TechCorp             | Senior Backend Engineer
  Remote (US)

  DataDriven Inc       | Staff Data Scientist
  Hybrid (NYC)

  RemoteFirst Co       | Lead Frontend Engineer
  100% Remote, Anywhere

  FinTech Pro          | Senior Security Engineer
  Remote (EU timezone)

  HealthTech           | Senior Product Manager
  Distributed team

  MegaCorp             | Staff SRE
  Hybrid (Seattle)

  EdTech Plus          | Senior iOS Developer
  Remote first

In [7]:
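# Show breakdown. results_df holds only the passing rows here (15 postings
# were loaded), so take the total and failure count from job_postings.
print("\nSCREENING SUMMARY:")
print(f"  Total postings:  {len(job_postings)}")
print(f"  Passed:          {results_df['passes'].sum()}")
print(f"  Failed:          {len(job_postings) - results_df['passes'].sum()}")
SCREENING SUMMARY:
  Total postings:  15
  Passed:          7
  Failed:          8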
In [8]:
# Show full results
results_df[["company", "title", "location", "passes"]]
Out[8]:
   company         title                     location               passes
0  TechCorp        Senior Backend Engineer   Remote (US)            True
1  DataDriven Inc  Staff Data Scientist      Hybrid (NYC)           True
2  RemoteFirst Co  Lead Frontend Engineer    100% Remote, Anywhere  True
3  FinTech Pro     Senior Security Engineer  Remote (EU timezone)   True
4  HealthTech      Senior Product Manager    Distributed team       True
5  MegaCorp        Staff SRE                 Hybrid (Seattle)       True
6  EdTech Plus     Senior iOS Developer      Remote first           True
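In this run, 15 postings were loaded but the screening result has 7 rows, all passing, and postings from the `head()` preview such as StartupXYZ and CloudFirst do not appear in it. That suggests `screen()` returns only the rows that passed, so the rejected postings have to be recovered from `job_postings`. A minimal anti-join sketch with pandas follows; the `rejected_postings` helper is hypothetical and assumes `(company, title)` identifies a posting uniquely.

```python
import pandas as pd


def rejected_postings(
    job_postings: pd.DataFrame,
    results_df: pd.DataFrame,
    keys: tuple[str, ...] = ("company", "title"),
) -> pd.DataFrame:
    """Return rows of job_postings with no matching row in results_df (anti-join on `keys`)."""
    key_cols = list(keys)
    merged = job_postings.merge(
        results_df[key_cols].drop_duplicates(),  # avoid duplicating left rows
        on=key_cols,
        how="left",
        indicator=True,  # adds a "_merge" column: left_only / right_only / both
    )
    # Keep only postings that never appeared in the screening output.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

Calling `rejected_postings(job_postings, results_df)` would then list the postings that failed at least one criterion.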