How to Classify DataFrame Rows with an LLM
Labeling data with an LLM at scale requires orchestration and can get very expensive. EveryRow can classify each row of a dataframe using LLMs or LLM web agents at low cost, by handling the batching, parallelism, task queues, error handling, and consistency, in a single function call.
We run evals to find the pareto frontier for classification tasks, getting you the most accuracy for your dollar.
Here, we classify 200 job postings into 9 categories in 2 minutes for $1.74.
| Metric | Value |
|---|---|
| Rows | 200 |
| Time | 2.1 minutes |
| Cost | $1.74 |
| Cost per row | $0.009 |
If you're categorizing support tickets, labeling training data, or tagging content by topic, string heuristic or embedding techniques are low accuracy, but training a model is a very high lift. LLMs make it possible to solve this efficiently.
Walkthrough
The agent_map function processes each row in parallel with structured output via Pydantic models. You define the schema, describe the task, and get back a DataFrame with your new columns. Download hn_jobs.csv to follow along.
pip install everyrow
export EVERYROW_API_KEY=your_key_here # Get one at everyrow.io
import asyncio
from typing import Literal
import pandas as pd
from pydantic import BaseModel, Field
from everyrow.ops import agent_map
class JobClassification(BaseModel):
category: Literal[
"backend", "frontend", "fullstack", "data",
"ml_ai", "devops_sre", "mobile", "security", "other"
] = Field(description="Primary role category")
reasoning: str = Field(description="Why this category was chosen")
async def main():
jobs = pd.read_csv("hn_jobs.csv")
result = await agent_map(
task="""Classify this job posting by primary role:
- backend: Server-side, API development
- frontend: UI, web development
- fullstack: Both frontend and backend
- data: Data engineering, pipelines, analytics
- ml_ai: Machine learning, AI, deep learning
- devops_sre: Infrastructure, platform engineering
- mobile: iOS, Android development
- security: Security engineering
- other: Product, design, management, etc.
""",
input=jobs,
response_model=JobClassification,
)
print(result.data[["id", "category", "reasoning"]])
asyncio.run(main())
The output DataFrame includes your original columns plus category and reasoning:
id category reasoning
0 46469380 fullstack Role spans React frontend and Django backend...
1 46134153 fullstack Title is "Fullstack Engineer (with DevOps focus)"...
2 46113062 backend Company builds API platform tooling...
3 46467458 ml_ai First role listed is ML Engineer...
4 46466466 other Primary role is Founding Product Manager...
Constraining output values
Use Python's Literal type to restrict classifications to specific values:
category: Literal["positive", "negative", "neutral"]
The LLM is constrained to only return values from this set. No post-processing or validation needed.
Adding confidence scores
Extend the response model to capture uncertainty:
class Classification(BaseModel):
category: Literal["spam", "ham"] = Field(description="Email classification")
confidence: Literal["high", "medium", "low"] = Field(
description="How confident the classification is"
)
signals: str = Field(description="Key signals that drove the decision")
Multi-label classification
For cases where multiple labels can apply, use a list:
class MultiLabel(BaseModel):
tags: list[str] = Field(description="All applicable tags for this item")
primary_tag: str = Field(description="The most relevant tag")
Adding in web research agents
Choosing the right LLM, and handling the batching, parallelism, and retries is not that hard when there is no web search. But when you want to use the web as part of your classification, e.g. looking at the wikipedia page for entities, cost and complexity can spiral.
EveryRow supports this natively. And we tune our web research to be as efficient as possible, classifying rows for as little as $0.05/row, though it can cost more if the research is more involved.
And without web research agents, as in the example at the top, we can classify data for ~$0.009 per row, or 10,000 rows for ~$90. The exact cost depends on input length and the complexity of your response model. Short inputs with simple schemas cost less; long documents with detailed reasoning cost more.
| Rows | Estimated Cost | Estimated Time |
|---|---|---|
| 100 | ~$1 | ~1 min |
| 1,000 | ~$9 | ~5 min |
| 10,000 | ~$90 | ~30 min |
You can visualize the results at the output URL and see latency and cost numbers. The first $20 of processing is free with no credit card required.