Running LLM Web Research Agents at Scale
The everyrow agent_map() function runs an LLM web research agent on every row of a dataframe. This notebook demonstrates how that scales to 10,000 agents, each of which makes many LLM calls.
The run cost about $0.11 per row on average, reflecting the many LLM calls that each row's agent can make.
Example: Researching 10,000 Drug Products
This example takes a dataset of 10,000 drug product entries (trade name, ingredient, applicant, strength, dosage form) and determines each product's current FDA regulatory status. Determining regulatory status requires researching each product individually against FDA databases, Orange Book listings, Federal Register notices, and other sources. Some products have straightforward histories while others have complex timelines involving tentative approvals, voluntary withdrawals, or transitions between marketed and not marketed status.
This run achieved a 99.97% success rate (9,997 of 10,000 rows returned results). For evals on agent accuracy, see evals.futuresearch.ai or our papers.
Load Data
from dotenv import load_dotenv
import pandas as pd
from pydantic import BaseModel, Field
from everyrow import create_session
from everyrow.ops import agent_map
pd.set_option("display.max_colwidth", None)
load_dotenv()
input_df = pd.read_csv("regulatory_status_results.csv", usecols=["row_id", "trade_name", "ingredient", "applicant", "strength", "dosage_form"])
print(f"{len(input_df):,} drug products")
print(f"Columns: {list(input_df.columns)}")
input_df.head(5)
Define Response Model and Task
Each agent researches a drug product's regulatory status using its trade name, ingredient, and dosage form. The regulatory_status field is constrained to a fixed set of allowed values.
from enum import Enum
class RegulatoryStatus(str, Enum):
    NDA = "FDA approved (NDA)"
    ANDA = "FDA approved (ANDA \u2013 generic)"
    TENTATIVE = "Tentative approval"
    DISCONTINUED = "Discontinued (not withdrawn for safety)"
    WITHDRAWN = "Withdrawn for safety reasons"
    NOT_MARKETED = "Approved but currently not marketed"
    UNDER_REVIEW = "Under FDA review"
    EUA = "Emergency Use Authorization (EUA)"
    NOT_APPROVED = "Not FDA approved (compounded / ex-US only)"

class DrugRegulatoryResult(BaseModel):
    regulatory_status: RegulatoryStatus = Field(
        description="The current FDA regulatory status of this drug product."
    )
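As a small illustration of what the fixed value set buys you, the sketch below uses only the standard-library enum module (not pydantic or everyrow) with a shortened, two-member copy of RegulatoryStatus. The helper name parse_status is hypothetical. A label outside the allowed set fails to parse instead of silently passing through.

```python
from enum import Enum
from typing import Optional


class RegulatoryStatus(str, Enum):
    # Shortened copy of the notebook's enum, for illustration only
    NDA = "FDA approved (NDA)"
    WITHDRAWN = "Withdrawn for safety reasons"


def parse_status(label: str) -> Optional[RegulatoryStatus]:
    """Return the matching enum member, or None if the label is not allowed."""
    try:
        return RegulatoryStatus(label)  # lookup by value
    except ValueError:
        return None


valid = parse_status("FDA approved (NDA)")
invalid = parse_status("FDA approve (NDA)")  # typo: not an allowed value
print(valid, invalid)
```

The same check is what makes a misspelled label stand out later, when the results are counted by status.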
AGENT_TASK = """Research the current regulatory status based on its trade name, ingredient, and dosage form.
Allowed Values for regulatory_status:
FDA approved (NDA)
FDA approved (ANDA \u2013 generic)
Tentative approval
Discontinued (not withdrawn for safety)
Withdrawn for safety reasons
Approved but currently not marketed
Under FDA review
Emergency Use Authorization (EUA)
Not FDA approved (compounded / ex-US only)
NOTHING ELSE."""
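The allowed values are written out twice, once in the enum and once in the prompt, so the two can drift apart. One possible way to keep them in sync is to build the list from the enum. This is a sketch rather than how the notebook was run; it uses a shortened enum and a hypothetical helper name, build_task.

```python
from enum import Enum


class RegulatoryStatus(str, Enum):
    # Shortened copy of the notebook's enum, for illustration only
    NDA = "FDA approved (NDA)"
    TENTATIVE = "Tentative approval"


def build_task(status_enum) -> str:
    """Assemble the task prompt, listing one allowed value per line."""
    allowed = "\n".join(member.value for member in status_enum)
    return (
        "Research the current regulatory status based on its trade name, "
        "ingredient, and dosage form.\n"
        "Allowed Values for regulatory_status:\n"
        f"{allowed}\n"
        "NOTHING ELSE."
    )


task = build_task(RegulatoryStatus)
print(task)
```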
Run Agent Map
Send all 10k rows to agent_map(). Each row gets its own agent that researches the product's regulatory history, checking FDA Orange Book listings, approval databases, Federal Register notices, and other sources.
async with create_session(name="Drug Regulatory Status Research") as session:
    print(f"Session URL: {session.get_url()}\n")
    result = await agent_map(
        task=AGENT_TASK,
        input=input_df,
        response_model=DrugRegulatoryResult,
        session=session,
    )
Cost
Running 10,000 agents cost about $1.17k in total, or a little over $0.11 per row.
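The headline numbers can be checked with a few lines of arithmetic. The inputs below are the rounded figures quoted in this notebook ($1.17k total, 9,997 of 10,000 rows with results), so the per-row cost is only as precise as that rounding.

```python
total_cost_usd = 1170.0  # "$1.17k" from the text, itself a rounded figure
rows = 10_000
rows_with_results = 9_997

cost_per_row = total_cost_usd / rows
success_rate = rows_with_results / rows * 100

print(f"Cost per row: ${cost_per_row:.3f}")
print(f"Success rate: {success_rate:.2f}%")
```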
Inspecting Results
Load the results CSV and analyze the regulatory status classifications.
results_df = pd.read_csv("regulatory_status_results.csv")
print(f"Total rows: {len(results_df):,}")
print(f"Rows with results: {results_df['regulatory_status'].notna().sum():,}")
print(f"Failed rows: {results_df['regulatory_status'].isna().sum()}")
results_df.head(2)
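Rows without a result can be pulled out for a retry. The sketch below uses the standard-library csv module on a tiny made-up sample rather than the real results file. The sample rows and the helper name failed_row_ids are hypothetical, and it assumes a failed row shows up as an empty regulatory_status cell in the CSV.

```python
import csv
import io

# Made-up sample standing in for regulatory_status_results.csv
SAMPLE_CSV = """row_id,trade_name,regulatory_status
1,DRUG A,FDA approved (NDA)
2,DRUG B,
3,DRUG C,Tentative approval
"""


def failed_row_ids(csv_file) -> list:
    """Collect row_id for every row whose regulatory_status is empty."""
    reader = csv.DictReader(csv_file)
    return [
        row["row_id"]
        for row in reader
        if not (row["regulatory_status"] or "").strip()
    ]


failed = failed_row_ids(io.StringIO(SAMPLE_CSV))
print(f"Rows to retry: {failed}")
```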
status_counts = results_df["regulatory_status"].value_counts()
status_pct = (results_df["regulatory_status"].value_counts(normalize=True) * 100).round(1)
summary = pd.DataFrame({"count": status_counts, "percent": status_pct})
print("Regulatory Status Breakdown")
print("=" * 50)
summary
import matplotlib.pyplot as plt
# Normalize near-duplicate labels before plotting
normalize_map = {
"FDA approve (NDA)": "FDA approved (NDA)",
"FDA approved (ANDA - generic)": "FDA approved (ANDA \u2013 generic)",
}
plot_df = results_df.copy()
plot_df["regulatory_status"] = plot_df["regulatory_status"].replace(normalize_map)
counts = plot_df["regulatory_status"].value_counts()
fig, ax = plt.subplots(figsize=(10, 5))
counts.plot.barh(ax=ax)
ax.set_xlabel("Number of products")
ax.set_title("FDA Regulatory Status Distribution (10k drug products)")
ax.invert_yaxis()
for i, v in enumerate(counts.values):
    ax.text(v + 30, i, f"{v:,}", va="center", fontsize=9)
plt.tight_layout()
plt.show()
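The normalize_map above is written by hand from labels seen in this run. As an alternative sketch (not what the notebook does), near-duplicates can be matched to the canonical labels with the standard library's difflib.get_close_matches. The shortened label list, the 0.9 cutoff, and the helper name canonicalize are assumptions for illustration; a cutoff this strict is meant to catch typos and punctuation variants only.

```python
from difflib import get_close_matches
from typing import Optional

# Shortened list of canonical labels, for illustration only
CANONICAL = [
    "FDA approved (NDA)",
    "FDA approved (ANDA \u2013 generic)",
    "Tentative approval",
]


def canonicalize(label: str, cutoff: float = 0.9) -> Optional[str]:
    """Map a label to its closest canonical form, or None if nothing is close."""
    matches = get_close_matches(label, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(canonicalize("FDA approve (NDA)"))
print(canonicalize("FDA approved (ANDA - generic)"))
print(canonicalize("Unknown"))
```

Labels that come back as None are worth inspecting by hand rather than forcing into a category.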
Exploring by Status Category
# Products withdrawn for safety reasons
withdrawn = results_df[results_df["regulatory_status"] == "Withdrawn for safety reasons"]
print(f"Withdrawn for safety: {len(withdrawn)} products")
withdrawn[["trade_name", "ingredient", "dosage_form", "research"]].sample(min(2, len(withdrawn)), random_state=42)
# Discontinued products (not for safety)
discontinued = results_df[results_df["regulatory_status"] == "Discontinued (not withdrawn for safety)"]
print(f"Discontinued (not for safety): {len(discontinued):,} products")
discontinued[["trade_name", "ingredient", "applicant", "research"]].sample(2, random_state=42)
# Currently approved products (NDA + ANDA)
# Labels are bound to names first: before Python 3.12, a backslash escape
# such as \u2013 inside an f-string expression is a SyntaxError.
nda_label = "FDA approved (NDA)"
anda_label = "FDA approved (ANDA \u2013 generic)"
approved = results_df[results_df["regulatory_status"].isin([nda_label, anda_label])]
print(f"Currently FDA approved: {len(approved):,} products ({len(approved)/len(results_df)*100:.1f}%)")
print(f" NDA (brand): {(results_df['regulatory_status'] == nda_label).sum():,}")
print(f" ANDA (generic): {(results_df['regulatory_status'] == anda_label).sum():,}")
approved[["trade_name", "ingredient", "applicant", "dosage_form", "research"]].sample(2, random_state=42)