Run 10,000 LLM Web Research Agents¶
The everyrow agent_map() function runs an LLM web research agent on every row of a dataframe. In this notebook, I demonstrate scaling this up to 10,000 web agents.
First, some numbers. The total cost was ~$1,170 (~$0.12/row), using ~120k LLM calls, 1.56B input tokens, and 20.1M output tokens, executing 338k web searches and reading 11,726 pages. The whole run took only 3 hours 27 minutes.
| Model | Calls | Input Tokens | Output Tokens | Cost |
|---|---|---|---|---|
| gemini-3-flash-preview | 98,190 | 847,115,551 | 17,237,847 | $913.85 |
| gemini-2.5-flash | 11,574 | 700,327,085 | 2,715,535 | $222.01 |
| claude-sonnet-4-20250514 | 10,015 | 10,912,199 | 193,567 | $35.64 |
| **Total** | 119,779 | 1,558,354,835 | 20,146,949 | $1,171.50 |
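As a quick sanity check, the headline per-row figures follow directly from these totals:

total_cost = 913.85 + 222.01 + 35.64   # $1,171.50 across all three models
rows = 10_000

print(f"Cost per row:       ${total_cost / rows:.3f}")   # ~$0.117
print(f"Searches per agent: {338_000 / rows:.1f}")       # ~33.8
print(f"Pages per agent:    {11_726 / rows:.2f}")        # ~1.17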
You'll see that this is reasonably affordable only because the vast majority of the work is done by Gemini-3-Flash (running the agents) and Gemini-2.5-Flash (reading webpages). The SDK supports using higher-powered LLMs when it's really worth it.
Also, you'll see that across the 10,000 rows, each agent executed ~34 web searches on average but fully read only ~1.2 pages. It got the rest of its information from search result snippets, which can be surprisingly informative to an agent answering simple questions, often allowing it to save a lot of tokens by not fetching or reading any pages at all and still answer correctly. Gemini-3-Flash is quite good at this in general, performing near the top of Deep Research Bench while being by far the most cost-efficient model. (Though Opus 4.6, released in Feb 2026, also shows great token efficiency in doing web research, and can be cost-competitive even though it's ~9x the price per token!)
A large share of the cost comes from output tokens: each agent produced a few paragraphs of unstructured research notes in addition to the specifically requested fields (see the dataframe below). Costs could be reduced by trimming this output, but we generally find the free-form research very useful for downstream processing, and it reduces the chance that an agent is unable to report important information under a restrictive schema.
Researching 10,000 Drugs¶
In many cases, running a web research agent for every row in a dataset is not efficient. We recommend, for example, first using an intelligent filter, or deduping your data, to cut down on the amount of web research needed.
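For example, deduplicating on the fields the agents will actually research can be a one-liner in pandas (a sketch on a generic dataframe df; the subset columns are illustrative):

# Collapse rows that would trigger identical research
deduped = df.drop_duplicates(subset=["trade_name", "ingredient"])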
Sometimes, though, you do want good research on every entity in a large list, and there's no structured data source to pull from. In that case, you want a system that runs as cheaply and consistently as possible to research all of them, using multiple sources and giving agents the freedom to search around and read whatever seems relevant for each one.
This example takes a dataset of 10,000 drug product entries (trade name, ingredient, applicant, strength, dosage form) and determines each product's current FDA regulatory status. Determining regulatory status requires researching each product individually against FDA databases, Orange Book listings, Federal Register notices, and other sources. Some products have straightforward histories while others have complex timelines involving tentative approvals, voluntary withdrawals, or transitions between marketed and not marketed status.
This run achieved a 99.97% success rate (9,997 of 10,000 rows returned results).
Below is how you can reproduce this.
Load Data¶
from dotenv import load_dotenv
import pandas as pd
from pydantic import BaseModel, Field
from everyrow import create_session
from everyrow.ops import agent_map
pd.set_option("display.max_colwidth", None)
load_dotenv()
input_df = pd.read_csv(
    "regulatory_status_results.csv",
    usecols=["row_id", "trade_name", "ingredient", "applicant", "strength", "dosage_form"],
)
print(f"{len(input_df):,} drug products")
print(f"Columns: {list(input_df.columns)}")
input_df.head(5)
Define Response Model and Task¶
Each agent researches a drug product's regulatory status using its trade name, ingredient, and dosage form. The regulatory_status field is constrained to a fixed set of allowed values.
from enum import Enum
class RegulatoryStatus(str, Enum):
NDA = "FDA approved (NDA)"
    ANDA = "FDA approved (ANDA – generic)"
TENTATIVE = "Tentative approval"
DISCONTINUED = "Discontinued (not withdrawn for safety)"
WITHDRAWN = "Withdrawn for safety reasons"
NOT_MARKETED = "Approved but currently not marketed"
UNDER_REVIEW = "Under FDA review"
EUA = "Emergency Use Authorization (EUA)"
NOT_APPROVED = "Not FDA approved (compounded / ex-US only)"
class DrugRegulatoryResult(BaseModel):
regulatory_status: RegulatoryStatus = Field(
description="The current FDA regulatory status of this drug product."
)
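Because regulatory_status is an enum, pydantic rejects any value outside the allowed set. A quick illustrative check (not part of the original run):

# Valid: the string matches an enum value, so pydantic coerces it
DrugRegulatoryResult(regulatory_status="FDA approved (NDA)")
# Invalid: would raise a ValidationError, since "Approved in EU" is not in RegulatoryStatus
# DrugRegulatoryResult(regulatory_status="Approved in EU")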
AGENT_TASK = """Research the current FDA regulatory status of this drug product based on its trade name, ingredient, and dosage form.
Allowed Values for regulatory_status:
FDA approved (NDA)
FDA approved (ANDA – generic)
Tentative approval
Discontinued (not withdrawn for safety)
Withdrawn for safety reasons
Approved but currently not marketed
Under FDA review
Emergency Use Authorization (EUA)
Not FDA approved (compounded / ex-US only)
NOTHING ELSE."""
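The allowed values are duplicated between the enum and the prompt. One way to keep them in sync (a sketch, not from the original notebook) is to build that section of the prompt from the enum itself:

# Sketch: derive the allowed-values section from the enum,
# so the schema and the prompt can't drift apart.
allowed_values = "\n".join(status.value for status in RegulatoryStatus)
AGENT_TASK = (
    "Research the current FDA regulatory status of this drug product "
    "based on its trade name, ingredient, and dosage form.\n"
    "Allowed Values for regulatory_status:\n"
    + allowed_values
    + "\nNOTHING ELSE."
)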
Run Agent Map¶
Send all 10k rows to agent_map(). Each row gets its own agent that researches the product's regulatory history, checking FDA Orange Book listings, approval databases, Federal Register notices, and other sources.
async with create_session(name="Drug Regulatory Status Research") as session:
print(f"Session URL: {session.get_url()}\n")
result = await agent_map(
task=AGENT_TASK,
input=input_df,
response_model=DrugRegulatoryResult,
session=session,
)
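The next section reads results back from regulatory_status_results.csv, so the agent_map output needs to be persisted first. A minimal sketch, assuming the returned object exposes the result dataframe (the attribute name here is an assumption, not the documented everyrow API):

# ASSUMPTION: `result.data` stands in for however everyrow exposes the output
# dataframe; check the everyrow docs for the actual attribute.
result.data.to_csv("regulatory_status_results.csv", index=False)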
Inspecting Results¶
Load the results CSV and analyze the regulatory status classifications.
results_df = pd.read_csv("regulatory_status_results.csv")
print(f"Total rows: {len(results_df):,}")
print(f"Rows with results: {results_df['regulatory_status'].notna().sum():,}")
print(f"Failed rows: {results_df['regulatory_status'].isna().sum()}")
results_df.head(2)
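The 3 failed rows can be pulled out directly by filtering on the missing status:

# Inspect the handful of rows where the agent returned no status
failed = results_df[results_df["regulatory_status"].isna()]
failed[["trade_name", "ingredient", "applicant", "dosage_form"]]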
status_counts = results_df["regulatory_status"].value_counts()
status_pct = (results_df["regulatory_status"].value_counts(normalize=True) * 100).round(1)
summary = pd.DataFrame({"count": status_counts, "percent": status_pct})
print("Regulatory Status Breakdown")
print("=" * 50)
summary
import matplotlib.pyplot as plt
# Normalize near-duplicate labels before plotting
normalize_map = {
"FDA approve (NDA)": "FDA approved (NDA)",
"FDA approved (ANDA - generic)": "FDA approved (ANDA \u2013 generic)",
}
plot_df = results_df.copy()
plot_df["regulatory_status"] = plot_df["regulatory_status"].replace(normalize_map)
counts = plot_df["regulatory_status"].value_counts()
fig, ax = plt.subplots(figsize=(10, 5))
counts.plot.barh(ax=ax)
ax.set_xlabel("Number of products")
ax.set_title("FDA Regulatory Status Distribution (10k drug products)")
ax.invert_yaxis()
for i, v in enumerate(counts.values):
ax.text(v + 30, i, f"{v:,}", va="center", fontsize=9)
plt.tight_layout()
plt.show()
Exploring by Status Category¶
# Products withdrawn for safety reasons
withdrawn = results_df[results_df["regulatory_status"] == "Withdrawn for safety reasons"]
print(f"Withdrawn for safety: {len(withdrawn)} products")
withdrawn[["trade_name", "ingredient", "dosage_form", "research"]].sample(min(2, len(withdrawn)), random_state=42)
# Discontinued products (not for safety)
discontinued = results_df[results_df["regulatory_status"] == "Discontinued (not withdrawn for safety)"]
print(f"Discontinued (not for safety): {len(discontinued):,} products")
discontinued[["trade_name", "ingredient", "applicant", "research"]].sample(2, random_state=42)
# Currently approved products (NDA + ANDA)
approved = results_df[results_df["regulatory_status"].isin(["FDA approved (NDA)", "FDA approved (ANDA – generic)"])]
print(f"Currently FDA approved: {len(approved):,} products ({len(approved)/len(results_df)*100:.1f}%)")
print(f" NDA (brand): {(results_df['regulatory_status'] == 'FDA approved (NDA)').sum():,}")
print(f" ANDA (generic): {(results_df['regulatory_status'] == 'FDA approved (ANDA \u2013 generic)').sum():,}")
approved[["trade_name", "ingredient", "applicant", "dosage_form", "research"]].sample(2, random_state=42)
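From here, the same slicing extends to any cross-section of the data; for instance, which applicants account for the most discontinued products:

# Top applicants among discontinued products
discontinued["applicant"].value_counts().head(10)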