API Reference

Five operations for processing data with LLM-powered web research agents. Each takes a DataFrame (two for merge) and a natural-language instruction; single_agent can also run on a single input or on none.

screen

result = await screen(task=..., input=df, response_model=Model)

screen takes a DataFrame and a natural-language filter predicate, evaluates each row using web research agents, and returns only the rows that pass. The filter condition does not need to be computable from existing columns. Agents can research external information to make the determination.
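
The filter-by-verdict behaviour described above can be sketched offline. This is a minimal illustration, not the library's implementation: `screen_sketch`, `stub_agent`, and `Verdict` are hypothetical names, plain dicts stand in for DataFrame rows, and a canned lookup table replaces the web research a real agent would do.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical structured verdict an agent returns for one row."""
    passes: bool
    reason: str


async def stub_agent(task: str, row: dict) -> Verdict:
    # Stand-in for a web research agent: the answer comes from a canned
    # table, because it cannot be computed from the row's own columns.
    canned = {"Acme": True, "Globex": False, "Initech": True}
    ok = canned[row["company"]]
    return Verdict(passes=ok, reason=f"canned answer for {row['company']}")


async def screen_sketch(task: str, rows: list[dict]) -> list[dict]:
    # Evaluate every row concurrently, then keep only the rows that pass.
    verdicts = await asyncio.gather(*(stub_agent(task, r) for r in rows))
    return [r for r, v in zip(rows, verdicts) if v.passes]


rows = [{"company": "Acme"}, {"company": "Globex"}, {"company": "Initech"}]
kept = asyncio.run(screen_sketch("Company offers a public API", rows))
print(kept)
```

Returning a structured verdict rather than a bare boolean mirrors the `response_model` argument in the call above, on the assumption that the model describes the shape of each agent's answer.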

Full reference →
Guides: Filter a DataFrame with LLMs
Notebooks: LLM Screening at Scale, Screen Stocks by Investment Thesis

rank

result = await rank(task=..., input=df, field_name="score")

rank takes a DataFrame and a natural-language scoring criterion, dispatches web research agents to compute a score for each row, and returns the DataFrame sorted by that score. The sort key does not need to exist in your data. Agents derive it at runtime by searching the web, reading pages, and reasoning over what they find.
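
As a rough sketch of the score-then-sort behaviour, assuming nothing about the library's internals: `rank_sketch` and `stub_score` are hypothetical names, rows are plain dicts rather than a DataFrame, the scores are canned, and the descending sort order is an assumption, since the text above does not state a direction.

```python
import asyncio


async def stub_score(task: str, row: dict) -> float:
    # Stand-in for an agent that would research the web to derive a score.
    canned = {"Acme": 0.4, "Globex": 0.9, "Initech": 0.1}
    return canned[row["company"]]


async def rank_sketch(task, rows, field_name, descending=True):
    # Score all rows concurrently, attach the score under `field_name`,
    # then sort by that new field.
    scores = await asyncio.gather(*(stub_score(task, r) for r in rows))
    scored = [{**r, field_name: s} for r, s in zip(rows, scores)]
    return sorted(scored, key=lambda r: r[field_name], reverse=descending)


rows = [{"company": "Acme"}, {"company": "Globex"}, {"company": "Initech"}]
ranked = asyncio.run(rank_sketch("Likelihood to buy", rows, field_name="score"))
print([r["company"] for r in ranked])
```

The sort key is absent from the input rows and appears only in the output, which is the point the paragraph above makes.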

Full reference →
Guides: Sort a Dataset Using Web Data
Notebooks: Score Leads from Fragmented Data, Score Leads Without CRM History

dedupe

result = await dedupe(input=df, equivalence_relation="...")

dedupe groups duplicate rows in a DataFrame based on a natural-language equivalence relation, assigns cluster IDs, and selects a canonical row per cluster. The duplicate criterion is semantic and LLM-powered: agents reason over the data and, when needed, search the web for external information to establish equivalence. This handles abbreviations, name variations, job changes, and entity relationships that string-similarity thresholds cannot capture.
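
The clustering step can be illustrated with a small union-find sketch. Everything here is an assumption made for illustration: `dedupe_sketch` and `same_entity` are hypothetical names, the `cluster_id` and `is_canonical` columns are invented, the equivalence judgement is a canned alias table rather than an agent, and picking the first row of each cluster as canonical is one simple policy among many.

```python
from itertools import combinations

# Canned stand-in for the agent's semantic equivalence judgement.
ALIASES = {
    "IBM": "ibm",
    "International Business Machines": "ibm",
    "I.B.M. Corp": "ibm",
    "Globex": "globex",
}


def same_entity(a: dict, b: dict) -> bool:
    return ALIASES[a["name"]] == ALIASES[b["name"]]


def dedupe_sketch(rows: list[dict]) -> list[dict]:
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if same_entity(rows[i], rows[j]):
            ri, rj = find(i), find(j)
            parent[max(ri, rj)] = min(ri, rj)  # smallest index stays root

    cluster_ids: dict[int, int] = {}
    out = []
    for i, row in enumerate(rows):
        root = find(i)
        cid = cluster_ids.setdefault(root, len(cluster_ids))
        out.append({**row, "cluster_id": cid, "is_canonical": i == root})
    return out


rows = [{"name": n} for n in
        ["IBM", "Globex", "International Business Machines", "I.B.M. Corp"]]
deduped = dedupe_sketch(rows)
print([(r["name"], r["cluster_id"], r["is_canonical"]) for r in deduped])
```

Comparing every pair is quadratic in the number of rows, which is acceptable for a sketch but not for large inputs.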

Full reference →
Guides: Remove Duplicates from ML Training Data, Resolve Duplicate Entities
Notebooks: Dedupe CRM Company Records

merge

result = await merge(task=..., left_table=df1, right_table=df2)

merge left-joins two DataFrames, using LLM-powered agents to resolve the key mapping instead of requiring exact or fuzzy key matches. Agents reason over the data and, when needed, search the web for external information to establish matches across semantic relationships such as subsidiaries, regional names, abbreviations, and product-to-parent-company mappings.
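
A minimal sketch of left-join semantics with a resolved key mapping, under stated assumptions: `merge_sketch` and `stub_resolve` are hypothetical names, the tables are lists of dicts rather than DataFrames, the sample rows are invented for illustration, and a canned product-to-parent table stands in for the agent's reasoning and web search.

```python
import asyncio

# Hypothetical sample tables; the two share no key column.
left = [{"product": "Instagram"}, {"product": "YouTube"}, {"product": "Widgetron"}]
right = [
    {"company": "Meta", "hq": "Menlo Park"},
    {"company": "Alphabet", "hq": "Mountain View"},
]


async def stub_resolve(task, left_row, right_rows):
    # Stand-in for an agent mapping a product to its parent company.
    parent_of = {"Instagram": "Meta", "YouTube": "Alphabet"}
    target = parent_of.get(left_row["product"])
    return next((r for r in right_rows if r["company"] == target), None)


async def merge_sketch(task, left_rows, right_rows):
    matches = await asyncio.gather(
        *(stub_resolve(task, l, right_rows) for l in left_rows)
    )
    right_cols = list(right_rows[0]) if right_rows else []
    merged = []
    for l, m in zip(left_rows, matches):
        # Left join: every left row survives; unmatched rows get None fills.
        filler = m if m is not None else {c: None for c in right_cols}
        merged.append({**l, **filler})
    return merged


merged = asyncio.run(merge_sketch("Match each product to its parent company", left, right))
print(merged)
```

The unmatched `Widgetron` row is kept with empty right-hand columns, which is what distinguishes a left join from an inner join.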

Full reference →
Guides: Fuzzy Join Without Matching Keys
Notebooks: LLM Merging at Scale, Match Software Vendors to Requirements

agent_map / single_agent

result = await agent_map(task=..., input=df)

single_agent runs one web research agent on a single input (or no input). agent_map runs an agent on every row of a DataFrame in parallel. Both dispatch agents that search the web, read pages, and return structured results. The transform is live web research: agents fetch and synthesize external information to populate new columns.
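
The relationship between the two functions can be sketched as one coroutine per input, fanned out over rows. This is an illustration only: `single_agent_sketch` and `agent_map_sketch` are hypothetical stand-ins, the `summary` column is invented, rows are plain dicts, and the stub returns a placeholder string instead of doing web research. The `input` parameter name follows the call shown above.

```python
import asyncio


async def single_agent_sketch(task, input=None):
    # Stand-in for one research agent; works with a single input or none.
    subject = input["company"] if input is not None else "no input"
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"summary": f"{task}: {subject}"}


async def agent_map_sketch(task, rows):
    # Run one agent per row concurrently and attach each structured
    # result to its row as new columns.
    results = await asyncio.gather(*(single_agent_sketch(task, r) for r in rows))
    return [{**row, **res} for row, res in zip(rows, results)]


rows = [{"company": "Acme"}, {"company": "Globex"}]
one = asyncio.run(single_agent_sketch("Find the homepage"))
mapped = asyncio.run(agent_map_sketch("Find the homepage", rows))
print(one)
print(mapped)
```

`asyncio.gather` returns results in the order the coroutines were passed, so each result lines up with the row that produced it.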

Full reference →
Guides: Add a Column with Web Lookup, Classify and Label Data with an LLM
Notebooks: LLM Web Research Agents at Scale