Merge

merge left-joins two DataFrames using LLM-powered agents to resolve the key mapping instead of requiring exact or fuzzy key matches. Agents resolve semantic relationships by reasoning over the data and, when needed, searching the web for external information to establish matches: subsidiaries, regional names, abbreviations, and product-to-parent-company mappings.

Examples

from everyrow.ops import merge

result = await merge(
    task="Match each software product to its parent company",
    left_table=software_products,
    right_table=approved_vendors,
    merge_on_left="product_name",
    merge_on_right="company_name",
)
print(result.data.head())

For ambiguous cases, add context:

result = await merge(
    task="""
        Match clinical trial sponsors to parent pharma companies.

        Watch for:
        - Subsidiaries (Genentech → Roche, Janssen → J&J)
        - Regional names (MSD is Merck outside the US)
        - Abbreviations (BMS → Bristol-Myers Squibb)
    """,
    left_table=trials,
    right_table=pharma_companies,
    merge_on_left="sponsor",
    merge_on_right="company",
)
print(result.data.head())

What you get back

A DataFrame with all left table columns plus matched right table columns. Rows that don't match get nulls for the right columns (like a left join).

Parameters

Name	Type	Description
`task`	str	How to match the tables
`left_table`	DataFrame	Primary table (all rows kept)
`right_table`	DataFrame	Table to match from
`merge_on_left`	Optional[str]	Column in left table. Model will try to guess if not specified.
`merge_on_right`	Optional[str]	Column in right table. Model will try to guess if not specified.
`session`	Session	Optional, auto-created if omitted

Performance

Size	Time	Cost
100 × 50	~3 min	~$2
2,000 × 50	~8 min	~$9
1,000 × 1,000	~12 min	~$15

Case studies

Software Supplier Matching — 2,000 products to 50 vendors, 91% accuracy, zero false positives
HubSpot Contact Merge — 99.9% recall despite GitHub handles, typos, and partial emails
CRM Merge Workflow — joining fund-level and contact-level data