Everyrow
Getting Started
  • Installation
  • Skills vs MCP
Guides
  • How to Add A Column to a DataFrame with Web Research
  • How to Classify and Label Data with an LLM in Python
  • Remove Duplicates from ML Training Data in Python
  • Filter a Pandas DataFrame with LLMs
  • How to Fuzzy Join DataFrames in Python
  • How to sort a dataset using web data in Python
  • How to resolve duplicate rows in Python with LLMs
API Reference
  • dedupe
  • merge
  • rank
  • agent_map
  • screen
Case Studies
  • Build an AI lead qualification pipeline in Python
  • Fuzzy join two Pandas DataFrames using LLMs
  • Fuzzy match and merge contact lists in Python
  • How to filter job postings with LLM Agents
  • How to merge datasets without common ID in Python
  • How to score and prioritize leads with AI in Python
  • How to Screen Stocks in Python with AI Agents
  • How to use LLMs to deduplicate CRM Data
  • LLM-powered Merging at Scale
  • LLM-powered Screening at Scale
  • Python Notebook to screen stocks using AI Agents
  • Running LLM Web Research Agents at Scale
  • Score and rank leads without a CRM in Python
  • Use LLM Agents to research government data at scale

How to merge datasets without common ID in Python¶

This notebook demonstrates using everyrow's merge() utility to join contact-level data with organization-level data before CRM upload.

Use Case: Your data lives across multiple tables—contacts in one, company information in another. Before uploading to HubSpot/Salesforce, you need a flattened export where each contact row includes the associated company context.

Why everyrow? Company names may not match exactly between tables ("Acme Corp" vs "Acme Corporation" vs "ACME"). The merge() function handles these variations semantically.

In [1]:
import asyncio
from dotenv import load_dotenv
load_dotenv()

import pandas as pd
from everyrow import create_session
from everyrow.ops import merge

Load Data¶

In [2]:
# Contacts table
contacts_df = pd.read_csv("../data/crm_contacts.csv")

print(f"Contacts: {len(contacts_df)}")
contacts_df
Contacts: 10
Out[2]:
contact_name title email company_name
0 John Smith Portfolio Manager jsmith@bridgewater.com Bridgewater
1 Sarah Johnson Research Analyst sjohnson@citadel.com Citadel LLC
2 Michael Chen CIO mchen@twosgima.com Two Sigma Investments
3 Emily Davis Partner edavis@pershingsq.com Pershing Square
4 David Lee Senior Analyst dlee@aqr.com AQR
5 Jessica Wang VP Research jwang@deshaw.com D.E. Shaw
6 Robert Brown Managing Director rbrown@point72.com Point72 Asset Mgmt
7 Amanda Wilson Quant Researcher awilson@rentech.com Renaissance Tech
8 Chris Martinez Portfolio Analyst cmartinez@elliott.com Elliott Mgmt
9 Lisa Thompson Research Director lthompson@baupost.com Baupost
In [3]:
# Funds/Companies table with enriched information
funds_df = pd.read_csv("../data/crm_funds.csv")

print(f"Funds: {len(funds_df)}")
funds_df
Funds: 10
Out[3]:
fund_name aum_billions strategy research_intensity outreach_hook
0 Bridgewater Associates 120 Global Macro High Known for systematic research - might value da...
1 Citadel 52 Multi-Strategy Very High Aggressive talent acquisition - research tools...
2 Two Sigma 60 Quantitative Medium Build vs buy - but may want specialized tools
3 Pershing Square Capital 12 Activist Value Very High Deep research for activist campaigns - ideal fit
4 AQR Capital Management 98 Factor Investing Medium Academic approach - may value research augment...
5 D. E. Shaw & Co. 55 Quantitative Low Fully systematic - limited research tool needs
6 Point72 Asset Management 24 Multi-Strategy High Cubist data science division - potential integ...
7 Renaissance Technologies 55 Quantitative Low Pure quant - unlikely to need external research
8 Elliott Management 56 Activist/Distressed Very High Legal and financial deep dives - strong fit
9 Baupost Group 27 Value/Distressed Very High Seth Klarman's research-heavy approach - ideal...

Define Merge Task¶

In [4]:
MERGE_TASK = """
Match contacts to their associated fund/company.

Company names may vary between tables:
- "Bridgewater" should match "Bridgewater Associates"
- "Citadel LLC" should match "Citadel"
- "Two Sigma Investments" should match "Two Sigma"
- "D.E. Shaw" should match "D. E. Shaw & Co."
- "Point72 Asset Mgmt" should match "Point72 Asset Management"
- "Renaissance Tech" should match "Renaissance Technologies"
- "Elliott Mgmt" should match "Elliott Management"

Match based on the core company name, ignoring variations in:
- Legal suffixes (LLC, Inc, LP, & Co.)
- Abbreviations (Mgmt = Management, Tech = Technologies)
- Additional descriptors (Associates, Capital, Asset Management)
"""

Run the Merge¶

In [5]:
async def run_merge():
    async with create_session(name="CRM Merge Workflow") as session:
        print(f"Session URL: {session.get_url()}")
        print("\nMerging contacts with fund data...\n")
        
        result = await merge(
            session=session,
            task=MERGE_TASK,
            left_table=contacts_df,
            right_table=funds_df,
            merge_on_left="company_name",
            merge_on_right="fund_name",
        )
        
        return result.data

results_df = await run_merge()
Session URL: https://everyrow.io/sessions/91871b91-e559-42cd-b4d4-e7fa3ffb41f3

Merging contacts with fund data...

Review Merged Data¶

In [6]:
# Count successful merges
merged = results_df[results_df["fund_name"].notna()]
unmerged = results_df[results_df["fund_name"].isna()]

print(f"\n{'='*60}")
print(f"MERGE RESULTS")
print(f"{'='*60}")
print(f"  Total contacts:        {len(results_df)}")
print(f"  Successfully merged:   {len(merged)}")
print(f"  Unmatched:             {len(unmerged)}")
============================================================
MERGE RESULTS
============================================================
  Total contacts:        10
  Successfully merged:   10
  Unmatched:             0
In [7]:
# Show CRM-ready output
print("\nCRM-READY CONTACT LIST:")
print("-" * 80)

for _, row in merged.iterrows():
    print(f"\n{row['contact_name']} - {row['title']}")
    print(f"  Email: {row['email']}")
    print(f"  Fund: {row['fund_name']} (${row['aum_billions']}B AUM)")
    print(f"  Strategy: {row['strategy']} | Research Intensity: {row['research_intensity']}")
    print(f"  Outreach Hook: {row['outreach_hook']}")
CRM-READY CONTACT LIST:
--------------------------------------------------------------------------------

John Smith - Portfolio Manager
  Email: jsmith@bridgewater.com
  Fund: Bridgewater Associates ($120B AUM)
  Strategy: Global Macro | Research Intensity: High
  Outreach Hook: Known for systematic research - might value data tools

Sarah Johnson - Research Analyst
  Email: sjohnson@citadel.com
  Fund: Citadel ($52B AUM)
  Strategy: Multi-Strategy | Research Intensity: Very High
  Outreach Hook: Aggressive talent acquisition - research tools as recruiting edge

Michael Chen - CIO
  Email: mchen@twosgima.com
  Fund: Two Sigma ($60B AUM)
  Strategy: Quantitative | Research Intensity: Medium
  Outreach Hook: Build vs buy - but may want specialized tools

Emily Davis - Partner
  Email: edavis@pershingsq.com
  Fund: Pershing Square Capital ($12B AUM)
  Strategy: Activist Value | Research Intensity: Very High
  Outreach Hook: Deep research for activist campaigns - ideal fit

David Lee - Senior Analyst
  Email: dlee@aqr.com
  Fund: AQR Capital Management ($98B AUM)
  Strategy: Factor Investing | Research Intensity: Medium
  Outreach Hook: Academic approach - may value research augmentation

Jessica Wang - VP Research
  Email: jwang@deshaw.com
  Fund: D. E. Shaw & Co. ($55B AUM)
  Strategy: Quantitative | Research Intensity: Low
  Outreach Hook: Fully systematic - limited research tool needs

Robert Brown - Managing Director
  Email: rbrown@point72.com
  Fund: Point72 Asset Management ($24B AUM)
  Strategy: Multi-Strategy | Research Intensity: High
  Outreach Hook: Cubist data science division - potential integration partner

Amanda Wilson - Quant Researcher
  Email: awilson@rentech.com
  Fund: Renaissance Technologies ($55B AUM)
  Strategy: Quantitative | Research Intensity: Low
  Outreach Hook: Pure quant - unlikely to need external research

Chris Martinez - Portfolio Analyst
  Email: cmartinez@elliott.com
  Fund: Elliott Management ($56B AUM)
  Strategy: Activist/Distressed | Research Intensity: Very High
  Outreach Hook: Legal and financial deep dives - strong fit

Lisa Thompson - Research Director
  Email: lthompson@baupost.com
  Fund: Baupost Group ($27B AUM)
  Strategy: Value/Distressed | Research Intensity: Very High
  Outreach Hook: Seth Klarman's research-heavy approach - ideal prospect
In [8]:
# Filter to high-priority contacts (high research intensity)
high_priority = merged[merged["research_intensity"].isin(["High", "Very High"])]

print(f"\n{'='*60}")
print(f"HIGH PRIORITY CONTACTS ({len(high_priority)} contacts)")
print(f"{'='*60}")

for _, row in high_priority.iterrows():
    print(f"  {row['contact_name']:20} | {row['fund_name']:25} | {row['research_intensity']}")
============================================================
HIGH PRIORITY CONTACTS (6 contacts)
============================================================
  John Smith           | Bridgewater Associates    | High
  Sarah Johnson        | Citadel                   | Very High
  Emily Davis          | Pershing Square Capital   | Very High
  Robert Brown         | Point72 Asset Management  | High
  Chris Martinez       | Elliott Management        | Very High
  Lisa Thompson        | Baupost Group             | Very High
In [9]:
# Export CRM-ready file
crm_export = merged[["contact_name", "title", "email", "fund_name", "aum_billions", "strategy", "research_intensity", "outreach_hook"]]
crm_export.to_csv("crm_ready_contacts.csv", index=False)
print(f"\nExported {len(crm_export)} contacts to crm_ready_contacts.csv")
Exported 10 contacts to crm_ready_contacts.csv
In [10]:
# Full results
results_df
Out[10]:
contact_name title email company_name fund_name aum_billions strategy research_intensity outreach_hook research
0 John Smith Portfolio Manager jsmith@bridgewater.com Bridgewater Bridgewater Associates 120 Global Macro High Known for systematic research - might value da... {'fund_name': 'This row was matched due to the...
1 Sarah Johnson Research Analyst sjohnson@citadel.com Citadel LLC Citadel 52 Multi-Strategy Very High Aggressive talent acquisition - research tools... {'fund_name': 'This row was matched due to the...
2 Michael Chen CIO mchen@twosgima.com Two Sigma Investments Two Sigma 60 Quantitative Medium Build vs buy - but may want specialized tools {'fund_name': 'This row was matched due to the...
3 Emily Davis Partner edavis@pershingsq.com Pershing Square Pershing Square Capital 12 Activist Value Very High Deep research for activist campaigns - ideal fit {'fund_name': 'This row was matched due to the...
4 David Lee Senior Analyst dlee@aqr.com AQR AQR Capital Management 98 Factor Investing Medium Academic approach - may value research augment... {'fund_name': 'This row was matched due to the...
5 Jessica Wang VP Research jwang@deshaw.com D.E. Shaw D. E. Shaw & Co. 55 Quantitative Low Fully systematic - limited research tool needs {'fund_name': 'This row was matched due to the...
6 Robert Brown Managing Director rbrown@point72.com Point72 Asset Mgmt Point72 Asset Management 24 Multi-Strategy High Cubist data science division - potential integ... {'fund_name': 'This row was matched due to the...
7 Amanda Wilson Quant Researcher awilson@rentech.com Renaissance Tech Renaissance Technologies 55 Quantitative Low Pure quant - unlikely to need external research {'fund_name': 'This row was matched due to the...
8 Chris Martinez Portfolio Analyst cmartinez@elliott.com Elliott Mgmt Elliott Management 56 Activist/Distressed Very High Legal and financial deep dives - strong fit {'fund_name': 'This row was matched due to the...
9 Lisa Thompson Research Director lthompson@baupost.com Baupost Baupost Group 27 Value/Distressed Very High Seth Klarman's research-heavy approach - ideal... {'fund_name': 'This row was matched due to the...