How to merge datasets without common ID in Python¶

This notebook demonstrates using everyrow's merge() utility to join contact-level data with organization-level data before CRM upload.

Use Case: Your data lives across multiple tables—contacts in one, company information in another. Before uploading to HubSpot/Salesforce, you need a flattened export where each contact row includes the associated company context.

Why everyrow? Company names may not match exactly between tables ("Acme Corp" vs "Acme Corporation" vs "ACME"). The merge() function handles these variations semantically.

In [1]:

import asyncio
from dotenv import load_dotenv
load_dotenv()

import pandas as pd
from everyrow import create_session
from everyrow.ops import merge

Load Data¶

In [2]:

# Contacts table
contacts_df = pd.read_csv("../data/crm_contacts.csv")

print(f"Contacts: {len(contacts_df)}")
contacts_df

Contacts: 10

Out[2]:

	contact_name	title	email	company_name
0	John Smith	Portfolio Manager	jsmith@bridgewater.com	Bridgewater
1	Sarah Johnson	Research Analyst	sjohnson@citadel.com	Citadel LLC
2	Michael Chen	CIO	mchen@twosgima.com	Two Sigma Investments
3	Emily Davis	Partner	edavis@pershingsq.com	Pershing Square
4	David Lee	Senior Analyst	dlee@aqr.com	AQR
5	Jessica Wang	VP Research	jwang@deshaw.com	D.E. Shaw
6	Robert Brown	Managing Director	rbrown@point72.com	Point72 Asset Mgmt
7	Amanda Wilson	Quant Researcher	awilson@rentech.com	Renaissance Tech
8	Chris Martinez	Portfolio Analyst	cmartinez@elliott.com	Elliott Mgmt
9	Lisa Thompson	Research Director	lthompson@baupost.com	Baupost

In [3]:

# Funds/Companies table with enriched information
funds_df = pd.read_csv("../data/crm_funds.csv")

print(f"Funds: {len(funds_df)}")
funds_df

Funds: 10

Out[3]:

	fund_name	aum_billions	strategy	research_intensity	outreach_hook
0	Bridgewater Associates	120	Global Macro	High	Known for systematic research - might value da...
1	Citadel	52	Multi-Strategy	Very High	Aggressive talent acquisition - research tools...
2	Two Sigma	60	Quantitative	Medium	Build vs buy - but may want specialized tools
3	Pershing Square Capital	12	Activist Value	Very High	Deep research for activist campaigns - ideal fit
4	AQR Capital Management	98	Factor Investing	Medium	Academic approach - may value research augment...
5	D. E. Shaw & Co.	55	Quantitative	Low	Fully systematic - limited research tool needs
6	Point72 Asset Management	24	Multi-Strategy	High	Cubist data science division - potential integ...
7	Renaissance Technologies	55	Quantitative	Low	Pure quant - unlikely to need external research
8	Elliott Management	56	Activist/Distressed	Very High	Legal and financial deep dives - strong fit
9	Baupost Group	27	Value/Distressed	Very High	Seth Klarman's research-heavy approach - ideal...

Define Merge Task¶

In [4]:

MERGE_TASK = """
Match contacts to their associated fund/company.

Company names may vary between tables:
- "Bridgewater" should match "Bridgewater Associates"
- "Citadel LLC" should match "Citadel"
- "Two Sigma Investments" should match "Two Sigma"
- "D.E. Shaw" should match "D. E. Shaw & Co."
- "Point72 Asset Mgmt" should match "Point72 Asset Management"
- "Renaissance Tech" should match "Renaissance Technologies"
- "Elliott Mgmt" should match "Elliott Management"

Match based on the core company name, ignoring variations in:
- Legal suffixes (LLC, Inc, LP, & Co.)
- Abbreviations (Mgmt = Management, Tech = Technologies)
- Additional descriptors (Associates, Capital, Asset Management)
"""

Run the Merge¶

In [5]:

async def run_merge():
    async with create_session(name="CRM Merge Workflow") as session:
        print(f"Session URL: {session.get_url()}")
        print("\nMerging contacts with fund data...\n")
        
        result = await merge(
            session=session,
            task=MERGE_TASK,
            left_table=contacts_df,
            right_table=funds_df,
            merge_on_left="company_name",
            merge_on_right="fund_name",
        )
        
        return result.data

results_df = await run_merge()

Session URL: https://everyrow.io/sessions/91871b91-e559-42cd-b4d4-e7fa3ffb41f3

Merging contacts with fund data...

Review Merged Data¶

In [6]:

# Count successful merges
merged = results_df[results_df["fund_name"].notna()]
unmerged = results_df[results_df["fund_name"].isna()]

print(f"\n{'='*60}")
print(f"MERGE RESULTS")
print(f"{'='*60}")
print(f"  Total contacts:        {len(results_df)}")
print(f"  Successfully merged:   {len(merged)}")
print(f"  Unmatched:             {len(unmerged)}")

============================================================
MERGE RESULTS
============================================================
  Total contacts:        10
  Successfully merged:   10
  Unmatched:             0

In [7]:

# Show CRM-ready output
print("\nCRM-READY CONTACT LIST:")
print("-" * 80)

for _, row in merged.iterrows():
    print(f"\n{row['contact_name']} - {row['title']}")
    print(f"  Email: {row['email']}")
    print(f"  Fund: {row['fund_name']} (${row['aum_billions']}B AUM)")
    print(f"  Strategy: {row['strategy']} | Research Intensity: {row['research_intensity']}")
    print(f"  Outreach Hook: {row['outreach_hook']}")

CRM-READY CONTACT LIST:
--------------------------------------------------------------------------------

John Smith - Portfolio Manager
  Email: jsmith@bridgewater.com
  Fund: Bridgewater Associates ($120B AUM)
  Strategy: Global Macro | Research Intensity: High
  Outreach Hook: Known for systematic research - might value data tools

Sarah Johnson - Research Analyst
  Email: sjohnson@citadel.com
  Fund: Citadel ($52B AUM)
  Strategy: Multi-Strategy | Research Intensity: Very High
  Outreach Hook: Aggressive talent acquisition - research tools as recruiting edge

Michael Chen - CIO
  Email: mchen@twosgima.com
  Fund: Two Sigma ($60B AUM)
  Strategy: Quantitative | Research Intensity: Medium
  Outreach Hook: Build vs buy - but may want specialized tools

Emily Davis - Partner
  Email: edavis@pershingsq.com
  Fund: Pershing Square Capital ($12B AUM)
  Strategy: Activist Value | Research Intensity: Very High
  Outreach Hook: Deep research for activist campaigns - ideal fit

David Lee - Senior Analyst
  Email: dlee@aqr.com
  Fund: AQR Capital Management ($98B AUM)
  Strategy: Factor Investing | Research Intensity: Medium
  Outreach Hook: Academic approach - may value research augmentation

Jessica Wang - VP Research
  Email: jwang@deshaw.com
  Fund: D. E. Shaw & Co. ($55B AUM)
  Strategy: Quantitative | Research Intensity: Low
  Outreach Hook: Fully systematic - limited research tool needs

Robert Brown - Managing Director
  Email: rbrown@point72.com
  Fund: Point72 Asset Management ($24B AUM)
  Strategy: Multi-Strategy | Research Intensity: High
  Outreach Hook: Cubist data science division - potential integration partner

Amanda Wilson - Quant Researcher
  Email: awilson@rentech.com
  Fund: Renaissance Technologies ($55B AUM)
  Strategy: Quantitative | Research Intensity: Low
  Outreach Hook: Pure quant - unlikely to need external research

Chris Martinez - Portfolio Analyst
  Email: cmartinez@elliott.com
  Fund: Elliott Management ($56B AUM)
  Strategy: Activist/Distressed | Research Intensity: Very High
  Outreach Hook: Legal and financial deep dives - strong fit

Lisa Thompson - Research Director
  Email: lthompson@baupost.com
  Fund: Baupost Group ($27B AUM)
  Strategy: Value/Distressed | Research Intensity: Very High
  Outreach Hook: Seth Klarman's research-heavy approach - ideal prospect

In [8]:

# Filter to high-priority contacts (high research intensity)
high_priority = merged[merged["research_intensity"].isin(["High", "Very High"])]

print(f"\n{'='*60}")
print(f"HIGH PRIORITY CONTACTS ({len(high_priority)} contacts)")
print(f"{'='*60}")

for _, row in high_priority.iterrows():
    print(f"  {row['contact_name']:20} | {row['fund_name']:25} | {row['research_intensity']}")

============================================================
HIGH PRIORITY CONTACTS (6 contacts)
============================================================
  John Smith           | Bridgewater Associates    | High
  Sarah Johnson        | Citadel                   | Very High
  Emily Davis          | Pershing Square Capital   | Very High
  Robert Brown         | Point72 Asset Management  | High
  Chris Martinez       | Elliott Management        | Very High
  Lisa Thompson        | Baupost Group             | Very High

In [9]:

# Export CRM-ready file
crm_export = merged[["contact_name", "title", "email", "fund_name", "aum_billions", "strategy", "research_intensity", "outreach_hook"]]
crm_export.to_csv("crm_ready_contacts.csv", index=False)
print(f"\nExported {len(crm_export)} contacts to crm_ready_contacts.csv")

Exported 10 contacts to crm_ready_contacts.csv

In [10]:

# Full results
results_df

Out[10]:

	contact_name	title	email	company_name	fund_name	aum_billions	strategy	research_intensity	outreach_hook	research
0	John Smith	Portfolio Manager	jsmith@bridgewater.com	Bridgewater	Bridgewater Associates	120	Global Macro	High	Known for systematic research - might value da...	{'fund_name': 'This row was matched due to the...
1	Sarah Johnson	Research Analyst	sjohnson@citadel.com	Citadel LLC	Citadel	52	Multi-Strategy	Very High	Aggressive talent acquisition - research tools...	{'fund_name': 'This row was matched due to the...
2	Michael Chen	CIO	mchen@twosgima.com	Two Sigma Investments	Two Sigma	60	Quantitative	Medium	Build vs buy - but may want specialized tools	{'fund_name': 'This row was matched due to the...
3	Emily Davis	Partner	edavis@pershingsq.com	Pershing Square	Pershing Square Capital	12	Activist Value	Very High	Deep research for activist campaigns - ideal fit	{'fund_name': 'This row was matched due to the...
4	David Lee	Senior Analyst	dlee@aqr.com	AQR	AQR Capital Management	98	Factor Investing	Medium	Academic approach - may value research augment...	{'fund_name': 'This row was matched due to the...
5	Jessica Wang	VP Research	jwang@deshaw.com	D.E. Shaw	D. E. Shaw & Co.	55	Quantitative	Low	Fully systematic - limited research tool needs	{'fund_name': 'This row was matched due to the...
6	Robert Brown	Managing Director	rbrown@point72.com	Point72 Asset Mgmt	Point72 Asset Management	24	Multi-Strategy	High	Cubist data science division - potential integ...	{'fund_name': 'This row was matched due to the...
7	Amanda Wilson	Quant Researcher	awilson@rentech.com	Renaissance Tech	Renaissance Technologies	55	Quantitative	Low	Pure quant - unlikely to need external research	{'fund_name': 'This row was matched due to the...
8	Chris Martinez	Portfolio Analyst	cmartinez@elliott.com	Elliott Mgmt	Elliott Management	56	Activist/Distressed	Very High	Legal and financial deep dives - strong fit	{'fund_name': 'This row was matched due to the...
9	Lisa Thompson	Research Director	lthompson@baupost.com	Baupost	Baupost Group	27	Value/Distressed	Very High	Seth Klarman's research-heavy approach - ideal...	{'fund_name': 'This row was matched due to the...

How to merge datasets without common ID in Python¶

This notebook demonstrates using everyrow's merge() utility to join contact-level data with organization-level data before CRM upload.

Why everyrow? Company names may not match exactly between tables ("Acme Corp" vs "Acme Corporation" vs "ACME"). The merge() function handles these variations semantically.

contact_name

title

company_name

John Smith

Portfolio Manager

jsmith@bridgewater.com

Bridgewater

Sarah Johnson

Research Analyst

sjohnson@citadel.com

Citadel LLC

Michael Chen

CIO

mchen@twosgima.com

Two Sigma Investments

Emily Davis

Partner

edavis@pershingsq.com

Pershing Square

David Lee

Senior Analyst

dlee@aqr.com

AQR

Jessica Wang

VP Research

jwang@deshaw.com

D.E. Shaw

Robert Brown

Managing Director

rbrown@point72.com

Point72 Asset Mgmt

Amanda Wilson

Quant Researcher

awilson@rentech.com

Renaissance Tech

Chris Martinez

Portfolio Analyst

cmartinez@elliott.com

Elliott Mgmt

Lisa Thompson

Research Director

lthompson@baupost.com

Baupost

fund_name

aum_billions

strategy

research_intensity

outreach_hook

Bridgewater Associates

120

Global Macro

High

Known for systematic research - might value da...

Citadel

Multi-Strategy

Very High

Aggressive talent acquisition - research tools...

Two Sigma

Quantitative

Medium

Build vs buy - but may want specialized tools

Pershing Square Capital

Activist Value

Very High

Deep research for activist campaigns - ideal fit

AQR Capital Management

Factor Investing

Medium

Academic approach - may value research augment...

D. E. Shaw & Co.

Quantitative

Low

Fully systematic - limited research tool needs

Point72 Asset Management

Multi-Strategy

High

Cubist data science division - potential integ...

Renaissance Technologies

Quantitative

Low

Pure quant - unlikely to need external research

Elliott Management

Activist/Distressed

Very High

Legal and financial deep dives - strong fit

Baupost Group

Value/Distressed

Very High

Seth Klarman's research-heavy approach - ideal...

MERGE_TASK = """ Match contacts to their associated fund/company. Company names may vary between tables: - "Bridgewater" should match "Bridgewater Associates" - "Citadel LLC" should match "Citadel" - "Two Sigma Investments" should match "Two Sigma" - "D.E. Shaw" should match "D. E. Shaw & Co." - "Point72 Asset Mgmt" should match "Point72 Asset Management" - "Renaissance Tech" should match "Renaissance Technologies" - "Elliott Mgmt" should match "Elliott Management" Match based on the core company name, ignoring variations in: - Legal suffixes (LLC, Inc, LP, & Co.) - Abbreviations (Mgmt = Management, Tech = Technologies) - Additional descriptors (Associates, Capital, Asset Management) """

async def run_merge(): async with create_session(name="CRM Merge Workflow") as session: print(f"Session URL: {session.get_url()}") print("\nMerging contacts with fund data...\n") result = await merge( session=session, task=MERGE_TASK, left_table=contacts_df, right_table=funds_df, merge_on_left="company_name", merge_on_right="fund_name", ) return result.data results_df = await run_merge()

# Count successful merges merged = results_df[results_df["fund_name"].notna()] unmerged = results_df[results_df["fund_name"].isna()] print(f"\n{'='*60}") print(f"MERGE RESULTS") print(f"{'='*60}") print(f" Total contacts: {len(results_df)}") print(f" Successfully merged: {len(merged)}") print(f" Unmatched: {len(unmerged)}")

============================================================ MERGE RESULTS ============================================================ Total contacts: 10 Successfully merged: 10 Unmatched: 0

# Show CRM-ready output print("\nCRM-READY CONTACT LIST:") print("-" * 80) for _, row in merged.iterrows(): print(f"\n{row['contact_name']} - {row['title']}") print(f" Email: {row['email']}") print(f" Fund: {row['fund_name']} (${row['aum_billions']}B AUM)") print(f" Strategy: {row['strategy']} | Research Intensity: {row['research_intensity']}") print(f" Outreach Hook: {row['outreach_hook']}")

CRM-READY CONTACT LIST: -------------------------------------------------------------------------------- John Smith - Portfolio Manager Email: jsmith@bridgewater.com Fund: Bridgewater Associates ($120B AUM) Strategy: Global Macro | Research Intensity: High Outreach Hook: Known for systematic research - might value data tools Sarah Johnson - Research Analyst Email: sjohnson@citadel.com Fund: Citadel ($52B AUM) Strategy: Multi-Strategy | Research Intensity: Very High Outreach Hook: Aggressive talent acquisition - research tools as recruiting edge Michael Chen - CIO Email: mchen@twosgima.com Fund: Two Sigma ($60B AUM) Strategy: Quantitative | Research Intensity: Medium Outreach Hook: Build vs buy - but may want specialized tools Emily Davis - Partner Email: edavis@pershingsq.com Fund: Pershing Square Capital ($12B AUM) Strategy: Activist Value | Research Intensity: Very High Outreach Hook: Deep research for activist campaigns - ideal fit David Lee - Senior Analyst Email: dlee@aqr.com Fund: AQR Capital Management ($98B AUM) Strategy: Factor Investing | Research Intensity: Medium Outreach Hook: Academic approach - may value research augmentation Jessica Wang - VP Research Email: jwang@deshaw.com Fund: D. E. Shaw & Co. ($55B AUM) Strategy: Quantitative | Research Intensity: Low Outreach Hook: Fully systematic - limited research tool needs Robert Brown - Managing Director Email: rbrown@point72.com Fund: Point72 Asset Management ($24B AUM) Strategy: Multi-Strategy | Research Intensity: High Outreach Hook: Cubist data science division - potential integration partner Amanda Wilson - Quant Researcher Email: awilson@rentech.com Fund: Renaissance Technologies ($55B AUM) Strategy: Quantitative | Research Intensity: Low Outreach Hook: Pure quant - unlikely to need external research Chris Martinez - Portfolio Analyst Email: cmartinez@elliott.com Fund: Elliott Management ($56B AUM) Strategy: Activist/Distressed | Research Intensity: Very High Outreach Hook: Legal and financial deep dives - strong fit Lisa Thompson - Research Director Email: lthompson@baupost.com Fund: Baupost Group ($27B AUM) Strategy: Value/Distressed | Research Intensity: Very High Outreach Hook: Seth Klarman's research-heavy approach - ideal prospect

# Filter to high-priority contacts (high research intensity) high_priority = merged[merged["research_intensity"].isin(["High", "Very High"])] print(f"\n{'='*60}") print(f"HIGH PRIORITY CONTACTS ({len(high_priority)} contacts)") print(f"{'='*60}") for _, row in high_priority.iterrows(): print(f" {row['contact_name']:20} | {row['fund_name']:25} | {row['research_intensity']}")

# Export CRM-ready file crm_export = merged[["contact_name", "title", "email", "fund_name", "aum_billions", "strategy", "research_intensity", "outreach_hook"]] crm_export.to_csv("crm_ready_contacts.csv", index=False) print(f"\nExported {len(crm_export)} contacts to crm_ready_contacts.csv")

contact_name

title

company_name

fund_name

aum_billions

strategy

research_intensity

outreach_hook

research

John Smith

Portfolio Manager

jsmith@bridgewater.com

Bridgewater

Bridgewater Associates

120

Global Macro

High

Known for systematic research - might value da...

{'fund_name': 'This row was matched due to the...

Sarah Johnson

Research Analyst

sjohnson@citadel.com

Citadel LLC

Citadel

Multi-Strategy

Very High

Aggressive talent acquisition - research tools...

{'fund_name': 'This row was matched due to the...

Michael Chen

CIO

mchen@twosgima.com

Two Sigma Investments

Two Sigma

Quantitative

Medium

Build vs buy - but may want specialized tools

{'fund_name': 'This row was matched due to the...

Emily Davis

Partner

edavis@pershingsq.com

Pershing Square

Pershing Square Capital

Activist Value

Very High

Deep research for activist campaigns - ideal fit

{'fund_name': 'This row was matched due to the...

David Lee

Senior Analyst

dlee@aqr.com

AQR

AQR Capital Management

Factor Investing

Medium

Academic approach - may value research augment...

{'fund_name': 'This row was matched due to the...

Jessica Wang

VP Research

jwang@deshaw.com

D.E. Shaw

D. E. Shaw & Co.

Quantitative

Low

Fully systematic - limited research tool needs

{'fund_name': 'This row was matched due to the...

Robert Brown

Managing Director

rbrown@point72.com

Point72 Asset Mgmt

Point72 Asset Management

Multi-Strategy

High

Cubist data science division - potential integ...

{'fund_name': 'This row was matched due to the...

Amanda Wilson

Quant Researcher

awilson@rentech.com

Renaissance Tech

Renaissance Technologies

Quantitative

Low

Pure quant - unlikely to need external research

{'fund_name': 'This row was matched due to the...

Chris Martinez

Portfolio Analyst

cmartinez@elliott.com

Elliott Mgmt

Elliott Management

Activist/Distressed

Very High

Legal and financial deep dives - strong fit

{'fund_name': 'This row was matched due to the...

Lisa Thompson

Research Director

lthompson@baupost.com

Baupost

Baupost Group

Value/Distressed

Very High

Seth Klarman's research-heavy approach - ideal...

{'fund_name': 'This row was matched due to the...