Build an AI lead qualification pipeline in Python

This notebook demonstrates a complex, multi-stage screening workflow that combines multiple everyrow operations with pandas data transformations.

Use Case: Qualify investment fund leads for a B2B research tools company. The workflow:

  1. Score funds by "contrarian" research approach (likely to adopt new tools)
  2. Filter to high-scoring candidates using pandas
  3. Research team sizes for remaining candidates
  4. Apply nuanced inclusion logic: include funds with strong research signals OR very small teams

Why this approach? Traditional tools force binary choices. This workflow captures the nuanced mental model: "I want funds that show research-tool-adoption signals, but I'll also include tiny funds where even weak signals matter."
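Before handing this rule to an LLM, it helps to see it as a plain pandas boolean mask. A minimal sketch with toy rows (the `score` and `team_size_estimate` columns mirror those produced later in the notebook):

```python
import pandas as pd

# Toy rows standing in for the scored-and-researched funds.
df = pd.DataFrame({
    "fund_name": ["Obvious Fit", "Tiny Team", "Neither"],
    "score": [85, 55, 45],
    "team_size_estimate": [12, 2, 20],
})

# Include if strong research signals OR a very small team.
mask = (df["score"] >= 70) | (df["team_size_estimate"] <= 5)
print(df.loc[mask, "fund_name"].tolist())  # ['Obvious Fit', 'Tiny Team']
```

The LLM stages below do the hard part: producing the `score` and `team_size_estimate` columns that this mask consumes.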

In [1]:
from dotenv import load_dotenv
load_dotenv()  # load API credentials from .env

import pandas as pd
from pydantic import BaseModel, Field
from everyrow import create_session
from everyrow.ops import rank, screen

Load Investment Fund Data

In [2]:
funds_df = pd.read_csv("../data/investment_funds.csv")

print(f"Loaded {len(funds_df)} funds")
funds_df.head(10)
Loaded 20 funds
Out[2]:
fund_name aum_millions investment_focus website_summary
0 Quantum Value Partners 850 Deep value, contrarian bets on out-of-favor se... We use proprietary research frameworks to iden...
1 Momentum Capital 2400 Trend-following systematic strategies Algorithmic trading based on price momentum. F...
2 Greenfield Research Fund 320 Small-cap fundamental analysis Intensive primary research on overlooked small...
3 Index Plus Alpha 5600 Enhanced indexing with factor tilts Low-cost index exposure with systematic factor...
4 Catalyst Event Partners 780 Event-driven, merger arbitrage We analyze M&A deals and corporate actions. He...
5 Artemis Long/Short 1200 Fundamental long/short equity Bottom-up stock picking with extensive channel...
6 Passive Global ETF 12000 Passive global equity exposure Track the MSCI World Index at lowest cost.
7 Boutique Micro Fund 45 Micro-cap special situations Two-person team finding hidden gems in micro-c...
8 Quant Systematic Alpha 3200 Statistical arbitrage and market-making High-frequency strategies. Technology-first, n...
9 Deep Dive Capital 520 Concentrated positions, 18-month holding period We spend 6 months researching each position. O...

Stage 1: Rank Funds by Research Tool Adoption Likelihood

First, we score each fund on how likely they are to adopt new research tools. This is based on their investment approach and research intensity.

In [3]:
CONTRARIAN_SCORING_TASK = """
Score each investment fund from 0-100 on their likelihood to adopt new research tools.

HIGH scores (70-100) for funds that:
- Emphasize proprietary/primary research
- Mention reading documents, reports, filings manually
- Have research-intensive strategies (fundamental analysis, deep dives)
- Express need for research edge or differentiation
- Smaller teams that need to punch above their weight

MEDIUM scores (40-69) for funds that:
- Do some research but also rely on quantitative/systematic approaches
- Have mixed strategies

LOW scores (0-39) for funds that:
- Are fully systematic/algorithmic with no fundamental research
- Passive/index funds
- Explicitly mention no human research or automated-only approaches
"""
In [4]:
async def stage1_score_funds(session, df):
    """Score funds by research tool adoption likelihood."""
    print("Stage 1: Scoring funds by research tool adoption likelihood...")
    
    result = await rank(
        session=session,
        task=CONTRARIAN_SCORING_TASK,
        input=df,
        field_name="score",
    )
    
    return result.data

Stage 2: Filter Using Pandas

Apply a threshold to focus on high-potential leads. We keep funds scoring 50+ for further analysis.

In [5]:
def stage2_filter_by_score(df, threshold=50):
    """Filter to funds above the score threshold."""
    print(f"\nStage 2: Filtering to funds with score >= {threshold}...")
    
    filtered = df[df["score"] >= threshold].copy()
    
    print(f"  {len(df)} funds -> {len(filtered)} funds after filtering")
    print(f"  Removed: {len(df) - len(filtered)} low-score funds")
    
    return filtered

Stage 3: Research Team Sizes

For the remaining funds, we want to know their team size. Smaller teams are often more accessible and more likely to try new tools.

In [6]:
TEAM_SIZE_TASK = """
Estimate the investment team size for each fund based on the available information.

Look for clues like:
- Explicit mentions of team size ("two-person team", "solo GP")
- AUM relative to strategy complexity
- Website descriptions mentioning analysts, partners, etc.

Provide your best estimate as a single number. If you can only determine a range, use the midpoint.
For very small operations, 1-3 is typical.
For larger funds, 10-50+ is common.
"""
In [7]:
async def stage3_research_team_size(session, df):
    """Research and estimate team sizes."""
    print("\nStage 3: Researching team sizes...")
    
    result = await rank(
        session=session,
        task=TEAM_SIZE_TASK,
        input=df,
        field_name="team_size_estimate",
    )
    
    return result.data

Stage 4: Apply Nuanced Inclusion Logic

The final screening applies nuanced logic that captures our actual mental model:

  • Include if strong research signals (score >= 70)
  • Also include if very small team (<= 5 people), even with weaker signals

This captures the insight that tiny teams are often more accessible and more desperate for research tools, even if their website doesn't explicitly mention research needs.

In [8]:
class InclusionResult(BaseModel):
    """Schema for final inclusion decision."""
    include: bool = Field(
        description="Whether to include this fund in the final outreach list"
    )
    inclusion_reason: str = Field(
        description="Why this fund was included or excluded"
    )
    priority: str = Field(
        description="high, medium, or low priority for outreach"
    )

INCLUSION_TASK = """
Decide whether to include each fund in the final outreach list for a B2B research tools sale.

INCLUDE a fund if EITHER:
1. They have a high research tool adoption score (>= 70) - these are obvious fits
2. They have a very small team (<= 5 people) - small teams are accessible and need tools

PRIORITY levels:
- HIGH: Score >= 70 AND small team - best of both worlds
- MEDIUM: Score >= 70 OR small team (but not both)
- LOW: Included but borderline

EXCLUDE funds that don't meet either criterion.
"""
In [9]:
async def stage4_final_screening(session, df):
    """Apply final inclusion logic."""
    print("\nStage 4: Applying final inclusion logic...")
    
    result = await screen(
        session=session,
        task=INCLUSION_TASK,
        input=df,
        response_model=InclusionResult,
    )
    
    return result.data

Run the Complete Workflow

In [10]:
async def run_full_workflow():
    """Execute the complete multi-stage screening workflow."""
    
    async with create_session(name="Multi-Stage Lead Screening") as session:
        print(f"Session URL: {session.get_url()}")
        print("="*60)
        
        # Stage 1: Score by research tool adoption
        scored_df = await stage1_score_funds(session, funds_df)
        
        # Stage 2: Filter by score threshold (pandas)
        filtered_df = stage2_filter_by_score(scored_df, threshold=50)
        
        # Stage 3: Research team sizes
        with_team_size_df = await stage3_research_team_size(session, filtered_df)
        
        # Stage 4: Final screening with nuanced logic
        final_df = await stage4_final_screening(session, with_team_size_df)
        
        return final_df

final_results = await run_full_workflow()
Session URL: https://everyrow.io/sessions/9dd21918-d385-4fa4-932a-190c877f74d4
============================================================
Stage 1: Scoring funds by research tool adoption likelihood...
Stage 2: Filtering to funds with score >= 50...
  20 funds -> 15 funds after filtering
  Removed: 5 low-score funds

Stage 3: Researching team sizes...
Stage 4: Applying final inclusion logic...

Analyze Final Results

In [11]:
# Filter to included funds
included_funds = final_results[final_results["include"]].copy()

print(f"\n{'='*60}")
print(f"FINAL RESULTS: {len(included_funds)} funds qualified for outreach")
print(f"{'='*60}\n")
============================================================
FINAL RESULTS: 14 funds qualified for outreach
============================================================

In [12]:
# Show included funds
print("QUALIFIED FUNDS:")
print("-" * 50)
for _, row in included_funds.iterrows():
    print(f"  {row['fund_name']}")
    team_size = row.get('team_size_estimate', 'N/A')
    score = row.get('score', 'N/A')
    print(f"    AUM: ${row['aum_millions']}M | Team: ~{team_size} | Score: {score}")
    print()
QUALIFIED FUNDS:
--------------------------------------------------
  Tiny Ventures GP
    AUM: $28M | Team: ~1 | Score: 85

  Family Office Alpha
    AUM: $95M | Team: ~2 | Score: 85

  Boutique Micro Fund
    AUM: $45M | Team: ~2 | Score: 95

  Nano Cap Hunters
    AUM: $62M | Team: ~3 | Score: 95

  Greenfield Research Fund
    AUM: $320M | Team: ~4 | Score: 92

  Deep Dive Capital
    AUM: $520M | Team: ~5 | Score: 95

  ESG Impact Capital
    AUM: $680M | Team: ~6 | Score: 88

  Quantum Value Partners
    AUM: $850M | Team: ~8 | Score: 85

  Artemis Long/Short
    AUM: $1200M | Team: ~8 | Score: 85

  Catalyst Event Partners
    AUM: $780M | Team: ~8 | Score: 90

  Sector Specialist Partners
    AUM: $410M | Team: ~8 | Score: 94

  Activist Value Fund
    AUM: $1800M | Team: ~10 | Score: 85

  Global Macro Partners
    AUM: $2800M | Team: ~15 | Score: 72

  Credit Opportunities Fund
    AUM: $1500M | Team: ~15 | Score: 95

In [13]:
# Summary statistics
print("\nWORKFLOW SUMMARY:")
print(f"  Started with:          {len(funds_df)} funds")
print(f"  After final screening: {len(final_results)} funds")
print(f"  Final qualified leads: {len(included_funds)} funds")
WORKFLOW SUMMARY:
  Started with:          20 funds
  After final screening: 14 funds
  Final qualified leads: 14 funds
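Because the inclusion rule is stated explicitly in the prompt, it is worth sanity-checking the LLM's decisions against a deterministic version of the same rule. A minimal sketch, with toy rows standing in for `final_results`:

```python
import pandas as pd

# Toy rows standing in for final_results; in the notebook, use final_results directly.
results = pd.DataFrame({
    "score": [85, 72, 45],
    "team_size_estimate": [1, 15, 20],
    "include": [True, True, False],
})

# Recompute the stated rule: score >= 70 OR team size <= 5.
rule = (results["score"] >= 70) | (results["team_size_estimate"] <= 5)

# Any disagreement flags rows worth a manual look.
mismatches = results[results["include"] != rule]
print(f"{len(mismatches)} rows disagree with the deterministic rule")
```

Zero mismatches means the LLM applied the rule faithfully; a handful usually points at borderline rows where the model weighed signals the rule does not capture.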
In [14]:
# Export the final list
included_funds.to_csv("qualified_leads.csv", index=False)
print(f"\nExported {len(included_funds)} qualified leads to qualified_leads.csv")
Exported 14 qualified leads to qualified_leads.csv
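The `priority` field from `InclusionResult` is also worth surfacing when ordering outreach. Assuming `screen` returns it as a `priority` column (toy rows shown here in place of `included_funds`):

```python
import pandas as pd

# Toy rows standing in for included_funds.
included = pd.DataFrame({
    "fund_name": ["Boutique Micro Fund", "Quantum Value Partners", "Global Macro Partners"],
    "priority": ["high", "high", "medium"],
})

# Count funds at each priority level to order outreach.
counts = included["priority"].value_counts()
print(counts.to_dict())  # {'high': 2, 'medium': 1}
```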
In [15]:
# Display full results table
final_results[["fund_name", "aum_millions", "score", "team_size_estimate", "include"]]
Out[15]:
fund_name aum_millions score team_size_estimate include
0 Tiny Ventures GP 28 85 1 True
1 Family Office Alpha 95 85 2 True
2 Boutique Micro Fund 45 95 2 True
3 Nano Cap Hunters 62 95 3 True
4 Greenfield Research Fund 320 92 4 True
5 Deep Dive Capital 520 95 5 True
6 ESG Impact Capital 680 88 6 True
7 Quantum Value Partners 850 85 8 True
8 Artemis Long/Short 1200 85 8 True
9 Catalyst Event Partners 780 90 8 True
10 Sector Specialist Partners 410 94 8 True
11 Activist Value Fund 1800 85 10 True
12 Global Macro Partners 2800 72 15 True
13 Credit Opportunities Fund 1500 95 15 True