Use LLM Agents to research government data at scale

This notebook uses everyrow's rank() utility, which can run web research for each row, to gather and rank real-world data that is not available in a structured format.

Use Case: Real estate investors need permit processing timelines to evaluate markets, because delays directly increase holding costs. But municipalities publish this data inconsistently: some on websites, some in PDFs, some not at all.

Why everyrow? The rank() function can perform web research to find permit processing times from official sources, contractor reports, and comparable city data, and then rank cities by speed.

In [6]:
import asyncio
from dotenv import load_dotenv
load_dotenv()

import pandas as pd
from everyrow import create_session
from everyrow.ops import rank

Load Texas Cities Data

In [7]:
texas_cities_df = pd.read_csv("../data/texas_cities.csv")

print(f"Analyzing {len(texas_cities_df)} Texas cities")
texas_cities_df.head(10)
Analyzing 30 Texas cities
Out[7]:
   city            population  region
0  Houston            2300000  Gulf Coast
1  San Antonio        1500000  South Texas
2  Dallas             1340000  North Texas
3  Austin              980000  Central Texas
4  Fort Worth          920000  North Texas
5  El Paso             680000  West Texas
6  Arlington           395000  North Texas
7  Corpus Christi      320000  Gulf Coast
8  Plano               285000  North Texas
9  Laredo              260000  South Texas

Define Research & Ranking Task

The task prompt tells everyrow what the score means (business days to approval), which sources to prefer, and how to estimate a value when a city publishes no data.

In [8]:
RANKING_TASK = """
Research and score each Texas city by its RESIDENTIAL BUILDING PERMIT processing time.

The score should represent the NUMBER OF BUSINESS DAYS for typical residential permit approval.
Lower numbers = faster = better for real estate investors.

RESEARCH PRIORITIES (in order):
1. Official city development services performance metrics
2. City-stated standard processing times from permit office websites
3. Contractor reports and local builder forum discussions
4. Comparable city estimates if no direct data available

For cities without published data, estimate based on:
- City size (smaller cities often faster)
- Region patterns (some Texas regions known for faster permitting)
- Recent development activity levels

Output the score as estimated business days (e.g., 5 = 5 business days, 30 = 30 business days).
Include the source of information in your reasoning.
"""

Run the Research & Ranking

In [9]:
async def run_ranking():
    async with create_session(name="Texas Permit Times Research") as session:
        print(f"Session URL: {session.get_url()}")
        print("\nResearching permit processing times (this may take a few minutes)...\n")
        
        result = await rank(
            session=session,
            task=RANKING_TASK,
            input=texas_cities_df,
            field_name="score",
        )
        
        return result.data

results_df = await run_ranking()
Session URL: https://everyrow.io/sessions/b0ed2d81-3f0b-48fa-a5c7-41251cb826e9

Researching permit processing times (this may take a few minutes)...

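The cell above relies on Jupyter's support for top-level await. In a plain Python script, a coroutine like this is usually driven with asyncio.run instead. A minimal sketch, assuming a hypothetical stand-in coroutine (fake_run_ranking) in place of the real run_ranking() so that it runs without network access or credentials; the timeout value is also an assumption:

```python
import asyncio


async def fake_run_ranking() -> list:
    """Hypothetical stand-in for run_ranking(); returns two sample rows."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return [
        {"city": "Corpus Christi", "score": 2},
        {"city": "Houston", "score": 30},
    ]


async def main(timeout_seconds: float = 600.0) -> list:
    # wait_for cancels the research task if it exceeds the (assumed) time budget.
    return await asyncio.wait_for(fake_run_ranking(), timeout=timeout_seconds)


# In a script there is no running event loop, so asyncio.run creates one.
rows = asyncio.run(main())
print(rows)
```

Swapping fake_run_ranking for the notebook's run_ranking would keep the same structure; only the stand-in is invented here.
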
Analyze Results

In [10]:
# Rename score to permit_days for clarity
results_df = results_df.rename(columns={"score": "permit_days"})

# Sort by permit time (ascending = fastest first)
results_df = results_df.sort_values("permit_days", ascending=True)

print(f"\n{'='*60}")
print("TEXAS CITIES BY PERMIT PROCESSING TIME")
print("(Fastest to Slowest)")
print(f"{'='*60}\n")
============================================================
TEXAS CITIES BY PERMIT PROCESSING TIME
(Fastest to Slowest)
============================================================
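
Sorting assumes every score is a clean number. In this run the scores came back as integers, but research-derived values could plausibly arrive as strings or be missing, so a defensive check before sorting may be worthwhile. A minimal sketch, assuming a hypothetical helper named coerce_days that is not part of everyrow:

```python
import math
import re


def coerce_days(value):
    """Return a positive whole number of days, or None if the value is unusable.

    For strings, the first number found is used (so "5-10 days" becomes 5).
    """
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        if isinstance(value, float) and not math.isfinite(value):
            return None  # NaN or infinity
        days = round(value)
    else:
        match = re.search(r"\d+(?:\.\d+)?", str(value))
        if match is None:
            return None
        days = round(float(match.group()))
    return days if days > 0 else None


for raw in [5, "30 days", "n/a", float("nan")]:
    print(repr(raw), "->", coerce_days(raw))
```

With pandas, the helper could be applied via results_df["permit_days"].map(coerce_days), after which rows that map to None can be reviewed by hand.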

In [11]:
# Top 10 fastest
print("TOP 10 FASTEST (Best for Investors):")
print("-" * 50)
for i, (_, row) in enumerate(results_df.head(10).iterrows(), 1):
    print(f"{i:2}. {row['city']:20} | {row['permit_days']:3} days | Pop: {row['population']:,}")
    if 'research' in row and pd.notna(row['research']):
        print(f"    Source: {str(row['research'])[:60]}...")
    print()
TOP 10 FASTEST (Best for Investors):
--------------------------------------------------
 1. Corpus Christi       |   2 days | Pop: 320,000
    Source: {'score': 'The score is based on official performance metric...

 2. San Antonio          |   3 days | Pop: 1,500,000
    Source: {'score': "The City of San Antonio's Development Services De...

 3. Irving               |   3 days | Pop: 240,000
    Source: {'score': "The City of Irving's official development service...

 4. McKinney             |   3 days | Pop: 200,000
    Source: {'score': "The score is based on the City of McKinney's 'Sin...

 5. McAllen              |   3 days | Pop: 145,000
    Source: {'score': "The score of 3 business days is directly sourced ...

 6. Plano                |   5 days | Pop: 285,000
    Source: {'score': "The City of Plano's Building Inspections departme...

 7. Amarillo             |   5 days | Pop: 200,000
    Source: {'score': "The City of Amarillo reports a typical turnaround...

 8. Brownsville          |   5 days | Pop: 185,000
    Source: {'score': "The City of Brownsville's official Building Permi...

 9. Waco                 |   6 days | Pop: 140,000
    Source: {'score': "The City of Waco's Development Services performan...

10. Garland              |   7 days | Pop: 240,000
    Source: {'score': "The City of Garland explicitly lists 'New Residen...
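
The Source lines above print the raw research value, which in this output looks like a dict keyed by the field name ('score'), so each line starts with {'score': rather than with the reasoning itself. A minimal sketch of pulling out just the text, assuming the value is either such a dict or its string form; extract_reasoning is a hypothetical helper, not an everyrow function:

```python
import ast
import math


def extract_reasoning(value, field="score", max_chars=60):
    """Return the reasoning stored under `field`, shortened to max_chars."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""  # missing research
    if isinstance(value, str):
        try:
            value = ast.literal_eval(value)  # string form of a dict
        except (ValueError, SyntaxError):
            pass  # not a Python literal: treat it as plain text
    text = str(value.get(field, "")) if isinstance(value, dict) else str(value)
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "..."


# Hypothetical sample value, shaped like the research column shown above.
sample = {"score": "Placeholder reasoning text citing an official city source."}
print(extract_reasoning(sample))
```

In the loop above, the Source line could then print extract_reasoning(row['research']) instead of slicing the raw value.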

In [12]:
# Bottom 10 slowest
print("\nTOP 10 SLOWEST (Highest Holding Costs):")
print("-" * 50)
for i, (_, row) in enumerate(results_df.tail(10).iloc[::-1].iterrows(), 1):
    print(f"{i:2}. {row['city']:20} | {row['permit_days']:3} days | Pop: {row['population']:,}")
TOP 10 SLOWEST (Highest Holding Costs):
--------------------------------------------------
 1. Round Rock           |  30 days | Pop: 130,000
 2. Houston              |  30 days | Pop: 2,300,000
 3. Laredo               |  21 days | Pop: 260,000
 4. Frisco               |  15 days | Pop: 210,000
 5. Austin               |  15 days | Pop: 980,000
 6. El Paso              |  14 days | Pop: 680,000
 7. Arlington            |  12 days | Pop: 395,000
 8. Mesquite             |  10 days | Pop: 150,000
 9. Grand Prairie        |  10 days | Pop: 195,000
10. Killeen              |  10 days | Pop: 155,000
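
To connect these figures to holding costs, business days can be converted to approximate calendar days and multiplied by a daily carrying cost. A minimal sketch of that arithmetic, assuming a 5-day work week with no holidays and a purely hypothetical carrying cost of $150 per calendar day:

```python
def business_to_calendar_days(business_days: int) -> int:
    """Approximate calendar days for a span of business days (5-day week, no holidays)."""
    return (business_days * 7 + 4) // 5  # integer ceiling of business_days * 7 / 5


def holding_cost(business_days: int, daily_carry_cost: float) -> float:
    """Carrying cost accrued while waiting for permit approval."""
    return business_to_calendar_days(business_days) * daily_carry_cost


DAILY_CARRY_COST = 150.0  # hypothetical: interest + taxes + insurance per calendar day

# Permit times taken from the results in this notebook.
for city, days in [("Corpus Christi", 2), ("Austin", 15), ("Houston", 30)]:
    cost = holding_cost(days, DAILY_CARRY_COST)
    print(f"{city:15} {days:3} business days ~ ${cost:,.0f}")
```

The carrying cost is the only invented input; replacing it with a project-specific figure gives a rough per-market comparison.
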

In [13]:
# Average by region
print("\nAVERAGE PERMIT TIME BY REGION:")
print(results_df.groupby("region")["permit_days"].mean().sort_values().to_string())
AVERAGE PERMIT TIME BY REGION:
region
Rio Grande Valley     4.000000
Panhandle             5.000000
North Texas           8.416667
West Texas            9.500000
South Texas          12.000000
Gulf Coast           12.400000
Central Texas        15.250000
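
Regional means hide how many cities sit behind each number: the Panhandle average comes from a single city, and South Texas averages a 3-day city with a 21-day one. A minimal standard-library sketch that reports the count and range next to each mean, using a subset of the (region, permit_days) pairs from this notebook; summarize_by_region is a hypothetical helper:

```python
from collections import defaultdict
from statistics import mean

# Subset of (region, permit_days) pairs copied from the results in this notebook.
ROWS = [
    ("Panhandle", 5),
    ("Rio Grande Valley", 3), ("Rio Grande Valley", 5),
    ("South Texas", 3), ("South Texas", 21),
    ("West Texas", 7), ("West Texas", 7), ("West Texas", 10), ("West Texas", 14),
]


def summarize_by_region(rows):
    """Return {region: (mean_days, n_cities, min_days, max_days)}."""
    grouped = defaultdict(list)
    for region, days in rows:
        grouped[region].append(days)
    return {r: (mean(d), len(d), min(d), max(d)) for r, d in grouped.items()}


summary = summarize_by_region(ROWS)
for region, (avg, n, lo, hi) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{region:18} mean={avg:5.1f}  n={n}  range={lo}-{hi}")
```

On the DataFrame itself, results_df.groupby("region")["permit_days"].agg(["mean", "count", "min", "max"]) should give the same view for all regions.
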

In [14]:
# Summary stats (results_df is sorted ascending, so iloc[0] is fastest and iloc[-1] is slowest)
print("\nSUMMARY STATISTICS:")
print(f"  Fastest city: {results_df.iloc[0]['city']} ({results_df.iloc[0]['permit_days']} days)")
print(f"  Slowest city: {results_df.iloc[-1]['city']} ({results_df.iloc[-1]['permit_days']} days)")
print(f"  Average: {results_df['permit_days'].mean():.1f} days")
print(f"  Median: {results_df['permit_days'].median():.1f} days")
SUMMARY STATISTICS:
  Fastest city: Corpus Christi (2 days)
  Slowest city: Round Rock (30 days)
  Average: 10.0 days
  Median: 10.0 days

In [15]:
# Full results
results_df[["city", "region", "population", "permit_days", "research"]]
Out[15]:
    city            region             population  permit_days  research
 0  Corpus Christi  Gulf Coast             320000            2  {'score': 'The score is based on official perf...
 1  San Antonio     South Texas           1500000            3  {'score': 'The City of San Antonio's Developme...
 2  Irving          North Texas            240000            3  {'score': 'The City of Irving's official devel...
 3  McKinney        North Texas            200000            3  {'score': 'The score is based on the City of M...
 4  McAllen         Rio Grande Valley      145000            3  {'score': 'The score of 3 business days is dir...
 5  Plano           North Texas            285000            5  {'score': 'The City of Plano's Building Inspec...
 6  Amarillo        Panhandle              200000            5  {'score': 'The City of Amarillo reports a typi...
 7  Brownsville     Rio Grande Valley      185000            5  {'score': 'The City of Brownsville's official ...
 8  Waco            Central Texas          140000            6  {'score': 'The City of Waco's Development Serv...
 9  Garland         North Texas            240000            7  {'score': 'The City of Garland explicitly list...
10  Midland         West Texas             135000            7  {'score': 'The City of Midland's official Buil...
11  Odessa          West Texas             125000            7  {'score': 'The City of Odessa's official websi...
12  Dallas          North Texas           1340000            8  {'score': 'The score of 8 business days is bas...
13  Fort Worth      North Texas            920000            8  {'score': 'The score of 8 business days is bas...
22  Sugar Land      Gulf Coast             110000           10  {'score': 'The City of Sugar Land's Developmen...
21  Pearland        Gulf Coast             125000           10  {'score': 'The City of Pearland's Community De...
20  Carrollton      North Texas            135000           10  {'score': 'The City of Carrollton's official w...
19  Denton          North Texas            140000           10  {'score': 'The City of Denton and Denton Count...
14  Lubbock         West Texas             260000           10  {'score': 'The score is derived directly from ...
17  Pasadena        Gulf Coast             150000           10  {'score': 'The score for Pasadena, Texas is ba...
16  Killeen         Central Texas          155000           10  {'score': 'Official city flowcharts for Killee...
15  Grand Prairie   North Texas            195000           10  {'score': 'The score for Grand Prairie, Texas ...
18  Mesquite        North Texas            150000           10  {'score': 'The City of Mesquite's official dev...
23  Arlington       North Texas            395000           12  {'score': 'The City of Arlington's FY 2025 Ado...
24  El Paso         West Texas             680000           14  {'score': 'The score of 14 business days is de...
25  Austin          Central Texas          980000           15  {'score': 'The City of Austin's Development Se...
26  Frisco          North Texas            210000           15  {'score': 'The estimate of 15 business days is...
27  Laredo          South Texas            260000           21  {'score': 'The City of Laredo's official Build...
28  Houston         Gulf Coast            2300000           30  {'score': 'The score is based on the City of H...
29  Round Rock      Central Texas          130000           30  {'score': 'The City of Round Rock's official P...