Use LLM Agents to research government data at scale¶
This notebook demonstrates everyrow's rank() utility, combined with web research, to gather and rank real-world data that isn't published in any structured format.
Use Case: Real estate investors need permit processing timelines to evaluate markets—delays directly impact holding costs. But municipalities publish this data inconsistently: some on websites, some in PDFs, some not at all.
Why everyrow? The rank() function can perform web research to find permit processing times from official sources, contractor reports, and comparable city data—then rank cities by speed.
In [6]:
import pandas as pd
from dotenv import load_dotenv
from everyrow import create_session
from everyrow.ops import rank

load_dotenv()  # load API credentials before creating a session
Load Texas Cities Data¶
In [7]:
texas_cities_df = pd.read_csv("../data/texas_cities.csv")
print(f"Analyzing {len(texas_cities_df)} Texas cities")
texas_cities_df.head(10)
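If the CSV isn't on hand, the expected shape can be sketched directly. The columns below (city, region, population) are assumed from how the results are used later in the notebook; the rows and numbers are purely illustrative:

```python
import pandas as pd

# Illustrative stand-in for texas_cities.csv
# (assumed columns: city, region, population; values are made up).
sample = pd.DataFrame(
    {
        "city": ["Austin", "Lubbock", "El Paso"],
        "region": ["Central", "West", "West"],
        "population": [961855, 257141, 678815],
    }
)
print(sample.head())
```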
Out[7]:
Define Research & Ranking Task¶
The task prompt instructs everyrow to research permit processing times, prioritizing official sources over contractor reports and estimates.
In [8]:
RANKING_TASK = """
Research and score each Texas city by their RESIDENTIAL BUILDING PERMIT processing time.
The score should represent the NUMBER OF BUSINESS DAYS for typical residential permit approval.
Lower numbers = faster = better for real estate investors.
RESEARCH PRIORITIES (in order):
1. Official city development services performance metrics
2. City-stated standard processing times from permit office websites
3. Contractor reports and local builder forum discussions
4. Comparable city estimates if no direct data available
For cities without published data, estimate based on:
- City size (smaller cities often faster)
- Region patterns (some Texas regions known for faster permitting)
- Recent development activity levels
Output the score as estimated business days (e.g., 5 = 5 business days, 30 = 30 business days).
Include the source of information in your reasoning.
"""
Run the Research & Ranking¶
In [9]:
async def run_ranking():
    async with create_session(name="Texas Permit Times Research") as session:
        print(f"Session URL: {session.get_url()}")
        print("\nResearching permit processing times (this may take a few minutes)...\n")
        result = await rank(
            session=session,
            task=RANKING_TASK,
            input=texas_cities_df,
            field_name="score",
        )
        return result.data

results_df = await run_ranking()
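Because the score column is produced by model research rather than a database query, it can be worth a defensive check before sorting. This is a sketch on a toy frame, not part of everyrow's API: coerce the column to numeric and flag rows that didn't parse.

```python
import pandas as pd

# Toy frame standing in for the ranked results.
df = pd.DataFrame({"city": ["Austin", "Waco"], "score": ["12", "bad"]})

# Coerce to numeric; unparseable entries become NaN instead of raising.
df["score"] = pd.to_numeric(df["score"], errors="coerce")
bad = df[df["score"].isna()]
print(f"{len(bad)} rows with unparseable scores")
```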
Analyze Results¶
In [10]:
# Rename score to permit_days for clarity
results_df = results_df.rename(columns={"score": "permit_days"})
# Sort by permit time (ascending = fastest first)
results_df = results_df.sort_values("permit_days", ascending=True)
print(f"\n{'='*60}")
print("TEXAS CITIES BY PERMIT PROCESSING TIME")
print("(Fastest to Slowest)")
print(f"{'='*60}\n")
In [11]:
# Top 10 fastest
print("TOP 10 FASTEST (Best for Investors):")
print("-" * 50)
for i, (_, row) in enumerate(results_df.head(10).iterrows(), 1):
    print(f"{i:2}. {row['city']:20} | {row['permit_days']:3} days | Pop: {row['population']:,}")
    if 'research' in row and pd.notna(row['research']):
        print(f"    Source: {str(row['research'])[:60]}...")
    print()
In [12]:
# Bottom 10 slowest
print("\nTOP 10 SLOWEST (Highest Holding Costs):")
print("-" * 50)
for i, (_, row) in enumerate(results_df.tail(10).iloc[::-1].iterrows(), 1):
    print(f"{i:2}. {row['city']:20} | {row['permit_days']:3} days | Pop: {row['population']:,}")
In [13]:
# Average by region
print("\nAVERAGE PERMIT TIME BY REGION:")
print(results_df.groupby("region")["permit_days"].mean().sort_values().to_string())
In [14]:
# Summary stats
print("\nSUMMARY STATISTICS:")
print(f" Fastest city: {results_df.iloc[0]['city']} ({results_df.iloc[0]['permit_days']} days)")
print(f" Slowest city: {results_df.iloc[-1]['city']} ({results_df.iloc[-1]['permit_days']} days)")
print(f" Average: {results_df['permit_days'].mean():.1f} days")
print(f" Median: {results_df['permit_days'].median():.1f} days")
In [15]:
# Full results
results_df[["city", "region", "population", "permit_days", "research"]]
Out[15]:
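Since the research run takes minutes and costs API calls, persisting the ranked table is a natural last step. A minimal sketch (the output path and toy data here are illustrative):

```python
import pandas as pd

# Toy stand-in for the ranked results frame.
results_df = pd.DataFrame(
    {"city": ["Lubbock", "Austin"], "permit_days": [7, 45]}
)

out_path = "texas_permit_times_ranked.csv"  # illustrative path
results_df.to_csv(out_path, index=False)
```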