Name: everyrow
Author: FutureSearch

The Opportunity

Prediction markets let you trade on the outcome of real-world events: elections, economic data, policy decisions. Prices reflect the crowd's probability estimate: a contract trading at 40 cents implies the market believes there's a 40% chance of YES.

The math is simple: if you can estimate probabilities more accurately than the market, you make money in expectation. Buy YES when you think the true probability is higher than the price. Buy NO when you think it's lower. Over enough trades, better accuracy wins.
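The arithmetic behind that claim can be sketched in a few lines. This is a minimal illustration, assuming each contract pays $1 if it wins and ignoring fees and bid-ask spreads; the function names are ours, not part of any Kalshi or everyrow API.

```python
def expected_profit(side: str, price: float, true_prob: float) -> float:
    """Expected profit in dollars from one contract that pays $1 if it wins.

    price is the YES price in dollars (0.40 means 40 cents); true_prob is
    your own estimate of the chance of YES. Fees and spreads are ignored.
    """
    if side == "YES":
        # Pay `price`, collect $1 with probability true_prob.
        return true_prob * 1.0 - price
    if side == "NO":
        # Pay (1 - price), collect $1 with probability (1 - true_prob).
        return (1.0 - true_prob) * 1.0 - (1.0 - price)
    raise ValueError("side must be 'YES' or 'NO'")


def best_side(price: float, true_prob: float) -> str:
    """Pick the side with positive expected profit, or 'PASS' if neither."""
    if true_prob > price:
        return "YES"
    if true_prob < price:
        return "NO"
    return "PASS"
```

For the 40-cent contract above, an estimate of 55% makes YES worth about 15 cents per contract in expectation, and an estimate of 25% makes NO worth about the same. The expected profit is simply the gap between your probability and the price, which is why accuracy is the whole game.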

The hard part isn't the math; it's the research and the judgment. Kalshi has thousands of open markets at any given time. Finding the ones where the market is wrong requires deep research on each question: What's the current state of the world? What are the base rates? What do experts think? What are the strongest arguments on both sides? Then you have to synthesize all of that into an accurate probability estimate, weighing conflicting evidence, calibrating your confidence, and resisting the pull of cognitive biases. No human can do this systematically across hundreds of markets. The work is tedious and time-consuming, and it is exactly where biases like confirmation bias, anchoring, and availability bias creep in and degrade both the research and the judgment.

This is a job ideally suited for AI.

We built a pipeline that scans the highest-volume Kalshi markets, screens out markets where we don't expect to add value or where we worry about being at an information disadvantage (markets prone to insider information, sports, crypto), uses AI to do in-depth research and best-practice forecasting for each remaining market, and highlights the markets where the AI's probability estimate disagrees most with the market price. The full research, including both sides of every argument, is included for every question.

We think this is useful for anyone trying to forecast underlying probabilities on prediction markets, whether you use the AI's estimates directly or just as a starting point for your own analysis. (This is less relevant for momentum traders, technical traders, or market makers; it's a fundamentals-first tool.)

Our broader aspiration is to build a forecaster that's useful beyond prediction markets, one that helps people reason about complex questions in politics, geopolitics, economics, and policy. Prediction markets give us a hard benchmark to measure against: if the AI can't outperform the crowd's probability estimates, we need to question whether it's adding accuracy at all. If it can, that's a strong signal the same research and reasoning methodology is worth applying to questions that don't have a market price attached.

Disclaimer: This is not investment advice. The forecasts and research presented here are for informational and educational purposes only.

A key advantage of an AI researcher is that the research is fully transparent and neutral. Unlike human analysts, who might omit information intentionally or through bias, the AI researches both sides with equal rigor. You can read every piece of research and every rationale, and decide for yourself whether you agree.

Metric	Value
Markets scanned	~3,500 (all open Kalshi events)
Markets forecasted	Top events by volume, after screening
Screening steps	2 (insider info/moral hazard, then methodology fit)
Forecasting	everyrow `forecast()`
Cost per question	~$0.60

How It Works

The system runs in four stages:

1. Find markets

The pipeline fetches all open events from Kalshi's API (~3,500 events), ranks them by trading volume, and selects the top events. For events with multiple subquestions (e.g., "Who will be the next Fed Chair?" has one subquestion per candidate), it picks the top 2 by volume. High-volume markets are more likely to have meaningful price signals and enough liquidity to actually trade.

Markets are filtered by price range (3%–97%) to exclude near-certain outcomes, by minimum volume to focus on liquid markets, and by resolution date to exclude markets resolving within 10 days. Near-term markets tend to be more information-sensitive: insiders or close observers have an edge over AI research.
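The selection logic in this stage can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the Market fields, the top_events and min_volume defaults, and the choice to filter before ranking are all hypothetical, since the text gives only the price range, the 10-day cutoff, and the top-2 rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Market:
    event: str          # parent event, e.g. "Who will be the next Fed Chair?"
    label: str          # subquestion, e.g. a candidate's name
    yes_price: float    # current YES price as a probability (0.0 to 1.0)
    volume: int         # trading volume
    resolves_on: date   # expected resolution date


def select_markets(markets, today, top_events=50, per_event=2,
                   min_volume=1000, min_days=10):
    """Filter markets, rank events by volume, keep the top subquestions.

    top_events and min_volume are placeholder values, not the real settings.
    """
    cutoff = today + timedelta(days=min_days)
    eligible = [
        m for m in markets
        if 0.03 <= m.yes_price <= 0.97   # drop near-certain outcomes
        and m.volume >= min_volume       # drop illiquid markets
        and m.resolves_on > cutoff       # drop markets resolving within 10 days
    ]

    # Group subquestions under their parent event.
    by_event = defaultdict(list)
    for m in eligible:
        by_event[m.event].append(m)

    # Rank events by total volume, highest first.
    ranked = sorted(by_event.values(),
                    key=lambda ms: sum(m.volume for m in ms),
                    reverse=True)

    # Within each selected event, keep the highest-volume subquestions.
    selected = []
    for ms in ranked[:top_events]:
        selected.extend(sorted(ms, key=lambda m: m.volume, reverse=True)[:per_event])
    return selected
```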

2. Screen for adverse selection and methodology fit

Not all high-volume markets are good candidates for AI forecasting. Before spending money on research, two AI-powered screens filter the market list:

Insider information screen: Rejects markets where insiders likely know the outcome: pre-taped reality TV shows (Survivor, Bachelor), celebrity personal decisions ("Will X attend the Super Bowl?"), outcomes already determined but not yet public, and markets with moral hazard where a bettor could influence the outcome (stunts, self-fulfilling bets).

Methodology fit screen: Rejects markets where specialized traders have a structural edge over AI research: sports (dedicated bettors with statistical models and injury intel) and cryptocurrency (driven by technical analysis and on-chain data our research agents don't capture). We focus on elections, geopolitics, policy, economics, and similar domains where deep web research and reasoning provide a forecasting edge.

Both screens use everyrow's screen operation, which runs an LLM evaluation on each row and keeps only those that pass. This costs ~$0.01 per market, far cheaper than running full research on a market we shouldn't be trading.
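A quick back-of-the-envelope check shows why screening first is cheap insurance. The sketch below uses the approximate costs quoted in this post ($0.01 to screen a market, $0.60 to forecast a question) and assumes the $0.01 covers the screening of one market; the 200-market batch and 25% rejection rate in the usage note are hypothetical.

```python
SCREEN_COST = 0.01    # dollars per market screened (approximate, from the text)
FORECAST_COST = 0.60  # dollars per question forecast (approximate, from the text)


def screening_savings(n_markets: int, rejected_fraction: float) -> float:
    """Net dollars saved by screening before forecasting.

    Screening is paid for every market; forecasting is avoided only for
    the fraction of markets the screens reject.
    """
    screen_spend = n_markets * SCREEN_COST
    research_avoided = n_markets * rejected_fraction * FORECAST_COST
    return research_avoided - screen_spend


# Screening pays for itself once the rejected share exceeds this fraction.
break_even_fraction = SCREEN_COST / FORECAST_COST
```

Under these assumptions, screening breaks even when it rejects under 2% of markets. For a hypothetical batch of 200 markets with a quarter rejected, screening_savings(200, 0.25) comes to roughly $28.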

3. Forecast each market

For each market that passes screening, we run everyrow's forecast() utility. Under the hood, forecast() dispatches research agents that perform live web searches, steelman both sides of the question, and synthesize their findings into a calibrated probability estimate with a detailed written rationale.

Every forecast is fully transparent: it comes with a multi-paragraph rationale explaining the reasoning, the key uncertainties, and what could change the probability. You can read the research and the rationale for every single position. Nothing is a black box.

4. Compare to market prices

For each market, the system compares the AI forecast to the current Kalshi price, computes the edge (forecast minus market), and sorts by the largest disagreement: the markets where the AI thinks the crowd is most wrong. A summary table shows all questions with the forecast and the edge at a glance.
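The comparison step amounts to a subtraction and a sort. Here is a minimal sketch using four rows from the February 26, 2026 run; the Row class and rank_by_edge function are illustrative names, not part of everyrow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    question: str
    kalshi_pct: int    # market-implied probability, in percent
    forecast_pct: int  # AI forecast, in percent

    @property
    def edge(self) -> int:
        """Forecast minus market, in percentage points."""
        return self.forecast_pct - self.kalshi_pct


def rank_by_edge(rows):
    """Sort rows by the size of the disagreement, largest first."""
    return sorted(rows, key=lambda r: abs(r.edge), reverse=True)


rows = [
    Row("Will Reza Pahlavi lead Iran in 2026?", 18, 12),
    Row("Which companies will have a top-ranked AI model this year? [xAI]", 50, 78),
    Row("How many launches will SpaceX have in February 2026? [Above 12]", 6, 45),
    Row("Will the US take control of any part of Greenland? [Before January 2027]", 41, 10),
]

for r in rank_by_edge(rows):
    print(f"{r.edge:+4d}  {r.question}")
```

Sorting on the absolute value of the edge puts large negative disagreements (the AI thinks the market is too high) alongside large positive ones, which matches how the results table is ordered.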

Recent Run: February 26, 2026

Here are the 30 markets where the AI's median forecast disagrees most with the Kalshi market price, sorted by the size of the disagreement:

Question	Kalshi %	Forecast %	Edge
How many launches will SpaceX have in February 2026? [Above 12]	6	45	+39
Will the US take control of any part of Greenland? [Before January 2027]	41	10	-31
Which companies will have a top-ranked AI model this year? [xAI]	50	78	+28
Which companies will have a top-ranked AI model this year? [OpenAI]	59	85	+26
Will Trump buy at least part of Greenland? [Before January 20, 2029]	29	8	-21
Who will run for the Democratic presidential nomination in 2028? [Gretchen Whitmer]	60	78	+18
Texas Democratic Senate nominee? [Jasmine Crockett]	30	47	+17
2026 Texas Senate matchup? [Talarico vs. Paxton]	62	45	-17
Will Trump take back the Panama Canal?	31	14	-17
Will the U.S. confirm that aliens exist before 2027?	23	8	-15
World leaders out in 2026? [Ali Khamenei]	56	42	-14
Who will leave the Trump administration in 2026? [Kristi Noem]	50	62	+12
Will marijuana be rescheduled? [Before 2027]	56	45	-11
2028 Republican nominee for President? [J.D. Vance]	45	55	+10
Florida Republican Governor nominee? [James Fishback]	16	6	-10
CPI year-over-year in May 2026? [Exactly 2.8%]	26	16	-10
How much will the US acquire Greenland for? [$0 / No Acquisition]	78	87	+9
Will the US acquire any new territory? [Before Jan 2027]	23	14	-9
When will DHS receive full-year funding? [Before Mar 20, 2026]	32	23	-9
Will Trump buy at least part of Greenland? [Before 2027]	14	5	-9
Who will be the next Prime Minister of the UK? [Rupert Lowe]	13	4	-9
Will the US take control of any part of Greenland? [Before 2027]	15	7	-8
Will Americans receive tariff stimulus checks? [Before 2027]	21	13	-8
Will Trump invoke the Insurrection Act? [Before Jan 20, 2029]	56	48	-8
California Governor winner? (Person) [Eric Swalwell]	50	42	-8
Will a court order a tariff refund? [Before 2027]	78	85	+7
2026 Texas Senate matchup? [Crockett vs. Paxton]	25	32	+7
Will Trump be impeached? [Before Jan 1, 2027]	14	7	-7
Ali Khamenei out as Supreme Leader? [Before July 1, 2026]	39	32	-7
Will Reza Pahlavi lead Iran in 2026?	18	12	-6

We think the forecaster is pretty good. The research it produces is thorough, and the rationales are well-reasoned. But we're sure it makes mistakes. We don't expect anyone to take these forecasts at face value, and we don't either. Instead, when we see something unexpected, like a big disagreement with the market, we read the full research and rationales to understand why the AI thinks what it does. More often than not, we walk away having learned something interesting about the question, even when we ultimately disagree with the forecast. The value isn't just the number; it's the structured, neutral research behind it.

The full research, rationales, and forecasts for all 153 questions from this run are available in this spreadsheet.

Example: Which companies will have a top-ranked AI model this year? [xAI]

To give a sense of the depth of research, here's what the system produced for one question. The AI forecasts 78% versus the Kalshi market price of 50%.

Forecast rationale (78%):

This question asks whether xAI will have a #1 ranked AI model at any point before January 1, 2027. The resolution is cumulative: it only needs to happen once.

Most important factor: Likely already achieved. The research consistently indicates that xAI's Grok 4.1 (Thinking) held the sole #1 position on the LMSYS Chatbot Arena in January 2026. Multiple sources corroborate this. If confirmed under Kalshi's resolution criteria, this would already satisfy the condition for YES resolution regardless of future performance.

Even setting aside January 2026: xAI is currently at #2 with an Elo of 1502, just 3 points behind Anthropic's Claude Opus 4.6 at 1505. With 309 days remaining, xAI has multiple "at-bats": the full Grok 4.20 release (currently in beta at #4), the anticipated Grok 5 (68% probability of release by June 30 per Polymarket), and likely additional iterations. Historical base rates show the #1 spot changes 8-12 times per year, and xAI has reached #1 with ~50-66% probability per major version.

Why I'm above the market price of 50%: The market appears to be anchored on current rankings (recency bias) and merger disruption concerns. The NO thesis about "integration tax" from the SpaceX merger is plausible but likely overweighted given xAI released Grok 4.20 Beta in mid-February, demonstrating continued shipping velocity post-merger.

Key Uncertainties:

Did January 2026 count as #1 under Kalshi's rules? If yes (~60% likely given multiple sources say "sole #1"), the question is effectively already resolved YES.

Will Grok 5 deliver on the parameter scaling thesis? If the 6T parameter model launches and underperforms, xAI's main catalyst disappears. But historical patterns of reasoning models reaching #1 strongly favor YES.

Will LMSYS methodology change significantly? A shift to agentic or safety-weighted benchmarks could structurally disadvantage xAI's "uncensored" approach.

Competitive response timing: If OpenAI's GPT-6 or Anthropic's Claude 5 launches with a step-function improvement, the threshold for #1 could move beyond xAI's reach.

Weighting all factors: the combination of likely already having achieved #1, extreme proximity to #1 currently, massive compute infrastructure, aggressive release cadence, and 309 days of remaining opportunity makes YES substantially more likely than the 50% market price suggests.

Try It Yourself

Run forecasts at everyrow.io/app. Sign in with Google, upload or ask for forecasting questions, and we'll dispatch a team of researchers and forecasters to get you calibrated probabilities on all of them. Or add everyrow to Claude Code or Claude.ai and ask it to forecast.

What's Next

We plan to track whether these forecasts would lead to a profitable portfolio using play money. The goal is not to encourage trading but to have a hard benchmark for accuracy. If the AI's probability estimates are better than the market's, a simulated portfolio that buys YES when the forecast is above the market price and buys NO when it's below should make money over time. If it doesn't, we need to improve the forecaster.
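One way such a play-money benchmark could be scored is sketched below. It assumes one $1-payout contract per market, treats a forecast below the market price as a NO purchase, and ignores fees; the three positions and their outcomes are made up purely to show the arithmetic and are not results from any run.

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    price: float        # YES price when the forecast was made (0.0 to 1.0)
    forecast: float     # AI probability of YES (0.0 to 1.0)
    resolved_yes: bool  # how the market eventually resolved


def pnl(p: Position) -> float:
    """Profit in dollars on one contract, taking the side the forecast favors."""
    outcome = 1.0 if p.resolved_yes else 0.0
    if p.forecast > p.price:
        # Buy YES: pay the price, collect $1 if the market resolves YES.
        return outcome - p.price
    if p.forecast < p.price:
        # Buy NO: pay (1 - price), collect $1 if the market resolves NO.
        return (1.0 - outcome) - (1.0 - p.price)
    return 0.0  # no disagreement, no trade


def portfolio_pnl(positions) -> float:
    """Total play-money profit across all positions."""
    return sum(pnl(p) for p in positions)


# Hypothetical positions, for illustration only.
example = [
    Position("market A", price=0.40, forecast=0.60, resolved_yes=True),
    Position("market B", price=0.30, forecast=0.10, resolved_yes=False),
    Position("market C", price=0.50, forecast=0.70, resolved_yes=False),
]
```

In this made-up example the first two trades win 60 and 30 cents and the third loses 50 cents, for a total of about 40 cents. A real benchmark would need many resolved markets before the total says much about accuracy.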

FutureSearch lets you run your own team of AI researchers and forecasters on any dataset. Try it for yourself.