AI Is Getting Scary Good at Making Predictions

To live in time is to wonder what will happen next. In every human society, there are people who obsess over the world’s patterns to predict the future. In antiquity, they told kings which stars would appear at nightfall. Today they build the quantitative models that nudge governments into opening spigots of capital. They pick winners on Wall Street. They estimate the likelihood of earthquakes for insurance companies. They tell commodities traders at hedge funds about the next month’s weather.

For years, some elite forecasters have been competing against one another in tournaments where they answer questions about events that will happen—or not—in the coming months or years. The questions span diverse subject matter because they’re meant to measure general forecasting ability, not narrow expertise. Players may be asked whether a coup will occur in an unstable country, or to project the future deforestation rate in some part of the Amazon. They may be asked how many songs from a forthcoming Taylor Swift album will top the streaming charts. The forecaster who makes the most accurate predictions, as early as possible, can earn a cash prize and, perhaps more important, the esteem of the world’s most talented seers.

These tournaments have become much more popular during the recent boom of prediction markets such as Polymarket and Kalshi, where hundreds of thousands of people around the world now trade billions of dollars a month on similar sorts of forecasting questions. And now AIs are playing in them, too. At first, the bots didn’t fare too well: At the end of 2024, no AI had even managed to place 100th in one of the major competitions. But they have since vaulted up the leaderboards. AIs have already proved that they can make superhuman predictions within the bounded context of a board game, but they may soon be better than us at divining the future of our entire messy, contingent world.


Three times a year, the forecasting platform Metaculus hosts a tournament that is known to have especially difficult questions. It generally attracts the more serious forecasters, Ben Shindel, a materials scientist who ranked third among participants in a recent competition, told me. Last year, at its Summer Cup, a London-based start-up called Mantic entered an AI prediction engine. Like other participants, the Mantic AI had to answer 60 questions by assigning probabilities to certain outcomes. The AI had to guess how the battle lines in Ukraine would shift. It had to pick the winner of the Tour de France and estimate Superman’s global box-office gross during its opening weekend. It had to say whether China would ban the export of a rare earth element, and predict whether a major hurricane would strike the Atlantic coast before September. It had to figure out whether Elon Musk and Donald Trump would disparage each other, in public, within a certain range of dates.

A few months later, the guesses from Mantic’s prediction engine and the other tournament participants were scored against the real-life outcomes and one another. The AI placed eighth out of more than 500 entrants, a new record for a bot. “It was an unexpected breakthrough,” according to Toby Shevlane, Mantic’s CEO. Shevlane told me that he left a cushy gig as a research scientist at Google DeepMind to co-found the company. He wanted to celebrate the AI’s triumph, but he worried that it had been the product of some lucky guesses. He and his team entered a new version of it into the Metaculus Fall Cup. That bot did even better. Not only did it finish fourth, another record; it also beat a weighted average of all human-forecaster predictions. It proved itself wiser than the wisdom of a pretty wise crowd.
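Scoring in tournaments like these rewards well-calibrated probabilities, not just correct calls. Metaculus uses its own scoring formulas, but the idea can be illustrated with the Brier score, a standard rule from the forecasting literature: the average squared gap between each stated probability and what actually happened. A minimal sketch:

```python
# Brier score: mean squared error between forecast probabilities and outcomes.
# Lower is better; a confident, correct forecast on every question scores 0.0.
def brier_score(forecasts, outcomes):
    """forecasts: probabilities in [0, 1]; outcomes: 1 if the event happened, else 0."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A bold forecaster who is mostly right beats a hedging one on the same outcomes:
bold = brier_score([0.9, 0.8, 0.1], [1, 1, 0])    # 0.02
hedged = brier_score([0.6, 0.6, 0.4], [1, 1, 0])  # 0.16
```

Rules like this are why "as early as possible" matters: a forecaster who commits to sharp probabilities before the outcome is obvious earns a better score than one who waits and hedges.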

Mantic’s AI engine is designed to make accurate forecasts in just about any domain. Shevlane wouldn’t show me the engine’s interface, and he was cagey about its precise construction. He described it only as a “scaffolding” that comprises several large language models with different inclinations. These individual LLMs are themselves getting much better at general forecasts, especially those made by OpenAI, Anthropic, and Google. That’s partly because good forecasting requires reading and processing enormous amounts of information. To guess the winner of the Tour de France, for example, a human forecaster might spend hours building a basic regression model based on previous years’ results, while also scouring injury and conditioning reports and reading commentary from fans and experts. AIs have a natural advantage here. They can read much faster than humans, and their cognitive skills don’t break down after a string of all-nighters.

Last year, a team advised by Haifeng Xu, a professor at the University of Chicago, built a benchmarking service that evaluates AI’s predictions on a continuing basis. Almost every day, it asks the major models new questions pulled from the betting markets on Kalshi. (It recently asked them who Apple’s next CEO would be and also who would star in the upcoming season of The White Lotus.) Their accuracy scores continually update as the questions resolve. “They all have different forecasting personalities,” Xu told me. The version of ChatGPT that the service evaluates is conservative, perhaps too conservative; on Xu’s leaderboard, it currently trails versions of Grok and Gemini.

Mantic’s prediction engine combines a bunch of LLMs and assigns each one different tasks. One might serve as an expert on a database of election results. Another might be asked to scan weather data, economic outcomes, or box-office receipts, depending on the question that it’s attacking. The models work together as a team to generate a final prediction. Shevlane told me that Mantic is using its computing resources to experiment with more complex scaffoldings, which make use of even more models. I asked him whether they have sought AI’s input on the general structure of these scaffoldings. Not yet, he said, but like almost everyone else, they are using it to help write the code for their prediction engines.
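Mantic's actual scaffolding is undisclosed, but the final step it describes, several models contributing to one prediction, resembles a standard forecast-aggregation problem. One common approach (an assumption here, not Mantic's confirmed method) is a weighted average in log-odds space, which tends to yield sharper pooled probabilities than a plain mean:

```python
import math

# Hypothetical sketch of pooling probability forecasts from several models.
# Averaging in log-odds (logit) space, then converting back to a probability,
# is a common aggregation rule; the weights could reflect each model's track record.
def logit(p):
    return math.log(p / (1 - p))

def pool_forecasts(probs, weights=None):
    """probs: each model's probability for the same yes/no question."""
    if weights is None:
        weights = [1.0] * len(probs)
    z = sum(w * logit(p) for p, w in zip(probs, weights)) / sum(weights)
    return 1 / (1 + math.exp(-z))  # inverse logit back to a probability

# Three models disagree; the pooled forecast lands between the extremes.
pooled = pool_forecasts([0.7, 0.8, 0.55])
```

However the real engine combines its components, the output is the same kind of object: a single probability per question, which is what the tournament scores.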

A company called Lightning Rod Labs has been experimenting with predictive models that are purpose-built for specific domains. They have even designed one to predict Trump’s erratic behavior. Ben Turtel, the company’s CEO, told me that his team fed the model a set of more than 2,000 forecasting questions with known outcomes that were not included in its training data. The model checked its answers against the things that Trump had actually done, and learned from its mistakes. When the company had the model forecast Trump’s behavior on a new set of questions—whether he would meet with Xi Jinping in person, for example, or attend the Army-Navy football game—it outperformed OpenAI’s most advanced models.

[Read: Do you feel the AGI yet?]

This year could be big for AI prediction. In January, Mantic entered its most recent souped-up engine into the Metaculus Spring Cup for 2026. It has already been asked how many Oscars Sinners will win and whether the United States will soon attack Iran. By May, these questions will resolve, and we will see how the engine fared. If it moves up one spot from its most recent finish, it will become the first AI to medal in a major prediction tournament.

If the AI takes gold, that might signal a new era. Human beings—predictors of eclipses, theorists of cosmic heat death—may no longer be the best guides to the future. From this point on, for as long as we exist, we might be asking AIs what comes next. We won’t always understand how they arrived at their predictions. This crystal ball may be like a black hole with an event horizon, past which the light of its insight cannot escape. We may just have to take it at its word.

So far, elite human forecasters have been pretty good sports about this possibility. When I spoke with Shindel, the highly ranked forecaster, he had nothing but admiring things to say about the AIs. “Their reasoning capabilities are very good,” he told me. “They don’t have the same biases that people have. They can find out about news right as it happens, and they don’t become attached to their predictions.” On Metaculus, a group of forecasters has taken to estimating when AIs will have the chops to out-predict an elite team of humans. Last January, they said there was about a 75 percent chance this would happen by 2030. Now they think it’s more like 95 percent.
