Current methods for evaluating artificial intelligence models were not designed for underwriting and are not fit for that purpose, Gallagher Re warns.
The global reinsurance broker published a new report arguing that the gap between how models are assessed and what insurers need to know is holding back the AI risk market.
Most AI models are assessed using benchmarks, standardised tests that score capability against fixed tasks. Ed Pocock, global head of cyber security at Gallagher Re, said insurers need to understand failure, not just performance.
“They indicate what a model can do under controlled conditions, but insurers are concerned with how models fail, how often they fail, and whether those failures could be correlated across a portfolio,” he said.
Benchmark contamination compounds the problem. Models are increasingly shaped by the tests used to evaluate them. This can inflate published scores and reduce their usefulness as a guide to real-world reliability.
Where only a narrow set of behaviours is modelled, attempts to boost benchmark scores can increase model homogeneity. Pocock said this “risks erasing useful differentiation between systems and increasing concentration risk.”
This concentration risk is not theoretical. Industry assessments put the losses from a November 2025 outage at a major technology provider at between $5 billion and $15 billion.
Munich Re cited the event as evidence of the systemic exposure that builds when insurers and businesses rely on a limited number of shared technology providers. It is precisely the kind of correlated failure that current benchmark-based evaluation methods are not designed to capture.
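A rough illustration of why correlation matters to an insurer, and not a method taken from the Gallagher Re report: the Python sketch below compares a portfolio whose policyholders fail independently with one whose failures are partly correlated, for instance through a shared provider. The portfolio size, failure probability, loss size and correlation are hypothetical figures chosen for illustration. The expected loss is the same in both cases, but the spread of outcomes is far wider when failures move together.

```python
import math


def portfolio_loss_stats(n_policies, failure_prob, loss_per_failure, correlation):
    """Mean and standard deviation of total loss when every pair of
    policies shares the same failure correlation (an assumed, simplified model)."""
    mean = n_policies * failure_prob * loss_per_failure
    # Variance of a single policy's loss (Bernoulli failure times a fixed loss).
    single_var = failure_prob * (1 - failure_prob) * loss_per_failure ** 2
    # Variance of a sum of equally correlated, identically distributed losses.
    variance = n_policies * single_var * (1 + (n_policies - 1) * correlation)
    return mean, math.sqrt(variance)


# Hypothetical portfolio: 1,000 policies, 2% failure chance, unit loss each.
independent = portfolio_loss_stats(1000, 0.02, 1.0, correlation=0.0)
shared_provider = portfolio_loss_stats(1000, 0.02, 1.0, correlation=0.3)

print("independent     mean, sd:", independent)
print("shared provider mean, sd:", shared_provider)
```

A benchmark score speaks only to the single-model failure rate in this sketch. It says nothing about the correlation term, which is what drives the tail.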
Gallagher Re’s report also examines restricted-distribution AI models. It uses Anthropic’s Mythos model, released under its Project Glasswing programme to a vetted group of partners, as its primary example.
Gallagher Re argues that if the most capable models are kept from independent evaluators, insurers lose the ability to price risk accurately.
“If a model cannot be independently evaluated, it cannot be meaningfully priced,” Pocock said. He added that without independent access, insurers could end up loading for uncertainty rather than reflecting actual risk, raising costs for everyone.
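One way to picture that loading effect, offered as a sketch and not as the pricing approach of any insurer named here: suppose an underwriter prices to the upper end of a confidence interval on a model's failure rate. The Python below uses the Wilson score interval, a standard statistical formula. The trial counts and the idea of pricing to the upper bound are assumptions for illustration. Both scenarios observe the same 2% failure rate, but with only 50 test runs the upper bound sits at around a tenth, while with 5,000 runs it falls below 3%.

```python
import math


def wilson_upper_bound(failures, trials, z=1.96):
    """Upper end of the Wilson score interval for a failure rate.
    z=1.96 corresponds to roughly 95% confidence."""
    p_hat = failures / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials ** 2))
    return (centre + margin) / (1 + z2 / trials)


# Hypothetical evidence: the same 2% observed failure rate, very different sample sizes.
limited_access = wilson_upper_bound(failures=1, trials=50)
open_access = wilson_upper_bound(failures=100, trials=5000)

print(f"upper bound with 50 trials:   {limited_access:.3f}")
print(f"upper bound with 5000 trials: {open_access:.3f}")
```

The gap between the two bounds is the uncertainty loading in this toy setting: a cost driven by missing evidence, not by the model's actual behaviour.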
The stakes are significant. According to Munich Re and ERGO’s Tech Trend Radar 2026, 63% of executives, almost two-thirds, said they want to buy insurance against AI-related risks.
AI adoption among those surveyed has jumped to 57%. Two in three executives expect AI to deliver gains, while 23% fear the opposite. This level of demand makes accurate pricing not just a technical challenge but a market-shaping one.
The report calls for evaluation that tests AI systems under real-world inputs and adversarial conditions. It also calls for testing over time as models are updated.
Gallagher Re argues the reinsurance industry can influence which models are deployed and how transparently they are evaluated.
This argument sits within a broader push from Gallagher Re itself. The broker’s Q1 2026 InsurTech report identified AI liability insurance as the next major growth frontier.
Global deputy head of InsurTech Freddie Scarratt said third-party AI liability cover was on track to mirror the rise of the cyber reinsurance market.
“The silent risk of AI is becoming audible, and ignoring it is no longer an option,” he said.
For that market to develop, Pocock argued, evaluation standards must improve first: “Better evaluation gives the market the tools to reward transparency and robustness. Without it, we risk defaulting to scale and brand as proxies for safety, which could amplify the concentration risks we’ll need to manage.”