Banks Deploy AI Blind. Here's the $3.2M Fix.


In February, a major Spanish bank ran a customer support AI agent through Galtea's evaluation platform. The platform found 2,164 failures across seven critical vulnerabilities, roughly twelve times as many as the bank's own internal testing had caught.

That gap is the problem Galtea is built to solve. The company, which spun out of the Barcelona Supercomputing Center in October 2024, announced a $3.2 million seed round today, led by German fund 42CAP with participation from Mozilla Ventures, JME Ventures, Masia, and ABAC Nest Ventures. Total funding is $4.1 million.

The founding team comes from BSC's Language Technologies Unit, a fifty-person research group that has spent years training and evaluating language models at scale. Co-founder Jorge Palomar was an AI Data Engineer and then Data Strategy Lead at BSC. Co-founder Baybars Külebi, a physicist, ran engineering there. Their core asset: they built their evaluation technology on MareNostrum 5, one of Europe's most powerful supercomputers — a compute environment that hobby projects and SaaS benchmarks cannot replicate.

"We were running workloads on MareNostrum 5 that nobody else in Europe could run," Palomar said. "Synthetic scenario generation at the scale we were doing it required infrastructure most eval startups do not have access to. The spin-out was a way to productize what we'd already built."

Galtea's platform generates thousands of synthetic test scenarios and simulated user interactions from a description of how an AI agent is supposed to behave. It evaluates agents across hallucination rates, bias, security vulnerabilities, and toxicity, and outputs structured metrics that compliance teams can use in deployment decisions. The platform targets regulated industries — financial services, telco — where the cost of an AI failure is high and the compliance burden is about to get significantly heavier.
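To make "structured metrics" concrete, here is a minimal sketch of how per-scenario pass/fail results might be rolled up into a failure rate per category and checked against release thresholds. Galtea has not published its data model or API. `ScenarioResult`, `summarize`, `blocking_categories`, the category names, and the threshold values are all hypothetical, chosen only to mirror the four dimensions named above.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical category names mirroring the dimensions the platform evaluates.
CATEGORIES = ("hallucination", "bias", "security", "toxicity")


@dataclass(frozen=True)
class ScenarioResult:
    """Outcome of one synthetic test scenario (hypothetical schema)."""
    scenario_id: str
    failures: tuple[str, ...] = ()  # categories the agent failed; empty = passed


def summarize(results: list[ScenarioResult]) -> dict[str, float]:
    """Return the share of scenarios that failed in each category."""
    if not results:
        raise ValueError("no scenario results to summarize")
    counts: Counter[str] = Counter()
    for r in results:
        unknown = set(r.failures) - set(CATEGORIES)
        if unknown:
            raise ValueError(f"{r.scenario_id}: unknown categories {sorted(unknown)}")
        counts.update(set(r.failures))  # count each category once per scenario
    return {cat: counts[cat] / len(results) for cat in CATEGORIES}


def blocking_categories(rates: dict[str, float], max_rates: dict[str, float]) -> list[str]:
    """List categories whose failure rate exceeds its threshold.

    Categories missing from max_rates default to 0.0, i.e. zero tolerance.
    """
    return [cat for cat in CATEGORIES if rates[cat] > max_rates.get(cat, 0.0)]


if __name__ == "__main__":
    results = [
        ScenarioResult("s1"),
        ScenarioResult("s2", ("hallucination",)),
        ScenarioResult("s3", ("hallucination", "toxicity")),
        ScenarioResult("s4"),
    ]
    rates = summarize(results)
    print(rates)
    print(blocking_categories(rates, {"hallucination": 0.6, "toxicity": 0.1}))
```

In this toy run the agent fails half the scenarios on hallucination and a quarter on toxicity. With a hypothetical 10 percent toxicity ceiling, only toxicity is flagged as blocking. Defaulting unlisted categories to zero tolerance is a deliberately conservative assumption, not a documented Galtea behavior.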

The regulatory forcing function

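The EU AI Act entered into force in August 2024, and most of its obligations apply from August 2, 2026. On that date the Annex III high-risk rules take effect for financial AI such as creditworthiness assessment and credit scoring of individuals and risk assessment and pricing in life and health insurance, triggering mandatory conformity assessments and documentation requirements. Fines reach up to 35 million euros or 7 percent of global annual turnover for the most serious violations; breaches of the high-risk obligations can draw up to 15 million euros or 3 percent. Financial institutions across the EU have less than five months to demonstrate that their high-risk AI systems meet the new standard.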

That compliance deadline is the market event Galtea and every other AI evaluation vendor is selling against. ABANCA, one of Spain's larger banks with roughly 75 billion euros on its balance sheet, is already using the platform in production.

"With Galtea, we uncovered vulnerabilities we would likely have missed otherwise, saved significant engineering time, and improved the reliability of our AI systems," said Jorge Romaris, AI Lead at ABANCA. "It changed how we approach AI evaluation and governance."

Galtea's customers also include Telefónica. The company has doubled its workforce to twelve people over the past year.

The evaluation gap

One of the most widely cited stats in enterprise AI is that 95 percent of generative AI pilots fail to deliver measurable business impact, a figure from MIT NANDA research published in 2025 and reported by Fortune. The report, based on 150 interviews with leaders, a survey of 350 employees, and an analysis of 300 public AI deployments, found that only about 5 percent of AI pilot programs achieve rapid revenue acceleration; the vast majority stall before delivering measurable impact.

What Galtea can demonstrate with third-party evidence is limited. The ABANCA reference (named person, named company, specific operational claim) is the strongest data point in the story: a practitioner describing what evaluation tooling changed in their workflow. The Tier 1 financial institution case study, which reports 2,164 failures across seven critical vulnerabilities, twelve times what the client's own internal testing detected, describes Galtea's own work at the client rather than an independent assessment. In that engagement, Galtea says it auto-generated over 6,000 test scenarios, and it estimates the automation saved roughly 600 hours of manual test authoring.
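The case-study figures also imply two numbers the company did not state directly. A quick back-of-the-envelope check follows; it treats "over 6,000" scenarios and "roughly 600" hours as exact values, which is an assumption, so the derived numbers are approximate.

```python
# Figures as reported for the case study. "Over 6,000" scenarios and "roughly
# 600" hours are treated as exact here, so the derived numbers are approximate.
galtea_failures = 2_164
multiple_of_internal = 12  # Galtea's failure count vs. the client's internal testing
scenarios_generated = 6_000
hours_saved = 600

# Implied number of failures the client's own testing found.
implied_internal_failures = galtea_failures / multiple_of_internal

# Implied manual authoring time per scenario, in minutes.
minutes_per_scenario = hours_saved * 60 / scenarios_generated

print(f"Implied internal-testing failures: ~{implied_internal_failures:.0f}")
print(f"Implied authoring time per scenario: {minutes_per_scenario:.1f} min")
```

On those assumptions, the client's internal testing caught roughly 180 failures, and the time-saved estimate works out to about six minutes of manual authoring per generated scenario.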

The open question is whether Galtea's approach, compute-intensive synthetic scenario generation built on supercomputing infrastructure, scales to the variety of real enterprise workflows, or whether it remains most useful in constrained, high-stakes domains like financial services and telco. The August 2026 deadline makes getting that wrong expensive enough that someone will pay to find out. Galtea has twelve employees, a named reference in ABANCA, and a compliance deadline working in its favor.

Banks Deploy AI Blind. Here's the $3.2M Fix. — type0 | type0

