Banks Deploy AI Blind. Here's the $3.2M Fix.
2,164 failures. One AI agent. Zero caught by the bank's own testing.

Galtea, an AI evaluation platform spun out of the Barcelona Supercomputing Center, raised $3.2M in seed funding to address a critical gap: a major Spanish bank's internal testing missed 2,164 failures across seven vulnerabilities that Galtea's platform detected in the bank's customer support AI agent. The company's differentiator is that it built its evaluation technology on MareNostrum 5, one of Europe's most powerful supercomputers, which enables synthetic scenario generation at a scale that typical SaaS evaluation tools cannot match. The EU AI Act's August 2026 deadline for mandatory conformity assessments on high-risk AI systems in financial services is creating a time-constrained compliance market that Galtea and its competitors are racing to capture.
- A major Spanish bank ran its customer support AI through Galtea's platform and discovered 2,164 failures across seven critical vulnerabilities that its own internal testing missed entirely
- Galtea's evaluation engine runs on MareNostrum 5, one of Europe's most powerful supercomputers, providing synthetic scenario generation capabilities that typical eval startups cannot replicate
- The EU AI Act's August 2, 2026 deadline triggers mandatory conformity assessments and documentation requirements for high-risk AI in financial services, with fines up to €35M
In February, a major Spanish bank ran a customer support AI agent through Galtea's evaluation platform. The platform found 2,164 failures across seven critical vulnerabilities — none of which the bank's own internal testing had caught.
That gap is the problem Galtea is built to solve. The company, which spun out of the Barcelona Supercomputing Center in October 2024, announced a $3.2 million seed round today, led by German fund 42CAP with participation from Mozilla Ventures, JME Ventures, Masia, and ABAC Nest Ventures. Total funding is $4.1 million.
The founding team comes from BSC's Language Technologies Unit, a fifty-person research group that has spent years training and evaluating language models at scale. Co-founder Jorge Palomar was an AI Data Engineer and then Data Strategy Lead at BSC. Co-founder Baybars Külebi, a physicist, ran engineering there. Their core asset: they built their evaluation technology on MareNostrum 5, one of Europe's most powerful supercomputers — a compute environment that hobby projects and SaaS benchmarks cannot replicate.
"We were running workloads on MareNostrum 5 that nobody else in Europe could run," Palomar said. "Synthetic scenario generation at the scale we were doing it required infrastructure most eval startups do not have access to. The spin-out was a way to productize what we'd already built."
Galtea's platform generates thousands of synthetic test scenarios and simulated user interactions from a description of how an AI agent is supposed to behave. It evaluates agents across hallucination rates, bias, security vulnerabilities, and toxicity, and outputs structured metrics that compliance teams can use in deployment decisions. The platform targets regulated industries — financial services, telco — where the cost of an AI failure is high and the compliance burden is about to get significantly heavier.
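To make "structured metrics" concrete, here is a minimal sketch of how per-scenario verdicts might be rolled up into per-category failure rates. It is an illustration only: the `Result` record, the `summarize` function, and the toy verdicts are all hypothetical and are not drawn from Galtea's product or API. Only the four category names come from the description above.

```python
from collections import Counter
from dataclasses import dataclass

# Categories named in the article; everything else here is hypothetical.
CATEGORIES = ("hallucination", "bias", "security", "toxicity")


@dataclass(frozen=True)
class Result:
    """One evaluated scenario: which category it probed and whether the agent passed."""
    scenario_id: int
    category: str
    passed: bool


def summarize(results):
    """Aggregate per-scenario verdicts into per-category failure metrics."""
    totals = Counter(r.category for r in results)
    failures = Counter(r.category for r in results if not r.passed)
    return {
        cat: {
            "scenarios": totals[cat],
            "failures": failures[cat],
            # Guard against categories with no scenarios.
            "failure_rate": failures[cat] / totals[cat] if totals[cat] else 0.0,
        }
        for cat in CATEGORIES
    }


# Toy verdicts standing in for the output of real evaluators.
results = [
    Result(1, "hallucination", True),
    Result(2, "hallucination", False),
    Result(3, "security", False),
    Result(4, "security", False),
    Result(5, "toxicity", True),
]
report = summarize(results)
```

A compliance team would read `report` as a table: one row per category, with the scenario count, failure count, and failure rate that feed a go or no-go deployment decision.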
The regulatory forcing function
The EU AI Act's obligations for high-risk systems apply from August 2, 2026. On that date, the Annex III high-risk classifications (covering credit scoring, fraud detection, algorithmic underwriting, and a range of financial AI applications) begin to apply, triggering mandatory conformity assessments and documentation requirements. Fines reach up to 35 million euros for violations. Financial institutions across the EU have less than five months to demonstrate that their high-risk AI systems meet the new standard.
That compliance deadline is the market event Galtea and every other AI evaluation vendor are selling against. ABANCA, one of Spain's larger banks with roughly 75 billion euros on its balance sheet, is already using the platform in production.
"With Galtea, we uncovered vulnerabilities we would likely have missed otherwise, saved significant engineering time, and improved the reliability of our AI systems," said Jorge Romaris, AI Lead at ABANCA. "It changed how we approach AI evaluation and governance."
Galtea's customers also include Telefonica. The company has doubled its workforce to twelve people over the past year.
The evaluation gap
The most commonly cited stat in AI evaluation is that 95 percent of enterprise AI projects fail to reach production — a figure tracing to MIT NANDA research published in 2025, reported by Fortune. The report, based on 150 interviews with leaders, a survey of 350 employees, and an analysis of 300 public AI deployments, found that only about 5 percent of AI pilot programs achieve rapid revenue acceleration, with the vast majority stalling before delivering measurable impact.
What Galtea can demonstrate with third-party evidence is limited. The ABANCA reference (named person, named company, specific operational claim) is the strongest data point in the story: a practitioner describing what evaluation tooling changed in their workflow. The tier-1 financial institution case study, which reports 2,164 failures across seven critical vulnerabilities and twelve times what the client's own internal testing detected, describes Galtea's own work at the client, not an independent assessment. The company auto-generated over 6,000 test scenarios and estimates it saved roughly 600 hours of manual test authoring.
The open question is whether Galtea's approach, compute-intensive synthetic scenario generation built on supercomputing infrastructure, scales to the variety of real enterprise workflows, or whether it remains most useful in constrained, high-stakes domains like financial services and telco. The August 2026 deadline makes getting that wrong expensive enough that someone will pay to find out. Galtea has twelve employees, a named reference in ABANCA, and a compliance deadline working in its favor.
Editorial Timeline
- Sonny, Mar 30, 10:52 AM
Story entered the newsroom
- Mycroft, Mar 30, 10:52 AM
Research completed — 0 sources registered. Galtea is a legitimate BSC spinout with real enterprise customers (ABANCA, Telefonica) and a named Head of AI quote for attribution. The EU AI Act reg
- Mycroft, Mar 30, 11:09 AM
Draft (821 words)
- Mycroft, Mar 30, 11:11 AM
Reporter revised draft (716 words)
- Mycroft, Mar 30, 11:18 AM
Reporter revised draft (716 words)
- Giskard, Mar 30, 11:29 AM
- Mycroft, Mar 30, 11:33 AM
Reporter revised draft based on fact-check feedback
- Mycroft, Mar 30, 11:33 AM
Reporter revised draft based on fact-check feedback
- Mycroft, Mar 30, 11:33 AM
Reporter revised draft based on fact-check feedback
- Mycroft, Mar 30, 11:37 AM
Reporter revised draft based on fact-check feedback (716 words)
- Mycroft, Mar 30, 11:38 AM
Reporter revised draft based on editorial feedback
- Rachel, Mar 30, 11:38 AM
Approved for publication
- Mar 30, 11:39 AM
Headline selected: Banks Deploy AI Blind. Here's the $3.2M Fix.
Published (702 words)
Sources
- thenextweb.com
- eu-startups.com
- bsc.es
- artificialintelligenceact.eu
- en.wikipedia.org
- itbrief.co.uk
- fortune.com

