A New GPU-Parallel Optimizer Finds Every Peak on a Standard Benchmark Where CPU Methods Struggle

For most optimization problems, "find the best answer" means searching a function's landscape for a single global peak. Multimodal black-box problems, functions with many peaks and no formula for their gradients, are harder: the goal is to find every peak, because each may represent a materially different solution to the underlying problem. Standard CPU-based optimizers such as basin-hopping and CMA-ES, which on a CPU evaluate candidate solutions one after another and refine from there, collapse past about eight dimensions on the hardest multimodal test functions. A new preprint describing an optimizer called CHISAO argues that its GPU-native design does not.
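Scoring "find every peak" needs a concrete rule. The sketch below shows one plausible way to compute a mode-recovery rate: the fraction of known peaks matched by at least one returned point within a distance tolerance. The function name `mode_recovery`, the Euclidean metric, and the `tol` threshold are assumptions for illustration; the preprint's actual matching criterion may differ.

```python
import numpy as np

def mode_recovery(found, true_modes, tol=1e-2):
    """Fraction of known modes with at least one found point within `tol`.

    Hypothetical scoring rule: Euclidean distance, fixed tolerance, and
    duplicates near the same mode count only once.
    """
    found = np.atleast_2d(np.asarray(found, dtype=float))
    true_modes = np.atleast_2d(np.asarray(true_modes, dtype=float))
    if found.size == 0:
        return 0.0
    # Pairwise distances, shape (n_true, n_found).
    dists = np.linalg.norm(true_modes[:, None, :] - found[None, :, :], axis=-1)
    hit = dists.min(axis=1) <= tol      # was each true mode matched at least once?
    return float(hit.mean())

# Hypothetical known mode locations of a 1-D test function.
true_modes = [[-1.5], [-0.5], [0.5], [1.5]]
single_start = [[0.5001]]                                   # a local method that found one basin
population = [[-1.5], [-0.4999], [0.5], [1.5002], [1.4999]]
print(mode_recovery(single_start, true_modes))              # 0.25
print(mode_recovery(population, true_modes))                # 1.0
```

Because each true mode is counted at most once, a population that piles several samples onto one peak gains nothing; only covering more peaks raises the score.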

CHISAO (Convergence-Halt-Invert-Stick-And-Oscillate) is a research optimizer built from the ground up for graphics processors. Rather than evaluating candidate solutions one at a time, it launches an entire batch of samples simultaneously and lets them search the function in parallel. The authors report 100% mode recovery across all 42 functions in the Simon Fraser University optimization benchmark suite, tested at dimensionalities from 2 up to 64. On the Michalewicz function at 64 dimensions, a notoriously rugged multimodal benchmark, they report up to a 34x speedup over basin-hopping. They report up to 39x on the Rotated Hyper-Ellipsoid, a smooth bowl with a single minimum, and the authors themselves describe that figure as the "GPU dividend": the part of the speedup that comes from running a parallel population on a GPU rather than from the algorithmic contribution.
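The batch-first idea already shows up in how candidates are scored. The sketch below evaluates the textbook Michalewicz function (with the commonly used steepness m = 10) for thousands of 64-dimensional candidates in one vectorized NumPy call instead of a Python loop. NumPy on a CPU stands in for a GPU array library here; this illustrates the data layout only and is not CHISAO's code.

```python
import numpy as np

def michalewicz(X, m=10):
    """Michalewicz function evaluated for a whole batch at once.

    X has shape (n_candidates, d) with entries in [0, pi].
    f(x) = -sum_{i=1..d} sin(x_i) * sin(i * x_i**2 / pi) ** (2 * m); lower is better.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    i = np.arange(1, X.shape[1] + 1)                  # per-coordinate index 1..d
    return -np.sum(np.sin(X) * np.sin(i * X**2 / np.pi) ** (2 * m), axis=1)

rng = np.random.default_rng(0)
batch = rng.uniform(0.0, np.pi, size=(4096, 64))      # 4096 candidates in 64 dimensions
values = michalewicz(batch)                           # one vectorized call, shape (4096,)
best = batch[np.argmin(values)]                       # best candidate in this batch
```

A quick sanity check is the two-dimensional case, whose known global minimum is about -1.8013 near (2.20, 1.57). Larger m makes the valleys steeper and narrower, which is part of what makes the function hard in high dimensions.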

Behind those numbers is a deliberate oscillation strategy: CHISAO alternates between converging and anti-converging. When a sample batch converges on a candidate peak, that peak is "stuck": frozen in place and removed from the active search. The remaining samples are then pushed back into exploration through momentum-based anti-convergence and stochastically smoothed gradients. The search tightens on suspected peaks and then deliberately loosens, so confirmed solutions are not rediscovered and the rest of the landscape keeps getting mapped. Two adaptive reseeding strategies, which the authors name Repulse Monkey and Golden Rooster, inject fresh samples to keep the active population from collapsing onto a single basin.
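The converge, stick, and anti-converge cycle is easier to reason about with a toy in hand. The sketch below is a hypothetical illustration under stated assumptions, not CHISAO's algorithm. The gradient estimator is a standard antithetic Gaussian-smoothing (zeroth-order) estimate that uses only function values; "sticking" means recording a stalled sample as a mode and redrawing it uniformly; and "anti-convergence" is an outward momentum kick for any sample that drifts within `repel_radius` of a frozen mode. Every name and parameter (`converge_stick_repel`, `repel_radius`, `kick`, `g_tol`, `min_age`, and so on) is invented for this example, and the uniform redraw is a plain stand-in for the authors' Repulse Monkey and Golden Rooster reseeding, whose details are not described here. It minimizes, so "peaks" are minima.

```python
import numpy as np

def smoothed_grad(f, X, sigma, n_dirs, rng):
    """Antithetic Gaussian-smoothing gradient estimate for every row of X.

    Estimates the gradient of E[f(x + sigma*u)], u ~ N(0, I), as
    mean_k [(f(x + sigma*u_k) - f(x - sigma*u_k)) / (2*sigma)] * u_k.
    """
    n, d = X.shape
    U = rng.standard_normal((n, n_dirs, d))
    Xp = (X[:, None, :] + sigma * U).reshape(-1, d)
    Xm = (X[:, None, :] - sigma * U).reshape(-1, d)
    diffs = (f(Xp) - f(Xm)).reshape(n, n_dirs) / (2.0 * sigma)
    return np.mean(diffs[:, :, None] * U, axis=1)

def converge_stick_repel(f, lo, hi, d, n_samples=64, steps=300, lr=0.05,
                         sigma=0.05, n_dirs=8, g_tol=1e-3, v_tol=1e-3,
                         min_age=20, repel_radius=0.3, kick=0.2, beta=0.9,
                         seed=0):
    """Hypothetical toy loop: descend, freeze converged points, push others away."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_samples, d))
    V = np.zeros_like(X)                      # momentum, used only for outward kicks
    age = np.zeros(n_samples, dtype=int)      # steps since each sample was (re)seeded
    modes = []
    for _ in range(steps):
        G = smoothed_grad(f, X, sigma, n_dirs, rng)
        X = np.clip(X - lr * G + V, lo, hi)   # smoothed-gradient step plus momentum
        V *= beta                             # kicks decay geometrically
        age += 1
        stalled = ((np.linalg.norm(G, axis=1) < g_tol)
                   & (np.linalg.norm(V, axis=1) < v_tol) & (age >= min_age))
        interior = np.all((X > lo) & (X < hi), axis=1)   # ignore points pinned to the box
        for x in X[stalled & interior]:
            if all(np.linalg.norm(x - m) > repel_radius for m in modes):
                modes.append(x.copy())        # "stick": freeze a newly found mode
        # Stalled samples leave the active search and are redrawn uniformly.
        X[stalled] = rng.uniform(lo, hi, size=(int(stalled.sum()), d))
        V[stalled] = 0.0
        age[stalled] = 0
        # Anti-convergence: samples inside a frozen mode's radius get an outward kick.
        for m in modes:
            away = X - m
            dist = np.linalg.norm(away, axis=1)
            near = dist < repel_radius
            V[near] = kick * away[near] / np.maximum(dist[near], 1e-12)[:, None]
    return np.array(modes)

# cos(3x) on [0, 2*pi] has minima at pi/3, pi and 5*pi/3.
modes = converge_stick_repel(lambda X: np.cos(3.0 * X[:, 0]), 0.0, 2 * np.pi, d=1)
print(np.sort(modes[:, 0]))
```

The returned array should contain a point near each of the three minima. The stall test is deliberately crude: a sample that lingers on a maximum or saddle long enough could still be frozen, which a real implementation would need to rule out.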

The claim is narrower than it looks. The benchmark is synthetic: the Simon Fraser University suite (maintained by Sonja Surjanovic and Derek Bingham at SFU) is the de facto standard for testing continuous optimizers, but strong results on it do not by themselves show that the method transfers to real scientific or industrial workloads. The headline 39x figure on the Rotated Hyper-Ellipsoid is, by the authors' own account, the "GPU dividend", so it is best read as evidence that the workload fits the hardware rather than that one method beats another. And the paper is an arXiv preprint, not peer-reviewed work, so its results are a promising data point, not a settled benchmark.
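The difference between a hardware dividend and an algorithmic gain can be made concrete without a GPU. The sketch below evaluates the textbook Rotated Hyper-Ellipsoid, f(x) = sum over i of sum over j <= i of x_j², on the same points twice: once in a Python loop, one candidate at a time, and once as a single vectorized NumPy call. Because the two answers are identical, whatever speedup it prints comes purely from execution. This is a CPU analogy, not the paper's benchmark, and the ratio depends entirely on the machine.

```python
import time
import numpy as np

def rotated_hyper_ellipsoid(X):
    """sum_{i=1..d} sum_{j<=i} x_j**2 for each row of X; convex, minimum 0 at the origin."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # Cumulative sums of squares give the inner sum for each i.
    return np.sum(np.cumsum(X**2, axis=1), axis=1)

rng = np.random.default_rng(0)
batch = rng.uniform(-65.536, 65.536, size=(10_000, 64))   # usual search box for this function

t0 = time.perf_counter()
serial = np.array([rotated_hyper_ellipsoid(x)[0] for x in batch])  # one candidate at a time
t1 = time.perf_counter()
batched = rotated_hyper_ellipsoid(batch)                           # whole batch in one call
t2 = time.perf_counter()

assert np.allclose(serial, batched)   # same math, same answers: nothing algorithmic changed
print(f"execution-only speedup: {(t1 - t0) / (t2 - t1):.1f}x")
```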

CHISAO enters a broader neighborhood: GPU-native mode-finding. A related recent preprint, SunBURST, takes a mode-centric approach to Bayesian evidence estimation on GPUs, using deterministic Laplace integration around discovered modes. CHISAO takes a different route, population-based oscillation rather than deterministic Laplace approximation, but both push in the same direction: replacing serial CPU search with parallel GPU search for problems where finding every peak matters. The shared primitive, multimodal black-box optimization, also underpins Bayesian inference and parts of scientific simulation, which is where the real-world payoff would land if the synthetic results hold up.
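A Laplace-style evidence estimate is assembled mode by mode, so it is only as good as the mode search feeding it. The sketch below is the textbook Laplace approximation summed over discovered modes, Z ≈ Σ_k f(m_k) (2π)^(d/2) det(A_k)^(-1/2), where A_k is the negative Hessian of log f at mode m_k. It is not SunBURST's estimator, whose details are not described here; the helper names and the finite-difference step are assumptions. It also assumes the modes are well separated so their contributions do not overlap. A mode the search misses simply drops out of the sum.

```python
import numpy as np

def numerical_hessian(g, x, h=1e-4):
    """Central-difference Hessian of a scalar function g at point x."""
    d = x.size
    H = np.zeros((d, d))
    E = np.eye(d) * h
    for i in range(d):
        for j in range(d):
            H[i, j] = (g(x + E[i] + E[j]) - g(x + E[i] - E[j])
                       - g(x - E[i] + E[j]) + g(x - E[i] - E[j])) / (4 * h * h)
    return H

def laplace_log_evidence(log_f, modes):
    """log Z ~= logsumexp_k [log f(m_k) + (d/2) log(2 pi) - (1/2) log det A_k]."""
    terms = []
    for m in modes:
        m = np.asarray(m, dtype=float)
        A = -numerical_hessian(log_f, m)          # curvature of -log f at the mode
        sign, logdet = np.linalg.slogdet(A)
        if sign <= 0:
            raise ValueError("-Hessian of log f is not positive definite at a mode")
        terms.append(log_f(m) + 0.5 * m.size * np.log(2 * np.pi) - 0.5 * logdet)
    return np.logaddexp.reduce(np.array(terms))

# Two well-separated, unnormalized Gaussian bumps with weights 0.3 and 0.7.
A1, A2 = np.array([[2.0, 0.3], [0.3, 1.0]]), np.diag([1.0, 0.5])
mu1, mu2 = np.array([-5.0, 0.0]), np.array([5.0, 0.0])

def log_f(x):
    q1 = 0.5 * (x - mu1) @ A1 @ (x - mu1)
    q2 = 0.5 * (x - mu2) @ A2 @ (x - mu2)
    return np.logaddexp(np.log(0.3) - q1, np.log(0.7) - q2)

true_log_z = np.log(0.3 * 2 * np.pi / np.sqrt(np.linalg.det(A1))
                    + 0.7 * 2 * np.pi / np.sqrt(np.linalg.det(A2)))
print(true_log_z, laplace_log_evidence(log_f, [mu1, mu2]))
```

For Gaussian bumps this far apart the Laplace approximation is essentially exact, so the two printed values should agree to several decimal places; passing only `[mu1]` would silently drop the larger bump's mass.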

The open question is transfer. The Simon Fraser University suite covers a wide range of function shapes, but the paper does not report results on, say, a real Bayesian inference problem or a real materials-design search. The next thing to test is whether deliberately oscillating between convergence and anti-convergence, with frozen peaks, generalizes beyond the SFU functions to the scientific-computing and Bayesian workloads the abstract gestures toward. Watch the preprint for follow-up benchmarks.

Sources