For more than a decade, engineered immune cells called CAR-T therapies have rewritten treatment for some blood cancers. Patients with relapsed B-cell leukemia or lymphoma who had run out of options have watched their tumors disappear after a single infusion of their own reprogrammed T cells. That success, though, has stopped at the edge of the blood. Against solid tumors, which are the carcinomas and sarcomas that make up most cancer cases, CAR-T has not delivered.
The bottleneck, researchers agree, is the target. A useful CAR-T antigen has to be dense on the tumor and quiet on healthy tissue, and almost nothing in the solid-tumor surfaceome meets both bars at once. That is the gap a team led by the University of Pennsylvania's Perelman School of Medicine now says it has narrowed, using a workflow that folds large language models into the search (Baker et al., Cell, June 25, 2026).
The team's top candidate is a protein called GPNMB, short for glycoprotein non-metastatic melanoma protein B. GPNMB had appeared on cancer radars before in other drug-discovery contexts, but it has not been a serious candidate for engineered T-cell therapy. The Penn workflow, which its developers describe as one of the first applications of large language models to cell and gene therapy discovery, put GPNMB at the top of a much shorter list (GEN; News-Medical).
The pipeline starts with the surfaceome, the set of every protein sitting on the outside of a cell where an immune receptor can grab it, drawn from single-cell RNA sequencing of four publicly available skin cancer datasets and supplemented with public antigen databases. That produces more than 10,000 candidates. The team then layers three filters: tumor composition (is the protein present on cancer cells rather than surrounding stroma?), tissue specificity (does it stay quiet on healthy organs?), and clinical feasibility (can it be turned into a working CAR?). Each filter is run through a frontier large language model, and the authors ran roughly 1,000 independent simulations to reduce the chance of the model inventing plausible-sounding but wrong answers (Cell).
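To make the logic of that layered screen concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the function names, the 0.9 pass threshold, the choice to rank by the weakest filter, and the idea that repeated runs are aggregated as a simple pass rate are all hypothetical, and a deterministic toy function with placeholder protein names stands in for the language model call.

```python
from typing import Callable, Dict, List

# The three filter questions named in the article; the identifiers are our own.
FILTERS = ["tumor_composition", "tissue_specificity", "clinical_feasibility"]

# A judge answers pass/fail for (protein, filter_name, run_index).
# In a real pipeline this would wrap a model call; here it is an assumption.
Judge = Callable[[str, str, int], bool]


def pass_rate(protein: str, filter_name: str, judge: Judge, n_runs: int) -> float:
    """Fraction of independent runs in which the judge says the protein passes."""
    votes = sum(judge(protein, filter_name, i) for i in range(n_runs))
    return votes / n_runs


def shortlist(candidates: List[str], judge: Judge,
              n_runs: int = 1000, threshold: float = 0.9) -> List[str]:
    """Keep proteins that clear every filter, ranked by their weakest pass rate."""
    scored: Dict[str, float] = {}
    for protein in candidates:
        rates = [pass_rate(protein, f, judge, n_runs) for f in FILTERS]
        weakest = min(rates)
        if weakest >= threshold:  # one unstable filter is enough to drop it
            scored[protein] = weakest
    return sorted(scored, key=scored.get, reverse=True)


def toy_judge(protein: str, filter_name: str, run_index: int) -> bool:
    """Stand-in for a model call. Made-up failure periods, not real biology."""
    fails_every = {"PROTEIN_A": 50, "PROTEIN_B": 3, "PROTEIN_C": 20}
    return (run_index + 1) % fails_every[protein] != 0


if __name__ == "__main__":
    print(shortlist(["PROTEIN_A", "PROTEIN_B", "PROTEIN_C"], toy_judge))
```

The point of repeating the question many times is that a single confident answer from a model can be wrong; a candidate whose verdict flips between runs is treated as unreliable and dropped, while one that passes consistently moves up the list.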
What comes out the other end is a shortlist a human expert team can read in an afternoon. The Penn group says the framework moved from raw surfaceome to ranked candidate in weeks, where conventional antigen discovery has typically taken months to years. Lead author Daniel Baker, who completed his PhD at Penn in December 2025, did the work under Carl June, the oncologist whose lab developed the first FDA-approved CAR-T therapy, and cardiologist Zoltan Arany. Sikander Hayat of RWTH Aachen University is a co-corresponding author, with collaborators at the Icahn School of Medicine at Mount Sinai (GEN).
The preclinical results are the part of the paper designed to be stress-tested. Engineered T cells bearing a GPNMB-targeting receptor shrank or cleared tumors in mouse models of melanoma, monoblastic leukemia, and colorectal adenocarcinoma, three cancers spanning skin, blood, and colon tissue. The team frames the next step carefully: GPNMB CAR-T is now being moved toward formal preclinical safety work, including the toxicology studies an IND filing requires, with the goal of clinical translation (Cell).
Two limits sit on top of those results. First, no human has yet received a GPNMB CAR-T cell. On-target, off-tumor toxicity, which is the side effect that happens when an engineered T cell also hits a vital organ that quietly expresses the same antigen, has repeatedly stalled solid-tumor CAR-T trials, and the tissue-specificity filter in this workflow is the very screen meant to prevent that. Second, the authors' claim that this is among the first uses of large language models in CAR-T discovery is a positioning claim, not a measured benchmark. Whether the workflow is meaningfully better than classical antigen triage depends on which comparisons get published next. Funding for the work came from the NIH, the Centurion Foundation Innovation Fund, the Parker Institute for Cancer Immunotherapy, and the Norman and Selma Kron Endowed Fellowship (News-Medical).
The watch item is the timeline. Penn has not announced an IND filing, but the workflow is published in full and the team has said it plans to extend the same pipeline to other tumor types. The next independent checkpoint is whether another lab can reproduce the GPNMB nomination from the public surfaceome data. The one after that is whether the same antigen holds up against a human tumor in a first-in-human trial.