A small AI scheduler, trained on factory puzzles as small as four jobs and four machines, was applied unchanged to problems with up to a hundred machines, and stayed close to the best classical rule. The result, posted this spring to arXiv as preprint 2606.13682 by Faezeh Ardali, Mwembezi A. Nyelele, and Gerald M. Knapp, sidesteps one of the recurring headaches in applying machine learning to industrial scheduling: the model did not need to be retrained to handle bigger problems.
The "open shop" problem the paper targets is a long-standing puzzle in operations research. Every job must be processed once on each machine in a shared set, but the order in which a job visits those machines is up to the scheduler. It shows up wherever work has to be assigned flexibly across a fixed set of stations: a small machine shop, a hospital's surgical block, a parts-repair bay. The goal is usually to minimize the time at which the last job finishes, a quantity called the makespan.
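To make the definition concrete, here is a minimal Python sketch of how a makespan falls out of a chosen operation order. It is an illustration only, not the paper's method: the function name `makespan`, the tiny 2-by-2 matrix, and the two example orders are all made up for this example, and the placement rule (start each operation as soon as both its job and its machine are free) is one simple convention among several.

```python
from typing import Sequence, Tuple


def makespan(p: Sequence[Sequence[int]], order: Sequence[Tuple[int, int]]) -> int:
    """Makespan of an open-shop schedule built from an operation order.

    p[j][m] is the processing time of job j on machine m.
    order lists every (job, machine) pair exactly once; each operation is
    started as soon as both its job and its machine are free.
    """
    n_jobs, n_machines = len(p), len(p[0])
    job_free = [0] * n_jobs          # time at which each job is next available
    machine_free = [0] * n_machines  # time at which each machine is next available
    seen = set()

    for j, m in order:
        if (j, m) in seen:
            raise ValueError(f"operation {(j, m)} scheduled twice")
        seen.add((j, m))
        start = max(job_free[j], machine_free[m])
        end = start + p[j][m]
        job_free[j] = end
        machine_free[m] = end

    if len(seen) != n_jobs * n_machines:
        raise ValueError("order must contain every (job, machine) pair")
    return max(job_free)


# Hypothetical 2-job, 2-machine instance (rows are jobs, columns are machines).
p = [[3, 2],
     [1, 4]]

# Two different visiting orders for the same instance.
order_a = [(0, 0), (1, 1), (0, 1), (1, 0)]
order_b = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(makespan(p, order_a))  # 6
print(makespan(p, order_b))  # 9
```

The same four operations finish at time 6 under one order and at time 9 under the other, which is the freedom, and the difficulty, that the open-shop formulation gives the scheduler.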
The model itself is an encoder-decoder Transformer with multi-head attention, the same broad architecture family that powers most modern language models. The twist is the input: only the processing-time matrix, the table of how long each job will take on each machine, with no other features. The authors trained it on Taillard's standard open-shop benchmark instances, ranging from 4-by-4 to 10-by-10 jobs and machines, and then applied the resulting policy, without retraining or fine-tuning, to randomly generated 40-by-40 through 100-by-100 instances. At the top end that is a 10-fold jump in each dimension and a 100-fold jump in the number of operations to schedule, from 100 in a 10-by-10 instance to 10,000 in a 100-by-100 one.
On the small training distribution, the paper reports feasible schedules with makespans within 15 to 30% of the best-known values for the 4-by-4 through 10-by-10 Taillard problems. The more striking result is what happened at scale. On the larger 40-by-40 through 100-by-100 instances, the authors report average gaps of 12.89 to 15.12% relative to a "standard lower bound" (a theoretical floor below which no schedule can go) and describe the policy as "competitive with EST by a modest margin" while "substantially outperforming" two simpler dispatch rules, SPT (shortest processing time first) and LPT (longest processing time first). EST, the earliest start rule, is one of the strongest classical heuristics for open-shop problems.
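For readers who want to see what a gap to a lower bound means in practice, the following Python sketch computes one. It rests on an assumption: the abstract does not spell out its "standard lower bound," so this uses the textbook open-shop bound, the larger of the heaviest job load and the heaviest machine load. The function names and the small matrix are hypothetical.

```python
from typing import Sequence


def lower_bound(p: Sequence[Sequence[int]]) -> int:
    """Textbook open-shop makespan bound (assumed, not confirmed by the abstract).

    No schedule can finish before the busiest job has had all its operations
    done, or before the busiest machine has processed all its work.
    """
    job_loads = [sum(row) for row in p]            # total time per job (rows)
    machine_loads = [sum(col) for col in zip(*p)]  # total time per machine (columns)
    return max(max(job_loads), max(machine_loads))


def gap_percent(makespan: float, bound: float) -> float:
    """Relative gap of a schedule's makespan above the bound, in percent."""
    return 100.0 * (makespan - bound) / bound


# Hypothetical 2-job, 2-machine instance (rows are jobs, columns are machines).
p = [[3, 2],
     [1, 4]]

lb = lower_bound(p)
print(lb)                   # 6
print(gap_percent(9, lb))   # 50.0
print(gap_percent(6, lb))   # 0.0
```

A gap of zero proves a schedule is optimal, but a positive gap does not prove it is bad: the bound itself may be unreachable, so a figure like 13% overstates the distance to the true optimum by an unknown amount.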
Two things are worth flagging before treating this as a win for AI scheduling. First, the comparison is to a lower bound, not to optimal solutions, and to a small set of classical dispatch rules. State-of-the-art metaheuristics for open-shop scheduling, which the visible abstract does not discuss, are the more demanding benchmark. Second, the title bills the method as "Deep Reinforcement Learning," but the visible abstract describes only a Transformer-based scheduling policy that takes the processing-time matrix as its sole input; it does not characterize a reward signal or an environment loop. As described there, the training setup reads more like supervised learning than like a reinforcement-learning loop. Whether the full paper sets that record straight is a question for the body, not the title.
The preprint is not peer-reviewed, the benchmarks are author-selected, and the generalization result is the load-bearing claim. But the structural finding is the kind worth watching: a small model that learned on toy-scale factory puzzles, then ran at factory scale without retraining and stayed in the hunt against the best classical rule. If the DRL claim holds up, the next question is how it stacks up against the operations-research tooling that factories already trust.