Vision-language AI is wasting compute on the wrong pieces of an image | type0