Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding — type0