A new open benchmark tests whether AI can write real software, not ace training puzzles | type0