Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published 8 days ago • 109
Article Training Design for Text-to-Image Models: Lessons from Ablations 2 days ago • 43
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published 3 days ago • 87
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published 2 days ago • 48
Balancing Understanding and Generation in Discrete Diffusion Models Paper • 2602.01362 • Published 4 days ago • 13
Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization Paper • 2601.21358 • Published 7 days ago • 7
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 3 days ago • 122
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published 3 days ago • 29
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 6 days ago • 32