2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 6 days ago
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published 7 days ago
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published 12 days ago
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published 6 days ago
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 11 days ago
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 11 days ago
WorldGen: From Text to Traversable and Interactive 3D Worlds Paper • 2511.16825 • Published Nov 20
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 21 days ago
Generalist Foundation Models Are Not Clinical Enough for Hospital Operations Paper • 2511.13703 • Published Nov 17