Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
Paper • 2509.23250 • Published
Natural Language Processing
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards