Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 21 days ago • 145
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 20 days ago • 85
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20, 2025 • 43
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks Paper • 2503.06885 • Published Mar 10, 2025 • 4