AMO-Bench: Large Language Models Still Struggle in High School Math Competitions Paper • 2510.26768 • Published Oct 30, 2025 • 33
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 45
Tool-integrated Reinforcement Learning for Repo Deep Search Paper • 2508.03012 • Published Aug 5, 2025 • 20
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning Paper • 2502.20127 • Published Feb 27, 2025 • 8