ToolRM • Collection • One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning • 6 items • Updated Nov 19, 2025 • 2
PIPer: On-Device Environment Setup via Online Reinforcement Learning • Paper • 2509.25455 • Published Sep 29, 2025 • 37
🦫 PIPer • Collection • All the resources for our paper "PIPer: On-Device Environment Setup via Online Reinforcement Learning"! • 9 items • Updated Oct 1, 2025 • 3
FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling • Paper • 2510.24645 • Published Oct 28, 2025 • 8
Spurious Rewards: Rethinking Training Signals in RLVR • Paper • 2506.10947 • Published Jun 12, 2025 • 2