InnoGym: Benchmarking the Innovation Potential of AI Agents Paper • 2512.01822 • Published Dec 1, 2025 • 35
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published about 1 month ago • 242
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning Paper • 2511.22570 • Published Nov 27, 2025 • 84
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones Paper • 2509.25123 • Published Sep 29, 2025 • 20
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 45
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 45
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 110
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8, 2025 • 79
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25, 2025 • 9