Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 316
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Paper • 2507.15758 • Published • 35
Hierarchical Budget Policy Optimization for Adaptive Reasoning
Paper • 2507.15844 • Published • 16
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
Paper • 2507.16814 • Published • 21
RePO: Replay-Enhanced Policy Optimization
Paper • 2506.09340 • Published
Perception-Aware Policy Optimization for Multimodal Reasoning
Paper • 2507.06448 • Published • 47
On-Policy RL with Optimal Reward Baseline
Paper • 2505.23585 • Published • 14
EXPO: Stable Reinforcement Learning with Expressive Policies
Paper • 2507.07986 • Published
Geometric-Mean Policy Optimization
Paper • 2507.20673 • Published • 31
Single-stream Policy Optimization
Paper • 2509.13232 • Published • 34
MAPO: Mixed Advantage Policy Optimization
Paper • 2509.18849 • Published • 26