papers
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 263
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 133
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published • 77
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 277
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Paper • 2506.03143 • Published • 53
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 187
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Paper • 2506.01713 • Published • 48
Large Language Models for Data Synthesis
Paper • 2505.14752 • Published • 49
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Paper • 2505.23762 • Published • 45
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617 • Published • 131
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Paper • 2505.22453 • Published • 46
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Paper • 2505.21496 • Published • 38
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Synthetic Data RL: Task Definition Is All You Need
Paper • 2505.17063 • Published • 11
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Paper • 2505.16410 • Published • 58
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Paper • 2505.16421 • Published • 19
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Paper • 2505.15277 • Published • 104
Efficient Agent Training for Computer Use
Paper • 2505.13909 • Published • 44
MMSearch-R1: Incentivizing LMMs to Search
Paper • 2506.20670 • Published • 64
A Survey of Context Engineering for Large Language Models
Paper • 2507.13334 • Published • 259
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper • 2507.15846 • Published • 133
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper • 2507.09477 • Published • 86
VeriGUI: Verifiable Long-Chain GUI Dataset
Paper • 2508.04026 • Published • 161
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Paper • 2507.23779 • Published • 44
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper • 2508.04700 • Published • 52
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper • 2507.21046 • Published • 82
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
Paper • 2508.01858 • Published • 20
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
Paper • 2508.07407 • Published • 98
OpenCUA: Open Foundations for Computer-Use Agents
Paper • 2508.09123 • Published • 31
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper • 2509.02544 • Published • 124
Why Language Models Hallucinate
Paper • 2509.04664 • Published • 195
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • 2509.08721 • Published • 661
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 501
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 228