papers - a liheeem Collection

liheeem 's Collections

papers

papers

updated Nov 12, 2025

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5, 2025 • 133
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5, 2025 • 77
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30, 2025 • 277
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3, 2025 • 53
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 187
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

Paper • 2506.01713 • Published Jun 2, 2025 • 48
Large Language Models for Data Synthesis

Paper • 2505.14752 • Published May 20, 2025 • 49
ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Paper • 2505.23762 • Published May 29, 2025 • 45
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28, 2025 • 131
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Paper • 2505.21496 • Published May 27, 2025 • 38
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88
Synthetic Data RL: Task Definition Is All You Need

Paper • 2505.17063 • Published May 18, 2025 • 11
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22, 2025 • 58
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

Paper • 2505.16421 • Published May 22, 2025 • 19
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Paper • 2505.15277 • Published May 21, 2025 • 104
Efficient Agent Training for Computer Use

Paper • 2505.13909 • Published May 20, 2025 • 44
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published Jun 25, 2025 • 64
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 259
GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21, 2025 • 133
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Paper • 2507.09477 • Published Jul 13, 2025 • 86
VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6, 2025 • 161
Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Paper • 2507.23779 • Published Jul 31, 2025 • 44
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6, 2025 • 52
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28, 2025 • 82
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

Paper • 2508.01858 • Published Aug 3, 2025 • 20
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10, 2025 • 98
OpenCUA: Open Foundations for Computer-Use Agents

Paper • 2508.09123 • Published Aug 12, 2025 • 31
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2, 2025 • 124
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4, 2025 • 195
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Paper • 2509.08721 • Published Sep 10, 2025 • 661
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 501
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 228