LLouice
llouice
AI & ML interests
None yet
Organizations
None yet
LLM-papers
- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 97
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
  Paper • 2508.01191 • Published • 238
- Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models
  Paper • 2508.03363 • Published • 1
- MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
  Paper • 2507.14683 • Published • 134
models
0
None public yet
datasets
0
None public yet