Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better Paper • 2602.05393 • Published about 23 hours ago • 3
Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space Paper • 2310.13572 • Published Oct 20, 2023
Mano: Restriking Manifold Optimization for LLM Training Paper • 2601.23000 • Published 7 days ago • 2
PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers Paper • 2602.01077 • Published 5 days ago • 3