SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 16 days ago • 43 • 6