LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Paper • 2411.15708 • Published Nov 24, 2024
Iterative Value Function Optimization for Guided Decoding Paper • 2503.02368 • Published Mar 4, 2025 • 15
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Paper • 2503.05447 • Published Mar 7, 2025 • 8
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Paper • 2503.16779 • Published Mar 21, 2025 • 1
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Paper • 2406.11256 • Published Jun 17, 2024
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published Aug 13, 2025 • 53
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published 3 days ago • 14
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models Paper • 2410.11805 • Published Oct 15, 2024 • 14
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM Paper • 2408.12076 • Published Aug 22, 2024 • 12
Timo: Towards Better Temporal Reasoning for Language Models Paper • 2406.14192 • Published Jun 20, 2024 • 1
Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark Paper • 2405.08355 • Published May 14, 2024
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28, 2024 • 21
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs Paper • 2407.10058 • Published Jul 14, 2024 • 31
Mirror: A Universal Framework for Various Information Extraction Tasks Paper • 2311.05419 • Published Nov 9, 2023
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training Paper • 2406.16554 • Published Jun 24, 2024 • 1