Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Paper: arXiv:2406.05955
The TurboSparse-Mixtral Large Language Model (LLM) is a sparsified version of Mixtral.
Its average performance is evaluated on benchmarks from the Open LLM Leaderboard.
Our code for accelerating TurboSparse-Mixtral is currently being refined. Stay tuned! For now, you can run this model just like a dense model.
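A minimal loading sketch with Hugging Face Transformers is shown below. The repository id and generation settings are assumptions for illustration, not part of the release; adjust them to the actual repo.

```python
# Sketch: run TurboSparse-Mixtral as a plain dense model with Transformers.
# The repository id below is an assumption; replace it with the actual model repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PowerInfer/TurboSparse-Mixtral"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # load in half precision
    device_map="auto",            # spread layers across available GPUs
    trust_remote_code=True,       # in case the repo ships custom modeling code
)

prompt = "Explain activation sparsity in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```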
During sparsification, we also use several SFT datasets. We use ChatML as the chat template:
<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n
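For illustration, one way to build a prompt in this format is sketched below, using the tokenizer's bundled chat template when available and falling back to manual ChatML assembly otherwise. The repo id is the same assumption as above.

```python
# Sketch: format a user message with the ChatML template shown above.
from transformers import AutoTokenizer

model_id = "PowerInfer/TurboSparse-Mixtral"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is activation sparsity?"}]

try:
    # Preferred: use the chat template bundled with the tokenizer, if any.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
except Exception:
    # Fallback: assemble the ChatML string manually, matching the template above.
    prompt = (
        f"<|im_start|>user\n{messages[0]['content']}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(prompt)
```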
Because the predictors for the FFN neurons are merged into the model weights, you can fine-tune TurboSparse-Mixtral with any framework and algorithm.
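As one illustration only, here is a LoRA fine-tuning sketch with PEFT and the Hugging Face Trainer; the repo id, adapter settings, and toy dataset are placeholders, and any standard causal-LM fine-tuning stack should work equally well.

```python
# Sketch only: LoRA fine-tuning with PEFT + Trainer, assuming the repo id below.
# Because the FFN predictors are merged into the checkpoint, no special handling
# is needed; this is just a standard causal-LM fine-tuning setup.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "PowerInfer/TurboSparse-Mixtral"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach low-rank adapters to the attention projections (a common default choice).
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
))

# Toy ChatML-formatted training data; replace with a real SFT corpus.
examples = [{"text": "<|im_start|>user\nSay hello.<|im_end|>\n"
                     "<|im_start|>assistant\nHello!<|im_end|>"}]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="turbosparse-sft",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           bf16=True,
                           logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```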
The model is licensed under Apache-2.0; the weights are fully open for academic research, and free commercial use is also permitted.