Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Paper: arXiv:2407.09121
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B on the Evol-Instruct and BeaverTails datasets.
For details, please refer to the paper Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training and the DeRTa GitHub repository.
Input format:
[INST] Your Instruction [/INST]
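As a minimal usage sketch, the instruction can be wrapped in the format above before generation. The repo id below ("your-org/DeRTa-Llama-3-8B"), the example instruction, and the generation settings are placeholders, not values taken from this card:

```python
# Minimal inference sketch with Hugging Face transformers.
# "your-org/DeRTa-Llama-3-8B" is a placeholder; substitute the actual
# hub id or local path of this fine-tuned checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/DeRTa-Llama-3-8B"  # placeholder, not the official id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the instruction in the training-time input format described above.
prompt = "[INST] How should I store cleaning chemicals safely at home? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, dropping the prompt.
print(
    tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
)
```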
The model is trained with DeRTa (Decoupled Refusal Training) and demonstrates strong safety performance.
Base model: meta-llama/Meta-Llama-3-8B