Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment Paper • 2505.11821 • Published May 17, 2025 • 14
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e7 Text Classification • 7B • Updated Jan 19, 2025 • 2
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e6 Text Classification • 7B • Updated Jan 19, 2025 • 6
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e7 Text Classification • 7B • Updated Jan 19, 2025 • 4
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e6 Text Classification • 7B • Updated Jan 19, 2025 • 11