PessimisticDPO/Llama-3.1-Tulu-3-8B-SFT-a0.1-b0.1-g1.0-r10.0-noclip-chi2po-e0 • Updated about 1 hour ago
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-135eb3c4 • Updated 10 days ago • 56.7k • 14
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-88098679 • Updated 10 days ago • 56.7k • 14
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-138efe3b • Updated 10 days ago • 56.7k • 6
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-67bce0ae • Updated 10 days ago • 52.3k • 7
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-f7d169f4 • Updated 10 days ago • 56.7k • 25
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-2f4a2b4e • Updated 10 days ago • 52.3k • 8
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-fb3c781b • Updated 11 days ago • 2.2k • 22 • 1
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-e20e895f • Updated 11 days ago • 2.2k • 15
PessimisticDPO/ultrafeedback_binarized-Logprob-Reward-bc6c0512 • Updated 11 days ago • 56.7k • 22