Thao Nguyen

thaottn
https://thaonguyen19.github.io/
  • thao_nguyen26
  • thaonguyen19

Organizations

DataComp

authored a paper 7 months ago

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Paper • 2506.04689 • Published Jun 5, 2025

authored 4 papers over 1 year ago

DataComp: In search of the next generation of multimodal datasets

Paper • 2304.14108 • Published Apr 27, 2023 • 2

Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

Paper • 2208.05516 • Published Aug 10, 2022

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 55

Better Alignment with Instruction Back-and-Forth Translation

Paper • 2408.04614 • Published Aug 8, 2024 • 15

authored 2 papers over 2 years ago

Guiding Image Captioning Models Toward More Specific Captions

Paper • 2307.16686 • Published Jul 31, 2023 • 16

Improving Multimodal Datasets with Image Captioning

Paper • 2307.10350 • Published Jul 19, 2023 • 11