JournalistsonHF's Collections
Datasets
Updated Oct 1, 2024
A curated list of datasets to train your models
HuggingFaceFW/fineweb-edu • Updated Jul 11, 2025 • 3.5B • 327k • 897
google/frames-benchmark • Updated Oct 15, 2024 • 824 • 7.5k • 239
FineVideo Explorer • Running on CPU Upgrade • Featured • 109
Explore video metadata and scenes
HuggingFaceFV/finevideo • Updated Dec 16, 2024 • 39.5k • 8.4k • 341
CIVICS-dataset/CIVICS • Updated May 13, 2024 • 700 • 63 • 10
HuggingFaceFW/fineweb • Updated Jul 11, 2025 • 52.5B • 188k • 2.6k
HuggingFaceTB/cosmopedia • Updated Aug 12, 2024 • 31.1M • 44.8k • 650
academic-datasets/AMMeBa • Updated May 21, 2024 • 235
HuggingFaceM4/OBELICS • Updated Aug 22, 2023 • 276M • 36.4k • 163
bigcode/the-stack-v2 • Updated Apr 23, 2024 • 5.45B • 7.79k • 438
pixparse/pdfa-eng-wds • Updated Mar 29, 2024 • 7.1k • 7.77k • 155
pixparse/idl-wds • Updated Mar 29, 2024 • 3.41M • 4.6k • 189
argilla/OpenHermesPreferences • Updated Mar 1, 2024 • 989k • 1.2k • 211
argilla/Capybara-Preferences • Updated May 9, 2024 • 15.4k • 137 • 45
PleIAs/YouTube-Commons • Updated Jun 26, 2024 • 2.35k • 371
PleIAs/French-PD-Newspapers • Updated Mar 19, 2024 • 2.25M • 1.3k • 69
satellogic/EarthView • Updated Jan 28, 2025 • 7.41M • 2.46k • 135