Pretraining Data
opencsg/Fineweb-Edu-Chinese-V2.1: 958M rows, 39.8k downloads, 63 likes
[name missing]: 56.2M rows, 43.5k downloads, 28 likes
[name missing]: 3.8B rows, 27.9k downloads, 103 likes
allenai/dolma3_dolmino_pool: 71.7k downloads, 7 likes
allenai/dolma3_longmino_pool: 27.5k downloads, 10 likes
[name missing]: 476M rows, 37.7k downloads, 815 likes
[name missing]: 4.48B rows, 108k downloads, 743 likes
[name missing]: 61.6M rows, 9.85k downloads, 279 likes
[name missing]: 819M rows, 8.25k downloads, 11 likes
tokyotech-llm/swallow-code-v2: 147M rows, 16.5k downloads, 28 likes
ByteDance-Seed/Code-Contests-Plus: 49.2k rows, 6.86k downloads, 57 likes
[name missing]: 6.74k downloads, 144 likes
nvidia/Nemotron-Pretraining-Code-v2: 836M rows, 5.84k downloads, 100 likes
nvidia/Nemotron-Pretraining-Specialized-v1: 60.7M rows, 6.84k downloads, 69 likes
nvidia/Nemotron-CC-Math-v1: 190M rows, 6.02k downloads, 63 likes
nvidia/Nemotron-Pretraining-SFT-v1: 299M rows, 4.66k downloads, 57 likes
[name missing]: 1.86M rows, 4.7k downloads, 225 likes
EssentialAI/essential-web-v1.0: 26.8k downloads, 217 likes
EssentialAI/eai-taxonomy-stem-w-dclm: 1.42k downloads, 6 likes
EssentialAI/eai-taxonomy-med-w-dclm: 81.2M rows, 590 downloads, 8 likes
EssentialAI/eai-taxonomy-code-w-dclm: 274M rows, 1.71k downloads, 8 likes
EssentialAI/eai-taxonomy-math-w-fm: 21.6M rows, 438 downloads, 5 likes
[name missing]: 27.9B rows, 24 downloads, 3 likes
DataMuncher-Labs/UltiMath: 32.9B rows, 16.1k downloads, 7 likes
HuggingFaceFW/finetranslations: 3.33B rows, 69k downloads, 262 likes
[name missing]: 470M rows, 36.2k downloads, 335 likes