Dataset
updated
DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training
Paper • 2504.17565 • Published • 2
Viewer • Updated • 896k • 1.7k • 166
PrimeIntellect/synthetic-code-understanding
Viewer • Updated • 60.6k • 83 • 19
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
Paper • 2507.07095 • Published • 55
VeriGUI: Verifiable Long-Chain GUI Dataset
Paper • 2508.04026 • Published • 161
Viewer • Updated • 408k • 2.01k • 44
Viewer • Updated • 116M • 4.24k • 165
jupyter-agent/jupyter-agent-dataset
Viewer • Updated • 95.8k • 1.66k • 153
Viewer • Updated • 24.2M • 100k • 463
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Paper • 2509.11362 • Published • 4
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Paper • 2509.15221 • Published • 111
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Paper • 2509.14142 • Published • 10
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Paper • 2507.21924 • Published • 1
Viewer • Updated • 731 • 11.8k • 43
Updated • 5.33k • 176
Hierarchical Dataset Selection for High-Quality Data Sharing
Paper • 2512.10952 • Published • 1
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition
Paper • 2512.13884 • Published • 14