Dataset - a JuanRafap Collection

updated 9 days ago

DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training

Paper • 2504.17565 • Published Apr 24, 2025 • 2
AI-MO/NuminaMath-1.5

Viewer • Updated Feb 10, 2025 • 896k • 1.7k • 166
PrimeIntellect/synthetic-code-understanding

Viewer • Updated Feb 15, 2025 • 60.6k • 83 • 19
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Paper • 2507.07095 • Published Jul 9, 2025 • 55
VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6, 2025 • 161
allenai/CoSyn-400K

Viewer • Updated Feb 28, 2025 • 408k • 2.01k • 44
nvidia/Granary

Viewer • Updated Aug 14, 2025 • 116M • 4.24k • 165
jupyter-agent/jupyter-agent-dataset

Viewer • Updated Sep 10, 2025 • 95.8k • 1.66k • 153
HuggingFaceM4/FineVision

Viewer • Updated Oct 21, 2025 • 24.2M • 100k • 463
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits

Paper • 2509.11362 • Published Sep 14, 2025 • 4
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

Paper • 2509.15221 • Published Sep 18, 2025 • 111
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

Paper • 2509.14142 • Published Sep 17, 2025 • 10
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning

Paper • 2507.21924 • Published Jul 29, 2025 • 1
ScaleAI/SWE-bench_Pro

Viewer • Updated Sep 25, 2025 • 731 • 11.8k • 43
nvidia/NitroGen

Updated 17 days ago • 5.33k • 176
Hierarchical Dataset Selection for High-Quality Data Sharing

Paper • 2512.10952 • Published 24 days ago • 1
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

Paper • 2512.13884 • Published 20 days ago • 14