Building on HF

Daniel van Strien PRO

davanstrien

huggingface

https://danielvanstrien.xyz/

AI & ML interests

Machine Learning Librarian

Recent Activity

updated a dataset about 2 hours ago

data-is-better-together/fineweb-c-progress

updated a dataset about 19 hours ago

librarian-bots/dataset-columns

updated a dataset about 20 hours ago

davanstrien/my-classified-papers

upvoted a collection 1 day ago

Qwen3-TTS

7 items • Updated 3 days ago • 209

upvoted an article 6 days ago

Article

LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family

6 days ago • 60

upvoted a paper 6 days ago

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Paper • 2512.14698 • Published Dec 16, 2025 • 21

upvoted a collection 10 days ago

TranslateGemma

3 items • Updated 10 days ago • 185

upvoted a paper 16 days ago

Perceptual Taxonomy: Evaluating and Guiding Hierarchical Scene Reasoning in Vision-Language Models

Paper • 2511.19526 • Published Nov 24, 2025 • 2

upvoted a collection 17 days ago

Qwen3-VL-Embedding

2 items • Updated 17 days ago • 57

upvoted 2 articles 19 days ago

Article

Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval

Mar 22, 2024 • 125

Article

Why We Built VIBE Bench: Rethinking Evaluation for Real Workloads

19 days ago • 6

upvoted an article 20 days ago

Article

Diversity Vs Density: A data strategy comparison for fine-tuning VLMs

20 days ago • 5

upvoted an article about 1 month ago

Article

Shadow AI - Where are the CIOs?

Dec 19, 2025 • 31

upvoted 2 collections about 1 month ago

SauerkrautLM-Vision-Document-Retrieval

7 items • Updated Dec 15, 2025 • 9

GLM-V

4 items • Updated Dec 17, 2025 • 12

upvoted 3 papers about 1 month ago

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

Paper • 2509.19768 • Published Sep 24, 2025 • 6

Metadata Extraction Leveraging Large Language Models

Paper • 2510.19334 • Published Oct 22, 2025 • 1

FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition

Paper • 2512.13884 • Published Dec 15, 2025 • 15

upvoted 5 collections about 1 month ago

fiNERweb

A multilingual dataset for NER covering 91 languages and 25 scripts • 3 items • Updated Dec 16, 2025 • 1

Molmo2 Data

Artifacts for the Molmo2 data release • 16 items • Updated Dec 23, 2025 • 35

Molmo2

Artifacts for the Molmo2 release • 6 items • Updated Dec 23, 2025 • 30

Datasets Wrapped 2025: Reasoning

The reasoning datasets that defined 2025. Part 1 of Datasets Wrapped 2025. #DatasetsWrapped2025 • 20 items • Updated Dec 16, 2025 • 1

NeMo Gym

Collection of RL verifiable data for NeMo Gym • 13 items • Updated 5 days ago • 38