In a Training Loop 🔄

Solomatin Roman

Samoed

Recent Activity

updated a dataset 5 days ago

mteb/WikiClusteringP2P.v2

published a dataset 5 days ago

mteb/WikiClusteringP2P.v2

updated a dataset 5 days ago

mteb/WikiCitiesClustering

upvoted an article 11 days ago

Article

ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models?

11 days ago • 17

upvoted a paper 11 days ago

ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models

Paper • 2602.16609 • Published 12 days ago • 6

upvoted a paper 12 days ago

MAEB: Massive Audio Embedding Benchmark

Paper • 2602.16008 • Published 13 days ago • 21

upvoted a collection 17 days ago

LateOn-Code 💻

State-of-the-art late interaction code retrieval models • 6 items • Updated 11 days ago • 14

upvoted an article 18 days ago

Article

LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling

18 days ago • 46

upvoted an article 25 days ago

Article

Community Evals: Because we're done trusting black-box leaderboards over the community

+5

27 days ago • 83

upvoted an article 26 days ago

Article

Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model

26 days ago • 28

upvoted an article about 2 months ago

Article

🥃 Distilling Tiny Embeddings

Jan 10 • 20

upvoted a collection about 2 months ago

Qwen3-VL-Embedding

2 items • Updated Jan 8 • 62

upvoted a collection 3 months ago

SauerkrautLM-Vision-Document-Retrieval

7 items • Updated Dec 15, 2025 • 9

upvoted a paper 3 months ago

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Paper • 2512.10430 • Published Dec 11, 2025 • 116

upvoted a collection 3 months ago

NanoBEIR datasets

These datasets are compatible with the (Sparse)NanoBEIREvaluator with Sentence Transformers v5.2+. Also CrossEncoderNanoBEIREvaluator if bm25 column • 16 items • Updated about 12 hours ago • 14

upvoted an article 3 months ago

Article

Building and evaluating Multimodal Rerankers

Nov 30, 2025 • 8

upvoted 5 articles 4 months ago

Article

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

Nov 5, 2025 • 62

Article

Improving Parquet Dedupe on Hugging Face Hub

Oct 5, 2024 • 41

Article

LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR

Oct 23, 2025 • 73

Article

Sentence Transformers is joining Hugging Face!

Oct 22, 2025 • 87

Article

Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text

Oct 20, 2025 • 36

upvoted 2 papers 5 months ago

Scaling Language-Centric Omnimodal Representation Learning

Paper • 2510.11693 • Published Oct 13, 2025 • 104

HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks

Paper • 2510.10062 • Published Oct 11, 2025 • 10