In a Training Loop 🔄
Stefan Schweter (PRO)
stefan-it
3,705 followers · 391 following
https://schweter.bayern
AI & ML interests
Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨
Recent Activity
upvoted a collection (about 9 hours ago): 🤏 Smol-Data
reacted to hannayukhymenko's post with 🔥 and ❤️ (about 17 hours ago):

Do you translate your benchmarks from English correctly? 🤔 It turns out that for many languages this is much harder than you might imagine! Introducing Recovered in Translation 🌍, together with @aalexandrov: ritranslation.insait.ai

Translating benchmarks is a painful process that requires a lot of manual inspection and adjustment. You start by setting up the whole pipeline and adapting it to every format type, including task specifics. Some massive benchmarks already exist, but they still contain simple (and sometimes silly) bugs that can hurt evaluations :(

We present a novel automated translation framework to help with that! Eastern and Southern European languages have richer linguistic structures than English, and for benchmarks that rely heavily on grammatical coherence, machine translation risks harming evaluations. We discovered potential answer leakage, or misleading cues, in the grammatical structure of the questions. Some benchmarks are also simply outdated and need to be retranslated with newer, better models.

Our framework includes novel test-time scaling methods that allow controlling time and cost investments while mitigating the need for human-in-the-loop verification.

While working on the Ukrainian-focused MamayLM models, we had to translate 10+ benchmarks in a short span of time. Finding human evaluators is costly and time-consuming, and the same goes for professional translators. With our pipeline we were able to do it in 3 days 🏎️

We hope our findings will help enable stronger multilingual evaluations and development. We release all produced benchmarks on Hugging Face, together with the source code and arXiv paper 🤗

Paper: https://huggingface.co/papers/2602.22207
Code: https://github.com/insait-institute/ritranslation
Benchmarks: https://huggingface.co/collections/INSAIT-Institute/multilingual-benchmarks
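For readers who want to try the released benchmarks, a minimal sketch of loading one with the 🤗 datasets library is below. The repository ID is hypothetical; browse the collection linked above for the actual dataset names.

```python
from datasets import load_dataset

# Hypothetical repository ID -- see the INSAIT-Institute
# multilingual-benchmarks collection linked above for the real names.
ds = load_dataset("INSAIT-Institute/example-translated-benchmark", split="test")

# Inspect the first example of the translated benchmark.
print(ds[0])
```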
stefan-it's datasets (22), sorted by recently updated
stefan-it/xlstm-transformers-bug-data • Viewer • Updated Nov 8, 2025 • 62.5k • 17
stefan-it/grokipedia-urls • Viewer • Updated Oct 28, 2025 • 885k • 27 • 2
stefan-it/nanochat-german-city-populations • Viewer • Updated Oct 26, 2025 • 706 • 12
stefan-it/nanochat-german-wordlist • Viewer • Updated Oct 25, 2025 • 9.06M • 36
stefan-it/nanochat-german-openhermes • Viewer • Updated Oct 25, 2025 • 239k • 24
stefan-it/nanochat-german-alpaca • Viewer • Updated Oct 25, 2025 • 50.5k • 17
stefan-it/nanochat-german-data • Viewer • Updated Oct 23, 2025 • 51.2M • 614
stefan-it/nanochat-german-eval-data • Viewer • Updated Oct 21, 2025 • 7 • 22
stefan-it/awesome-tagesschau • Updated Jun 26, 2025 • 413 • 2
stefan-it/turblimp-evaluations • Updated Jun 23, 2025 • 131
stefan-it/senti-anno • Viewer • Updated Nov 29, 2024 • 929 • 73
stefan-it/offenseval2020_tr • Viewer • Updated Nov 22, 2024 • 35.3k • 1.36k
stefan-it/dewiki-20230701-nltk-corpus • Viewer • Updated Sep 6, 2024 • 39.4M • 9 • 2
stefan-it/germeval14_no_wikipedia • Preview • Updated May 29, 2024 • 5
stefan-it/histnero • Viewer • Updated May 10, 2024 • 217k • 68
stefan-it/HisGermaNER • Preview • Updated Mar 28, 2024 • 49 • 2
stefan-it/co-funer • Preview • Updated Mar 25, 2024 • 18
stefan-it/german-dbmdz-bert-corpus • Viewer • Updated Dec 22, 2023 • 52.8M • 19 • 3
stefan-it/span-marker-base-model-detection • Viewer • Updated Sep 5, 2023 • 28 • 15
stefan-it/flair-base-model-detection • Viewer • Updated Sep 5, 2023 • 52 • 17 • 1
stefan-it/autotrain-flair-hipe2022-fr-hmbert • Updated Sep 4, 2023 • 28
stefan-it/autotrain-flair-hipe2022-de-hmbert • Updated Sep 4, 2023 • 24
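Any of the datasets listed above can typically be pulled with the 🤗 datasets library. A minimal sketch, assuming the repository is public and uses the default configuration:

```python
from datasets import load_dataset

# Repository ID taken from the list above; the default configuration
# and split layout are assumed.
ds = load_dataset("stefan-it/grokipedia-urls")
print(ds)
```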