In a Training Loop 🔄
Stefan Schweter (PRO)
stefan-it
3,705 followers · 391 following
https://schweter.bayern
AI & ML interests
Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨
Recent Activity
upvoted a collection (about 9 hours ago): 🤏 Smol-Data
reacted to hannayukhymenko's post with 🔥 and ❤️ (about 17 hours ago):

Do you translate your benchmarks from English correctly? 🤔 It turns out that for many languages this is much harder than you might imagine! Introducing Recovered in Translation 🌍, together with @aalexandrov: ritranslation.insait.ai

Translating benchmarks is a painful process that requires a lot of manual inspection and adjustment. You start by setting up the whole pipeline and adapting it to every format type, including task specifics. Some massive benchmarks already exist, but they still contain simple (and sometimes silly) bugs that can hurt evaluations :(

We present a novel automated translation framework to help with that! Eastern and Southern European languages have richer linguistic structures than English, and for benchmarks that rely heavily on grammatical coherence, machine translation risks harming evaluations. We discovered potential answer leakage, or misleading cues, in the grammatical structure of the questions. Some benchmarks are also simply outdated and need to be retranslated with newer, better models.

Our framework includes novel test-time scaling methods that allow controlling time and cost investments while mitigating the need for human-in-the-loop verification.

While working on the Ukrainian-focused MamayLM models, we had to translate 10+ benchmarks in a short span of time. Finding human evaluators is costly and time-consuming, and the same goes for professional translators. With our pipeline we were able to do it in 3 days 🏎️

We hope our findings will help enable stronger multilingual evaluations and development. We release all produced benchmarks on Hugging Face, together with the source code and arXiv paper 🤗

Paper: https://huggingface.co/papers/2602.22207
Code: https://github.com/insait-institute/ritranslation
Benchmarks: https://huggingface.co/collections/INSAIT-Institute/multilingual-benchmarks
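For readers who want to try the released benchmarks, a minimal sketch of loading one with the 🤗 datasets library is below. The repository ID is hypothetical; browse the collection linked above for the actual dataset names.

```python
from datasets import load_dataset

# Hypothetical repository ID -- see the INSAIT-Institute
# multilingual-benchmarks collection linked above for the real names.
ds = load_dataset("INSAIT-Institute/example-translated-benchmark", split="test")

# Inspect the first example of the translated benchmark.
print(ds[0])
```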
stefan-it's datasets (22), sorted by recently updated
stefan-it/xlstm-transformers-bug-data • Viewer • Updated Nov 8, 2025 • 62.5k • 17
stefan-it/grokipedia-urls • Viewer • Updated Oct 28, 2025 • 885k • 27 • 2
stefan-it/nanochat-german-city-populations • Viewer • Updated Oct 26, 2025 • 706 • 12
stefan-it/nanochat-german-wordlist • Viewer • Updated Oct 25, 2025 • 9.06M • 36
stefan-it/nanochat-german-openhermes • Viewer • Updated Oct 25, 2025 • 239k • 24
stefan-it/nanochat-german-alpaca • Viewer • Updated Oct 25, 2025 • 50.5k • 17
stefan-it/nanochat-german-data • Viewer • Updated Oct 23, 2025 • 51.2M • 614
stefan-it/nanochat-german-eval-data • Viewer • Updated Oct 21, 2025 • 7 • 22
stefan-it/awesome-tagesschau • Updated Jun 26, 2025 • 413 • 2
stefan-it/turblimp-evaluations • Updated Jun 23, 2025 • 131
stefan-it/senti-anno • Viewer • Updated Nov 29, 2024 • 929 • 73
stefan-it/offenseval2020_tr • Viewer • Updated Nov 22, 2024 • 35.3k • 1.36k
stefan-it/dewiki-20230701-nltk-corpus • Viewer • Updated Sep 6, 2024 • 39.4M • 9 • 2
stefan-it/germeval14_no_wikipedia • Preview • Updated May 29, 2024 • 5
stefan-it/histnero • Viewer • Updated May 10, 2024 • 217k • 68
stefan-it/HisGermaNER • Preview • Updated Mar 28, 2024 • 49 • 2
stefan-it/co-funer • Preview • Updated Mar 25, 2024 • 18
stefan-it/german-dbmdz-bert-corpus • Viewer • Updated Dec 22, 2023 • 52.8M • 19 • 3
stefan-it/span-marker-base-model-detection • Viewer • Updated Sep 5, 2023 • 28 • 15
stefan-it/flair-base-model-detection • Viewer • Updated Sep 5, 2023 • 52 • 17 • 1
stefan-it/autotrain-flair-hipe2022-fr-hmbert • Updated Sep 4, 2023 • 28
stefan-it/autotrain-flair-hipe2022-de-hmbert • Updated Sep 4, 2023 • 24
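Any of the datasets listed above can typically be pulled with the 🤗 datasets library. A minimal sketch, assuming the repository is public and uses the default configuration:

```python
from datasets import load_dataset

# Repository ID taken from the list above; the default configuration
# and split layout are assumed.
ds = load_dataset("stefan-it/grokipedia-urls")
print(ds)
```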