In a Training Loop 🔄

Stefan Schweter PRO

stefan-it

https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨

Recent Activity

liked a dataset 4 days ago

windprak/steuerllm_instruct_dataset

reacted to SeaWolf-AI's post with 🔥 4 days ago

Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism?

We gave LLMs autonomous trading over 30 real tickers at 100x leverage. All went bankrupt in 30 minutes from hallucination. This spawned FINAL Bench (first metacognition benchmark) and AI NPC Trading Arena: tens of thousands of metacognition-equipped AI agents competing under capitalist rules. Humans can only watch.

Live Demo: https://huggingface.co/spaces/Heartsync/Prompt-Dump
Article: https://huggingface.co/blog/FINAL-Bench/pumpdump

NPCs form a society: 3-tier memory, self-modifying parameters, mutual criticism, strategy propagation, and a virtual SEC enforcing fines every 20 minutes. Every trade passes 4-stage verification including Brave Search fact-check. FINAL Bench confirmed across 9 SOTA models that AI can say "I might be wrong" (MA 0.694) but cannot actually fix errors (ER 0.302).

Six findings: Bubbles form naturally through knowledge transfer and swarm herding. Identical NPCs diverge irreversibly from their first three trades. Metacognition blocks individual hallucination but not collective herding; this is the key finding. Information asymmetry solidifies hierarchy. Fraud and regulation co-evolve. Criticism improves returns. Individual intelligence does not guarantee collective intelligence.

Dataset & Paper: https://huggingface.co/datasets/FINAL-Bench/Metacognitive

liked a dataset 4 days ago

castorini/NanoKnow-Fineweb-Edu-Index

upvoted a paper 4 days ago

NanoKnow: How to Know What Your Language Model Knows

Paper • 2602.20122 • Published 6 days ago • 4

upvoted an article 4 days ago

Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism?

Article • 5 days ago • 17

upvoted a paper 5 days ago

The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder

Paper • 2602.18487 • Published 18 days ago • 5

upvoted a collection 11 days ago

Avey B1 experimental

Experimental pre-trained checkpoints for Avey-B1 • 3 items • Updated 6 days ago • 2

upvoted 2 papers 11 days ago

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

Paper • 2602.15547 • Published 12 days ago • 24

Avey-B

Paper • 2602.15814 • Published 12 days ago • 3

upvoted a collection 11 days ago

Aya Datasets

The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions. • 5 items • Updated Jul 31, 2025 • 25

upvoted 3 papers 17 days ago

LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules

Paper • 2602.10993 • Published 18 days ago • 1

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

Paper • 2602.11149 • Published 18 days ago • 14

SteuerLLM: Local specialized large language model for German tax law analysis

Paper • 2602.11081 • Published 18 days ago • 1

upvoted a collection 18 days ago

GLM-5

2 items • Updated 18 days ago • 29

upvoted a paper 20 days ago

Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay

Paper • 2602.06942 • Published 23 days ago • 3

upvoted a collection 25 days ago

GLiNER-Linker

GLiNER-bi-Encoder models for entity linking with the GLiNKER framework • 3 items • Updated 26 days ago • 6

upvoted a paper about 1 month ago

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale

Paper • 2601.22146 • Published about 1 month ago • 9

upvoted a collection about 1 month ago

GutenOCR

3 items • Updated Jan 22 • 6

upvoted 2 papers about 1 month ago

Say Anything but This: When Tokenizer Betrays Reasoning in LLMs

Paper • 2601.14658 • Published Jan 21 • 1

GutenOCR: A Grounded Vision-Language Front-End for Documents

Paper • 2601.14490 • Published Jan 20 • 37

upvoted an article about 1 month ago

How We Built a Semantic Highlight Model To Save Token Cost for RAG

Article • Jan 15 • 65

upvoted a collection about 1 month ago

TranslateGemma

3 items • Updated Jan 15 • 215

upvoted a paper about 2 months ago

It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models

Paper • 2601.08500 • Published Jan 13 • 1