Article Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism? 5 days ago • 17
The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder Paper • 2602.18487 • Published 18 days ago • 5
Avey B1 experimental Collection Experimental pre-trained checkpoints for Avey-B1 • 3 items • Updated 6 days ago • 2
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 12 days ago • 24
Aya Datasets Collection The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions. • 5 items • Updated Jul 31, 2025 • 25
LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules Paper • 2602.10993 • Published 18 days ago • 1
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning Paper • 2602.11149 • Published 18 days ago • 14
SteuerLLM: Local specialized large language model for German tax law analysis Paper • 2602.11081 • Published 18 days ago • 1
Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay Paper • 2602.06942 • Published 23 days ago • 3
GLiNER-Linker Collection GLiNER-bi-Encoder models for entity linking with the GLiNKER framework • 3 items • Updated 26 days ago • 6
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale Paper • 2601.22146 • Published about 1 month ago • 9
Say Anything but This: When Tokenizer Betrays Reasoning in LLMs Paper • 2601.14658 • Published Jan 21 • 1
GutenOCR: A Grounded Vision-Language Front-End for Documents Paper • 2601.14490 • Published Jan 20 • 37
It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models Paper • 2601.08500 • Published Jan 13 • 1