BigScience Workshop

non-profit

https://bigscience.huggingface.co

bigscienceW

bigscience-workshop

Activity Feed

AI & ML interests

A one-year long research workshop on large language models: the Summer of Language Models 21 🌸

Recent Activity

christopher new activity about 9 hours ago

bigscience/bloom:Test PR

christopher new activity about 9 hours ago

bigscience/bloom:Test discussion

christopher new activity about 9 hours ago

bigscience/bloom:Test discussion

View all activity

christopher

in bigscience/bloom about 9 hours ago

Test PR

#286 opened about 10 hours ago by

FIRSTACCOUNT69

Test discussion

#287 opened about 9 hours ago by

FIRSTACCOUNT69

Test discussion

#288 opened about 9 hours ago by

FIRSTACCOUNT69

albertvillanova

posted an update 3 days ago

Post

1576

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0

armanc

authored a paper 9 days ago

References Improve LLM Alignment in Non-Verifiable Domains

Paper • 2602.16802 • Published 10 days ago • 2

lewtun

submitted a paper to Daily Papers 15 days ago

Single-minus gluon tree amplitudes are nonzero

Paper • 2602.12176 • Published 17 days ago • 8

lewtun

submitted a paper to Daily Papers 17 days ago

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Paper • 2602.03773 • Published 26 days ago • 11

albertvillanova

posted an update 18 days ago

Post

1682

5 years already working in democratizing AI 🤗
Grateful to be part of such an awesome team making it happen every day.

julien-c

submitted a paper to Daily Papers 30 days ago

Shaping capabilities with token-level data filtering

Paper • 2601.21571 • Published about 1 month ago • 27

yjernite

authored a paper about 1 month ago

INTIMA: A Benchmark for Human-AI Companionship Behavior

Paper • 2508.09998 • Published Aug 4, 2025 • 11

stas

authored a paper about 1 month ago

Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences

Paper • 2506.13996 • Published Jun 16, 2025 • 1

armanc

authored a paper about 1 month ago

Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL

Paper • 2601.09876 • Published Jan 14 • 7

ybelkada

authored a paper about 2 months ago

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Paper • 2601.04890 • Published Jan 8 • 42

christopher

in bigscience/bloomz-560m 3 months ago

Fails to load with transformers v4.57+

#14 opened 3 months ago by

qgallouedec

monsoon-nlp

posted an update 3 months ago

Post

436

PatchDNA, a DNA foundation model based on Meta's BLT tokenization strategy https://www.biorxiv.org/content/10.1101/2025.11.28.691095v1

christopher

in bigscience/petals-api 4 months ago

Bloom

#2 opened 4 months ago by

Raz-Test

nouamanetazi

posted an update 4 months ago

Post

4492

After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team