Community Blog & Articles

Community Articles

The Optimal Architecture for Small Language Models

Deriving the PPO Loss from First Principles

LLM based Audio models

Encoding the World's Medical Knowledge into 970K

Skill is All You Need: Lessons from Building Marketing Agents at Noumena

Qwen-Image-i2L: Training Strategies for Image-to-LoRA Generation

Uncensor any LLM with abliteration

What makes good reasoning data

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries

How to make NeuTTS-air generate over 200 seconds of audio in a single second.

Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance

Code a simple RAG from scratch

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Why You Should Care About Partial Differential Equations (PDEs)

Make and publish your Reachy Mini App

cua-bench: A Framework for Benchmarking, Training Data, and RL Environments for Computer-Use Agents

Understanding InstaFlow/Rectified Flow

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

KV Caching Explained: Optimizing Transformer Inference Efficiency

open-source-collab, nlp

Retrieval Augmented Generation with Huggingface Transformers and Ray

February 10, 2021

open-source-collab

Hugging Face on PyTorch / XLA TPUs

February 9, 2021

Faster TensorFlow models in Hugging Face Transformers

January 26, 2021

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

January 19, 2021

How we sped up transformer inference 100x for 🤗 API customers

January 18, 2021

Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models

November 9, 2020

open-source-collab, nlp

Porting fairseq wmt19 translation system to transformers

November 3, 2020

open-source-collab, nlp

Hyperparameter Search with Transformers and Ray Tune

November 2, 2020

Transformer-based Encoder-Decoder Models

October 10, 2020

Block Sparse Matrices for Smaller and Faster Language Models

September 10, 2020

The Reformer - Pushing the limits of language modeling

How to generate text: using different decoding methods for language generation with Transformers

How to train a new language model from scratch using Transformers and Tokenizers

February 14, 2020
