In a Training Loop 🔄

9 20

aayush garg PRO

garg-aayush

https://aayushgarg.dev/

AI & ML interests

None yet

Recent Activity

published an article 5 days ago

GRPO: Building Intuition Through Ablation Studies

updated a model 5 days ago

garg-aayush/cs336-grpo-exps

published a model 5 days ago

garg-aayush/cs336-grpo-exps

Organizations

Articles 6

Article

GRPO: Building Intuition Through Ablation Studies

Article

Expert Iteration for Math Reasoning

datasets 4

garg-aayush/sft-cs336-assign5-datasets

Preview • Updated Jan 26 • 222 • 4

garg-aayush/GPT4-LLM-Cleaned-10K

Viewer • Updated May 24, 2024 • 10k • 12

garg-aayush/ultrachat-refined-100K-2048

Viewer • Updated Apr 23, 2024 • 110k • 4

garg-aayush/mini-platypus-1K

Viewer • Updated Apr 18, 2024 • 1k • 10 • 1

Collections 4

Qwen3 Technical Report

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Training language models to follow instructions with human feedback

Proximal Policy Optimization Algorithms

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

models 47

garg-aayush/cs336-grpo-exps

garg-aayush/cs336_exp-iter_exps

garg-aayush/llama31-8b-sft-mask

garg-aayush/llama31-8b-sft-nomask

garg-aayush/ckpt-140

garg-aayush/ckpt-100

garg-aayush/test

garg-aayush/llama-2-7b-miniplatypus-1K

garg-aayush/zephyr-7b-sft-qlora

garg-aayush/wolf_plushie
