# Zoof-250M

## Model Summary

Zoof is a family of compact, text-only Small Language Models (SLMs) with 250 million parameters, designed for lightweight text generation and instruction following.
This repository contains two versions:
- Zoof-250M-base: The foundational generative model pre-trained on high-quality educational data.
- Zoof-250M-chat: An instruction-tuned variant optimized for chat and Q&A tasks using Supervised Fine-Tuning (SFT).
## Usage

Please refer to the Zoof repository for full usage instructions.
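If the checkpoints are published on the Hugging Face Hub, loading them should follow the standard Transformers pattern. Below is a minimal sketch; the repo ID is a placeholder, and it assumes the weights are compatible with `AutoModelForCausalLM`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pradyumangangan/Zoof-250M-base"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short completion from the base model.
inputs = tokenizer("Photosynthesis is the process by which", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```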
## Model Details
- Developer: Pradyuman Gangan
- Architecture: Auto-regressive language model using an optimized Transformer architecture.
- Parameters: 250M
- Context Window: 1024 tokens (see the truncation sketch after this list)
- Language: English
- License: MIT
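Because the context window is capped at 1024 tokens, longer inputs need to be truncated before generation. A minimal sketch (the repo ID is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pradyumangangan/Zoof-250M-base")

long_prompt = "word " * 3000  # deliberately longer than the context window
inputs = tokenizer(
    long_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=1024,  # matches the model's context window
)
print(inputs["input_ids"].shape)  # at most (1, 1024)
```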
## Training Data
The Zoof models were trained on a rigorous curriculum of high-quality data:
### Pre-Training
- Dataset: FineWeb-Edu
- Volume: 26 billion tokens
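For reference, the pre-training corpus can be inspected with the `datasets` library; FineWeb-Edu is available on the Hugging Face Hub as `HuggingFaceFW/fineweb-edu`. The sketch below streams its 10B-token sample configuration rather than downloading the full corpus:

```python
from datasets import load_dataset

# Stream the "sample-10BT" config of FineWeb-Edu to avoid a full download.
ds = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True
)

# Peek at the first few records; each carries an educational web page in "text".
for i, example in enumerate(ds):
    print(example["text"][:200])
    if i >= 2:
        break
```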
### Instruction Tuning (Chat Version Only)

The Zoof-250M-chat model was further fine-tuned with Supervised Fine-Tuning (SFT) on a diverse mix of instruction datasets (a chat usage sketch follows the list):
- WizardLM 70k: Complex instruction following.
- LongForm: Long-text generation and structural coherence.
- Alpaca: General instruction following.
- Dolly-15k: Human-generated instructions across various categories.
- Lamini: Diverse datasets for broad capability.
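A minimal sketch of querying the chat variant, assuming the tokenizer ships a chat template (not confirmed by this card) and using a placeholder repo ID:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pradyumangangan/Zoof-250M-chat"  # hypothetical repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format a single-turn conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain gravity in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```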
## Training Results

### Pre-Training Loss Curve

*(Figure: pre-training loss curve.)*
