# Zoof-250M

## Model Summary

Zoof is a family of compact, text-only Small Language Models (SLMs) with 250 million parameters, designed for lightweight text generation and instruction following.
This repository contains two versions:
- Zoof-250M-base: The foundational generative model pre-trained on high-quality educational data.
- Zoof-250M-chat: An instruction-tuned variant optimized for chat and Q&A tasks using Supervised Fine-Tuning (SFT).
## Usage

Please refer to the Zoof repository for full usage instructions.
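If the checkpoints are published on the Hugging Face Hub, loading them should follow the standard Transformers pattern. Below is a minimal sketch; the repo ID is a placeholder, and it assumes the weights are compatible with `AutoModelForCausalLM`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pradyumangangan/Zoof-250M-base"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short completion from the base model.
inputs = tokenizer("Photosynthesis is the process by which", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```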
## Model Details
- Developer: Pradyuman Gangan
- Architecture: Auto-regressive language model using an optimized Transformer architecture.
- Parameters: 250M
- Context Window: 1024 tokens (see the truncation sketch after this list)
- Language: English
- License: MIT
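Because the context window is capped at 1024 tokens, longer inputs need to be truncated before generation. A minimal sketch (the repo ID is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pradyumangangan/Zoof-250M-base")

long_prompt = "word " * 3000  # deliberately longer than the context window
inputs = tokenizer(
    long_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=1024,  # matches the model's context window
)
print(inputs["input_ids"].shape)  # at most (1, 1024)
```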
## Training Data
The Zoof models were trained on a rigorous curriculum of high-quality data:
### Pre-Training
- Dataset: FineWeb-Edu
- Volume: 26 billion tokens
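For reference, the pre-training corpus can be inspected with the `datasets` library; FineWeb-Edu is available on the Hugging Face Hub as `HuggingFaceFW/fineweb-edu`. The sketch below streams its 10B-token sample configuration rather than downloading the full corpus:

```python
from datasets import load_dataset

# Stream the "sample-10BT" config of FineWeb-Edu to avoid a full download.
ds = load_dataset(
    "HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True
)

# Peek at the first few records; each carries an educational web page in "text".
for i, example in enumerate(ds):
    print(example["text"][:200])
    if i >= 2:
        break
```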
### Instruction Tuning (Chat Version Only)

The Zoof-250M-chat model was further fine-tuned with Supervised Fine-Tuning (SFT) on a diverse mix of instruction datasets (a chat usage sketch follows the list):
- WizardLM 70k: Complex instruction following.
- LongForm: Long-text generation and structural coherence.
- Alpaca: General instruction following.
- Dolly-15k: Human-generated instructions across various categories.
- Lamini: Diverse datasets for broad capability.
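A minimal sketch of querying the chat variant, assuming the tokenizer ships a chat template (not confirmed by this card) and using a placeholder repo ID:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pradyumangangan/Zoof-250M-chat"  # hypothetical repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format a single-turn conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain gravity in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```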
## Training Results

### Pre-Training Loss Curve

*(Figure: pre-training loss curve.)*
