Zoof-250M

Model Summary

Zoof is a family of compact, text-only Small Language Models (SLMs) with 250 million parameters, designed for lightweight text generation and instruction following.

This repository contains two versions:

  • Zoof-250M-base: The foundational generative model pre-trained on high-quality educational data.
  • Zoof-250M-chat: An instruction-tuned variant optimized for chat and Q&A tasks using Supervised Fine-Tuning (SFT).

Usage

Please refer to the Zoof repository.
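
The card does not include a runnable snippet, so here is a minimal sketch using the Transformers library. The Hub ids Jiraya/zoof-250M-base and Jiraya/zoof-250M-chat are assumptions based on this card's repository path; adjust them if the weights are hosted elsewhere.

```python
# Minimal sketch: sampling from the base model with Transformers.
# Assumption: the weights are hosted on the Hub as Jiraya/zoof-250M-base.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jiraya/zoof-250M-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The water cycle begins when", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,   # keep prompt + output within the 1,024-token context
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The chat variant is usually driven through the tokenizer's chat template; this sketch assumes Zoof-250M-chat bundles one:

```python
# Sketch for the chat model. Assumptions: the Hub id Jiraya/zoof-250M-chat
# exists and its tokenizer ships a chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

chat_id = "Jiraya/zoof-250M-chat"
tokenizer = AutoTokenizer.from_pretrained(chat_id)
model = AutoModelForCausalLM.from_pretrained(chat_id)

messages = [{"role": "user", "content": "Explain photosynthesis in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```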

Model Details

  • Developer: Pradyuman Gangan
  • Architecture: Auto-regressive language model using an optimized Transformer architecture.
  • Parameters: 250M
  • Context Window: 1,024 tokens (see the truncation sketch after this list)
  • Language: English
  • Precision: F32 (Safetensors)
  • License: MIT
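
Because the context window is only 1,024 tokens, longer prompts must be trimmed before generation so the output still fits. A minimal sketch, again assuming the Jiraya/zoof-250M-base Hub id:

```python
# Sketch: trim a prompt so prompt + generation fits the 1,024-token window.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 1024   # from the Model Details list above
MAX_NEW_TOKENS = 128    # illustrative generation budget

tokenizer = AutoTokenizer.from_pretrained("Jiraya/zoof-250M-base")  # assumed Hub id
prompt_ids = tokenizer("A very long prompt ...").input_ids

budget = CONTEXT_WINDOW - MAX_NEW_TOKENS
if len(prompt_ids) > budget:
    prompt_ids = prompt_ids[-budget:]   # keep the most recent tokens
```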

Training Data

The Zoof models were trained on a rigorous curriculum of high-quality data:

Pre-Training

Instruction Tuning (Chat Version Only)

The Zoof-250M-chat model was further fine-tuned using Supervised Fine-Tuning (SFT) on a diverse mix of instruction datasets (an illustrative formatting sketch follows the list):

  • WizardLM 70k: Complex instruction following.
  • LongForm: Long-text generation and structural coherence.
  • Alpaca: General instruction following.
  • Dolly-15k: Human-generated instructions across various categories.
  • Lamini: Diverse datasets for broad capability.
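
The exact prompt template used during SFT is not documented in this card. As an illustration only, instruction records of this kind are often flattened into single training strings with an Alpaca-style template; the template text and field names below are assumptions, not the confirmed Zoof recipe.

```python
# Illustrative only: a common Alpaca-style SFT formatting function.
# The actual template used for Zoof-250M-chat is not documented here.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_example(record: dict) -> str:
    """Flatten one instruction record into a single training string."""
    instruction = record["instruction"]
    if record.get("input"):                 # optional extra context field
        instruction = f"{instruction}\n\n{record['input']}"
    return ALPACA_TEMPLATE.format(instruction=instruction, response=record["output"])

print(format_example({
    "instruction": "Name three primary colors.",
    "input": "",
    "output": "Red, blue, and yellow.",
}))
```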

Training Results

Pre-Training Loss Curve

[Figure: pre-training loss curve]
