Built with Axolotl

See axolotl config

axolotl version: 0.12.2

# In case of weird errors, try reinstalling
# pip install --no-build-isolation axolotl[deepspeed]
# (unsloth libraries are incompatible)
base_model: Qwen/Qwen3-14B

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Sunbird/ug40-instructions
    name: pretraining_text_qwen
    split: train
    text_column: text
    type: completion

test_datasets:
  - path: Sunbird/ug40-instructions
    name:  pretraining_text_qwen
    split: dev
    text_column: text
    type: completion
      
sequence_len: 512
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

gradient_accumulation_steps: 2 # Remember to check number of GPUs on the instance
micro_batch_size: 8 # 4 on 4xH100, 16 on 8xH100
num_epochs: 2
optimizer: adamw_torch_fused
learning_rate: 5e-5
lr_scheduler: cosine
weight_decay: 0.01
max_grad_norm: 1.0

train_on_inputs: 
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
xformers_attention:
flash_attention: true
eager_attention: 

loss_watchdog_threshold: 10.0
loss_watchdog_patience: 3


warmup_ratio: 0.01
eval_steps: 200
#save_steps: 5000 
logging_steps: 5
save_strategy: epoch
save_only_model: true
hub_model_id: sunflower-qwen32b-pretrained
hub_strategy: end


# auto_resume_from_checkpoints: true
debug:

deepspeed: zero3_bf16.json
  
dataset_prepared_path: last_run_prepared
output_dir: ./outputs-14b-bs64/

use_wandb: true
use_mlflow: true
wandb_project: ug40-pretraining
# wandb_name also sets mlflow run name
wandb_name: qwen3-14b-bs64-lr5e-5
mlflow_tracking_uri: https://mlflow.sunbird.ai
mlflow_experiment_name: ug40-pretraining
# mlflow_run_name: qwen3-14b-convergence-test-lr5e-5
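
The datasets and test_datasets blocks above both point at the Sunbird/ug40-instructions dataset on the Hub. A minimal sketch of loading those splits with the Hugging Face datasets library, assuming the pretraining_text_qwen configuration and the train/dev splits exist exactly as named in the config:

```python
# Minimal sketch: load the splits referenced in the axolotl config above.
# Assumes the "pretraining_text_qwen" configuration and the train/dev splits
# exist under Sunbird/ug40-instructions as written there.
from datasets import load_dataset

train_ds = load_dataset("Sunbird/ug40-instructions", "pretraining_text_qwen", split="train")
dev_ds = load_dataset("Sunbird/ug40-instructions", "pretraining_text_qwen", split="dev")

print(train_ds)                    # row count and column names
print(train_ds[0]["text"][:200])   # the `text` column used for completion-style training
```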

sunflower-qwen32b-pretrained

This model is a fine-tuned version of Qwen/Qwen3-14B on the Sunbird/ug40-instructions dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4608
  • Max memory active (GiB): 111.49
  • Max memory allocated (GiB): 108.55
  • Device memory reserved (GiB): 115.27
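
Since the config pushes the final weights to the Hub (hub_model_id / hub_strategy above), the checkpoint can be loaded like any Qwen3 causal LM. Below is a minimal sketch with transformers; the repository id is an assumption based on the hub_model_id, and since this is a continued-pretraining (completion) checkpoint it is prompted with plain text rather than a chat template:

```python
# Minimal usage sketch, not an official example. The repository id is an
# assumption based on the hub_model_id in the training config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jq/sunflower-qwen32b-pretrained"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# Completion-style prompt (no chat template), matching the training objective.
inputs = tokenizer("The weather in Kampala today is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```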

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 130
  • training_steps: 13062
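
The two total batch sizes follow directly from the per-device settings; a quick arithmetic check (the exact warmup-step rounding depends on the trainer, so the last line is approximate):

```python
# Sanity check of the derived hyperparameters reported above.
micro_batch_size = 8        # per-device train/eval batch size
grad_accum_steps = 2        # gradient_accumulation_steps
num_devices = 4             # GPUs in this run

print(micro_batch_size * grad_accum_steps * num_devices)  # 64 = total_train_batch_size
print(micro_batch_size * num_devices)                     # 32 = total_eval_batch_size (no accumulation at eval)
print(0.01 * 13062)  # ~130.6 -> ~130 warmup steps from warmup_ratio * training_steps
```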

Training results

Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB)
No log 0 0 5.0386 38.18 34.21 40.07
2.0527 0.0306 200 4.0322 110.22 108.55 113.79
1.7769 0.0612 400 3.8333 111.49 108.55 114.85
1.7225 0.0919 600 3.7381 111.49 108.55 114.91
1.6762 0.1225 800 3.6606 111.49 108.55 114.91
1.5949 0.1531 1000 3.6150 111.49 108.55 114.91
1.5612 0.1837 1200 3.5607 111.49 108.55 114.91
1.5824 0.2143 1400 3.5324 111.49 108.55 114.91
1.5325 0.2450 1600 3.5128 111.49 108.55 114.91
1.4973 0.2756 1800 3.4702 111.49 108.55 114.91
1.4776 0.3062 2000 3.4229 111.49 108.55 114.91
1.499 0.3368 2200 3.3929 111.49 108.55 114.91
1.4651 0.3675 2400 3.3994 111.49 108.55 115.1
1.405 0.3981 2600 3.3974 111.49 108.55 115.1
1.4373 0.4287 2800 3.3844 111.49 108.55 115.1
1.4405 0.4593 3000 3.3675 111.49 108.55 115.1
1.3784 0.4899 3200 3.3564 111.49 108.55 115.1
1.3628 0.5206 3400 3.3485 111.49 108.55 115.1
1.4202 0.5512 3600 3.3180 111.49 108.55 115.1
1.3739 0.5818 3800 3.3114 111.49 108.55 115.1
1.3964 0.6124 4000 3.2930 111.49 108.55 115.1
1.3683 0.6430 4200 3.2908 111.49 108.55 115.1
1.3018 0.6737 4400 3.2860 111.49 108.55 115.1
1.3279 0.7043 4600 3.2564 111.49 108.55 115.1
1.328 0.7349 4800 3.2489 111.49 108.55 115.1
1.2789 0.7655 5000 3.2754 111.49 108.55 115.1
1.3026 0.7961 5200 3.2497 111.49 108.55 115.1
1.3379 0.8268 5400 3.2299 111.49 108.55 115.1
1.2975 0.8574 5600 3.2398 111.49 108.55 115.1
1.273 0.8880 5800 3.2268 111.49 108.55 115.1
1.2248 0.9186 6000 3.2236 111.49 108.55 115.1
1.2817 0.9492 6200 3.2055 111.49 108.55 115.1
1.2342 0.9799 6400 3.2175 111.49 108.55 115.1
1.1946 1.0104 6600 3.2020 111.49 108.55 115.1
1.1061 1.0410 6800 3.2278 111.49 108.55 115.1
1.0545 1.0717 7000 3.2217 111.49 108.55 115.1
1.0494 1.1023 7200 3.2425 111.49 108.55 115.1
1.0075 1.1329 7400 3.2449 111.49 108.55 115.1
0.9926 1.1635 7600 3.2459 111.49 108.55 115.1
0.9304 1.1941 7800 3.2731 111.49 108.55 115.1
1.019 1.2248 8000 3.2641 111.49 108.55 115.1
0.9183 1.2554 8200 3.2855 111.49 108.55 115.1
0.8923 1.2860 8400 3.2828 111.49 108.55 115.1
0.9785 1.3166 8600 3.3064 111.49 108.55 115.1
0.8967 1.3472 8800 3.2938 111.49 108.55 115.1
0.9023 1.3779 9000 3.3157 111.49 108.55 115.1
0.8973 1.4085 9200 3.3443 111.49 108.55 115.1
0.8528 1.4391 9400 3.3421 111.49 108.55 115.1
0.8806 1.4697 9600 3.3535 111.49 108.55 115.1
0.8064 1.5003 9800 3.3802 111.49 108.55 115.1
0.8365 1.5310 10000 3.4050 111.49 108.55 115.1
0.8578 1.5616 10200 3.3657 111.49 108.55 115.1
0.8425 1.5922 10400 3.3913 111.49 108.55 115.1
0.8507 1.6228 10600 3.4103 111.49 108.55 115.1
0.8424 1.6534 10800 3.4128 111.49 108.55 115.1
0.8283 1.6841 11000 3.4131 111.49 108.55 115.1
0.8331 1.7147 11200 3.4259 111.49 108.55 115.1
0.7861 1.7453 11400 3.4313 111.49 108.55 115.1
0.8445 1.7759 11600 3.4338 111.49 108.55 115.1
0.82 1.8066 11800 3.4356 111.49 108.55 115.1
0.791 1.8372 12000 3.4470 111.49 108.55 115.27
0.8259 1.8678 12200 3.4476 111.49 108.55 115.27
0.7753 1.8984 12400 3.4526 111.49 108.55 115.27
0.8402 1.9290 12600 3.4590 111.49 108.55 115.27
0.7771 1.9597 12800 3.4606 111.49 108.55 115.27
0.7722 1.9903 13000 3.4608 111.49 108.55 115.27
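
Assuming the validation loss is the usual mean per-token cross-entropy for a completion objective, it can be read as a perplexity; the lowest value in the table (3.2020 at step 6600, early in epoch 2) is noticeably better than the final 3.4608:

```python
import math

# Convert per-token cross-entropy loss to perplexity (assumes natural-log loss,
# the standard convention in transformers).
final_val_loss = 3.4608   # last row of the table
best_val_loss = 3.2020    # lowest validation loss, step 6600

print(math.exp(final_val_loss))  # ~31.8
print(math.exp(best_val_loss))   # ~24.6
```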

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.7.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
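
A quick way to check that a local environment matches the versions listed above (each of these packages exposes a __version__ attribute):

```python
# Print installed versions to compare against the training environment.
import datasets
import tokenizers
import torch
import transformers

print("transformers", transformers.__version__)  # trained with 4.55.2
print("torch", torch.__version__)                # trained with 2.7.1+cu128
print("datasets", datasets.__version__)          # trained with 4.0.0
print("tokenizers", tokenizers.__version__)      # trained with 0.21.4
```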