See axolotl config
axolotl version: 0.12.2
# In case of weird errors, try reinstalling
# pip install --no-build-isolation axolotl[deepspeed]
# (unsloth libraries are incompatible)
base_model: Qwen/Qwen3-14B
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: Sunbird/ug40-instructions
    name: pretraining_text_qwen
    split: train
    text_column: text
    type: completion
test_datasets:
  - path: Sunbird/ug40-instructions
    name: pretraining_text_qwen
    split: dev
    text_column: text
    type: completion
sequence_len: 512
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
gradient_accumulation_steps: 2 # Remember to check number of GPUs on the instance
micro_batch_size: 8 # 4 on 4xH100, 16 on 8xH100
num_epochs: 2
optimizer: adamw_torch_fused
learning_rate: 5e-5
lr_scheduler: cosine
weight_decay: 0.01
max_grad_norm: 1.0
train_on_inputs:
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
xformers_attention:
flash_attention: true
eager_attention:
loss_watchdog_threshold: 10.0
loss_watchdog_patience: 3
warmup_ratio: 0.01
eval_steps: 200
#save_steps: 5000
logging_steps: 5
save_strategy: epoch
save_only_model: true
hub_model_id: sunflower-qwen32b-pretrained
hub_strategy: end
# auto_resume_from_checkpoints: true
debug:
deepspeed: zero3_bf16.json
dataset_prepared_path: last_run_prepared
output_dir: ./outputs-14b-bs64/
use_wandb: true
use_mlflow: true
wandb_project: ug40-pretraining
# wandb_name also sets mlflow run name
wandb_name: qwen3-14b-bs64-lr5e-5
mlflow_tracking_uri: https://mlflow.sunbird.ai
mlflow_experiment_name: ug40-pretraining
# mlflow_run_name: qwen3-14b-convergence-test-lr5e-5
sunflower-qwen32b-pretrained
This model is a fine-tuned version of Qwen/Qwen3-14B on the Sunbird/ug40-instructions dataset. It achieves the following results on the evaluation set at the final evaluation (step 13000):
- Loss: 3.4608
- Memory/max Mem Active(gib): 111.49
- Memory/max Mem Allocated(gib): 108.55
- Memory/device Mem Reserved(gib): 115.27
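For readers who prefer perplexity, the reported loss can be converted with a short calculation. This is only a sketch: it assumes the logged value is the mean per-token cross-entropy in nats, which this card does not state explicitly.

```python
import math

# Final evaluation loss reported above. Assumption (not stated in the card):
# this is the mean per-token cross-entropy, measured in nats.
eval_loss = 3.4608

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"approximate eval perplexity: {perplexity:.1f}")
```

If the loss were averaged differently (for example per sequence), this conversion would not apply.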
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 130
- training_steps: 13062
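The derived values in this list follow from the config by simple arithmetic. The sketch below reproduces them from numbers copied out of the card. The variable names are illustrative, and rounding the warmup down is an assumption that happens to match the reported 130 steps; the framework's exact rounding rule is not confirmed here.

```python
import math

# Values copied from the config and the hyperparameter list above.
micro_batch_size = 8             # per-device train batch size
eval_batch_size = 8              # per-device eval batch size
gradient_accumulation_steps = 2
num_devices = 4
sequence_len = 512
training_steps = 13062
warmup_ratio = 0.01

# Sequences consumed per optimizer step across all devices.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Evaluation does not accumulate gradients, so only devices multiply in.
total_eval_batch_size = eval_batch_size * num_devices

# Upper bound on tokens per optimizer step, assuming every packed
# sequence is filled to sequence_len.
tokens_per_step = total_train_batch_size * sequence_len

# Assumption: warmup steps = ratio * total steps, rounded down.
warmup_steps = math.floor(warmup_ratio * training_steps)

print(total_train_batch_size, total_eval_batch_size, tokens_per_step, warmup_steps)
```

Changing the number of GPUs without adjusting `micro_batch_size` or `gradient_accumulation_steps` changes the effective batch size, which is what the comments in the config warn about.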
Training results
| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 5.0386 | 38.18 | 34.21 | 40.07 |
| 2.0527 | 0.0306 | 200 | 4.0322 | 110.22 | 108.55 | 113.79 |
| 1.7769 | 0.0612 | 400 | 3.8333 | 111.49 | 108.55 | 114.85 |
| 1.7225 | 0.0919 | 600 | 3.7381 | 111.49 | 108.55 | 114.91 |
| 1.6762 | 0.1225 | 800 | 3.6606 | 111.49 | 108.55 | 114.91 |
| 1.5949 | 0.1531 | 1000 | 3.6150 | 111.49 | 108.55 | 114.91 |
| 1.5612 | 0.1837 | 1200 | 3.5607 | 111.49 | 108.55 | 114.91 |
| 1.5824 | 0.2143 | 1400 | 3.5324 | 111.49 | 108.55 | 114.91 |
| 1.5325 | 0.2450 | 1600 | 3.5128 | 111.49 | 108.55 | 114.91 |
| 1.4973 | 0.2756 | 1800 | 3.4702 | 111.49 | 108.55 | 114.91 |
| 1.4776 | 0.3062 | 2000 | 3.4229 | 111.49 | 108.55 | 114.91 |
| 1.499 | 0.3368 | 2200 | 3.3929 | 111.49 | 108.55 | 114.91 |
| 1.4651 | 0.3675 | 2400 | 3.3994 | 111.49 | 108.55 | 115.1 |
| 1.405 | 0.3981 | 2600 | 3.3974 | 111.49 | 108.55 | 115.1 |
| 1.4373 | 0.4287 | 2800 | 3.3844 | 111.49 | 108.55 | 115.1 |
| 1.4405 | 0.4593 | 3000 | 3.3675 | 111.49 | 108.55 | 115.1 |
| 1.3784 | 0.4899 | 3200 | 3.3564 | 111.49 | 108.55 | 115.1 |
| 1.3628 | 0.5206 | 3400 | 3.3485 | 111.49 | 108.55 | 115.1 |
| 1.4202 | 0.5512 | 3600 | 3.3180 | 111.49 | 108.55 | 115.1 |
| 1.3739 | 0.5818 | 3800 | 3.3114 | 111.49 | 108.55 | 115.1 |
| 1.3964 | 0.6124 | 4000 | 3.2930 | 111.49 | 108.55 | 115.1 |
| 1.3683 | 0.6430 | 4200 | 3.2908 | 111.49 | 108.55 | 115.1 |
| 1.3018 | 0.6737 | 4400 | 3.2860 | 111.49 | 108.55 | 115.1 |
| 1.3279 | 0.7043 | 4600 | 3.2564 | 111.49 | 108.55 | 115.1 |
| 1.328 | 0.7349 | 4800 | 3.2489 | 111.49 | 108.55 | 115.1 |
| 1.2789 | 0.7655 | 5000 | 3.2754 | 111.49 | 108.55 | 115.1 |
| 1.3026 | 0.7961 | 5200 | 3.2497 | 111.49 | 108.55 | 115.1 |
| 1.3379 | 0.8268 | 5400 | 3.2299 | 111.49 | 108.55 | 115.1 |
| 1.2975 | 0.8574 | 5600 | 3.2398 | 111.49 | 108.55 | 115.1 |
| 1.273 | 0.8880 | 5800 | 3.2268 | 111.49 | 108.55 | 115.1 |
| 1.2248 | 0.9186 | 6000 | 3.2236 | 111.49 | 108.55 | 115.1 |
| 1.2817 | 0.9492 | 6200 | 3.2055 | 111.49 | 108.55 | 115.1 |
| 1.2342 | 0.9799 | 6400 | 3.2175 | 111.49 | 108.55 | 115.1 |
| 1.1946 | 1.0104 | 6600 | 3.2020 | 111.49 | 108.55 | 115.1 |
| 1.1061 | 1.0410 | 6800 | 3.2278 | 111.49 | 108.55 | 115.1 |
| 1.0545 | 1.0717 | 7000 | 3.2217 | 111.49 | 108.55 | 115.1 |
| 1.0494 | 1.1023 | 7200 | 3.2425 | 111.49 | 108.55 | 115.1 |
| 1.0075 | 1.1329 | 7400 | 3.2449 | 111.49 | 108.55 | 115.1 |
| 0.9926 | 1.1635 | 7600 | 3.2459 | 111.49 | 108.55 | 115.1 |
| 0.9304 | 1.1941 | 7800 | 3.2731 | 111.49 | 108.55 | 115.1 |
| 1.019 | 1.2248 | 8000 | 3.2641 | 111.49 | 108.55 | 115.1 |
| 0.9183 | 1.2554 | 8200 | 3.2855 | 111.49 | 108.55 | 115.1 |
| 0.8923 | 1.2860 | 8400 | 3.2828 | 111.49 | 108.55 | 115.1 |
| 0.9785 | 1.3166 | 8600 | 3.3064 | 111.49 | 108.55 | 115.1 |
| 0.8967 | 1.3472 | 8800 | 3.2938 | 111.49 | 108.55 | 115.1 |
| 0.9023 | 1.3779 | 9000 | 3.3157 | 111.49 | 108.55 | 115.1 |
| 0.8973 | 1.4085 | 9200 | 3.3443 | 111.49 | 108.55 | 115.1 |
| 0.8528 | 1.4391 | 9400 | 3.3421 | 111.49 | 108.55 | 115.1 |
| 0.8806 | 1.4697 | 9600 | 3.3535 | 111.49 | 108.55 | 115.1 |
| 0.8064 | 1.5003 | 9800 | 3.3802 | 111.49 | 108.55 | 115.1 |
| 0.8365 | 1.5310 | 10000 | 3.4050 | 111.49 | 108.55 | 115.1 |
| 0.8578 | 1.5616 | 10200 | 3.3657 | 111.49 | 108.55 | 115.1 |
| 0.8425 | 1.5922 | 10400 | 3.3913 | 111.49 | 108.55 | 115.1 |
| 0.8507 | 1.6228 | 10600 | 3.4103 | 111.49 | 108.55 | 115.1 |
| 0.8424 | 1.6534 | 10800 | 3.4128 | 111.49 | 108.55 | 115.1 |
| 0.8283 | 1.6841 | 11000 | 3.4131 | 111.49 | 108.55 | 115.1 |
| 0.8331 | 1.7147 | 11200 | 3.4259 | 111.49 | 108.55 | 115.1 |
| 0.7861 | 1.7453 | 11400 | 3.4313 | 111.49 | 108.55 | 115.1 |
| 0.8445 | 1.7759 | 11600 | 3.4338 | 111.49 | 108.55 | 115.1 |
| 0.82 | 1.8066 | 11800 | 3.4356 | 111.49 | 108.55 | 115.1 |
| 0.791 | 1.8372 | 12000 | 3.4470 | 111.49 | 108.55 | 115.27 |
| 0.8259 | 1.8678 | 12200 | 3.4476 | 111.49 | 108.55 | 115.27 |
| 0.7753 | 1.8984 | 12400 | 3.4526 | 111.49 | 108.55 | 115.27 |
| 0.8402 | 1.9290 | 12600 | 3.4590 | 111.49 | 108.55 | 115.27 |
| 0.7771 | 1.9597 | 12800 | 3.4606 | 111.49 | 108.55 | 115.27 |
| 0.7722 | 1.9903 | 13000 | 3.4608 | 111.49 | 108.55 | 115.27 |
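The table shows validation loss falling through the first epoch and rising during the second while training loss keeps dropping. A minimal sketch for locating the lowest point is given below. It uses a hand-copied subset of (step, validation loss) rows from the table, including the row with the lowest value in the full table; the list name `rows` is illustrative.

```python
# (step, validation_loss) pairs copied from a subset of the table above.
rows = [
    (0, 5.0386),
    (2000, 3.4229),
    (4000, 3.2930),
    (6000, 3.2236),
    (6200, 3.2055),
    (6400, 3.2175),
    (6600, 3.2020),
    (6800, 3.2278),
    (8000, 3.2641),
    (10000, 3.4050),
    (13000, 3.4608),
]

# Step with the lowest validation loss among the copied rows.
best_step, best_loss = min(rows, key=lambda row: row[1])

# The last row is the final evaluation reported at the top of the card.
final_step, final_loss = rows[-1]

print(f"lowest validation loss {best_loss:.4f} at step {best_step}")
print(f"final validation loss {final_loss:.4f} at step {final_step}")
print(f"increase from lowest to final: {final_loss - best_loss:.4f}")
```

Because the config uses `save_strategy: epoch`, a checkpoint does not necessarily exist at the step with the lowest validation loss.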
Framework versions
- Transformers 4.55.2
- Pytorch 2.7.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4