Extractor_Adaptor_Qwen3_0.6b

This model is a PEFT adapter fine-tuned from Qwen/Qwen3-0.6B on the web_finetune_train dataset. It achieves the following result on the evaluation set:

  • Loss: 0.0554

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 4
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 4.0
  • mixed_precision_training: Native AMP
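
The totals in the list above follow from the per-device values, and the cosine schedule with a 0.2 warmup ratio can be illustrated with a few lines of arithmetic. The sketch below is not the training code: it is a plain-Python approximation of a linear-warmup, half-cosine-decay schedule, and `total_steps` is a hypothetical argument because the card does not state the exact number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
PEAK_LR = 2e-5
WARMUP_RATIO = 0.2


def effective_batch_sizes():
    """Return (train, eval) totals: examples per optimizer step and per eval step."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES  # no accumulation during eval
    return train, evaluation


def cosine_lr_with_warmup(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Approximate LR at `step`: linear warmup to peak_lr, then cosine decay to 0.

    `total_steps` is an assumption supplied by the caller; it is not given in the card.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_sizes())  # (32, 4)
```

With a ratio of 0.2, roughly the first fifth of training is spent ramping up to 2e-05 before the decay begins.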

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.4239        | 0.1047 | 50   | 0.3683          |
| 0.1005        | 0.2095 | 100  | 0.1148          |
| 0.0853        | 0.3142 | 150  | 0.0908          |
| 0.0645        | 0.4190 | 200  | 0.0831          |
| 0.0767        | 0.5237 | 250  | 0.0755          |
| 0.0826        | 0.6284 | 300  | 0.0706          |
| 0.0882        | 0.7332 | 350  | 0.0669          |
| 0.0726        | 0.8379 | 400  | 0.0635          |
| 0.0711        | 0.9427 | 450  | 0.0620          |
| 0.0433        | 1.0461 | 500  | 0.0622          |
| 0.0488        | 1.1508 | 550  | 0.0594          |
| 0.0347        | 1.2556 | 600  | 0.0587          |
| 0.0475        | 1.3603 | 650  | 0.0591          |
| 0.0485        | 1.4650 | 700  | 0.0554          |
| 0.0421        | 1.5698 | 750  | 0.0541          |
| 0.0395        | 1.6745 | 800  | 0.0546          |
| 0.0483        | 1.7793 | 850  | 0.0520          |
| 0.0421        | 1.8840 | 900  | 0.0553          |
| 0.0735        | 1.9887 | 950  | 0.0510          |
| 0.0233        | 2.0922 | 1000 | 0.0550          |
| 0.027         | 2.1969 | 1050 | 0.0544          |
| 0.0213        | 2.3016 | 1100 | 0.0516          |
| 0.0284        | 2.4064 | 1150 | 0.0526          |
| 0.0175        | 2.5111 | 1200 | 0.0524          |
| 0.0218        | 2.6159 | 1250 | 0.0526          |
| 0.0253        | 2.7206 | 1300 | 0.0511          |
| 0.0227        | 2.8253 | 1350 | 0.0518          |
| 0.0304        | 2.9301 | 1400 | 0.0513          |
| 0.018         | 3.0335 | 1450 | 0.0516          |
| 0.0193        | 3.1383 | 1500 | 0.0543          |
| 0.0243        | 3.2430 | 1550 | 0.0560          |
| 0.0213        | 3.3477 | 1600 | 0.0553          |
| 0.0157        | 3.4525 | 1650 | 0.0553          |
| 0.0264        | 3.5572 | 1700 | 0.0551          |
| 0.0439        | 3.6620 | 1750 | 0.0549          |
| 0.0164        | 3.7667 | 1800 | 0.0550          |
| 0.0245        | 3.8714 | 1850 | 0.0550          |
| 0.0168        | 3.9762 | 1900 | 0.0550          |
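
The logged rows can be inspected programmatically, for example to find the evaluation step with the lowest validation loss or to estimate how many optimizer steps make up one epoch. The sketch below copies the step, epoch, and validation-loss columns from the table; the helper names are hypothetical, and the steps-per-epoch figure is only an estimate derived from the last logged row.

```python
# (step, epoch, validation_loss) rows copied from the training results table.
LOG = [
    (50, 0.1047, 0.3683), (100, 0.2095, 0.1148), (150, 0.3142, 0.0908),
    (200, 0.4190, 0.0831), (250, 0.5237, 0.0755), (300, 0.6284, 0.0706),
    (350, 0.7332, 0.0669), (400, 0.8379, 0.0635), (450, 0.9427, 0.0620),
    (500, 1.0461, 0.0622), (550, 1.1508, 0.0594), (600, 1.2556, 0.0587),
    (650, 1.3603, 0.0591), (700, 1.4650, 0.0554), (750, 1.5698, 0.0541),
    (800, 1.6745, 0.0546), (850, 1.7793, 0.0520), (900, 1.8840, 0.0553),
    (950, 1.9887, 0.0510), (1000, 2.0922, 0.0550), (1050, 2.1969, 0.0544),
    (1100, 2.3016, 0.0516), (1150, 2.4064, 0.0526), (1200, 2.5111, 0.0524),
    (1250, 2.6159, 0.0526), (1300, 2.7206, 0.0511), (1350, 2.8253, 0.0518),
    (1400, 2.9301, 0.0513), (1450, 3.0335, 0.0516), (1500, 3.1383, 0.0543),
    (1550, 3.2430, 0.0560), (1600, 3.3477, 0.0553), (1650, 3.4525, 0.0553),
    (1700, 3.5572, 0.0551), (1750, 3.6620, 0.0549), (1800, 3.7667, 0.0550),
    (1850, 3.8714, 0.0550), (1900, 3.9762, 0.0550),
]


def best_checkpoint(log):
    """Return the (step, epoch, val_loss) row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def steps_per_epoch(log):
    """Estimate optimizer steps per epoch from the last logged row."""
    step, epoch, _ = log[-1]
    return step / epoch


step, epoch, val_loss = best_checkpoint(LOG)
print(f"lowest validation loss {val_loss:.4f} at step {step} (epoch {epoch})")
print(f"about {steps_per_epoch(LOG):.0f} optimizer steps per epoch")
```

In these logs the lowest validation loss appears near the end of the second epoch, after which validation loss stays roughly flat while training loss keeps falling. That pattern may indicate mild overfitting in later epochs, though the card does not say which checkpoint was published.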

Framework versions

  • PEFT 0.17.1
  • Transformers 4.57.1
  • Pytorch 2.8.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1

Model tree for abdo-Mansour/Extractor_Adaptor_Qwen3_0.6b

This model is an adapter for Qwen/Qwen3-0.6B.
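
Because the model is published as an adapter on top of Qwen/Qwen3-0.6B, using it typically means loading the base model first and then attaching the adapter weights. The following is a minimal sketch, assuming the adapter repository is in standard PEFT format and that `torch`, `transformers`, and `peft` are installed. The helper names `load_adapter_model` and `generate` are hypothetical, and the expected prompt format is an open question because the card does not document it.

```python
BASE_MODEL_ID = "Qwen/Qwen3-0.6B"
ADAPTER_ID = "abdo-Mansour/Extractor_Adaptor_Qwen3_0.6b"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (downloads weights).

    Imports are local so that defining this helper does not itself require
    transformers or peft to be installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base model's tokenizer is the right one for the adapter.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=256):
    """Generate a completion and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `tokenizer, model = load_adapter_model()` followed by `generate(tokenizer, model, "...")` would exercise the adapter; the prompt text is left elided here because the training prompt template is not described in the card.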
