Extractor_Adaptor_Qwen3_0.6b

This model is a fine-tuned version of Qwen/Qwen3-0.6B on the web_finetune_train dataset. At the last logged evaluation (step 1900, epoch 3.9762) it reaches a validation loss of 0.0550 on the evaluation set.

Model description

More information needed


The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 8
total_train_batch_size: 32
total_eval_batch_size: 4
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.2
num_epochs: 4.0
mixed_precision_training: Native AMP
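
The batch-size and scheduler settings above interact, so a small sketch may help. It shows how total_train_batch_size follows from the per-device batch size, device count, and gradient accumulation, and how a cosine schedule with linear warmup would shape the learning rate. The helper names and TOTAL_STEPS are assumptions for illustration: the card does not state the total step count, so it is estimated here from the results table, and the schedule formula is the common linear-warmup-then-cosine form rather than a confirmed copy of the training code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 2      # per device
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.2

# ASSUMPTION: not stated in the card. Estimated from the results table,
# where step 1900 corresponds to epoch 3.9762 (so 4 epochs is roughly 1911 steps).
TOTAL_STEPS = 1911


def effective_batch_size(per_device, devices, accum):
    """Number of samples contributing to one optimizer update."""
    return per_device * devices * accum


def cosine_lr(step, total_steps=TOTAL_STEPS, peak=LEARNING_RATE,
              warmup_ratio=WARMUP_RATIO):
    """Linear warmup to `peak`, then cosine decay to zero (hypothetical helper)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))
    for s in (0, 100, 383, 950, 1900):
        print(f"step {s:>4}: lr = {cosine_lr(s):.3e}")
```

Under these assumptions the learning rate would peak near step 383 and decay for the remaining three or so epochs.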

Training Loss	Epoch	Step	Validation Loss
0.4239	0.1047	50	0.3683
0.1005	0.2095	100	0.1148
0.0853	0.3142	150	0.0908
0.0645	0.4190	200	0.0831
0.0767	0.5237	250	0.0755
0.0826	0.6284	300	0.0706
0.0882	0.7332	350	0.0669
0.0726	0.8379	400	0.0635
0.0711	0.9427	450	0.0620
0.0433	1.0461	500	0.0622
0.0488	1.1508	550	0.0594
0.0347	1.2556	600	0.0587
0.0475	1.3603	650	0.0591
0.0485	1.4650	700	0.0554
0.0421	1.5698	750	0.0541
0.0395	1.6745	800	0.0546
0.0483	1.7793	850	0.0520
0.0421	1.8840	900	0.0553
0.0735	1.9887	950	0.0510
0.0233	2.0922	1000	0.0550
0.0270	2.1969	1050	0.0544
0.0213	2.3016	1100	0.0516
0.0284	2.4064	1150	0.0526
0.0175	2.5111	1200	0.0524
0.0218	2.6159	1250	0.0526
0.0253	2.7206	1300	0.0511
0.0227	2.8253	1350	0.0518
0.0304	2.9301	1400	0.0513
0.0180	3.0335	1450	0.0516
0.0193	3.1383	1500	0.0543
0.0243	3.2430	1550	0.0560
0.0213	3.3477	1600	0.0553
0.0157	3.4525	1650	0.0553
0.0264	3.5572	1700	0.0551
0.0439	3.6620	1750	0.0549
0.0164	3.7667	1800	0.0550
0.0245	3.8714	1850	0.0550
0.0168	3.9762	1900	0.0550
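
To make the trend in the table easier to read, the sketch below locates the checkpoint with the lowest validation loss and compares it with the last logged one. The validation losses are copied from the table; the variable names are illustrative, and the step list is rebuilt on the assumption (consistent with the table) that evaluation ran every 50 steps.

```python
# Validation losses copied from the table above, in step order.
VAL_LOSS = [
    0.3683, 0.1148, 0.0908, 0.0831, 0.0755, 0.0706, 0.0669, 0.0635,
    0.0620, 0.0622, 0.0594, 0.0587, 0.0591, 0.0554, 0.0541, 0.0546,
    0.0520, 0.0553, 0.0510, 0.0550, 0.0544, 0.0516, 0.0526, 0.0524,
    0.0526, 0.0511, 0.0518, 0.0513, 0.0516, 0.0543, 0.0560, 0.0553,
    0.0553, 0.0551, 0.0549, 0.0550, 0.0550, 0.0550,
]
EVAL_EVERY = 50  # evaluation interval in steps, as seen in the table

# Rebuild the step column: 50, 100, ..., 1900.
steps = [EVAL_EVERY * (i + 1) for i in range(len(VAL_LOSS))]

# Index of the lowest validation loss (first occurrence on ties).
best_idx = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__)
best_step, best_loss = steps[best_idx], VAL_LOSS[best_idx]
final_step, final_loss = steps[-1], VAL_LOSS[-1]

if __name__ == "__main__":
    print(f"best : step {best_step}, validation loss {best_loss:.4f}")
    print(f"final: step {final_step}, validation loss {final_loss:.4f}")
```

The lowest validation loss is logged near the end of the second epoch, while training loss keeps falling afterwards. That pattern may indicate mild overfitting in the later epochs, although the card gives no further evidence either way.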
