Z-model dissection and comparison with SDXL models
Pipeline pieces (model_index.json):
scheduler -> ['diffusers', 'FlowMatchEulerDiscreteScheduler']
text_encoder -> ['transformers', 'Qwen3Model']
tokenizer -> ['transformers', 'Qwen2Tokenizer']
transformer -> ['diffusers', 'ZImageTransformer2DModel']
vae -> ['diffusers', 'AutoencoderKL']
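The component listing above can be reproduced with a few lines of Python. This is a minimal sketch, not the script that produced these notes: the JSON is inlined rather than read from disk, the "_class_name" value is a placeholder (the real pipeline class name is not shown in this dump), and list_components is a hypothetical helper name.

```python
import json

# Inline stand-in for model_index.json, copied from the listing above.
# ASSUMPTION: "_class_name" is a placeholder value, included only to show
# that underscore-prefixed metadata keys are skipped.
MODEL_INDEX_TEXT = """
{
  "_class_name": "ExamplePipeline",
  "scheduler": ["diffusers", "FlowMatchEulerDiscreteScheduler"],
  "text_encoder": ["transformers", "Qwen3Model"],
  "tokenizer": ["transformers", "Qwen2Tokenizer"],
  "transformer": ["diffusers", "ZImageTransformer2DModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(index):
    """Return {component_name: (library, class_name)} from a parsed index."""
    components = {}
    for name, value in index.items():
        if name.startswith("_"):
            continue  # metadata such as _class_name, not a component
        if isinstance(value, list) and len(value) == 2:
            components[name] = (value[0], value[1])
    return components


components = list_components(json.loads(MODEL_INDEX_TEXT))
for name, (library, cls) in sorted(components.items()):
    print(f"{name} -> {[library, cls]}")
```

To run it against a real checkpoint, replace the inline string with the contents of that checkpoint's model_index.json.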
[Text encoder]
architecture=Qwen3ForCausalLM
layers=36, hidden=2560, heads=32, intermediate=9728
vocab=151936, max_positions=40960
params=n/a
[Transformer]
class=ZImageTransformer2DModel
dim=3840, layers=30, heads=30
in_channels=16, cap_feat_dim=2560
patch_size=[2], f_patch_size=[1]
params=n/a
[VAE]
class=AutoencoderKL
sample_size=1024, in_channels=3, latent_channels=16, out_channels=3
block_out_channels=[128, 256, 512, 512], scaling_factor=0.3611
params=n/a
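The VAE and transformer numbers above imply a latent size and a token count per image. The sketch below works through that arithmetic under two assumptions that this dump does not confirm: the VAE downsamples by 2 at every block boundary (so the factor is 2 ** (len(block_out_channels) - 1), as diffusers pipelines usually compute it), and patch_size=[2] means non-overlapping 2x2 spatial patches. latent_geometry is a hypothetical helper name.

```python
def latent_geometry(sample_size, block_out_channels, latent_channels, patch_size):
    """Derive latent and token shapes from the config values listed above."""
    # ASSUMPTION: one 2x downsample per block boundary in the VAE encoder.
    vae_factor = 2 ** (len(block_out_channels) - 1)
    latent_side = sample_size // vae_factor
    # ASSUMPTION: non-overlapping patch_size x patch_size spatial patches.
    tokens_per_side = latent_side // patch_size
    return {
        "vae_factor": vae_factor,
        "latent_shape": (latent_channels, latent_side, latent_side),
        "image_tokens": tokens_per_side * tokens_per_side,
        "features_per_patch": latent_channels * patch_size * patch_size,
    }


geom = latent_geometry(
    sample_size=1024,
    block_out_channels=[128, 256, 512, 512],
    latent_channels=16,
    patch_size=2,
)
for key, value in geom.items():
    print(f"{key}={value}")
```

Under these assumptions a 1024x1024 image becomes a 16x128x128 latent and 4096 image tokens, each carrying 64 raw values before projection to dim=3840.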
[Scheduler]
class=FlowMatchEulerDiscreteScheduler, timesteps=1000, shift=3.0
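To see what shift=3.0 does, the sketch below applies the static shift sigma' = shift * sigma / (1 + (shift - 1) * sigma) to a few noise levels. This assumes the scheduler is used without dynamic shifting, which this dump does not state; shift_sigma is a hypothetical helper, not a diffusers function.

```python
def shift_sigma(sigma, shift=3.0):
    """Static flow-matching shift: pushes sigmas toward the noisy end."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# A few evenly spaced noise levels, from pure noise (1.0) toward clean.
raw_sigmas = [1.0, 0.75, 0.5, 0.25]
shifted = [shift_sigma(s) for s in raw_sigmas]

for raw, new in zip(raw_sigmas, shifted):
    print(f"sigma={raw:.2f} -> shifted={new:.2f}")
```

With shift greater than 1, every intermediate sigma moves upward, so more of the sampling steps are spent at high noise levels.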
Compare it with WAI REAL CN (SDXL-based) models such as telcom/deewaiREALCN
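A first pass at that comparison can be done at the component level. In the sketch below, the Z-model mapping is copied from the pipeline listing in these notes, while SDXL_TYPICAL is the usual layout of a stock SDXL pipeline. It is an assumption, not read from telcom/deewaiREALCN, and should be replaced with that repository's own model_index.json before drawing conclusions. diff_components is a hypothetical helper name.

```python
# Copied from the pipeline listing in these notes.
Z_MODEL = {
    "scheduler": "FlowMatchEulerDiscreteScheduler",
    "text_encoder": "Qwen3Model",
    "tokenizer": "Qwen2Tokenizer",
    "transformer": "ZImageTransformer2DModel",
    "vae": "AutoencoderKL",
}

# ASSUMPTION: typical stock SDXL layout, NOT read from telcom/deewaiREALCN.
SDXL_TYPICAL = {
    "scheduler": "EulerDiscreteScheduler",
    "text_encoder": "CLIPTextModel",
    "text_encoder_2": "CLIPTextModelWithProjection",
    "tokenizer": "CLIPTokenizer",
    "tokenizer_2": "CLIPTokenizer",
    "unet": "UNet2DConditionModel",
    "vae": "AutoencoderKL",
}


def diff_components(first, second):
    """Return {name: (first_class, second_class, status)} over both pipelines."""
    result = {}
    for name in sorted(set(first) | set(second)):
        left, right = first.get(name), second.get(name)
        if left is None:
            status = "only in second"
        elif right is None:
            status = "only in first"
        elif left == right:
            status = "same class"
        else:
            status = "different class"
        result[name] = (left, right, status)
    return result


comparison = diff_components(Z_MODEL, SDXL_TYPICAL)
for name, (left, right, status) in comparison.items():
    print(f"{name}: {left} | {right} | {status}")
```

A matching class name does not imply matching weights or config: the VAE here has latent_channels=16, so its config still needs a field-by-field comparison.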
