
Z-Image model dissected and compared to SDXL models

#113
by telcom - opened

Pipeline pieces (model_index.json):
scheduler -> ['diffusers', 'FlowMatchEulerDiscreteScheduler']
text_encoder -> ['transformers', 'Qwen3Model']
tokenizer -> ['transformers', 'Qwen2Tokenizer']
transformer -> ['diffusers', 'ZImageTransformer2DModel']
vae -> ['diffusers', 'AutoencoderKL']
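
For anyone who wants to reproduce the component list above, here is a minimal sketch using only the standard library. The JSON text is an inline sample rebuilt from the entries above (the `_class_name` value is taken from the page's pipeline tag), and `list_components` is a hypothetical helper name, not part of diffusers.

```python
import json

# Inline sample rebuilt from the component list above; in practice this
# would be the contents of the repo's model_index.json.
MODEL_INDEX_TEXT = """
{
  "_class_name": "ZImagePipeline",
  "scheduler": ["diffusers", "FlowMatchEulerDiscreteScheduler"],
  "text_encoder": ["transformers", "Qwen3Model"],
  "tokenizer": ["transformers", "Qwen2Tokenizer"],
  "transformer": ["diffusers", "ZImageTransformer2DModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(model_index_text):
    """Return {component: (library, class_name)}, skipping metadata keys.

    Keys starting with "_" (such as "_class_name") describe the pipeline
    itself rather than a component, so they are left out.
    """
    index = json.loads(model_index_text)
    return {
        name: tuple(value)
        for name, value in index.items()
        if not name.startswith("_")
        and isinstance(value, list)
        and len(value) == 2
    }


components = list_components(MODEL_INDEX_TEXT)
for name, (library, class_name) in sorted(components.items()):
    print(f"{name} -> {[library, class_name]}")
```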

[Text encoder]
architecture=Qwen3ForCausalLM
layers=36, hidden=2560, heads=32, intermediate=9728
vocab=151936, max_positions=40960
params=n/a

[Transformer]
class=ZImageTransformer2DModel
dim=3840, layers=30, heads=30
in_channels=16, cap_feat_dim=2560
patch_size=[2], f_patch_size=[1]
params=n/a

[VAE]
class=AutoencoderKL
sample_size=1024, in_channels=3, latent_channels=16, out_channels=3
block_out_channels=[128, 256, 512, 512], scaling_factor=0.3611
params=n/a
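
A rough sketch of what the transformer and VAE numbers above imply for a 1024x1024 image. It assumes the VAE halves resolution between consecutive blocks (so the downsampling factor is 2 ** (len(block_out_channels) - 1)) and that the transformer splits the latent grid into square, non-overlapping patches. The patching assumption is mine and is not confirmed by the config dump; `latent_geometry` is a hypothetical helper.

```python
def latent_geometry(image_size, block_out_channels, latent_channels, patch_size):
    """Estimate latent grid size and token count for a square image.

    Assumes a 2x downsample between consecutive VAE blocks and square,
    non-overlapping patches over the latent grid.
    """
    downsample = 2 ** (len(block_out_channels) - 1)
    latent_size = image_size // downsample
    tokens_per_side = latent_size // patch_size
    return {
        "downsample": downsample,
        "latent_shape": (latent_channels, latent_size, latent_size),
        "image_tokens": tokens_per_side * tokens_per_side,
        "values_per_patch": patch_size * patch_size * latent_channels,
    }


# Values copied from the [VAE] and [Transformer] sections above.
geo = latent_geometry(
    image_size=1024,
    block_out_channels=[128, 256, 512, 512],
    latent_channels=16,
    patch_size=2,
)
head_dim = 3840 // 30  # dim / heads from the [Transformer] section

print(geo)
print("transformer head_dim:", head_dim)
```

Under these assumptions a 1024x1024 image becomes a 16x128x128 latent and 4096 image tokens, with 128 dimensions per attention head.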

[Scheduler]
class=FlowMatchEulerDiscreteScheduler, timesteps=1000, shift=3.0
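
To give a feel for what shift=3.0 does, here is a small sketch of the static timestep shift as I understand it from FlowMatchEulerDiscreteScheduler: sigma' = shift * sigma / (1 + (shift - 1) * sigma). This assumes dynamic shifting is not enabled in the scheduler config, which the dump above does not show; `shift_sigma` is a hypothetical helper name.

```python
def shift_sigma(sigma, shift):
    """Apply a static flow-matching shift to a noise level in [0, 1].

    shift > 1 pushes intermediate sigmas toward 1, so more of the
    sampling steps are spent at high noise levels.
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Compare unshifted and shifted noise levels for shift=3.0.
for sigma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"sigma={sigma:.2f} -> shifted={shift_sigma(sigma, 3.0):.4f}")
```

The endpoints 0 and 1 stay fixed, while the midpoint 0.5 maps to 0.75.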

Compare it to WAI REAL CN (SDXL-based) models like telcom/deewaiREALCN.
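
One way to line the two pipelines up is to diff their component maps. In the sketch below the Z-Image side is copied from the model_index.json listing above, while the SDXL side is the typical StableDiffusionXLPipeline layout and is an assumption: I have not checked it against the actual model_index.json of telcom/deewaiREALCN, and the scheduler class in particular may differ. `compare_components` is a hypothetical helper.

```python
# Copied from the model_index.json listing above.
Z_IMAGE = {
    "scheduler": "FlowMatchEulerDiscreteScheduler",
    "text_encoder": "Qwen3Model",
    "tokenizer": "Qwen2Tokenizer",
    "transformer": "ZImageTransformer2DModel",
    "vae": "AutoencoderKL",
}

# ASSUMPTION: typical SDXL layout, not read from telcom/deewaiREALCN.
SDXL_TYPICAL = {
    "scheduler": "EulerDiscreteScheduler",
    "text_encoder": "CLIPTextModel",
    "text_encoder_2": "CLIPTextModelWithProjection",
    "tokenizer": "CLIPTokenizer",
    "tokenizer_2": "CLIPTokenizer",
    "unet": "UNet2DConditionModel",
    "vae": "AutoencoderKL",
}


def compare_components(a, b):
    """Split component names into only-in-a, only-in-b, same class, changed class."""
    shared = sorted(set(a) & set(b))
    return {
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
        "same": [k for k in shared if a[k] == b[k]],
        "changed": [k for k in shared if a[k] != b[k]],
    }


diff = compare_components(Z_IMAGE, SDXL_TYPICAL)
for key, names in diff.items():
    print(f"{key}: {names}")
```

If the assumed SDXL layout holds, the two pipelines share only the VAE class name; the denoiser (transformer vs. unet), text encoders, tokenizers, and scheduler all differ.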

