metadata
library_name: diffusers
tags:
- modular-diffusers
- diffusers
- qwenimage-layered
- text-to-image
This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
Pipeline Type: QwenImageLayeredAutoBlocks
Description: Auto Modular pipeline for layered denoising tasks using QwenImage-Layered.
This pipeline uses a 4-block architecture that can be customized and extended.
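To illustrate how a sequential block design can be customized, here is a hypothetical, framework-free sketch. The function names, the dict-based state, and the string placeholders are assumptions for illustration only. This is not the Diffusers modular API, and the real blocks hold actual models. Each block reads what earlier blocks wrote and adds its own outputs, so one block can be swapped without touching the others.

```python
from typing import Callable, Dict, List

# Hypothetical state: a plain dict of named values shared by all blocks.
State = Dict[str, object]
Block = Callable[[State], State]


def text_encoder(state: State) -> State:
    # Stand-in captioning: derive a prompt from the image when none is given.
    if not state.get("prompt"):
        state["prompt"] = f"caption of {state['image']}"
    state["prompt_embeds"] = f"embeds({state['prompt']})"
    return state


def vae_encoder(state: State) -> State:
    state["image_latents"] = f"latents({state['image']})"
    return state


def denoise(state: State) -> State:
    state["latents"] = f"denoised({state['image_latents']}, {state['prompt_embeds']})"
    return state


def decode(state: State) -> State:
    state["images"] = f"decoded({state['latents']})"
    return state


def run_blocks(blocks: List[Block], state: State) -> State:
    # Run blocks in order; each one consumes earlier outputs and writes new ones.
    for block in blocks:
        state = block(state)
    return state


default = run_blocks([text_encoder, vae_encoder, denoise, decode], {"image": "in.png"})
print(default["prompt"])  # caption of in.png


def preview_decode(state: State) -> State:
    # Customization: replace only the final block (hypothetical cheaper preview).
    state["images"] = f"preview({state['latents']})"
    return state


custom = run_blocks(
    [text_encoder, vae_encoder, denoise, preview_decode],
    {"image": "in.png", "prompt": "a cat"},
)
print(custom["images"])
```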
Example Usage
[TODO]
Pipeline Architecture
This modular pipeline is composed of the following blocks:
- text_encoder (QwenImageLayeredTextEncoderStep): QwenImage-Layered text encoder step that encodes the text prompt. If no prompt is provided, it generates one from the input image.
- vae_encoder (QwenImageLayeredVaeEncoderStep): VAE encoder step that encodes the input image(s) into their latent representations.
- denoise (QwenImageLayeredCoreDenoiseStep): Core denoising workflow for the QwenImage-Layered img2img task.
- decode (QwenImageLayeredDecoderStep): Decodes unpacked latents of shape (B, C, layers+1, H, W) into layer images.
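The decode step's input layout (B, C, layers+1, H, W) places one frame per layer along axis 2. The minimal NumPy sketch below splits such a tensor into per-frame arrays of shape (B, C, H, W). It assumes only the axis order stated above. The concrete sizes are illustrative, and the sketch does not cover which frame maps to which output image or how the VAE decodes each frame.

```python
import numpy as np


def split_layer_latents(latents: np.ndarray) -> list:
    """Split latents of shape (B, C, layers + 1, H, W) into layers + 1
    arrays of shape (B, C, H, W), one per frame along axis 2."""
    if latents.ndim != 5:
        raise ValueError(f"expected 5-D latents, got shape {latents.shape}")
    # Basic indexing returns views, so no data is copied here.
    return [latents[:, :, i] for i in range(latents.shape[2])]


# Illustrative sizes only: batch 1, 16 channels, 4 layers, 8x8 latent grid.
B, C, layers, H, W = 1, 16, 4, 8, 8
latents = np.zeros((B, C, layers + 1, H, W), dtype=np.float32)
frames = split_layer_latents(latents)
print(len(frames), frames[0].shape)  # 5 (1, 16, 8, 8)
```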
Model Components
- image_resize_processor (VaeImageProcessor)
- text_encoder (Qwen2_5_VLForConditionalGeneration)
- processor (Qwen2VLProcessor)
- tokenizer (Qwen2Tokenizer): The tokenizer to use
- guider (ClassifierFreeGuidance)
- image_processor (VaeImageProcessor)
- vae (AutoencoderKLQwenImage)
- pachifier (QwenImageLayeredPachifier)
- scheduler (FlowMatchEulerDiscreteScheduler)
- transformer (QwenImageTransformer2DModel)
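The guider and scheduler components combine during denoising. The sketch below shows one possible interaction: a classifier-free guidance combination, uncond + scale * (cond - uncond), followed by the flow-matching Euler update x_next = x + (sigma_next - sigma) * v, which is our understanding of what an Euler flow-matching step computes. Several details are assumptions. The linear sigma schedule, the stand-in "model" predictions, and the guidance scale are illustrative. The pipeline's actual guider may apply extra processing (for example rescaling), and its default schedule may differ. Custom values can be supplied through the sigmas input.

```python
import numpy as np


def cfg_combine(cond: np.ndarray, uncond: np.ndarray, scale: float) -> np.ndarray:
    # Classifier-free guidance: push the prediction away from the unconditional one.
    return uncond + scale * (cond - uncond)


def euler_step(x: np.ndarray, v: np.ndarray, sigma: float, sigma_next: float) -> np.ndarray:
    # Flow-matching Euler update along the predicted velocity v.
    return x + (sigma_next - sigma) * v


num_inference_steps = 4
# Assumed linear schedule from pure noise (1.0) to clean (0.0): 1.0, 0.75, 0.5, 0.25, 0.0
sigmas = np.linspace(1.0, 0.0, num_inference_steps + 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))  # stand-in for noisy latents
x0 = x.copy()
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    # Hypothetical predictions; a real pipeline would call the transformer here.
    v_cond = 0.5 * x
    v_uncond = 0.25 * x
    v = cfg_combine(v_cond, v_uncond, scale=4.0)
    x = euler_step(x, v, sigma, sigma_next)
```

With these stand-ins, each step multiplies the latents by 1 - 0.25 * 1.25 = 0.6875, which makes the arithmetic easy to check by hand.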
Input/Output Specification
Inputs:
- image (Image | list): Reference image(s) for denoising. Can be a single image or a list of images.
- resolution (int, optional, defaults to 640): The target area to resize the image to; either 1024 or 640.
- prompt (str, optional): The prompt or prompts to guide image generation.
- use_en_prompt (bool, optional, defaults to False): Whether to use the English prompt template.
- negative_prompt (str, optional): The prompt or prompts not to guide the image generation.
- max_sequence_length (int, optional, defaults to 1024): Maximum sequence length for prompt encoding.
- generator (Generator, optional): Torch generator for deterministic generation.
- num_images_per_prompt (int, optional, defaults to 1): The number of images to generate per prompt.
- latents (Tensor, optional): Pre-generated noisy latents for image generation.
- layers (int, optional, defaults to 4): Number of layers to extract from the image.
- num_inference_steps (int, optional, defaults to 50): The number of denoising steps.
- sigmas (list, optional): Custom sigmas for the denoising process.
- attention_kwargs (dict, optional): Additional kwargs for attention processors.
- **denoiser_input_fields (None, optional): Conditional model inputs for the denoiser, e.g. prompt_embeds, negative_prompt_embeds, etc.
- output_type (str, optional, defaults to pil): Output format: 'pil', 'np', or 'pt'.
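The resolution input is a target area rather than an edge length. One plausible reading is an area of resolution x resolution pixels with the input image's aspect ratio preserved. The sketch below assumes that reading and also assumes both sides are rounded to multiples of 32. The helper fit_to_area is hypothetical, and the pipeline's actual resize logic may differ.

```python
import math


def fit_to_area(target_area: int, aspect_ratio: float, multiple: int = 32) -> tuple:
    """Hypothetical helper: return (width, height) with width / height close to
    aspect_ratio and width * height close to target_area, rounded to `multiple`."""
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio
    width = round(width / multiple) * multiple
    height = round(height / multiple) * multiple
    return int(width), int(height)


for resolution in (640, 1024):
    # Square input: the target area maps straight back to resolution x resolution.
    print(resolution, fit_to_area(resolution * resolution, 1.0))

# A 16:9 input at resolution 640 keeps roughly the same pixel count.
print(fit_to_area(640 * 640, 16 / 9))  # (864, 480)
```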
Outputs:
- images (list): Generated images.