diffusers/Qwen-Image-Layered-modular

@@ -5,6 +5,10 @@ tags:
 - diffusers
 - qwenimage-layered
 - text-to-image
 ---
 This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
@@ -24,36 +28,10 @@ This modular pipeline is composed of the following blocks:
 1. **text_encoder** (`QwenImageLayeredTextEncoderStep`)
    - QwenImage-Layered Text encoder step that encode the text prompt, will generate a prompt based on image if not provided.
-   - *resize*: `QwenImageLayeredResizeStep`
-     - Image Resize step that resize the image to a target area (defined by the resolution parameter from user) while maintaining the aspect ratio.
-   - *get_image_prompt*: `QwenImageLayeredGetImagePromptStep`
-     - Auto-caption step that generates a text prompt from the input image if none is provided.
-   - *encode*: `QwenImageTextEncoderStep`
-     - Text Encoder step that generates text embeddings to guide the image generation.
 2. **vae_encoder** (`QwenImageLayeredVaeEncoderStep`)
    - Vae encoder step that encode the image inputs into their latent representations.
-   - *resize*: `QwenImageLayeredResizeStep`
-     - Image Resize step that resize the image to a target area (defined by the resolution parameter from user) while maintaining the aspect ratio.
-   - *preprocess*: `QwenImageEditProcessImagesInputStep`
-     - Image Preprocess step. Images needs to be resized first.
-   - *encode*: `QwenImageVaeEncoderStep`
-     - VAE Encoder step that converts processed_image into latent representations image_latents.
-   - *permute*: `QwenImageLayeredPermuteLatentsStep`
-     - Permute image latents from (B, C, 1, H, W) to (B, 1, C, H, W) for Layered packing.
 3. **denoise** (`QwenImageLayeredCoreDenoiseStep`)
    - Core denoising workflow for QwenImage-Layered img2img task.
-   - *input*: `QwenImageLayeredInputStep`
-     - Input step that prepares the inputs for the layered denoising step. It:
-   - *prepare_latents*: `QwenImageLayeredPrepareLatentsStep`
-     - Prepare initial random noise (B, layers+1, C, H, W) for the generation process
-   - *set_timesteps*: `QwenImageLayeredSetTimestepsStep`
-     - Set timesteps step for QwenImage Layered with custom mu calculation based on image_latents.
-   - *prepare_rope_inputs*: `QwenImageLayeredRoPEInputsStep`
-     - Step that prepares the RoPE inputs for the denoising process. Should be place after prepare_latents step
-   - *denoise*: `QwenImageLayeredDenoiseStep`
-     - Denoise step that iteratively denoise the latents.
-   - *after_denoise*: `QwenImageLayeredAfterDenoiseStep`
-     - Unpack latents from (B, seq, C*4) to (B, C, layers+1, H, W) after denoising.
 4. **decode** (`QwenImageLayeredDecoderStep`)
    - Decode unpacked latents (B, C, layers+1, H, W) into layer images.
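
The tensor shapes named in the block descriptions above can be traced with a small shape-only sketch. It is not the Diffusers implementation: the function names are hypothetical, no tensors are created, and the 2x2 patch size is an assumption inferred from the `C*4` channel count in the unpack description.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def permute_image_latents(shape: Shape) -> Shape:
    """(B, C, 1, H, W) -> (B, 1, C, H, W), as in the permute step description."""
    b, c, one, h, w = shape
    if one != 1:
        raise ValueError("expected a singleton third dimension")
    return (b, one, c, h, w)


def initial_noise_shape(batch: int, layers: int, channels: int, height: int, width: int) -> Shape:
    """Shape of the initial noise: (B, layers+1, C, H, W)."""
    return (batch, layers + 1, channels, height, width)


def pack(shape: Shape, patch: int = 2) -> Shape:
    """Assumed packing: (B, L, C, H, W) -> (B, L*(H/patch)*(W/patch), C*patch*patch)."""
    b, l, c, h, w = shape
    if h % patch or w % patch:
        raise ValueError("height and width must be divisible by the patch size")
    return (b, l * (h // patch) * (w // patch), c * patch * patch)


def unpack(shape: Shape, layers: int, height: int, width: int, patch: int = 2) -> Shape:
    """(B, seq, C*4) -> (B, C, layers+1, H, W), as in the after_denoise description."""
    b, seq, packed_channels = shape
    expected_seq = (layers + 1) * (height // patch) * (width // patch)
    if seq != expected_seq:
        raise ValueError(f"sequence length {seq} does not match expected {expected_seq}")
    return (b, packed_channels // (patch * patch), layers + 1, height, width)


# Illustrative sizes only: 4 layers, 16 latent channels, 64x64 latents.
noise = initial_noise_shape(batch=1, layers=4, channels=16, height=64, width=64)
packed = pack(noise)
unpacked = unpack(packed, layers=4, height=64, width=64)
print(noise, packed, unpacked)
```

With these illustrative sizes, the packed sequence length is 5 * 32 * 32 = 5120 with 64 channels per token, and unpacking recovers the (B, C, layers+1, H, W) layout that the decode step consumes.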
@@ -68,7 +46,9 @@ This modular pipeline is composed of the following blocks:
 7. vae (`AutoencoderKLQwenImage`)
 8. pachifier (`QwenImageLayeredPachifier`)
 9. scheduler (`FlowMatchEulerDiscreteScheduler`)
-10. transformer (`QwenImageTransformer2DModel`)  ## Input/Output Specification
 **Inputs:**

 - diffusers
 - qwenimage-layered
 - text-to-image
+- modular-diffusers
+- diffusers
+- qwenimage-layered
+- text-to-image
 ---
 This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
 1. **text_encoder** (`QwenImageLayeredTextEncoderStep`)
    - QwenImage-Layered Text encoder step that encode the text prompt, will generate a prompt based on image if not provided.
 2. **vae_encoder** (`QwenImageLayeredVaeEncoderStep`)
    - Vae encoder step that encode the image inputs into their latent representations.
 3. **denoise** (`QwenImageLayeredCoreDenoiseStep`)
    - Core denoising workflow for QwenImage-Layered img2img task.
 4. **decode** (`QwenImageLayeredDecoderStep`)
    - Decode unpacked latents (B, C, layers+1, H, W) into layer images.
 7. vae (`AutoencoderKLQwenImage`)
 8. pachifier (`QwenImageLayeredPachifier`)
 9. scheduler (`FlowMatchEulerDiscreteScheduler`)
+10. transformer (`QwenImageTransformer2DModel`)
+## Input/Output Specification
 **Inputs:**