Upload QwenImageLayeredModularPipeline

#4
by YiYiXu HF Staff - opened
Files changed (1)
  1. README.md +7 -27
README.md CHANGED
```diff
@@ -5,6 +5,10 @@ tags:
  - diffusers
  - qwenimage-layered
  - text-to-image
+ - modular-diffusers
+ - diffusers
+ - qwenimage-layered
+ - text-to-image
  ---
  This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

@@ -24,36 +28,10 @@ This modular pipeline is composed of the following blocks:

  1. **text_encoder** (`QwenImageLayeredTextEncoderStep`)
  - QwenImage-Layered Text encoder step that encode the text prompt, will generate a prompt based on image if not provided.
- - *resize*: `QwenImageLayeredResizeStep`
- - Image Resize step that resize the image to a target area (defined by the resolution parameter from user) while maintaining the aspect ratio.
- - *get_image_prompt*: `QwenImageLayeredGetImagePromptStep`
- - Auto-caption step that generates a text prompt from the input image if none is provided.
- - *encode*: `QwenImageTextEncoderStep`
- - Text Encoder step that generates text embeddings to guide the image generation.
  2. **vae_encoder** (`QwenImageLayeredVaeEncoderStep`)
  - Vae encoder step that encode the image inputs into their latent representations.
- - *resize*: `QwenImageLayeredResizeStep`
- - Image Resize step that resize the image to a target area (defined by the resolution parameter from user) while maintaining the aspect ratio.
- - *preprocess*: `QwenImageEditProcessImagesInputStep`
- - Image Preprocess step. Images needs to be resized first.
- - *encode*: `QwenImageVaeEncoderStep`
- - VAE Encoder step that converts processed_image into latent representations image_latents.
- - *permute*: `QwenImageLayeredPermuteLatentsStep`
- - Permute image latents from (B, C, 1, H, W) to (B, 1, C, H, W) for Layered packing.
  3. **denoise** (`QwenImageLayeredCoreDenoiseStep`)
  - Core denoising workflow for QwenImage-Layered img2img task.
- - *input*: `QwenImageLayeredInputStep`
- - Input step that prepares the inputs for the layered denoising step. It:
- - *prepare_latents*: `QwenImageLayeredPrepareLatentsStep`
- - Prepare initial random noise (B, layers+1, C, H, W) for the generation process
- - *set_timesteps*: `QwenImageLayeredSetTimestepsStep`
- - Set timesteps step for QwenImage Layered with custom mu calculation based on image_latents.
- - *prepare_rope_inputs*: `QwenImageLayeredRoPEInputsStep`
- - Step that prepares the RoPE inputs for the denoising process. Should be place after prepare_latents step
- - *denoise*: `QwenImageLayeredDenoiseStep`
- - Denoise step that iteratively denoise the latents.
- - *after_denoise*: `QwenImageLayeredAfterDenoiseStep`
- - Unpack latents from (B, seq, C*4) to (B, C, layers+1, H, W) after denoising.
  4. **decode** (`QwenImageLayeredDecoderStep`)
  - Decode unpacked latents (B, C, layers+1, H, W) into layer images.

@@ -68,7 +46,9 @@ This modular pipeline is composed of the following blocks:
  7. vae (`AutoencoderKLQwenImage`)
  8. pachifier (`QwenImageLayeredPachifier`)
  9. scheduler (`FlowMatchEulerDiscreteScheduler`)
- 10. transformer (`QwenImageTransformer2DModel`) ## Input/Output Specification
+ 10. transformer (`QwenImageTransformer2DModel`)
+
+ ## Input/Output Specification

  **Inputs:**

```
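Several of the sub-step descriptions removed in this diff describe concrete size and shape arithmetic: resizing to a target area while keeping the aspect ratio, permuting image latents from (B, C, 1, H, W) to (B, 1, C, H, W), preparing noise of shape (B, layers+1, C, H, W), and unpacking (B, seq, C*4) back to (B, C, layers+1, H, W). The stdlib-only sketch below traces that bookkeeping on shape tuples rather than tensors. It is not the Diffusers implementation: all helper names are hypothetical, and the target area of `resolution * resolution`, the rounding to multiples of 32, and the 2x2 patch size (inferred from the `C*4` factor) are assumptions.

```python
import math

Shape = tuple[int, ...]


def resize_dims_for_area(width: int, height: int, resolution: int, multiple: int = 32) -> tuple[int, int]:
    """Hypothetical helper: return (new_width, new_height) with area close to
    resolution**2 and the input aspect ratio. The resolution**2 target area and
    the multiple-of-32 rounding are assumptions, not confirmed by the README."""
    ratio = width / height
    target_area = resolution * resolution
    new_width = math.sqrt(target_area * ratio)  # solves w * h = area with w / h = ratio
    new_height = new_width / ratio
    # Snap both sides to the nearest multiple of `multiple` (assumed 32 here).
    return round(new_width / multiple) * multiple, round(new_height / multiple) * multiple


def permute_image_latents(shape: Shape) -> Shape:
    """Permute step as described: (B, C, 1, H, W) -> (B, 1, C, H, W)."""
    b, c, f, h, w = shape
    assert f == 1, "expected a single-frame image latent"
    return (b, f, c, h, w)


def noise_shape(batch: int, layers: int, channels: int, h: int, w: int) -> Shape:
    """prepare_latents step as described: noise of shape (B, layers+1, C, H, W)."""
    return (batch, layers + 1, channels, h, w)


def packed_shape(shape: Shape, patch: int = 2) -> Shape:
    """(B, F, C, H, W) -> (B, seq, C*patch*patch); patch=2 is an assumption."""
    b, f, c, h, w = shape
    assert h % patch == 0 and w % patch == 0, "latent H and W must divide by patch"
    return (b, f * (h // patch) * (w // patch), c * patch * patch)


def unpacked_shape(packed: Shape, layers: int, h: int, w: int, patch: int = 2) -> Shape:
    """after_denoise step as described: (B, seq, C*4) -> (B, C, layers+1, H, W)."""
    b, seq, cp = packed
    assert seq == (layers + 1) * (h // patch) * (w // patch), "sequence length mismatch"
    return (b, cp // (patch * patch), layers + 1, h, w)
```

For example, with illustrative numbers (a 16-channel, 80x80 latent and `layers=4`), `noise_shape(1, 4, 16, 80, 80)` gives (1, 5, 16, 80, 80), packing it yields 5 * 40 * 40 = 8000 tokens of width 64, and `unpacked_shape((1, 8000, 64), layers=4, h=80, w=80)` returns (1, 16, 5, 80, 80).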