Upload QwenImageLayeredModularPipeline #4, opened by YiYiXu (HF Staff)
README.md CHANGED
|
@@ -5,6 +5,10 @@ tags:
|
|
| 5 |
- diffusers
|
| 6 |
- qwenimage-layered
|
| 7 |
- text-to-image
|
| 8 |
---
|
| 9 |
This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
|
| 10 |
|
|
@@ -24,36 +28,10 @@ This modular pipeline is composed of the following blocks:
|
|
| 24 |
|
| 25 |
1. **text_encoder** (`QwenImageLayeredTextEncoderStep`)
|
| 26 |
- QwenImage-Layered text encoder step that encodes the text prompt; if no prompt is provided, it generates one from the input image.
|
| 27 |
-
- *resize*: `QwenImageLayeredResizeStep`
|
| 28 |
-
- Image resize step that resizes the image to a target area (set by the user-supplied resolution parameter) while maintaining the aspect ratio.
|
| 29 |
-
- *get_image_prompt*: `QwenImageLayeredGetImagePromptStep`
|
| 30 |
-
- Auto-caption step that generates a text prompt from the input image if none is provided.
|
| 31 |
-
- *encode*: `QwenImageTextEncoderStep`
|
| 32 |
-
- Text Encoder step that generates text embeddings to guide the image generation.
|
| 33 |
2. **vae_encoder** (`QwenImageLayeredVaeEncoderStep`)
|
| 34 |
- VAE encoder step that encodes the image inputs into their latent representations.
|
| 35 |
-
- *resize*: `QwenImageLayeredResizeStep`
|
| 36 |
-
- Image resize step that resizes the image to a target area (set by the user-supplied resolution parameter) while maintaining the aspect ratio.
|
| 37 |
-
- *preprocess*: `QwenImageEditProcessImagesInputStep`
|
| 38 |
-
- Image preprocess step. Images need to be resized first.
|
| 39 |
-
- *encode*: `QwenImageVaeEncoderStep`
|
| 40 |
-
- VAE Encoder step that converts processed_image into latent representations image_latents.
|
| 41 |
-
- *permute*: `QwenImageLayeredPermuteLatentsStep`
|
| 42 |
-
- Permute image latents from (B, C, 1, H, W) to (B, 1, C, H, W) for Layered packing.
|
| 43 |
3. **denoise** (`QwenImageLayeredCoreDenoiseStep`)
|
| 44 |
- Core denoising workflow for the QwenImage-Layered img2img task.
|
| 45 |
-
- *input*: `QwenImageLayeredInputStep`
|
| 46 |
-
- Input step that prepares the inputs for the layered denoising step. It:
|
| 47 |
-
- *prepare_latents*: `QwenImageLayeredPrepareLatentsStep`
|
| 48 |
-
- Prepare initial random noise (B, layers+1, C, H, W) for the generation process
|
| 49 |
-
- *set_timesteps*: `QwenImageLayeredSetTimestepsStep`
|
| 50 |
-
- Set timesteps step for QwenImage Layered with custom mu calculation based on image_latents.
|
| 51 |
-
- *prepare_rope_inputs*: `QwenImageLayeredRoPEInputsStep`
|
| 52 |
-
- Step that prepares the RoPE inputs for the denoising process. Should be placed after the prepare_latents step.
|
| 53 |
-
- *denoise*: `QwenImageLayeredDenoiseStep`
|
| 54 |
-
- Denoise step that iteratively denoises the latents.
|
| 55 |
-
- *after_denoise*: `QwenImageLayeredAfterDenoiseStep`
|
| 56 |
-
- Unpack latents from (B, seq, C*4) to (B, C, layers+1, H, W) after denoising.
|
| 57 |
4. **decode** (`QwenImageLayeredDecoderStep`)
|
| 58 |
- Decode unpacked latents (B, C, layers+1, H, W) into layer images.
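
The *resize* sub-block listed above targets a pixel area rather than a fixed width and height. The arithmetic can be sketched roughly as follows. This is only an illustration: `target_size` is a hypothetical helper, not a Diffusers function, and both the area definition (`resolution * resolution`) and the rounding to a multiple of 32 are assumptions that this README does not confirm.

```python
import math


def target_size(width: int, height: int, resolution: int, multiple: int = 32) -> tuple[int, int]:
    """Return (new_width, new_height) covering roughly resolution**2 pixels.

    Hypothetical helper: the area definition and the rounding multiple are
    assumptions made for illustration only.
    """
    aspect = width / height
    area = resolution * resolution

    # Solve new_w * new_h = area subject to new_w / new_h = aspect.
    new_w = math.sqrt(area * aspect)
    new_h = new_w / aspect

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(new_w), snap(new_h)


if __name__ == "__main__":
    # A 16:9 input keeps approximately the same aspect ratio after resizing.
    print(target_size(1920, 1080, 640))
```

Rounding each side independently means the final aspect ratio is only approximately preserved; a real implementation may choose a different rounding rule.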
|
| 59 |
|
|
@@ -68,7 +46,9 @@ This modular pipeline is composed of the following blocks:
|
|
| 68 |
7. vae (`AutoencoderKLQwenImage`)
|
| 69 |
8. pachifier (`QwenImageLayeredPachifier`)
|
| 70 |
9. scheduler (`FlowMatchEulerDiscreteScheduler`)
|
| 71 |
-
10. transformer (`QwenImageTransformer2DModel`)
|
| 72 |
|
| 73 |
**Inputs:**
|
| 74 |
|
|
|
|
| 5 |
- diffusers
|
| 6 |
- qwenimage-layered
|
| 7 |
- text-to-image
|
| 8 |
+
- modular-diffusers
|
| 9 |
+
- diffusers
|
| 10 |
+
- qwenimage-layered
|
| 11 |
+
- text-to-image
|
| 12 |
---
|
| 13 |
This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
|
| 14 |
|
|
|
|
| 28 |
|
| 29 |
1. **text_encoder** (`QwenImageLayeredTextEncoderStep`)
|
| 30 |
- QwenImage-Layered text encoder step that encodes the text prompt; if no prompt is provided, it generates one from the input image.
|
| 31 |
2. **vae_encoder** (`QwenImageLayeredVaeEncoderStep`)
|
| 32 |
- VAE encoder step that encodes the image inputs into their latent representations.
|
| 33 |
3. **denoise** (`QwenImageLayeredCoreDenoiseStep`)
|
| 34 |
- Core denoising workflow for the QwenImage-Layered img2img task.
|
| 35 |
4. **decode** (`QwenImageLayeredDecoderStep`)
|
| 36 |
- Decode unpacked latents (B, C, layers+1, H, W) into layer images.
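
The shapes quoted in the block descriptions, including the packed form `(B, seq, C*4)` and the unpacked form `(B, C, layers+1, H, W)` that the decode step consumes, can be cross-checked with a little shape arithmetic. The sketch below is not the pipeline's code. `layered_shapes` is a hypothetical helper, and the 2x2 spatial packing (so that `seq = (layers+1) * (H/2) * (W/2)`) is an assumption inferred from the `C*4` last dimension.

```python
import math


def layered_shapes(batch: int, channels: int, height: int, width: int, layers: int) -> dict:
    """Trace latent shapes through the steps named in the README.

    Hypothetical helper. Assumes packing groups 2x2 spatial patches
    (inferred from the C*4 last dimension), so height and width must be even.
    """
    if height % 2 or width % 2:
        raise ValueError("latent height and width must be even for 2x2 packing")

    frames = layers + 1  # the README writes this axis as layers+1
    seq = frames * (height // 2) * (width // 2)

    return {
        "vae_encoded": (batch, channels, 1, height, width),
        "permuted": (batch, 1, channels, height, width),
        "noise": (batch, frames, channels, height, width),
        "packed": (batch, seq, channels * 4),
        "unpacked": (batch, channels, frames, height, width),
    }


if __name__ == "__main__":
    shapes = layered_shapes(batch=1, channels=16, height=80, width=80, layers=4)
    for name, shape in shapes.items():
        print(f"{name:12s} {shape}")
    # Packing and unpacking only rearrange elements, so the counts must match.
    assert math.prod(shapes["noise"]) == math.prod(shapes["packed"]) == math.prod(shapes["unpacked"])
```

The element-count check at the end is a cheap way to catch a wrong packing assumption: if the patch size were not 2x2, the packed shape would not hold the same number of values as the noise tensor.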
|
| 37 |
|
|
|
|
| 46 |
7. vae (`AutoencoderKLQwenImage`)
|
| 47 |
8. pachifier (`QwenImageLayeredPachifier`)
|
| 48 |
9. scheduler (`FlowMatchEulerDiscreteScheduler`)
|
| 49 |
+
10. transformer (`QwenImageTransformer2DModel`)
|
| 50 |
+
|
| 51 |
+
## Input/Output Specification
|
| 52 |
|
| 53 |
**Inputs:**
|
| 54 |
|