Extremely slow on 5090
I am trying the demo code, but the 50 steps from the demo take 3+ hours. Is that normal on a 5090, or is the base model just not suitable for consumer hardware?
- GPU: NVIDIA GeForce RTX 5090 (32GB)
- CUDA Compute Capability: 12.0 (sm_120)
- NVIDIA Driver: 580.88
- OS: Windows 10 (AMD64)
- Python: 3.11.9
- PyTorch: 2.11.0.dev20251231+cu130 (nightly)
- CUDA: 13.0
- cuDNN: 91200
- Diffusers: 0.37.0.dev0
I think it may be running on your CPU.
Edited my comment, as I had forgotten to use the Lightning 4-step LoRA (though I am actually using 6 steps). The results are pretty decent: a 1920x1080 image in 55 seconds.
Yes, that sounded like CPU, probably on a single thread too :) I am running the Q5M Unsloth variant on a 5070 Ti; generation time for 1024x1024 at 40 steps is about 148 s.
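Normalizing the timings quoted in this thread to seconds per denoising step makes the gap easier to see. This is a rough sketch. Resolution, model variant and GPU differ between the runs, and the GPU for the Lightning LoRA run is not stated, so it only compares orders of magnitude.

```python
def seconds_per_step(total_seconds, steps):
    """Average wall-clock time per denoising step."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return total_seconds / steps


# Timings quoted in this thread; conditions differ between runs.
runs = {
    "RTX 5090, demo code, 50 steps (3+ hours, so a lower bound)": (3 * 3600, 50),
    "Lightning LoRA, 6 steps, 1920x1080 (GPU not stated)": (55, 6),
    "RTX 5070 Ti, Q5M GGUF, 40 steps, 1024x1024": (148, 40),
}

for label, (total, steps) in runs.items():
    print(f"{label}: {seconds_per_step(total, steps):.1f} s/step")
```

The 5090 run works out to at least 216 s per step, against about 3.7 s per step on the 5070 Ti. A gap that large points to something other than the GPU doing the work at full speed.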
You'd think so, and I suspected that too. But I edited the code slightly to disable the CPU option, and I also checked Task Manager: the GPU was at 100% while generating an image... very strange. I have now tried it in ComfyUI, and it works fine there.
Just use the DF11 version: https://github.com/LeanModels/DFloat11/issues/30
It fits perfectly in 32 GB of VRAM at full quality using DF11.
Problem: it has no ComfyUI support yet!
The 5090 is a 32GB graphics card, but doesn't the model have a total of 40GB? Can it run?
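Some quick arithmetic explains why DF11 gets around the 40 GB versus 32 GB problem. This is a rough sketch under two assumptions. First, the 40 GB figure refers to BF16 weights at 16 bits each. Second, DFloat11 stores those weights losslessly at roughly 11 bits per weight on average, although the exact ratio depends on the weights. Activations, the CUDA context and other buffers also need VRAM, so the headroom figure is an estimate, not a guarantee. The helper names are hypothetical.

```python
def compressed_size_gb(bf16_size_gb, bits_per_weight=11.0):
    """Estimate the size after re-encoding BF16 (16-bit) weights at a lower
    average bit width. 11 bits is an approximate DFloat11 figure (assumption)."""
    return bf16_size_gb * bits_per_weight / 16.0


def fits(size_gb, vram_gb, reserve_gb=0.0):
    """True if the weights plus a reserve for activations fit in VRAM."""
    return size_gb + reserve_gb <= vram_gb


bf16_gb = 40.0  # total model size quoted in the thread (assumed BF16)
vram_gb = 32.0  # RTX 5090

df11_gb = compressed_size_gb(bf16_gb)
print(f"BF16: {bf16_gb:.1f} GB, fits: {fits(bf16_gb, vram_gb)}")
print(f"DF11 (~11 bits/weight): {df11_gb:.1f} GB, fits: {fits(df11_gb, vram_gb)}")
print(f"Headroom left for everything else: {vram_gb - df11_gb:.1f} GB")
```

Under these assumptions the weights shrink to about 27.5 GB, which leaves roughly 4.5 GB on a 32 GB card for everything else.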