Stable Cascade
February 21, 2024

Introduction of Stable Cascade

Stable Diffusion has come a long way as an open-source AI image generator. We all know that images generated by AI often have flaws, and Stable Diffusion in particular is famous for deformed human anatomy, such as hands with three fingers or misplaced eyes, imperfect rendering of text from prompts, and slow generation on low-end GPUs. These are the problems Stability AI is trying to solve with Stable Cascade. Unlike previous Stable Diffusion base models such as SD 1.5 and SDXL, Stable Cascade promises to use less computing power to train and to follow prompts more faithfully. In this blog post, we will break it down in the most beginner-friendly language possible.

Technicals

Stable Cascade uses three different models built on the Würstchen architecture. The first stage, called Stage C, compresses the text-conditioned generation into a small latent space (essentially a compact piece of code) and then passes it to Stage B and Stage A to decode the request.

How does it work?

The smaller the latent space, the faster you can run inference and the cheaper training becomes. The key to Würstchen's efficiency is its two-stage compression process. The first stage uses a VQ-VAE to compress images into a latent space by a factor of 4. The second stage uses a diffusion model to compress that latent space by another factor of 10. This results in a total compression ratio of 40, significantly higher than the factor of 8 used by Stable Diffusion. The compressed latent space allows the text-to-image diffusion model in Würstchen to be much smaller and faster to train than the one in Stable Diffusion. This makes it possible to train Würstchen in roughly 24,000 GPU hours, while Stable Diffusion 1.4 required about 150,000 GPU hours.
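The compression arithmetic above can be sanity-checked in a few lines of Python. This is a back-of-the-envelope sketch using only the factors quoted in this post, not the model's internals; the squaring step assumes each compression factor applies along each spatial axis, the way Stable Diffusion's factor of 8 does, which is an assumption on our part:

```python
# Compression factors as quoted in the post (illustrative, not model internals).
VQ_VAE_FACTOR = 4       # Würstchen first stage (VQ-VAE)
DIFFUSION_FACTOR = 10   # Würstchen second stage (diffusion model)
SD_FACTOR = 8           # Stable Diffusion's latent compression

total_factor = VQ_VAE_FACTOR * DIFFUSION_FACTOR   # 4 * 10 = 40

# Assuming each factor applies per spatial axis, the number of latent
# positions the text-conditioned model must denoise shrinks quadratically
# relative to Stable Diffusion:
relative_savings = (total_factor / SD_FACTOR) ** 2  # (40 / 8) ** 2 = 25.0

print(total_factor)      # 40
print(relative_savings)  # 25.0
```

Under that assumption, the text-conditioned stage works on roughly 25 times fewer latent positions than Stable Diffusion's model, which is what makes it so much smaller and cheaper to train.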
Despite its efficiency, Würstchen generates images of comparable quality to those produced by Stable Diffusion. In some cases it can even produce higher-quality results, such as images at higher resolutions or with more detail. Breaking the request into smaller pieces also means it requires less memory and loads faster: it took about 10 seconds to create an image, compared to 22 seconds for the currently used SDXL model.

Features & Improvements

Text Rendering

Reddit and Civitai discussions have been buzzing about Stable Cascade's ability to render text. This is no doubt a huge step up from previous SD models, where generating text could be quite challenging. Below are examples of text rendered by Stable Cascade.

Better results on human anatomy

Compared to SDXL, Stable Cascade produces vastly superior compositions, with the correct number of limbs (no more three-legged, three-fingered humans) and correct hands and feet (the right number of fingers and toes in the right places). This is quite evident in the examples below.

Conclusion

It has been only about a week since the announcement of Stable Cascade, and the response from the Stable Diffusion community has been quite positive. Developers on GitHub, CivitAI, and Hugging Face are already working on fine-tuning with LoRA, ControlNet, and other features. We look forward to further accomplishments and updates in the coming months. Meanwhile, try the Fooocus UI, which is accessible in DiffusionHub. It is quick, efficient, and convenient to take with you on the go!