The VAE used for Stable Diffusion 1.x/2.x and other models (KL-F8) has a critical flaw, probably due to bad training, that is holding back all models that use it (almost certainly including DALL-E 3).
Short summary for those who are technically inclined:
CompVis fucked up the KL divergence loss on the KL-F8 VAE that is used by SD1.x, SD2.x, SVD, DALL-E 3, and probably other models. As a result, the latent space it produces has a massive KL divergence and is smuggling global information about the image through a few pixels. If you are thinking of using it to train a new foundation model from scratch, don't! (For the less technically inclined, this does not mean switch out your ...
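
For anyone who wants to see what "massive KL divergence in a few pixels" would look like numerically, here is a minimal sketch. It assumes the usual VAE setup where the encoder outputs a mean and log-variance per latent element and the KL term is measured against a standard normal prior. The function names (`gaussian_kl`, `kl_map`, `hotspots`), the 10x-median threshold, and the toy 4x4 latent are all made up for illustration; none of the numbers come from the actual KL-F8 VAE.

```python
import math

def gaussian_kl(mean, logvar):
    """KL( N(mean, exp(logvar)) || N(0, 1) ) for one latent element, in nats."""
    return 0.5 * (mean * mean + math.exp(logvar) - 1.0 - logvar)

def kl_map(means, logvars):
    """Per-position KL for 2D grids of posterior means and log-variances."""
    return [
        [gaussian_kl(m, lv) for m, lv in zip(mean_row, logvar_row)]
        for mean_row, logvar_row in zip(means, logvars)
    ]

def hotspots(kl, factor=10.0):
    """Positions whose KL exceeds `factor` times the median of the map.

    The threshold is an arbitrary illustrative choice, not a standard test.
    """
    flat = sorted(v for row in kl for v in row)
    median = flat[len(flat) // 2]
    return [
        (r, c)
        for r, row in enumerate(kl)
        for c, v in enumerate(row)
        if v > factor * median
    ]

# Toy 4x4 latent (invented values): almost every position sits close to
# the prior, but one position has a large mean and a tiny variance, so it
# carries far more information than its neighbours.
means = [[0.1] * 4 for _ in range(4)]
logvars = [[-0.1] * 4 for _ in range(4)]
means[1][2] = 6.0
logvars[1][2] = -8.0

kl = kl_map(means, logvars)
print(hotspots(kl))
```

On a real model you would feed in the encoder's actual mean and log-variance maps instead of the toy grids. A handful of positions dominating the KL map would be consistent with the behaviour described above, though this check by itself does not prove that global information is being routed through them.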
Read more at reddit.com