You've all seen CLIP guidance and diffusion models used for language-conditioned generation of images, audio, even 3D NeRFs... so we decided to go met