---
license: apache-2.0
base_model:
- openai/clip-vit-base-patch32
---
# Multimodal Learning for Autoencoders

This repository accompanies my SIGGRAPH Asia publication.

In a multimodal autoencoder, the image is reconstructed from both image and text inputs rather than from the image input alone.

https://dl.acm.org/doi/10.1145/3681756.3697974
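
The idea of reconstructing an image from both modalities can be sketched in a few lines. The block below is a minimal, dependency-free illustration and not the architecture from the paper: `ToyMultimodalAutoencoder`, the dimensions, and the concatenation-based fusion are all assumptions made for this example, and random linear maps stand in for trained networks and for a real text encoder such as CLIP.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim) standing in for a trained layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, vec):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class ToyMultimodalAutoencoder:
    """Hypothetical sketch: reconstructs an image from image + text inputs."""

    def __init__(self, image_dim=16, text_dim=8, latent_dim=4, seed=0):
        rng = random.Random(seed)
        self.image_encoder = linear(image_dim, latent_dim, rng)
        # Assumption: the decoder sees the image latent concatenated
        # with the text embedding (simple feature-level fusion).
        self.decoder = linear(latent_dim + text_dim, image_dim, rng)

    def reconstruct(self, image, text_embedding):
        latent = apply(self.image_encoder, image)
        fused = latent + list(text_embedding)  # list concatenation
        return apply(self.decoder, fused)


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
image = [rng.random() for _ in range(16)]          # flattened toy "image"
text_embedding = [rng.random() for _ in range(8)]  # stand-in for a text encoder output

model = ToyMultimodalAutoencoder()
recon = model.reconstruct(image, text_embedding)
loss = mse(recon, image)  # reconstruction loss a training loop would minimize
```

An image-only autoencoder would pass just `latent` to the decoder. Here the decoder input is widened by `text_dim`, so the text embedding directly influences the reconstruction.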