AbstractPhil committed (verified)
Commit 004b4b6 · 1 Parent(s): 60adb7c

Update model card (step 5625)

Files changed (1): README.md (+79, -3)
README.md CHANGED

---
tags:
- vae
- multimodal
- text-embeddings
- clip
- t5
license: mit
---

# VAE Lyra 🎵

VAE Lyra is a multi-modal variational autoencoder that fuses CLIP-L and T5-base text embeddings into a shared latent space and reconstructs them, using a Cantor-based geometric fusion strategy.

## Model Details

- **Fusion Strategy**: cantor
- **Latent Dimension**: 768
- **Training Steps**: 5,625
- **Best Loss**: 0.2159

## Architecture

- **Modalities**: CLIP-L (768d) + T5-base (768d)
- **Encoder Layers**: 3
- **Decoder Layers**: 3
- **Hidden Dimension**: 1024

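
The geovocab2 repository is the source of truth for how these pieces are wired together. As a rough, illustrative sketch only, this is what per-modality encoder/decoder stacks with the documented sizes (3 layers, hidden dimension 1024, latent dimension 768) look like in plain PyTorch; the layer layout and activation are assumptions, and the Cantor fusion of the two streams is not shown.

```python
import torch
import torch.nn as nn

HIDDEN, LATENT = 1024, 768

# Illustrative 3-layer encoder for one 768-d modality; the last layer is
# widened to 2 * LATENT so it can emit both mu and logvar.
encoder = nn.Sequential(
    nn.Linear(768, HIDDEN), nn.GELU(),
    nn.Linear(HIDDEN, HIDDEN), nn.GELU(),
    nn.Linear(HIDDEN, 2 * LATENT),
)

# Illustrative 3-layer decoder mapping the latent back to a 768-d embedding space.
decoder = nn.Sequential(
    nn.Linear(LATENT, HIDDEN), nn.GELU(),
    nn.Linear(HIDDEN, HIDDEN), nn.GELU(),
    nn.Linear(HIDDEN, 768),
)

x = torch.randn(2, 77, 768)                           # a batch of token embeddings
mu, logvar = encoder(x).chunk(2, dim=-1)              # [2, 77, 768] each
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
recon = decoder(z)                                    # [2, 77, 768]
```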

## Usage

```python
from geovocab2.train.model.vae.vae_lyra import MultiModalVAE, MultiModalVAEConfig
from huggingface_hub import hf_hub_download
import torch

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="AbstractPhil/vae-lyra",
    filename="model.pt"
)

# Load the checkpoint on CPU (move the model to GPU afterwards if desired)
checkpoint = torch.load(model_path, map_location="cpu")

# Recreate the model with the configuration it was trained with
config = MultiModalVAEConfig(
    modality_dims={"clip": 768, "t5": 768},
    latent_dim=768,
    fusion_strategy="cantor"
)

model = MultiModalVAE(config)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Run a forward pass (see below for one way to produce the embeddings)
inputs = {
    "clip": clip_embeddings,  # [batch, 77, 768]
    "t5": t5_embeddings       # [batch, 77, 768]
}

reconstructions, mu, logvar = model(inputs)
```
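
The snippet above assumes `clip_embeddings` and `t5_embeddings` already exist; this card does not specify how they were produced. A minimal sketch, assuming the standard Hugging Face CLIP-L and T5-base text encoders (both with a 768-d hidden size) padded to the same 77-token length; if the VAE was trained against different encoder checkpoints, substitute those instead:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
prompts = ["a watercolor painting of a lighthouse at dusk"]  # example prompt

# CLIP-L text encoder (768-d hidden states, 77-token context)
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()

# T5-base encoder (768-d hidden states), padded/truncated to the same 77 tokens
t5_tok = T5Tokenizer.from_pretrained("t5-base")
t5_enc = T5EncoderModel.from_pretrained("t5-base").to(device).eval()

with torch.no_grad():
    clip_in = clip_tok(prompts, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt").to(device)
    t5_in = t5_tok(prompts, padding="max_length", max_length=77,
                   truncation=True, return_tensors="pt").to(device)
    clip_embeddings = clip_enc(**clip_in).last_hidden_state  # [1, 77, 768]
    t5_embeddings = t5_enc(**t5_in).last_hidden_state        # [1, 77, 768]
```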

## Training Details

- Trained on 10,000 diverse prompts
- Prompt mix: 85% LAION flavors, 15% synthetic prompts
- KL annealing: enabled
- Learning rate: 0.0001

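
The card lists the KL-annealing flag and learning rate but not the exact objective. As a minimal sketch of what an annealed multi-modal VAE loss conventionally looks like (the reconstruction term, the `warmup_steps` value, and the weighting are assumptions; the actual geovocab2 training loop may differ):

```python
import torch
import torch.nn.functional as F

def kl_beta(step: int, warmup_steps: int = 1000) -> float:
    """Linear KL annealing: beta ramps from 0 to 1 over the warmup."""
    return min(1.0, step / warmup_steps)

def vae_loss(recons, targets, mu, logvar, beta):
    """Per-modality reconstruction error plus the annealed KL term."""
    recon = sum(F.mse_loss(recons[k], targets[k]) for k in targets)
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```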

## Citation

```bibtex
@software{vae_lyra_2025,
  author = {AbstractPhil},
  title  = {VAE Lyra: Multi-Modal Variational Autoencoder},
  year   = {2025},
  url    = {https://huggingface.co/AbstractPhil/vae-lyra}
}
```