🖼️📝 OneEncoder: A Unified Text & Image Model

OneEncoder is a lightweight framework for cross-modal alignment, focusing on efficiently integrating text and images (with future extensions to other modalities). Unlike traditional methods relying on massive modality-specific encoders, OneEncoder progressively aligns different data types, making it cost-effective and performant even on small paired datasets.

🚀 Key Features

✅ Multimodal Alignment: Initially supports text & image, with extension to other modalities.
✅ Lightweight & Efficient: Avoids full retraining when adding new modalities.
✅ Superior Performance: Outperforms models that require large specialized datasets.

🎯 Applications

  • Visual Question Answering (VQA)
  • Image-Text Retrieval
  • Multimodal Content Understanding
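
To make the Image-Text Retrieval use case concrete, here is a minimal sketch of ranking images against a text query by cosine similarity in a shared embedding space. It does not call OneEncoder itself: the vectors are hand-written toy values standing in for encoder outputs, and the names `cosine`, `retrieve`, `image_vecs`, and `text_vec` are hypothetical, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def retrieve(query_vec, candidates, top_k=1):
    """Return the names of the top_k candidates most similar to query_vec."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in candidates.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Toy embeddings (assumed values, not produced by any real model).
image_vecs = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.2, 0.9],
    "dog.jpg": [0.7, 0.6, 0.1],
}
text_vec = [1.0, 0.2, 0.0]  # stands in for the embedding of a text query

ranked = retrieve(text_vec, image_vecs, top_k=2)
print(ranked)
```

In a real pipeline the toy vectors would be replaced by embeddings from the text and image branches of an aligned model; the ranking step itself stays the same.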

📄 Research Paper

📜 arXiv: OneEncoder: Progressive Cross-Modal Alignment

📌 Resources

🔗 GitHub Repo: OneEncoder
🚀 Hugging Face Demo: OneEncoder Retriever
📓 Demo Notebook: OneEncoder Demos
🔊 OneEncoder for Text & Image (temperature=2.5): HF Model
🔊 OneEncoder for Text, Image & Audio: HF Model
🔊 OneEncoder for Text, Image & Video: HF Model
🔊 OneEncoder for Text, Image & X-ray: HF Model

πŸ“ Authors

πŸ“Œ Bilal FAYE, Hanane AZZAG, Mustapha LEBBAH, Djamel BOUCHAFFRA

Note: This model was trained with temperature=1.0 and uses addition as the fusion operation.
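
The two settings in the note can be illustrated with a small sketch. It rests on two assumptions that this card does not spell out: that "addition as fusion" means an element-wise sum of two equal-length vectors, and that the temperature divides similarity scores before a softmax, as in common contrastive setups. The function names `fuse_add` and `softmax_with_temperature` are hypothetical, as are all the numbers.

```python
import math


def fuse_add(features, modality_token):
    """Additive fusion (assumed form): element-wise sum of two vectors."""
    return [f + t for f, t in zip(features, modality_token)]


def softmax_with_temperature(scores, temperature=1.0):
    """Softmax over scores divided by a temperature (assumed convention)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


fused = fuse_add([0.2, 0.5], [1.0, -0.5])

# Toy similarity scores between one text and three candidate images.
sims = [2.0, 1.0, 0.0]
p_t1 = softmax_with_temperature(sims, temperature=1.0)
p_t25 = softmax_with_temperature(sims, temperature=2.5)

print(fused)
print([round(p, 3) for p in p_t1])
print([round(p, 3) for p in p_t25])
```

Under this convention, temperature=1.0 leaves the scores unchanged, while a larger value such as 2.5 flattens the distribution, so the best match gets a smaller share of the probability mass.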

Model size: 198M params (Safetensors, tensor type F32)