dt-cococaps / README.md

patrickamadeus

Upload DualTowerVLM using push_to_hub

ecabf70 verified 2 days ago

metadata

library_name: dualtowervlm
license: mit
pipeline_tag: image-text-to-text
tags:
  - vision-language
  - multimodal
  - dual-tower
  - research

DualTowerVLM is a dual-tower Vision-Language Model (VLM) architecture that processes images and text through separate towers before combining their representations.

For more information, check out the repository.
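The dual-tower idea described above can be illustrated with a small, framework-free sketch. This is not the DualTowerVLM implementation: the class `ToyDualTower`, the helper names, the dimensions, and the choice of concatenation as the fusion step are all assumptions made for illustration, since the card does not say how the two representations are combined.

```python
import random


def make_weights(in_dim, out_dim, rng):
    """Build a random (out_dim x in_dim) weight matrix as nested lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, vec):
    """Matrix-vector product: map `vec` into the tower's output space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class ToyDualTower:
    """Hypothetical toy model: one linear 'tower' per modality."""

    def __init__(self, image_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Each modality has its own parameters; nothing is shared.
        self.image_tower = make_weights(image_dim, hidden_dim, rng)
        self.text_tower = make_weights(text_dim, hidden_dim, rng)

    def forward(self, image_features, text_features):
        image_repr = project(self.image_tower, image_features)
        text_repr = project(self.text_tower, text_features)
        # Assumed fusion step: concatenate the two representations.
        return image_repr + text_repr


model = ToyDualTower(image_dim=8, text_dim=4, hidden_dim=3)
fused = model.forward([1.0] * 8, [0.5] * 4)
print(len(fused))  # 6: three image dimensions followed by three text dimensions
```

Because the towers are separate, changing the text input only affects the text half of the fused vector, which is the property the "separate towers" description implies.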

Usage:

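# Assumes the DualTowerVLM repository's `models` package is importable
# (for example, when running from the repository root).
from models.dual_tower.dual_tower import DualTowerVLM
from models.config import VLMConfig

cfg = VLMConfig()  # default configuration; not passed to from_pretrained below
# Load the checkpoint published under this Hub repository ID.
model = DualTowerVLM.from_pretrained("patrickamadeus/dt-cococaps")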