Update README.md
README.md CHANGED
@@ -1,3 +1,16 @@
+---
+license: mit
+datasets:
+- ydshieh/coco_dataset_script
+language:
+- en
+metrics:
+- code_eval
+library_name: transformers
+pipeline_tag: image-to-text
+tags:
+- text-generation-inference
+---
 <u><b>We are creating a spatially aware vision-language (VL) model.</b></u>
 
 This model is trained on COCO dataset images augmented with extra information about the spatial relationships between the entities in each image.
@@ -6,7 +19,8 @@ This is a sequence to sequence model for image-captioning. The architecture is <
 
 <details>
 <summary>Requirements</summary>
-- 4GB RAM.
+- 4GB GPU RAM.
+- CUDA-enabled Docker.
 </details>
 
 How to download and run the model:
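The substance of this commit is the YAML frontmatter block, which the Hugging Face Hub parses to populate the model card's metadata (license, datasets, pipeline tag, tags). As a minimal sketch of what that block encodes, the snippet below extracts the metadata between the two `---` markers; it handles only the simple `key: value` / `- item` subset that appears in the diff and is an illustration, not the Hub's actual implementation.

```python
# Sketch: parse the model-card frontmatter added in this commit.
# Handles only the flat `key: value` / `- item` YAML subset shown in the diff.

CARD = """\
---
license: mit
datasets:
- ydshieh/coco_dataset_script
language:
- en
metrics:
- code_eval
library_name: transformers
pipeline_tag: image-to-text
tags:
- text-generation-inference
---
Model description follows here.
"""

def parse_frontmatter(card: str) -> dict:
    """Extract the metadata between the first two `---` lines."""
    # Split off the block delimited by the first two `---` markers.
    _, block, _body = card.split("---\n", 2)
    meta, current_key = {}, None
    for line in block.splitlines():
        if line.startswith("- ") and current_key is not None:
            # A `- item` line appends to the list opened by the last bare key.
            meta[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            # A bare `key:` introduces a list; `key: value` is a scalar.
            meta[current_key] = value.strip() if value.strip() else []
    return meta

meta = parse_frontmatter(CARD)
print(meta["pipeline_tag"])  # image-to-text
print(meta["datasets"])      # ['ydshieh/coco_dataset_script']
```

In practice this block would be read with `yaml.safe_load` from PyYAML rather than a hand-rolled parser; the version above just keeps the sketch dependency-free. The `pipeline_tag: image-to-text` and `library_name: transformers` keys are what let the Hub offer the model through the standard `transformers` image-to-text pipeline.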