Update README.md
README.md CHANGED
@@ -1,3 +1,16 @@
+---
+license: mit
+datasets:
+- ydshieh/coco_dataset_script
+language:
+- en
+metrics:
+- code_eval
+library_name: transformers
+pipeline_tag: image-to-text
+tags:
+- text-generation-inference
+---
 <u><b>We are creating a spatially aware vision-language (VL) model.</b></u>
 
 This model is trained on COCO dataset images augmented with extra information about the spatial relationships between the entities in each image.
@@ -6,7 +19,8 @@ This is a sequence to sequence model for image-captioning. The architecture is <
 
 <details>
 <summary>Requirements</summary>
-- 4GB RAM.
+- 4GB GPU RAM.
+- CUDA-enabled Docker.
 </details>
 
 How to download and run the model:
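The substance of this commit is the YAML frontmatter block, which the Hugging Face Hub parses to populate the model card's metadata (license, datasets, pipeline tag, tags). As a minimal sketch of what that block encodes, the snippet below extracts the metadata between the two `---` markers; it handles only the simple `key: value` / `- item` subset that appears in the diff and is an illustration, not the Hub's actual implementation.

```python
# Sketch: parse the model-card frontmatter added in this commit.
# Handles only the flat `key: value` / `- item` YAML subset shown in the diff.

CARD = """\
---
license: mit
datasets:
- ydshieh/coco_dataset_script
language:
- en
metrics:
- code_eval
library_name: transformers
pipeline_tag: image-to-text
tags:
- text-generation-inference
---
Model description follows here.
"""

def parse_frontmatter(card: str) -> dict:
    """Extract the metadata between the first two `---` lines."""
    # Split off the block delimited by the first two `---` markers.
    _, block, _body = card.split("---\n", 2)
    meta, current_key = {}, None
    for line in block.splitlines():
        if line.startswith("- ") and current_key is not None:
            # A `- item` line appends to the list opened by the last bare key.
            meta[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            # A bare `key:` introduces a list; `key: value` is a scalar.
            meta[current_key] = value.strip() if value.strip() else []
    return meta

meta = parse_frontmatter(CARD)
print(meta["pipeline_tag"])  # image-to-text
print(meta["datasets"])      # ['ydshieh/coco_dataset_script']
```

In practice this block would be read with `yaml.safe_load` from PyYAML rather than a hand-rolled parser; the version above just keeps the sketch dependency-free. The `pipeline_tag: image-to-text` and `library_name: transformers` keys are what let the Hub offer the model through the standard `transformers` image-to-text pipeline.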