Correct output from test.py
README.md
CHANGED
@@ -18,7 +18,7 @@ PG-InstructBLIP is finetuned using the [PhysObjects dataset](https://drive.googl
 
 ## Example Usage and Installation
 
-This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) and download this model through git-lfs or direct downloading.
+This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) from source and download this model through git-lfs or direct downloading.
 
 After loading the model, you can disable the qformer text input to follow the same configuration we used for fine-tuning. However, the model still works well with it enabled, so we recommend users to experiment with both and choose the optimal configuration on a case-by-case basis.
 
@@ -62,7 +62,7 @@ question_samples = {
 
 answers, scores = generate(vlm, question_samples, length_penalty=0, repetition_penalty=1, num_captions=3)
 print(answers, scores)
-#
+# ['opaque', 'translucent', 'transparent'] tensor([-0.0373, -4.2404, -4.4436], device='cuda:0')
 ```
 
 Note that the output of the generate function includes the log probabilities of each generation. For categorical properties (like material, transparency, and contents), these probabilities can be interpreted as confidences, as typical with VLMs. In the example above, PG-InstructBLIP is very confident that the ceramic bowl is opaque, which is true.
test.py
CHANGED
@@ -35,4 +35,4 @@ question_samples = {
 
 answers, scores = generate(vlm, question_samples, length_penalty=0, repetition_penalty=1, num_captions=3)
 print(answers, scores)
-#
+# ['opaque', 'translucent', 'transparent'] tensor([-0.0373, -4.2404, -4.4436], device='cuda:0')
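The scores in the corrected output comment are per-generation log probabilities, so exponentiating them recovers the confidences the README describes. A minimal sketch using those values (plain Python, independent of LAVIS; `answers` and `log_probs` are copied from the example output above):

```python
import math

# Answers and log-probability scores from the example output above,
# one score per generated answer.
answers = ['opaque', 'translucent', 'transparent']
log_probs = [-0.0373, -4.2404, -4.4436]

# exp(log p) recovers each generation's probability, which can be read
# as a confidence for categorical properties such as transparency.
confidences = [math.exp(lp) for lp in log_probs]
for ans, conf in zip(answers, confidences):
    print(f"{ans}: {conf:.3f}")
```

Here the top answer, "opaque", comes out at roughly 0.96, matching the README's remark that the model is very confident the ceramic bowl is opaque.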