asaakyan/LLaVA-1.5-7b-eViL-VFLUTE-lora

@@ -1,65 +1,66 @@
 ---
 library_name: transformers
-tags: []
 ---
 # Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
 <!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 ## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
 ### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 [More Information Needed]
 ## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
 ### Recommendations
@@ -69,133 +70,54 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 ## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
 ## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+tags:
+- art
+datasets:
+- ColumbiaNLP/V-FLUTE
+language:
+- en
+metrics:
+- f1
 ---
 # Model Card for Model ID
+This is the checkpoint for the model from the paper [V-FLUTE: Visual Figurative Language Understanding with Textual Explanations](https://arxiv.org/abs/2405.01474).
+Specifically, it is the best-performing model fine-tuned on the combination of the V-FLUTE and e-ViL (e-SNLI-VE) datasets, with early stopping on the V-FLUTE validation set.
 ## Model Details
 ### Model Description
+- More on LLaVA 1.5: https://github.com/haotian-liu/LLaVA
+- V-FLUTE dataset: https://huggingface.co/datasets/ColumbiaNLP/V-FLUTE
+- V-FLUTE paper: https://arxiv.org/abs/2405.01474
+
+Citation:
+```bibtex
+@misc{saakyan2024vflute,
+      title={V-FLUTE: Visual Figurative Language Understanding with Textual Explanations},
+      author={Arkadiy Saakyan and Shreyas Kulkarni and Tuhin Chakrabarty and Smaranda Muresan},
+      year={2024},
+      eprint={2405.01474},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** Arkadiy Saakyan (ColumbiaNLP)
+- **Model type:** Vision-language model (LoRA adapter)
+- **Language(s) (NLP):** English
+- **Fine-tuned from model:** LLaVA-v1.5-7B
 ### Model Sources [optional]
 <!-- Provide the basic links for the model. -->
+- **Repository:** https://github.com/asaakyan/V-FLUTE
+- **Paper:** https://arxiv.org/abs/2405.01474
 ## Uses
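+The model is intended only for interpreting multimodal (image and text) figurative inputs such as metaphors, similes, idioms, sarcasm, and humor. Given an image and a claim, it decides whether the image entails or contradicts the claim and explains its answer in text.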
 ### Out-of-Scope Use
+The model may not work well for general instruction-following use cases outside this task.
 [More Information Needed]
 ## Bias, Risks, and Limitations
+The V-FLUTE dataset and its source datasets may contain biases. This is especially likely for the sources built from user-generated content (memecap and muse).
 ### Recommendations
 ## How to Get Started with the Model
+Install the authors' fork of LLaVA at the pinned commit, following the instructions here: https://github.com/asaakyan/LLaVA/tree/6f595efcf2699884f18957ee603986cebfaa9df7
+```python
+from llava.model.builder import load_pretrained_model
+from llava.mm_utils import get_model_name_from_path
+from llava.eval.run_llava_mod import eval_model
+
+# Local paths to the base LLaVA-v1.5-7B weights and to this LoRA checkpoint.
+model_base = "llava-v1.5-7b"
+model_path = "llava-v1.5-7b-evil-vflue-v2-lora"
+# Placeholder: directory containing the input images; set this to your own path.
+image_path = "path/to/images"
+
+# LLaVA's loader uses the name derived from the directory to decide how to load the checkpoint.
+model_name = get_model_name_from_path(model_path)
+tokenizer, model, image_processor, context_len = load_pretrained_model(
+    model_path=model_path,
+    model_base=model_base,
+    model_name=model_name,
+    load_4bit=False,
+)
+
+prompt = """Does the illustration affirm or contest the claim "Feeling motivated and energetic after only cleaning a room minimally."? Provide your argument and choose a label: entailment or contradiction."""
+image_file = f"{image_path}/27.png"
+
+# eval_model expects an argparse-style object, so build a lightweight one on the fly.
+infer_args = type('Args', (), {
+    "model_name": model_name,
+    "model": model,
+    "tokenizer": tokenizer,
+    "image_processor": image_processor,
+    "query": prompt,
+    "conv_mode": None,
+    "image_file": image_file,
+    "sep": ",",
+    "temperature": 0,  # no sampling
+    "top_p": None,
+    "num_beams": 3,  # beam search with 3 beams
+    "max_new_tokens": 512,
+})()
+output = eval_model(infer_args)
+print(output)
+```
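The prompt asks the model for an argument followed by a label, so downstream code usually needs the label on its own. Below is a minimal sketch of a hypothetical helper for this. `extract_label` is not part of LLaVA or the V-FLUTE code. The sketch assumes two things:

- `eval_model` returns the generated text as a string, as the `print(output)` call suggests.
- The label word (`entailment` or `contradiction`) appears after the explanation.

```python
import re
from typing import Optional

# Hypothetical helper (not part of LLaVA or the V-FLUTE repository).
# Assumes the model's answer ends with one of these two label words.
LABEL_PATTERN = re.compile(r"\b(entailment|contradiction)\b")


def extract_label(output: str) -> Optional[str]:
    """Return the last label word found in the model output, or None if none is present."""
    matches = LABEL_PATTERN.findall(output.lower())
    return matches[-1] if matches else None


if __name__ == "__main__":
    sample = "Explanation: the image does not support the claim. Label: contradiction"
    print(extract_label(sample))  # contradiction
```

Taking the last match rather than the first guards against explanations that mention the other label in passing (e.g. "not an entailment"). If the model instead emits the label before its explanation, switch to the first match.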
 ## Training Details
+See the `scripts/vflute` directory of the [LLaVA fork](https://github.com/asaakyan/LLaVA/tree/6f595efcf2699884f18957ee603986cebfaa9df7/scripts/vflute) or the [V-FLUTE repository](https://github.com/asaakyan/V-FLUTE) for training details.
+### Training Data
+https://huggingface.co/datasets/ColumbiaNLP/V-FLUTE
 ## Model Card Contact
+a.saakyan@cs.columbia.edu