ruitongs commited on
Commit
69c10fa
·
verified ·
1 Parent(s): 54633de

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +94 -0
README.md ADDED
@@ -0,0 +1,94 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: Qwen/Qwen2.5-VL-3B-Instruct
3
+ library_name: transformers
4
+ pipeline_tag: image-text-to-text
5
+ tags:
6
+ - qwen2.5-vl
7
+ - lora
8
+ - sft
9
+ - context-classification
10
+ - out-of-context-detection
11
+ - coinco
12
+ license: cc-by-4.0
13
+ ---
14
+
15
+ # COinCO Context Classification Models
16
+
17
+ **Authors:** Tianze Yang\*, Tyson Jordan\*, Ruitong Sun\*, Ninghao Liu, Jin Sun
18
+ \*Equal contribution
19
+ **Affiliation:** University of Georgia
20
+
21
+ ## Overview
22
+
23
+ Fine-grained context classification models for detecting **out-of-context objects** in images. Each model is a fully merged Qwen2.5-VL-3B-Instruct fine-tuned via LoRA on the [COinCO dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset).
24
+
25
+ The models classify whether an object (marked by a red bounding box) is **in-context** or **out-of-context** based on three criteria:
26
+
27
+ | Model | Criterion | Description |
28
+ |-------|-----------|-------------|
29
+ | `co_occurrence/` | Co-occurrence | Whether the object can reasonably appear together with other objects in the scene |
30
+ | `location/` | Location | Whether the object is placed in a physically and contextually reasonable position |
31
+ | `size/` | Size | Whether the object's size is proportional and realistic relative to other objects |
32
+
33
+ ## How to Use
34
+
35
+ ```python
36
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
37
+ import torch
38
+
39
+ # Choose a model: "co_occurrence", "location", or "size"
40
+ model_id = "COinCO/Context_Classification_Models"
41
+ subfolder = "co_occurrence" # or "location" or "size"
42
+
43
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
44
+ model_id,
45
+ subfolder=subfolder,
46
+ torch_dtype=torch.float16,
47
+ device_map="auto",
48
+ )
49
+ processor = AutoProcessor.from_pretrained(model_id, subfolder=subfolder)
50
+ ```
51
+
52
+ ## Training Details
53
+
54
+ - **Base Model:** [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
55
+ - **Method:** LoRA fine-tuning (merged into base model)
56
+ - **Dataset:** [COinCO](https://huggingface.co/datasets/COinCO/COinCO-dataset) inpainted images with multi-model consensus labels
57
+ - **Training Data:** ~5,000 samples per criterion from the training split
58
+ - **Epochs:** 3
59
+ - **Learning Rate:** 2e-4
60
+ - **LoRA Rank:** See adapter config for details
61
+
62
+ ## Evaluation Results
63
+
64
+ ### Inpainted Test Set (binary classification: In-context vs Out-of-context)
65
+
66
+ | Criterion | Baseline (Qwen2.5-VL-3B) | Fine-tuned | Improvement |
67
+ |-----------|--------------------------|------------|-------------|
68
+ | Co-occurrence | 75.54% | **80.82%** | +5.28% |
69
+ | Location | 74.43% | 71.05% | -3.38% |
70
+ | Size | 50.21% | **66.01%** | +15.80% |
71
+
72
+ ### Real COCO Images (shortcut learning detection, higher = less shortcut reliance)
73
+
74
+ | Criterion | Baseline | Fine-tuned | Improvement |
75
+ |-----------|----------|------------|-------------|
76
+ | Co-occurrence | 88.95% | 87.00% | -1.95% |
77
+ | Location | 47.55% | **91.35%** | +43.80% |
78
+ | Size | 52.55% | **83.20%** | +30.65% |
79
+
80
+ ## Related Resources
81
+
82
+ - **Paper:** "Common Inpainted Objects In-N-Out of Context"
83
+ - **Dataset:** [COinCO/COinCO-dataset](https://huggingface.co/datasets/COinCO/COinCO-dataset)
84
+ - **Code:** [YangTianze009/COinCO](https://github.com/YangTianze009/COinCO)
85
+
86
+ ## Citation
87
+
88
+ ```bibtex
89
+ @article{yang2025coinco,
90
+ title={Common Inpainted Objects In-N-Out of Context},
91
+ author={Tianze Yang and Tyson Jordan and Ruitong Sun and Ninghao Liu and Jin Sun},
92
+ year={2025}
93
+ }
94
+ ```