---
library_name: transformers
license: cc-by-nc-4.0
datasets:
- TheFusion21/PokemonCards
language:
- en
pipeline_tag: image-to-text
---

## Model Details

### Model Description

- **Developed by:** [Mit1208](https://huggingface.co/Mit1208)
- **Finetuned from model:** [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224)

## Training Details
https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb

## Inference Details
https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb

### How to Use
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
my_model = Kosmos2ForConditionalGeneration.from_pretrained(
    "Mit1208/Kosmos-2-PokemonCards-trl-merged",
    device_map="auto",
    low_cpu_mem_usage=True,
)

# Load the card image.
image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
response = requests.get(image_url)
# Read the image from the response content.
image = Image.open(BytesIO(response.content))

prompt = "Pokemon name is"

# Move inputs to the same device the model was placed on.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(my_model.device)
with torch.no_grad():
    # Autoregressively generate the completion.
    generated_ids = my_model.generate(**inputs, max_new_tokens=30)

# Convert generated token IDs back to a string and keep only the text
# after the image tokens, trimming any hallucinated continuation.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text.split("</image>")[-1].split(" and")[0] + ".")

'''
Output: Pokemon name is Wartortle.
'''
```

### Limitations
This model was fine-tuned on the free Colab tier, so training used only 300 samples for **85** epochs.
The model hallucinates frequently, so post-processing of the generated text is needed. Other ways to mitigate this are to update the training data (e.g., use conversation-style data) *and/or* set the tokenizer's padding token to its EOS token.
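The post-processing mentioned above can be sketched as a small helper. This is an illustrative example, not part of the released model: the function name and trimming rules are assumptions that mirror the `split` logic in the usage snippet, and you may need different rules for your own failure modes.

```python
def clean_generation(generated_text: str) -> str:
    """Hypothetical post-processing helper for decoded Kosmos-2 output.

    Drops the image placeholder prefix, cuts off hallucinated
    continuations (" and ..." clauses), and normalizes the final period.
    """
    # Keep only the text after the image tokens emitted by the processor.
    text = generated_text.split("</image>")[-1].strip()
    # Cut off hallucinated continuations such as trailing " and ..." clauses.
    text = text.split(" and")[0]
    # Ensure the answer ends with exactly one period.
    return text.rstrip(".") + "."


print(clean_generation("</image> Pokemon name is Wartortle and it evolves from Squirtle"))
# -> Pokemon name is Wartortle.
```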