---
library_name: transformers
license: cc-by-nc-4.0
datasets:
- TheFusion21/PokemonCards
language:
- en
pipeline_tag: image-to-text
---

## Model Details

### Model Description

- **Developed by:** [Mit1208](https://huggingface.co/Mit1208)
- **Finetuned from model:** [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224)

## Training Details

Training notebook: https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb

## Inference Details

Inference notebook: https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb

### How to Use

```python
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

# The processor comes from the base model; the weights come from the fine-tuned checkpoint
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
my_model = Kosmos2ForConditionalGeneration.from_pretrained(
    "Mit1208/Kosmos-2-PokemonCards-trl-merged",
    device_map="auto",
    low_cpu_mem_usage=True,
)

# load image
image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
response = requests.get(image_url)
response.raise_for_status()  # fail early if the download did not succeed
# Read the image from the response content
image = Image.open(BytesIO(response.content))

prompt = "Pokemon name is"

# Move the inputs to whichever device the model was placed on
inputs = processor(text=prompt, images=image, return_tensors="pt").to(my_model.device)

with torch.no_grad():
    # autoregressively generate completion
    generated_ids = my_model.generate(**inputs, max_new_tokens=30)

# convert generated token IDs back to strings
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Keep the text after the image tokens and cut it at the first " and"
print(generated_text.split("</image>")[-1].split(" and")[0] + ".")

# Output: Pokemon name is Wartortle.
```

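The final `print` line above trims the decoded text by splitting on `</image>` and on `" and"`. A minimal sketch of turning that step into a reusable check is shown here. The function name `extract_pokemon_name` and the `known_names` allow-list are hypothetical and not part of the original notebooks; the idea is to return a name only when it matches a list of valid names supplied by the caller, and `None` otherwise.

```python
def extract_pokemon_name(generated_text, known_names, prompt="Pokemon name is"):
    """Return the predicted name if it appears in known_names, else None.

    Hypothetical helper: `known_names` is a caller-supplied iterable of
    valid Pokemon names; it is an assumption, not something the model provides.
    """
    # Drop everything up to the last image-closing tag, if it survived decoding.
    text = generated_text.split("</image>")[-1]
    # Keep only what follows the prompt; give up if the prompt is missing.
    _, found, tail = text.partition(prompt)
    if not found:
        return None
    # Same truncation as the snippet above: cut at the first " and".
    candidate = tail.split(" and")[0].strip().rstrip(".")
    # Case-insensitive lookup that returns the canonical spelling.
    lookup = {name.lower(): name for name in known_names}
    return lookup.get(candidate.lower())


names = {"Wartortle", "Squirtle"}
print(extract_pokemon_name("Pokemon name is Wartortle and its HP is 80", names))  # Wartortle
print(extract_pokemon_name("Pokemon name is Blastoisaur.", names))  # None
```

Returning `None` for an unknown name lets the caller decide whether to retry generation or discard the prediction, rather than passing a hallucinated name downstream.
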
### Limitations

This model was fine-tuned on the free tier of Google Colab, so training used only 300 samples for **85** epochs.

The model hallucinates very frequently, so its output needs post-processing. Other ways to address this are to update the training data (use conversation data) *and/or* to set the tokenizer's padding token to its EOS token.
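
The padding-token change mentioned above could be sketched as follows. The helper name `use_eos_as_pad` is hypothetical, and the demo uses a plain stand-in object instead of a real tokenizer so that it runs without downloading anything. In a real fine-tuning run, the assumption is that you would pass `processor.tokenizer` before building training batches.

```python
from types import SimpleNamespace


def use_eos_as_pad(tokenizer):
    """Reuse the EOS token as the padding token, modifying the tokenizer in place."""
    if tokenizer.eos_token is None:
        raise ValueError("tokenizer has no EOS token to reuse as padding")
    tokenizer.pad_token = tokenizer.eos_token
    return tokenizer


# Stand-in object for illustration only; it mimics the two attributes used above.
demo_tokenizer = SimpleNamespace(eos_token="</s>", pad_token="<pad>")
use_eos_as_pad(demo_tokenizer)
print(demo_tokenizer.pad_token)  # </s>
```

Whether this reduces hallucination for this checkpoint is untested here; it is one of the options named above, not a confirmed fix.
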