---
library_name: transformers
license: cc-by-nc-4.0
datasets:
- TheFusion21/PokemonCards
language:
- en
pipeline_tag: image-to-text
---

## Model Details

### Model Description

- Developed by: [Mit1208](https://huggingface.co/Mit1208)
- Fine-tuned from model: [microsoft/kosmos-2-patch14-224](https://huggingface.co/microsoft/kosmos-2-patch14-224)

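## Training Details
Fine-tuned with TRL on the [TheFusion21/PokemonCards](https://huggingface.co/datasets/TheFusion21/PokemonCards) dataset. Training notebook: https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb

## Inference Details
Inference notebook: https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb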

### How to Use
```python
from io import BytesIO

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

# The processor comes from the base model; the weights come from this fine-tuned, merged checkpoint
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
my_model = Kosmos2ForConditionalGeneration.from_pretrained(
    "Mit1208/Kosmos-2-PokemonCards-trl-merged",
    device_map="auto",  # requires the `accelerate` package
    low_cpu_mem_usage=True,
)

# Load a Pokemon card image
image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
response = requests.get(image_url)
response.raise_for_status()
# Read the image from the response content (convert to RGB in case the PNG has an alpha channel)
image = Image.open(BytesIO(response.content)).convert("RGB")

prompt = "Pokemon name is"

# Move the inputs to whatever device the model was placed on, instead of hard-coding "cuda:0"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(my_model.device)
with torch.no_grad():
    # Autoregressively generate the completion
    generated_ids = my_model.generate(**inputs, max_new_tokens=30)

# Convert the generated token IDs back to a string
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Keep only the text after "</image>" and cut the hallucinated continuation at the first " and"
print(generated_text.split("</image>")[-1].split(" and")[0].strip() + ".")
# Output: Pokemon name is Wartortle.
```

### Limitations
This model was fine-tuned on the free tier of Google Colab, so only 300 samples were used for training (85 epochs).
The model hallucinates very frequently, so its output needs post-processing (the How to Use example truncates the generated text at the first " and"). Other ways to address this are to train on conversation-style data and/or to set the tokenizer's padding token to its EOS token.
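The post-processing described above can be factored into a small, dependency-free helper. This is a sketch that mirrors the string handling in the How to Use example (keep the text after `</image>`, cut at the first `" and"`, end with a single period); the function name `extract_pokemon_name` and its `stop_word` parameter are hypothetical, not part of transformers or this repository, and the raw string below is made up for illustration rather than real model output.

```python
def extract_pokemon_name(generated_text: str, stop_word: str = " and") -> str:
    """Hypothetical helper: clean a decoded Kosmos-2 completion down to one sentence."""
    # Drop the image-placeholder part that precedes the closing "</image>" tag.
    caption = generated_text.split("</image>")[-1]
    # The fine-tuned model tends to keep generating after the name; cut at the stop word.
    caption = caption.split(stop_word)[0]
    # Trim whitespace and make sure the result ends with exactly one period.
    return caption.strip().rstrip(".") + "."


if __name__ == "__main__":
    # Made-up raw string in the shape of a decoded completion (not real model output).
    raw = "<image>. the, to and of</image> Pokemon name is Wartortle and it evolves"
    print(extract_pokemon_name(raw))  # Pokemon name is Wartortle.
```

Splitting on `</image>` first matters because the placeholder text before it can itself contain `" and"`. The heuristic is crude: it would also truncate any answer that legitimately contains `" and"` after a space. For the padding-token suggestion, the usual transformers idiom is `processor.tokenizer.pad_token = processor.tokenizer.eos_token`, applied before tokenizing training batches; whether it reduces hallucination for this model has not been verified here.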