Mit1208 committed
Commit c3b7198
1 Parent(s): 79dd5c2

Update README.md

Files changed (1)
  1. README.md +37 -32
README.md CHANGED
@@ -8,52 +8,57 @@ language:
  pipeline_tag: image-to-text
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [Mit]
+ - **Developed by:** [https://huggingface.co/Mit1208]
  - **Finetuned from model [optional]:** [microsoft/kosmos-2-patch14-224]

- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
+ [More Information Needed]

+ ## Training Details
+ https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+ ## Inference Details
+ https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+ ### How to Use
+ ```python
+ # Load model directly
+ import requests
+ import torch
+ from io import BytesIO
+ from PIL import Image
+ from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

- ## How to Get Started with the Model
+ processor = AutoProcessor.from_pretrained("Mit1208/Kosmos-2-PokemonCards-trl-merged")
+ my_model = Kosmos2ForConditionalGeneration.from_pretrained("Mit1208/Kosmos-2-PokemonCards-trl-merged", device_map="auto", low_cpu_mem_usage=True)

- Use the code below to get started with the model.
+ # load the card image from a URL
+ image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
+ response = requests.get(image_url)
+ # Read the image from the response content
+ image = Image.open(BytesIO(response.content))

- [More Information Needed]
+ prompt = "Pokemon name is"

- ## Training Details
+ inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")
+ with torch.no_grad():
+     # autoregressively generate a completion
+     generated_ids = my_model.generate(**inputs, max_new_tokens=30)
+     # convert generated token IDs back to strings
+     generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+     # keep only the text after the image region, up to the first " and"
+     print(generated_text.split("</image>")[-1].split(" and")[0] + ".")

- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
+ '''
+ Output: Pokemon name is Wartortle.
+ '''

- ### Training Procedure
+ ```

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+ ### Limitation
+ This model was fine-tuned on the free Colab tier, so training used only 300 samples, for **85** epochs.
+ The model hallucinates very frequently, so its output needs post-processing. Another way to handle this issue is to update the training data (use conversation data) *and/or* set the tokenizer padding token to the tokenizer eos token.
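
One way to apply the padding-token fix mentioned in the Limitation section, as a minimal sketch: it assumes the merged checkpoint's processor exposes its tokenizer as `processor.tokenizer`, as 🤗 transformers processors normally do.

```python
from transformers import AutoProcessor

# Load the processor that ships with the fine-tuned checkpoint.
processor = AutoProcessor.from_pretrained("Mit1208/Kosmos-2-PokemonCards-trl-merged")
tokenizer = processor.tokenizer  # assumption: the processor's wrapped tokenizer

# Reuse the eos token as the padding token, so padded positions in training
# batches teach the model to emit eos and stop instead of rambling on.
tokenizer.pad_token = tokenizer.eos_token
```

With padding done via eos, a retrained model gets a clean stopping signal, so the `split("</image>")` / `split(" and")` post-processing in How to Use becomes less load-bearing.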