Mit1208 committed
Commit c3b7198
1 Parent(s): 79dd5c2

Update README.md

Files changed (1)
  1. README.md +37 -32
README.md CHANGED
@@ -8,52 +8,57 @@ language:
  pipeline_tag: image-to-text
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [Mit]
+ - **Developed by:** [https://huggingface.co/Mit1208]
  - **Finetuned from model [optional]:** [microsoft/kosmos-2-patch14-224]

- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
+ [More Information Needed]

+ ## Training Details
+ https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+ ## Inference Details
+ https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+ ### How to Use
+ ```python
+ # Load model directly
+ import requests
+ import torch
+ from io import BytesIO
+ from PIL import Image
+ from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

- ## How to Get Started with the Model
+ processor = AutoProcessor.from_pretrained("Mit1208/Kosmos-2-PokemonCards-trl-merged")
+ my_model = Kosmos2ForConditionalGeneration.from_pretrained("Mit1208/Kosmos-2-PokemonCards-trl-merged", device_map="auto", low_cpu_mem_usage=True)

- Use the code below to get started with the model.
+ # load the card image from a URL
+ image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
+ response = requests.get(image_url)
+ # Read the image from the response content
+ image = Image.open(BytesIO(response.content))

- [More Information Needed]
+ prompt = "Pokemon name is"

- ## Training Details
+ inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")
+ with torch.no_grad():
+     # autoregressively generate a completion
+     generated_ids = my_model.generate(**inputs, max_new_tokens=30)
+     # convert generated token IDs back to strings
+     generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+     # keep only the text after the image region, up to the first " and"
+     print(generated_text.split("</image>")[-1].split(" and")[0] + ".")

- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
+ '''
+ Output: Pokemon name is Wartortle.
+ '''

- ### Training Procedure
+ ```

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+ ### Limitation
+ This model was fine-tuned on the free Colab tier, so training used only 300 samples, for **85** epochs.
+ The model hallucinates very frequently, so its output needs post-processing. Another way to handle this issue is to update the training data (use conversation data) *and/or* set the tokenizer padding token to the tokenizer eos token.
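
One way to apply the padding-token fix mentioned in the Limitation section, as a minimal sketch: it assumes the merged checkpoint's processor exposes its tokenizer as `processor.tokenizer`, as 🤗 transformers processors normally do.

```python
from transformers import AutoProcessor

# Load the processor that ships with the fine-tuned checkpoint.
processor = AutoProcessor.from_pretrained("Mit1208/Kosmos-2-PokemonCards-trl-merged")
tokenizer = processor.tokenizer  # assumption: the processor's wrapped tokenizer

# Reuse the eos token as the padding token, so padded positions in training
# batches teach the model to emit eos and stop instead of rambling on.
tokenizer.pad_token = tokenizer.eos_token
```

With padding done via eos, a retrained model gets a clean stopping signal, so the `split("</image>")` / `split(" and")` post-processing in How to Use becomes less load-bearing.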