echo840 committed on
Commit e12c976
1 Parent(s): 6487a89

Update README.md

Files changed (1)
  1. README.md +36 -12
README.md CHANGED
@@ -56,18 +56,6 @@ We also provide the source code and the model weight for the original demo, allo
  python demo.py -c echo840/Monkey
  ```
 
- In order to generate more detailed captions, we provide some prompt examples so that you can conduct more interesting explorations. You can modify these two variables in the `caption` function to implement different prompt inputs for the caption task, as shown below:
- ```
- query = "Generate the detailed caption in English: "
- chat_query = "Generate the detailed caption in English: "
- ```
- - Generate the detailed caption in English.
- - Explain the visual content of the image in great detail.
- - Analyze the image in a comprehensive and detailed manner.
- - Describe the image in as much detail as possible in English without duplicating it.
- - Describe the image in as much detail as possible in English, including as many elements from the image as possible, but without repetition.
-
-
  ## Dataset
 
  We have open-sourced the data generated by the multi-level description generation method. You can download it at [Detailed Caption](https://huggingface.co/datasets/echo840/Detailed_Caption).
@@ -122,6 +110,42 @@ We also offer Monkey's model definition and training code, which you can explore
  **ATTENTION:** Specify the path to your training data, which should be a json file consisting of a list of conversations.
 
 
+ ## Inference
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ checkpoint = "echo840/Monkey"
+ model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map='cuda', trust_remote_code=True).eval()
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
+ tokenizer.padding_side = 'left'
+ tokenizer.pad_token_id = tokenizer.eod_id
+ img_path = ""
+ question = ""
+ query = f'<img>{img_path}</img> {question} Answer: ' #VQA
+ # query = f'<img>{img_path}</img> Generate the detailed caption in English: ' #detailed caption
+
+ input_ids = tokenizer(query, return_tensors='pt', padding='longest')
+ attention_mask = input_ids.attention_mask
+ input_ids = input_ids.input_ids
+
+ pred = model.generate(
+     input_ids=input_ids.cuda(),
+     attention_mask=attention_mask.cuda(),
+     do_sample=False,
+     num_beams=1,
+     max_new_tokens=512,
+     min_new_tokens=1,
+     length_penalty=1,
+     num_return_sequences=1,
+     output_hidden_states=True,
+     use_cache=True,
+     pad_token_id=tokenizer.eod_id,
+     eos_token_id=tokenizer.eod_id,
+ )
+ response = tokenizer.decode(pred[0][input_ids.size(1):].cpu(), skip_special_tokens=True).strip()
+ print(response)
+ ```
+
 
  ## Citing Monkey
  If you wish to refer to the baseline results published here, please use the following BibTeX entries:
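
The **ATTENTION** note in the second hunk requires a JSON file consisting of a list of conversations, but the commit does not show the schema. Below is a minimal sketch of one plausible record, assuming a Qwen-VL-style conversation layout; every field name and value is illustrative, not confirmed by this commit:

```python
# Illustrative guess at the training-data layout: a list of conversation records.
# The field names ("id", "conversations", "from", "value") are assumptions, not
# taken from this commit; check the training code for the exact schema.
import json

train_data = [
    {
        "id": "0",
        "conversations": [
            {"from": "user", "value": "<img>path/to/image.jpg</img> Generate the detailed caption in English: "},
            {"from": "assistant", "value": "A detailed English description of the image."},
        ],
    }
]

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(train_data, f, ensure_ascii=False, indent=2)
```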
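
The added inference snippet sets `tokenizer.padding_side = 'left'` and a `pad_token_id`, which only take effect once several queries are tokenized together. A minimal sketch of that batched usage, reusing `model`, `tokenizer`, and `img_path` from the snippet above (not part of this commit):

```python
# Hypothetical batched inference; assumes `model`, `tokenizer`, and `img_path`
# are already defined as in the snippet added by this commit.
queries = [
    f'<img>{img_path}</img> What is in the image? Answer: ',
    f'<img>{img_path}</img> Generate the detailed caption in English: ',
]
batch = tokenizer(queries, return_tensors='pt', padding='longest')

pred = model.generate(
    input_ids=batch.input_ids.cuda(),
    attention_mask=batch.attention_mask.cuda(),
    do_sample=False,
    max_new_tokens=512,
    pad_token_id=tokenizer.eod_id,
    eos_token_id=tokenizer.eod_id,
)

# With left padding, every prompt ends at the same position, so each row's
# generated tokens start right after batch.input_ids.size(1).
for row in pred:
    print(tokenizer.decode(row[batch.input_ids.size(1):].cpu(), skip_special_tokens=True).strip())
```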