Update README.md
Introducing GUIDance, a model trained on [GUICourse](https://arxiv.org/pdf/2406.11317).
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f706dfe94ed998c463ed66/5d4rJFWjKn-c-iOXJKYXF.png)

# News

- 2024-07-09: We released MiniCPM-GUIDance on [huggingface](https://huggingface.co/RhapsodyAI/minicpm-guidance).
- 2024-03-09: We have open-sourced GUICourse: [GUIAct](https://huggingface.co/datasets/yiye2023/GUIAct), [GUIChat](https://huggingface.co/datasets/yiye2023/GUIChat), [GUIEnv](https://huggingface.co/datasets/yiye2023/GUIEnv).

# ToDo

- [ ] Batch inference
# CookBook

- Prompt for Actions

```
Your Task
{Task}
Generate next actions to do this task.
```

```
Actions History
{hover, select_text, click, scroll}
Information
{Information about the web}
Your Task
{TASK}
Generate next actions to do this task.
```

- Prompt for Chat with or without Grounding

```
{Query}

OR

{Query} Grounding all objects in the image.
```
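The templates above can also be filled in programmatically. A minimal sketch, assuming a hypothetical `build_action_prompt` helper with simplified placeholder names (`{history}`, `{information}`, `{task}`) standing in for the literal placeholders shown above — none of this is part of the released API:

```python
# Illustrative helper for filling the action-prompt template from the CookBook.
# The helper name and its placeholder names are assumptions, not part of the model's API.
ACTION_PROMPT = (
    "Actions History\n"
    "{history}\n"
    "Information\n"
    "{information}\n"
    "Your Task\n"
    "{task}\n"
    "Generate next actions to do this task."
)

def build_action_prompt(task: str, history: str = "", information: str = "") -> str:
    """Fill the action-prompt template with a task, prior actions, and page info."""
    return ACTION_PROMPT.format(task=task, history=history, information=information)

prompt = build_action_prompt(
    task="Open the pull requests page",
    history="click, scroll",
    information="GitHub repository main page",
)
print(prompt)
```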
# Example

Pip install all dependencies, or download the model with:

```
huggingface-cli download RhapsodyAI/minicpm-guidance
```

Example case image: ![case](https://cdn-uploads.huggingface.co/production/uploads/63f706dfe94ed998c463ed66/KJFeGDBj3SOgQqGAU7lU5.png)
```python
from transformers import AutoProcessor, AutoTokenizer, AutoModel
from PIL import Image
import torch

# ... (tokenizer, processor, and model initialization elided in this diff)

example_messages = [
    [
        {
            "role": "user",
            "content": Image.open("./case.png").convert('RGB')
        },
        {
            "role": "user",
            "content": "How could I use this model from this web? Grounding all objects in the image."
        }
    ]
]

input = processor(example_messages, padding_side="right")

# Move every tensor in the processor output to the GPU.
for key in input:
    if isinstance(input[key], list):
        for i in range(len(input[key])):
            if isinstance(input[key][i], torch.Tensor):
                input[key][i] = input[key][i].cuda()
    if isinstance(input[key], torch.Tensor):
        input[key] = input[key].cuda()

with torch.no_grad():
    outputs = model.generate(input, max_new_tokens=1024, do_sample=False, num_beams=3)
    text = tokenizer.batch_decode(outputs.cpu().tolist())

for i in text:
    print('-'*20)
    print(i)

'''
To use the model from this webpage, you would typically follow these steps:
1. **Access the Model**: Navigate to the section of the webpage where the model is described. In this case, it's under the heading "Use this model"<box> 864 238 964 256</box>.
2. **Download the Model**: There should be a link or button that allows you to download the model. Look for a button or link that says "Download" or something similar.
3. **Install the Model**: Once you've downloaded the model, you'll need to install it on your system. This typically involves extracting the downloaded file and placing it in a directory where the model can be found.
4. **Use the Model**: After installation, you can use the model in your application or project. This might involve importing the model into your programming environment and using it to perform specific tasks.
The exact steps would depend on the specifics of the model and the environment in which you're using it, but these are the general steps you would follow to use the model from this webpage.</s>
'''
```
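The grounded reply marks on-screen elements with `<box> x1 y1 x2 y2</box>` spans, as in the sample output above. A minimal sketch for recovering those coordinates — the `extract_boxes` helper is illustrative, not part of the released code; only the tag format comes from the sample:

```python
import re

# Pull bounding boxes of the form "<box> x1 y1 x2 y2</box>" out of a grounded reply.
# Only the tag format is taken from the sample output; the parser itself is a sketch.
BOX_RE = re.compile(r"<box>\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*</box>")

def extract_boxes(reply: str):
    """Return a list of (x1, y1, x2, y2) integer tuples found in the reply."""
    return [tuple(map(int, m.groups())) for m in BOX_RE.finditer(reply)]

reply = 'it\'s under the heading "Use this model"<box> 864 238 964 256</box>.'
print(extract_boxes(reply))  # [(864, 238, 964, 256)]
```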

# Citation