Cuiunbo committed on
Commit 80a2f49 • 1 Parent(s): 353340d

Update README.md

Files changed (1)
  1. README.md +97 -1
README.md CHANGED
@@ -10,4 +10,100 @@ tags:
  - GUI
  - Agent
  - minicpm
- ---
+ ---
+
+ # 📱🖥️ GUIDance: Vision Language Models as Your Screen Guide
+
+ Introducing GUIDance, a model trained on GUICourse! 🎉
+ By leveraging extensive OCR pretraining with grounding ability, we unlock the potential of parsing-free methods for GUI agents.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f706dfe94ed998c463ed66/5d4rJFWjKn-c-iOXJKYXF.png)
+
+ # News
+ - 2024-07-09: 🚀 We released MiniCPM-GUIDance on Hugging Face.
+ - 2024-03-09: 📦 We have open-sourced GUICourse: [GUIAct](https://huggingface.co/datasets/yiye2023/GUIAct), [GUIChat](https://huggingface.co/datasets/yiye2023/GUIChat), [GUIEnv](https://huggingface.co/datasets/yiye2023/GUIEnv)
+
+ # ToDo
+ - [ ] Update detailed task type prompt
+ - [ ] Batch inference
+
+ # Example
+ Install the dependencies with pip:
+ ```
+ Pillow==10.1.0
+ timm==0.9.10
+ torch==2.1.2
+ torchvision==0.16.2
+ transformers==4.40.0
+ sentencepiece==0.1.99
+
+ flash_attn==2.4.2
+ ```
+ First, clone this Hugging Face repo with git, or download it with huggingface-cli:
+ ```
+ git lfs install
+ git clone https://huggingface.co/RhapsodyAI/minicpm-guidance
+ ```
+ or
+ ```
+ huggingface-cli download RhapsodyAI/minicpm-guidance
+ ```
+ ```python
+ from transformers import AutoProcessor, AutoTokenizer, AutoModel
+ from PIL import Image
+ import torch
+
+ MODEL_PATH = '/path/to/minicpm-guidance'
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
+ processor = AutoProcessor.from_pretrained(MODEL_PATH, trust_remote_code=True)
+
+ # model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, attn_implementation="eager", torch_dtype=torch.bfloat16)
+ model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, torch_dtype=torch.bfloat16)
+ model.cuda().eval()
+
+ # Currently only batch size 1 is supported
+ example_messages = [
+     [
+         {
+             "role": "user",
+             "content": Image.open("/home/jeeves/cuiunbo/minicpmv/examples/test.png").convert('RGB')
+         },
+         {
+             "role": "user",
+             "content": "What is this?"
+         }
+     ]
+ ]
+
+ inputs = processor(example_messages, padding_side="right")
+
+ # Move every tensor (top-level or inside a list) to the GPU
+ for key in inputs:
+     if isinstance(inputs[key], list):
+         for i in range(len(inputs[key])):
+             if isinstance(inputs[key][i], torch.Tensor):
+                 inputs[key][i] = inputs[key][i].cuda()
+     if isinstance(inputs[key], torch.Tensor):
+         inputs[key] = inputs[key].cuda()
+
+ with torch.no_grad():
+     outputs = model.generate(inputs, max_new_tokens=64, do_sample=False, num_beams=3)
+ # Decode every returned sequence in the batch
+ text = tokenizer.batch_decode(outputs.cpu().tolist())
+
+ for i in text:
+     print('-'*20)
+     print(i)
+ ```
+
+ # Citation
+ If you find our work useful, please consider citing us:
+ ```
+ @misc{,
+   title={GUICourse: From General Vision Language Models to Versatile GUI Agents},
+   author={Wentong Chen and Junbo Cui and Jinyi Hu and Yujia Qin and Junjie Fang and Yue Zhao and Chongyi Wang and Jun Liu and Guirong Chen and Yupeng Huo and Yuan Yao and Yankai Lin and Zhiyuan Liu and Maosong Sun},
+   year={2024},
+   journal={arXiv preprint arXiv:2406.11317},
+ }
+ ```
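
Note on the example added in this commit: the README's inference snippet moves the processor output to the GPU with a hand-written loop over keys and list entries. The same idea can be sketched as a small recursive helper. This is only an illustration and not part of the commit. The name `move_to_device` is hypothetical. To keep the sketch runnable without torch installed, it treats any object with a callable `.to(device)` method as a tensor, whereas real code would normally check `isinstance(obj, torch.Tensor)`.

```python
from collections.abc import Mapping
from typing import Any


def move_to_device(batch: Any, device: str) -> Any:
    """Recursively move tensor-like objects inside `batch` to `device`.

    Assumption: an object is "tensor-like" if it has a callable `.to`
    method. Mappings are checked first, so dict-like containers are
    walked entry by entry and returned as plain dicts.
    """
    if isinstance(batch, Mapping):
        return {key: move_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, list):
        return [move_to_device(value, device) for value in batch]
    if isinstance(batch, tuple):
        return tuple(move_to_device(value, device) for value in batch)
    if callable(getattr(batch, "to", None)):
        return batch.to(device)
    # Strings, numbers, None, etc. are returned unchanged.
    return batch
```

With torch available, a call such as `move_to_device(processor(example_messages, padding_side="right"), "cuda")` would replace the explicit loop. The helper returns a plain `dict`, and whether this model's custom `generate` accepts a plain `dict` in place of the processor's own output type is not confirmed by the README.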