---
license: apache-2.0
---

<div align="center">
<h1>Chimera: Improving Generalist Model with<br>Domain-Specific Experts</h1>

[[ Paper ]](https://huggingface.co/papers/2412.05983) [[ Website ]](https://unimodal4reasoning.github.io/chimera_page/) [[ Dataset🤗 ]]() [[ Github ]](https://github.com/UniModal4Reasoning/Chimera)

</div>

## 🛠️ Installation

- Clone this repository:

```bash
git clone https://github.com/UniModal4Reasoning/Chimera.git
```

- Create a conda virtual environment and activate it:

```bash
conda create -n chimera python=3.9 -y
conda activate chimera
```

- Install dependencies using `requirements.txt`:

```bash
pip install -r requirements.txt
```

- Install other requirements:

```bash
cd chimera/
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
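
Before moving on, a quick way to confirm the editable install is importable (this simply repeats the import used in the Quick Start examples below):

```python
# Quick check that the editable install of the chimera package is importable
from chimera.chimera_infer import Chimera4easyuse
print("chimera package imported successfully")
```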

### Additional Instructions

- Install `flash-attn==2.3.4`:

```bash
pip install flash-attn==2.3.4 --no-build-isolation
```

Alternatively, you can compile it from source:

```bash
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.4
python setup.py install
```
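
As an optional sanity check, you can verify the build from Python (the `flash_attn` module name and version attribute are standard for that package, not anything Chimera-specific):

```python
# Optional check that flash-attn installed correctly
import torch
import flash_attn

print(flash_attn.__version__)      # expect 2.3.4
print(torch.cuda.is_available())   # flash-attn requires a CUDA-capable GPU
```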

## Quick Start

### Multi-modal reasoning

```python
from chimera.chimera_infer import Chimera4easyuse
import torch
from PIL import Image

# prepare model
# model_path = "U4R/Chimera-Reasoner-2B"
# model_path = "U4R/Chimera-Reasoner-4B"
model_path = "U4R/Chimera-Reasoner-8B"
generation_config = dict(max_new_tokens=256, do_sample=False)
model = Chimera4easyuse(model_path, dtype=torch.bfloat16, generation_config=generation_config)

# prepare input
image_path = "path/to/image"
user_prompt = "<image>\nuser prompt"
input_image = Image.open(image_path).convert('RGB')
response = model.get_response(user_prompt, [input_image])
print(response)
```
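
The loaded model can be called repeatedly, so several image/question pairs can share one `Chimera4easyuse` instance. A minimal sketch reusing the example above (the paths and questions here are placeholders, not part of the official example):

```python
# Hypothetical inputs; replace with your own images and prompts
samples = [
    ("path/to/chart.png", "<image>\nWhat trend does this chart show?"),
    ("path/to/diagram.png", "<image>\nDescribe this diagram in one sentence."),
]
for path, prompt in samples:
    image = Image.open(path).convert('RGB')
    print(model.get_response(prompt, [image]))
```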

### Visual content extraction

```python
from chimera.chimera_infer import Chimera4easyuse
import torch
from PIL import Image

# prepare model
model_path = "U4R/Chimera-Extractor-1B"
generation_config = dict(max_new_tokens=4096, do_sample=False, no_repeat_ngram_size=20)
model = Chimera4easyuse(model_path, dtype=torch.float16, generation_config=generation_config)

# prepare input
image_path = "path/to/document"
user_prompt = "<image>\nAs a smart PDF to Markdown conversion tool, please convert the content of the provided PDF into Markdown format."
input_image = Image.open(image_path).convert('RGB')
response = model.get_response(user_prompt, [input_image])
print(response)
```
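
Since the extractor returns the generated Markdown as text, it can be written to a file instead of printed; a small sketch (the output filename is an arbitrary choice):

```python
# Save the extracted Markdown to disk (filename is arbitrary)
with open("extracted_document.md", "w", encoding="utf-8") as f:
    f.write(response)
```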

## License
Chimera is released under the [Apache License 2.0](LICENSE).

## Citation
If you find our models, code, or papers useful in your research, please consider giving us a ⭐ and a citation 📝. Thanks!

```bibtex
@misc{peng2024chimeraimprovinggeneralistmodel,
      title={Chimera: Improving Generalist Model with Domain-Specific Experts},
      author={Tianshuo Peng and Mingsheng Li and Hongbin Zhou and Renqiu Xia and Renrui Zhang and Lei Bai and Song Mao and Bin Wang and Conghui He and Aojun Zhou and Botian Shi and Tao Chen and Bo Zhang and Xiangyu Yue},
      year={2024},
      eprint={2412.05983},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05983},
}
```

## Contact Us
If you encounter any issues or have questions, please feel free to contact us at bo.zhangzx@gmail.com or zhangbo@pjlab.org.cn.