Commit b3d51a1 by Inoichan (parent: 0de4fa8): Update README.md
Files changed: README.md (+105 -1)
---
language:
- ja
tags:
- heron
- vision
- image-captioning
- VQA
pipeline_tag: image-to-text
license:
- cc-by-nc-4.0
inference: false
---
# Heron GIT Japanese StableLM Base 7B

## Model Details
Heron GIT Japanese StableLM Base 7B is a vision-language model that can converse about input images.<br>
This model was trained using [the heron library](https://github.com/turingmotors/heron). Please refer to the code for details.

## Usage

Follow [the installation guide](https://github.com/turingmotors/heron/).

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor

from heron.models.git_llm.git_japanese_stablelm_alpha import GitJapaneseStableLMAlphaForCausalLM

device_id = 0
device = f"cuda:{device_id}"

MODEL_NAME = "turing-motors/heron-chat-git-ja-stablelm-base-7b-v1"

# load the model in half precision
model = GitJapaneseStableLMAlphaForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)
model.eval()
model.to(device)

# prepare a processor
processor = AutoProcessor.from_pretrained(MODEL_NAME)

# prepare inputs
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

text = "##human: この画像の面白い点は何ですか?\n##gpt: "  # "What is interesting about this image?"

# do preprocessing
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    truncation=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}

# do inference (do_sample=False means greedy decoding, so no temperature is needed)
with torch.no_grad():
    out = model.generate(**inputs, max_length=256, do_sample=False, no_repeat_ngram_size=2)

# print result
print(processor.tokenizer.batch_decode(out))
```
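
The decoded output echoes the whole prompt, not just the model's answer. A minimal sketch of pulling out only the reply by splitting on the `##gpt:` marker (pure string handling; the decoded string and the special-token names here are hypothetical examples, not guaranteed model output):

```python
def extract_reply(decoded: str) -> str:
    """Return the text after the last '##gpt:' marker, with common
    special tokens and surrounding whitespace stripped."""
    # the generated sequence repeats the prompt, so keep what follows the marker
    reply = decoded.split("##gpt:")[-1]
    # assumption: the tokenizer may append an EOS marker such as these
    for tok in ("</s>", "<|endoftext|>"):
        reply = reply.replace(tok, "")
    return reply.strip()

# hypothetical decoded string, for illustration only
decoded = "##human: この画像の面白い点は何ですか?\n##gpt: 犬が人のように座っています。</s>"
print(extract_reply(decoded))  # the Japanese reply without prompt or EOS token
```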

## Model Details
* **Developed by**: [Turing Inc.](https://www.turing-motors.com/)
* **Adapter type**: [GIT](https://arxiv.org/abs/2205.14100)
* **Language Model**: [Japanese StableLM Base Alpha](https://huggingface.co/stabilityai/japanese-stablelm-base-alpha-7b)
* **Language(s)**: Japanese

### Training

1. The GIT adapter was trained on LLaVA-Pretrain-JA.
2. The LLM and the adapter were then fully fine-tuned on LLaVA-Instruct-620K-JA-v2.

### Training Datasets

1. LLaVA-Pretrain-JA
2. LLaVA-Instruct-620K-JA-v2
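
The two-stage recipe above (adapter-only training, then full fine-tuning) can be sketched in generic PyTorch by toggling `requires_grad` per stage. The module names below are hypothetical stand-ins, not the actual heron internals:

```python
import torch.nn as nn

class ToyVLM(nn.Module):
    """Hypothetical stand-in: a vision adapter plus a language model."""
    def __init__(self):
        super().__init__()
        self.adapter = nn.Linear(8, 8)  # GIT-style projection (always trained)
        self.llm = nn.Linear(8, 8)      # frozen in stage 1, trained in stage 2

def set_stage(model: ToyVLM, stage: int) -> None:
    """Stage 1: train only the adapter. Stage 2: full fine-tuning."""
    for p in model.llm.parameters():
        p.requires_grad = (stage == 2)
    for p in model.adapter.parameters():
        p.requires_grad = True

model = ToyVLM()
set_stage(model, 1)
# in stage 1, only the adapter's parameters remain trainable
print(sorted(n for n, p in model.named_parameters() if p.requires_grad))
```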

## Use and Limitations

### Intended Use

This model is intended for chat-like applications and for research purposes.

### Limitations

The model may produce inaccurate or false information, and its accuracy is not guaranteed. It is still in the research and development stage.

## How to cite

```bibtex
@misc{}
```