gokaygokay committed on
Commit
770ad2a
1 Parent(s): 3ec40c7

Update README.md

Files changed (1)
  1. README.md +51 -3
README.md CHANGED
@@ -1,3 +1,51 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - google/docci
+ - gokaygokay/random_instruct_docci
+ language:
+ - en
+ pipeline_tag: image-text-to-text
+ ---
+
+ A fine-tuned version of the [moondream2](https://huggingface.co/vikhyatk/moondream2) model, trained on the [gokaygokay/random_instruct_docci](https://huggingface.co/datasets/gokaygokay/random_instruct_docci) dataset. It produces highly detailed image captions.
+
+ ```
+ pip install transformers timm einops bitsandbytes accelerate flash-attn
+ ```
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from PIL import Image
+
+ DEVICE = "cuda"
+ DTYPE = (
+     torch.float32 if DEVICE == "cpu" else torch.float16
+ )  # CPU doesn't support float16
+ # Pin the model files to a specific commit of the repository.
+ revision = "3ec40c7b6b5d87bc0c51edee45e21f5f29b449d8"
+ tokenizer = AutoTokenizer.from_pretrained(
+     "gokaygokay/moondream2-docci-with-instruction",
+     trust_remote_code=True,
+     revision=revision
+ )
+ moondream = AutoModelForCausalLM.from_pretrained(
+     "gokaygokay/moondream2-docci-with-instruction",
+     trust_remote_code=True,
+     torch_dtype=DTYPE,
+     device_map={"": DEVICE},
+     attn_implementation="flash_attention_2",  # needs a CUDA GPU and the flash-attn package
+     revision=revision
+ )
+ moondream.eval()  # switch to inference mode
+
+ image_path = "<your_image_path>"
+ image = Image.open(image_path).convert("RGB")
+ # Encode the image, then ask a question about it.
+ md_answer = moondream.answer_question(
+     moondream.encode_image(image),
+     "what is this picture about",
+     tokenizer=tokenizer,
+ )
+
+ print(md_answer)
+ ```
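
The snippet in this diff sets `DEVICE = "cuda"` together with `attn_implementation="flash_attention_2"`, while its `DTYPE` line also anticipates CPU use. Below is a minimal sketch of one way to keep those choices consistent. It is not part of the commit: `choose_load_options` and the `dtype_name` key are hypothetical names introduced for illustration, and the sketch assumes FlashAttention-2 should only be requested on CUDA devices with half-precision weights.

```python
def choose_load_options(device: str) -> dict:
    """Pick dtype and attention settings that match the target device.

    Hypothetical helper: it only builds a plain dict and does not import
    torch or transformers itself.
    """
    on_gpu = device.startswith("cuda")
    options = {
        # Name of the torch dtype; resolve it later with getattr(torch, name).
        "dtype_name": "float16" if on_gpu else "float32",
        # Same device_map shape as the README snippet: whole model on one device.
        "device_map": {"": device},
    }
    if on_gpu:
        # Assumption: request FlashAttention-2 only when running on CUDA.
        options["attn_implementation"] = "flash_attention_2"
    return options


if __name__ == "__main__":
    for device in ("cuda", "cpu"):
        print(device, choose_load_options(device))
```

With these options, the README's call could take `torch_dtype=getattr(torch, options.pop("dtype_name"))` and pass the remaining entries as keyword arguments to `from_pretrained`. On CPU the `attn_implementation` argument is then left out entirely.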