Xenova (HF staff) committed
Commit
3005354
1 Parent(s): 8d4aae1

Update README.md

  1. README.md +85 -0
README.md CHANGED
@@ -12,4 +12,89 @@ license: apache-2.0
 
  https://huggingface.co/qnguyen3/nanoLLaVA with ONNX weights to be compatible with Transformers.js.
 
+ ## Usage (Transformers.js)
+
+ > [!IMPORTANT]
+ > NOTE: nanoLLaVA support is experimental and requires you to install Transformers.js [v3](https://github.com/xenova/transformers.js/tree/v3) from source.
+
+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [GitHub](https://github.com/xenova/transformers.js/tree/v3) using:
+ ```bash
+ npm install xenova/transformers.js#v3
+ ```
+
+ **Example:**
+ ```js
+ import { AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration, RawImage } from '@xenova/transformers';
+
+ // Load tokenizer, processor and model
+ const model_id = 'Xenova/nanoLLaVA';
+ const tokenizer = await AutoTokenizer.from_pretrained(model_id);
+ const processor = await AutoProcessor.from_pretrained(model_id);
+ const model = await LlavaForConditionalGeneration.from_pretrained(model_id, {
+     dtype: {
+         embed_tokens: 'fp16', // or 'fp32' or 'q8'
+         vision_encoder: 'fp16', // or 'fp32' or 'q8'
+         decoder_model_merged: 'q4', // or 'q8'
+     },
+     // device: 'webgpu',
+ });
+
+ // Prepare text inputs
+ const prompt = 'What does the text say?';
+ const messages = [
+     { role: 'system', content: 'Answer the question.' },
+     { role: 'user', content: `<image>\n${prompt}` }
+ ]
+ const text = tokenizer.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
+ const text_inputs = tokenizer(text);
+
+ // Prepare vision inputs
+ const url = 'https://huggingface.co/qnguyen3/nanoLLaVA/resolve/main/example_1.png';
+ const image = await RawImage.fromURL(url);
+ const vision_inputs = await processor(image);
+
+ // Generate response
+ const { past_key_values, sequences } = await model.generate({
+     ...text_inputs,
+     ...vision_inputs,
+     do_sample: false,
+     max_new_tokens: 64,
+     return_dict_in_generate: true,
+ });
+
+ // Decode output
+ const answer = tokenizer.decode(
+     sequences.slice(0, [text_inputs.input_ids.dims[1], null]),
+     { skip_special_tokens: true },
+ );
+ console.log(answer);
+ // The text reads "Small but mighty".
+
+ const new_messages = [
+     ...messages,
+     { role: 'assistant', content: answer },
+     { role: 'user', content: 'How does the text correlate to the context of the image?' }
+ ]
+ const new_text = tokenizer.apply_chat_template(new_messages, { tokenize: false, add_generation_prompt: true });
+ const new_text_inputs = tokenizer(new_text);
+
+ // Generate another response
+ const output = await model.generate({
+     ...new_text_inputs,
+     past_key_values,
+     do_sample: false,
+     max_new_tokens: 256,
+ });
+ const new_answer = tokenizer.decode(
+     output.slice(0, [new_text_inputs.input_ids.dims[1], null]),
+     { skip_special_tokens: true },
+ );
+ console.log(new_answer);
+ // The context of the image is that of a playful and humorous illustration of a mouse holding a weightlifting bar. The text "Small but mighty" is a playful reference to the mouse's size and strength.
+ ```
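
Reviewer note (not part of the commit): the example above decodes only the tokens that come after the prompt, and builds the second turn by appending the first answer plus a new user message to the history. The sketch below illustrates those two steps in isolation. It is a minimal stand-in under stated assumptions: plain arrays replace the library's tensors, the token ids are made up, and `stripPrompt` and `extendConversation` are hypothetical helper names, not Transformers.js APIs.

```javascript
// Hypothetical helpers for illustration only; plain arrays stand in for
// the token-id tensors returned by the tokenizer and by model.generate().

/**
 * Return only the newly generated token ids by dropping the prompt prefix.
 * Mirrors the idea behind slicing from `input_ids.dims[1]` onward.
 */
function stripPrompt(sequence, promptLength) {
    if (promptLength < 0 || promptLength > sequence.length) {
        throw new RangeError('promptLength must lie within the sequence');
    }
    return sequence.slice(promptLength);
}

/**
 * Build the message list for a follow-up turn without mutating the original:
 * previous messages, then the assistant's answer, then the new user question.
 */
function extendConversation(messages, answer, followUp) {
    return [
        ...messages,
        { role: 'assistant', content: answer },
        { role: 'user', content: followUp },
    ];
}

// Toy values (made-up ids): the first four ids are the prompt.
const promptIds = [101, 7, 8, 9];
const generated = [...promptIds, 42, 43, 44];
console.log(stripPrompt(generated, promptIds.length)); // [ 42, 43, 44 ]

const history = [{ role: 'user', content: 'What does the text say?' }];
const nextTurn = extendConversation(history, 'Small but mighty', 'Why?');
console.log(nextTurn.length); // 3
```

Returning a new array from `extendConversation` keeps the first-turn history untouched, which matches how the example defines `new_messages` alongside `messages`.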
+
+ We also released an online demo, which you can try yourself: https://huggingface.co/spaces/Xenova/experimental-nanollava-webgpu
+
+ <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/0T-aNjgXt6PGL3qIl8wBc.mp4"></video>
+
  Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).