dangvansam committed
Commit fb016ec
1 Parent(s): 2a1677b

Create README.md

Files changed (1):
  1. README.md +77 -0
README.md ADDED
---
language:
- en
- vi
- zh
base_model:
- google/gemma-2-2b-it
pipeline_tag: text-generation
tags:
- vllm
- system-role
- langchain
license: gemma
---

# gemma-2-2b-it-fix-system-role

Quantized version of [gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) with an updated **`chat_template`** that supports the **`system`** role, avoiding errors such as:
- `Conversation roles must alternate user/assistant/user/assistant/...`
- `System role not supported`

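For illustration, one common way a template can support the `system` role is to fold the system message into the first user turn before formatting, since the stock Gemma template only accepts alternating `user`/`model` turns. The sketch below shows that idea in plain Python; the `merge_system_into_first_user` helper is hypothetical, and the actual `chat_template` in this repository may implement the fix differently.

```python
# Hypothetical sketch: fold a leading system message into the first user turn,
# mirroring what a system-role-aware Gemma chat template can do internally.
def merge_system_into_first_user(messages):
    if not messages or messages[0]["role"] != "system":
        return messages
    system, rest = messages[0], list(messages[1:])
    if rest and rest[0]["role"] == "user":
        rest[0] = {"role": "user", "content": system["content"] + "\n\n" + rest[0]["content"]}
    else:
        rest.insert(0, {"role": "user", "content": system["content"]})
    return rest

print(merge_system_into_first_user([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]))
```
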
## Model Overview
- **Model Architecture:** Gemma 2
- **Input:** Text
- **Output:** Text
- **Release Date:** 04/12/2024
- **Version:** 1.0

## Deployment

### Use with vLLM

This model can be deployed efficiently with the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the examples below.

With the CLI:
```bash
vllm serve dangvansam/gemma-2-2b-it-fix-system-role
```
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dangvansam/gemma-2-2b-it-fix-system-role",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who are you?"}
    ]
  }'
```

With Python:
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "dangvansam/gemma-2-2b-it-fix-system-role"

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# The updated chat_template accepts a leading system message.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"}
]

# Render the conversation into a single prompt string.
prompts = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id)

outputs = llm.generate(prompts, sampling_params)

generated_text = outputs[0].outputs[0].text
print(generated_text)
```
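Depending on your vLLM version, `LLM.chat` (available in recent releases) can apply the chat template for you, making the tokenizer step above optional. A minimal sketch, assuming a vLLM release that provides this method:

```python
from vllm import LLM, SamplingParams

# Assumes a recent vLLM release that provides LLM.chat; it applies the model's
# chat_template (including the system role) internally.
llm = LLM(model="dangvansam/gemma-2-2b-it-fix-system-role")
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

outputs = llm.chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```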

vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
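For example, the server started with `vllm serve` above can be called from the OpenAI Python client; the `openai` package and the placeholder API key are assumptions of this sketch.

```python
from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dangvansam/gemma-2-2b-it-fix-system-role",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
)
print(response.choices[0].message.content)
```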