Llama-2-7b-chat-hf

#1
Files changed (2)
  1. README.md +0 -57
  2. mlc-chat-config.json +2 -35
README.md DELETED
@@ -1,57 +0,0 @@
- ---
- library_name: mlc-llm
- base_model: meta-llama/Llama-2-7b-chat-hf
- tags:
- - mlc-llm
- - web-llm
- ---
-
- # Llama-2-7b-chat-hf-q4f16_1-MLC
-
- This is the [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) model in MLC format `q4f16_1`.
- The model can be used for projects [MLC-LLM](https://github.com/mlc-ai/mlc-llm) and [WebLLM](https://github.com/mlc-ai/web-llm).
-
- ## Example Usage
-
- Here are some examples of using this model in MLC LLM.
- Before running the examples, please install MLC LLM by following the [installation documentation](https://llm.mlc.ai/docs/install/mlc_llm.html#install-mlc-packages).
-
- ### Chat
-
- In command line, run
- ```bash
- mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
- ```
-
- ### REST Server
-
- In command line, run
- ```bash
- mlc_llm serve HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
- ```
-
- ### Python API
-
- ```python
- from mlc_llm import MLCEngine
-
- # Create engine
- model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
- engine = MLCEngine(model)
-
- # Run chat completion in OpenAI API.
- for response in engine.chat.completions.create(
-     messages=[{"role": "user", "content": "What is the meaning of life?"}],
-     model=model,
-     stream=True,
- ):
-     for choice in response.choices:
-         print(choice.delta.content, end="", flush=True)
-     print("\n")
-
- engine.terminate()
- ```
-
- ## Documentation
-
- For more information on MLC LLM project, please visit our [documentation](https://llm.mlc.ai/docs/) and [GitHub repo](http://github.com/mlc-ai/mlc-llm).
mlc-chat-config.json CHANGED
@@ -13,8 +13,7 @@
      "prefill_chunk_size": 4096,
      "num_key_value_heads": 32,
      "head_dim": 128,
-     "tensor_parallel_shards": 1,
-     "max_batch_size": 80
+     "tensor_parallel_shards": 1
    },
    "vocab_size": 32000,
    "context_window_size": 4096,
@@ -28,39 +27,7 @@
    "temperature": 0.6,
    "repetition_penalty": 1.0,
    "top_p": 0.9,
-   "conv_template": {
-     "name": "llama-2",
-     "system_template": "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
-     "system_message": "You are a helpful, respectful and honest assistant.",
-     "system_prefix_token_ids": [
-       1
-     ],
-     "add_role_after_system_message": false,
-     "roles": {
-       "user": "[INST]",
-       "assistant": "[/INST]",
-       "tool": "[INST]"
-     },
-     "role_templates": {
-       "user": "{user_message}",
-       "assistant": "{assistant_message}",
-       "tool": "{tool_message}"
-     },
-     "messages": [],
-     "seps": [
-       " "
-     ],
-     "role_content_sep": " ",
-     "role_empty_sep": " ",
-     "stop_str": [
-       "[INST]"
-     ],
-     "stop_token_ids": [
-       2
-     ],
-     "function_string": "",
-     "use_function_calling": false
-   },
+   "conv_template": "llama-2",
    "pad_token_id": 0,
    "bos_token_id": 1,
    "eos_token_id": 2,
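The `conv_template` change above replaces the inline conversation-template object with the preset name `"llama-2"`, which the runtime resolves to a built-in template. A minimal sketch of the before/after shape of that field, using plain `json` on hypothetical config excerpts (field values copied from the diff above):

```python
import json

# Hypothetical excerpt of mlc-chat-config.json AFTER this change:
# the template is referenced by preset name only.
new_config = json.loads("""
{
  "temperature": 0.6,
  "repetition_penalty": 1.0,
  "top_p": 0.9,
  "conv_template": "llama-2"
}
""")

# BEFORE the change, "conv_template" was a nested object spelling out the
# system template, role markers, and stop tokens explicitly (abridged here).
old_conv_template = {
    "name": "llama-2",
    "system_template": "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
    "roles": {"user": "[INST]", "assistant": "[/INST]", "tool": "[INST]"},
    "stop_str": ["[INST]"],
    "stop_token_ids": [2],
}

# The new preset string matches the old object's "name" field, so both
# configs point at the same Llama-2 prompt format.
assert new_config["conv_template"] == old_conv_template["name"]
print(new_config["conv_template"])  # llama-2
```

Keeping only the preset name removes the duplicated template details from every converted model's config and lets the runtime's canonical definition of the `llama-2` template apply.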