Llama-2-7b-chat-hf

#1
Files changed (2)
  1. README.md +0 -57
  2. mlc-chat-config.json +2 -35
README.md DELETED
@@ -1,57 +0,0 @@
- ---
- library_name: mlc-llm
- base_model: meta-llama/Llama-2-7b-chat-hf
- tags:
- - mlc-llm
- - web-llm
- ---
-
- # Llama-2-7b-chat-hf-q4f16_1-MLC
-
- This is the [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) model in MLC format `q4f16_1`.
- The model can be used for projects [MLC-LLM](https://github.com/mlc-ai/mlc-llm) and [WebLLM](https://github.com/mlc-ai/web-llm).
-
- ## Example Usage
-
- Here are some examples of using this model in MLC LLM.
- Before running the examples, please install MLC LLM by following the [installation documentation](https://llm.mlc.ai/docs/install/mlc_llm.html#install-mlc-packages).
-
- ### Chat
-
- In command line, run
- ```bash
- mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
- ```
-
- ### REST Server
-
- In command line, run
- ```bash
- mlc_llm serve HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC
- ```
-
- ### Python API
-
- ```python
- from mlc_llm import MLCEngine
-
- # Create engine
- model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
- engine = MLCEngine(model)
-
- # Run chat completion in OpenAI API.
- for response in engine.chat.completions.create(
-     messages=[{"role": "user", "content": "What is the meaning of life?"}],
-     model=model,
-     stream=True,
- ):
-     for choice in response.choices:
-         print(choice.delta.content, end="", flush=True)
-     print("\n")
-
- engine.terminate()
- ```
-
- ## Documentation
-
- For more information on MLC LLM project, please visit our [documentation](https://llm.mlc.ai/docs/) and [GitHub repo](http://github.com/mlc-ai/mlc-llm).
mlc-chat-config.json CHANGED
@@ -13,8 +13,7 @@
      "prefill_chunk_size": 4096,
      "num_key_value_heads": 32,
      "head_dim": 128,
-     "tensor_parallel_shards": 1,
-     "max_batch_size": 80
+     "tensor_parallel_shards": 1
    },
    "vocab_size": 32000,
    "context_window_size": 4096,
@@ -28,39 +27,7 @@
    "temperature": 0.6,
    "repetition_penalty": 1.0,
    "top_p": 0.9,
-   "conv_template": {
-     "name": "llama-2",
-     "system_template": "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
-     "system_message": "You are a helpful, respectful and honest assistant.",
-     "system_prefix_token_ids": [
-       1
-     ],
-     "add_role_after_system_message": false,
-     "roles": {
-       "user": "[INST]",
-       "assistant": "[/INST]",
-       "tool": "[INST]"
-     },
-     "role_templates": {
-       "user": "{user_message}",
-       "assistant": "{assistant_message}",
-       "tool": "{tool_message}"
-     },
-     "messages": [],
-     "seps": [
-       " "
-     ],
-     "role_content_sep": " ",
-     "role_empty_sep": " ",
-     "stop_str": [
-       "[INST]"
-     ],
-     "stop_token_ids": [
-       2
-     ],
-     "function_string": "",
-     "use_function_calling": false
-   },
+   "conv_template": "llama-2",
    "pad_token_id": 0,
    "bos_token_id": 1,
    "eos_token_id": 2,
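The `conv_template` change above replaces the inline conversation-template object with the preset name `"llama-2"`, which the runtime resolves to a built-in template. A minimal sketch of the before/after shape of that field, using plain `json` on hypothetical config excerpts (field values copied from the diff above):

```python
import json

# Hypothetical excerpt of mlc-chat-config.json AFTER this change:
# the template is referenced by preset name only.
new_config = json.loads("""
{
  "temperature": 0.6,
  "repetition_penalty": 1.0,
  "top_p": 0.9,
  "conv_template": "llama-2"
}
""")

# BEFORE the change, "conv_template" was a nested object spelling out the
# system template, role markers, and stop tokens explicitly (abridged here).
old_conv_template = {
    "name": "llama-2",
    "system_template": "[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
    "roles": {"user": "[INST]", "assistant": "[/INST]", "tool": "[INST]"},
    "stop_str": ["[INST]"],
    "stop_token_ids": [2],
}

# The new preset string matches the old object's "name" field, so both
# configs point at the same Llama-2 prompt format.
assert new_config["conv_template"] == old_conv_template["name"]
print(new_config["conv_template"])  # llama-2
```

Keeping only the preset name removes the duplicated template details from every converted model's config and lets the runtime's canonical definition of the `llama-2` template apply.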