beomi committed
Commit 34cd4c5 • 1 Parent(s): 9f1bbde

Update README.md

Files changed (1):
  1. README.md +17 -23
README.md CHANGED
@@ -41,30 +41,24 @@ Llama-2-Ko is an auto-regressive language model that uses an optimized transform
 
 **Vocab Expansion**
 
-- Original Llama-2: 32000 Sentencepiece BPE
-- **Expanded Llama-2-ko: 46336** Sentencepiece BPE
-  - New vocab and merges, trained with Korean Corpus
-- Tokenizer Examples: Llama-2 vs **Llama-2-Ko**
-  - Use the same tokenization for English, but a shorter/merged tokenization for Korean.
-  - Tokenize "안녕하세요, 오늘은 날씨가 좋네요."
-    - Llama-2:
-      ```
-      ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요']
-      ```
-    - **Llama-2-Ko**:
-      ```
-      ['▁안녕', '하세요', ',', '▁오늘은', '▁날', '씨가', '▁좋네요']
-      ```
-  - Tokenize "Llama 2: Open Foundation and Fine-Tuned Chat Models"
-    - Llama-2:
-      ```
-      ['▁L', 'l', 'ama', '▁', '2', ':', '▁Open', '▁Foundation', '▁and', '▁Fine', '-', 'T', 'un', 'ed', '▁Ch', 'at', '▁Mod', 'els']
-      ```
-    - **Llama-2-Ko**:
-      ```
-      ['▁L', 'l', 'ama', '▁', '2', ':', '▁Open', '▁Foundation', '▁and', '▁Fine', '-', 'T', 'un', 'ed', '▁Ch', 'at', '▁Mod', 'els']
-      ```
+| Model Name | Vocabulary Size | Description |
+| --- | --- | --- |
+| Original Llama-2 | 32000 | Sentencepiece BPE |
+| **Expanded Llama-2-Ko** | 46336 | Sentencepiece BPE. Added Korean vocab and merges |
 
+**Tokenizing "안녕하세요, 오늘은 날씨가 좋네요."**
+
+| Model | Tokens |
+| --- | --- |
+| Llama-2 | `['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요']` |
+| Llama-2-Ko | `['▁안녕', '하세요', ',', '▁오늘은', '▁날', '씨가', '▁좋네요']` |
+
+**Tokenizing "Llama 2: Open Foundation and Fine-Tuned Chat Models"**
+
+| Model | Tokens |
+| --- | --- |
+| Llama-2 | `['▁L', 'l', 'ama', '▁', '2', ':', '▁Open', '▁Foundation', '▁and', '▁Fine', '-', 'T', 'un', 'ed', '▁Ch', 'at', '▁Mod', 'els']` |
+| Llama-2-Ko | `['▁L', 'l', 'ama', '▁', '2', ':', '▁Open', '▁Foundation', '▁and', '▁Fine', '-', 'T', 'un', 'ed', '▁Ch', 'at', '▁Mod', 'els']` |
 
 # **Model Benchmark**
 
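The token lists in this diff show why the vocab expansion matters: the stock Llama-2 tokenizer falls back to raw UTF-8 byte tokens (e.g. `'녕'` → `'<0xEB>', '<0x85>', '<0x95>'`) for Hangul syllables missing from its vocabulary, while the expanded vocabulary keeps them as whole pieces. A minimal sketch of that byte-fallback mechanism, using a made-up toy vocabulary rather than the real SentencePiece models:

```python
def tokenize_char(ch: str, vocab: set) -> list:
    """Return [ch] if the character is in-vocab; otherwise fall back to
    its UTF-8 bytes rendered as SentencePiece-style <0xNN> tokens."""
    if ch in vocab:
        return [ch]
    return [f"<0x{b:02X}>" for b in ch.encode("utf-8")]

# Hypothetical toy vocabulary: '안' is known, '녕' is not.
toy_vocab = {"안", "하", "세", "요"}
tokens = [tok for ch in "안녕" for tok in tokenize_char(ch, toy_vocab)]
print(tokens)  # ['안', '<0xEB>', '<0x85>', '<0x95>']
```

A larger Korean-trained vocabulary with merges, as in Llama-2-Ko, avoids this fallback entirely, which is why the Korean sentence above drops from 29 tokens to 7.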