beomi committed on
Commit 00ea4b9 • Parent(s): b234754

Update README.md

Files changed (1): README.md (+130 -145)
---
extra_gated_heading: Access beomi/Yi-Ko-34B on Hugging Face
extra_gated_button_content: Submit
extra_gated_fields:
  I agree to share my name, email address and username: checkbox
  I confirm that I understand this project is for research purposes only, and confirm that I agree to follow the LICENSE of this model: checkbox
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- pytorch
- Yi-Ko
- 01-ai
- Yi
library_name: transformers
license: cc-by-nc-sa-4.0
---
# **beomi/Yi-Ko-34B-Chat-Preview**

This repo applies a naive Chat Vector merge built from these three models:

```python
ko_model_id = 'beomi/Yi-Ko-34B-dev'
chat_model = '01-ai/Yi-34B-Chat'
base_model = '01-ai/Yi-34B'
```
 
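A chat vector merge adds the weight delta between a chat-tuned model and its base onto another model derived from the same base. A minimal, illustrative sketch of that arithmetic follows; toy per-parameter lists stand in for weight tensors, and with the actual checkpoints named above you would iterate over the three models' state dicts in the same way:

```python
# Naive "chat vector" arithmetic assumed here:
#   merged = ko_model + (chat_model - base_model), applied per parameter.

def apply_chat_vector(ko_sd, chat_sd, base_sd):
    merged = {}
    for name, w in ko_sd.items():
        # The "chat vector": what chat tuning changed relative to the base.
        delta = [c - b for c, b in zip(chat_sd[name], base_sd[name])]
        if len(delta) == len(w):
            merged[name] = [x + d for x, d in zip(w, delta)]
        else:
            # Shapes can differ, e.g. embeddings resized for the expanded
            # Korean vocab; those cannot take the delta directly.
            merged[name] = list(w)
    return merged

base_sd = {'w': [0.0, 0.0]}
chat_sd = {'w': [1.0, 1.0]}   # chat tuning shifted each weight by +1
ko_sd   = {'w': [3.0, 3.0]}
print(apply_chat_vector(ko_sd, chat_sd, base_sd))  # {'w': [4.0, 4.0]}
```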
## Samples

```python
# System prompt (same across all samples, translated): "As a friendly chatbot,
# answer the user's requests as thoroughly and kindly as possible. Answer
# everything in Korean."
# User: "Please explain the Constitution of the Republic of Korea in detail."
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "ํ•œ๊ตญ์˜ ์ œํ—Œ ํ—Œ๋ฒ•์— ๋Œ€ํ•ด์„œ ์ž์„ธํžˆ ์„ค๋ช…ํ•ด์ฃผ์„ธ์š”."},
]
---
ํ•œ๊ตญ์˜ ํ—Œ๋ฒ•์€ 「๋Œ€ํ•œ๋ฏผ๊ตญ ํ—Œ๋ฒ•」์ด๋ผ๊ณ  ๋ถ€๋ฅด๋ฉฐ, ์šฐ๋ฆฌ๋‚˜๋ผ์˜ ๊ธฐ๋ณธ๋ฒ•์ด๋‹ค. ์ด ํ—Œ๋ฒ•์€ 1948๋…„ 7์›” 12์ผ์— ์ œ์ •๋˜๊ณ  8์›” 15์ผ์— ๊ณตํฌ๋˜์—ˆ์œผ๋ฉฐ, ์ดํ›„ ์—ฌ๋Ÿฌ ์ฐจ๋ก€ ๊ฐœ์ •์ด ์žˆ์–ด ์™”๋‹ค. ํ—Œ๋ฒ•์˜ ๋ชฉ์ ์€ ํ•œ๊ตญ์˜ ๊ธฐ๋ณธ์  ๊ฐ€์น˜์™€ ํ†ต์น˜ ์‹œ์Šคํ…œ, ๊ตญ๋ฏผ์˜ ๊ถŒ๋ฆฌ์™€ ์˜๋ฌด๋ฅผ ๊ทœ์ •ํ•˜๊ณ  ์žˆ๋‹ค.

ํ•œ๊ตญ ํ—Œ๋ฒ•์˜ ์ฃผ์š” ๋‚ด์šฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

1. ๊ธฐ๋ณธ๊ถŒ ๋ณด์žฅ: ํ—Œ๋ฒ•์€ ๊ตญ๋ฏผ์˜ ๊ธฐ๋ณธ๊ถŒ์„ ๋ณด์žฅํ•˜๊ณ  ์žˆ๋‹ค. ํ‘œํ˜„์˜ ์ž์œ , ์‹ ์•™์˜ ์ž์œ , ๊ต์œก์˜ ์ž์œ , ํ–‰๋™์˜ ์ž์œ , ๊ฑฐ์ฃผ์ด์ „์˜ ์ž์œ , ์ฒดํฌ·์ˆ˜์‚ฌ·์žฌํŒ์— ๋Œ€ํ•œ ๊ถŒ๋ฆฌ๋ฅผ ๋ณด์žฅํ•˜๊ณ  ์žˆ๋‹ค.

2. ํ‰๋“ฑ: ๋ชจ๋“  ๊ตญ๋ฏผ์€ ๋ฒ• ์•ž์— ํ‰๋“ฑํ•˜๋‹ค๊ณ  ๊ทœ์ •ํ•˜๊ณ  ์žˆ๋‹ค.

3. ๊ตญ๋ฏผํˆฌํ‘œ: ์ค‘์š” ์ •์ฑ…์ด๋‚˜ ๊ตญ๊ฐ€ ์•ˆ์ „์— ๊ด€ํ•œ ์‚ฌํ•ญ์— ๋Œ€ํ•ด์„œ๋Š” ๊ตญ๋ฏผํˆฌํ‘œ๋ฅผ ๊ฑฐ์น  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ณ  ์žˆ๋‹ค.

4. ์‚ฌ๋ฒ•์ฒด๊ณ„: ํ•œ๊ตญ์€ ์‚ผ๊ถŒ๋ถ„๋ฆฝ ์ œ๋„๋ฅผ ์ฑ„ํƒํ•˜๊ณ  ์žˆ์–ด, ํ–‰์ •๋ถ€(์ •๋ถ€), ์ž…๋ฒ•๋ถ€(๊ตญํšŒ), ์‚ฌ๋ฒ•๋ถ€(๋ฒ•์›)๊ฐ€ ์„œ๋กœ์˜ ๊ธฐ๋Šฅ์„ ๊ฒฌ์ œํ•˜๋ฉฐ ์ž‘๋™ํ•œ๋‹ค.

5. ๋Œ€ํ†ต๋ น์ค‘์‹ฌ์ œ: ํ•œ๊ตญ์€ ๋Œ€ํ†ต๋ น์ค‘์‹ฌ์ œ๋กœ, ๋Œ€ํ†ต๋ น์€ ๊ตญ๊ฐ€์˜ ์ˆ˜๋ฐ˜์ด๋ฉฐ ํ—Œ๋ฒ•์ƒ์˜ ๊ถŒํ•œ์„ ๊ฐ€์ง„๋‹ค.

6. ๋ฏผ์ฃผ์ฃผ์˜ ์ ˆ์ฐจ: ์ •๋‹น, ์„ ๊ฑฐ, ๊ทธ๋ฆฌ๊ณ  ๊ตญ๋ฏผ๋Œ€ํ‘œ๊ธฐ๊ด€์ธ ๊ตญํšŒ๋ฅผ ํ†ตํ•ด ๋ฏผ์ฃผ์ฃผ์˜๊ฐ€ ์šด์˜๋œ๋‹ค.

7. ํ†ต์ผ์ •์ฑ…: ํ—Œ๋ฒ•์€ ํ†ต์ผ์— ๋Œ€ํ•œ ๊ตญ๊ฐ€์˜ ๋…ธ๋ ฅ์„ ๊ทœ์ •ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ๋ถํ•œ์˜ ์ฃผ๋ฏผ์„ ์šฐ๋ฆฌ์˜ ๋™ํฌ๋กœ ๊ทœ์ •ํ•˜๊ณ  ์ธ๋„์  ์ง€์›์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ณ  ์žˆ๋‹ค.

8. ์‚ฌ๋ฒ•๊ถŒ ๋…๋ฆฝ: ์‚ฌ๋ฒ•๋ถ€๋Š” ํ—Œ๋ฒ•๊ณผ ๋ฒ•๋ฅ ์— ๋”ฐ๋ผ ๋…๋ฆฝ์ ์œผ๋กœ ์žฌํŒ๊ถŒ์„ ํ–‰์‚ฌํ•œ๋‹ค.

9. ๊ฒฝ์ œ์ •์ฑ…: ๊ฒฝ์ œ์งˆ์„œ๋ฅผ ๊ธฐ๋ณธ์ ์œผ๋กœ ์ž์œ ์‹œ์žฅ๊ฒฝ์ œ์— ์˜ํ•ด ์œ ์ง€ํ•˜๋˜, ๊ตญ๊ฐ€๋Š” ๊ท ํ˜• ์žˆ๋Š” ๊ตญ๋ฏผ๊ฒฝ์ œ์˜ ์„ฑ์žฅ๊ณผ ์•ˆ์ • ๋ฐ ๊ณต์ •ํ•œ ๋ถ„๋ฐฐ๋ฅผ ์œ„ํ•ด ๋…ธ๋ ฅํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์ด ๋ช…์‹œ๋˜์–ด ์žˆ๋‹ค.

10. ํ™˜๊ฒฝ๋ณดํ˜ธ: ์ธ๊ฐ„๋‹ค์šด ์ƒํ™œ์„ ์œ ์ง€ํ•˜๊ณ  ์ž์—ฐ๊ณผ ์—ญ์‚ฌ์˜์œ ์‚ฐ์„ ๋ณดํ˜ธํ•˜๊ณ  ๊ด€๋ฆฌํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ๋„ ํ—Œ๋ฒ•์— ๊ธฐ์žฌ๋˜์–ด ์žˆ๋‹ค.

ํ•œ๊ตญ์˜ ํ—Œ๋ฒ•์€ ์ด๋Ÿฌํ•œ ๊ธฐ๋ณธ ๊ฐœ๋…์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•œ๊ตญ์˜ ์ •์น˜·๊ฒฝ์ œ·์‚ฌํšŒ์˜ ๊ธฐ๋ณธ์งˆ์„œ์™€ ๊ตญ๋ฏผ์ƒํ™œ์˜ ๊ธฐ๋ณธ์›๋ฆฌ๋ฅผ ์ •ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ๊ตญ๋ฏผ๋“ค์˜ ๊ธฐ๋ณธ๊ถŒ์„ ๋ณดํ˜ธํ•˜๋Š” ๋ฐ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•˜๊ณ  ์žˆ๋‹ค.<|im_end|>
์ œ1ํ•ญ ๋Œ€ํ•œ๋ฏผ๊ตญ์€ ๋ฏผ์ฃผ๊ณตํ™”๊ตญ์ด๋‹ค.
์ œ3ํ•ญ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์ฃผ๊ถŒ์€ ๊ตญ๋ฏผ์—๊ฒŒ ์žˆ๊ณ , ๋ชจ๋“  ๊ถŒ๋ ฅ์˜
์ œ5ํ•ญ ๊ตญ๋ฏผ์€ ๋ฒ• ์•ž์— ํ‰๋“ฑํ•˜๋‹ค.
์ œ6ํ•ญ ๋ชจ๋“  ๊ตญ๋ฏผ์€ ์ธ๊ฐ„๋‹ค์šด ์ƒํ™œ์„ ํ•  ๊ถŒ๋ฆฌ๋ฅผ ๊ฐ€์ง„๋‹ค.<|im_end|>
```

<small>English gist: the model gives an overview of the Korean Constitution (enacted in 1948) and lists ten of its key principles, then runs past the first `<|im_end|>` with fragments of constitutional articles.</small>
 
```python
# User: "Tell me about Admiral Yi Sun-sin."
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "์ด์ˆœ์‹  ์žฅ๊ตฐ์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ด์ค˜."},
]
---
์ด์ˆœ์‹  ์žฅ๊ตฐ์€ ์กฐ์„  ์‹œ๋Œ€์— ํ™œ์•ฝํ•œ ์œ ๋ช…ํ•œ ์ž„์ง„์™œ๋ž€์˜ ์ „๋žต๊ฐ€์š”. ๊ทธ๊ฐ€ ํ™œ์•ฝํ•œ ์‹œ๊ธฐ๋Š” 16์„ธ๊ธฐ ๋ง์ด์—ˆ์œผ๋ฉฐ, ์ด๋•Œ๋Š” ์ผ๋ณธ์ด ์กฐ์„ ์„ ์นจ๋žตํ–ˆ๋˜ ์ž„์ง„์™œ๋ž€์ด ๋ฐœ๋ฐœํ•œ ๋•Œ์ž…๋‹ˆ๋‹ค. ์ด์ˆœ์‹  ์žฅ๊ตฐ์€ ์ด๋•Œ ์ผ๋ณธ์˜ ํ•ด๊ตฐ์œผ๋กœ ๊ตฌ์„ฑ๋œ ๊ตฐ๋Œ€๊ฐ€ ์กฐ์„ ์„ ๊ณต๊ฒฉํ•˜์ž, ์กฐ์„ ์˜ ํ•ด๊ตฐ์—์„œ ํ™œ์•ฝํ•˜๋ฉฐ ์ผ๋ณธ์„ ์ƒ๋Œ€ํ–ˆ์Šต๋‹ˆ๋‹ค. ์œ ๋ช…ํ•œ ํ•ด์ „์ธ 'ํ•œ์‚ฐ๋„๋Œ€์ฒฉ'๊ณผ '๋ช…๋Ÿ‰๋Œ€์ฒฉ'์—์„œ ์ด์ˆœ์‹  ์žฅ๊ตฐ์€ ๋›ฐ์–ด๋‚œ ์ „๋žต๊ณผ ์šฉ๊ธฐ๋กœ ์ผ๋ณธ์„ ์ƒ๋Œ€๋กœ ํฌ๊ฒŒ ์Šน๋ฆฌํ•˜๋Š” ๋“ฑ ์กฐ์„ ์„ ์œ„๊ธฐ์—์„œ ๊ตฌํ•œ ์ค‘์š”ํ•œ ์ธ๋ฌผ๋กœ ํ‰๊ฐ€๋ฐ›๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ์˜ ์šฉ๊ธฐ์™€ ์ง€๋žต์€ ํ˜„์žฌ๊นŒ์ง€๋„ ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์˜ ๊ท€๊ฐ์œผ๋กœ ์ „ํ•ด์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค<|im_end|>
```

<small>English gist: the model introduces Admiral Yi Sun-sin as the famed naval strategist of the 16th-century Imjin War, citing his victories at Hansan Island and Myeongnyang.</small>
```python
# User: "What is the difference between Dokdo and Takeshima?"
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "๋…๋„์™€ ๋‹ค์ผ€์‹œ๋งˆ์˜ ์ฐจ์ด๊ฐ€ ๋ญ์•ผ?"},
]
---
"๋…๋„"์™€ "๋‹ค์ผ€์‹œ๋งˆ"๋Š” ๋™์ผํ•œ ์„ฌ์„ ๊ฐ€๋ฆฌํ‚ค๋Š” ์ด๋ฆ„์ด์ง€๋งŒ ์„œ๋กœ ๋‹ค๋ฅธ ๋‚˜๋ผ์˜ ์–ธ์–ด๋กœ ๋ถˆ๋ฆฌ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. "๋…๋„"๋Š” ๋Œ€ํ•œ๋ฏผ๊ตญ ์–ด๋กœ "獨島"๋ผ๊ณ  ๋ถ€๋ฅด๋ฉฐ, "๋‹ค์ผ€์‹œ๋งˆ"๋Š” ์ผ๋ณธ ์–ด๋กœ "竹島"๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค. ์ด ์„ฌ์€ ์ผ๋ณธ ๋—ํ† ๋ฆฌํ˜„๊ณผ ์‚ฌ์นด์ด๋ฏธ๋‚˜ํ† ์‹œ์— ์ธ์ ‘ํ•œ ์ผ๋ณธ์— ๊ฐ€๊นŒ์šด ๊ณณ์— ์œ„์น˜ํ•ด ์žˆ์ง€๋งŒ, ๊ตญ์ œ๋ฒ•์ ์œผ๋กœ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์˜ํ† ๋กœ ์ธ์ •๋ฐ›๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๋Œ€ํ•œ๋ฏผ๊ตญ์€ ์ด ์„ฌ์„ ๋…๋„๋กœ ์นญํ•˜๋ฉฐ ์—ญ์‚ฌ์ ์œผ๋กœ๋ถ€ํ„ฐOwnership๋ฅผ ์ฃผ์žฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์—ญ์‚ฌ์  ๊ธฐ๋ก์— ๋”ฐ๋ฅด๋ฉด ๋…๋„๋Š” ์กฐ์„ ์‹œ๋Œ€์— ์ด๋ฏธ ํ•œ๊ตญ์˜ ์˜ํ† ๋กœ ์ธ์‹๋˜์—ˆ๊ณ , ์ผ๋ณธ ์ œ๊ตญ์ฃผ์˜ ์‹œ์ ˆ์— ์ž ์‹œ ๊ด€ํ• ๊ถŒ์— ์†ํ–ˆ๋‹ค๊ฐ€ ์ผ๋ณธ์ด ํŒจ์ „ํ•œ ์ดํ›„ ๋Œ€ํ•œ๋ฏผ๊ตญ ์˜ํ† ๋กœ ํ™˜์›๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

์ผ๋ณธ์€ ์ด ์„ฌ์„ ๋‹ค์ผ€์‹œ๋งˆ๋กœ ์นญํ•˜๋ฉฐ ์˜์œ ๊ถŒ์„ ์ฃผ์žฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์ด๋Ÿฌํ•œ ์ฃผ์žฅ์€ ๊ตญ์ œ๋ฒ•์ƒ ์ธ์ •๋˜์ง€ ์•Š์œผ๋ฉฐ, ํ•œ๊ตญ ์ •๋ถ€๋Š” ๋…๋„๋ฅผ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์˜ํ† ๋กœ์„œ ์ ˆ๋Œ€์ ์ธ ๊ถŒ๋ฆฌ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Œ์„ ๋ถ„๋ช…ํžˆ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฒฐ๋ก ์ ์œผ๋กœ, "๋…๋„"์™€ "๋‹ค์ผ€์‹œ๋งˆ"๋Š” ๋™์ผํ•œ ์„ฌ์„ ๊ฐ€๋ฆฌํ‚ค์ง€๋งŒ, ์„œ๋กœ ๋‹ค๋ฅธ ๊ตญ๊ฐ€์˜ ์—ญ์‚ฌ์ ์ด๊ณ  ์ •์น˜์  ์ƒํ™ฉ์— ๋”ฐ๋ผ ๋ถ€๋ฅด๋Š” ์ด๋ฆ„์ด ๋‹ค๋ฅด๋ฉฐ, ๊ตญ์ œ๋ฒ•์ƒ ์ด ์„ฌ์€ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์˜ํ† ๋กœ ๋ถ„๋ฅ˜๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.<|im_end|>
```

<small>English gist: the model explains that Dokdo (獨島) and Takeshima (竹島) are the Korean and Japanese names for the same island and summarizes the Korean position on sovereignty.</small>
 
```python
# User: "Do you know about the incident where King Sejong threw a MacBook
# at the Jiphyeonjeon (Hall of Worthies)?" (a trick question)
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "์„ธ์ข…๋Œ€์™•์˜ ์ง‘ํ˜„์ „ ๋งฅ๋ถ ๋˜์ง ์‚ฌ๊ฑด์— ๋Œ€ํ•ด์„œ ์•Œ๊ณ  ์žˆ์–ด?"},
]
---
์„ธ์ข… ๋Œ€์™•์˜ ์ง‘ํ˜„์ „ ๋งฅ๋ถ ๋˜์ง ์‚ฌ๊ฑด์— ๋Œ€ํ•ด ๋“ค์–ด๋ณธ ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์‚ฌ๊ฑด์€ 2017๋…„ ํ•œ ๋ฐฉ์†ก์‚ฌ์˜ ๋“œ๋ผ๋งˆ์—์„œ ๋“ฑ์žฅํ•˜๋ฉฐ ๋„๋ฆฌ ์•Œ๋ ค์กŒ๋Š”๋ฐ์š”, ๋“œ๋ผ๋งˆ์—์„œ๋Š” ์ง‘ํ˜„์ „ ํ•™์ž๋“ค์ด ์ƒˆ๋กœ์šด ๋ฌธ๋ฌผ์ธ ๋งฅ๋ถ์„ ๋ณด๊ณ  ์„ธ์ข… ๋Œ€์™•์ด ์ด๋ฅผ ๋˜์ง„ ๊ฒƒ์œผ๋กœ ๋ฌ˜์‚ฌ๋˜์–ด ๋…ผ๋ž€์ด ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ ์ด ๋‚ด์šฉ์€ ์—ญ์‚ฌ์  ์‚ฌ์‹ค๊ณผ ๋‹ค๋ฅด๋‹ค๋Š” ๊ฒƒ์ด ๋ฐํ˜€์กŒ์Šต๋‹ˆ๋‹ค. ์ง‘ํ˜„์ „ ํ•™์ž๋“ค์ด ์‹ค์ œ๋กœ ๋งฅ๋ถ๊ณผ ๊ฐ™์€ ์™ธ๊ณ„ ๋ฌธ๋ฌผ์„ ๋ชฉ๊ฒฉํ–ˆ์„ ๋ฆฌ ์—†๊ณ , ์„ธ์ข… ๋Œ€์™•์ด ํ•™์ž๋“ค์˜ ๋…ธ๋ ฅ์„ ํ„ํ•˜ํ•˜๊ฑฐ๋‚˜ ๋ฌผ๊ฑด์„ ๋˜์ง€๋Š” ํ–‰๋™์„ ํ–ˆ์„ ๊ฒƒ์ด๋ผ๋Š” ์ „์ œ์— ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ์˜คํžˆ๋ ค ์„ธ์ข… ๋Œ€์™•์€ ์ˆ™๋ช…ํ•™์„ ์„ค๋ฆฝํ•˜๊ณ  ํ•œ๊ธ€์„ ์ฐฝ์ œํ•˜๋Š” ๋“ฑ ํ•™๋ฌธ๊ณผ ๋ฌธํ™”์— ๋Œ€ํ•œ ์ง€์›์„ ์•„๊นŒ์ง€ ์•Š์•˜๋˜ ๊ฒƒ์œผ๋กœ ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.

๋“œ๋ผ๋งˆ ์†์— ๋“ฑ์žฅํ•œ ์‚ฌ๊ฑด์€ ํ—ˆ๊ตฌ์ ์ธ ์„ค์ •์ผ๋ฟ, ์‹ค์ œ ์—ญ์‚ฌ์—์„œ ์ผ์–ด๋‚œ ์ผ์€ ์•„๋‹™๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์˜ค๋ฅ˜๋ฅผ ๋ฐ˜์˜ํ•˜๋“ฏ, ์ดํ›„์˜ ๋ฐฉ์†ก์—์„œ๋Š” ํ•ด๋‹น ์žฅ๋ฉด์ด ์ˆ˜์ •๋˜๊ฑฐ๋‚˜ ์‚ญ์ œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.<|im_end|>
```

<small>English gist: asked the trick question about "King Sejong throwing a MacBook at the Jiphyeonjeon scholars", the model answers that the episode is fictional, not historical.</small>
 
```python
# User: "I have 5 apples today. And yesterday I ate 2 apples.
# How many apples do I have now?"
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "๋‚˜๋Š” ์˜ค๋Š˜ ์‚ฌ๊ณผ 5๊ฐœ๋ฅผ ๊ฐ–๊ณ ์žˆ์–ด. ๊ทธ๋ฆฌ๊ณ  ์–ด์ œ ๋‚˜๋Š” ์‚ฌ๊ณผ 2๊ฐœ๋ฅผ ๋จน์—ˆ์ง€. ๋‚ด๊ฐ€ ์ง€๊ธˆ ๊ฐ€์ง„ ์‚ฌ๊ณผ๋Š” ๋ช‡ ๊ฐœ์ง€?"},
]
---
์ง€๋‚œ ๋ฒˆ์—๋Š” ์‚ฌ๊ณผ 2๊ฐœ๋ฅผ ๋จน์—ˆ๊ณ , ์ง€๊ธˆ์€ ์‚ฌ๊ณผ 5๊ฐœ๋ฅผ ๊ฐ–๊ณ  ์žˆ๋‹ค๊ณ  ํ–ˆ์œผ๋‹ˆ, ์ง€๊ธˆ ๊ฐ€์ง„ ์‚ฌ๊ณผ์˜ ๊ฐœ์ˆ˜๋Š” 5๊ฐœ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ, ์‚ฌ๊ณผ 2๊ฐœ๋ฅผ ๋จน์–ด์„œ ์ด ์‚ฌ๊ณผ๋ฅผ 5๊ฐœ์—์„œ 2๊ฐœ ๋นผ๋ฉด, ๋‚จ์€ ์‚ฌ๊ณผ์˜ ๊ฐœ์ˆ˜๋Š” 5 - 2 = 3๊ฐœ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‹ˆ๊นŒ, ์ง€๊ธˆ ๊ฐ€์ง„ ์‚ฌ๊ณผ๋Š” 3๊ฐœ์ž…๋‹ˆ๋‹ค.<|im_end|>
```

<small>English gist: given "I have 5 apples today; yesterday I ate 2; how many do I have now?", the model first notes the count is 5, then subtracts the apples eaten yesterday and answers 3.</small>
 
 
 
 
 
 
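The raw outputs above end with `<|im_end|>`, the ChatML end-of-turn marker used by Yi's chat models. As an illustration (not this card's own code), a `messages` list like the ones above maps onto a ChatML prompt roughly as follows; in practice `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` does this for you:

```python
# Minimal ChatML rendering sketch, assuming the <|im_start|>/<|im_end|>
# turn markers seen in the sample outputs above.
def to_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

demo = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(demo)
```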
# beomi/Yi-Ko-34B-Preview

Yi-Ko series models serve as advanced iterations of the 01-ai/Yi models, benefiting from an expanded vocabulary and the inclusion of a Korean/English corpus in their further pretraining. Just like their predecessors, Yi-Ko series models operate within the broad range of generative text models spanning 6 billion to 34 billion parameters. This repository focuses on the **34B** pretrained version, which is tailored to fit the Hugging Face Transformers format. For access to the other models, feel free to consult the index provided below.

## Model Details

**Model Developers** Junbum Lee (Beomi)

**Variations** Yi-Ko-34B will come in a range of parameter sizes (6B and 34B) with Ko (Korean-only), KoEn (Korean+English), and KoEnCo (Korean+English+Code) variations.

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture**

Yi-Ko series models are auto-regressive language models that use an optimized transformer architecture based on Llama-2*.

<small>*The Yi model architecture is based on Llama-2, so it can be loaded via the `LlamaForCausalLM` class in HF Transformers.</small>
 
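Because the architecture matches Llama-2, the checkpoint loads with the stock Llama classes. A minimal sketch of that compatibility is below; the tiny config is illustrative, with only the 78464-entry vocab size and the use of grouped-query attention taken from this card's tables, and in practice you would simply call `LlamaForCausalLM.from_pretrained('beomi/Yi-Ko-34B')`:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny stand-in config: the real Yi-Ko-34B dimensions are much larger.
cfg = LlamaConfig(
    vocab_size=78464,        # expanded Yi-Ko vocabulary
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,   # GQA: fewer KV heads than attention heads
)
model = LlamaForCausalLM(cfg)
print(model.lm_head.out_features)  # 78464
```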
|Model Name|Training Data|Params|Context Length|GQA|Trained Tokens|LR|Batch Size (per step)|
|---|---|---|---|---|---|---|---|
|Yi-Ko-34B|*A mix of Korean + English online data*|34B|4k|O|TBD (under training)|5e-5|2048|

**Vocab Expansion**

| Model Name | Vocabulary Size | Description |
| --- | --- | --- |
| Original Yi-Series | 64000 | SentencePiece BPE |
| **Expanded Yi-Ko Series** | 78464 | SentencePiece BPE; added Korean vocab and merges |
 
**Tokenizing "์•ˆ๋…•ํ•˜์„ธ์š”, ์˜ค๋Š˜์€ ๋‚ ์”จ๊ฐ€ ์ข‹๋„ค์š”.ㅎㅎ"**

| Model | # of tokens | Tokens |
| --- | --- | --- |
| Original Yi-Series | 47 | `['<0xEC>', '<0x95>', '<0x88>', '<0xEB>', '<0x85>', '<0x95>', 'ํ•˜', '<0xEC>', '<0x84>', '<0xB8>', '<0xEC>', '<0x9A>', '<0x94>', ',', '▁', '<0xEC>', '<0x98>', '<0xA4>', '<0xEB>', '<0x8A>', '<0x98>', '์€', '▁', '<0xEB>', '<0x82>', '<0xA0>', '<0xEC>', '<0x94>', '<0xA8>', '๊ฐ€', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '<0xEC>', '<0x9A>', '<0x94>', '.', '<0xE3>', '<0x85>', '<0x8E>', '<0xE3>', '<0x85>', '<0x8E>']` |
| **Expanded Yi-Ko Series** | 10 | `['▁์•ˆ๋…•', 'ํ•˜์„ธ์š”', ',', '▁์˜ค๋Š˜์€', '▁๋‚ ', '์”จ๊ฐ€', '▁์ข‹๋„ค์š”', '.', 'ㅎ', 'ㅎ']` |

<small>*Same Korean vocab as the Llama-2-Ko series.</small>

**Tokenizing "The Yi series models are large language models trained from scratch by developers at 01.AI."**

| Model | # of tokens | Tokens |
| --- | --- | --- |
| Original Yi-Series | 21 | `['The', '▁Y', 'i', '▁series', '▁models', '▁are', '▁large', '▁language', '▁models', '▁trained', '▁from', '▁scratch', '▁by', '▁developers', '▁at', '▁', '0', '1', '.', 'AI', '.']` |
| **Expanded Yi-Ko Series** | 21 | `['▁The', '▁Y', 'i', '▁series', '▁models', '▁are', '▁large', '▁language', '▁models', '▁trained', '▁from', '▁scratch', '▁by', '▁developers', '▁at', '▁', '0', '1', '.', 'AI', '.']` |

<small>*Same Korean vocab as the Llama-2-Ko series. Since the **Expanded Yi-Ko Series** tokenizer prepends `▁` at the beginning of the text (to ensure identical tokenization of Korean sentences), it shows a negligible difference on the first token of English text.</small>
 
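As a quick sanity check of the tokenizer tables above: the same Korean sentence costs 47 byte-fallback tokens with the original vocab but only 10 with the expanded one, while English token counts are unchanged:

```python
# Token counts taken from the tokenization tables above.
korean_original, korean_expanded = 47, 10
english_original, english_expanded = 21, 21

print(f"Korean: {korean_original / korean_expanded:.1f}x fewer tokens")   # 4.7x
print(f"English: {english_original - english_expanded} token difference")  # 0
```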
# **Model Benchmark**

## LM Eval Harness - Korean (polyglot branch)

TBD

## LICENSE

CC-BY-NC-SA-4.0, subject to the Yi license

## Citation

TBD

## Acknowledgement

The training is supported by the [TPU Research Cloud](https://sites.research.google/trc/) program.