YoLo2000 committed
Commit
95a4878
1 Parent(s): d3f0deb

Update README.md

Files changed (1)
  1. README.md +22 -6
README.md CHANGED
@@ -7,15 +7,31 @@ language:
 
  <!-- Provide a longer summary of what this model is. -->
 
- ### TiLamb-7B (Tibetan Large Language Model Base) is the base model of a Tibetan large language model: it was incrementally pre-trained from LLaMA2-7B with the LoRA method on 26.43 GB of Tibetan corpus. TiLamb-7B extends the LLaMA2 vocabulary from the original 32,000 entries to 61,221 by adding Tibetan tokens, and initializes the embedding and lm_head with mean expansion.
 
- #### Note that TiLamb-7B is an unfine-tuned base model with no conversational ability; SFT is required to adapt it to Tibetan dialogue and to downstream Tibetan NLP tasks (verified so far: Tibetan news classification, Tibetan entity relation classification, Tibetan machine reading comprehension, Tibetan word segmentation, Tibetan summarization, Tibetan question answering, and Tibetan question generation).
 
- #### Note: This project is developed from the LLaMA2-7B model released by Meta; strictly observe the LLaMA2-7B open-source license when using it. If third-party code is involved, be sure to comply with the relevant open-source licenses. Content generated by the model may be inaccurate due to computation methods, randomness, and other factors, so this project makes no guarantee of output accuracy and accepts no liability for losses arising from the use of the related resources and outputs. If the models in this project are used commercially, developers must follow local laws and regulations and ensure that model outputs are compliant; this project accepts no liability for any products or services derived from them.
 
- ### TiLamb-7B (Tibetan Large Language Model Base) is the base model for the Tibetan large language model, utilizing 26.43GB of Tibetan textual corpus. It is incrementally pre-trained on the LLaMA2-7B model using the LoRA method. TiLamb-7B has expanded the LLaMA2 vocabulary by incorporating a Tibetan lexicon, increasing the original vocabulary size from 32,000 to 61,221. It also initializes the embedding and lm_head with mean expansion.
 
- #### It is important to note that TiLamb-7B, as an unrefined base model, lacks conversational capabilities. It requires Supervised Fine-Tuning (SFT) for adaptation to Tibetan conversational applications and other NLP downstream tasks in Tibetan (which have been verified to include: Tibetan news classification, Tibetan entity relation classification, Tibetan machine reading comprehension, Tibetan word segmentation, Tibetan summarization, Tibetan question answering, and Tibetan question generation).
 
- #### Disclaimer: This project is developed based on the LLaMA2-7B model released by Meta, and users must strictly adhere to the open-source license agreement of LLaMA2-7B during use. If third-party code is used, it is imperative to comply with the relevant open-source licenses. The accuracy of the content generated by the model may be affected by computational methods and random factors; therefore, this project does not guarantee the accuracy of the model's outputs and will not be responsible for any loss incurred from using the related resources and outputs. If this project's models are used for commercial purposes, developers must comply with local laws and regulations to ensure the compliance of the model outputs. The project will not be liable for any products or services derived from this use.
 
+ # TiLamb-7B (Tibetan Large Language Model Base)
 
+ **TiLamb-7B** is a Tibetan-focused large language model base, developed on a 26.43 GB Tibetan corpus and incrementally pre-trained from the LLaMA2-7B model with the LoRA method. It extends the LLaMA2 vocabulary with Tibetan tokens, growing it from the original 32,000 entries to 61,221, and initializes the embedding and lm_head with mean expansion. For more information, visit the [TiLamb-7B GitHub homepage](https://github.com/NLP-Learning/TiLamb).
 
+ **Important notes**:
+ - TiLamb-7B is an unfine-tuned base model and **has no conversational ability**.
+ - To adapt it to Tibetan dialogue and downstream Tibetan NLP tasks (verified tasks include Tibetan news classification, Tibetan entity relation classification, Tibetan machine reading comprehension, Tibetan word segmentation, Tibetan summarization, Tibetan question answering, and Tibetan question generation), fine-tuning with the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/tree/main) framework is recommended.
 
+ **Usage notice**:
+ - This project is developed from the LLaMA2-7B model released by Meta; strictly observe the LLaMA2-7B open-source license when using it.
+ - If third-party code is involved, be sure to comply with the relevant open-source licenses.
+ - The accuracy of model-generated content may be affected by computation methods, randomness, and other factors; we therefore make no guarantee of output accuracy and accept no liability for any losses arising from use of the related resources and outputs.
+ - If the models are used commercially, developers must follow local laws and regulations and ensure that model outputs are compliant; this project accepts no liability for any products or services derived from them.
 
+ # TiLamb-7B (Tibetan Large Language Model Base)
 
+ **TiLamb-7B** is a large language model base focused on Tibetan, developed using a 26.43 GB Tibetan corpus and incrementally pre-trained from the LLaMA2-7B model with the LoRA method. It expands the original 32,000-entry vocabulary to 61,221 entries by adding Tibetan tokens, and initializes the embedding and lm_head with mean expansion. For more information, please visit the [TiLamb-7B GitHub page](https://github.com/NLP-Learning/TiLamb).
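The "mean expansion" initialization mentioned above can be sketched as follows. This is an illustrative reconstruction, not the authors' released code, and `mean_expand` is a hypothetical helper name; in practice the same operation is applied to the framework's embedding and lm_head weight tensors rather than Python lists.

```python
# Illustrative sketch of "mean expansion": when the vocabulary grows
# (here from 4 to 6 entries; in TiLamb-7B from 32,000 to 61,221), each
# new embedding row is initialized to the mean of the existing rows, so
# new tokens start at the centroid of the learned embedding space.
# `mean_expand` is a hypothetical helper, not part of the released code.

def mean_expand(embedding: list[list[float]], new_vocab_size: int) -> list[list[float]]:
    old_vocab_size = len(embedding)
    dim = len(embedding[0])
    # Column-wise mean over all existing rows.
    mean_row = [sum(row[j] for row in embedding) / old_vocab_size for j in range(dim)]
    # Keep the learned rows; append one mean-valued row per new token.
    return embedding + [mean_row[:] for _ in range(new_vocab_size - old_vocab_size)]

# Toy example: a 4-token, 2-dimensional embedding grown to 6 tokens.
emb = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
expanded = mean_expand(emb, 6)
print(expanded[4])  # [4.0, 5.0] -- the column-wise mean of the original rows
```

Compared with random initialization, starting new rows at the centroid keeps the output distribution over the original tokens roughly unchanged at the beginning of incremental pre-training.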
 
+ **Important Notes**:
+ - TiLamb-7B is an unfine-tuned base model, **lacking conversational capabilities**.
+ - For adaptation to Tibetan dialogue and downstream Tibetan NLP tasks (verified tasks include Tibetan news classification, Tibetan entity relation classification, Tibetan machine reading comprehension, Tibetan word segmentation, Tibetan summarization, Tibetan question answering, and Tibetan question generation), fine-tuning with the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/tree/main) framework is recommended.
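As a rough orientation only, an SFT run with LLaMA-Factory is typically described by a YAML config along the following lines. The field names follow LLaMA-Factory's example configs; the model path, dataset name, template, and hyperparameters below are placeholders, not a recipe verified by this project.

```yaml
# Hypothetical LLaMA-Factory-style SFT config; paths and values are
# placeholders, adapt them to your own setup and data.
model_name_or_path: path/to/TiLamb-7B    # merged TiLamb-7B base weights
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj
dataset: your_tibetan_sft_dataset        # placeholder dataset name
template: default
output_dir: output/tilamb-7b-sft
per_device_train_batch_size: 4
learning_rate: 2.0e-5
num_train_epochs: 3.0
```

Consult the LLaMA-Factory documentation for the exact keys supported by the version you install.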
 
+ **Usage Notice**:
+ - This project is developed based on the LLaMA2-7B model released by Meta; its use must strictly adhere to the LLaMA2-7B open-source license agreement.
+ - If third-party code is involved, comply with the relevant open-source license agreements.
+ - The accuracy of model-generated content may be affected by computational methods, random factors, and more; we therefore provide no guarantee for the accuracy of model outputs, nor do we bear responsibility for losses arising from the use of the related resources and results.
+ - If the models are used for commercial purposes, developers must comply with local laws and regulations and ensure the compliance of model output content; this project bears no responsibility for any products or services derived from such use.