JosephusCheung committed (verified)
Commit 7f94f50 · 1 Parent(s): 26900b1

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -41,7 +41,7 @@ co2_eq_emissions:
 
 [GGUF (Text-Only, not recommended)](https://huggingface.co/CausalLM/miniG/tree/gguf): There is a significant degradation, even with the F16.
 
-**Hint:** How can I check if my inference parameters and quantized inference are performing well? You can try having the model recite "The Gift of the Magi" by O. Henry (which is a public domain text). You should expect it to recite the entire text accurately, including the formatting.
+> **Hint:** How can I check if my inference parameters and quantized inference are performing well? You can try having the model recite "The Gift of the Magi" by O. Henry (which is a public domain text). You should expect it to recite the entire text accurately, including the formatting.
 
 A model trained on a synthesis dataset of over **120 million** entries, this dataset having been generated through the application of state-of-the-art language models utilizing large context windows, alongside methodologies akin to retrieval-augmented generation and knowledge graph integration, where the data synthesis is conducted within clusters derived from a curated pretraining corpus of 20 billion tokens, with subsequent validation performed by the model itself.
 
@@ -75,7 +75,7 @@ Despite the absence of thorough alignment with human preferences, the model is u
 
 [GGUF (Text-Only, not recommended)](https://huggingface.co/CausalLM/miniG/tree/gguf): Even with F16, there is a significant drop in performance.
 
-***Hint:** How can I check whether my inference parameters and quantized inference are performing well? You can try having the model recite Zhu Ziqing's 《背影》 (which is a public domain text). You should expect it to recite the entire text accurately, including the formatting and line breaks.
+> **Hint:** How can I check whether my inference parameters and quantized inference are performing well? You can try having the model recite Zhu Ziqing's 《背影》 (which is a public domain text). You should expect it to recite the entire text accurately, including the formatting and line breaks.
 
 A model trained on a synthetic dataset of over **120 million** entries, generated by applying state-of-the-art language models with large context windows, combined with methods akin to retrieval-augmented generation and knowledge graph integration; the data synthesis was conducted within clusters extracted from a pretraining corpus of 20 billion tokens, with subsequent validation performed by the model itself.
 
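
To make the recitation check in the hint concrete, here is a minimal sketch of how one might run it with the transformers library. It assumes the CausalLM/miniG repository's custom code (hence `trust_remote_code=True`) works with the standard `apply_chat_template` and `generate` APIs; the prompt wording and generation settings are illustrative, not the author's recommended configuration.

```python
# Minimal sketch of the recitation check described in the hint above.
# Assumption: the repo's custom code supports the standard transformers
# chat-template / generate() APIs; settings here are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/miniG"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Ask for a public-domain text the model is expected to know verbatim.
messages = [{
    "role": "user",
    "content": 'Recite "The Gift of the Magi" by O. Henry in full.',
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding isolates the weights from sampling noise; compare the
# decoded output against the original text, formatting included.
output = model.generate(input_ids, max_new_tokens=4096, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the recitation diverges early, with dropped paragraphs, mangled punctuation, or broken line structure, suspect the sampling parameters or the quantization before blaming the model itself.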