p208p2002/llama-chinese-81M

# Baby LLaMA Chinese 81M
A small pretrained Chinese language model.
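
The README does not say how to load the model. The following is a minimal sketch, assuming the checkpoint in the `p208p2002/llama-chinese-81M` repository loads through the standard `transformers` Auto classes and that PyTorch is installed. The helper name `generate` and its default `max_new_tokens` are illustrative choices, not part of the original model card.

```python
MODEL_ID = "p208p2002/llama-chinese-81M"  # repository id of this model card


def generate(prompt, max_new_tokens=32):
    """Continue `prompt` with the model (hypothetical helper, not from the README)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository works with the generic Auto classes.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Pass only the token ids, so any extra fields the tokenizer returns
    # are not forwarded to generate().
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("台灣最高的山是")` downloads the weights on first use and returns the prompt followed by the model's continuation. As a small pretrained model with no instruction tuning mentioned, it should be expected to continue text rather than follow instructions.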

## Training Dataset
- Chinese Wikipedia (20230601)
- English Wikipedia (20230601)

## Tokenizer
Uses a BPE tokenizer trained on Chinese and English Wikipedia, with a vocabulary size of 32k.
> https://github.com/p208p2002/BPE-tokenizer-from-zh-wiki
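
To illustrate what training a BPE tokenizer involves, here is a toy character-level sketch in plain Python. It is not the code from the linked repository: the function names `learn_bpe`, `merge_pair`, and `encode` are hypothetical, and a real tokenizer would keep merging until the vocabulary reaches its target size (32k here) rather than stopping after a couple of merges.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each distinct word starts as a tuple of characters, weighted by its frequency.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the new merged symbol.
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, best)] += freq
        words = new_words
    return merges


def encode(word, merges):
    """Tokenize `word` by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

For example, `learn_bpe(["low", "low", "lower", "lowest"], 2)` learns the merges `("l", "o")` and then `("lo", "w")`, after which `encode("lowest", merges)` yields `["low", "e", "s", "t"]`.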