p208p2002/llama-3-zhtw-8B


Llama 3 zhtw

An experiment in Chinese continued pretraining (CP) on Llama 3, trained on 800M tokens in total.

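Because the quality of the Chinese pretraining corpus still has room for improvement, the model after CP does not surpass the original Llama 3. Several Chinese Llama 3 models trained by the open-source community, which we compared against, show a similar pattern.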

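On the English side, LLaMA 3 zhtw includes FineWeb in its training mix, which keeps its MMLU score above that of other Chinese CP models and on par with the original LLaMA 3.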

Benchmarks

Models	Size	↑ TMMLU+ (ACC; TC, Knowledge; 5-shot)	CMMLU (ACC; CN, Knowledge; 5-shot)	MMLU (ACC; EN, Knowledge; 5-shot)
Yi-6B	6B	49.63	75.53	65.35
Qwen-7B	7B	42.84	73.1	61.00
Meta-Llama-3-8B	8B	41.97	50.8	65.17
p208p2002/llama-3-zhtw-8B	8B	41.84	50.6	65.31
Breeze-7B-Base-v0_1	7B	40.35	44.05	61.63
hfl/llama-3-chinese-8b	8B	39.64	50.9	61.1
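
To make the comparison with the base model concrete, the following minimal sketch computes per-benchmark differences against Meta-Llama-3-8B. The scores are copied from the table above; the names `SCORES` and `delta_vs_base` are hypothetical helpers introduced only for this illustration.

```python
# Scores copied from the benchmark table (5-shot accuracy, in percent).
SCORES = {
    "Meta-Llama-3-8B": {"TMMLU+": 41.97, "CMMLU": 50.8, "MMLU": 65.17},
    "p208p2002/llama-3-zhtw-8B": {"TMMLU+": 41.84, "CMMLU": 50.6, "MMLU": 65.31},
    "hfl/llama-3-chinese-8b": {"TMMLU+": 39.64, "CMMLU": 50.9, "MMLU": 61.1},
}


def delta_vs_base(model, base="Meta-Llama-3-8B"):
    """Return score differences (model minus base) in accuracy points."""
    return {
        bench: round(SCORES[model][bench] - SCORES[base][bench], 2)
        for bench in SCORES[base]
    }


deltas = delta_vs_base("p208p2002/llama-3-zhtw-8B")
for bench, diff in deltas.items():
    print(f"{bench}: {diff:+.2f}")
```

On these numbers the CP model stays within about 0.2 points of the base model on all three benchmarks, while hfl/llama-3-chinese-8b trails the base model by roughly 4 points on MMLU.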

Recipe

Datasets

Dataset	Lang	Weight
FineWeb	en	0.35
Wudao	zh-cn	0.1
C4Tw	zh-tw	0.1
WikiZhTw	zh-tw	0.15
NdltdT10	zh-tw	0.1
GitHubMarkDown	code	0.1
GitHubPython	code	0.1
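
As a rough illustration of what this mixture implies, the sketch below converts the weights into approximate token budgets. It assumes each weight is the share of the 800M-token budget drawn from that dataset; the card does not say whether the weights are token shares or sampling probabilities, so treat the numbers as estimates. `WEIGHTS`, `budget`, and `by_lang` are names introduced here for illustration.

```python
from collections import defaultdict

TOTAL_TOKENS = 800_000_000  # total CP tokens stated in the card

# (language, weight) per dataset, copied from the table above.
WEIGHTS = {
    "FineWeb": ("en", 0.35),
    "Wudao": ("zh-cn", 0.1),
    "C4Tw": ("zh-tw", 0.1),
    "WikiZhTw": ("zh-tw", 0.15),
    "NdltdT10": ("zh-tw", 0.1),
    "GitHubMarkDown": ("code", 0.1),
    "GitHubPython": ("code", 0.1),
}

# The weights should form a complete mixture.
assert abs(sum(w for _, w in WEIGHTS.values()) - 1.0) < 1e-9

# Assumption: weight == fraction of the total token budget.
budget = {name: round(TOTAL_TOKENS * w) for name, (_, w) in WEIGHTS.items()}

# Aggregate the per-dataset budgets by language.
by_lang = defaultdict(int)
for name, (lang, _) in WEIGHTS.items():
    by_lang[lang] += budget[name]

for lang, tokens in by_lang.items():
    print(f"{lang}: {tokens / 1e6:.0f}M tokens")
```

Under that assumption, English and Traditional Chinese each receive about 280M tokens, code about 160M, and Simplified Chinese about 80M.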

Hyperparameters

Learning Rate: 1e-7
Global Batch Size: 60
Sequence Length: 8192
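
These settings also give a back-of-the-envelope estimate of the training length. The sketch below assumes the global batch size counts sequences and that every sequence is packed to the full 8192 tokens; neither detail is stated in the card, so the step count is an approximation.

```python
import math

TOTAL_TOKENS = 800_000_000  # total CP tokens stated in the card
GLOBAL_BATCH_SIZE = 60      # assumed to be sequences per optimizer step
SEQUENCE_LENGTH = 8192      # tokens per sequence, assumed fully packed

# Tokens consumed by one optimizer step under the assumptions above.
tokens_per_step = GLOBAL_BATCH_SIZE * SEQUENCE_LENGTH

# Number of steps needed to see the full token budget once.
steps = math.ceil(TOTAL_TOKENS / tokens_per_step)

print(f"tokens per step: {tokens_per_step:,}")
print(f"approximate optimizer steps: {steps:,}")
```

That works out to about 0.49M tokens per step and on the order of 1,600 optimizer steps for the full run.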


Safetensors

Model size: 8.03B params
Tensor type: BF16
