
Chuxin-1.6B-1M


Introduction

Chuxin-1.6B-Base is a model with 1.6 billion parameters, built entirely on open-source data. After training on very large-scale data, Chuxin-1.6B is highly competitive on a wide range of downstream tasks.

Chuxin-1.6B-1M is the result of continuing to train Chuxin-1.6B with a 1M-token context window. Needle-in-a-haystack experiments show that it has very strong long-context retrieval ability.

If you would like to learn more about the Chuxin-1.6B open-source models, we suggest you refer to our technical report.

Quickstart

You can easily call the model with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", device_map="auto", trust_remote_code=True, bf16=True).eval()
inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=15, do_sample=False)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# 蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是亚的斯亚贝巴(Addis Ababa)...
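Because the model was trained with a 1M-token context window, long documents can be placed directly in the prompt. The following is a minimal long-context sketch; long_document.txt and the question are placeholders, not part of the original card, and memory use grows with prompt length.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", device_map="auto", trust_remote_code=True, bf16=True).eval()

# Read a long document (placeholder path) and append a question; the whole prompt fits in the 1M window.
with open("long_document.txt", encoding="utf-8") as f:
    document = f.read()
prompt = document + "\n\nQuestion: What is the main topic of the document?\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"Prompt length: {inputs.input_ids.shape[1]} tokens")

pred = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(pred.cpu()[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))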

Evaluation

Common Sense Reasoning and Reading Comprehension Tasks

| Model | Size | ARC-c | ARC-e | BoolQ | COPA | HellaSwag | OpenBookQA | PIQA | SciQ | Winogrande | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Chuxin-1.6B-Base | 1.6B | 39.68 | 71.38 | 71.25 | 83 | 66.09 | 35.00 | 77.09 | 95 | 63.54 | 66.89 |
| Chuxin-1.6B-32k | 1.6B | 39.16 | 70.66 | 67.71 | 81 | 65.69 | 35.8 | 76.88 | 94.2 | 62.51 | 65.96 |
| Chuxin-1.6B-64k | 1.6B | 38.48 | 70.24 | 67.52 | 82 | 65.6 | 35.2 | 76.61 | 94.3 | 63.3 | 65.92 |
| Chuxin-1.6B-128k | 1.6B | 39.08 | 69.4 | 67.71 | 80 | 65.74 | 35.4 | 76.39 | 94.1 | 63.3 | 65.68 |
| Chuxin-1.6B-256k | 1.6B | 40.19 | 70.75 | 69.3 | 78 | 65.85 | 35.8 | 76.88 | 93.5 | 63.85 | 66.01 |
| Chuxin-1.6B-512k | 1.6B | 40.61 | 71.21 | 67.77 | 78 | 64.82 | 34.8 | 76.88 | 93.6 | 61.88 | 65.51 |
| Chuxin-1.6B-1M | 1.6B | 41.13 | 72.26 | 62.08 | 75 | 64.59 | 34.8 | 76.71 | 93.33 | 62.43 | 64.7 |
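Scores such as ARC-c and HellaSwag are typically computed by comparing the model's likelihood of each answer choice given the question. The following is a minimal sketch of that scoring idea on a made-up question, reusing the Quickstart loading code; it is not the evaluation pipeline that produced the numbers above, and the choice_logprob helper and the question are purely illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", device_map="auto", trust_remote_code=True, bf16=True).eval()

question = "Which gas do plants absorb from the atmosphere for photosynthesis?"
choices = ["Oxygen", "Carbon dioxide", "Nitrogen", "Hydrogen"]

def choice_logprob(question, choice):
    # Sum the log-probabilities of the answer tokens given the question
    # (assumes the prompt tokenizes as a prefix of the full sequence).
    prompt_ids = tokenizer(f"Question: {question}\nAnswer:", return_tensors="pt").input_ids
    full_ids = tokenizer(f"Question: {question}\nAnswer: {choice}", return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1].float(), dim=-1)  # position i predicts token i+1
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # first predicted position that belongs to the answer
    idx = torch.arange(start, targets.shape[0], device=targets.device)
    return log_probs[idx, targets[start:]].sum().item()

scores = {c: choice_logprob(question, c) for c in choices}
print(max(scores, key=scores.get))  # expected: Carbon dioxide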

Open LLM Leaderboard

| Model | Size | ARC-c | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM-8k | Avg | Avg w/o GSM-8k |
|---|---|---|---|---|---|---|---|---|---|
| Chuxin-1.6B-Base | 1.6B | 39.68 | 66.09 | 41.07 | 37.65 | 63.54 | 12.66 | 43.45 | 49.61 |
| Chuxin-1.6B-32k | 1.6B | 39.16 | 65.69 | 38.63 | 35.66 | 62.51 | 11.6 | 42.21 | 48.33 |
| Chuxin-1.6B-64k | 1.6B | 38.48 | 65.6 | 38.43 | 35.07 | 63.3 | 11.9 | 42.13 | 48.18 |
| Chuxin-1.6B-128k | 1.6B | 39.08 | 65.74 | 37.65 | 34.89 | 63.3 | 11.07 | 41.96 | 48.13 |
| Chuxin-1.6B-256k | 1.6B | 40.19 | 65.85 | 37.16 | 35.2 | 63.85 | 10.16 | 42.07 | 48.45 |
| Chuxin-1.6B-512k | 1.6B | 40.61 | 64.82 | 36.66 | 33.66 | 61.88 | 8.11 | 40.96 | 47.53 |
| Chuxin-1.6B-1M | 1.6B | 41.13 | 64.59 | 35.76 | 34.67 | 62.43 | 6.82 | 40.9 | 47.72 |

Needle in a Haystack
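A needle-in-a-haystack test hides a short "needle" sentence at a chosen depth inside a long filler context and then asks the model to retrieve it. The sketch below shows the general recipe with an assumed needle, filler text, and prompt; it is a simplified illustration, not the exact benchmark configuration used for Chuxin-1.6B-1M.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-1M", device_map="auto", trust_remote_code=True, bf16=True).eval()

needle = "The secret passcode is 7481."  # fact to retrieve (made up)
filler = "The grass is green. The sky is blue. The sun is bright. " * 2000  # assumed filler text
depth = 0.5  # insert the needle halfway into the context

insert_at = int(len(filler) * depth)
haystack = filler[:insert_at] + needle + " " + filler[insert_at:]
prompt = haystack + "\n\nWhat is the secret passcode mentioned above? The secret passcode is"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
pred = model.generate(**inputs, max_new_tokens=8, do_sample=False)
answer = tokenizer.decode(pred.cpu()[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("retrieved" if "7481" in answer else "missed", "|", answer.strip())

In the full benchmark this probe is repeated over a grid of context lengths and insertion depths, and the retrieval accuracy at each grid point is what the haystack plot reports.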

Citation

If you find our work helpful, feel free to cite us.

@article{chuxin,
  title={Chuxin: 1.6B Technical Report},
  author={Xiaomin Zhuang and Yufan Jiang and Qiaozhi He and Zhihua Wu},
  journal={arXiv preprint arXiv:2405.04828},
  year={2024}
}
