---
license: mit
---
# Chuxin-1.6B-Base
<br>

## Introduction

**Chuxin-1.6B-Base** is a 1.6-billion-parameter model built entirely on open-source data. After training on a very large-scale corpus, Chuxin-1.6B is highly competitive on a wide range of downstream tasks.

**Chuxin-1.6B-1M** is the result of further training Chuxin-1.6B-Base with a 1M-token context window. Needle-in-a-haystack experiments show that it has very strong long-context retrieval ability.

If you would like to learn more about the Chuxin-1.6B open-source models, please refer to our [technical report](https://xxxx).
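
As a rough illustration of this kind of evaluation, the sketch below hides a short "needle" sentence inside a long block of filler text and asks the model to retrieve it. The repository name `chuxin-llm/Chuxin-1.6B-1M`, the filler text, and the context length are illustrative assumptions, not the exact setup from the report.

```python
# Minimal needle-in-a-haystack sketch (illustrative only; the repo id
# "chuxin-llm/Chuxin-1.6B-1M" is assumed, and a real run would use much
# longer contexts and sweep the needle position).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chuxin-llm/Chuxin-1.6B-1M"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True).eval()

needle = "The secret passcode is 7412."
filler = "The sky was clear and the city went quietly about its day. " * 1500
haystack = filler[: len(filler) // 2] + needle + " " + filler[len(filler) // 2 :]
prompt = haystack + "\nQuestion: What is the secret passcode?\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
pred = model.generate(**inputs, max_new_tokens=10, do_sample=False)
# Decode only the newly generated tokens and check whether the needle was retrieved.
answer = tokenizer.decode(pred[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer, "7412" in answer)
```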
<br>

## Quickstart

You can easily call the model with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the model (the custom modeling code requires trust_remote_code=True).
tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("chuxin-llm/Chuxin-1.6B-Base", device_map="auto", trust_remote_code=True, bf16=True).eval()

# Few-shot prompt: list two capitals, then let the model complete the third.
inputs = tokenizer('蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是', return_tensors='pt')
inputs = inputs.to(model.device)

# Greedy decoding of up to 20 new tokens.
pred = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# 蒙古国的首都是乌兰巴托(Ulaanbaatar)\n冰岛的首都是雷克雅未克(Reykjavik)\n埃塞俄比亚的首都是亚的斯亚贝巴(Addis Ababa)...
```
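
The `bf16=True` keyword above relies on the model's custom remote code; with stock `transformers`, the generic `torch_dtype` argument should give the same bfloat16 weights. A minimal sketch under that assumption, also showing sampled decoding:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same checkpoint, loaded with the standard torch_dtype argument instead of the custom bf16 flag.
tokenizer = AutoTokenizer.from_pretrained("chuxin-llm/Chuxin-1.6B-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "chuxin-llm/Chuxin-1.6B-Base",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()

# Sampled decoding (temperature / nucleus sampling) for more varied continuations.
inputs = tokenizer("人工智能的未来发展趋势是", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```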

## Evaluation

### Common Sense Reasoning and Reading Comprehension

| Model         | size | ARC-c |ARC-e |Boolq |Copa |Hellaswag |OpenbookQA |Piqa |Sciq |Winogrande |Avg|
|:--------------|:----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| Gemma     |    2B   |     48.98    |  78.45    |  69.51   |  84    |  71.73     |  39.8    |  78.02   |  94.3    |  65.51    |  70.03      |
| H2O-Danube†     |    1.8B  |     35.84   |  62.29   |  65.81   |  -   |  68.20     |  37.6    |  76.93   |  -    |  61.96   |  -     |
| Qwen1.5    |    1.8B   |      37.03     |  67.51    |  66.64    |  78    |  61.60     |  34.40    |  73.99     |  93     |  61.56     |  63.74    | 
| StableLM 2     |    1.6B    |      43.52   |69.44     | 75.5    | 84     | 70.3      | 39.6     | 76.82    | 96.1      | 64.17     | 68.82     |
| OpenLlama†   |    3B    |      34   |69| 68| -| 49| 40| 75| -| 62 |-|
| CT-LLM |  2B |  34.81   |  65.49   | 62.45    | 74    | 54.77      | 33.4     | 71.38   | 90.6     | 57.85     | 60.63     | 
| TinyLLama |  1.1B  |  34.81   | 67.47     | 63.15      | 74     | 60    | 34.6      | 73.12     | 88.8     | 58.88     | 61.64    |
| OLMo |  1B |  34.22   | 67.55     | 61.4     | 82     | 63.96      | 36.4     | 75.1    | 86.7     | 60.3      | 63.07     |
| Chuxin-1.6B-Base |  1.6B |  39.68  | 71.38     | 71.25      | 83    | 66.09     | 35.00      | 77.09     | 95     | 63.54      | 66.89     |

Models with † denote that we directly report the scores from the corresponding paper; the others are from our own evaluation.
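
For context, zero-shot benchmarks of this kind (ARC, HellaSwag, PIQA, etc.) are typically scored by comparing the model's log-likelihood, often length-normalized, of each answer option conditioned on the question. The sketch below illustrates that protocol on a made-up item; it is not the exact harness behind the numbers above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chuxin-llm/Chuxin-1.6B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True).eval()

def option_score(question: str, option: str) -> float:
    """Length-normalized log-likelihood of the option tokens given the question."""
    q_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + option, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given its prefix (shift targets by one position).
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lp = logprobs[torch.arange(targets.numel(), device=targets.device), targets]
    n_option = full_ids.shape[1] - q_len  # approximate number of option tokens
    return (token_lp[-n_option:].sum() / n_option).item()

# Made-up multiple-choice item in a zero-shot "question + option" format.
question = "Question: Which gas do plants absorb for photosynthesis?\nAnswer:"
options = [" Carbon dioxide", " Oxygen", " Nitrogen", " Helium"]
scores = [option_score(question, o) for o in options]
print(options[scores.index(max(scores))])  # the highest-scoring option is the prediction
```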

### Open LLM Leaderboard

| Model         | size | ARC-c  |HellaSwag|MMLU |TruthfulQA |Winogrande |GSM-8k |Avg |Avg wo GSM|
|:--------------|:----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| Gemma     |    2B   |     48.98   |  71.73     |  42.47    |  33   |  65.51    |10.08|  45.3    |  52.34      |
| H2O-Danube    |    1.8B    |      39.68    | 69.75    | 25.97    | 33.63   | 64.17| 2.05    | 39.21     |46.64|
| Qwen1.5†     |    1.8B   |      37.88     |   61.42    |  46.71   |  39.43    |  60.3     |  33.59     |  46.55   | 49.15|
| StableLM 2     |    1.6B    |      43.52 |70.3  | 39.8     | 36.61     | 64.17   | 17.29      | 45.28     | 50.88    |
| OpenLlama†      |     3B    |    39.9  | 71.6    | 27.1    | 34.8     | 67   | 0.9      |40.3|48.08|
| CT-LLM |  2B |  34.81   | 54.77      | 37.81     | 39.81   | 57.85     | 7.35     | 38.73     | 45.01|
| TinyLLama |  1.1B  |  33.87  | 60.31   | 26.04     | 37.32   | 59.51    | 1.44     | 36.42    |43.41|
| OLMo |  1B |  34.22   | 63.96      | 35.44     | 35.53    | 62.67     | 9.86     | 41.81    |48.2|
| Chuxin-1.6B-Base |  1.6B |  39.68  | 66.09     | 41.07      | 37.65    | 63.54    | 12.66     | 43.45    |49.61|

Models with † denote that we directly report the scores from the Open LLM Leaderboard; the others are from our own evaluation.

### CMMLU, C-Eval and HumanEval

| Model         | size | C-Eval | CMMLU | HumanEval |
|:--------------|:----------:|:-----------:|:-----------:|:-----------:|
| Gemma     |    2B   |     31   |  31.06    |  9.51| 
| Qwen1.5    |    1.8B   |      59.38     |   57.08  |  23.17   | 
| StableLM 2     |    1.6B    |      29.27 |30.1 | 7.32     | 
| CT-LLM |  2B |  36.78  | 36.4      | 9.15     | 
| Chuxin-1.6B-Base |  1.6B |  39.31  | 37.11     | 9.76 |
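
For reference, HumanEval pass@1 is computed by letting the model complete a function from its signature and docstring and then running the candidate against the task's unit tests. A stripped-down sketch of the generation step (omitting the sandboxed test execution) might look like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chuxin-llm/Chuxin-1.6B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True).eval()

# A HumanEval-style prompt: function signature plus docstring; the model writes the body.
prompt = '''def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
'''
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
completion = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# A real evaluation would truncate the completion at the first top-level statement
# and execute it against the benchmark's hidden unit tests.
print(prompt + completion)
```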

## Citation

If you find our work helpful, feel free to cite it.

```bibtex
@article{chuxin,
  title={Chuxin: 1.6B Technical Report},
  author={Xiaomin Zhuang and Yufan Jiang and Qiaozhi He and Zhihua Wu},
  journal={arXiv preprint arXiv:xxx},
  year={2024}
}
```
<br>