---
license: llama3
language:
- zh
- en
pipeline_tag: text-generation
---

# Xiangxin-2XL-Chat-1048k

我们提供私有化模型训练服务。如果您需要训练行业模型、领域模型或者私有模型,请联系我们:customer@xiangxinai.cn

We offer private model training services. If you need to train industry-specific models, domain-specific models, or private models, please contact us at customer@xiangxinai.cn.

# 模型介绍/Introduction

Xiangxin-2XL-Chat-1048k是[象信AI](https://www.xiangxinai.cn)基于Meta Llama-3-70B-Instruct模型和[Gradient AI的上下文扩充工作](https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k),利用自研中文价值观对齐数据集进行ORPO训练而得到的Chat模型。该模型具备更强的中文能力和中文价值观对齐,上下文长度达到1048k token(约100万字)。在ARC、HellaSwag、MMLU、TruthfulQA_mc2、Winogrande、GSM8K_flex、CMMLU、CEVAL-VALID八项测评中,该模型取得了70.22的平均分,超过了Llama-3-70B-Instruct-Gradient-1048k。我们的训练数据不包含任何测评数据集。

Xiangxin-2XL-Chat-1048k is a chat model developed by [Xiangxin AI](https://www.xiangxinai.cn). It builds on the Meta Llama-3-70B-Instruct model and [Gradient AI's context-extension work](https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k), further trained with ORPO on a proprietary Chinese value-alignment dataset, giving it stronger Chinese proficiency and alignment with Chinese values. Its context length is 1048k tokens (roughly one million). Across eight benchmarks (ARC, HellaSwag, MMLU, TruthfulQA_mc2, Winogrande, GSM8K_flex, CMMLU, and CEVAL-VALID) it achieves an average score of 70.22, surpassing Llama-3-70B-Instruct-Gradient-1048k. Our training data does not include any of the evaluation datasets.
| Model | Context Length | Pre-trained Tokens |
| :------------: | :------------: | :------------: |
| Xiangxin-2XL-Chat-1048k | 1048k | 15T |
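Prompts at this scale are easy to overshoot, so a quick length check before inference can save a failed run. A minimal sketch, assuming "1048k" means 1,048,576 tokens (the exact window may differ) and using a placeholder input file:

```python
from transformers import AutoTokenizer

MODEL_ID = "xiangxinai/Xiangxin-2XL-Chat-1048k-Chinese-Llama3-70B"
# "1048k" interpreted as 1,048,576 tokens -- an assumption; adjust if needed.
CONTEXT_WINDOW = 1_048_576

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# "long_document.txt" is a placeholder for your own input.
with open("long_document.txt", encoding="utf-8") as f:
    text = f.read()

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens; fits in context: {n_tokens <= CONTEXT_WINDOW}")
```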
# Benchmark 结果/Benchmark Evaluation

| | **Average** | **ARC** | **HellaSwag** | **MMLU** | **TruthfulQA** | **Winogrande** | **GSM8K** | **CMMLU** | **CEVAL** |
|:-----------------------:|:----------:|:--------:|:---------:|:----------:|:-----------:|:-------:|:-------:|:-------:|:-------:|
|**Xiangxin-2XL-Chat-1048k**| 70.22 | 60.92 | 83.29 | 75.13 | 57.33 | 76.64 | 81.05 | 65.40 | 62.03 |
|**Llama-3-70B-Instruct-Gradient-1048k**| 69.66 | 61.18 | 82.88 | 74.95 | 55.28 | 75.77 | 77.79 | 66.44 | 63.00 |

Note: TruthfulQA is truthfulqa_mc2; GSM8K uses the flexible-extract filter.

# 训练过程/Training

该模型使用ORPO技术和自研中文价值观对齐数据集进行训练。由于内容的敏感性,该数据集无法公开披露。

The model was trained using ORPO and a proprietary Chinese value-alignment dataset developed in-house. Due to the sensitivity of its content, the dataset cannot be publicly disclosed.

## Training loss

![image/png](https://cdn-uploads.huggingface.co/production/uploads/655b15957f2466433998bb89/oLLnrWaxQnyVwI8n2QqHK.png)

## Reward accuracies

![image/png](https://cdn-uploads.huggingface.co/production/uploads/655b15957f2466433998bb89/yD4My-43lLRWecyq-bgZ2.png)

## SFT loss

![image/png](https://cdn-uploads.huggingface.co/production/uploads/655b15957f2466433998bb89/iUoQfVZDftoW7C-2VXeWe.png)
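As a rough illustration of what single-stage ORPO training looks like, here is a minimal sketch using the `ORPOTrainer` from Hugging Face's TRL library. Everything in it is an assumption for illustration, not the actual pipeline: the public `trl-lib/ultrafeedback_binarized` dataset stands in for the private alignment dataset described above, and the hyperparameters are placeholders.

```python
# Illustrative ORPO sketch with Hugging Face TRL -- not the actual training
# setup used for Xiangxin-2XL-Chat-1048k; dataset and hyperparameters are
# placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # base model per this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
# In practice a 70B model needs a multi-GPU setup (e.g. DeepSpeed/FSDP).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# ORPO expects preference pairs: "prompt", "chosen", "rejected".
# Public stand-in for the private Chinese value-alignment dataset.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = ORPOConfig(
    output_dir="orpo-out",
    beta=0.1,                      # weight of the odds-ratio term (assumed)
    learning_rate=5e-6,            # assumed
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer=
)
trainer.train()
```

ORPO adds a log-odds-ratio preference term to the ordinary supervised loss in a single stage, with no separate reference model as in DPO, which is consistent with the card reporting both an SFT loss curve and reward accuracies above.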
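The metric names in the benchmark table above (truthfulqa_mc2, GSM8K flexible-extract, CEVAL-VALID) match task names from EleutherAI's lm-evaluation-harness, so the scores can plausibly be reproduced along these lines. The task list and settings below are assumptions, not the card's exact evaluation configuration:

```python
# Sketch: evaluating the model with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Task names and settings are assumptions based on
# the metrics listed in the benchmark table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=xiangxinai/Xiangxin-2XL-Chat-1048k-Chinese-Llama3-70B,"
        "dtype=bfloat16"
    ),
    tasks=[
        "arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2",
        "winogrande", "gsm8k", "cmmlu", "ceval-valid",
    ],
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```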
# 快速开始/Quick Start

## Use with transformers

You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the `generate()` function. Let's see examples of both.

使用Transformers运行本模型推理需要约400GB的显存。

Running inference with this model using Transformers requires approximately 400GB of GPU memory.

### Transformers pipeline

```python
import transformers
import torch

model_id = "xiangxinai/Xiangxin-2XL-Chat-1048k-Chinese-Llama3-70B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "解释一下“温故而知新”"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

Example output:

> “温故而知新”是中国古代的一句成语,出自《论语·子路篇》。
> 它的意思是通过温习过去的知识和经验,来获得新的理解和见解。
> 这里的“温故”是指温习过去,回顾历史,复习旧知识,
> 而“知新”则是指了解新鲜事物,掌握新知识。
> 这个成语强调学习的循序渐进性,强调在学习新知识时,
> 不能忽视过去的基础,而是要在继承和发扬的基础上,去理解和创新。

### Transformers AutoModelForCausalLM

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "xiangxinai/Xiangxin-2XL-Chat-1048k-Chinese-Llama3-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "解释一下“温故而知新”"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

The example output is the same as for the pipeline example above.

# 协议/License

This code is licensed under the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.

# 联系我们/Contact Us

For inquiries, please contact us via email at customer@xiangxinai.cn.