---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-Instruct-v0.3
language:
- en
- zh
tags:
- llama-factory
- orpo
---

❗️❗️❗️NOTICE: For optimal performance, we refrain from fine-tuning the model's identity. Thus, inquiries such as "Who are you?" or "Who developed you?" may yield random responses that are not necessarily accurate.

# Updates

- 🚀🚀🚀 [May 26, 2024] We introduce [Mistral-7B-v0.3-Chinese-Chat](https://huggingface.co/shenzhi-wang/Mistral-7B-v0.3-Chinese-Chat)! Fine-tuned with full parameters on a mixed Chinese-English dataset of **~100K preference pairs**, it offers **greatly improved Chinese ability** compared to [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), as well as strong performance in **mathematics, roleplay, and tool use**.

# Model Summary

[Mistral-7B-v0.3-Chinese-Chat](https://huggingface.co/shenzhi-wang/Mistral-7B-v0.3-Chinese-Chat) is an instruction-tuned language model for Chinese and English users, with abilities such as roleplaying and tool use, built upon [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3).

Developed by: [Shenzhi Wang](https://shenzhi-wang.netlify.app) (王慎执) and [Yaowei Zheng](https://github.com/hiyouga) (郑耀威)

- License: [Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
- Base Model: [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- Model Size: 7.25B parameters
- Context Length: 32K tokens

# 1. Introduction

This is **the first model** specifically fine-tuned for Chinese and English users based on [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3). The fine-tuning algorithm used is ORPO [1].

**Compared to the original [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), our [Mistral-7B-v0.3-Chinese-Chat](https://huggingface.co/shenzhi-wang/Mistral-7B-v0.3-Chinese-Chat) model significantly reduces the issues of "Chinese questions with English answers" and of mixing Chinese and English within a response.**

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).

Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).

Training details:

- epochs: 3
- learning rate: 3e-6
- learning rate scheduler type: cosine
- warmup ratio: 0.1
- cutoff length (i.e., context length): 32768
- ORPO beta (i.e., $\lambda$ in the ORPO paper; see the objective sketched below): 0.05
- global batch size: 128
- fine-tuning type: full parameters
- optimizer: paged_adamw_32bit
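
To make the role of $\lambda$ concrete, the ORPO objective of [1] can be sketched as follows. This is a schematic rendering in our own notation ($y_w$ is the chosen response, $y_l$ the rejected one); see the paper for the precise, length-normalized formulation:

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}},
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

Here $\mathcal{L}_{\mathrm{SFT}}$ is the standard supervised loss on the chosen response, so a small $\lambda$ such as 0.05 applies the odds-ratio preference term as a light regularizer on top of supervised fine-tuning.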

# 2. Usage

```python
from transformers import pipeline

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant.",
    },
    # "Briefly introduce what machine learning is." (user prompt in Chinese)
    {"role": "user", "content": "简要地介绍一下什么是机器学习"},
]

# Load the model as a text-generation pipeline; max_length caps the total
# number of tokens (prompt + completion) at the model's 32K context length.
chatbot = pipeline(
    "text-generation",
    model="shenzhi-wang/Mistral-7B-v0.3-Chinese-Chat",
    max_length=32768,
)
print(chatbot(messages))
```
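
If you want finer control over generation (for example, capping the number of newly generated tokens rather than the total length), the lower-level `transformers` API can be used instead. The following is a minimal sketch; settings such as `torch_dtype=torch.bfloat16`, `device_map="auto"`, and `max_new_tokens=512` are illustrative assumptions rather than the authors' recommended values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shenzhi-wang/Mistral-7B-v0.3-Chinese-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; fall back to float16 if bf16 is unavailable
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # "Briefly introduce what machine learning is." (user prompt in Chinese)
    {"role": "user", "content": "简要地介绍一下什么是机器学习"},
]

# Build the prompt with the model's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```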