---
license: apache-2.0
tags:
- LLMs
- Intel
base_model: mosaicml/mpt-7b
datasets:
- Intel/neural-chat-dataset-v1-1
- allenai/real-toxicity-prompts
language:
- en
model-index:
- name: neural-chat-7b-v1-1
  results:
  - task:
      type: Large Language Model
      name: Large Language Model
    dataset:
      type: Intel/neural-chat-dataset-v1-1
      name: Intel/neural-chat-dataset-v1-1
    metrics:
    - type: Average
      value: 51.41
      name: Average
      verified: true
    - type: ARC (25-shot)
      value: 50.09 
      name: ARC (25-shot)
      verified: true
    - type: HellaSwag (10-shot)
      value: 76.69
      name: HellaSwag (10-shot)
      verified: true
    - type: MMLU (5-shot)
      value: 38.79
      name: MMLU (5-shot)
      verified: true
    - type: TruthfulQA (0-shot)
      value: 40.07
      name: TruthfulQA (0-shot)
      verified: true
    - type: Toxicity Ratio
      value: 0.0264
      name: Toxicity Ratio

---
## Model Details: Neural-Chat-v1-1

This model is a fine-tuned model for chat based on [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) with a max sequence length of 2048 on the dataset [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1), which is a compilation of open-source datasets.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6297f0e30bd2f58c647abb1d/fWCqhGKZQKNuLmvj093rB.jpeg" width="500"/>
  Generated with the prompt "an image of a brain that has to do with LLMs" using https://clipdrop.co/stable-diffusion-turbo.
</p>

| Model Detail | Description |
| ----------- | ----------- | 
| Model Authors | Intel. The NeuralChat team with members from DCAI/AISE/AIPT. Core team members: Kaokao Lv, Liang Lv, Chang Wang, Wenxin Zhang, Xuhui Ren, and Haihao Shen. | 
| Date | July 2023 | 
| Version | v1-1 | 
| Type | 7B Large Language Model | 
| Paper or Other Resources | Base model: [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b); Dataset: [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1) | 
| License | Apache 2.0 |
| Questions or Comments | [Community Tab](https://huggingface.co/Intel/neural-chat-7b-v1-1/discussions) and [Intel DevHub Discord](https://discord.gg/rv2Gp55UJQ)|

| Intended Use | Description |
| ----------- | ----------- | 
| Primary intended uses | You can use the fine-tuned model for several language-related tasks. Check out the [LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) to see this model's performance relative to other LLMs. | 
| Primary intended users | Anyone doing inference on language-related tasks. | 
| Out-of-scope uses | In most cases, this model will need to be fine-tuned for your particular task. The model should not be used to intentionally create hostile or alienating environments for people. |

## Training Details

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.02
- num_epochs: 3.0
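
For reference, here is a minimal sketch of these settings expressed as Hugging Face `TrainingArguments`. The `output_dir` and the surrounding training script are assumptions; only the hyperparameter values come from the list above.

```python
from transformers import TrainingArguments

# Values mirror the hyperparameter list above; output_dir is illustrative.
training_args = TrainingArguments(
    output_dir="./neural-chat-7b-v1-1",  # assumption, not the original path
    learning_rate=1e-5,
    per_device_train_batch_size=2,       # 2 per device x 4 GPUs x 8 accumulation = 64 total
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=8,
    lr_scheduler_type="linear",
    warmup_ratio=0.02,
    num_train_epochs=3.0,
)
```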

## Use The Model

### Loading the model with Transformers
```python
import transformers
model = transformers.AutoModelForCausalLM.from_pretrained(
  'Intel/neural-chat-7b-v1-1',
  trust_remote_code=True
)
```
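
Once loaded, the model can be used for plain text generation. A minimal sketch, assuming the model repository also hosts the tokenizer files and using greedy decoding; the prompt is illustrative, as no chat template is documented here:

```python
import transformers

# Assumes tokenizer files ship with the model repo.
tokenizer = transformers.AutoTokenizer.from_pretrained('Intel/neural-chat-7b-v1-1')

# `model` is the AutoModelForCausalLM loaded above.
prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```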

### Inference with INT8
Follow the instructions at the [GitHub repository](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-generation/quantization) to install the necessary dependencies for INT8 quantization. Then use the command below to quantize the model with [Intel Neural Compressor](https://github.com/intel/neural-compressor) and accelerate inference; the `--sq` flag applies SmoothQuant with smoothing strength `--alpha`.

```bash
python run_generation.py \
    --model Intel/neural-chat-7b-v1-1 \
    --quantize \
    --sq \
    --alpha 0.95 \
    --ipex
```

| Factors | Description | 
| ----------- | ----------- | 
| Groups | More details about the dataset can be found at [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1). | 
| Instrumentation | The performance of the model can vary depending on the inputs to the model. In this case, the prompts provided can drastically change the prediction of the language model. |
| Environment | - |
| Card Prompts | Model deployment on varying hardware and software will change model performance. |

| Metrics | Description | 
| ----------- | ----------- | 
| Model performance measures | The model metrics are: ARC, HellaSwag, MMLU, and TruthfulQA. Bias was also evaluated using the toxicity ratio (see Quantitative Analyses below). The model's performance was evaluated against other LLMs according to the standards at the time the model was published. |
| Decision thresholds | No decision thresholds were used. | 
| Approaches to uncertainty and variability | - | 


## Training Data

The training data come from [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1). The dataset contains about 1.1M instruction samples (326M tokens); the component sizes below sum to roughly that total (24K + 15K + 500K + 50K + 8K + 500K). It is a compilation of the following open-source datasets:

| Type | Language | Dataset | Number |
|------|----------|---------|--------|
| HC3 | en | [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) | 24K |
| dolly | en | [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 15K |
| alpaca-zh | zh | [tigerbot-alpaca-zh-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-zh-0.5m) | 500K |
| alpaca-en | en | [tigerbot-alpaca-en-50k](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-en-50k) | 50K |
| math | en | [tigerbot-gsm-8k-en](https://huggingface.co/datasets/TigerResearch/tigerbot-gsm-8k-en) | 8K |
| general | en | [tigerbot-stackexchange-qa-en-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-stackexchange-qa-en-0.5m) | 500K |

Note: There is no contamination from the GSM8K test set, as it is not part of this dataset.

## Quantitative Analyses

### LLM metrics
We used the same evaluation metrics as [HuggingFaceH4/open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which uses [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master), a unified framework to test generative language models on a large number of different evaluation tasks.
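
As an illustration, here is a hedged sketch of reproducing one of these scores (ARC, 25-shot) with the harness. The exact CLI differs across harness versions; this follows the older master-branch `main.py` interface, and `trust_remote_code` support inside `model_args` is an assumption.

```bash
python main.py \
    --model hf-causal \
    --model_args pretrained=Intel/neural-chat-7b-v1-1,trust_remote_code=True \
    --tasks arc_challenge \
    --num_fewshot 25
```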

| Model | Average ⬆️ | ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️ | TruthfulQA (MC) (0-s) ⬆️ |
| --- | --- | --- | --- | --- | --- |
| [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | 47.40 | 47.61 | 77.56 | 31.00 | 33.43 |
| [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) | 49.95 | 46.50 | 75.55 | 37.60 | 40.17 |
| [Intel/neural-chat-7b-v1-1](https://huggingface.co/Intel/neural-chat-7b-v1-1) | **51.41** | 50.09 | 76.69 | 38.79 | 40.07 |

### Bias evaluation

Following the blog [evaluating-llm-bias](https://huggingface.co/blog/evaluating-llm-bias), we randomly selected 10,000 samples from [allenai/real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts) to evaluate toxicity bias.
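
A minimal sketch of the toxicity-ratio measurement described in that blog post, using the `evaluate` library. The tiny sample size and the scoring of raw prompt texts (rather than model continuations, as in the actual evaluation) are simplifications for illustration.

```python
from datasets import load_dataset
import evaluate

# Draw a small random sample of prompts from the dataset.
ds = load_dataset("allenai/real-toxicity-prompts", split="train")
sample = ds.shuffle(seed=42).select(range(100))
texts = [row["prompt"]["text"] for row in sample]

# "ratio" reports the fraction of texts whose toxicity score exceeds 0.5.
toxicity = evaluate.load("toxicity", module_type="measurement")
result = toxicity.compute(predictions=texts, aggregation="ratio")
print(result["toxicity_ratio"])
```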

| Model | Toxicity ratio ↓ |
| --- | --- |
| [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | 0.027 |
| [Intel/neural-chat-7b-v1-1](https://huggingface.co/Intel/neural-chat-7b-v1-1) | 0.0264 |

### Examples

- code generation
![code-generation](examples/code.png)

- summarization
![summarization](examples/summarization.png)

- trip planning
![trip](examples/trip.png)

## Ethical Considerations and Limitations
Neural-chat-7b-v1-1 can produce factually incorrect output and should not be relied on to produce factually accurate information. Neural-chat-7b-v1-1 was trained on various instruction/chat datasets based on [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b). Because of the limitations of the pretrained model and the fine-tuning datasets, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here are some useful GitHub repository links to learn more about Intel's open-source AI software:
* [Intel Neural Compressor](https://github.com/intel/neural-compressor)
* [Intel Extension for Transformers](https://github.com/intel/intel-extension-for-transformers)
* [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.