|
---
license: apache-2.0
datasets:
- QingyiSi/Alpaca-CoT
language:
- zh
- en
---
|
|
|
This is a beta release of a QLoRA adapter model for [Falcon-40b](https://huggingface.co/tiiuae/falcon-40b).

Please read the instructions carefully before downloading the model.
|
|
|
Though Falcon is not specifically trained on a Chinese corpus, it exhibited strong Chinese language understanding performance in our experiments. Out of curiosity, we explore whether a small amount of Chinese instruction data can push it further and make it a better Chinese speaker.
|
|
|
The LoRA model is trained with the [QLoRA code](https://github.com/artidoro/qlora) on a subset of bilingual instruction data from the [Alpaca-CoT dataset](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT) for a mere 5k steps.

The fine-tuned model is not as good at Chinese generation as carefully continue-trained and fine-tuned LLaMA models such as [OpenBuddy](https://huggingface.co/OpenBuddy) and [Ziya](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1), yet it quickly adapts to the new language and generates surprisingly good results. We call for more research on applying Falcon-40b to the Chinese domain.
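The QLoRA setup described above can be sketched with the usual bitsandbytes/PEFT configuration. This is a minimal sketch, not the exact training recipe: the rank, alpha, and dropout values below are illustrative assumptions, not the hyperparameters actually used.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projection; r/alpha/dropout are
# illustrative placeholders, not the values used for this release
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused QKV projection
)
```

The base model stays frozen in 4-bit precision while only the small LoRA matrices are trained, which is what makes fine-tuning a 40B model feasible on modest hardware.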
|
|
|
## Evaluations
|
|
|
We evaluate on two Chinese language understanding benchmarks, [C-Eval](https://cevalbenchmark.com/) and the GaoKao subset of [AGIEval](https://github.com/microsoft/AGIEval).
|
|
|
* C-Eval made a breaking change on 2023/06/08, switching from few-shot to zero-shot evaluation.
|
|
|
Results on the C-Eval test set (5-shot, no CoT):
|
|
|
| Average | Avg(Hard) | STEM | Social Science | Humanities | Others | |
|
| - | - | - | - | - | - | |
|
| 40.4 | 30.1 | 35.8 | 47.6 | 42.0 | 40.6 | |
|
|
|
|
|
Results on the GaoKao subset of AGIEval (0-shot):
|
|
|
| Average | GK-Chinese | GK-English | GK-Geography | GK-History | GK-Biology | GK-Chemistry | GK-Physics | GK-MathQA | GK-MathCloze |
|
| - | - | - | - | - | - | - | - | - | - | |
|
| 33.6 | 26.4 | 69.0 | 46.7 | 47.8 | 27.1 | 32.4 | 24.5 | 26.8 | 1.7 | |