Text Generation · Transformers · PyTorch · Japanese · llama · text-generation-inference · Inference Endpoints
ptrdvn committed on
Commit 5ab010b
1 Parent(s): d33dcfe

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -31,6 +31,16 @@ With these datasets, we achieve the following scores on the JGLUE benchmark:
  | jnli-1.1-0.3 | 0.504 | 0.48 |
  | marc_ja-1.1-0.3 | 0.936 | 0.959 |

+ We achieved these scores by using the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness) from Stability AI.
+
+ ```bash
+ MODEL_ARGS=pretrained=lightblue/openorca_stx,use_accelerate=True
+ TASK="jsquad-1.1-0.3,jcommonsenseqa-1.1-0.3,jnli-1.1-0.3,marc_ja-1.1-0.3"
+ export JGLUE_OUTPUT_DIR=../jglue_results/$MODEL_NAME/$DATASET_NAME/$DATASET_SIZE
+ mkdir -p $JGLUE_OUTPUT_DIR
+ python main.py --model hf-causal-experimental --model_args $MODEL_ARGS --tasks $TASK --num_fewshot "2,3,3,3" --device "cuda" --output_path $JGLUE_OUTPUT_DIR/result.json --batch_size 4 > $JGLUE_OUTPUT_DIR/harness.out 2> $JGLUE_OUTPUT_DIR/harness.err
+ ```
+
  Our model achieves much better results on the question-answering benchmark (JSQuAD) than the base checkpoint, without severe degradation of performance on the multiple-choice benchmarks (JCommonSenseQA, JNLI, MARC-ja), purely through QLoRA training.
  This shows the potential of applying minimal QLoRA fine-tuning with Japanese datasets to strong language models such as [Open-Orca/OpenOrcaxOpenChat-Preview2-13B](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B) to achieve better results on narrow NLP tasks.
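
As a concrete illustration of the QLoRA training the README refers to, the sketch below shows a typical 4-bit-quantization-plus-LoRA-adapter setup with transformers, peft, and bitsandbytes. This is a minimal sketch under assumed settings, not the actual training script for lightblue/openorca_stx; the dataset file, LoRA rank, target modules, and training hyperparameters are placeholders chosen for illustration.

```python
# Minimal QLoRA-style sketch: 4-bit base model + trainable LoRA adapters.
# Illustrative only; dataset file and hyperparameters are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"

# Load the base checkpoint quantized to 4-bit NF4 (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # llama tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

# Attach low-rank adapters to the attention projections; only these weights are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hypothetical Japanese fine-tuning data: one JSON object per line with a "text" field.
dataset = load_dataset("json", data_files="japanese_finetuning_data.jsonl", split="train")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qlora_out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.model.save_pretrained("qlora_adapter")  # saves only the adapter weights
```

Only the low-rank adapter weights are updated while the 4-bit base model stays frozen, which is why this style of fine-tuning can shift narrow-task behaviour (such as JSQuAD-style QA) with comparatively little compute and without retraining the base model itself.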