Text Generation · Transformers · PyTorch · Japanese · llama · text-generation-inference · Inference Endpoints
ptrdvn committed on
Commit 5ab010b
1 Parent(s): d33dcfe

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -31,6 +31,16 @@ With these datasets, we achieve the following scores on the JGLUE benchmark:
  | jnli-1.1-0.3 | 0.504 | 0.48 |
  | marc_ja-1.1-0.3 | 0.936 | 0.959 |

+ We achieved these scores by using the [lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness) from Stability AI.
+
+ ```bash
+ MODEL_ARGS=pretrained=lightblue/openorca_stx,use_accelerate=True
+ TASK="jsquad-1.1-0.3,jcommonsenseqa-1.1-0.3,jnli-1.1-0.3,marc_ja-1.1-0.3"
+ export JGLUE_OUTPUT_DIR=../jglue_results/$MODEL_NAME/$DATASET_NAME/$DATASET_SIZE
+ mkdir -p $JGLUE_OUTPUT_DIR
+ python main.py --model hf-causal-experimental --model_args $MODEL_ARGS --tasks $TASK --num_fewshot "2,3,3,3" --device "cuda" --output_path $JGLUE_OUTPUT_DIR/result.json --batch_size 4 > $JGLUE_OUTPUT_DIR/harness.out 2> $JGLUE_OUTPUT_DIR/harness.err
+ ```
+
  Our model achieves much better results on the question-answering benchmark (JSQuAD) than the base checkpoint, without severe degradation of performance on the multiple-choice benchmarks (JCommonSenseQA, JNLI, MARC-ja), purely through QLoRA training.
  This shows the potential of applying minimal QLoRA fine-tuning with Japanese datasets to strong language models such as [Open-Orca/OpenOrcaxOpenChat-Preview2-13B](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B) to achieve better results on narrow NLP tasks.
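
As a concrete illustration of the QLoRA training the README refers to, the sketch below shows a typical 4-bit-quantization-plus-LoRA-adapter setup with transformers, peft, and bitsandbytes. This is a minimal sketch under assumed settings, not the actual training script for lightblue/openorca_stx; the dataset file, LoRA rank, target modules, and training hyperparameters are placeholders chosen for illustration.

```python
# Minimal QLoRA-style sketch: 4-bit base model + trainable LoRA adapters.
# Illustrative only; dataset file and hyperparameters are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"

# Load the base checkpoint quantized to 4-bit NF4 (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # llama tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

# Attach low-rank adapters to the attention projections; only these weights are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hypothetical Japanese fine-tuning data: one JSON object per line with a "text" field.
dataset = load_dataset("json", data_files="japanese_finetuning_data.jsonl", split="train")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qlora_out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.model.save_pretrained("qlora_adapter")  # saves only the adapter weights
```

Only the low-rank adapter weights are updated while the 4-bit base model stays frozen, which is why this style of fine-tuning can shift narrow-task behaviour (such as JSQuAD-style QA) with comparatively little compute and without retraining the base model itself.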