jaspercatapang committed
Commit e0c9db3 (1 parent: 9711067)

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -41,6 +41,7 @@ According to the leaderboard description, here are the benchmarks used for the e
 *Based on a [leaderboard clone](https://huggingface.co/spaces/gsaivinay/open_llm_leaderboard) with GPT-3.5 and GPT-4 included.
 
 ### Reproducing Evaluation Results
+*Instruction template taken from [Platypus 2 70B instruct](https://huggingface.co/garage-bAInd/Platypus2-70B-instruct).
 
 Install LM Evaluation Harness:
 ```
@@ -53,26 +54,25 @@ git checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463
 # install
 pip install -e .
 ```
-Each task was evaluated on a single A100 80GB GPU.
 
 ARC:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus2-70B-instruct --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/Platypus2-70B-instruct/arc_challenge_25shot.json --device cuda --num_fewshot 25
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/G270B/arc_challenge_25shot.json --device cuda --num_fewshot 25
 ```
 
 HellaSwag:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus2-70B-instruct --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/Platypus2-70B-instruct/hellaswag_10shot.json --device cuda --num_fewshot 10
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/G270B/hellaswag_10shot.json --device cuda --num_fewshot 10
 ```
 
 MMLU:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus2-70B-instruct --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/Platypus2-70B-instruct/mmlu_5shot.json --device cuda --num_fewshot 5
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/G270B/mmlu_5shot.json --device cuda --num_fewshot 5
 ```
 
 TruthfulQA:
 ```
-python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus2-70B-instruct --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/Platypus2-70B-instruct/truthfulqa_0shot.json --device cuda
+python main.py --model hf-causal-experimental --model_args pretrained=MayaPH/GodziLLa2-70B --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/G270B/truthfulqa_0shot.json --device cuda
 ```
 
 ### Prompt Template
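
The four added commands differ only in task name, few-shot count, and output file. As a reference, here is a minimal shell sketch (not part of this commit) that runs them in sequence: the model name, flags, and results/G270B/ output paths are copied verbatim from the added lines above, while the loop and variable names are illustrative assumptions.
```
#!/usr/bin/env bash
# Illustrative sketch (not part of this commit): run the four leaderboard tasks
# from the commands above in one pass. Flags and paths are copied from the diff;
# the loop structure and variable names are assumptions.
set -euo pipefail

MODEL="MayaPH/GodziLLa2-70B"
OUTDIR="results/G270B"
mkdir -p "$OUTDIR"

# task | few-shot count | output file stem, as in the README commands
TASKS=(
  "arc_challenge 25 arc_challenge_25shot"
  "hellaswag 10 hellaswag_10shot"
  "hendrycksTest-* 5 mmlu_5shot"
  "truthfulqa_mc 0 truthfulqa_0shot"
)

for entry in "${TASKS[@]}"; do
  read -r task shots stem <<< "$entry"
  # The original TruthfulQA command omits --num_fewshot; passing 0 matches the
  # harness default, so the loop stays uniform.
  python main.py \
    --model hf-causal-experimental \
    --model_args pretrained="$MODEL" \
    --tasks "$task" \
    --batch_size 1 \
    --no_cache \
    --write_out \
    --output_path "$OUTDIR/${stem}.json" \
    --device cuda \
    --num_fewshot "$shots"
done
```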