Commit 22a4a44 by xianbin
1 Parent(s): c773b24

Update README.md

Files changed (1)
  1. README.md +14 -12
README.md CHANGED
@@ -39,9 +39,11 @@ For tokenization, the model employs the default tokenizer used in Gemma-2-9B.
  We evaluated Gemma2 9B CPT SEA-LIONv3 base model on general language capabilities.
 
  #### General Language Capabilities
- For the evaluation of general language capabilities in SEA languages, we employed the [BHASA evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
+ For the evaluation of general language capabilities, we employed the [SEA HELM evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
  These tasks include Question Answering (QA), Sentiment Analysis (Sentiment), Toxicity Detection (Toxicity), Translation in both directions (Eng>Lang & Lang>Eng), Abstractive Summarization (Summ), Causal Reasoning (Causal) and Natural Language Inference (NLI).
 
+ Note: SEA HELM is implemented using prompts which expect answers in a strict format. For all tasks, the model is expected to provide an answer tag from which the answer would be extracted. For tasks where options are provided, the answer should only include one of the pre-defined options. The weighted accuracy of the answers is calculated and normalisation is performed to account for baseline performance due to random chance.
+
  The evaluation was done **five-shot** with native prompts and only a sample of 100-1000 instances for each dataset was used as per the setting described in the paper.
 
  For more details on Gemma2 9B CPT SEA-LIONv3 base benchmark performance, please refer to the SEA HELM leaderboard, https://leaderboard.sea-lion.ai/
@@ -102,22 +104,22 @@ Gemma2 9B CPT SEA-LIONv3 was trained using [MosaicML Composer](https://github.co
  on the following hardware:
 
  | Training Details | Gemma2 9B CPT SEA-LIONv3 |
- |----------------------|:--------------------:|
- | SingTel HGX-100 | 8+1 instances |
- | Nvidia H100 80GB GPU | 64+8 |
- | Training Duration | 10 days |
+ |----------------------|:------------------------:|
+ | SingTel HGX-100 | 8+1 instances |
+ | Nvidia H100 80GB GPU | 64+8 |
+ | Training Duration | 10 days |
 
 
  ### Configuration
 
  | HyperParameter | Gemma2 9B CPT SEA-LIONv3 |
- |-------------------|:--------------------:|
- | Precision | bfloat16 |
- | Optimizer | decoupled_adamw |
- | Scheduler | weight_stable_decay |
- | Learning Rate | 1.0e-5 |
- | Global Batch Size | 512 |
- | Micro Batch Size | 1 |
+ |-------------------|:------------------------:|
+ | Precision | bfloat16 |
+ | Optimizer | decoupled_adamw |
+ | Scheduler | weight_stable_decay |
+ | Learning Rate | 1.0e-5 |
+ | Global Batch Size | 512 |
+ | Micro Batch Size | 1 |
 
 
  ## The Team
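
The note added in this commit describes a three-step scoring flow: extract the answer from an answer tag, accept it only if it is one of the pre-defined options, then compute a weighted accuracy that is normalised against random chance. The sketch below illustrates that flow under stated assumptions and is not the SEA HELM implementation. The `Answer:` tag format, the reading of "weighted accuracy" as mean per-class recall, and the `(score - baseline) / (1 - baseline)` rescaling are all assumptions made for illustration, as are the function names.

```python
import re
from collections import defaultdict

# Hypothetical tag format; the real SEA HELM prompts may use a different one.
ANSWER_TAG = re.compile(r"Answer:\s*(.+)", re.IGNORECASE)


def extract_answer(response, options):
    """Return the tagged answer if it is one of the allowed options, else None."""
    match = ANSWER_TAG.search(response)
    if match is None:
        return None  # no answer tag, so nothing can be extracted
    candidate = match.group(1).strip().lower()
    return candidate if candidate in options else None


def balanced_accuracy(predictions, labels):
    """Mean per-class recall: one possible reading of 'weighted accuracy' (assumption)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label in zip(predictions, labels):
        total[label] += 1
        correct[label] += int(pred == label)
    return sum(correct[c] / total[c] for c in total) / len(total)


def normalise(score, num_options):
    """Rescale so that random guessing maps to 0 and a perfect score maps to 1 (assumed formula)."""
    baseline = 1.0 / num_options
    return max(0.0, (score - baseline) / (1.0 - baseline))


# Toy sentiment example with made-up responses.
options = {"positive", "negative"}
responses = [
    "Answer: positive",
    "Answer: negative",
    "I think it is positive",  # no answer tag, counted as wrong
    "Answer: negative",
]
labels = ["positive", "negative", "positive", "positive"]

predictions = [extract_answer(r, options) for r in responses]
score = normalise(balanced_accuracy(predictions, labels), len(options))
print(f"normalised score: {score:.3f}")
```

Under this reading, a response that carries the right sentiment but omits the tag still scores zero for that instance, which is why strict-format prompting can lower a base model's reported numbers relative to more lenient evaluation setups.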