sofial committed on
Commit 94cfd32
1 Parent(s): a838dd3

Update README.md

Files changed (1)
  1. README.md +24 -6
README.md CHANGED
@@ -10,21 +10,39 @@ model-index:
This model is the fine-tuned version of [EleutherAI/gpt-j-6B](https://huggingface.co/EleutherAI/gpt-j-6B) on the [MNLI dataset]()

- ## Fine-tuning and evaluation data

- ## Fine-tuning procedure

- Fine tuned on a Graphcore IPU-POD64 using `popxl`

- Command lines:

### Fine-tuning hyperparameters
-
The following hyperparameters were used:
-
### Framework versions

- Transformers

This model is the fine-tuned version of [EleutherAI/gpt-j-6B](https://huggingface.co/EleutherAI/gpt-j-6B) on the [MNLI dataset]()

+ The MNLI dataset consists of pairs of sentences, a *premise* and a *hypothesis*.
+ The task is to predict the relation between the premise and the hypothesis, which can be:
+ - `entailment`: the hypothesis follows from the premise,
+ - `contradiction`: the hypothesis contradicts the premise,
+ - `neutral`: the hypothesis and the premise are unrelated.

+ In other words, the model takes two sentences, referred to as the hypothesis and the premise, as input and decides whether they entail (support), are neutral (cover different subjects) or contradict each other.

+ We fine-tune the model as a Causal Language Model (CLM): given a sequence of tokens, the task is to predict the next token.
+ To achieve this, we create a stylised prompt string, following the approach of the [T5 paper](https://arxiv.org/pdf/1910.10683.pdf):
+ ```
+ mnli hypothesis: {hypothesis} premise: {premise} target: {class_label} <|endoftext|>
+ ```
+ For example:
+ ```
+ mnli hypothesis: Your contributions were of no help with our students' education. premise: Your contribution helped make it possible for us to provide our students with a quality education. target: contradiction <|endoftext|>
+ ```
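
For illustration, a minimal Python sketch of how such a prompt string could be built from a raw MNLI example. The `format_prompt` helper and the label-id mapping are assumptions based on the GLUE MNLI conventions on the Hugging Face Hub, not the actual training code:

```python
# Illustrative only: build the stylised CLM prompt from one GLUE MNLI example.
# Assumed GLUE MNLI label ids: 0 = entailment, 1 = neutral, 2 = contradiction.
LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def format_prompt(example: dict) -> str:
    """Turn one raw MNLI example into the prompt format shown above."""
    return (
        f"mnli hypothesis: {example['hypothesis']} "
        f"premise: {example['premise']} "
        f"target: {LABELS[example['label']]} <|endoftext|>"
    )

print(format_prompt({
    "hypothesis": "Your contributions were of no help with our students' education.",
    "premise": "Your contribution helped make it possible for us to provide our students with a quality education.",
    "label": 2,
}))
```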
 
+ ## Fine-tuning and validation data
+ Fine-tuning is done using the `train` split of the GLUE MNLI dataset, and performance is measured on the `validation_mismatched` split.

+ `validation_mismatched` means that the validation examples are not derived from the same sources as those in the training set and therefore do not closely resemble any of the examples seen at training time.
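
As a sketch, these two splits can be loaded with the `datasets` library, using the split names exposed by the GLUE MNLI dataset on the Hub:

```python
from datasets import load_dataset

# Train on the MNLI train split; evaluate on the mismatched validation split.
train_ds = load_dataset("glue", "mnli", split="train")
eval_ds = load_dataset("glue", "mnli", split="validation_mismatched")
print(len(train_ds), len(eval_ds))
```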
 
+ ## Fine-tuning procedure
+ Fine-tuned on a Graphcore IPU-POD64 using `popxl`.

+ Prompt sentences are tokenized and packed together to form 1024-token sequences, following the [HF packing algorithm](https://github.com/huggingface/transformers/blob/v4.20.1/examples/pytorch/language-modeling/run_clm.py). No padding is used.
+ Since the model is trained to predict the next token, the labels are simply the input sequence shifted by one token.
+ Given this training format, no extra care is needed to account for different sequences: the model does not need to know which sentence a token belongs to.
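
A simplified sketch of this packing step, in the spirit of the grouping logic in the linked `run_clm.py` script. The tokenizer call and the `pack_examples` helper are illustrative assumptions, not the actual `popxl` pipeline:

```python
from itertools import chain
from transformers import AutoTokenizer

SEQ_LEN = 1024  # packed sequence length used for fine-tuning
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

def pack_examples(prompts):
    """Tokenize prompt strings and pack them into fixed-length blocks (no padding)."""
    # Concatenate every tokenized prompt into one long token stream.
    ids = list(chain.from_iterable(tokenizer(p)["input_ids"] for p in prompts))
    # Keep only whole blocks of SEQ_LEN tokens, dropping the incomplete tail.
    total = (len(ids) // SEQ_LEN) * SEQ_LEN
    blocks = [ids[i : i + SEQ_LEN] for i in range(0, total, SEQ_LEN)]
    # For causal LM training the labels are a copy of the inputs; the loss is
    # computed against the sequence shifted by one token.
    return [{"input_ids": block, "labels": list(block)} for block in blocks]
```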
 
### Fine-tuning hyperparameters
The following hyperparameters were used:
+
### Framework versions

- Transformers