---
language: 
  - ms
tags:
- paraphrase
metrics:
- sacrebleu
---

# finetune-paraphrase-t5-base-standard-bahasa-cased

T5-base fine-tuned on Malay (MS) paraphrase tasks.

## Dataset

1. translated PAWS, https://huggingface.co/datasets/mesolitica/translated-PAWS
2. translated MRPC, https://huggingface.co/datasets/mesolitica/translated-MRPC
3. translated ParaSCI, https://huggingface.co/datasets/mesolitica/translated-paraSCI

## Finetune details

1. Fine-tuned on a single RTX 3090 Ti.

Training scripts are available at https://github.com/huseinzol05/malaya/tree/master/session/paraphrase/hf-t5

## Supported prefix

1. `parafrasa: {string}`, for Malay (MS) paraphrasing.
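A minimal inference sketch follows. It assumes the model is published on the Hugging Face Hub as `mesolitica/finetune-paraphrase-t5-base-standard-bahasa-cased` (the organisation is inferred from the dataset URLs and is an assumption). The helper names `build_input` and `paraphrase` and the generation settings (`max_length`, `num_beams`) are hypothetical choices for illustration, not values taken from the training scripts.

```python
# Task prefix from the "Supported prefix" section: `parafrasa: {string}`.
PREFIX = "parafrasa: "

# Assumed Hub id (hypothetical; inferred from the card title and dataset org).
MODEL_ID = "mesolitica/finetune-paraphrase-t5-base-standard-bahasa-cased"


def build_input(text: str) -> str:
    """Prepend the paraphrase prefix the model was fine-tuned with."""
    return PREFIX + text.strip()


def paraphrase(text: str, model_id: str = MODEL_ID,
               max_length: int = 256, num_beams: int = 4) -> str:
    """Generate one paraphrase (hypothetical helper; settings are assumptions)."""
    # Imported lazily so build_input works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(build_input(text), return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage: `paraphrase("Saya suka makan nasi lemak pada waktu pagi.")` downloads the checkpoint on first call and returns the decoded paraphrase. Beam search is used here only as a common default for seq2seq generation; sampling may give more varied paraphrases.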

## Evaluation

Evaluated on the MRPC validation set and the ParaSCI arXiv test set, reporting corpus-level SacreBLEU:

```
{'name': 'BLEU',
 'score': 35.95965899952292,
 '_mean': -1.0,
 '_ci': -1.0,
 '_verbose': '61.7/41.3/32.0/25.8 (BP = 0.944 ratio = 0.946 hyp_len = 95593 ref_len = 101064)',
 'bp': 0.9443747373110852,
 'counts': [59014, 37157, 27016, 20383],
 'totals': [95593, 90049, 84505, 78961],
 'sys_len': 95593,
 'ref_len': 101064,
 'precisions': [61.73464584226878,
  41.263090095392506,
  31.969705934560086,
  25.81400944770203],
 'prec_str': '61.7/41.3/32.0/25.8',
 'ratio': 0.9458659859099184}
```
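As a sanity check on the result above, the reported score can be recomputed from the `counts`, `totals`, `sys_len` and `ref_len` fields with the standard BLEU formula: brevity penalty times the geometric mean of the four n-gram precisions. The sketch below uses only those printed numbers; the function name `bleu_from_stats` is hypothetical.

```python
import math


def bleu_from_stats(counts, totals, sys_len, ref_len):
    """Corpus BLEU (in percent) from n-gram match counts and totals."""
    # Modified n-gram precisions, in percent (as SacreBLEU reports them).
    precisions = [100.0 * c / t for c, t in zip(counts, totals)]
    # Brevity penalty: penalise hypotheses shorter than the references.
    bp = 1.0 if sys_len >= ref_len else math.exp(1 - ref_len / sys_len)
    # Geometric mean of the precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return bp * math.exp(log_mean), bp, precisions


score, bp, precisions = bleu_from_stats(
    counts=[59014, 37157, 27016, 20383],
    totals=[95593, 90049, 84505, 78961],
    sys_len=95593,
    ref_len=101064,
)
print(f"BLEU = {score:.2f}, BP = {bp:.3f}")
```

The brevity penalty of about 0.944 shows the generated paraphrases are roughly 5% shorter than the references in total length.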