### Fine-tuned on annual report sentence pairs
This MarianMT model translates Indonesian (id) to English (en). It has been further fine-tuned on annual report sentence pairs.

## Try it out on Hugging Face Spaces
https://huggingface.co/spaces/wolfrage89/finance_domain_translation_marianMT

## Sample Colab notebook
https://colab.research.google.com/drive/1H57vwiah7n1JXvXYMqJ8dklrIuU6Cljb?usp=sharing

## How to use

```python
# In a notebook (e.g. Colab), install the dependencies first:
# !pip install transformers sentencepiece

from transformers import MarianMTModel, MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("wolfrage89/annual_report_translation_id_en")
model = MarianMTModel.from_pretrained("wolfrage89/annual_report_translation_id_en")

# Tokenize the Indonesian (Bahasa Indonesia) sentence
bahasa_sentence = "Interpretasi ini merupakan interpretasi atas PSAK 46: Pajak Penghasilan yang bertujuan untuk mengklarifikasi dan memberikan panduan dalam merefleksikan ketidakpastian perlakuan pajak penghasilan dalam laporan keuangan."
tokenized_bahasa_sentence = tokenizer([bahasa_sentence], return_tensors='pt', max_length=104, truncation=True)

# Feed the tokenized sentence into the model. max_length is set to 104
# because the model was trained mostly on sentences of this length.
translated_tokens = model.generate(**tokenized_bahasa_sentence, max_length=104)[0]

# Decode the tokens to get the English sentence
english_sentence = tokenizer.decode(translated_tokens, skip_special_tokens=True)

print(english_sentence)
# Expected output:
# This interpretation is an interpretation of PSAK 46: Income Tax that aims to clarify and provide guidance in reflecting the uncertainty of income tax treatments in the financial statements.
```
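
The example above translates a single sentence. For a whole report, a minimal sketch of batched translation might look like the following. The helper names `chunked` and `translate_sentences` and the default `batch_size=8` are assumptions made for illustration and are not part of the model or of `transformers`; the `tokenizer` and `model` arguments are expected to be the objects loaded in the example above.

```python
from typing import Iterator, List, Sequence

MAX_LENGTH = 104  # same limit as in the single-sentence example


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_sentences(
    sentences: Sequence[str],
    tokenizer,
    model,
    batch_size: int = 8,  # hypothetical default, tune to available memory
    max_length: int = MAX_LENGTH,
) -> List[str]:
    """Translate sentences batch by batch and return them in input order."""
    translations: List[str] = []
    for batch in chunked(sentences, batch_size):
        # padding=True pads every sentence in the batch to the same length
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        outputs = model.generate(**inputs, max_length=max_length)
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

Calling `translate_sentences(list_of_bahasa_sentences, tokenizer, model)` returns one English string per input sentence. Sentences longer than `max_length` tokens are truncated, so very long sentences may need to be split beforehand.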




### opus-mt-id-en (original model)

* source languages: id
* target languages: en
* OPUS readme: [id-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/id-en/README.md)