wolfrage89 committed on
Commit
a059bd9
1 Parent(s): 8997db3

Update README.md

Files changed (1)
  1. README.md +34 -20
README.md CHANGED
@@ -1,31 +1,45 @@
- ---
- tags:
- - translation
- license: apache-2.0
- ---
  ### Finetuned on annual report sentence pair
- Original model shown below

  ## Test out at huggingface spaces!
  https://huggingface.co/spaces/wolfrage89/finance_domain_translation_marianMT


- ### opus-mt-id-en

- * source languages: id
- * target languages: en
- * OPUS readme: [id-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/id-en/README.md)

- * dataset: opus
- * model: transformer-align
- * pre-processing: normalization + SentencePiece
- * download original weights: [opus-2019-12-18.zip](https://object.pouta.csc.fi/OPUS-MT-models/id-en/opus-2019-12-18.zip)
- * test set translations: [opus-2019-12-18.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/id-en/opus-2019-12-18.test.txt)
- * test set scores: [opus-2019-12-18.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/id-en/opus-2019-12-18.eval.txt)

- ## Benchmarks

- | testset | BLEU | chr-F |
- |-----------------------|-------|-------|
- | Tatoeba.id.en | 47.7 | 0.647 |

  ### Finetuned on annual report sentence pair
+ This MarianMT model has been further fine-tuned on annual report sentence pairs.

  ## Test out at huggingface spaces!
  https://huggingface.co/spaces/wolfrage89/finance_domain_translation_marianMT

+ ## Sample colab notebook
+ https://colab.research.google.com/drive/1H57vwiah7n1JXvXYMqJ8dklrIuU6Cljb?usp=sharing

+ ## How to use

+ ```python
+ # Notebook (Colab/Jupyter) shell commands; in a terminal, drop the leading "!"
+ !pip install transformers
+ !pip install sentencepiece
+
+
+ from transformers import MarianMTModel, MarianTokenizer
+
+ tokenizer = MarianTokenizer.from_pretrained("wolfrage89/annual_report_translation_id_en")
+ model = MarianMTModel.from_pretrained("wolfrage89/annual_report_translation_id_en")
+
+ # Tokenize the Indonesian (Bahasa Indonesia) sentence
+ bahasa_sentence = "Interpretasi ini merupakan interpretasi atas PSAK 46: Pajak Penghasilan yang bertujuan untuk mengklarifikasi dan memberikan panduan dalam merefleksikan ketidakpastian perlakuan pajak penghasilan dalam laporan keuangan."
+ tokenized_bahasa_sentence = tokenizer([bahasa_sentence], return_tensors='pt', max_length=104, truncation=True)
+
+ # Feed the tokenized sentence into the model; max_length is set to 104 because the model was trained mostly on sentences of this length
+ translated_tokens = model.generate(**tokenized_bahasa_sentence, max_length=104)[0]

+ # Decode the tokens to get the English sentence
+ english_sentence = tokenizer.decode(translated_tokens, skip_special_tokens=True)

+ print(english_sentence)
+ # This interpretation is an interpretation of PSAK 46: Income Tax that aims to clarify and provide guidance in reflecting the uncertainty of income tax treatments in the financial statements.


+ ```
+
+
+
+
+ ### opus-mt-id-en (original model)
+
+ * source languages: id
+ * target languages: en
+ * OPUS readme: [id-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/id-en/README.md)
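
The "How to use" snippet added in this commit translates one sentence at a time. For several sentences, a minimal sketch along the following lines might work. The helper names `chunked`, `translate_batch`, and `load_and_translate`, and the `batch_size` parameter, are illustrative assumptions and not part of the README; only the model name and `max_length=104` come from the diff.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(sentences, tokenizer, model, batch_size=8, max_length=104):
    """Translate sentences `batch_size` at a time (hypothetical helper)."""
    translations = []
    for batch in chunked(list(sentences), batch_size):
        # padding=True lets sentences of different lengths share one tensor.
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        output_ids = model.generate(**inputs, max_length=max_length)
        translations.extend(
            tokenizer.batch_decode(output_ids, skip_special_tokens=True)
        )
    return translations


def load_and_translate(sentences,
                       model_name="wolfrage89/annual_report_translation_id_en"):
    """Load the fine-tuned model from the README and translate `sentences`."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return translate_batch(sentences, tokenizer, model)
```

Calling `load_and_translate([bahasa_sentence])` with the sentence from the README should return a one-element list of English text. The batch size of 8 is an arbitrary default, not a tuned value.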