wolfrage89/annual_report_translation_id_en

wolfrage89 committed on Jan 27, 2022

Commit a059bd9 • 1 Parent(s): 8997db3

Update README.md

Files changed (1)

README.md +34 -20

@@ -1,31 +1,45 @@
----
-tags:
-- translation
-license: apache-2.0
----
 ### Finetuned on annual report sentence pair
-Original model shown below
 ## Test out at huggingface spaces!
 https://huggingface.co/spaces/wolfrage89/finance_domain_translation_marianMT
-### opus-mt-id-en
-* source languages: id
-* target languages: en
-*  OPUS readme: [id-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/id-en/README.md)
-*  dataset: opus
-* model: transformer-align
-* pre-processing: normalization + SentencePiece
-* download original weights: [opus-2019-12-18.zip](https://object.pouta.csc.fi/OPUS-MT-models/id-en/opus-2019-12-18.zip)
-* test set translations: [opus-2019-12-18.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/id-en/opus-2019-12-18.test.txt)
-* test set scores: [opus-2019-12-18.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/id-en/opus-2019-12-18.eval.txt)
-## Benchmarks
-| testset               | BLEU  | chr-F |
-|-----------------------|-------|-------|
-| Tatoeba.id.en 	| 47.7 	| 0.647 |

 ### Finetuned on annual report sentence pair
+This marianMT has been further finetuned on annual report sentence pairs
 ## Test out at huggingface spaces!
 https://huggingface.co/spaces/wolfrage89/finance_domain_translation_marianMT
+## Sample colab notebook
+https://colab.research.google.com/drive/1H57vwiah7n1JXvXYMqJ8dklrIuU6Cljb?usp=sharing
+## How to use
+```python
+!pip install transformers
+!pip install sentencepiece
+from transformers import MarianMTModel, MarianTokenizer
+tokenizer = MarianTokenizer.from_pretrained("wolfrage89/annual_report_translation_id_en")
+model = MarianMTModel.from_pretrained("wolfrage89/annual_report_translation_id_en")
+#tokenizing bahasa sentence
+bahasa_sentence = "Interpretasi ini merupakan interpretasi atas PSAK 46: Pajak Penghasilan yang bertujuan untuk mengklarifikasi dan memberikan panduan dalam merefleksikan ketidakpastian perlakuan pajak penghasilan dalam laporan keuangan."
+tokenized_bahasa_sentence = tokenizer([bahasa_sentence], return_tensors='pt', max_length=104, truncation=True)
+# feeding the tokenized sentence into the model; max_length is set to 104 because the model was trained mostly on sentences of this length
+translated_tokens = model.generate(**tokenized_bahasa_sentence, max_length=104)[0]
+## decoding the tokens to get english sentence
+english_sentence = tokenizer.decode(translated_tokens, skip_special_tokens=True)
+print(english_sentence)
+# This interpretation is an interpretation of PSAK 46: Income Tax that aims to clarify and provide guidance in reflecting the uncertainty of income tax treatments in the financial statements.
+```
+### opus-mt-id-en (original model)
+* source languages: id
+* target languages: en
+*  OPUS readme: [id-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/id-en/README.md)