Mainak Manna committed on
Commit • 5ed79a1
1 Parent(s): 9b32f6c
First version of the model
Browse files:
- README.md +69 -0
- config.json +36 -0
- pytorch_model.bin +3 -0
- spiece.model +3 -0
README.md
ADDED
@@ -0,0 +1,69 @@
---
language: Italian Swedish
tags:
- translation Italian Swedish model
datasets:
- dcep europarl jrc-acquis
widget:
- text: "Inoltre, come è emerso da un discorso pronunciato dal direttore del Centro europeo per la prevenzione e il controllo delle malattie (ECDC) in occasione della riunione dell'EPSCO svoltasi il 6 giugno 2011, gli Stati membri dell'UE sono i paesi di maggiore diffusione del morbillo nel mondo sviluppato."

---

# legal_t5_small_trans_it_sv model

Model for translating legal text from Italian to Swedish. It was first released in
[this repository](https://github.com/agemagician/LegalTrans). The model was trained on three parallel corpora: JRC-ACQUIS, Europarl and DCEP.


## Model description

legal_t5_small_trans_it_sv is based on the `t5-small` model and was trained on a large corpus of parallel text. It scales the baseline t5 model down by using `dmodel = 512`, `dff = 2,048`, 8-headed attention, and only 6 layers each in the encoder and decoder. This variant has about 60 million parameters.
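That parameter count can be checked with a back-of-the-envelope calculation from the dimensions above (a rough sketch that ignores layer norms and the relative-attention bias tables, which contribute comparatively little):

```python
# Rough parameter count for this t5-small variant, using the dimensions
# from the model description: d_model=512, d_ff=2048, 8 heads,
# 6 encoder + 6 decoder layers, and a 32,128-token vocabulary.
d_model, d_ff, vocab = 512, 2048, 32128
n_enc = n_dec = 6

embedding = vocab * d_model      # shared input/output embedding matrix
attn = 4 * d_model * d_model     # Q, K, V, and output projections
ffn = 2 * d_model * d_ff         # the two feed-forward matrices
enc_layer = attn + ffn           # self-attention + FFN
dec_layer = 2 * attn + ffn       # self-attention + cross-attention + FFN

total = embedding + n_enc * enc_layer + n_dec * dec_layer
print(f"{total / 1e6:.1f}M parameters")  # roughly 60M, matching the card
```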
## Intended uses & limitations

The model can be used for translating legal texts from Italian to Swedish.

### How to use

Here is how to use this model to translate legal text from Italian to Swedish in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelWithLMHead, TranslationPipeline

pipeline = TranslationPipeline(
    model=AutoModelWithLMHead.from_pretrained("SEBIS/legal_t5_small_trans_it_sv"),
    tokenizer=AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path="SEBIS/legal_t5_small_trans_it_sv",
        do_lower_case=False,
        skip_special_tokens=True,
    ),
    device=0,  # GPU index; use device=-1 to run on CPU
)

it_text = "Inoltre, come è emerso da un discorso pronunciato dal direttore del Centro europeo per la prevenzione e il controllo delle malattie (ECDC) in occasione della riunione dell'EPSCO svoltasi il 6 giugno 2011, gli Stati membri dell'UE sono i paesi di maggiore diffusione del morbillo nel mondo sviluppato."

pipeline([it_text], max_length=512)
```

## Training data

The legal_t5_small_trans_it_sv model was trained on the [JRC-ACQUIS](https://wt-public.emm4u.eu/Acquis/index_2.2.html), [EUROPARL](https://www.statmt.org/europarl/), and [DCEP](https://ec.europa.eu/jrc/en/language-technologies/dcep) datasets, consisting of 5 million parallel texts.

## Training procedure

### Preprocessing

### Pretraining

A unigram model with 88M parameters is trained over the complete parallel corpus to obtain the vocabulary (with byte pair encoding), which is used with this model.

## Evaluation results

When the model is used on the translation test dataset, it achieves the following results:

Test results:

| Model | BLEU score |
|:-----:|:-----:|
| legal_t5_small_trans_it_sv | 39.17 |
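For context, BLEU combines clipped n-gram precisions with a brevity penalty. A minimal self-contained sketch of corpus-level BLEU is shown below; scores reported in model cards are typically computed with a standard tool such as sacreBLEU, so exact numbers may differ from this simplified version:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: clipped n-gram precisions + brevity penalty."""
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter intersection implements the "clipping" of matches.
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0 or min(totals) == 0:
        return 0.0  # some n-gram order has no overlap at all
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_prec)
```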

### BibTeX entry and citation info
config.json
ADDED
@@ -0,0 +1,36 @@
{
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "translation_it_to_sv": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 512,
      "min_length": 1,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "translate Italian to Swedish : "
    }
  },
  "use_cache": true,
  "vocab_size": 32128
}
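The `task_specific_params` block supplies beam-search defaults and the task prefix that the translation pipeline prepends to each input. A minimal stdlib sketch of reading those values (the `build_model_input` helper is illustrative, not part of the transformers library):

```python
import json

# Generation defaults copied from the task_specific_params section
# of config.json above.
config = json.loads("""
{
  "task_specific_params": {
    "translation_it_to_sv": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 512,
      "min_length": 1,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "translate Italian to Swedish : "
    }
  }
}
""")

params = config["task_specific_params"]["translation_it_to_sv"]

def build_model_input(text: str) -> str:
    """Prepend the task prefix, as the pipeline does before tokenization."""
    return params["prefix"] + text
```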
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00f7be5d7ae76e13a2a926706e100df4f5ad7abb64773ed8114ce4b66d44238b
size 242087498
spiece.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53b3c9b1becca02342bbf2c8b00abe9154fb0fc8dbe8c71ad506537b2222523a
size 840425