---
language:
- en
- is
- multilingual
tags:
- translation
inference:
  parameters:
    src_lang: en_XX
    tgt_lang: is_IS
    decoder_start_token_id: 2
    max_length: 512
widget:
- text: I once owned a horse. It was black and white.
---
# mBART-based translation model
This model was trained to translate several sentences at once, rather than one sentence at a time.

Its output will occasionally merge two sentences into one or contain an extra sentence.

This is the same model as the one provided on CLARIN: https://repository.clarin.is/repository/xmlui/handle/20.500.12537/278

You can use the following example to get started:

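    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
    import torch
    
    # Use the current GPU if one is available, otherwise run on the CPU (-1)
    device = torch.cuda.current_device() if torch.cuda.is_available() else -1
    
    # Load the tokenizer with the mBART language codes for English (source) and Icelandic (target)
    tokenizer = AutoTokenizer.from_pretrained("mideind/nmt-doc-en-is-2022-10", src_lang="en_XX", tgt_lang="is_IS")
    
    model = AutoModelForSeq2SeqLM.from_pretrained("mideind/nmt-doc-en-is-2022-10")
    
    # XX and YY in the task name are placeholders; the explicit src_lang/tgt_lang arguments set the languages
    translate = pipeline("translation_XX_to_YY", model=model, tokenizer=tokenizer, device=device, src_lang="en_XX", tgt_lang="is_IS")
    
    target_seq = translate("I am using a translation model to translate text from English to Icelandic.", src_lang="en_XX", tgt_lang="is_IS", max_length=128)
    # str.strip('YY ') removes any leading/trailing 'Y' and space characters (characters, not the substring "YY")
    print(target_seq[0]['translation_text'].strip('YY '))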
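Because the model translates several sentences per call, a longer document has to be split into multi-sentence chunks that stay within the generation limit (`max_length: 512` in the inference settings above). The sketch below is one way to do this, not part of the released model or of `transformers`: `chunk_document` is a hypothetical helper, the regex sentence splitter is a naive assumption, and the character budget is only a rough proxy for the token count.

```python
import re
from typing import List

# Naive sentence splitter (assumption): split on whitespace that follows ".", "!" or "?".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_document(text: str, max_chars: int = 1000) -> List[str]:
    """Pack consecutive sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration. A single sentence longer than
    max_chars is kept as its own chunk instead of being cut mid-sentence.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # The sentence still fits (or the chunk is empty), so extend the current chunk
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one with this sentence
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "I once owned a horse. It was black and white. It liked apples."
    for chunk in chunk_document(doc, max_chars=50):
        print(chunk)
```

Each chunk can then be passed to the `translate` pipeline from the example above. For a tighter limit, the real token count of a chunk can be checked with `len(tokenizer(chunk).input_ids)` instead of relying on the character budget.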