---
license: cc-by-sa-4.0
library_name: pytorch
language:
- ru
- vep
datasets:
- Lynxpda/back-translated-veps-russian
---

# Model Card for Veps - Russian version 1.0

A translation model from Vepsian into Russian.
The archive contains the initial weights of the model, trained with OpenNMT-py (Locomotive).
The model has 457M parameters and was trained from scratch.
Model weights converted for CTranslate2, as well as a package for installation and use with Argostranslate/LibreTranslate, are also provided.
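
As an illustration, a minimal sketch of installing and using the Argostranslate package is shown below; the `.argosmodel` filename is an assumption, so substitute the actual file distributed with this repository.

```python
import argostranslate.package
import argostranslate.translate

# Install the downloaded package (hypothetical filename; use the actual
# .argosmodel file shipped with this repository).
argostranslate.package.install_from_path("vep_ru.argosmodel")

# Translate Vepsian ("vep") into Russian ("ru").
src_text = "..."  # Vepsian input text
print(argostranslate.translate.translate(src_text, "vep", "ru"))
```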

## Model Architecture and Objective

```
dec_layers: 20
decoder_type: transformer
enc_layers: 20
encoder_type: transformer
heads: 8
hidden_size: 512
max_relative_positions: 20
model_dtype: fp16
pos_ffn_activation_fn: gated-gelu
position_encoding: false
share_decoder_embeddings: true
share_embeddings: true
share_vocab: true
src_vocab_size: 32000
tgt_vocab_size: 32000
transformer_ff: 6144
word_vec_size: 512
```
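
For the CTranslate2 weights, a minimal sketch of direct inference could look like the following. The model directory and SentencePiece file names are assumptions, and it is assumed that the shared 32k vocabulary was built with SentencePiece, as is typical for OpenNMT-py/Locomotive pipelines.

```python
import ctranslate2
import sentencepiece as spm

# Paths are assumptions; point them at the converted CTranslate2 model
# directory and the SentencePiece model distributed with it.
translator = ctranslate2.Translator("vep_ru_ctranslate2", device="cpu")
sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")

src_text = "..."  # Vepsian input text
tokens = sp.encode(src_text, out_type=str)   # subword tokenization
result = translator.translate_batch([tokens])
print(sp.decode(result[0].hypotheses[0]))    # detokenized Russian output
```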

# Citing & Authors 

Authors: Maksim Migukin, Maksim Kuznetsov, Alexey Kutashov.

## Credits

Data compiled by [OPUS](https://opus.nlpl.eu/).

Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).

Data from the Vepsian [Wikipedia](https://vep.wikipedia.org/wiki/).

Data from [Lehme No 2051 // Open corpus of Vepsian and Karelian languages VepKar](http://dictorpus.krc.karelia.ru/).

Data from [OMAMEDIA](https://omamedia.ru/).

Data from [CCMatrix](http://opus.nlpl.eu/CCMatrix-v1.php). If you use the dataset or code, please cite the corresponding paper and acknowledge OPUS as well for this release. This corpus was extracted from web crawls using margin-based bitext mining techniques; the original distribution is available from http://data.statmt.org/cc-matrix/.

Data from [OpenSubtitles](http://opus.nlpl.eu/OpenSubtitles-v2018.php). Please cite the following article if you use any part of the corpus in your own work: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).