Commit e89cd0d by Lynxpda (parent: bae832d): Update README.md

Files changed (1): README.md (+68 −1)
---
license: cc-by-sa-4.0
library_name: pytorch
language:
- ru
- vep
datasets:
- Lynxpda/back-translated-veps-russian
---

# Model Card for Veps - Russian version 1.0

A model for translation from Vepsian into Russian.
The archive contains the initial weights of the model, trained from scratch with OpenNMT-py (Locomotive); the model has 457M parameters.
Also provided are the model weights converted for CTranslate2 and a package for installation and use with Argos Translate/LibreTranslate.
## Model Architecture and Objective

```
dec_layers: 20
decoder_type: transformer
enc_layers: 20
encoder_type: transformer
heads: 8
hidden_size: 512
max_relative_positions: 20
model_dtype: fp16
pos_ffn_activation_fn: gated-gelu
position_encoding: false
share_decoder_embeddings: true
share_embeddings: true
share_vocab: true
src_vocab_size: 32000
tgt_vocab_size: 32000
transformer_ff: 6144
word_vec_size: 512
```
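These dimensions account for the quoted 457M parameters. The back-of-the-envelope tally below ignores biases and layer-norm weights (well under 1% of the total) and assumes a single shared embedding matrix (per `share_embeddings`/`share_decoder_embeddings`) and a three-matrix gated-gelu feed-forward block:

```python
# Rough parameter count for the config above.
d = 512        # hidden_size / word_vec_size
ff = 6144      # transformer_ff
vocab = 32000  # shared vocabulary (share_vocab: true)
layers = 20    # enc_layers = dec_layers = 20

embeddings = vocab * d               # one shared embedding matrix
attn = 4 * d * d                     # Q, K, V, and output projections
ffn = 3 * d * ff                     # gated-gelu uses three weight matrices
encoder = layers * (attn + ffn)      # self-attention + FFN per encoder layer
decoder = layers * (2 * attn + ffn)  # self- and cross-attention + FFN per decoder layer

total = embeddings + encoder + decoder
print(f"{total / 1e6:.0f}M parameters")  # → 457M, matching the card
```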
# Citing & Authors

Authors: Maksim Migukin, Maksim Kuznetsov, Alexey Kutashov.

## Credits

Data compiled by [OPUS](https://opus.nlpl.eu/).

Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).

Data from the Vepsian [Wikipedia](https://vep.wikipedia.org/wiki/).

Data from [Lehme No 2051 // Open corpus of Vepsian and Karelian languages VepKar](http://dictorpus.krc.karelia.ru/).

Data from [OMAMEDIA](https://omamedia.ru/).

CCMatrix

http://opus.nlpl.eu/CCMatrix-v1.php

If you use the dataset or code, please cite (pdf) and acknowledge OPUS (bib, pdf) as well for this release.

This corpus has been extracted from web crawls using the margin-based bitext mining techniques described here. The original distribution is available from http://data.statmt.org/cc-matrix/

OpenSubtitles

http://opus.nlpl.eu/OpenSubtitles-v2018.php

Please cite the following article if you use any part of the corpus in your own work: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).