upload
- .gitattributes +10 -35
- LICENSE +21 -0
- Laos-Viet_Translator.ipynb +0 -0
- bin/dict.lo.txt +0 -0
- bin/dict.vi.txt +0 -0
- bin/preprocess.log +14 -0
- bin/test.lo-vi.lo.bin +0 -0
- bin/test.lo-vi.lo.idx +0 -0
- bin/test.lo-vi.vi.bin +0 -0
- bin/test.lo-vi.vi.idx +0 -0
- bin/train.lo-vi.lo.bin +3 -0
- bin/train.lo-vi.lo.idx +3 -0
- bin/train.lo-vi.vi.bin +3 -0
- bin/train.lo-vi.vi.idx +3 -0
- bin/valid.lo-vi.lo.bin +0 -0
- bin/valid.lo-vi.lo.idx +0 -0
- bin/valid.lo-vi.vi.bin +0 -0
- bin/valid.lo-vi.vi.idx +0 -0
- data/Dev/dev2023.lo +0 -0
- data/Dev/dev2023.vi +0 -0
- data/Train/train2023.lo +3 -0
- data/Train/train2023.vi +3 -0
- data/VLSP2023.TestSet/test_lo.txt +0 -0
- data/VLSP2023.TestSet/test_vi.txt +0 -0
- dev.spm.lo +0 -0
- dev.spm.vi +0 -0
- dict.txt +0 -0
- spm.model +0 -0
- spm.vocab +0 -0
- test.spm.lo +0 -0
- test.spm.vi +0 -0
- train.spm.lo +3 -0
- train.spm.vi +3 -0
.gitattributes
CHANGED
@@ -1,35 +1,10 @@
-
-
-
-
-
-
-
-
-
-
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+# Auto detect text files and perform LF normalization
+* text=auto
+bin/train.lo-vi.lo.bin filter=lfs diff=lfs merge=lfs -text
+bin/train.lo-vi.lo.idx filter=lfs diff=lfs merge=lfs -text
+bin/train.lo-vi.vi.bin filter=lfs diff=lfs merge=lfs -text
+bin/train.lo-vi.vi.idx filter=lfs diff=lfs merge=lfs -text
+data/Train/train2023.lo filter=lfs diff=lfs merge=lfs -text
+data/Train/train2023.vi filter=lfs diff=lfs merge=lfs -text
+train.spm.lo filter=lfs diff=lfs merge=lfs -text
+train.spm.vi filter=lfs diff=lfs merge=lfs -text
LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 Nguyễn Duy Chiến
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
Laos-Viet_Translator.ipynb
ADDED
The diff for this file is too large to render.
bin/dict.lo.txt
ADDED
The diff for this file is too large to render.
bin/dict.vi.txt
ADDED
The diff for this file is too large to render.
bin/preprocess.log
ADDED
@@ -0,0 +1,14 @@
+Namespace(no_progress_bar=False, log_interval=100, log_format=None, log_file=None, aim_repo=None, aim_run_hash=None, tensorboard_logdir=None, wandb_project=None, azureml_logging=False, seed=1, cpu=False, tpu=False, bf16=False, memory_efficient_bf16=False, fp16=False, memory_efficient_fp16=False, fp16_no_flatten_grads=False, fp16_init_scale=128, fp16_scale_window=None, fp16_scale_tolerance=0.0, on_cpu_convert_precision=False, min_loss_scale=0.0001, threshold_loss_scale=None, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, user_dir=None, empty_cache_freq=0, all_gather_list_size=16384, model_parallel_size=1, quantization_config_path=None, profile=False, reset_logging=False, suppress_crashes=False, use_plasma_view=False, plasma_path='/tmp/plasma', criterion='cross_entropy', tokenizer=None, bpe='sentencepiece', optimizer=None, lr_scheduler='fixed', scoring='bleu', task='translation', source_lang='lo', target_lang='vi', trainpref='train.spm', validpref='dev.spm', testpref='test.spm', align_suffix=None, destdir='bin', thresholdtgt=0, thresholdsrc=0, tgtdict=None, srcdict='dict.txt', nwordstgt=-1, nwordssrc=-1, alignfile=None, dataset_impl='mmap', joined_dictionary=True, only_source=False, padding_factor=8, workers=16, dict_only=False)
+[lo] Dictionary: 18000 types
+[lo] train.spm.lo: 100000 sents, 2766238 tokens, 0.158% replaced (by <unk>)
+[lo] Dictionary: 18000 types
+[lo] dev.spm.lo: 2000 sents, 52744 tokens, 0.209% replaced (by <unk>)
+[lo] Dictionary: 18000 types
+[lo] test.spm.lo: 1000 sents, 36046 tokens, 0.0694% replaced (by <unk>)
+[vi] Dictionary: 18000 types
+[vi] train.spm.vi: 100000 sents, 2839310 tokens, 0.309% replaced (by <unk>)
+[vi] Dictionary: 18000 types
+[vi] dev.spm.vi: 2000 sents, 52798 tokens, 0.331% replaced (by <unk>)
+[vi] Dictionary: 18000 types
+[vi] test.spm.vi: 1000 sents, 33513 tokens, 0.0627% replaced (by <unk>)
+Wrote preprocessed data to bin
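The per-file summary lines in this log follow a regular shape, so the sentence counts, token counts, and `<unk>` rates can be pulled out programmatically. The following is a minimal sketch, not part of the commit: the function name `parse_preprocess_log` and the regular expression are illustrative assumptions fitted to the lines shown above, and the embedded `LOG` sample is copied from them.

```python
import re

# Assumed pattern, fitted to summary lines such as:
# "[lo] train.spm.lo: 100000 sents, 2766238 tokens, 0.158% replaced (by <unk>)"
STAT_LINE = re.compile(
    r"\[(?P<lang>\w+)\] (?P<file>\S+): (?P<sents>\d+) sents, "
    r"(?P<tokens>\d+) tokens, (?P<unk>[\d.]+)% replaced"
)


def parse_preprocess_log(text):
    """Return one dict per summary line; all other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = STAT_LINE.search(line)
        if m:
            rows.append({
                "lang": m["lang"],
                "file": m["file"],
                "sents": int(m["sents"]),
                "tokens": int(m["tokens"]),
                "unk_pct": float(m["unk"]),
            })
    return rows


# Sample lines copied from the log above.
LOG = """\
[lo] Dictionary: 18000 types
[lo] train.spm.lo: 100000 sents, 2766238 tokens, 0.158% replaced (by <unk>)
[vi] train.spm.vi: 100000 sents, 2839310 tokens, 0.309% replaced (by <unk>)
"""

for row in parse_preprocess_log(LOG):
    avg = row["tokens"] / row["sents"]
    print(f'{row["file"]}: {avg:.1f} tokens/sentence, {row["unk_pct"]}% <unk>')
```

The `Dictionary:` lines do not match the pattern and are skipped, which is why only the two training files are reported for this sample.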
bin/test.lo-vi.lo.bin
ADDED
Binary file (72.1 kB)
bin/test.lo-vi.lo.idx
ADDED
Binary file (12 kB)
bin/test.lo-vi.vi.bin
ADDED
Binary file (67 kB)
bin/test.lo-vi.vi.idx
ADDED
Binary file (12 kB)
bin/train.lo-vi.lo.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b36a8600988233ebaae09916be22770de14474495704c25a4afb621df1b1a30e
+size 5532476
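The three added lines are a Git LFS pointer: the repository stores this small text stub, while the real binary lives in LFS storage and is identified by its hash and byte size. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of Git LFS or of this commit, and it assumes the simple `key value` layout seen above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer stub into its version, hash, and size fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b36a8600988233ebaae09916be22770de14474495704c25a4afb621df1b1a30e
size 5532476
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte length of the real file, so it can be compared against a downloaded copy before hashing it.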
bin/train.lo-vi.lo.idx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c25828ff1d74c3e5bdb9e04b323ef027b24ce69ebe8c50f76d95d8ac3147342a
+size 1200026
bin/train.lo-vi.vi.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d3aca6e9fd6f3dc4a2274b868f79810bf1f52d19109b2ecbdc3bb92a2fc63a98
+size 5678620
bin/train.lo-vi.vi.idx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:05b6e5d03f8eb98690c50a1da15d98ba20eafbe1b5b6b281c2ee199f4a0bf1ca
+size 1200026
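The sizes in the four training pointers can be cross-checked against the sentence and token counts in bin/preprocess.log. The sketch below does that arithmetic under stated assumptions about the `mmap` dataset layout that this commit itself does not spell out: 2 bytes per token in the `.bin` file, and a 26-byte header plus 12 bytes per sentence in the `.idx` file. The constant and function names are illustrative.

```python
# Assumed layout constants (not stated in the commit):
BYTES_PER_TOKEN = 2      # 16-bit token IDs; an 18000-type dictionary fits
IDX_HEADER_BYTES = 26    # 9-byte magic + 8-byte version + 1-byte dtype + 8-byte count
IDX_BYTES_PER_SENT = 12  # 4-byte length + 8-byte offset per sentence


def expected_sizes(sents, tokens):
    """Return (bin_bytes, idx_bytes) implied by the assumed layout."""
    bin_bytes = tokens * BYTES_PER_TOKEN
    idx_bytes = IDX_HEADER_BYTES + sents * IDX_BYTES_PER_SENT
    return bin_bytes, idx_bytes


# (sents, tokens) from bin/preprocess.log; (bin, idx) sizes from the LFS pointers.
CASES = {
    "train.lo-vi.lo": ((100000, 2766238), (5532476, 1200026)),
    "train.lo-vi.vi": ((100000, 2839310), (5678620, 1200026)),
}

for name, (counts, pointer_sizes) in CASES.items():
    status = "match" if expected_sizes(*counts) == pointer_sizes else "MISMATCH"
    print(f"{name}: {status}")
```

Under these assumptions both language sides agree with the pointer sizes, which suggests the uploaded binaries are the ones produced by the logged preprocessing run.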
bin/valid.lo-vi.lo.bin
ADDED
Binary file (105 kB)
bin/valid.lo-vi.lo.idx
ADDED
Binary file (24 kB)
bin/valid.lo-vi.vi.bin
ADDED
Binary file (106 kB)
bin/valid.lo-vi.vi.idx
ADDED
Binary file (24 kB)
data/Dev/dev2023.lo
ADDED
The diff for this file is too large to render.
data/Dev/dev2023.vi
ADDED
The diff for this file is too large to render.
data/Train/train2023.lo
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45510f4636d468886a65c5a2c4262c4360b985023daea805eed62468e42fa9e4
+size 23144026
data/Train/train2023.vi
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:93796bfbb2ef00421a8e1424016d68e24a14a189527e9898a11e8602e2fbf42e
+size 13555024
data/VLSP2023.TestSet/test_lo.txt
ADDED
The diff for this file is too large to render.
data/VLSP2023.TestSet/test_vi.txt
ADDED
The diff for this file is too large to render.
dev.spm.lo
ADDED
The diff for this file is too large to render.
dev.spm.vi
ADDED
The diff for this file is too large to render.
dict.txt
ADDED
The diff for this file is too large to render.
spm.model
ADDED
Binary file (550 kB)
spm.vocab
ADDED
The diff for this file is too large to render.
test.spm.lo
ADDED
The diff for this file is too large to render.
test.spm.vi
ADDED
The diff for this file is too large to render.
train.spm.lo
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:efe724e87e83bc9a07e30bbb223956ae73f4c8218e077296ea9e111c375f7add
+size 28296977
train.spm.vi
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:418ddda8dbd4b78ffe9d0b05f5d9c0ec20f6bb1057a90e1b1d665d761fb06eb4
+size 20730665