---
license: apache-2.0
---

## Dataset

NEWS2018 DATASET_04, Task ID: M-EnHi

http://workshop.colips.org/news2018/dataset.html

## Notebooks

- `xmltodict.ipynb` contains the code to convert the `xml` files to `json` for training.
- `training_script.ipynb` contains the code for training and inference. It is a modified version of https://github.com/AI4Bharat/IndianNLP-Transliteration/blob/master/NoteBooks/Xlit_TrainingSetup_condensed.ipynb

## Predictions

`pred_test.json` contains the top-10 predictions on the validation set of the dataset.

## Evaluation Scores on validation set

TOP 10 SCORES FOR 1000 SAMPLES:

| Metric       | Score    |
| ------------ | -------- |
| ACC          | 0.703000 |
| Mean F-score | 0.949289 |
| MRR          | 0.486549 |
| MAP_ref      | 0.381000 |

TOP 5 SCORES FOR 1000 SAMPLES:

| Metric       | Score    |
| ------------ | -------- |
| ACC          | 0.621000 |
| Mean F-score | 0.937985 |
| MRR          | 0.475033 |
| MAP_ref      | 0.381000 |

TOP 3 SCORES FOR 1000 SAMPLES:

| Metric       | Score    |
| ------------ | -------- |
| ACC          | 0.560000 |
| Mean F-score | 0.927025 |
| MRR          | 0.461333 |
| MAP_ref      | 0.381000 |

TOP 2 SCORES FOR 1000 SAMPLES:

| Metric       | Score    |
| ------------ | -------- |
| ACC          | 0.502000 |
| Mean F-score | 0.913697 |
| MRR          | 0.442000 |
| MAP_ref      | 0.381000 |

TOP 1 SCORES FOR 1000 SAMPLES:

| Metric       | Score    |
| ------------ | -------- |
| ACC          | 0.382000 |
| Mean F-score | 0.881272 |
| MRR          | 0.382000 |
| MAP_ref      | 0.380500 |
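
To make the top-k scores easier to interpret, here is a minimal sketch of how top-k ACC and MRR could be computed. It is not the evaluation script used for the numbers above. It assumes, hypothetically, that predictions and references are both dictionaries mapping a source word to a list of strings (a ranked candidate list for predictions, the accepted transliterations for references); the actual layout of `pred_test.json` may differ. The function names `topk_accuracy` and `mean_reciprocal_rank` are illustrative.

```python
from typing import Dict, List


def topk_accuracy(preds: Dict[str, List[str]],
                  refs: Dict[str, List[str]],
                  k: int) -> float:
    """Fraction of source words with at least one reference among the top-k candidates."""
    hits = 0
    for word, gold in refs.items():
        candidates = preds.get(word, [])[:k]  # missing words count as misses
        if any(c in gold for c in candidates):
            hits += 1
    return hits / len(refs)


def mean_reciprocal_rank(preds: Dict[str, List[str]],
                         refs: Dict[str, List[str]],
                         k: int) -> float:
    """Mean of 1/rank of the first correct candidate within the top-k (0 if none)."""
    total = 0.0
    for word, gold in refs.items():
        for rank, cand in enumerate(preds.get(word, [])[:k], start=1):
            if cand in gold:
                total += 1.0 / rank
                break
    return total / len(refs)


# Toy data (assumed format, placeholder strings rather than real transliterations).
toy_preds = {
    "a": ["x", "y", "z"],
    "b": ["p", "q", "r"],
    "c": ["m", "n", "o"],
    "d": ["u", "v", "w"],
}
toy_refs = {
    "a": ["x"],        # correct at rank 1
    "b": ["q"],        # correct at rank 2
    "c": ["o", "zz"],  # correct at rank 3
    "d": ["none"],     # never correct
}

acc_at_1 = topk_accuracy(toy_preds, toy_refs, k=1)
acc_at_3 = topk_accuracy(toy_preds, toy_refs, k=3)
mrr_at_1 = mean_reciprocal_rank(toy_preds, toy_refs, k=1)
mrr_at_3 = mean_reciprocal_rank(toy_preds, toy_refs, k=3)
```

Under these definitions, ACC and MRR coincide at k = 1, which matches the TOP 1 table (both 0.382000), and ACC can only grow as k increases while MRR grows more slowly because later ranks contribute smaller reciprocals.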