Model Card for Model ID
This model is a fine-tuned checkpoint of mBART-large-50 (facebook/mbart-large-50-many-to-many-mmt), trained to select the most suitable GeoNames city name for a given input. If you enter a city name incorrectly, the model returns the most suitable city name from the relocation countries it covers ('Armenia', 'Belarus', 'Kyrgyzstan', 'Kazakhstan', 'Russia', 'Serbia', 'Turkey').
Model Details
Uses
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
output_model_path = "EldarKerimkhan/mbart-large-50-many-to-many-mmt.geonames_RU_RELOCATION"
model = MBartForConditionalGeneration.from_pretrained(output_model_path)
tokenizer = MBart50TokenizerFast.from_pretrained(output_model_path)
# Select the most suitable city name for a misspelled input
# ('Масква' is a misspelling of 'Москва', i.e. Moscow)
city = 'Масква'
city_tokens = tokenizer(city, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(city_tokens.input_ids)
output_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(output_str)
# Output: ['Moscow']
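The snippet above handles a single name and passes only `input_ids` to `generate`. When several names are corrected at once, the tokenizer pads them to a common length, so it is safer to pass the attention mask as well. The following is a minimal sketch of batched inference, assuming `model` and `tokenizer` are loaded as shown above; the helper names `batched` and `correct_cities` and the default batch size are illustrative and not part of the model's API.

```python
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def correct_cities(names: List[str], model, tokenizer, batch_size: int = 16) -> List[str]:
    """Run the model over `names` in batches and return the decoded outputs.

    `model` and `tokenizer` are the objects loaded in the example above.
    """
    results: List[str] = []
    for batch in batched(names, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        # Unpacking passes input_ids and attention_mask, so padded
        # positions are ignored during generation.
        outputs = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

A call such as `correct_cities(['Масква', 'Алмата'], model, tokenizer)` would return one decoded string per input name, in the same order.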
Training Details
The model was trained for 4 epochs on about 47,000 city names. For better results, you should train for more than 12 epochs.
Training Data
The training data is the cities15000 table from http://download.geonames.org/export/dump/, with augmentations created from its alternatenames column.
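The card does not describe how the augmentations were produced, so the following is only a rough sketch of one possible approach, not the author's actual pipeline. It assumes the standard tab-separated GeoNames layout (name in column 1, alternatenames in column 3, country code in column 8, counting from 0), keeps rows for the seven countries listed above, and pairs each alternate name, plus a noisy copy of it, with the canonical name. The functions `parse_row`, `add_typo`, and `make_pairs`, and the specific typo rules, are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Column positions in the tab-separated GeoNames dump (cities15000.txt).
NAME, ALTERNATENAMES, COUNTRY_CODE = 1, 3, 8

# ISO 3166-1 alpha-2 codes for the countries listed in this card.
COUNTRIES = {"AM", "BY", "KG", "KZ", "RU", "RS", "TR"}


def parse_row(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (canonical name, alternate names) for rows in COUNTRIES, else None."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) <= COUNTRY_CODE or cols[COUNTRY_CODE] not in COUNTRIES:
        return None
    alternates = [a for a in cols[ALTERNATENAMES].split(",") if a]
    return cols[NAME], alternates


def add_typo(name: str, rng: random.Random) -> str:
    """Hypothetical augmentation: swap two adjacent characters or drop one."""
    if len(name) < 3:
        return name
    i = rng.randrange(len(name) - 1)
    if rng.random() < 0.5:
        # Swap the characters at positions i and i + 1.
        return name[:i] + name[i + 1] + name[i] + name[i + 2:]
    # Drop the character at position i.
    return name[:i] + name[i + 1:]


def make_pairs(line: str, rng: random.Random, typos_per_name: int = 1) -> List[Tuple[str, str]]:
    """Build (input, target) training pairs from one GeoNames row."""
    parsed = parse_row(line)
    if parsed is None:
        return []
    target, alternates = parsed
    pairs: List[Tuple[str, str]] = []
    for alt in alternates:
        pairs.append((alt, target))
        for _ in range(typos_per_name):
            pairs.append((add_typo(alt, rng), target))
    return pairs
```

Seeding the generator, for example `random.Random(0)`, keeps the generated pairs reproducible between runs.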
Languages covered
Arabic (ar_AR), Czech (cs_CZ), German (de_DE), English (en_XX), Spanish (es_XX), Estonian (et_EE), Finnish (fi_FI), French (fr_XX), Gujarati (gu_IN), Hindi (hi_IN), Italian (it_IT), Japanese (ja_XX), Kazakh (kk_KZ), Korean (ko_KR), Lithuanian (lt_LT), Latvian (lv_LV), Burmese (my_MM), Nepali (ne_NP), Dutch (nl_XX), Romanian (ro_RO), Russian (ru_RU), Sinhala (si_LK), Turkish (tr_TR), Vietnamese (vi_VN), Chinese (zh_CN), Afrikaans (af_ZA), Azerbaijani (az_AZ), Bengali (bn_IN), Persian (fa_IR), Hebrew (he_IL), Croatian (hr_HR), Indonesian (id_ID), Georgian (ka_GE), Khmer (km_KH), Macedonian (mk_MK), Malayalam (ml_IN), Mongolian (mn_MN), Marathi (mr_IN), Polish (pl_PL), Pashto (ps_AF), Portuguese (pt_XX), Swedish (sv_SE), Swahili (sw_KE), Tamil (ta_IN), Telugu (te_IN), Thai (th_TH), Tagalog (tl_XX), Ukrainian (uk_UA), Urdu (ur_PK), Xhosa (xh_ZA), Galician (gl_ES), Slovene (sl_SI)