Model Card for EldarKerimkhan/mbart-large-50-many-to-many-mmt.geonames_RU_RELOCATION

This model is a fine-tuned checkpoint of mBART-large-50 (facebook/mbart-large-50-many-to-many-mmt), trained to select the most suitable GeoNames city name for a given input. It covers cities in the relocation countries ('Armenia', 'Belarus', 'Kyrgyzstan', 'Kazakhstan', 'Russia', 'Serbia', 'Turkey') and can return the most suitable city name even if the name was entered incorrectly.

Model Details

Uses

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# model and tokenizer loading
output_model_path = "EldarKerimkhan/mbart-large-50-many-to-many-mmt.geonames_RU_RELOCATION"
model = MBartForConditionalGeneration.from_pretrained(output_model_path)
tokenizer = MBart50TokenizerFast.from_pretrained(output_model_path)

# select the most suitable city names

city = 'Масква'

city_tokens = tokenizer(city, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(city_tokens.input_ids)
output_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(output_str)
# ['Moscow']
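
To correct several names at once, the same calls can be wrapped in a small helper. The following is a minimal sketch, not part of the original card: the function name `correct_city_names` and the defaults for `max_new_tokens` and `num_beams` are assumptions chosen for illustration. It passes the attention mask to `generate` explicitly, which matters once inputs of different lengths are padded into one batch.

```python
def correct_city_names(names, model, tokenizer, max_new_tokens=16, num_beams=4):
    """Map each input name to the model's suggested city name.

    `model` and `tokenizer` are expected to be the objects loaded above.
    The generation settings are illustrative defaults, not values from the card.
    """
    names = list(names)
    # Pad to the longest name in the batch so the inputs form one tensor.
    batch = tokenizer(names, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],  # tells the model to ignore padding
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
    )
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Pair every input with its decoded suggestion, preserving input order.
    return dict(zip(names, decoded))
```

With the model and tokenizer loaded as above, `correct_city_names(['Масква'], model, tokenizer)` returns a dictionary keyed by the input names.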

Training Details

It was trained for 4 epochs on about 47,000 city names (for better results, you should train for more than 12 epochs).

Training Data

The training data is the cities15000 table from http://download.geonames.org/export/dump/, with augmentations created from its alternatenames column.
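
The card does not describe how the augmentations were produced. As a rough illustration only, the sketch below shows one way to turn rows of the tab-separated GeoNames dump into (input, target) training pairs: each alternate name, plus a few randomly perturbed copies of it, is paired with the row's main name. The noise model (`perturb`), the helper names, the `n_noisy` parameter, and the country-code filter are all assumptions, not the author's method.

```python
import random

# ISO country codes for the countries listed in this card (assumed filter).
RELOCATION_COUNTRIES = {"AM", "BY", "KG", "KZ", "RU", "RS", "TR"}


def parse_city_row(line):
    """Parse one tab-separated row of the GeoNames cities dump.

    Column 1 is the name, column 3 the comma-separated alternatenames,
    column 8 the country code.
    """
    fields = line.rstrip("\n").split("\t")
    alternates = [a for a in fields[3].split(",") if a]
    return {"name": fields[1], "alternatenames": alternates, "country": fields[8]}


def perturb(name, rng):
    """Apply one random character edit (hypothetical noise model)."""
    if len(name) < 2:
        return name
    i = rng.randrange(len(name) - 1)
    op = rng.choice(["delete", "swap", "duplicate"])
    if op == "delete":
        return name[:i] + name[i + 1:]
    if op == "swap":
        return name[:i] + name[i + 1] + name[i] + name[i + 2:]
    return name[:i] + name[i] + name[i:]  # duplicate the character at i


def make_pairs(rows, n_noisy=2, seed=0):
    """Build (input, target) pairs: clean and noisy alternates -> main name."""
    rng = random.Random(seed)
    pairs = []
    for row in rows:
        if row["country"] not in RELOCATION_COUNTRIES:
            continue
        for alt in row["alternatenames"]:
            pairs.append((alt, row["name"]))
            for _ in range(n_noisy):
                pairs.append((perturb(alt, rng), row["name"]))
    return pairs


# Abbreviated illustrative rows (only the first nine columns of the real format).
sample_lines = [
    "524901\tMoscow\tMoscow\tMoskva,Москва\t55.75222\t37.61556\tP\tPPLC\tRU",
    "2988507\tParis\tParis\tParis,Paname\t48.85341\t2.3488\tP\tPPLC\tFR",
]
pairs = make_pairs(parse_city_row(line) for line in sample_lines)
```

Fixing the random seed keeps the generated pairs reproducible between runs; in this sketch the Paris row is skipped because its country code is outside the assumed filter.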

Languages covered

Arabic (ar_AR), Czech (cs_CZ), German (de_DE), English (en_XX), Spanish (es_XX), Estonian (et_EE), Finnish (fi_FI), French (fr_XX), Gujarati (gu_IN), Hindi (hi_IN), Italian (it_IT), Japanese (ja_XX), Kazakh (kk_KZ), Korean (ko_KR), Lithuanian (lt_LT), Latvian (lv_LV), Burmese (my_MM), Nepali (ne_NP), Dutch (nl_XX), Romanian (ro_RO), Russian (ru_RU), Sinhala (si_LK), Turkish (tr_TR), Vietnamese (vi_VN), Chinese (zh_CN), Afrikaans (af_ZA), Azerbaijani (az_AZ), Bengali (bn_IN), Persian (fa_IR), Hebrew (he_IL), Croatian (hr_HR), Indonesian (id_ID), Georgian (ka_GE), Khmer (km_KH), Macedonian (mk_MK), Malayalam (ml_IN), Mongolian (mn_MN), Marathi (mr_IN), Polish (pl_PL), Pashto (ps_AF), Portuguese (pt_XX), Swedish (sv_SE), Swahili (sw_KE), Tamil (ta_IN), Telugu (te_IN), Thai (th_TH), Tagalog (tl_XX), Ukrainian (uk_UA), Urdu (ur_PK), Xhosa (xh_ZA), Galician (gl_ES), Slovene (sl_SI)
