---
base_model: bert-base-multilingual-uncased
model-index:
- name: lang-recogn-model
  results:
  - task:
      type: text-classification
    dataset:
      name: ai2_arc
      type: ai2_arc
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.9836
    source:
      name: Language Detection Dataset
      url: https://www.kaggle.com/datasets/basilb2s/language-detection
---

# Language Detection Model

The model presented in this repository is a fine-tuned version of `BertForSequenceClassification` pretrained on [multilingual texts](https://huggingface.co/bert-base-multilingual-uncased).

## Training/fine-tuning

The model has been fine-tuned on the [Language Detection](https://www.kaggle.com/datasets/basilb2s/language-detection) dataset from *Kaggle*. The dataset analysis and a complete description of the training procedure can be found in [one of my *Kaggle* notebooks](https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert), which was used to speed up training on a *GPU*.

## Supported languages

The model has been fine-tuned to detect one of the following 17 languages:

- Arabic
- Danish
- Dutch
- English
- French
- German
- Greek
- Hindi
- Italian
- Kannada
- Malayalam
- Portuguese
- Russian
- Spanish
- Swedish
- Tamil
- Turkish

## References

1. [BERT multilingual base model (uncased)](https://huggingface.co/bert-base-multilingual-uncased)
2. [Language Detection Dataset](https://www.kaggle.com/datasets/basilb2s/language-detection)
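
## Usage

A minimal inference sketch using the `transformers` text-classification pipeline is shown below. The repo id `sergeypolivin/lang-recogn-model` is an assumption based on the model name in the metadata; replace it with this model's actual identifier on the Hugging Face Hub.

```python
from transformers import pipeline

# Assumption: the actual Hub repo id may differ from the one used here.
classifier = pipeline(
    "text-classification",
    model="sergeypolivin/lang-recogn-model",
)

examples = [
    "Bonjour, comment allez-vous ?",
    "¿Dónde está la biblioteca?",
    "Hvor er biblioteket?",
]

# Print the predicted language label and its confidence score for each text.
for text in examples:
    prediction = classifier(text)[0]
    print(f"{text!r} -> {prediction['label']} (score={prediction['score']:.4f})")
```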