base_model: bert-base-multilingual-uncased
model-index:
  - name: lang-recogn-model
    results:
      - task:
          type: text-classification
        dataset:
          name: language-detection
          type: language-detection
        metrics:
          - name: accuracy
            type: accuracy
            value: 0.9836
        source:
          name: Language recognition using BERT
          url: >-
            https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert
language:
  - ar
  - da
  - nl
  - en
  - fr
  - de
  - el
  - hi
  - it
  - kn
  - ml
  - pt
  - ru
  - es
  - sv
  - ta
  - tr
pipeline_tag: text-classification
widget:
  - text: Hello, world
    example_title: English language
  - text: Ik heb het al gezien
    example_title: Dutch language

Language Detection Model

This repository contains a fine-tuned BertForSequenceClassification model. It is built on bert-base-multilingual-uncased, a BERT checkpoint pretrained on multilingual text.
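A minimal inference sketch using the `transformers` text-classification pipeline may help here. The repo ID `spolivin/lang-recogn-model` is an assumption inferred from this page, and `load_classifier` and `detect_language` are hypothetical helper names, not part of the model. The exact label strings the model returns are not documented in this card, so the sketch passes them through unchanged.

```python
from typing import Callable, Dict, List

# Assumed Hub repo ID (inferred from this page); adjust if the model lives elsewhere.
MODEL_ID = "spolivin/lang-recogn-model"


def load_classifier(model_id: str = MODEL_ID) -> Callable:
    """Build a text-classification pipeline; weights are downloaded on first use."""
    # Imported lazily so detect_language() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def detect_language(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier and pair each input text with its top label and score."""
    predictions = classifier(texts)
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]


# Example (needs `transformers`, a backend such as PyTorch, and network access):
# clf = load_classifier()
# for row in detect_language(["Hello, world", "Ik heb het al gezien"], clf):
#     print(row)
```

The two example sentences are the widget texts from the metadata. Keeping the classifier as a parameter makes it easy to swap in a locally saved copy of the model.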

Training/fine-tuning

The model was fine-tuned on the Language Detection dataset from Kaggle. The dataset analysis and a complete description of the training procedure are in my Kaggle notebook, Language recognition using BERT (https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert), where training ran on a GPU for speed.
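As a rough illustration only, a classification head can be attached to the base checkpoint as sketched below. This is not the notebook's code: the helper names `make_label_maps` and `build_model` are hypothetical, the alphabetical label ordering is an assumption, and the hyperparameters and training loop are left to the notebook.

```python
from typing import Dict, Iterable, Tuple

# base_model value from the model card metadata.
BASE_CHECKPOINT = "bert-base-multilingual-uncased"


def make_label_maps(languages: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct language name to an integer class id (sorted order, an assumption)."""
    names = sorted(set(languages))
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def build_model(label2id: Dict[str, int], id2label: Dict[int, str],
                checkpoint: str = BASE_CHECKPOINT):
    """Load the tokenizer and a BERT model with a freshly initialised classification head."""
    # Imported lazily; requires `transformers` and PyTorch plus network access.
    from transformers import BertForSequenceClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
    model = BertForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model
```

Storing `id2label` in the model config means a pipeline built from the saved model can report language names instead of generic class indices.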

Supported languages

The model classifies input text as one of the following 17 languages:

  • Arabic
  • Danish
  • Dutch
  • English
  • French
  • German
  • Greek
  • Hindi
  • Italian
  • Kannada
  • Malayalam
  • Portuguese
  • Russian
  • Spanish
  • Swedish
  • Tamil
  • Turkish
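
The language codes in the model card metadata correspond one-to-one to the list above. A small lookup such as the following can be handy for display purposes. It is an illustration, not part of the model, and `language_name` is a hypothetical helper. The label strings the model actually emits are not documented here, so it is worth inspecting `model.config.id2label` before relying on any mapping.

```python
# ISO 639-1 codes from the model card metadata, mapped to the language names listed above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "da": "Danish",
    "nl": "Dutch",
    "en": "English",
    "fr": "French",
    "de": "German",
    "el": "Greek",
    "hi": "Hindi",
    "it": "Italian",
    "kn": "Kannada",
    "ml": "Malayalam",
    "pt": "Portuguese",
    "ru": "Russian",
    "es": "Spanish",
    "sv": "Swedish",
    "ta": "Tamil",
    "tr": "Turkish",
}


def language_name(code: str) -> str:
    """Return the display name for an ISO 639-1 code; raise ValueError if unsupported."""
    try:
        return SUPPORTED_LANGUAGES[code.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None
```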

References

  1. BERT multilingual base model (uncased)
  2. Language Detection Dataset