---
base_model: bert-base-multilingual-uncased
language:
- ar
- da
- nl
- en
- fr
- de
- el
- hi
- it
- kn
- ml
- pt
- ru
- es
- sv
- ta
- tr
pipeline_tag: text-classification
widget:
- text: "I have seen it somewhere..."
  example_title: "English"
- text: "Ik heb het al gezien"
  example_title: "Dutch"
- text: "Интересная идея"
  example_title: "Russian"
- text: "Que vamos a hacer?"
  example_title: "Spanish"
- text: "Hvor er der en pengeautomat?"
  example_title: "Danish"
- text: "إنه مشوق جدا"
  example_title: "Arabic"
- text: "Es ist sehr interessant"
  example_title: "German"
- text: "c'est très intéressant"
  example_title: "French"
- text: "Non ho mai visto una tale bellezza"
  example_title: "Italian"
- text: "Jag har aldrig sett en sådan skönhet"
  example_title: "Swedish"
- text: "Böyle bir güzellik görmedim"
  example_title: "Turkish"
- text: "ಅದ್ಭುತ ಕಲ್ಪನೆ"
  example_title: "Kannada"
- text: "அற்புதமான யோசனை"
  example_title: "Tamil"
- text: "Υπέροχη ιδέα"
  example_title: "Greek"
- text: "Eu nunca estive aqui"
  example_title: "Portuguese"
- text: "मैं यहां कभी नहीं गया"
  example_title: "Hindi"
- text: "ഞാൻ ഇവിടെ പോയിട്ടില്ല"
  example_title: "Malayalam"
license: mit
---

# Language Detection Model

The model in this repository is a fine-tuned version of `BertForSequenceClassification` built on top of
[bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased), which was pretrained on multilingual texts.

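As a quick check, the model can be loaded through the `transformers` text-classification pipeline. The sketch below is illustrative: `<this-repo-id>` is a placeholder for the actual Hugging Face repository id of this model.

```python
from transformers import pipeline

# Placeholder: replace with the actual repository id of this model
model_id = "<this-repo-id>"

# The pipeline tokenizes the input, runs BertForSequenceClassification,
# and returns the highest-scoring language label with its confidence score.
classifier = pipeline("text-classification", model=model_id)

print(classifier("Es ist sehr interessant"))
# e.g. [{'label': '<predicted language>', 'score': 0.99...}]
```
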
## Training/fine-tuning

The model has been fine-tuned on the [Language Detection](https://www.kaggle.com/datasets/basilb2s/language-detection)
dataset from *Kaggle*. The dataset analysis and a complete description of the training procedure
can be found in [one of my *Kaggle* notebooks](https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert),
which was used to train the model faster on a *GPU*.

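For orientation, the snippet below sketches the general shape of such a fine-tuning run with the `transformers` `Trainer`. It is a simplified illustration rather than the exact notebook code: the hyperparameters and the `train_ds`/`eval_ds` variables (assumed to be tokenized splits of the Kaggle CSV with integer language labels) are assumptions.

```python
from transformers import (AutoTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-uncased",
    num_labels=17,  # one class per supported language
)

args = TrainingArguments(
    output_dir="language-detection",
    num_train_epochs=2,               # illustrative hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

# train_ds / eval_ds: assumed tokenized datasets built from the Kaggle data
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```
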
## Supported languages

The model has been fine-tuned to detect one of the following 17 languages (see the snippet after the list for reading the model's label mapping):

- Arabic
- Danish
- Dutch
- English
- French
- German
- Greek
- Hindi
- Italian
- Kannada
- Malayalam
- Portuguese
- Russian
- Spanish
- Swedish
- Tamil
- Turkish

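The mapping between the classifier's output indices and these language names is stored in the model configuration. Assuming the standard `transformers` config layout (and the same `<this-repo-id>` placeholder as above), it can be inspected as follows:

```python
from transformers import AutoConfig

# Placeholder: replace with the actual repository id of this model
config = AutoConfig.from_pretrained("<this-repo-id>")

# id2label maps each output index of the classification head to a language name
for idx, label in sorted(config.id2label.items()):
    print(idx, label)
```
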

## References

1. [BERT multilingual base model (uncased)](https://huggingface.co/bert-base-multilingual-uncased)
2. [Language Detection Dataset](https://www.kaggle.com/datasets/basilb2s/language-detection)