---
base_model: bert-base-multilingual-uncased
model-index:
- name: lang-recogn-model
  results:
  - task:
      type: text-classification
    dataset:
      name: language-detection
      type: language-detection
    metrics:
    - name: accuracy
      type: accuracy
      value: 0.9836
    source:
      name: Language recognition using BERT
      url: >-
        https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert
language:
- ar
- da
- nl
- en
- fr
- de
- el
- hi
- it
- kn
- ml
- pt
- ru
- es
- sv
- ta
- tr
pipeline_tag: text-classification
widget:
- text: "Hello, world"
  example_title: "English language"
- text: "Ik heb het al gezien"
  example_title: "Dutch language"
---
# Language Detection Model
The model presented in this repository is a fine-tuned version of `BertForSequenceClassification`,
pretrained on [multilingual texts](https://huggingface.co/bert-base-multilingual-uncased).
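
The model can be used through the Transformers `pipeline` API. The sketch below is a minimal example; the repository ID `spolivin/lang-recogn-model` is assumed from this repository's name.

```python
from transformers import pipeline

# Load the fine-tuned language detection model
# (repository ID "spolivin/lang-recogn-model" is an assumption based on this repo's name)
lang_detector = pipeline(
    "text-classification",
    model="spolivin/lang-recogn-model",
)

# Returns a list with the predicted language label and its confidence score
print(lang_detector("Ik heb het al gezien"))
```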
## Training/fine-tuning
The model has been fine-tuned on the [Language Detection](https://www.kaggle.com/datasets/basilb2s/language-detection)
dataset from *Kaggle*. The dataset analysis and a complete description of the training procedure
can be found in [one of my *Kaggle* notebooks](https://www.kaggle.com/code/sergeypolivin/language-recognition-using-bert),
which was used to train the model faster on a *GPU*.
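
A minimal sketch of how such fine-tuning can be set up with the `Trainer` API is given below. The tiny in-memory dataset, column names, and hyperparameters are purely illustrative assumptions; the actual procedure is described in the notebook linked above.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Tiny illustrative stand-in for the Kaggle "Language Detection" dataset
raw = Dataset.from_dict(
    {
        "text": ["Hello, world", "Ik heb het al gezien"],
        "label": [0, 1],  # e.g. 0 = English, 1 = Dutch; the real mapping covers 17 languages
    }
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

dataset = raw.map(tokenize, batched=True)

# Base multilingual BERT with a 17-way classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-uncased",
    num_labels=17,
)

# Illustrative hyperparameters (not the ones used for the released checkpoint)
training_args = TrainingArguments(
    output_dir="lang-recogn-model",
    num_train_epochs=1,
    per_device_train_batch_size=8,
)

trainer = Trainer(model=model, args=training_args, train_dataset=dataset)
trainer.train()
```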
## Supported languages
The model has been fine-tuned to detect one of the following 17 languages:
- Arabic
- Danish
- Dutch
- English
- French
- German
- Greek
- Hindi
- Italian
- Kannada
- Malayalam
- Portuguese
- Russian
- Spanish
- Swedish
- Tamil
- Turkish
## References
1. [BERT multilingual base model (uncased)](https://huggingface.co/bert-base-multilingual-uncased)
2. [Language Detection Dataset](https://www.kaggle.com/datasets/basilb2s/language-detection)