Language Detection Model

This repository contains a PyTorch-based model for language identification using multiple language detection methods. It combines predictions from various language detection libraries and models to determine the most probable language for a given text input.

Overview

The LanguageIdentificationModel integrates scores from several language detection methods:

  • LangDetect: Language detection using the langdetect library.
  • LangID: Language identification with the langid library.
  • Hugging Face: Language classification using the papluca/xlm-roberta-base-language-detection model from Hugging Face Transformers.
  • FastText: Language prediction using the lid.176.bin model from FastText.

These methods are integrated into a PyTorch neural network model, which produces the final language prediction for a given text input.
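As a rough illustration of how per-detector scores could be merged into a single network input, the sketch below flattens them into one fixed-length vector. The `build_feature_vector` helper, the dictionary format for the scores, and the ordering of languages and detectors are assumptions made for illustration, not the repository's actual code, and the example scores are made up.

```python
# Hypothetical sketch: the repository's real feature layout may differ.
from typing import Dict, List

# Language codes taken from this model card's metadata; the order is an assumption.
LANGUAGES: List[str] = ["en", "ar", "fr", "es", "pt", "ja", "it", "de", "ru", "zh"]
# One block of scores per detection method named in the overview.
DETECTORS: List[str] = ["langdetect", "langid", "huggingface", "fasttext"]


def build_feature_vector(scores: Dict[str, Dict[str, float]]) -> List[float]:
    """Flatten {detector: {language: score}} into one fixed-length list.

    Missing detectors or languages contribute 0.0, so the vector length is
    always len(DETECTORS) * len(LANGUAGES).
    """
    vector: List[float] = []
    for detector in DETECTORS:
        detector_scores = scores.get(detector, {})
        for language in LANGUAGES:
            vector.append(float(detector_scores.get(language, 0.0)))
    return vector


# Illustrative (made-up) scores for the text "Bonjour tout le monde".
example_scores = {
    "langdetect": {"fr": 0.99},
    "langid": {"fr": 0.87, "it": 0.05},
    "huggingface": {"fr": 0.98},
    "fasttext": {"fr": 0.95, "es": 0.02},
}
features = build_feature_vector(example_scores)
print(len(features))  # 40 values: 4 detectors x 10 languages
```

Under this assumed layout, the network's input size would be 40; a different set of languages or detectors changes that number accordingly.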


language:

  • en
  • ar
  • fr
  • es
  • pt
  • ja
  • it
  • de
  • ru
  • zh

metrics:

  • accuracy
  • code_eval

library_name: transformers

pipeline_tag: text-classification

Installation

Clone this repository:

git clone https://github.com/jasserchtourou/Language_detection.git
cd Language_detection

Install the required dependencies:

pip install -r requirements.txt

Download the necessary models:

  • Download the papluca/xlm-roberta-base-language-detection model from Hugging Face Transformers.
  • Download the lid.176.bin model from FastText and place it in the repository.
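The FastText download could be scripted roughly as follows. The `download_if_missing` helper is hypothetical and not part of the repository; the URL is the location where fastText publishes lid.176.bin, and saving the file in the repository root is an assumption based on the instruction above.

```python
# Hypothetical helper; not part of the repository.
import shutil
import urllib.request
from pathlib import Path

# Download location published by fastText for its language identification model.
FASTTEXT_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def download_if_missing(url: str, destination: Path) -> Path:
    """Download url to destination unless the file already exists."""
    if destination.exists():
        return destination
    destination.parent.mkdir(parents=True, exist_ok=True)
    partial = destination.with_name(destination.name + ".part")
    # Stream to a temporary name first so an interrupted download
    # never leaves a truncated file under the final name.
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(destination)
    return destination
```

Calling `download_if_missing(FASTTEXT_URL, Path("lid.176.bin"))` from the repository root would fetch the file once and skip the download on later runs. The Hugging Face model usually needs no manual step: the transformers library downloads and caches it the first time it is loaded with `from_pretrained`.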

Usage

Training the Model

To train the model, customize the input_size, hidden_size, and output_size in train.py based on your data, then run:

python train.py
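How the three sizes might follow from the data can be sketched as below. The `infer_sizes` helper, the sample format of (feature list, label) pairs, and the default hidden size of 64 are assumptions for illustration; train.py may organize its data differently.

```python
# Hypothetical sketch; not taken from train.py.
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, language label)


def infer_sizes(samples: List[Sample], hidden_size: int = 64) -> Tuple[int, int, int]:
    """Return (input_size, hidden_size, output_size) for a dataset."""
    if not samples:
        raise ValueError("need at least one sample")
    lengths = {len(features) for features, _ in samples}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent feature lengths: {sorted(lengths)}")
    # Input size is the shared feature length; output size is the label count.
    input_size = lengths.pop()
    output_size = len({label for _, label in samples})
    return input_size, hidden_size, output_size


# Made-up toy data: 4 features per sample, 3 distinct languages.
toy = [
    ([0.9, 0.1, 0.0, 0.0], "fr"),
    ([0.1, 0.8, 0.1, 0.0], "en"),
    ([0.0, 0.1, 0.9, 0.0], "es"),
    ([0.8, 0.1, 0.1, 0.0], "fr"),
]
print(infer_sizes(toy))  # (4, 64, 3)
```

In this sketch the hidden size is a free hyperparameter, while the other two values are fixed by the feature length and the number of distinct labels.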

Using the Model

After training, you can use the model for language identification:

import torch

from language_identification_model import LanguageIdentificationModel

# input_size, hidden_size, and output_size must match the values used in train.py.
model = LanguageIdentificationModel(input_size, hidden_size, output_size)
# Load the trained weights, then switch to inference mode.
model.load_state_dict(torch.load('model_weights.bin'))
model.eval()

text = "Bonjour tout le monde"
language = model.predict(text)
print(f"The identified language is: {language}")
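This card does not show how predict turns the network's outputs into a language code. One common approach, sketched here in plain Python, is to apply a softmax and take the highest-scoring index. The `decode_prediction` helper, the language order, and the example scores are assumptions for illustration; the real predict method may work differently.

```python
# Hypothetical sketch; the real predict() may work differently.
import math
from typing import List, Sequence, Tuple

# Language codes from this model card's metadata; the index order is an assumption.
LANGUAGES: List[str] = ["en", "ar", "fr", "es", "pt", "ja", "it", "de", "ru", "zh"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode_prediction(logits: Sequence[float]) -> Tuple[str, float]:
    """Return the most probable language code and its probability."""
    if len(logits) != len(LANGUAGES):
        raise ValueError("expected one score per language")
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LANGUAGES[best], probabilities[best]


# Made-up output scores in which index 2 ("fr") is the largest.
language, confidence = decode_prediction(
    [0.1, -1.0, 4.2, 0.3, 0.0, -2.0, 0.5, 0.2, -0.7, -1.5]
)
print(language)  # fr
```

Returning the probability alongside the code lets a caller reject low-confidence predictions, for example on very short inputs.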

Contributing

Contributions are welcome! If you have any suggestions, improvements, or bug fixes, please submit a pull request or open an issue.

License

This project is licensed under the MIT License - see the LICENSE file for details.
