English to Hindi Text Translation using Transformers

This project demonstrates a simple model that translates English text to Hindi using the Hugging Face Transformers library. It builds on a pre-trained sequence-to-sequence model rather than training a translation model from scratch.

Project Overview

Text translation is an essential task in natural language processing, and this project aims to provide a practical example of building and deploying a translation model. The project covers the following aspects:

  • Data preprocessing: Tokenization and dataset preparation.
  • Model training: Training a sequence-to-sequence model for English-to-Hindi translation.
  • Model testing: Translating text using the trained model.
  • User interface: Creating a user-friendly interface for text translation.

Installation

To run this project, you'll need the following dependencies:

  • Python 3.x
  • TensorFlow
  • Hugging Face Transformers
  • Datasets library
  • Gradio

You can install the required libraries using the following shell command:

pip install datasets "transformers[sentencepiece]" tensorflow gradio -q

Usage

Check out the app here. Enter an English sentence or passage in the input textbox, and the output box shows the Hindi translation. Some examples are provided further down for you to try.

Model Training and Dataset

To train the text translation model, follow the steps below. You can check out the pre-trained model here and the dataset here.

  • Download the pre-trained model using the Transformers library in Python.
  • Load the cfilt/iitb-english-hindi dataset using the Datasets library.
  • Initialize the model, tokenizer, and preprocessing function.
  • Tokenize the dataset and prepare the training and validation data.
  • Compile the model with the Adam optimizer and the required parameters.
  • Train the model for the desired number of epochs.
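
The training steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the function names, the `model_checkpoint` argument (the pre-trained model is linked above but not named here, so it is left as a parameter), and the default hyperparameters are all assumptions. It also assumes each dataset record has the form {"translation": {"en": ..., "hi": ...}} and that the dataset exposes "train" and "validation" splits.

```python
def preprocess_function(examples, tokenizer, max_length=128):
    """Tokenize a batch of English/Hindi sentence pairs for seq2seq training."""
    # Assumed record layout: {"translation": {"en": "...", "hi": "..."}}.
    inputs = [pair["en"] for pair in examples["translation"]]
    targets = [pair["hi"] for pair in examples["translation"]]
    # text_target asks the tokenizer to also tokenize the Hindi side as labels.
    return tokenizer(
        inputs, text_target=targets, max_length=max_length, truncation=True
    )


def train(model_checkpoint, epochs=1, batch_size=16, learning_rate=2e-5):
    """Fine-tune a pre-trained seq2seq checkpoint (hyperparameters are placeholders)."""
    # Heavy imports live here so preprocess_function can be used on its own.
    import tensorflow as tf
    from datasets import load_dataset
    from transformers import (
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        TFAutoModelForSeq2SeqLM,
    )

    raw = load_dataset("cfilt/iitb-english-hindi")
    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

    # Tokenize every split and drop the raw text column.
    tokenized = raw.map(
        lambda batch: preprocess_function(batch, tokenizer),
        batched=True,
        remove_columns=raw["train"].column_names,
    )

    # The collator pads each batch dynamically and prepares decoder inputs.
    collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")
    train_set = model.prepare_tf_dataset(
        tokenized["train"], batch_size=batch_size, shuffle=True, collate_fn=collator
    )
    val_set = model.prepare_tf_dataset(
        tokenized["validation"], batch_size=batch_size, shuffle=False, collate_fn=collator
    )

    # Transformers TF models fall back to their built-in loss when none is given.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate))
    model.fit(train_set, validation_data=val_set, epochs=epochs)
    return model, tokenizer
```

Calling `train("<your-checkpoint>", epochs=3)` would run the whole pipeline, where the checkpoint name is whichever pre-trained model you chose. The full dataset is large, so selecting a subset of the training split first may be more practical on limited hardware.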

Model Testing and Deployment

To test the trained model and deploy a user interface:

  • Save the trained model at a preferred location.
  • Load the model and tokenizer from that location for testing.
  • Translate sample input text using the model.
  • Deploy a Gradio interface for user-friendly translation.
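
As a rough sketch of the testing and deployment steps above, the snippet below loads a saved model and wraps it in a Gradio interface. The function names, the `model_dir` argument, the textbox labels, and the `max_new_tokens` default are assumptions for illustration; it presumes the model and tokenizer were both saved to the same directory with `save_pretrained`.

```python
def translate(text, tokenizer, model, max_new_tokens=128):
    """Translate one English string to Hindi with a seq2seq model."""
    # Tokenize as a batch of one and return TensorFlow tensors.
    inputs = tokenizer([text], return_tensors="tf")
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (only) sequence and strip padding/end-of-sequence tokens.
    return tokenizer.decode(generated[0], skip_special_tokens=True)


def build_interface(model_dir):
    """Load the saved model and tokenizer and wrap them in a Gradio interface."""
    # Imports are local so translate() can be tested without these packages.
    import gradio as gr
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_dir)

    return gr.Interface(
        fn=lambda text: translate(text, tokenizer, model),
        inputs=gr.Textbox(label="English text"),
        outputs=gr.Textbox(label="Hindi translation"),
    )
```

With a model saved via `model.save_pretrained(model_dir)` and `tokenizer.save_pretrained(model_dir)`, calling `build_interface(model_dir).launch()` starts a local server and prints the URL to open in a browser.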

User Interface

The Gradio interface provides an interactive way to translate English text to Hindi. To use the interface:

  • Run the project and navigate to the specified URL.
  • Enter English text in the input box.
  • Check the translated Hindi text in the output box.

Challenges Faced

  • Searched through many resources on Google and other platforms to find the best dataset for the project.
  • Spent a lot of time gathering the right resources to understand transformers, LLMs, and Gradio.

Contributions

Contributions to this project are welcome! Here are some ways you can contribute:

  • Improve the model's translation quality and performance.
  • Enhance the user interface for a better user experience.
  • Add support for more languages and translation directions.

To contribute, follow these steps:

  • Fork this repository.
  • Create a new branch for your feature or bug fix.
  • Commit your changes and push them to your fork.
  • Open a pull request with a detailed description of your changes.