# English to Hindi Text Translation using Transformers

This project showcases a simple text translation model that translates English text to Hindi using the Hugging Face Transformers library. The model uses a pre-trained sequence-to-sequence architecture for accurate and efficient translation.

## Table of Contents

- [Project Overview](#project-overview)
- [Installation](#installation)
- [Usage](#usage)
- [Model Training and Dataset](#model-training-and-dataset)
- [Model Testing and Deployment](#model-testing-and-deployment)
- [User Interface](#user-interface)
- [Challenges Faced](#challenges-faced)
- [Contributions](#contributions)

## Project Overview

Text translation is an essential task in natural language processing, and this project aims to provide a practical example of building and deploying a translation model. The project covers the following aspects:

- Data preprocessing: Tokenization and dataset preparation.
- Model training: Training a sequence-to-sequence model for English-to-Hindi translation.
- Model testing: Translating text using the trained model.
- User interface: Creating a user-friendly interface for text translation.

## Installation

To run this project, you'll need the following dependencies:

- Python 3.x
- TensorFlow
- Hugging Face Transformers
- Datasets library
- Gradio

You can install the required libraries using the following shell command:

```shell
pip install datasets "transformers[sentencepiece]" tensorflow gradio -q
```

## Usage

Try the app [here](https://huggingface.co/spaces/Lohith9923/En-Hi-Translation). Enter an English sentence or passage in the input textbox, and the output box shows the Hindi translation. Some example inputs are provided further down the page for quick testing.

## Model Training and Dataset

The translation model was trained by starting from a pre-trained checkpoint and fine-tuning it on a parallel English-Hindi dataset.
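As a rough illustration of the preprocessing and training workflow this section covers, the sketch below tokenizes English-Hindi pairs and fine-tunes the checkpoint with Adam. It is not the project's actual code. The function names, `MAX_LENGTH`, the batch size, the learning rate, and the `en-hi-model` output directory are assumptions made for illustration. It also assumes a Transformers release that still ships the TensorFlow classes, and that each dataset row carries a `translation` dict with `en` and `hi` keys.

```python
from typing import Any, Dict, List

CHECKPOINT = "Helsinki-NLP/opus-mt-en-hi"  # pre-trained model named in this README
DATASET = "cfilt/iitb-english-hindi"       # dataset named in this README
MAX_LENGTH = 128                           # assumed truncation length


def preprocess(examples: Dict[str, List[Dict[str, str]]], tokenizer: Any) -> Any:
    """Tokenize a batch shaped like {"translation": [{"en": ..., "hi": ...}]}."""
    sources = [pair["en"] for pair in examples["translation"]]
    targets = [pair["hi"] for pair in examples["translation"]]
    # text_target tokenizes the Hindi side and stores it under "labels".
    return tokenizer(sources, text_target=targets,
                     max_length=MAX_LENGTH, truncation=True)


def fine_tune(epochs: int = 1, batch_size: int = 16,
              learning_rate: float = 2e-5, output_dir: str = "en-hi-model"):
    """Download the checkpoint and dataset, fine-tune, and save the result."""
    # Heavy imports stay inside the function so importing this file is cheap.
    import tensorflow as tf
    from datasets import load_dataset
    from transformers import (AutoTokenizer, DataCollatorForSeq2Seq,
                              TFAutoModelForSeq2SeqLM)

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

    raw = load_dataset(DATASET)
    tokenized = raw.map(lambda batch: preprocess(batch, tokenizer),
                        batched=True,
                        remove_columns=raw["train"].column_names)

    # The collator pads each batch dynamically and returns TensorFlow tensors.
    collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")
    train_set = model.prepare_tf_dataset(tokenized["train"], batch_size=batch_size,
                                         shuffle=True, collate_fn=collator)
    val_set = model.prepare_tf_dataset(tokenized["validation"], batch_size=batch_size,
                                       shuffle=False, collate_fn=collator)

    # No loss argument: Transformers TF models fall back to their built-in loss.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate))
    model.fit(train_set, validation_data=val_set, epochs=epochs)

    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    return model, tokenizer
```

Keeping `preprocess` separate from `fine_tune` means the tokenization logic can be checked on a handful of rows before committing to a full training run, which may take a long time on the complete dataset.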
The pre-trained model is available [here](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2FHelsinki-NLP%2Fopus-mt-en-hi) and the dataset [here](https://huggingface.co/datasets/cfilt/iitb-english-hindi/viewer/cfilt--iitb-english-hindi). The training workflow is:

- Download the pre-trained model using the **transformers** library in Python.
- Load the dataset **cfilt/iitb-english-hindi** using the **Datasets** library in Python.
- Initialize the model, tokenizer, and preprocessing function.
- Tokenize the dataset and prepare the training and validation data.
- Compile the model with the **Adam** optimizer and the required parameters.
- Train the model for the desired number of epochs.

## Model Testing and Deployment

To test the trained model and deploy a user interface:

- Save the trained model at a preferred location.
- Load the model and tokenizer from that location for testing.
- Translate sample input text using the model.
- Deploy a Gradio interface for user-friendly translation.

## User Interface

The Gradio interface provides an interactive way to translate English text to Hindi. To use the interface:

- Run the project and navigate to the specified URL.
- Enter English text in the input box.
- Read the translated Hindi text in the output box.

## Challenges Faced

- Searched through many resources on Google and other platforms to find the best dataset for the project.
- Spent a lot of time gathering the right resources to understand transformers, LLMs, and Gradio.

## Contributions

Contributions to this project are welcome! Here are some ways you can contribute:

- Improve the model's translation quality and performance.
- Enhance the user interface for a better user experience.
- Add support for more languages and translation directions.

To contribute, follow these steps:

- Fork this repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and push them to your fork.
- Open a pull request with a detailed description of your changes.
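
For contributors who want a starting point, the steps listed under Model Testing and Deployment and User Interface could look roughly like the sketch below. It is an illustration under stated assumptions, not the code behind the hosted app: `MODEL_DIR` is a hypothetical save location, `MAX_NEW_TOKENS` and the textbox labels are arbitrary choices, and the TensorFlow model classes are assumed to be available in the installed Transformers version.

```python
from typing import Any

MODEL_DIR = "en-hi-model"  # hypothetical path where the trained model was saved
MAX_NEW_TOKENS = 128       # assumed cap on the length of the generated Hindi text


def translate(text: str, tokenizer: Any, model: Any) -> str:
    """Translate one English string to Hindi with an already-loaded model."""
    text = text.strip()
    if not text:
        return ""  # nothing to translate
    inputs = tokenizer(text, return_tensors="tf", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=MAX_NEW_TOKENS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def build_demo(model_dir: str = MODEL_DIR):
    """Load the saved model and wrap it in a Gradio text-to-text interface."""
    # Heavy imports stay inside the function so importing this file is cheap.
    import gradio as gr
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_dir)

    return gr.Interface(
        fn=lambda text: translate(text, tokenizer, model),
        inputs=gr.Textbox(label="English"),
        outputs=gr.Textbox(label="Hindi"),
        title="English to Hindi Translation",
    )
```

Calling `build_demo().launch()` would start a local server and print the URL to open in a browser. Because `translate` takes the tokenizer and model as arguments, it can be exercised with lightweight stand-ins before any weights are downloaded.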