	# English to Hindi Text Translation using Transformers

	This project showcases a simple text translation model that translates English text to Hindi using the Hugging Face Transformers library. The model builds on a pre-trained sequence-to-sequence checkpoint (Helsinki-NLP/opus-mt-en-hi), which is then trained further on an English-Hindi dataset.

	## Table of Contents

	- [Project Overview](#project-overview)
	- [Installation](#installation)
	- [Usage](#usage)
	- [Model Training and Dataset](#model-training-and-dataset)
	- [Model Testing and Deployment](#model-testing-and-deployment)
	- [User Interface](#user-interface)
	- [Challenges Faced](#challenges-faced)
	- [Contributions](#contributions)

	## Project Overview

	Text translation is an essential task in natural language processing, and this project aims to provide a practical example of building and deploying a translation model. The project covers the following aspects:

	- Data preprocessing: Tokenization and dataset preparation.
	- Model training: Training a sequence-to-sequence model for English-to-Hindi translation.
	- Model testing: Translating text using the trained model.
	- User interface: Creating a user-friendly interface for text translation.

	## Installation

	To run this project, you'll need the following dependencies:

	- Python 3.x
	- TensorFlow
	- Hugging Face Transformers
	- Datasets library
	- Gradio

	You can install the required libraries using the following shell command:

	```shell
	pip install datasets transformers[sentencepiece] tensorflow gradio -q
	```
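As an optional sanity check after installation, the following minimal sketch reports which of the libraries listed above cannot be found in the current environment. The helper name `missing_modules` and the module list are illustrative assumptions, not part of this project's code.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (assumed to match the pip package names).
REQUIRED_MODULES = ["tensorflow", "transformers", "datasets", "gradio", "sentencepiece"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located in this environment."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All dependencies found.")
```

If anything is reported as missing, re-run the `pip install` command above in the same environment.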

	## Usage
	Download the project folder and then run the following command:

	```shell
	python3 app.py
	```
	After running this command, open the URL printed in the terminal to reach the Gradio interface.

	## Model Training and Dataset
	The translation model is trained starting from a pre-trained checkpoint. You can check out the pre-trained model [here](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2FHelsinki-NLP%2Fopus-mt-en-hi) and the dataset [here](https://huggingface.co/datasets/cfilt/iitb-english-hindi/viewer/cfilt--iitb-english-hindi).

	- Download the pre-trained model using the Transformers library in Python.
	- Load the dataset `cfilt/iitb-english-hindi` using the Datasets library in Python.
	- Initialize the model, the tokenizer, and the preprocessing function.
	- Tokenize the dataset and prepare the training and validation data.
	- Compile the model with the Adam optimizer and the required parameters.
	- Train the model for the desired number of epochs.
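The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the function names, `MAX_LENGTH`, the batch size, and the learning rate are assumptions chosen for the example, and the sketch assumes a Transformers release that still ships the TensorFlow model classes.

```python
MODEL_CHECKPOINT = "Helsinki-NLP/opus-mt-en-hi"
DATASET_NAME = "cfilt/iitb-english-hindi"
MAX_LENGTH = 128  # assumed truncation length, not taken from the project


def preprocess_batch(examples, tokenizer, max_length=MAX_LENGTH):
    """Tokenize the English inputs and Hindi targets of one batch."""
    # Each item in the "translation" column is a dict like {"en": ..., "hi": ...}.
    sources = [pair["en"] for pair in examples["translation"]]
    targets = [pair["hi"] for pair in examples["translation"]]
    return tokenizer(sources, text_target=targets,
                     max_length=max_length, truncation=True)


def fine_tune(epochs=1, batch_size=16, learning_rate=2e-5):
    """Download the checkpoint and dataset, then train with Adam."""
    # Heavy imports are kept local so the helper above can be reused on its own.
    import tensorflow as tf
    from datasets import load_dataset
    from transformers import (AutoTokenizer, DataCollatorForSeq2Seq,
                              TFAutoModelForSeq2SeqLM)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)
    # If the checkpoint has no TensorFlow weights, from_pt=True may be needed
    # (that option requires PyTorch to be installed).
    model = TFAutoModelForSeq2SeqLM.from_pretrained(MODEL_CHECKPOINT)

    raw = load_dataset(DATASET_NAME)
    tokenized = raw.map(
        lambda batch: preprocess_batch(batch, tokenizer),
        batched=True,
        remove_columns=raw["train"].column_names,
    )

    # The collator pads each batch and prepares decoder inputs from the labels.
    collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")
    train_set = model.prepare_tf_dataset(
        tokenized["train"], batch_size=batch_size, shuffle=True, collate_fn=collator)
    val_set = model.prepare_tf_dataset(
        tokenized["validation"], batch_size=batch_size, shuffle=False, collate_fn=collator)

    # With no explicit loss, the Transformers TF model uses its built-in loss.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate))
    model.fit(train_set, validation_data=val_set, epochs=epochs)
    return model, tokenizer
```

Calling `fine_tune(epochs=1)` would download the model and the full dataset, so for a first experiment it may be worth training on a small subset of the training split.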

	## Model Testing and Deployment
	To test the trained model and deploy a user interface:

	- Save the trained model to a location of your choice.
	- Load the model and tokenizer from that location for testing.
	- Translate sample input text using the model.
	- Deploy a Gradio interface for user-friendly translation.
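A minimal sketch of the save, load, and translate steps is shown below. The helper names (`save_artifacts`, `load_artifacts`, `translate`), the `max_new_tokens` value, and any directory passed in are hypothetical; the sketch assumes the model was trained with the TensorFlow classes from Transformers.

```python
def save_artifacts(model, tokenizer, model_dir):
    """Write the trained model and its tokenizer to `model_dir`."""
    model.save_pretrained(model_dir)
    tokenizer.save_pretrained(model_dir)


def load_artifacts(model_dir):
    """Reload a model and tokenizer previously saved to `model_dir`."""
    # Imported lazily so the other helpers work without Transformers loaded.
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_dir)
    return model, tokenizer


def translate(text, model, tokenizer, max_new_tokens=128):
    """Translate one English string to Hindi with the given model."""
    inputs = tokenizer([text], return_tensors="tf")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns one sequence of token ids per input; decode the first.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, after `model, tokenizer = load_artifacts("tf_model/")` (an assumed directory name), `translate("How are you?", model, tokenizer)` would return the model's Hindi output for that sentence.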

	## User Interface

	The Gradio interface provides an interactive way to translate English text to Hindi. To use the interface:

	- Run the project and navigate to the specified URL.
	- Enter English text in the input box.
	- Read the translated Hindi text in the output box.
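An interface like the one described above could be wired up along these lines. This is a sketch under assumptions rather than the contents of `app.py`: `make_handler` and `build_interface` are hypothetical names, and `translate_fn` stands for any callable that maps an English string to its Hindi translation.

```python
def make_handler(translate_fn):
    """Wrap a translation callable so blank input yields an empty string."""
    def handler(text):
        text = (text or "").strip()
        return translate_fn(text) if text else ""
    return handler


def build_interface(translate_fn):
    """Create a Gradio interface with one input box and one output box."""
    import gradio as gr  # imported lazily; only needed when building the UI

    return gr.Interface(
        fn=make_handler(translate_fn),
        inputs=gr.Textbox(lines=3, label="English text"),
        outputs=gr.Textbox(label="Hindi translation"),
        title="English to Hindi Translation",
    )
```

Calling `build_interface(my_translate).launch()` starts a local server and prints the URL to open in a browser, where `my_translate` is whatever translation function the app provides.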

	## Challenges Faced

	- Searching Google and other platforms for the most suitable dataset for the project took considerable effort.
	- Gathering reliable resources for understanding transformers, LLMs, and Gradio took a lot of time.

	## Contributions
	Contributions to this project are welcome! Here are some ways you can contribute:

	- Improve the model's translation quality and performance.
	- Enhance the user interface for a better user experience.
	- Add support for more languages and translation directions.

	To contribute, follow these steps:

	- Fork this repository.
	- Create a new branch for your feature or bug fix.
	- Commit your changes and push them to your fork.
	- Open a pull request with a detailed description of your changes.