---
license: mit
datasets:
- HariprasathSB/tamil_summarization
language:
- en
- ta
tags:
- summarization
- translation
---
# Tamil Summarization and English-to-Tamil Translation Model

## Overview
This repository contains a model fine-tuned for both Tamil summarization and English-to-Tamil translation using the Hugging Face Transformers library. This README describes the model's capabilities and shows how to use it.

## Model Details
- **Model Name**: [suriya7/Tamil-Summarization](https://huggingface.co/suriya7/Tamil-Summarization)
- **Model Type**: Summarization, Translation
- **Framework**: Hugging Face Transformers
- **Original Model**: [Mr-Vicky-01/Fine_tune_english_to_tamil](https://huggingface.co/Mr-Vicky-01/Fine_tune_english_to_tamil)
- **Fine-tuning Dataset**: [HariprasathSB/tamil_summarization](https://huggingface.co/datasets/HariprasathSB/tamil_summarization)
- **Languages Supported**: English, Tamil

## Model Performance
![W&B Chart 23_3_2024, 11_46_59 pm.png](https://cdn-uploads.huggingface.co/production/uploads/65ae9249e50627e40c159b16/82PwF19H9V9o1CVoYuuJo.png)
## Usage
### Installation

You can install the necessary dependencies using pip:

```bash
pip install transformers
```
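
Depending on your environment, you may also need a PyTorch backend and the SentencePiece tokenizer package. This is an assumption based on typical seq2seq checkpoints, not a confirmed requirement of this model:

```bash
pip install torch sentencepiece
```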

### Inference

Below is an example of how to use the model for both summarization and translation tasks:
```python
# Load the tokenizer and model
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("suriya7/Tamil-Summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("suriya7/Tamil-Summarization")

# Example: English-to-Tamil translation
input_text = "This is an example English sentence."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=128)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Translated Tamil Sentence:", translated_text)

# Example: Tamil summarization
tamil_article = "தமிழ் உரையினை சுருக்கமாக சுருக்கமாக உரையிடுவது எப்படி?"
tamil_input_ids = tokenizer(tamil_article, return_tensors="pt", truncation=True).input_ids
summary_ids = model.generate(tamil_input_ids, max_length=128)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summarized Tamil Text:", summary)
```
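
The same checkpoint can also be driven through the generic `text2text-generation` pipeline. The sketch below assumes the checkpoint is pipeline-compatible; it is a convenience alternative, not the documented interface for this model.

```python
# A minimal sketch using the generic text2text-generation pipeline.
# Assumes this checkpoint is pipeline-compatible; adjust max_length as needed.
from transformers import pipeline

generator = pipeline("text2text-generation", model="suriya7/Tamil-Summarization")

result = generator("This is an example English sentence.", max_length=128)
print(result[0]["generated_text"])
```
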
## Model Output
- For translation tasks, the model outputs the translated text in Tamil.
- For summarization tasks, the model outputs a summarized version of the input Tamil text.
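
Output length and decoding behaviour can be adjusted through standard `generate()` arguments. The sketch below is illustrative only; the specific values (`min_length`, `num_beams`) are assumptions and have not been tuned for this model.

```python
# Illustrative only: standard generation knobs applied to the summarization example.
# The values below are assumptions, not recommended settings for this model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("suriya7/Tamil-Summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("suriya7/Tamil-Summarization")

inputs = tokenizer(
    "தமிழ் உரையினை சுருக்கமாக சுருக்கமாக உரையிடுவது எப்படி?",
    return_tensors="pt",
    truncation=True,
)
summary_ids = model.generate(
    inputs.input_ids,
    max_length=128,  # upper bound on generated tokens
    min_length=10,   # avoid overly short summaries
    num_beams=4,     # beam search instead of greedy decoding
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```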