MounikaAithagoni committed f609538 (parent: 0dec95d): Update README.md
---
license: mit
datasets:
- saillab/taco-datasets
language:
- ar
- en
---
# Arabic Translator: Machine Learning Model

This repository contains a machine learning model designed to translate text into Arabic. The model is trained on a custom dataset and fine-tuned to optimize translation accuracy while balancing training and validation performance.
## Overview

The model is built using deep learning techniques to translate text effectively. It was trained and validated using loss metrics to monitor performance over multiple epochs. The training process is visualized through loss curves that demonstrate learning progress and highlight overfitting challenges.
### Key Features

- Language Support: Translates text into Arabic.
- Model Architecture: Based on [model architecture used, e.g., Transformer, RNN, etc.].
- Preprocessing: Includes tokenization and encoding steps for handling Arabic script.
- Evaluation: Monitored with training and validation loss for consistent improvement.
## How to Use

### Installation

Clone this repository:

```bash
git clone https://huggingface.co/MounikaAithagoni/Traanslator
cd Traanslator
```
Install dependencies:

```bash
pip install -r requirements.txt
```
### Model Inference

The snippet below fills in the elided loading and input lines with standard `transformers` calls; the `AutoModelForSeq2SeqLM` class and the sample sentence are assumptions, so substitute the model class actually used:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the model and tokenizer (repo id taken from the clone URL above)
model = AutoModelForSeq2SeqLM.from_pretrained("MounikaAithagoni/Traanslator")
tokenizer = AutoTokenizer.from_pretrained("MounikaAithagoni/Traanslator")

# Encode the input text and generate the translation
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Translation: {translation}")
```
## Training Details

- Training Loss: Decreased steadily across epochs, indicating effective learning.
- Validation Loss: Decreased initially but plateaued later, suggesting overfitting beyond epoch 5.
- Epochs: Trained for 10 epochs with an early stopping mechanism.
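The early stopping mechanism mentioned above can be sketched as follows. This is a minimal illustration, not the actual training code; the patience value and the loss numbers are assumptions:

```python
def should_stop_early(val_losses, patience=3):
    """Return True once validation loss has failed to improve
    for `patience` consecutive epochs."""
    if len(val_losses) <= patience:
        return False
    best = min(val_losses[:-patience])
    # Stop if none of the last `patience` epochs beat the earlier best.
    return all(loss >= best for loss in val_losses[-patience:])

# Example: validation loss improves for 5 epochs, then plateaus,
# mirroring the behaviour described in Training Details.
losses = [2.0, 1.5, 1.2, 1.0, 0.9, 0.95, 0.96, 0.97]
stop_epoch = next(
    (i + 1 for i in range(len(losses)) if should_stop_early(losses[: i + 1])),
    None,
)
print(stop_epoch)  # stops at epoch 8, three epochs past the best loss
```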
## Dataset

Source: https://huggingface.co/datasets/saillab/taco-datasets/tree/main/multilingual-instruction-tuning-dataset%20/multilingual-alpaca-52k-gpt-4

The model was trained on a custom dataset tailored for Arabic translation. Preprocessing steps included:

- Tokenizing and encoding text data.

For details on the dataset format, refer to the data/ folder.
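As an illustration of the tokenizing-and-encoding step, the toy whitespace tokenizer below builds an integer vocabulary from a two-sentence corpus. It is a sketch only; the real pipeline uses the model's own tokenizer, and the special tokens and corpus here are made up:

```python
def build_vocab(sentences, specials=("<pad>", "<unk>")):
    """Assign an integer id to every token seen in the corpus."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Map tokens to ids, falling back to <unk> for unseen tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]

corpus = ["مرحبا بالعالم", "مرحبا بك"]
vocab = build_vocab(corpus)
print(encode("مرحبا بالعالم", vocab))  # → [2, 3]
```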
## Evaluation

- Metrics: Training and validation loss monitored.
- Performance: Shows good initial generalization, with validation loss increasing slightly after the 5th epoch, signaling overfitting.
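The evaluation above rests on reading the loss curves; a small helper like the following can flag the epoch where validation loss first turns upward. The loss values here are invented for illustration and are not the model's actual numbers:

```python
def first_overfit_epoch(val_losses):
    """Return the 1-based epoch at which validation loss first rises
    above the previous epoch's value, or None if it never does."""
    for i in range(1, len(val_losses)):
        if val_losses[i] > val_losses[i - 1]:
            return i + 1
    return None

# Made-up curve: improvement for 5 epochs, then a slight rise.
val_losses = [1.8, 1.4, 1.1, 0.9, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05]
print(first_overfit_epoch(val_losses))  # → 6, i.e. rising after epoch 5
```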
## Future Improvements

- Implement techniques to address overfitting, such as regularization or data augmentation.
- Fine-tune on larger, more diverse datasets for better generalization.
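One of the suggested remedies, data augmentation, could start as simply as randomly dropping words from source sentences. This is a minimal sketch, not part of the repository; the dropout probability is arbitrary:

```python
import random

def word_dropout(sentence, p=0.1, seed=None):
    """Randomly remove each token with probability p, keeping at least one."""
    rng = random.Random(seed)
    tokens = sentence.split()
    kept = [tok for tok in tokens if rng.random() >= p]
    return " ".join(kept) if kept else tokens[0]

# Seeded for reproducibility; one word is dropped from the example.
augmented = word_dropout("this is a longer training sentence", p=0.3, seed=0)
print(augmented)
```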