AiresPucrs/GRU-eng-por

Tags: Translation, Keras, TF-Keras, English, Portuguese

dieineb committed on Dec 8, 2023

Commit 9c8ef39 (1 parent: 5bedb03)

Update README.md

Files changed (1)

README.md +155 -11

@@ -2,25 +2,169 @@
 library_name: keras
 tags:
 - translation
+license: apache-2.0
 ---
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Model Plot
-<details>
-<summary>View Model Plot</summary>
-![Model Image](./model.png)
-</details>
+# Model Description
+GRU-eng-por is a sequence-to-sequence recurrent (GRU) model that translates English into Portuguese.
+It was trained on an English-to-Portuguese translation dataset.
+## Details
+- Size: 42,554,912 parameters
+- Dataset: [`English-to-Portuguese`](https://www.kaggle.com/datasets/nageshsingh/englishportuguese-translation)
+- Languages: English, Portuguese
+- Number of Training Steps: 15
+- Batch size: 32
+- Optimizer: rmsprop
+- Learning Rate: 0.001
+- GPU: T4
+- Training code: the [notebook](https://github.com/Nkluge-correa/teeny-tiny_castle/blob/master/ML%20Intro%20Course/16_sequence_to_sequence.ipynb) used to train this model is available in the Teeny-Tiny Castle repository.
+## Usage
+```
+!pip install huggingface_hub["tensorflow"] -q
+from huggingface_hub import from_pretrained_keras
+from huggingface_hub import hf_hub_download
+import tensorflow as tf
+import numpy as np
+import string
+import re
+# Select characters to strip, but preserve "[" and "]" so the [start] and [end] tokens survive
+strip_chars = string.punctuation
+strip_chars = strip_chars.replace("[", "")
+strip_chars = strip_chars.replace("]", "")
+def custom_standardization(input_string):
+    lowercase = tf.strings.lower(input_string)
+    return tf.strings.regex_replace(lowercase, f"[{re.escape(strip_chars)}]", "")
+# Load the trained `seq2seq_rnn` model from the Hub
+seq2seq_rnn = from_pretrained_keras("AiresPucrs/GRU-eng-por")
+# Load the Portuguese vocabulary
+portuguese_vocabulary_path = hf_hub_download(
+    repo_id="AiresPucrs/GRU-eng-por",
+    filename="portuguese_vocabulary.txt",
+    repo_type='model',
+    local_dir="./")
+# Load the English vocabulary
+english_vocabulary_path = hf_hub_download(
+    repo_id="AiresPucrs/GRU-eng-por",
+    filename="english_vocabulary.txt",
+    repo_type='model',
+    local_dir="./")
+# One token per line; the `with` block closes each file automatically
+with open(portuguese_vocabulary_path, encoding='utf-8', errors='backslashreplace') as fp:
+    portuguese_vocab = [line.strip() for line in fp]
+with open(english_vocabulary_path, encoding='utf-8', errors='backslashreplace') as fp:
+    english_vocab = [line.strip() for line in fp]
+# Initialize the vectorizers with the learned vocabularies
+target_vectorization = tf.keras.layers.TextVectorization(max_tokens=20000,
+                                        output_mode="int",
+                                        output_sequence_length=21,
+                                        standardize=custom_standardization,
+                                        vocabulary=portuguese_vocab)
+source_vectorization = tf.keras.layers.TextVectorization(max_tokens=20000,
+                                        output_mode="int",
+                                        output_sequence_length=20,
+                                        vocabulary=english_vocab)
+# Map each integer token index to its Portuguese word
+portuguese_index_lookup = dict(enumerate(portuguese_vocab))
+max_decoded_sentence_length = 20
+def decode_sequence(input_sentence):
+    """
+    Decodes a sequence using a trained seq2seq RNN model.
+    Args:
+        input_sentence (str): the input sentence to be decoded
+    Returns:
+        decoded_sentence (str): the decoded sentence
+            generated by the model
+    """
+    tokenized_input_sentence = source_vectorization([input_sentence])
+    decoded_sentence = "[start]"
+    for i in range(max_decoded_sentence_length):
+        tokenized_target_sentence = target_vectorization([decoded_sentence])
+        next_token_predictions = seq2seq_rnn.predict([tokenized_input_sentence, tokenized_target_sentence], verbose=0)
+        sampled_token_index = np.argmax(next_token_predictions[0, i, :])
+        sampled_token = portuguese_index_lookup[sampled_token_index]
+        decoded_sentence += " " + sampled_token
+        if sampled_token == "[end]":
+            break
+    return decoded_sentence
+eng_sentences = ["What is its name?",
+                 "How old are you?",
+                 "I know you know where Mary is.",
+                 "We will show Tom.",
+                 "What do you all do?",
+                 "Don't do it!"]
+for sentence in eng_sentences:
+    print(f"English sentence:\n{sentence}")
+    print(f'Portuguese translation:\n{decode_sequence(sentence)}')
+    print('-' * 50)
+```
+This will output the following:
+```
+English sentence:
+What is its name?
+Portuguese translation:
+[start] qual é o nome [end]
+--------------------------------------------------
+English sentence:
+How old are you?
+Portuguese translation:
+[start] quantos anos você tem [end]
+--------------------------------------------------
+English sentence:
+I know you know where Mary is.
+Portuguese translation:
+[start] eu sei que você sabe onde maria está [end]
+--------------------------------------------------
+English sentence:
+We will show Tom.
+Portuguese translation:
+[start] nós vamos tom [end]
+--------------------------------------------------
+English sentence:
+What do you all do?
+Portuguese translation:
+[start] o que vocês faz [end]
+--------------------------------------------------
+English sentence:
+Don't do it!
+Portuguese translation:
+[start] não faça isso [end]
+--------------------------------------------------
+```
+# Cite as 🤗
+```
+@misc{teenytinycastle,
+    doi = {10.5281/zenodo.7112065},
+    url = {https://huggingface.co/AiresPucrs/GRU-eng-por},
+    author = {Nicholas Kluge Corr{\^e}a},
+    title = {Teeny-Tiny Castle},
+    year = {2023},
+    publisher = {HuggingFace},
+    journal = {HuggingFace repository},
+}
+```
+## License
+GRU-eng-por is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
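The target-side `custom_standardization` in the diff lowercases text and strips ASCII punctuation except `[` and `]`, so the `[start]` and `[end]` markers remain intact as tokens. Below is a minimal pure-Python sketch of that rule, for illustration only. `custom_standardization_py` is a hypothetical name, and the function approximates the TensorFlow version rather than reproducing it: Python's `str.lower` is Unicode-aware, whereas `tf.strings.lower` with its default `encoding=''` lowercases ASCII only, so the two can differ on accented capitals.

```python
import re
import string

# Same character set as the model card: all ASCII punctuation except "[" and "]".
STRIP_CHARS = string.punctuation.replace("[", "").replace("]", "")
# Character class matching any one of those characters; re.escape keeps "-", "\\" and "^" literal.
STRIP_PATTERN = re.compile(f"[{re.escape(STRIP_CHARS)}]")


def custom_standardization_py(text: str) -> str:
    """Hypothetical pure-Python approximation of the card's `custom_standardization`."""
    return STRIP_PATTERN.sub("", text.lower())


# The bracketed markers survive; "!" and "?" are removed.
print(custom_standardization_py("[start] Não faça isso! [end]"))  # → [start] não faça isso [end]
print(custom_standardization_py("Qual é o nome?"))  # → qual é o nome
```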
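`decode_sequence` performs greedy decoding. It starts from `[start]`, re-encodes the partial Portuguese sentence, runs the model, takes the argmax of the prediction at position `i`, appends that word, and stops at `[end]` or after `max_decoded_sentence_length` steps. The sketch below isolates that control flow so it runs without TensorFlow. `greedy_decode`, `toy_predict`, and the toy vocabulary are hypothetical: `toy_predict` stands in for `seq2seq_rnn.predict` and replays a fixed word sequence instead of conditioning on an English input.

```python
import numpy as np


def greedy_decode(predict_fn, index_lookup, max_len=20):
    """Greedy (argmax) decoding loop modeled on `decode_sequence` in the card.

    predict_fn: hypothetical stand-in for the model call; maps the partial
        target sentence to an array of shape (steps, vocab_size).
    index_lookup: dict from token index to token string.
    """
    decoded = "[start]"
    for i in range(max_len):
        probs = predict_fn(decoded)  # scores for every target position
        token = index_lookup[int(np.argmax(probs[i]))]  # most likely token at step i
        decoded += " " + token
        if token == "[end]":  # stop as soon as the end marker is produced
            break
    return decoded


# Toy vocabulary in the Keras layout: "" (padding) and "[UNK]" come first.
vocab = ["", "[UNK]", "[start]", "[end]", "não", "faça", "isso"]
index_lookup = dict(enumerate(vocab))
script = ["não", "faça", "isso", "[end]"]  # what the toy "model" will emit


def toy_predict(partial_sentence, steps=21):
    """Scripted scores: position i puts all its mass on script[i] (input is ignored)."""
    probs = np.zeros((steps, len(vocab)))
    for i, word in enumerate(script):
        probs[i, vocab.index(word)] = 1.0
    return probs


print(greedy_decode(toy_predict, index_lookup))  # → [start] não faça isso [end]
```

Because the card's `decode_sequence` calls `seq2seq_rnn.predict` on the whole partial sentence at every step, the decoder is re-run from the beginning each time. This keeps the code simple at the cost of repeated computation.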
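The translations printed by the usage example keep the `[start]` and `[end]` markers. If only the Portuguese text is needed, a small post-processing step can drop them. `strip_special_tokens` is a hypothetical helper, not part of the card's code.

```python
def strip_special_tokens(decoded: str, special=("[start]", "[end]")) -> str:
    """Hypothetical helper: remove [start]/[end] markers from decode_sequence output."""
    return " ".join(tok for tok in decoded.split() if tok not in special)


print(strip_special_tokens("[start] quantos anos você tem [end]"))  # → quantos anos você tem
```

In the card's loop, this would be applied as `strip_special_tokens(decode_sequence(sentence))`.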