Update README.md
Enhanced from version 1.0 with a larger dataset.
### Default

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Step 1: Map every Chinese word in the original text to Sino-Vietnamese using the [map.json](https://huggingface.co/haruyuu/viT5_han-vie_v1.1/blob/main/map.json) file.
```python
import json

# Load the Chinese -> Sino-Vietnamese mapping table
with open('map.json', encoding='utf-8') as f:
    char_map = json.load(f)  # renamed from `map` to avoid shadowing the builtin

def mapping(text):
    # Replace each mapped Chinese character with its Sino-Vietnamese reading.
    # The loop body is elided in the diff; the replace() call below is an
    # assumed reconstruction of that step, not the author's exact code.
    for i in text:
        try:
            text = text.replace(i, char_map[i])
        except KeyError:
            continue
    return text.strip()

# Sample Chinese input, mapped to Sino-Vietnamese before generation
input_text = mapping('“ 早就知道叶微情是卧底了,于是将计就计,想要趁机嫁祸。 ” 的正确证物是:')
```
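Characters with no entry in the map fall through the `except` branch and are left unchanged, so the mapped string may still contain raw Chinese characters alongside the Sino-Vietnamese readings.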
Step 2: Load the model and generate a translation.
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = T5ForConditionalGeneration.from_pretrained('haruyuu/viT5_han-vie_v1.1')
tokenizer = T5Tokenizer.from_pretrained('haruyuu/viT5_han-vie_v1.1')

# Encode the mapped input and generate the Vietnamese translation
input_ids = tokenizer.encode(input_text, return_tensors="pt")
translated_ids = model.generate(input_ids)
translated_text = tokenizer.decode(translated_ids[0], skip_special_tokens=True)

print("Chinese Input:", input_text)
print("\nVietnamese Translation:", translated_text)
```
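One caveat: `generate()` uses the model's default length limit, so longer passages may come back truncated. An option is to pass an explicit token budget; the 256 below is an arbitrary illustrative value, not a documented setting for this model:

```python
# max_new_tokens caps only the generated continuation; 256 is an assumed,
# illustrative budget, not a value from the model card
translated_ids = model.generate(input_ids, max_new_tokens=256)
```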
## Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->