Update README.md
Enhanced from version 1.0 with a larger dataset.
### Default

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Step 1: Map every Chinese word in the original text to Sino-Vietnamese using the [map.json](https://huggingface.co/haruyuu/viT5_han-vie_v1.1/blob/main/map.json) file.
```python
import json

# Load the Chinese -> Sino-Vietnamese mapping table
with open('map.json', encoding='utf-8') as f:
    char_map = json.load(f)  # renamed from `map` to avoid shadowing the builtin

def mapping(text):
    # Replace each mapped Chinese character with its Sino-Vietnamese reading.
    # The loop body is elided in the diff; the replace() call below is an
    # assumed reconstruction of that step, not the author's exact code.
    for i in text:
        try:
            text = text.replace(i, char_map[i])
        except KeyError:
            continue
    return text.strip()

# Sample Chinese input, mapped to Sino-Vietnamese before generation
input_text = mapping('“ 早就知道叶微情是卧底了,于是将计就计,想要趁机嫁祸。 ” 的正确证物是:')
```
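Characters with no entry in the map fall through the `except` branch and are left unchanged, so the mapped string may still contain raw Chinese characters alongside the Sino-Vietnamese readings.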
Step 2: Load the model and generate a translation.
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = T5ForConditionalGeneration.from_pretrained('haruyuu/viT5_han-vie_v1.1')
tokenizer = T5Tokenizer.from_pretrained('haruyuu/viT5_han-vie_v1.1')

# Encode the mapped input and generate the Vietnamese translation
input_ids = tokenizer.encode(input_text, return_tensors="pt")
translated_ids = model.generate(input_ids)
translated_text = tokenizer.decode(translated_ids[0], skip_special_tokens=True)

print("Chinese Input:", input_text)
print("\nVietnamese Translation:", translated_text)
```
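One caveat: `generate()` uses the model's default length limit, so longer passages may come back truncated. An option is to pass an explicit token budget; the 256 below is an arbitrary illustrative value, not a documented setting for this model:

```python
# max_new_tokens caps only the generated continuation; 256 is an assumed,
# illustrative budget, not a value from the model card
translated_ids = model.generate(input_ids, max_new_tokens=256)
```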
## Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->