haruyuu committed on
Commit
5cc0c75
·
1 Parent(s): 1943611

Update README.md

Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -24,11 +24,11 @@ Enhanced version from version 1.0 with larger dataset.
 ### Default
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 Step 1: Map all Chinese word from original text to Sino-Vietnamese with [map.json](https://huggingface.co/haruyuu/viT5_han-vie_v1.1/blob/main/map.json) file
-
 ```python
-with open('/kaggle/input/chingchongdingdong/map.json', encoding = 'utf-8') as f:
+with open('map.json', encoding = 'utf-8') as f:
     map = json.load(f)
 global map
+
 def mapping(text):
     for i in text:
         try:
@@ -37,10 +37,23 @@ def mapping(text):
         except:
             continue
     return text.strip()
-```

+input_text = mapping('“ 早就知道叶微情是卧底了,于是将计就计,想要趁机嫁祸。 ” 的正确证物是:')
+```
 Step 2: Load model and generate
+```python
+from transformers import T5ForConditionalGeneration, T5Tokenizer
+
+model = T5ForConditionalGeneration.from_pretrained('haruyuu/viT5_han-vie_v1.1')
+tokenizer = T5Tokenizer.from_pretrained('haruyuu/viT5_han-vie_v1.1')

+input_ids = tokenizer.encode(input_text, return_tensors="pt")
+translated_ids = model.generate(input_ids)
+translated_text = tokenizer.decode(translated_ids[0], skip_special_tokens=True)
+
+print("Chinese Input:", input_text)
+print("\nVietnamese Translation:", translated_text)
+```
 ## Training Data

 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
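Review note: the diff does not show the body of the `try:` block in `mapping` (README lines 35-36 fall between the two hunks), so what Step 1 actually does is not visible here. The sketch below is one plausible reading of "map all Chinese word ... to Sino-Vietnamese", not the author's implementation. It assumes `map.json` is a flat JSON object keyed by single Han characters. The names `load_mapping`, `map_to_sino_vietnamese` and `toy_table` are hypothetical, and the toy table stands in for the real file. It also imports `json` explicitly and avoids calling the table `map`, which shadows the Python builtin.

```python
import json


def load_mapping(path):
    """Load a character table from a JSON file (assumed: flat {han_char: reading})."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def map_to_sino_vietnamese(text, han_to_sv):
    """Replace each character found in the table with its reading.

    Characters missing from the table (punctuation, Latin letters, digits)
    are kept unchanged instead of being skipped via a bare `except`.
    """
    pieces = []
    for ch in text:
        reading = han_to_sv.get(ch)
        # Pad mapped readings with spaces so adjacent syllables stay separated.
        pieces.append(f" {reading} " if reading is not None else ch)
    # Collapse runs of whitespace and trim both ends.
    return " ".join("".join(pieces).split())


# Toy table for illustration only; the real entries live in map.json.
toy_table = {"早": "tảo", "知": "tri", "道": "đạo"}
print(map_to_sino_vietnamese("早知道", toy_table))
```

With the real file, the call would presumably look like `map_to_sino_vietnamese(text, load_mapping('map.json'))`, and the result would be passed to the tokenizer in Step 2.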