nhanv committed on
Commit
5a84ae8
1 Parent(s): f87531f

Update README.md

Files changed (1)
  1. README.md +23 -25
README.md CHANGED
@@ -18,6 +18,24 @@ You can download trained model:
 
  **[BERT](https://github.com/google-research/bert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
 
 
  # Vietnamese toolkit with bert
  ViNLP is a system annotation for Vietnamese, it use pretrain [Bert4news](https://github.com/bino282/bert4news/) to fine-turning to NLP problems in Vietnamese components of wordsegmentation,Named entity recognition (NER) and achieve high accuravy.
@@ -78,35 +96,15 @@ print(entities)
 
  ```
 
-
- Use with huggingface/transformers
- ``` bash
- import torch
- from transformers import AutoTokenizer,AutoModel
- tokenizer= AutoTokenizer.from_pretrained("NlpHUST/vibert4news-base-cased")
- bert_model = AutoModel.from_pretrained("NlpHUST/vibert4news-base-cased")
-
- line = "Tôi là sinh viên trường Bách Khoa Hà Nội ."
- input_id = tokenizer.encode(line,add_special_tokens = True)
- att_mask = [int(token_id > 0) for token_id in input_id]
- input_ids = torch.tensor([input_id])
- att_masks = torch.tensor([att_mask])
- with torch.no_grad():
-     features = bert_model(input_ids,att_masks)
-
- print(features)
-
- ```
-
  Run training with base config
 
  ``` bash
 
- python train_pytorch.py \
- --model_path=bert4news.pytorch \
- --max_len=200 \
- --batch_size=16 \
- --epochs=6 \
  --lr=2e-5
 
  ```
 
 
  **[BERT](https://github.com/google-research/bert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
 
+ Use with huggingface/transformers
+ ``` bash
+ import torch
+ from transformers import BertTokenizer,BertModel
+ tokenizer= BertTokenizer.from_pretrained("NlpHUST/vibert4news-base-cased")
+ bert_model = BertModel.from_pretrained("NlpHUST/vibert4news-base-cased")
+
+ line = "Tôi là sinh viên trường Bách Khoa Hà Nội ."
+ input_id = tokenizer.encode(line,add_special_tokens = True)
+ att_mask = [int(token_id > 0) for token_id in input_id]
+ input_ids = torch.tensor([input_id])
+ att_masks = torch.tensor([att_mask])
+ with torch.no_grad():
+     features = bert_model(input_ids,att_masks)
+
+ print(features)
+
+ ```
 
  # Vietnamese toolkit with bert
  ViNLP is a system annotation for Vietnamese, it use pretrain [Bert4news](https://github.com/bino282/bert4news/) to fine-turning to NLP problems in Vietnamese components of wordsegmentation,Named entity recognition (NER) and achieve high accuravy.
 
 
  ```
 
  Run training with base config
 
  ``` bash
 
+ python train_pytorch.py \\
+ --model_path=bert4news.pytorch \\
+ --max_len=200 \\
+ --batch_size=16 \\
+ --epochs=6 \\
  --lr=2e-5
 
  ```
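
The `att_mask` line in the README snippet touched by this commit marks every token id greater than 0 as a real token, which only works if the padding id is 0. That is the usual layout for BERT vocabularies, but it is an assumption here and is not verified against the `NlpHUST/vibert4news-base-cased` vocabulary. Below is a minimal sketch of the same masking step for a padded batch, written in plain Python so it runs without downloading the model. The helpers `pad_batch` and `build_attention_mask` are hypothetical names (not part of transformers or ViNLP), and the token ids are made up for illustration.

```python
from typing import List


def pad_batch(batch: List[List[int]], pad_token_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_token_id] * (max_len - len(seq)) for seq in batch]


def build_attention_mask(
    input_ids: List[List[int]], pad_token_id: int = 0
) -> List[List[int]]:
    """Return 1 for real tokens and 0 for padding positions."""
    return [[int(tok != pad_token_id) for tok in seq] for seq in input_ids]


# Made-up token ids standing in for two tokenized sentences of different length.
batch = [[101, 7, 8, 9, 102], [101, 7, 102]]
padded = pad_batch(batch)
masks = build_attention_mask(padded)

print(padded)
print(masks)
```

Comparing against an explicit `pad_token_id` rather than `> 0` keeps the mask correct if the padding id is ever something other than 0. The two lists can then be wrapped with `torch.tensor(...)` and passed to the model as in the README snippet.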