hhou435 committed
Commit cf37f5a
1 Parent(s): bf05a6e
.gitattributes DELETED
@@ -1,8 +0,0 @@
- *.bin.* filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tar.gz filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
 
README.md DELETED
@@ -1,70 +0,0 @@
- ---
- language: zh
- widget: text: "中国的首都是[MASK]京。"
- thumbnail:
- tags:
- license:
- datasets:
- metrics:
- ---
-
- # MyModelName
-
- ## Model description
-
- ## Intended uses & limitations
-
- #### How to use
-
- You can use this model directly with a pipeline for masked language modeling:
-
- ```python
- >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='hhou435/chinese_roberta_L-2_H-128')
- >>> unmasker("中国的首都是[MASK]京。")
- [
-     {'sequence': '[CLS] 中 国 的 首 都 是 北 京 。 [SEP]',
-      'score': 0.9427323937416077,
-      'token': 1266,
-      'token_str': '北'},
-     {'sequence': '[CLS] 中 国 的 首 都 是 南 京 。 [SEP]',
-      'score': 0.029202355071902275,
-      'token': 1298,
-      'token_str': '南'},
-     {'sequence': '[CLS] 中 国 的 首 都 是 东 京 。 [SEP]',
-      'score': 0.00977553054690361,
-      'token': 691,
-      'token_str': '东'},
-     {'sequence': '[CLS] 中 国 的 首 都 是 葡 京 。 [SEP]',
-      'score': 0.00489805219694972,
-      'token': 5868,
-      'token_str': '葡'},
-     {'sequence': '[CLS] 中 国 的 首 都 是 新 京 。 [SEP]',
-      'score': 0.0027360401581972837,
-      'token': 3173,
-      'token_str': '新'}
- ]
-
- ```
-
- Here is how to use this model to get the features of a given text in PyTorch:
-
- ```python
- from transformers import BertTokenizer, BertModel
- tokenizer = BertTokenizer.from_pretrained('hhou435/chinese_roberta_L-2_H-128')
- model = BertModel.from_pretrained("hhou435/chinese_roberta_L-2_H-128")
- text = "用你喜欢的任何文本替换我。"
- encoded_input = tokenizer(text, return_tensors='pt')
- output = model(**encoded_input)
- ```
-
- and in TensorFlow:
-
- ```python
- from transformers import BertTokenizer, TFBertModel
- tokenizer = BertTokenizer.from_pretrained('hhou435/chinese_roberta_L-2_H-128')
- model = TFBertModel.from_pretrained("hhou435/chinese_roberta_L-2_H-128")
- text = "用你喜欢的任何文本替换我。"
- encoded_input = tokenizer(text, return_tensors='tf')
- output = model(encoded_input)
- ```
 
config.json DELETED
@@ -1,20 +0,0 @@
- {
-   "architectures": [
-     "BertForMaskedLM"
-   ],
-   "attention_probs_dropout_prob": 0.1,
-   "gradient_checkpointing": false,
-   "hidden_act": "gelu",
-   "hidden_dropout_prob": 0.1,
-   "hidden_size": 128,
-   "initializer_range": 0.02,
-   "intermediate_size": 512,
-   "layer_norm_eps": 1e-12,
-   "max_position_embeddings": 512,
-   "model_type": "bert",
-   "num_attention_heads": 2,
-   "num_hidden_layers": 2,
-   "pad_token_id": 0,
-   "type_vocab_size": 2,
-   "vocab_size": 21128
- }
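
The deleted config.json describes a very small BERT variant: 2 transformer layers, hidden size 128, 2 attention heads, a 512-unit feed-forward layer, and a 21,128-token vocabulary. As a minimal sketch (not part of this commit), the same shape can be reproduced locally with `transformers.BertConfig`:

```python
# Sketch only: mirror the deleted config.json with transformers.BertConfig and
# instantiate an *untrained* model of the same shape to inspect its size.
from transformers import BertConfig, BertForMaskedLM

config = BertConfig(
    vocab_size=21128,
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=512,
    max_position_embeddings=512,
    type_vocab_size=2,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    initializer_range=0.02,
    layer_norm_eps=1e-12,
    pad_token_id=0,
)

model = BertForMaskedLM(config)  # random weights, same architecture
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")
```

This only mirrors the configuration values; the trained weights themselves live in the LFS-tracked files below.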
 
pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:5bc50a2bc8ac3bc07c8c5865c6893c813150b1d9b8bd3332af4940b347c850bf
- size 12840967
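
pytorch_model.bin is stored through Git LFS, so the repository records only a pointer: the spec version, the object's sha256, and its size in bytes. A minimal sketch, assuming a hypothetical local copy of the weights file, that checks a download against the pointer above:

```python
# Sketch only: check a local file against the Git LFS pointer recorded above.
# "pytorch_model.bin" is a hypothetical local path, not part of this commit.
import hashlib
from pathlib import Path

EXPECTED_OID = "5bc50a2bc8ac3bc07c8c5865c6893c813150b1d9b8bd3332af4940b347c850bf"
EXPECTED_SIZE = 12840967

data = Path("pytorch_model.bin").read_bytes()
assert len(data) == EXPECTED_SIZE, "size differs from the LFS pointer"
assert hashlib.sha256(data).hexdigest() == EXPECTED_OID, "sha256 differs from the LFS pointer"
print("file matches the LFS pointer")
```

The same check applies to tf_model.h5 further down, using its own oid and size.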
 
 
 
special_tokens_map.json DELETED
@@ -1 +0,0 @@
- {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
 
tf_model.h5 DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:88841e2f97913b66ed4ce2aa5be275352edc000b7b36a1daa3232aae29d9bd99
- size 24044192
 
 
 
tokenizer_config.json DELETED
@@ -1 +0,0 @@
- {"do_lower_case": false, "do_basic_tokenize": true, "never_split": null, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512}
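
Together with special_tokens_map.json above, this file defines a standard Chinese BERT tokenizer: no lowercasing, per-character splitting of Chinese text, the usual [UNK]/[SEP]/[PAD]/[CLS]/[MASK] special tokens, and a 512-token model limit. A minimal sketch, assuming a local copy of the vocab.txt that this commit also deletes, that instantiates the same tokenizer directly rather than via `from_pretrained`:

```python
# Sketch only: rebuild the tokenizer from the settings recorded in the deleted
# tokenizer_config.json and special_tokens_map.json. "vocab.txt" stands for a
# local copy of the vocabulary file that this commit also removes.
from transformers import BertTokenizer

tokenizer = BertTokenizer(
    vocab_file="vocab.txt",      # hypothetical local path
    do_lower_case=False,
    do_basic_tokenize=True,
    never_split=None,
    unk_token="[UNK]",
    sep_token="[SEP]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    mask_token="[MASK]",
    tokenize_chinese_chars=True,
    strip_accents=None,
    model_max_length=512,
)

# Chinese characters come out one per token and [MASK] is kept intact.
print(tokenizer.tokenize("中国的首都是[MASK]京。"))
```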
 
vocab.txt DELETED
The diff for this file is too large to render. See raw diff