beyond committed on
Commit 5b59934
1 parent: a063fd3

Update README.md

Files changed (1)
  1. README.md +90 -26
README.md CHANGED
@@ -34,35 +34,99 @@ inference:
  # GENIUS: generating text using sketches!


- ```
- from transformers import pipeline
- genius = pipeline("text2text-generation", model="beyond/genius-large", device=0)
- sketch = "<mask> Conference on Empirical Methods <mask> submission of research papers <mask> Deep Learning <mask>"
- genius(sketch, num_beams=3, do_sample=True, max_length=200)[0]['generated_text']
- ```
-
  - **Paper: [GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation](https://arxiv.org/abs/2211.10330)**
  - **GitHub: [GENIUS, Pre-training/Data Augmentation Tutorial](https://github.com/beyondguo/genius)**



- 💡**GENIUS** is a powerful conditional text generation model that takes sketches as input: it fills in the missing contexts for a given **sketch** (key information consisting of textual spans, phrases, or words, concatenated by mask tokens). GENIUS is pre-trained on a large-scale textual corpus with a novel *reconstruction from sketch* objective using an *extreme and selective masking* strategy, enabling it to generate diverse and high-quality texts from sketches.
-
- **GENIUS** can also be used as a general textual **data augmentation tool** for **various NLP tasks** (including sentiment analysis, topic classification, NER, and QA).
-
-
- ![image-20221119164544165](https://cdn.jsdelivr.net/gh/beyondguo/mdnice_pictures/typora/hi-genius.png)
-
-
- **Model variations:**
-
- | Model | #params | Language | Comment |
- |-------|---------|----------|---------|
- | [`genius-large`](https://huggingface.co/beyond/genius-large) | 406M | English | The version used in the **paper** (recommended) |
- | [`genius-large-k2t`](https://huggingface.co/beyond/genius-large-k2t) | 406M | English | Keywords-to-text |
- | [`genius-base`](https://huggingface.co/beyond/genius-base) | 139M | English | Smaller version |
- | [`genius-base-ps`](https://huggingface.co/beyond/genius-base) | 139M | English | Pre-trained on both paragraphs and short sentences |
- | [`genius-base-chinese`](https://huggingface.co/beyond/genius-base-chinese) | 116M | Chinese | Pre-trained on 10 million clean Chinese paragraphs |
-
-
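As an illustration of the data-augmentation use described in the removed card above, here is a minimal sketch; it is not part of either version of the README, the example text, label, and hand-picked spans are made up, and it simply reuses the pipeline call from the removed snippet:

```python
# Illustrative sketch: augment a labeled example by masking most of it into a
# GENIUS sketch and letting the model rewrite the missing context.
from transformers import pipeline

genius = pipeline("text2text-generation", model="beyond/genius-large")

text, label = "The new phone has a stunning display and great battery life.", "positive"

# Keep a few informative spans and replace the rest with mask tokens.
# The spans are chosen by hand here; a keyword extractor could pick them instead.
sketch = "<mask> stunning display <mask> great battery life <mask>"

augmented = genius(sketch, num_beams=3, do_sample=True, max_length=100)[0]["generated_text"]
# (augmented, label) can then be added to the training set as a new example.
print(augmented)
```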
+ # BERT base model (uncased)
+
+ Pretrained model on the English language using a masked language modeling (MLM) objective. It was introduced in
+ [this paper](https://arxiv.org/abs/1810.04805) and first released in
+ [this repository](https://github.com/google-research/bert). This model is uncased: it does not distinguish
+ between english and English.
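A quick, illustrative check of that uncased behavior with the Hugging Face tokenizer (not part of the original card):

```python
# Illustrative: the uncased tokenizer lowercases its input, so case is not preserved.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("English"))  # ['english']
print(tokenizer.tokenize("english"))  # ['english']
```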
+
+ Disclaimer: The team releasing BERT did not write a model card for this model, so this model card has been written by
+ the Hugging Face team.
+
+ ## Model description
+
+ BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
+ was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of
+ publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it
+ was pretrained with two objectives:
+
+ - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs
+ the entire masked sentence through the model and has to predict the masked words. This is different from traditional
+ recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive models like
+ GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation of the
+ sentence.
+ - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
+ they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
+ predict if the two sentences were following each other or not.
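To make the two objectives concrete, here is a minimal, illustrative sketch of building an MLM/NSP-style input with the Hugging Face tokenizer; this is not the original pretraining code, and the real recipe also sometimes keeps a chosen word or swaps in a random one instead of always using `[MASK]`:

```python
# Illustrative only: build a BERT-style sentence-pair input and corrupt ~15% of it.
import random
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence_a = "The cat sat on the mat."
sentence_b = "It soon fell asleep in the sun."  # for NSP, sometimes the true next sentence, sometimes a random one

# NSP-style input: [CLS] sentence A [SEP] sentence B [SEP]
encoding = tokenizer(sentence_a, sentence_b)
tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"])

# MLM-style corruption: mask roughly 15% of the non-special tokens.
masked = [
    tok if tok in ("[CLS]", "[SEP]") or random.random() > 0.15 else "[MASK]"
    for tok in tokens
]
print(masked)
```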
+
+ This way, the model learns an inner representation of the English language that can then be used to extract features
+ useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard
+ classifier using the features produced by the BERT model as inputs.
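For example, a minimal sketch of extracting such features (assuming the `transformers` and `torch` packages; using the `[CLS]` hidden state as the sentence feature is just one common choice):

```python
# Illustrative: use the pretrained encoder as a frozen feature extractor.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("A sentence to encode.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Final hidden state of the [CLS] token, often used as a sentence-level feature vector.
cls_features = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
print(cls_features.shape)
```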
+
+ ## Model variations
+
+ BERT was originally released in base and large variations, for cased and uncased input text. The uncased models also strip out accent markers.
+ Chinese and multilingual uncased and cased versions followed shortly after.
+ Modified preprocessing with whole word masking replaced subpiece masking in a follow-up work, with the release of two models.
+ 24 smaller models were released afterward.
+
+ The detailed release history can be found in the [google-research/bert README](https://github.com/google-research/bert/blob/master/README.md) on GitHub.
+
+ | Model | #params | Language |
+ |-------|---------|----------|
+ | [`bert-base-uncased`](https://huggingface.co/bert-base-uncased) | 110M | English |
+ | [`bert-large-uncased`](https://huggingface.co/bert-large-uncased) | 340M | English |
+ | [`bert-base-cased`](https://huggingface.co/bert-base-cased) | 110M | English |
+ | [`bert-large-cased`](https://huggingface.co/bert-large-cased) | 340M | English |
+ | [`bert-base-chinese`](https://huggingface.co/bert-base-chinese) | 110M | Chinese |
+ | [`bert-base-multilingual-cased`](https://huggingface.co/bert-base-multilingual-cased) | 110M | Multiple |
+ | [`bert-large-uncased-whole-word-masking`](https://huggingface.co/bert-large-uncased-whole-word-masking) | 340M | English |
+ | [`bert-large-cased-whole-word-masking`](https://huggingface.co/bert-large-cased-whole-word-masking) | 340M | English |
+
+ ## Intended uses & limitations
+
+ You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
+ be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=bert) to look for
+ fine-tuned versions for a task that interests you.
+
+ Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
+ to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text
+ generation, you should look at models like GPT2.
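As a minimal sketch of that intended use (illustrative; the dataset, label count, and training loop are omitted), loading the checkpoint with a randomly initialized sequence-classification head looks like this:

```python
# Illustrative: attach a (randomly initialized) classification head for fine-tuning.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
outputs = model(**batch)
print(outputs.logits.shape)  # (2, 2); the logits are meaningless until the head is fine-tuned
```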
+
+ ### How to use
+
+ You can use this model directly with a pipeline for masked language modeling:
+
+ ```python
+ >>> from transformers import pipeline
+ >>> unmasker = pipeline('fill-mask', model='bert-base-uncased')
+ >>> unmasker("Hello I'm a [MASK] model.")
+
+ [{'sequence': "[CLS] hello i'm a fashion model. [SEP]",
+   'score': 0.1073106899857521,
+   'token': 4827,
+   'token_str': 'fashion'},
+  {'sequence': "[CLS] hello i'm a role model. [SEP]",
+   'score': 0.08774490654468536,
+   'token': 2535,
+   'token_str': 'role'},
+  {'sequence': "[CLS] hello i'm a new model. [SEP]",
+   'score': 0.05338378623127937,
+   'token': 2047,
+   'token_str': 'new'},
+  {'sequence': "[CLS] hello i'm a super model. [SEP]",
+   'score': 0.04667217284440994,
+   'token': 3565,
+   'token_str': 'super'},
+  {'sequence': "[CLS] hello i'm a fine model. [SEP]",
+   'score': 0.027095865458250046,
+   'token': 2986,
+   'token_str': 'fine'}]
+ ```