Update README.md
README.md
CHANGED
@@ -9,8 +9,6 @@ language:
*by GPT-4 & Crumb*
-***Note***: *this model is in the process of being re-evaluated because it was retrained.*
-
### Introduction
Transformer models have become a popular choice for natural language processing (NLP) tasks due to their ability to handle long-range dependencies and their strong performance on various NLP benchmarks. The transformer architecture was introduced in 2017 by [Vaswani et al.](https://arxiv.org/abs/1706.03762) and has since been used in many state-of-the-art models such as BERT and GPT. The decoder-only transformer is a variant of this architecture that is commonly used for generative tasks in NLP. It uses masked self-attention to predict the next token in a sequence and has been shown to be effective at predicting sequences of text.