Updated readme. Base model name corrected.
README.md CHANGED
@@ -29,7 +29,7 @@ This model achieves top performance on the RAID benchmark at the time of submission.

The model is built upon a fine-tuned **microsoft/deberta-v3-large** transformer architecture. The core components include:

-* **Transformer Base:** The pre-trained `microsoft/deberta-v3-
+* **Transformer Base:** The pre-trained `microsoft/deberta-v3-large` model serves as the foundation. This model utilizes DeBERTa (Decoding-enhanced BERT with disentangled attention), an improved version of BERT and RoBERTa, which incorporates disentangled attention and enhanced mask decoder for better performance.
* **Mean Pooling:** A mean pooling layer aggregates the hidden states from the transformer, creating a fixed-size representation of the input text. This method averages the token embeddings, weighted by the attention mask, to capture the overall semantic meaning.
* **Classifier Head:** A linear layer acts as a classifier, taking the pooled representation and outputting a single logit. This logit represents the model's confidence that the input text is AI-generated. Sigmoid activation is applied to the logit to produce a probability.
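
For illustration, here is a minimal PyTorch sketch of the architecture the updated README describes (DeBERTa-v3-large backbone, attention-mask-weighted mean pooling, and a single-logit classifier head with sigmoid). The class name `DebertaAIDetector` and the exact layer wiring are assumptions for this sketch, not the repository's actual modeling code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class DebertaAIDetector(nn.Module):
    """Sketch: DeBERTa-v3-large base + mean pooling + linear classifier head.
    Hypothetical illustration of the described architecture, not the released code."""

    def __init__(self, base_model_name: str = "microsoft/deberta-v3-large"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_model_name)
        hidden_size = self.backbone.config.hidden_size
        # Single output logit: confidence that the input text is AI-generated
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        # Last-layer hidden states: (batch, seq_len, hidden)
        hidden = self.backbone(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Mean pooling weighted by the attention mask, so padding tokens are ignored
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        # Return the raw logit; apply sigmoid outside to get a probability
        return self.classifier(pooled).squeeze(-1)


tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = DebertaAIDetector()
batch = tokenizer(["Example passage to score."], return_tensors="pt", truncation=True)
with torch.no_grad():
    prob_ai = torch.sigmoid(model(batch["input_ids"], batch["attention_mask"]))
print(prob_ai)  # probability of AI generation (untrained head here, so not meaningful)
```

Mean pooling over all non-padding tokens, rather than using the `[CLS]` embedding alone, gives the classifier head a summary of the whole passage, which matches the README's description of capturing the overall semantic meaning.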