---
language:
- ar
- fr
- es
- de
- el
- bg
- ru
- tr
- vi
- th
- zh
- hi
- sw
- ur
datasets:
- xnli
widget:
- text: >-
    The Red Hot Chili Peppers were formed in Los Angeles by Kiedis, Flea, Hillel Slovak and Jack Irons. [SEP] Jack Irons place of birth Los Angeles
---

# Model Card for mdeberta-v3-base-triplet-critic-xnli

This is the Triplet Critic model presented in the ACL 2023 paper RED<sup>FM</sup>: a Filtered and Multilingual Relation Extraction Dataset.

It is based on mdeberta-v3-base and was trained as a multitask system: alongside the XNLI task, it learns to filter (subject, relation, object) triplets against the text they were extracted from. The model weights contain both classification heads, but loading the model with the Hugging Face `transformers` library only loads the triplet-filtering head (i.e. a binary classification head); using it for XNLI requires a custom class such as the one below. While the model is defined and trained as a classifier, we use the positive score (i.e. LABEL_1) as the confidence score for a triplet. For SRED<sup>FM</sup> the confidence threshold was set at 0.75.
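
For plain triplet filtering, no custom code is needed: the checkpoint loads as a standard sequence-classification model. A minimal sketch, assuming the standard `transformers` API; `model_id` is a placeholder for this repository's id on the Hub:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "mdeberta-v3-base-triplet-critic-xnli"  # placeholder: use this repo's Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Input format matches the widget above: text, then the verbalized triplet.
text = (
    "The Red Hot Chili Peppers were formed in Los Angeles by Kiedis, Flea, "
    "Hillel Slovak and Jack Irons. [SEP] Jack Irons place of birth Los Angeles"
)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)

confidence = probs[0, 1].item()   # positive score (LABEL_1)
keep_triplet = confidence >= 0.75  # threshold used for SRED-FM
```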

To load the full multitask model (with both classification heads), define the following custom class:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from torch import nn
from torch.nn import CrossEntropyLoss

from transformers import DebertaV2Model, DebertaV2PreTrainedModel
from transformers.utils import ModelOutput
from transformers.models.deberta_v2.modeling_deberta_v2 import (
    DEBERTA_INPUTS_DOCSTRING,
    ContextPooler,
    StableDropout,
    add_start_docstrings_to_model_forward,
)


@dataclass
class TXNLIClassifierOutput(ModelOutput):
    """
    Output of the multitask triplet-critic / XNLI model.

    Args:
        loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
            Classification loss.
        logits (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`):
            Triplet-filtering scores (before SoftMax).
        logits_xnli (`torch.FloatTensor` of shape `(batch_size, 3)`):
            XNLI scores (before SoftMax).
        hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True`
            is passed or when `config.output_hidden_states=True`):
            Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of
            each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden states of the model
            at the output of each layer plus the initial embedding outputs.
        attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is
            passed or when `config.output_attentions=True`):
            Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads,
            sequence_length, sequence_length)`. Attention weights after the attention softmax, used to
            compute the weighted average in the self-attention heads.
    """

    loss: Optional[torch.FloatTensor] = None
    logits: torch.FloatTensor = None
    logits_xnli: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None


class DebertaV2ForTripletClassification(DebertaV2PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        num_labels = getattr(config, "num_labels", 2)
        self.num_labels = num_labels

        self.deberta = DebertaV2Model(config)
        self.pooler = ContextPooler(config)
        output_dim = self.pooler.output_dim

        # Two heads on the shared pooled representation: a binary head for
        # triplet filtering and a three-way head for XNLI.
        self.classifier = nn.Linear(output_dim, num_labels)
        drop_out = getattr(config, "cls_dropout", None)
        drop_out = self.config.hidden_dropout_prob if drop_out is None else drop_out
        self.dropout = StableDropout(drop_out)
        self.classifier_xnli = nn.Linear(output_dim, 3)

        # Initialize weights and apply final processing
        self.post_init()

    def get_input_embeddings(self):
        return self.deberta.get_input_embeddings()

    def set_input_embeddings(self, new_embeddings):
        self.deberta.set_input_embeddings(new_embeddings)

    @add_start_docstrings_to_model_forward(DEBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
    ):
        r"""
        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification loss. Indices should be in
            `[0, ..., config.num_labels - 1]`; labels with a `bool` dtype are routed to the XNLI head.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.deberta(
            input_ids,
            token_type_ids=token_type_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        # Pool the encoder output and score it with both heads.
        encoder_layer = outputs[0]
        pooled_output = self.pooler(encoder_layer)
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        logits_xnli = self.classifier_xnli(pooled_output)
134
+ loss = None
135
+ if labels is not None:
136
+ if labels.dtype != torch.bool:
137
+ loss_fct = CrossEntropyLoss()
138
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
139
+ else:
140
+ loss_fct = BCEWithLogitsLoss()
141
+ loss = loss_fct(logits_xnli.view(-1, 3), labels.view(-1).long())
142
+ if not return_dict:
143
+ output = (logits,) + outputs[1:]
144
+ return ((loss,) + output) if loss is not None else output
145
+
146
+ return TXNLIClassifierOutput(
147
+ loss=loss, logits=logits, logits_xnli=logits_xnli, hidden_states=outputs.hidden_states, attentions=outputs.attentions
148
+ )
149
+ ```
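
Once the class is defined, a single forward pass exposes both heads. A minimal usage sketch, again with a placeholder `model_id`; the order of the three XNLI logits is an assumption, so check the training setup before relying on it:

```python
import torch
from transformers import AutoTokenizer

model_id = "mdeberta-v3-base-triplet-critic-xnli"  # placeholder: use this repo's Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = DebertaV2ForTripletClassification.from_pretrained(model_id)
model.eval()

text = (
    "The Red Hot Chili Peppers were formed in Los Angeles by Kiedis, Flea, "
    "Hillel Slovak and Jack Irons. [SEP] Jack Irons place of birth Los Angeles"
)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

triplet_confidence = out.logits.softmax(dim=-1)[0, 1].item()  # LABEL_1 score
xnli_scores = out.logits_xnli.softmax(dim=-1)[0]  # three-way NLI distribution
```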