Upload model

Files changed (5)

README.md +199 -0
config.json +39 -0
config.py +29 -0
model.py +306 -0
model.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "architectures": [
+    "ILKTModel"
+  ],
+  "auto_map": {
+    "AutoConfig": "config.ILKTConfig",
+    "AutoModel": "model.ILKTModel"
+  },
+  "backbone_config": {
+    "pretrained_model_name_or_path": "microsoft/mdeberta-v3-base",
+    "trust_remote_code": true
+  },
+  "cls_head_config": {
+    "dropout": 0.0,
+    "n_dense": 0,
+    "pool_type": "cls",
+    "use_batch_norm": true,
+    "use_layer_norm": false
+  },
+  "cls_heads": [],
+  "embedding_head_config": {
+    "dropout": 0.0,
+    "n_dense": 0,
+    "normalize_embeddings": false,
+    "pool_type": "cls",
+    "use_batch_norm": false,
+    "use_layer_norm": false
+  },
+  "hidden_size": 768,
+  "mlm_head_config": {
+    "dropout": 0.0,
+    "n_dense": 0,
+    "use_batch_norm": true,
+    "use_layer_norm": false
+  },
+  "model_type": "ILKT",
+  "torch_dtype": "float32",
+  "transformers_version": "4.41.2"
+}
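
The `auto_map` block in this config is what lets `AutoConfig` and `AutoModel` find the custom classes shipped in this repository when remote code is trusted. As a rough, stdlib-only sketch, each value can be read as `<module file>.<class name>`. The helper name `split_auto_map` is hypothetical and not part of transformers; it only illustrates the naming convention.

```python
import json

# Excerpt of the config.json from this commit (only the keys used here).
CONFIG_TEXT = """
{
  "architectures": ["ILKTModel"],
  "auto_map": {
    "AutoConfig": "config.ILKTConfig",
    "AutoModel": "model.ILKTModel"
  },
  "model_type": "ILKT"
}
"""


def split_auto_map(config: dict) -> dict:
    """Hypothetical helper: map each auto class to (module file, class name)."""
    resolved = {}
    for auto_class, reference in config.get("auto_map", {}).items():
        # "config.ILKTConfig" -> module "config", class "ILKTConfig"
        module_name, _, class_name = reference.rpartition(".")
        resolved[auto_class] = (f"{module_name}.py", class_name)
    return resolved


resolved = split_auto_map(json.loads(CONFIG_TEXT))
for auto_class, (module_file, class_name) in resolved.items():
    print(f"{auto_class}: {class_name} defined in {module_file}")
```

Both referenced files, `config.py` and `model.py`, are part of this same commit.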

config.py ADDED

	@@ -0,0 +1,29 @@

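+from typing import Any, Dict, List, Optional, Tuple
+from transformers import PretrainedConfig
+class ILKTConfig(PretrainedConfig):
+    model_type = "ILKT"
+    def __init__(
+        self,
+        backbone_config: Optional[Dict[str, Any]] = None,
+        embedding_head_config: Optional[Dict[str, Any]] = None,
+        mlm_head_config: Optional[Dict[str, Any]] = None,
+        cls_head_config: Optional[Dict[str, Any]] = None,
+        cls_heads: Optional[List[Tuple[int, str]]] = None,
+        **kwargs
+    ):
+        # None defaults avoid sharing one mutable dict/list across instances.
+        self.backbone_config = backbone_config if backbone_config is not None else {}
+        self.embedding_head_config = embedding_head_config if embedding_head_config is not None else {}
+        self.mlm_head_config = mlm_head_config if mlm_head_config is not None else {}
+        self.cls_head_config = cls_head_config if cls_head_config is not None else {}
+        self.cls_heads = cls_heads if cls_heads is not None else []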
+        self.output_hidden_states = False
+        # TODO: make this a proper HF config (save max length, etc.);
+        # it is not yet clear how that works in the HF ecosystem.
+        super().__init__(**kwargs)
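
Because this config class takes dict and list parameters, it is worth recalling how Python treats mutable default values: a default such as `{}` is created once, when the function is defined, and then reused on every call. The sketch below is illustrative only; `shared_default` and `fresh_default` are made-up names, not part of this repository.

```python
from typing import Any, Dict, Optional


def shared_default(options: Dict[str, Any] = {}) -> Dict[str, Any]:
    """Illustrative only: the default dict is created once and reused."""
    options["calls"] = options.get("calls", 0) + 1
    return options


def fresh_default(options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Illustrative only: a new dict is created on every call."""
    options = {} if options is None else options
    options["calls"] = options.get("calls", 0) + 1
    return options


first = shared_default()
second = shared_default()
shared_is_same_object = first is second  # both calls returned the same dict
shared_calls = second["calls"]           # state leaked between the two calls

fresh_calls = [fresh_default()["calls"], fresh_default()["calls"]]

print(shared_is_same_object, shared_calls, fresh_calls)
```

With a `None` default, each call starts from its own empty container, so one config instance cannot silently mutate another's settings.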

model.py ADDED

	@@ -0,0 +1,306 @@

+from typing import Any, Dict, Optional
+import torch
+import torch.nn as nn
+from transformers import AutoConfig, AutoModel, PreTrainedModel
+from transformers.modeling_outputs import (
+    BaseModelOutputWithPooling,
+    MaskedLMOutput,
+    BaseModelOutput,
+    SequenceClassifierOutput,
+)
+from enum import Enum
+import os
+import sys
+from .config import ILKTConfig
+# eval_utils lives one directory above this file; extend sys.path only for this import.
+parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir))
+sys.path.append(parent_dir)
+from eval_utils.metrics import stiffness
+sys.path.pop(-1)
+def cls_pooling(last_hidden_state, attention_mask):
+    return last_hidden_state[:, 0, :]
+def create_head_blocks(
+    hidden_size: int,
+    n_dense: int,
+    use_batch_norm: bool,
+    use_layer_norm: bool,
+    dropout: float,
+    **kwargs,
+) -> nn.Module:
+    blocks = []
+    for _ in range(n_dense):
+        blocks.append(nn.Linear(hidden_size, hidden_size))
+        if use_batch_norm:
+            blocks.append(nn.BatchNorm1d(hidden_size))
+        elif use_layer_norm:
+            blocks.append(nn.LayerNorm(hidden_size))
+        blocks.append(nn.ReLU())
+        if dropout > 0:
+            blocks.append(nn.Dropout(dropout))
+    return nn.Sequential(*blocks)
+class SentenceEmbeddingHead(nn.Module):
+    def __init__(
+        self, backbone_hidden_size: int, embedding_head_config: Dict[str, Any]
+    ):
+        super().__init__()
+        self.config = embedding_head_config
+        self.head = nn.Sequential(
+            *[
+                create_head_blocks(backbone_hidden_size, **embedding_head_config),
+            ]
+        )
+    def forward(
+        self, backbone_output: BaseModelOutput, attention_mask: torch.Tensor, **kwargs
+    ) -> BaseModelOutputWithPooling:
+        if self.config["pool_type"] == "cls":
+            embeddings = cls_pooling(backbone_output.last_hidden_state, attention_mask)
+        else:
+            raise NotImplementedError(
+                f"Pooling type {self.config['pool_type']} not implemented"
+            )
+        if self.config["normalize_embeddings"]:
+            embeddings = nn.functional.normalize(embeddings, p=2, dim=-1)
+        return BaseModelOutputWithPooling(
+            last_hidden_state=backbone_output.last_hidden_state,
+            pooler_output=embeddings,  # type: ignore
+        )
+class MLMHead(nn.Module):
+    def __init__(
+        self,
+        backbone_hidden_size: int,
+        vocab_size: int,
+        mlm_head_config: Dict[str, Any],
+    ):
+        super().__init__()
+        self.config = mlm_head_config
+        self.head = nn.Sequential(
+            *[
+                create_head_blocks(backbone_hidden_size, **mlm_head_config),
+                nn.Linear(backbone_hidden_size, vocab_size),
+            ]
+        )
+    def forward(
+        self,
+        backbone_output: BaseModelOutput,
+        attention_mask: torch.Tensor,
+        labels: Optional[torch.Tensor] = None,
+        **kwargs,
+    ) -> MaskedLMOutput:
+        prediction_scores = self.head(backbone_output.last_hidden_state)
+        loss = None
+        if labels is not None:
+            loss_fct = nn.CrossEntropyLoss()
+            loss = loss_fct(
+                prediction_scores.view(-1, prediction_scores.size(-1)),
+                labels.view(-1),
+            )
+        return MaskedLMOutput(loss=loss, logits=prediction_scores)
+class CLSHead(nn.Module):
+    def __init__(
+        self,
+        backbone_hidden_size: int,
+        n_classes: int,
+        cls_head_config: Dict[str, Any],
+    ):
+        super().__init__()
+        self.config = cls_head_config
+        self.head = nn.Sequential(
+            *[
+                create_head_blocks(backbone_hidden_size, **cls_head_config),
+                nn.Linear(backbone_hidden_size, n_classes),
+            ]
+        )
+    def forward(
+        self,
+        backbone_output: BaseModelOutput,
+        attention_mask: torch.Tensor,
+        labels: Optional[torch.Tensor] = None,
+        **kwargs,
+    ) -> SequenceClassifierOutput:
+        if self.config["pool_type"] == "cls":
+            embeddings = cls_pooling(backbone_output.last_hidden_state, attention_mask)
+        else:
+            raise NotImplementedError(
+                f"Pooling type {self.config['pool_type']} not implemented"
+            )
+        prediction_scores = self.head(embeddings)
+        loss = None
+        if labels is not None:
+            loss_fct = nn.CrossEntropyLoss()
+            loss = loss_fct(
+                prediction_scores.view(-1, prediction_scores.size(-1)),
+                labels.view(-1),
+            )
+        return SequenceClassifierOutput(loss=loss, logits=prediction_scores)
+class ForwardRouting(Enum):
+    GET_SENTENCE_EMBEDDING = "get_sentence_embedding"
+    GET_MLM_OUTPUT = "get_mlm_output"
+    GET_CLS_OUTPUT = "get_cls_output"
+class ILKTModel(PreTrainedModel):
+    config_class = ILKTConfig
+    def __init__(self, config: ILKTConfig):
+        super().__init__(config)
+        backbone_config = AutoConfig.from_pretrained(**config.backbone_config)
+        pretrained_model_name_or_path = config.backbone_config[
+            "pretrained_model_name_or_path"
+        ]
+        self.backbone = AutoModel.from_pretrained(
+            pretrained_model_name_or_path, config=backbone_config
+        )
+        backbone_hidden_size = backbone_config.hidden_size
+        self.config.hidden_size = backbone_hidden_size
+        backbone_vocab_size = backbone_config.vocab_size
+        self.embedding_head = SentenceEmbeddingHead(
+            backbone_hidden_size, config.embedding_head_config
+        )
+        self.mlm_head = MLMHead(
+            backbone_hidden_size, backbone_vocab_size, config.mlm_head_config
+        )
+        self.cls_heads = nn.ModuleDict(
+            dict(
+                [
+                    (
+                        name,
+                        CLSHead(
+                            backbone_hidden_size, n_classes, config.cls_head_config
+                        ),
+                    )
+                    for n_classes, name in config.cls_heads
+                ]
+            )
+        )
+        self.initiate_stiffness()
+    def forward(
+        self,
+        input_ids: torch.Tensor,
+        attention_mask: torch.Tensor,
+        token_type_ids: Optional[torch.Tensor] = None,
+        forward_routing: ForwardRouting = ForwardRouting.GET_SENTENCE_EMBEDDING,
+        **kwargs,
+    ):
+        self.set_current_task(forward_routing)
+        if forward_routing == ForwardRouting.GET_SENTENCE_EMBEDDING:
+            return self.get_sentence_embedding(
+                input_ids, attention_mask, token_type_ids=token_type_ids
+            )
+        elif forward_routing == ForwardRouting.GET_MLM_OUTPUT:
+            return self.get_mlm_output(
+                input_ids, attention_mask, token_type_ids=token_type_ids, **kwargs
+            )
+        elif forward_routing == ForwardRouting.GET_CLS_OUTPUT:
+            return self.get_cls_output(
+                input_ids, attention_mask, token_type_ids=token_type_ids, **kwargs
+            )
+        else:
+            raise ValueError(f"Unknown forward routing {forward_routing}")
+    def get_sentence_embedding(
+        self, input_ids: torch.Tensor, attention_mask: torch.Tensor, **kwargs
+    ):
+        backbone_output: BaseModelOutput = self.backbone(
+            input_ids=input_ids, attention_mask=attention_mask, **kwargs
+        )
+        embedding_output = self.embedding_head(
+            backbone_output, attention_mask, **kwargs
+        )
+        return embedding_output
+    def get_mlm_output(
+        self,
+        input_ids: torch.Tensor,
+        attention_mask: torch.Tensor,
+        labels: Optional[torch.Tensor] = None,
+        **kwargs,
+    ):
+        backbone_output: BaseModelOutput = self.backbone(
+            input_ids=input_ids, attention_mask=attention_mask, **kwargs
+        )
+        mlm_output = self.mlm_head(backbone_output, attention_mask, labels, **kwargs)
+        return mlm_output
+    def get_cls_output(
+        self,
+        input_ids: torch.Tensor,
+        attention_mask: torch.Tensor,
+        head_name: str,
+        labels: Optional[torch.Tensor] = None,
+        **kwargs,
+    ):
+        backbone_output: BaseModelOutput = self.backbone(
+            input_ids=input_ids, attention_mask=attention_mask, **kwargs
+        )
+        if head_name not in self.cls_heads:
+            raise ValueError(f"Head {head_name} not found in model")
+        cls_output = self.cls_heads[head_name](
+            backbone_output, attention_mask, labels, **kwargs
+        )
+        return cls_output
+    def set_current_task(self, task):
+        self.current_task = task
+    def initiate_stiffness(self):
+        self.log_gradients = False
+        self.backbone.encoder.layer[-1].register_full_backward_hook(self._backward_hook)
+        self.gradients = {}
+        self.current_task = None
+    def _backward_hook(self, module, grad_input, grad_output):
+        if self.log_gradients and self.current_task in self.gradients:
+            self.gradients[self.current_task].append(grad_input[0])
+        elif self.log_gradients:
+            self.gradients[self.current_task] = [grad_input[0]]
+    def get_stiffness(self):
+        # REMARK: make sure that you train on CLS and MLM tasks
+        values = {}
+        for task1 in self.gradients:
+            for task2 in self.gradients:
+                if str(task1) > str(task2) and len(self.gradients[task1]) > 0 and len(self.gradients[task2]) > 0:
+                    # Concatenate each task's logged gradients once and reuse them for both metrics.
+                    grads1 = torch.cat(self.gradients[task1], dim=-2)
+                    grads2 = torch.cat(self.gradients[task2], dim=-2)
+                    values[f'{task1}x{task2}_cosine'] = stiffness(grads1, grads2, "cosine")
+                    values[f'{task1}x{task2}_sign'] = stiffness(grads1, grads2, "sign")
+        for task in self.gradients:
+            del self.gradients[task][:]
+        return values
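
`get_stiffness` compares the gradients logged for each pair of tasks using `stiffness` from `eval_utils.metrics`, which is not included in this diff. The sketch below is therefore an assumption about what the `"cosine"` and `"sign"` modes might compute, based on one common reading of gradient stiffness: cosine similarity between two gradient vectors, and the sign of their dot product. `stiffness_sketch` is a hypothetical stand-in that works on plain Python lists, not the repository's function.

```python
import math
from typing import Sequence


def stiffness_sketch(
    grad_a: Sequence[float], grad_b: Sequence[float], mode: str
) -> float:
    """Hypothetical stand-in for eval_utils.metrics.stiffness (assumed semantics)."""
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    if mode == "cosine":
        norm_a = math.sqrt(sum(a * a for a in grad_a))
        norm_b = math.sqrt(sum(b * b for b in grad_b))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0  # treat a zero gradient as carrying no alignment signal
        return dot / (norm_a * norm_b)
    if mode == "sign":
        # +1 if the gradients agree in direction, -1 if they conflict, 0 if orthogonal
        return float((dot > 0) - (dot < 0))
    raise ValueError(f"Unknown mode {mode!r}")


aligned = stiffness_sketch([1.0, 2.0], [2.0, 4.0], "cosine")
opposed = stiffness_sketch([1.0, 0.0], [-3.0, 0.0], "sign")
orthogonal = stiffness_sketch([1.0, 0.0], [0.0, 5.0], "cosine")
print(aligned, opposed, orthogonal)
```

Under this reading, positive values suggest that the two tasks push the last encoder layer in compatible directions, and negative values suggest conflict.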

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:54b77df10db83839dcc1b979f552726ea8cf10ed69146afaf64a6cb79996370a
+size 1884975744
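
The three added lines are a Git LFS pointer rather than the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows. The parameter estimate assumes every tensor is stored as float32 (as `torch_dtype` in config.json suggests) and ignores the safetensors header, so it is only a rough upper bound; `parse_lfs_pointer` is a hypothetical helper name.

```python
# Pointer contents copied from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:54b77df10db83839dcc1b979f552726ea8cf10ed69146afaf64a6cb79996370a
size 1884975744
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split each 'key value' line into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
size_bytes = int(pointer["size"])
hash_algorithm, _, digest = pointer["oid"].partition(":")

# Assumption: 4 bytes per parameter (float32), header overhead ignored.
BYTES_PER_FLOAT32 = 4
max_float32_params = size_bytes // BYTES_PER_FLOAT32

print(f"{hash_algorithm} digest with {len(digest)} hex characters")
print(f"about {size_bytes / 1e9:.2f} GB, at most {max_float32_params:,} float32 values")
```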