oliverdk committed on
Commit a7ef0da
1 Parent(s): 7e59788

End of training

.hydra/config.yaml CHANGED
@@ -3,6 +3,9 @@ model:
   model_type: codegen
   pretrained_model_name: Salesforce/codegen-350M-mono
   max_length: 1024
+  model_config_params:
+    sensor_loc_type: locs_from_token
+    sensor_token: ' omit'
   hparams:
     learning_rate: 2.0e-05
     weight_decay: 0.02
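
The new `model_config_params` keys mirror the `MeasurementPredictorConfig` fields touched later in this commit. A minimal sketch of how they could be passed through (the actual Hydra-to-model wiring lives in the train script, which is not part of this diff, so treat the glue code as an assumption):

```python
# Hypothetical glue code: only MeasurementPredictorConfig and its kwargs are
# taken from this repo; the rest is illustrative.
from configuration_measurement_pred import MeasurementPredictorConfig

model_config_params = {
    "sensor_loc_type": "locs_from_token",  # selects SensorLocFinderFromToken
    "sensor_token": " omit",               # token marking each sensor readout
}
config = MeasurementPredictorConfig(**model_config_params)
assert config.n_sensors == 3  # default in configuration_measurement_pred.py
```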
.hydra/hydra.yaml CHANGED
@@ -142,7 +142,7 @@ hydra:
   name: train
   chdir: null
   override_dirname: model.dataset_name=redwoodresearch/diamonds-seed1
-  id: '747438'
+  id: '748836_0'
   num: 0
   config_name: codegen_diamonds_slurm
   env_set: {}
@@ -166,7 +166,7 @@ hydra:
   - path: ''
     schema: structured
     provider: schema
-  output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-17/07-26-22/0
+  output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-19/09-54-27/0
   choices:
     hparams: hparams
     model: codegen_diamonds
README.md CHANGED
@@ -17,16 +17,16 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [Salesforce/codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4208
-- Accuracy: 0.9039
-- Accuracy Sensor 0: 0.8951
-- Auroc Sensor 0: 0.9544
-- Accuracy Sensor 1: 0.9114
-- Auroc Sensor 1: 0.9468
-- Accuracy Sensor 2: 0.9304
-- Auroc Sensor 2: 0.9752
-- Accuracy Aggregated: 0.8787
-- Auroc Aggregated: 0.9601
+- Loss: 0.4083
+- Accuracy: 0.9134
+- Accuracy Sensor 0: 0.9153
+- Auroc Sensor 0: 0.9651
+- Accuracy Sensor 1: 0.9094
+- Auroc Sensor 1: 0.9502
+- Accuracy Sensor 2: 0.9317
+- Auroc Sensor 2: 0.9780
+- Accuracy Aggregated: 0.8974
+- Auroc Aggregated: 0.9672
 
 ## Model description
 
@@ -61,11 +61,11 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | Accuracy Sensor 0 | Auroc Sensor 0 | Accuracy Sensor 1 | Auroc Sensor 1 | Accuracy Sensor 2 | Auroc Sensor 2 | Accuracy Aggregated | Auroc Aggregated |
 |:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-------------------:|:----------------:|
-| 0.2957 | 0.9997 | 781 | 0.3062 | 0.8755 | 0.8833 | 0.8927 | 0.8724 | 0.8925 | 0.8966 | 0.9193 | 0.8494 | 0.8893 |
-| 0.1972 | 1.9994 | 1562 | 0.2602 | 0.8922 | 0.8898 | 0.9341 | 0.9076 | 0.9355 | 0.9133 | 0.9617 | 0.8582 | 0.9350 |
-| 0.1195 | 2.9990 | 2343 | 0.2889 | 0.8943 | 0.8747 | 0.9475 | 0.9022 | 0.9347 | 0.9168 | 0.9700 | 0.8835 | 0.9516 |
-| 0.0784 | 4.0 | 3125 | 0.3078 | 0.9104 | 0.9084 | 0.9574 | 0.9125 | 0.9486 | 0.9380 | 0.9760 | 0.8828 | 0.9611 |
-| 0.0347 | 4.9984 | 3905 | 0.4208 | 0.9039 | 0.8951 | 0.9544 | 0.9114 | 0.9468 | 0.9304 | 0.9752 | 0.8787 | 0.9601 |
+| 0.2812 | 0.9997 | 781 | 0.2931 | 0.8747 | 0.8785 | 0.9058 | 0.8806 | 0.9047 | 0.8897 | 0.9331 | 0.8499 | 0.9009 |
+| 0.1938 | 1.9994 | 1562 | 0.2940 | 0.8844 | 0.8760 | 0.9330 | 0.9017 | 0.9300 | 0.9160 | 0.9574 | 0.8438 | 0.9252 |
+| 0.1202 | 2.9990 | 2343 | 0.2551 | 0.9080 | 0.9055 | 0.9601 | 0.9119 | 0.9504 | 0.9235 | 0.9757 | 0.8910 | 0.9615 |
+| 0.0779 | 4.0 | 3125 | 0.2902 | 0.9178 | 0.9194 | 0.9667 | 0.9164 | 0.9516 | 0.9309 | 0.9799 | 0.9044 | 0.9680 |
+| 0.035 | 4.9984 | 3905 | 0.4083 | 0.9134 | 0.9153 | 0.9651 | 0.9094 | 0.9502 | 0.9317 | 0.9780 | 0.8974 | 0.9672 |
 
 
 ### Framework versions
config.json CHANGED
@@ -48,7 +48,6 @@
   "tokenizer_class": "GPT2Tokenizer",
   "torch_dtype": "float32",
   "transformers_version": "4.41.0",
-  "use_aggregated": true,
   "use_cache": false,
   "vocab_size": 51200
 }
configuration_measurement_pred.py CHANGED
@@ -7,7 +7,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         sensor_token=" omit",
         sensor_loc_type="locs_from_token",
         n_sensors=3,
-        use_aggregated=True,
         sensors_weight = 0.7,
         aggregate_weight=0.3,
         **kwargs
@@ -15,7 +14,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         self.sensor_token = sensor_token
         self.sensor_loc_type = sensor_loc_type
         self.n_sensors = n_sensors
-        self.use_aggregated = use_aggregated
         self.sensors_weight = sensors_weight
         self.aggregate_weight = aggregate_weight
         super().__init__(**kwargs)
logs/events.out.tfevents.1734630919.gail.ist.berkeley.edu.140349.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9939459ca106646fb93ee511c17966004cf7182a2bf76843e3078d3ed524054d
+size 16043
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:534b1793885a479e7ba9217ee8a820a82d9ab8fd5fa08e9f8fecbdfae51bce71
+oid sha256:8b800b73c4459a9c88932c43842e1b7fdfce769c94bf0792ae4282a00f248eb8
 size 1216963976
modeling_code_gen_measurement_pred.py CHANGED
@@ -1,5 +1,5 @@
 from transformers.models.codegen import CodeGenPreTrainedModel, CodeGenModel
-
+from transformers import PreTrainedTokenizerBase
 from .modeling_measurement_pred import MeasurementPredictorMixin
 from .configuration_code_gen_measuremet_pred import CodeGenMeasurementPredictorConfig
 
@@ -11,3 +11,9 @@ class CodeGenMeasurementPredictor(CodeGenPreTrainedModel, MeasurementPredictorMixin):
         super().__init__(config)
         self.transformer = CodeGenModel(config)
         self.post_init()
+
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        pad_token = ' .'
+        pad_token_id = tokenizer.encode(pad_token)[0]
+        tokenizer.pad_token = pad_token
+        tokenizer.pad_token_id = pad_token_id
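
The new `set_pad_token` hook overwrites the tokenizer's pad token in place. A runnable sketch replicating its effect (the loading call is standard `transformers` API; the training-script invocation itself is not shown in this commit):

```python
# Replicates what CodeGenMeasurementPredictor.set_pad_token does.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
pad_token = ' .'
tokenizer.pad_token = pad_token
tokenizer.pad_token_id = tokenizer.encode(pad_token)[0]
print(tokenizer.pad_token_id)  # 764, matching pad_id in tokenizer.json below
```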
modeling_measurement_pred.py CHANGED
@@ -1,4 +1,5 @@
 from typing import Optional, Tuple, Union
+from abc import abstractmethod
 
 import torch
 from torch.nn import BCEWithLogitsLoss
@@ -20,16 +21,18 @@ class MeasurementPredictorMixin(PreTrainedModel):
         self.sensor_probes = torch.nn.ModuleList([
             torch.nn.Linear(config.emb_dim, 1) for _ in range(config.n_sensors)
         ])
-        self.use_aggregated = config.use_aggregated
-        if config.use_aggregated:
-            self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
+        self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
         self.sensors_weight = config.sensors_weight
         self.aggregate_weight = config.aggregate_weight
 
-        self.get_sensor_locs: SensorLocFinder = None
+        self.find_sensor_locs: SensorLocFinder = None
+
+    @abstractmethod
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        pass
 
     def init_sensor_loc_finder(self, tokenizer: PreTrainedTokenizerBase):
-        self.get_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
+        self.find_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
             tokenizer, sensor_token=self.sensor_token, n_sensors=self.n_sensors
         )
 
@@ -67,28 +70,27 @@ class MeasurementPredictorMixin(PreTrainedModel):
             output_hidden_states=output_hidden_states,
             return_dict=return_dict,
         )
-        sensor_locs = self.get_sensor_locs(input_ids)
+        # get sensor embeddings (including aggregate)
+        sensor_locs = self.find_sensor_locs(input_ids)
         sensor_embs = base_model_output.last_hidden_state.gather(
             1, sensor_locs.unsqueeze(-1).expand(-1, -1, self.config.emb_dim)
         )
-        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors, self.config.emb_dim), f"{sensor_embs.shape} != {(input_ids.shape[0], self.n_sensors, self.config.emb_dim)}"
+        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors + 1, self.config.emb_dim), sensor_embs.shape
+
+        # get sensor and aggregate logits
         sensor_logits = torch.concat([self.sensor_probes[i](sensor_embs[:, i, :])
                                       for i in range(self.n_sensors)], dim=-1)
-        logits = sensor_logits
+        aggregate_logits = self.aggregate_probe(sensor_embs[:, -1, :])
+        logits = torch.concat([sensor_logits, aggregate_logits], dim=-1)
 
-        if self.use_aggregated:
-            last_emb = base_model_output.last_hidden_state[:, -1, :]
-            aggregate_logits = self.aggregate_probe(last_emb)
-            logits = torch.concat([logits, aggregate_logits], dim=-1)
-
+        # compute loss
         loss = None
         if labels is not None:
             loss_fct = BCEWithLogitsLoss()
-            sensor_loss = loss_fct(sensor_logits, labels[:, :self.n_sensors]) * self.sensors_weight
+            sensor_loss = loss_fct(sensor_logits[:, :self.n_sensors], labels[:, :self.n_sensors]) * self.sensors_weight
             loss = sensor_loss
-            if self.use_aggregated: #TOOD: should be use aggregate
-                aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
-                loss += aggregate_loss
+            aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
+            loss += aggregate_loss
 
         if not return_dict:
             output = (logits, ) + base_model_output[1:]
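
This change makes the aggregate probe unconditional: the loc finders now return `n_sensors + 1` positions (the last sensor position duplicated for the aggregate), and the probe reads from that extra embedding instead of the final sequence position. A toy shape check of the `gather` above, with invented dimensions:

```python
# Toy shape check; batch/seq/emb sizes are made up for illustration.
import torch

batch, seq_len, emb_dim, n_sensors = 2, 10, 8, 3
hidden = torch.randn(batch, seq_len, emb_dim)  # stands in for last_hidden_state
locs = torch.tensor([[1, 4, 7, 7],             # n_sensors + 1 positions per row;
                     [2, 5, 8, 8]])            # the last one is duplicated
sensor_embs = hidden.gather(1, locs.unsqueeze(-1).expand(-1, -1, emb_dim))
assert sensor_embs.shape == (batch, n_sensors + 1, emb_dim)
```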
sensor_loc_stories.py CHANGED
@@ -26,6 +26,8 @@ class StoriesSensorLocFinder(SensorLocFinder):
             torch.argmax(eqs.to(torch.uint8), dim=-2),
             input_ids.shape[-1] - 3,
         ).clamp(max=input_ids.shape[-1] - 3)
+        aggregate_sensor_loc = locs[:, -1].unsqueeze(1)
+        locs = torch.cat([locs, aggregate_sensor_loc], dim=1)
         return locs
 
 
sensor_locs_from_token.py CHANGED
@@ -13,4 +13,6 @@ class SensorLocFinderFromToken(SensorLocFinder):
     def find_sensor_locs(self, input_ids: torch.Tensor) -> torch.Tensor:
         flat_sensor_token_idxs = (input_ids == self.sensor_token_id).nonzero(as_tuple=True)[1]
         sensor_token_idxs = flat_sensor_token_idxs.view(-1, self.n_sensors)
+        aggregate_sensor_token_idx = sensor_token_idxs[:, -1].unsqueeze(1)
+        sensor_token_idxs = torch.cat([sensor_token_idxs, aggregate_sensor_token_idx], dim=1)
         return sensor_token_idxs
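
Both loc finders append a duplicate of the last sensor position so downstream code always sees `n_sensors + 1` columns. A toy walk-through of the token-based finder's index logic, with an invented vocabulary where 42 stands in for the ' omit' sensor token id:

```python
# Toy run of the index logic above; token ids are invented.
import torch

sensor_token_id, n_sensors = 42, 3
input_ids = torch.tensor([[7, 42, 9, 42, 3, 42, 5]])
flat = (input_ids == sensor_token_id).nonzero(as_tuple=True)[1]
idxs = flat.view(-1, n_sensors)                        # tensor([[1, 3, 5]])
idxs = torch.cat([idxs, idxs[:, -1].unsqueeze(1)], 1)  # tensor([[1, 3, 5, 5]])
```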
special_tokens_map.json CHANGED
@@ -13,7 +13,7 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": "<|endoftext|>",
+  "pad_token": "Ġ.",
   "unk_token": {
     "content": "<|endoftext|>",
     "lstrip": false,
tokenizer.json CHANGED
@@ -12,9 +12,9 @@
   },
   "direction": "Left",
   "pad_to_multiple_of": null,
-  "pad_id": 50256,
+  "pad_id": 764,
   "pad_type_id": 0,
-  "pad_token": "<|endoftext|>"
+  "pad_token": "Ġ."
   },
   "added_tokens": [
     {
tokenizer_config.json CHANGED
@@ -318,7 +318,7 @@
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
   "model_max_length": 2048,
-  "pad_token": "<|endoftext|>",
+  "pad_token": "Ġ.",
   "padding_side": "left",
   "return_token_type_ids": false,
   "tokenizer_class": "CodeGenTokenizer",
train.log CHANGED
@@ -1,4 +1 @@
-[2024-12-17 07:27:38,728][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
-[2024-12-17 07:27:38,922][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
-[2024-12-17 11:28:11,228][submitit][INFO] - Job completed successfully
-[2024-12-17 11:28:11,254][submitit][INFO] - Exiting after successful completion
+[2024-12-19 09:55:18,220][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.