End of training

Files changed (15)

.hydra/config.yaml +3 -0
.hydra/hydra.yaml +3 -3
README.md +15 -15
config.json +0 -1
configuration_measurement_pred.py +0 -2
logs/events.out.tfevents.1734630919.gail.ist.berkeley.edu.140347.0 +3 -0
model.safetensors +1 -1
modeling_code_gen_measurement_pred.py +7 -1
modeling_measurement_pred.py +19 -17
sensor_loc_stories.py +2 -0
sensor_locs_from_token.py +2 -0
special_tokens_map.json +1 -1
tokenizer.json +2 -2
tokenizer_config.json +1 -1
train.log +1 -1

.hydra/config.yaml CHANGED

@@ -3,6 +3,9 @@ model:
   model_type: codegen
   pretrained_model_name: Salesforce/codegen-350M-mono
   max_length: 1024
 hparams:
   learning_rate: 2.0e-05
   weight_decay: 0.02

   model_type: codegen
   pretrained_model_name: Salesforce/codegen-350M-mono
   max_length: 1024
+  model_config_params:
+    sensor_loc_type: locs_from_token
+    sensor_token: ' omit'
 hparams:
   learning_rate: 2.0e-05
   weight_decay: 0.02

.hydra/hydra.yaml CHANGED

@@ -142,8 +142,8 @@ hydra:
     name: train
     chdir: null
     override_dirname: model.dataset_name=redwoodresearch/diamonds-seed3
-    id: '747440'
-    num: 0
     config_name: codegen_diamonds_slurm
     env_set: {}
     env_copy: []
@@ -166,7 +166,7 @@ hydra:
     - path: ''
       schema: structured
       provider: schema
-    output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-17/07-26-23/0
     choices:
       hparams: hparams
       model: codegen_diamonds

     name: train
     chdir: null
     override_dirname: model.dataset_name=redwoodresearch/diamonds-seed3
+    id: '748836_2'
+    num: 2
     config_name: codegen_diamonds_slurm
     env_set: {}
     env_copy: []
     - path: ''
       schema: structured
       provider: schema
+    output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-19/09-54-27/2
     choices:
       hparams: hparams
       model: codegen_diamonds

README.md CHANGED

@@ -17,16 +17,16 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [Salesforce/codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.4321
-- Accuracy: 0.9160
-- Accuracy Sensor 0: 0.9279
-- Auroc Sensor 0: 0.9601
-- Accuracy Sensor 1: 0.9017
-- Auroc Sensor 1: 0.9575
-- Accuracy Sensor 2: 0.9458
-- Auroc Sensor 2: 0.9593
-- Accuracy Aggregated: 0.8887
-- Auroc Aggregated: 0.9488
 ## Model description
@@ -61,11 +61,11 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Accuracy | Accuracy Sensor 0 | Auroc Sensor 0 | Accuracy Sensor 1 | Auroc Sensor 1 | Accuracy Sensor 2 | Auroc Sensor 2 | Accuracy Aggregated | Auroc Aggregated |
 |:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-------------------:|:----------------:|
-| 0.281         | 0.9997 | 781  | 0.4126          | 0.8254   | 0.8474            | 0.9025         | 0.8423            | 0.9144         | 0.8339            | 0.9166         | 0.7778              | 0.8949           |
-| 0.1966        | 1.9994 | 1562 | 0.2290          | 0.9123   | 0.9079            | 0.9312         | 0.9138            | 0.9490         | 0.9288            | 0.9424         | 0.8990              | 0.9313           |
-| 0.1412        | 2.9990 | 2343 | 0.2619          | 0.9043   | 0.9059            | 0.9537         | 0.8838            | 0.9581         | 0.9410            | 0.9551         | 0.8863              | 0.9464           |
-| 0.0757        | 4.0    | 3125 | 0.2862          | 0.9224   | 0.9307            | 0.9622         | 0.9153            | 0.9626         | 0.9464            | 0.9601         | 0.8971              | 0.9516           |
-| 0.0356        | 4.9984 | 3905 | 0.4321          | 0.9160   | 0.9279            | 0.9601         | 0.9017            | 0.9575         | 0.9458            | 0.9593         | 0.8887              | 0.9488           |
 ### Framework versions

 This model is a fine-tuned version of [Salesforce/codegen-350M-mono](https://huggingface.co/Salesforce/codegen-350M-mono) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.3757
+- Accuracy: 0.9134
+- Accuracy Sensor 0: 0.9235
+- Auroc Sensor 0: 0.9559
+- Accuracy Sensor 1: 0.8989
+- Auroc Sensor 1: 0.9539
+- Accuracy Sensor 2: 0.9486
+- Auroc Sensor 2: 0.9653
+- Accuracy Aggregated: 0.8826
+- Auroc Aggregated: 0.9553
 ## Model description
 | Training Loss | Epoch  | Step | Validation Loss | Accuracy | Accuracy Sensor 0 | Auroc Sensor 0 | Accuracy Sensor 1 | Auroc Sensor 1 | Accuracy Sensor 2 | Auroc Sensor 2 | Accuracy Aggregated | Auroc Aggregated |
 |:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-------------------:|:----------------:|
+| 0.287         | 0.9997 | 781  | 0.4392          | 0.8094   | 0.8151            | 0.8977         | 0.8235            | 0.9036         | 0.8395            | 0.9106         | 0.7594              | 0.8793           |
+| 0.2108        | 1.9994 | 1562 | 0.2409          | 0.9058   | 0.9011            | 0.9242         | 0.9062            | 0.9344         | 0.9238            | 0.9424         | 0.8920              | 0.9178           |
+| 0.1549        | 2.9990 | 2343 | 0.2347          | 0.9119   | 0.9185            | 0.9519         | 0.8929            | 0.9546         | 0.9481            | 0.9605         | 0.8883              | 0.9476           |
+| 0.0887        | 4.0    | 3125 | 0.2867          | 0.9139   | 0.9243            | 0.9558         | 0.9057            | 0.9547         | 0.9473            | 0.9653         | 0.8785              | 0.9543           |
+| 0.0444        | 4.9984 | 3905 | 0.3757          | 0.9134   | 0.9235            | 0.9559         | 0.8989            | 0.9539         | 0.9486            | 0.9653         | 0.8826              | 0.9553           |
 ### Framework versions

config.json CHANGED

@@ -48,7 +48,6 @@
   "tokenizer_class": "GPT2Tokenizer",
   "torch_dtype": "float32",
   "transformers_version": "4.41.0",
-  "use_aggregated": true,
   "use_cache": false,
   "vocab_size": 51200
 }

   "tokenizer_class": "GPT2Tokenizer",
   "torch_dtype": "float32",
   "transformers_version": "4.41.0",
   "use_cache": false,
   "vocab_size": 51200
 }

configuration_measurement_pred.py CHANGED

@@ -7,7 +7,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         sensor_token=" omit",
         sensor_loc_type="locs_from_token",
         n_sensors=3,
-        use_aggregated=True,
         sensors_weight = 0.7,
         aggregate_weight=0.3,
         **kwargs
@@ -15,7 +14,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         self.sensor_token = sensor_token
         self.sensor_loc_type = sensor_loc_type
         self.n_sensors = n_sensors
-        self.use_aggregated = use_aggregated
         self.sensors_weight = sensors_weight
         self.aggregate_weight = aggregate_weight
         super().__init__(**kwargs)

         sensor_token=" omit",
         sensor_loc_type="locs_from_token",
         n_sensors=3,
         sensors_weight = 0.7,
         aggregate_weight=0.3,
         **kwargs
         self.sensor_token = sensor_token
         self.sensor_loc_type = sensor_loc_type
         self.n_sensors = n_sensors
         self.sensors_weight = sensors_weight
         self.aggregate_weight = aggregate_weight
         super().__init__(**kwargs)

logs/events.out.tfevents.1734630919.gail.ist.berkeley.edu.140347.0 ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeaf5fce7b36ae16a76c048d9d292a6290fa585bf86583dadb89bc3407752120
+size 16043

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:abd1a2a3f20d0de4bfb8baf9861a742550fd59259e4e33da910c4bdf2f71cf5c
 size 1216963976

 version https://git-lfs.github.com/spec/v1
+oid sha256:ec46cd12e8988c9f77be8d156927fd1ecc03b79128b2c42773c34cd94fa6afee
 size 1216963976

modeling_code_gen_measurement_pred.py CHANGED

@@ -1,5 +1,5 @@
 from transformers.models.codegen import CodeGenPreTrainedModel, CodeGenModel
 from .modeling_measurement_pred import MeasurementPredictorMixin
 from .configuration_code_gen_measuremet_pred import CodeGenMeasurementPredictorConfig
@@ -11,3 +11,9 @@ class CodeGenMeasurementPredictor(CodeGenPreTrainedModel, MeasurementPredictorMi
         super().__init__(config)
         self.transformer = CodeGenModel(config)
         self.post_init()

 from transformers.models.codegen import CodeGenPreTrainedModel, CodeGenModel
+from transformers import PreTrainedTokenizerBase
 from .modeling_measurement_pred import MeasurementPredictorMixin
 from .configuration_code_gen_measuremet_pred import CodeGenMeasurementPredictorConfig
         super().__init__(config)
         self.transformer = CodeGenModel(config)
         self.post_init()
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        pad_token = ' .'
+        pad_token_id = tokenizer.encode(pad_token)[0]
+        tokenizer.pad_token = pad_token
+        tokenizer.pad_token_id = pad_token_id

modeling_measurement_pred.py CHANGED

@@ -1,4 +1,5 @@
 from typing import Optional, Tuple, Union
 import torch
 from torch.nn import BCEWithLogitsLoss
@@ -20,16 +21,18 @@ class MeasurementPredictorMixin(PreTrainedModel):
         self.sensor_probes = torch.nn.ModuleList([
             torch.nn.Linear(config.emb_dim, 1) for _ in range(config.n_sensors)
         ])
-        self.use_aggregated = config.use_aggregated
-        if config.use_aggregated:
-            self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
         self.sensors_weight = config.sensors_weight
         self.aggregate_weight = config.aggregate_weight
-        self.get_sensor_locs: SensorLocFinder = None
     def init_sensor_loc_finder(self, tokenizer: PreTrainedTokenizerBase):
-        self.get_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
             tokenizer, sensor_token=self.sensor_token, n_sensors=self.n_sensors
         )
@@ -67,28 +70,27 @@ class MeasurementPredictorMixin(PreTrainedModel):
             output_hidden_states=output_hidden_states,
             return_dict=return_dict,
         )
-        sensor_locs = self.get_sensor_locs(input_ids)
         sensor_embs = base_model_output.last_hidden_state.gather(
             1, sensor_locs.unsqueeze(-1).expand(-1, -1, self.config.emb_dim)
         )
-        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors, self.config.emb_dim), f"{sensor_embs.shape} != {(input_ids.shape[0], self.n_sensors, self.config.emb_dim)}"
         sensor_logits = torch.concat([self.sensor_probes[i](sensor_embs[:, i, :])
                                for i in range(self.n_sensors)], dim=-1)
-        logits = sensor_logits
-        if self.use_aggregated:
-            last_emb = base_model_output.last_hidden_state[:, -1, :]
-            aggregate_logits = self.aggregate_probe(last_emb)
-            logits = torch.concat([logits, aggregate_logits], dim=-1)
         loss = None
         if labels is not None:
             loss_fct = BCEWithLogitsLoss()
-            sensor_loss = loss_fct(sensor_logits, labels[:, :self.n_sensors]) * self.sensors_weight
             loss = sensor_loss
-            if self.use_aggregated: #TOOD: should be use aggregate
-                aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
-                loss += aggregate_loss
         if not return_dict:
             output = (logits, ) + base_model_output[1:]

 from typing import Optional, Tuple, Union
+from abc import abstractmethod
 import torch
 from torch.nn import BCEWithLogitsLoss
         self.sensor_probes = torch.nn.ModuleList([
             torch.nn.Linear(config.emb_dim, 1) for _ in range(config.n_sensors)
         ])
+        self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
         self.sensors_weight = config.sensors_weight
         self.aggregate_weight = config.aggregate_weight
+        self.find_sensor_locs: SensorLocFinder = None
+    @abstractmethod
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        pass
     def init_sensor_loc_finder(self, tokenizer: PreTrainedTokenizerBase):
+        self.find_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
             tokenizer, sensor_token=self.sensor_token, n_sensors=self.n_sensors
         )
             output_hidden_states=output_hidden_states,
             return_dict=return_dict,
         )
+        # get sensor embeddings (including aggregate)
+        sensor_locs = self.find_sensor_locs(input_ids)
         sensor_embs = base_model_output.last_hidden_state.gather(
             1, sensor_locs.unsqueeze(-1).expand(-1, -1, self.config.emb_dim)
         )
+        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors + 1, self.config.emb_dim), sensor_embs.shape
+        # get sensor and aggregate logits
         sensor_logits = torch.concat([self.sensor_probes[i](sensor_embs[:, i, :])
                                for i in range(self.n_sensors)], dim=-1)
+        aggregate_logits = self.aggregate_probe(sensor_embs[:, -1, :])
+        logits = torch.concat([sensor_logits, aggregate_logits], dim=-1)
+        # compute loss
         loss = None
         if labels is not None:
             loss_fct = BCEWithLogitsLoss()
+            sensor_loss = loss_fct(sensor_logits[:, :self.n_sensors], labels[:, :self.n_sensors]) * self.sensors_weight
             loss = sensor_loss
+            aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
+            loss += aggregate_loss
         if not return_dict:
             output = (logits, ) + base_model_output[1:]
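
After this change the aggregate probe always contributes to the loss. As a rough illustration of the weighting, the sketch below computes a mean binary cross-entropy from logits for the sensor columns and for the aggregate column, then combines them using the config defaults `sensors_weight = 0.7` and `aggregate_weight = 0.3`. It uses plain Python lists so it runs without PyTorch. The helper names `bce_with_logits` and `combined_loss` and the example values are assumptions for illustration; the code in the diff uses `torch.nn.BCEWithLogitsLoss`.

```python
import math
from typing import List


def bce_with_logits(logits: List[List[float]], labels: List[List[float]]) -> float:
    """Mean binary cross-entropy over all elements, computed from raw logits."""
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for x, y in zip(logit_row, label_row):
            # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count


def combined_loss(
    logits: List[List[float]],
    labels: List[List[float]],
    n_sensors: int,
    sensors_weight: float = 0.7,
    aggregate_weight: float = 0.3,
) -> float:
    """Weighted sum of the sensor loss and the aggregate loss.

    The first n_sensors columns are per-sensor predictions; the last
    column is the aggregate prediction.
    """
    sensor_logits = [row[:n_sensors] for row in logits]
    sensor_labels = [row[:n_sensors] for row in labels]
    aggregate_logits = [row[-1:] for row in logits]
    aggregate_labels = [row[-1:] for row in labels]
    sensor_loss = bce_with_logits(sensor_logits, sensor_labels) * sensors_weight
    aggregate_loss = bce_with_logits(aggregate_logits, aggregate_labels) * aggregate_weight
    return sensor_loss + aggregate_loss


# Hypothetical batch of two examples: three sensor columns plus one aggregate column.
logits = [[2.0, -1.0, 0.5, 1.5], [-0.5, 0.0, 3.0, -2.0]]
labels = [[1.0, 0.0, 1.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
loss = combined_loss(logits, labels, n_sensors=3)
print(round(loss, 4))
```

Because the two weights sum to 1.0, an uninformed model (all logits zero) gets a combined loss of ln 2, the same as an unweighted binary cross-entropy.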

sensor_loc_stories.py CHANGED

@@ -26,6 +26,8 @@ class StoriesSensorLocFinder(SensorLocFinder):
             torch.argmax(eqs.to(torch.uint8), dim=-2),
             input_ids.shape[-1] - 3,
         ).clamp(max=input_ids.shape[-1] - 3)
         return locs

             torch.argmax(eqs.to(torch.uint8), dim=-2),
             input_ids.shape[-1] - 3,
         ).clamp(max=input_ids.shape[-1] - 3)
+        aggregate_sensor_loc = locs[:, -1].unsqueeze(1)
+        locs = torch.cat([locs, aggregate_sensor_loc], dim=1)
         return locs

sensor_locs_from_token.py CHANGED

@@ -13,4 +13,6 @@ class SensorLocFinderFromToken(SensorLocFinder):
     def find_sensor_locs(self, input_ids: torch.Tensor) -> torch.Tensor:
         flat_sensor_token_idxs = (input_ids == self.sensor_token_id).nonzero(as_tuple=True)[1]
         sensor_token_idxs = flat_sensor_token_idxs.view(-1, self.n_sensors)
         return sensor_token_idxs

     def find_sensor_locs(self, input_ids: torch.Tensor) -> torch.Tensor:
         flat_sensor_token_idxs = (input_ids == self.sensor_token_id).nonzero(as_tuple=True)[1]
         sensor_token_idxs = flat_sensor_token_idxs.view(-1, self.n_sensors)
+        aggregate_sensor_token_idx = sensor_token_idxs[:, -1].unsqueeze(1)
+        sensor_token_idxs = torch.cat([sensor_token_idxs, aggregate_sensor_token_idx], dim=1)
         return sensor_token_idxs
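
The two added lines repeat the last sensor position, so the returned index tensor has `n_sensors + 1` columns and the extra column feeds the aggregate probe. A dependency-free sketch of that indexing follows, with plain Python lists standing in for tensors. The function name `find_sensor_locs_with_aggregate` and the token ids are hypothetical, chosen only for illustration.

```python
from typing import List


def find_sensor_locs_with_aggregate(
    input_ids: List[List[int]], sensor_token_id: int, n_sensors: int
) -> List[List[int]]:
    """Return, per row, the positions of the sensor token plus one extra
    entry that repeats the last sensor position (the aggregate location)."""
    all_locs = []
    for row in input_ids:
        locs = [i for i, tok in enumerate(row) if tok == sensor_token_id]
        # The reshape in the diff is only meaningful when every row
        # contains exactly n_sensors sensor tokens, so check it here.
        if len(locs) != n_sensors:
            raise ValueError(f"expected {n_sensors} sensor tokens, found {len(locs)}")
        all_locs.append(locs + [locs[-1]])
    return all_locs


# Hypothetical ids: 9 stands in for the sensor token, 0 for left padding.
batch = [
    [0, 0, 5, 9, 6, 9, 7, 9],
    [4, 9, 3, 3, 9, 2, 9, 1],
]
locs = find_sensor_locs_with_aggregate(batch, sensor_token_id=9, n_sensors=3)
print(locs)  # [[3, 5, 7, 7], [1, 4, 6, 6]]
```

Each row now has four entries, which matches the `n_sensors + 1` shape asserted in `modeling_measurement_pred.py`.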

special_tokens_map.json CHANGED

@@ -13,7 +13,7 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": "<|endoftext|>",
   "unk_token": {
     "content": "<|endoftext|>",
     "lstrip": false,

     "rstrip": false,
     "single_word": false
   },
+  "pad_token": "Ġ.",
   "unk_token": {
     "content": "<|endoftext|>",
     "lstrip": false,

tokenizer.json CHANGED

@@ -12,9 +12,9 @@
     },
     "direction": "Left",
     "pad_to_multiple_of": null,
-    "pad_id": 50256,
     "pad_type_id": 0,
-    "pad_token": "<|endoftext|>"
   },
   "added_tokens": [
     {

     },
     "direction": "Left",
     "pad_to_multiple_of": null,
+    "pad_id": 764,
     "pad_type_id": 0,
+    "pad_token": "Ġ."
   },
   "added_tokens": [
     {

tokenizer_config.json CHANGED

@@ -318,7 +318,7 @@
   "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
   "model_max_length": 2048,
-  "pad_token": "<|endoftext|>",
   "padding_side": "left",
   "return_token_type_ids": false,
   "tokenizer_class": "CodeGenTokenizer",

   "clean_up_tokenization_spaces": true,
   "eos_token": "<|endoftext|>",
   "model_max_length": 2048,
+  "pad_token": "Ġ.",
   "padding_side": "left",
   "return_token_type_ids": false,
   "tokenizer_class": "CodeGenTokenizer",

train.log CHANGED

@@ -1 +1 @@
-[2024-12-17 07:27:39,043][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.

+[2024-12-19 09:55:18,435][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.