Rogarcia18 committed
Commit 1357ffb · verified · 1 Parent(s): 16b2e77

Upload folder using huggingface_hub
checkpoint-1500/README.md ADDED
@@ -0,0 +1,206 @@
+ ---
+ base_model: google-bert/bert-base-uncased
+ library_name: peft
+ tags:
+ - base_model:adapter:google-bert/bert-base-uncased
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
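Editor's note: the model card's "How to Get Started" section is left as a template placeholder. Below is a minimal sketch of loading this checkpoint for inference, assuming the adapter directory is available locally as `checkpoint-1500` and that the task is binary sentiment classification; `num_labels=2` and the label meanings are assumptions (the card does not state them, though the saved path `qlora-bert-sentiment` suggests sentiment).

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Base model taken from adapter_config.json; num_labels=2 is an assumption.
base = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased", num_labels=2
)
# Attach the LoRA adapter and load the tokenizer saved alongside it.
model = PeftModel.from_pretrained(base, "checkpoint-1500")
tokenizer = AutoTokenizer.from_pretrained("checkpoint-1500")

model.eval()
inputs = tokenizer("A surprisingly moving film.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # class probabilities; the label order is unspecified here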
checkpoint-1500/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "google-bert/bert-base-uncased",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": [
+ "classifier",
+ "score"
+ ],
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "query",
+ "value",
+ "key"
+ ],
+ "target_parameters": null,
+ "task_type": "SEQ_CLS",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
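Editor's note: this adapter_config.json maps one-to-one onto peft's `LoraConfig`. A hedged sketch of how an equivalent configuration could be constructed (the training script is not part of this commit, so this is a reconstruction rather than the author's code; `num_labels=2` is again an assumption):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# Mirrors checkpoint-1500/adapter_config.json: rank-8 LoRA (alpha 16, dropout 0.1)
# on BERT's query/key/value projections, with the classification head trained fully.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=["query", "value", "key"],
    modules_to_save=["classifier", "score"],
)

base = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased", num_labels=2  # num_labels is an assumption
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only LoRA matrices + classifier head are trainable
```

With `r=8` and `lora_alpha=16`, the LoRA update is scaled by `alpha / r = 2` at merge time.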
checkpoint-1500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:322554d22a210ea4ef574c0ecb05e391230083e36f51e7f1ec48cc031a15eb7d
+ size 1782740
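Editor's note: this is a Git LFS pointer; the ~1.8 MB file behind it holds the LoRA matrices and the saved classifier head. A small sketch for inspecting it once the actual weights (not the pointer) have been fetched:

```python
from safetensors import safe_open

# List every tensor stored in the adapter file together with its shape.
with safe_open("checkpoint-1500/adapter_model.safetensors", framework="pt") as f:
    for name in f.keys():
        print(name, tuple(f.get_tensor(name).shape))
```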
checkpoint-1500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f477602fa7b1aff57be7c2f3f298c17e0f6bb781bc9672cc418b377f34abd7dc
+ size 3609035
checkpoint-1500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39493a19f0bf124dbdfa315df995fcce5c252bbce511c81c41bafb5983aec910
+ size 14645
checkpoint-1500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5e9883e554356d837bd1e97610c6d4704b9dbcef3b1f7cac5eccb0a7c25ef7c
+ size 1465
checkpoint-1500/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
checkpoint-1500/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1500/tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
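Editor's note: this is the stock bert-base-uncased tokenizer configuration (WordPiece, lowercasing, 512-token limit). A quick sketch of loading it from the checkpoint directory:

```python
from transformers import AutoTokenizer

# tokenizer_class is BertTokenizer; do_lower_case=True, model_max_length=512.
tokenizer = AutoTokenizer.from_pretrained("checkpoint-1500")
enc = tokenizer("Hello, world!", truncation=True, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]))
# ['[CLS]', 'hello', ',', 'world', '!', '[SEP]']
```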
checkpoint-1500/trainer_state.json ADDED
@@ -0,0 +1,262 @@
+ {
+ "best_global_step": 1500,
+ "best_metric": 0.925,
+ "best_model_checkpoint": "./qlora-bert-sentiment/checkpoint-1500",
+ "epoch": 2.0,
+ "eval_steps": 500,
+ "global_step": 1500,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.06666666666666667,
+ "grad_norm": 3.986419677734375,
+ "learning_rate": 0.00016333333333333334,
+ "loss": 0.7202,
+ "step": 50
+ },
+ {
+ "epoch": 0.13333333333333333,
+ "grad_norm": 3.3505516052246094,
+ "learning_rate": 0.00033,
+ "loss": 0.6925,
+ "step": 100
+ },
+ {
+ "epoch": 0.2,
+ "grad_norm": 5.206148624420166,
+ "learning_rate": 0.0004966666666666666,
+ "loss": 0.4795,
+ "step": 150
+ },
+ {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 2.9707682132720947,
+ "learning_rate": 0.0004983764571440296,
+ "loss": 0.3741,
+ "step": 200
+ },
+ {
+ "epoch": 0.3333333333333333,
+ "grad_norm": 6.371100902557373,
+ "learning_rate": 0.0004933947257182901,
+ "loss": 0.3153,
+ "step": 250
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 1.7173587083816528,
+ "learning_rate": 0.0004851214981669406,
+ "loss": 0.3596,
+ "step": 300
+ },
+ {
+ "epoch": 0.4666666666666667,
+ "grad_norm": 0.642329216003418,
+ "learning_rate": 0.0004736686557000246,
+ "loss": 0.3291,
+ "step": 350
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "grad_norm": 1.6279314756393433,
+ "learning_rate": 0.00045919107836474044,
+ "loss": 0.307,
+ "step": 400
+ },
+ {
+ "epoch": 0.6,
+ "grad_norm": 7.462891578674316,
+ "learning_rate": 0.0004418845505584972,
+ "loss": 0.2681,
+ "step": 450
+ },
+ {
+ "epoch": 0.6666666666666666,
+ "grad_norm": 3.423023223876953,
+ "learning_rate": 0.00042198311338080465,
+ "loss": 0.327,
+ "step": 500
+ },
+ {
+ "epoch": 0.7333333333333333,
+ "grad_norm": 4.022140979766846,
+ "learning_rate": 0.00039975589962889645,
+ "loss": 0.3699,
+ "step": 550
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 15.722062110900879,
+ "learning_rate": 0.00037550349423834005,
+ "loss": 0.3142,
+ "step": 600
+ },
+ {
+ "epoch": 0.8666666666666667,
+ "grad_norm": 4.997221946716309,
+ "learning_rate": 0.00034955386938743217,
+ "loss": 0.2502,
+ "step": 650
+ },
+ {
+ "epoch": 0.9333333333333333,
+ "grad_norm": 5.527884483337402,
+ "learning_rate": 0.0003222579492361179,
+ "loss": 0.2472,
+ "step": 700
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 5.495438575744629,
+ "learning_rate": 0.00029398486427873274,
+ "loss": 0.2484,
+ "step": 750
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.9033333333333333,
+ "eval_loss": 0.2915270924568176,
+ "eval_runtime": 14.1665,
+ "eval_samples_per_second": 42.353,
+ "eval_steps_per_second": 21.177,
+ "step": 750
+ },
+ {
+ "epoch": 1.0666666666666667,
+ "grad_norm": 0.16799378395080566,
+ "learning_rate": 0.0002651169594873036,
+ "loss": 0.2231,
+ "step": 800
+ },
+ {
+ "epoch": 1.1333333333333333,
+ "grad_norm": 0.31085747480392456,
+ "learning_rate": 0.00023604462375170903,
+ "loss": 0.2297,
+ "step": 850
+ },
+ {
+ "epoch": 1.2,
+ "grad_norm": 0.6704829335212708,
+ "learning_rate": 0.00020716101053964964,
+ "loss": 0.166,
+ "step": 900
+ },
+ {
+ "epoch": 1.2666666666666666,
+ "grad_norm": 10.427606582641602,
+ "learning_rate": 0.0001788567211704453,
+ "loss": 0.245,
+ "step": 950
+ },
+ {
+ "epoch": 1.3333333333333333,
+ "grad_norm": 4.69648551940918,
+ "learning_rate": 0.00015151452260226222,
+ "loss": 0.2057,
+ "step": 1000
+ },
+ {
+ "epoch": 1.4,
+ "grad_norm": 6.8767571449279785,
+ "learning_rate": 0.00012550417116563413,
+ "loss": 0.2155,
+ "step": 1050
+ },
+ {
+ "epoch": 1.4666666666666668,
+ "grad_norm": 0.233122318983078,
+ "learning_rate": 0.00010117741224340254,
+ "loss": 0.1568,
+ "step": 1100
+ },
+ {
+ "epoch": 1.5333333333333332,
+ "grad_norm": 3.3703675270080566,
+ "learning_rate": 7.886322351782782e-05,
+ "loss": 0.2277,
+ "step": 1150
+ },
+ {
+ "epoch": 1.6,
+ "grad_norm": 0.9207598567008972,
+ "learning_rate": 5.88633661117921e-05,
+ "loss": 0.1672,
+ "step": 1200
+ },
+ {
+ "epoch": 1.6666666666666665,
+ "grad_norm": 0.23675969243049622,
+ "learning_rate": 4.144830378727901e-05,
+ "loss": 0.2183,
+ "step": 1250
+ },
+ {
+ "epoch": 1.7333333333333334,
+ "grad_norm": 0.19786447286605835,
+ "learning_rate": 2.6853545386968604e-05,
+ "loss": 0.1465,
+ "step": 1300
+ },
+ {
+ "epoch": 1.8,
+ "grad_norm": 4.413337230682373,
+ "learning_rate": 1.527645998115246e-05,
+ "loss": 0.1298,
+ "step": 1350
+ },
+ {
+ "epoch": 1.8666666666666667,
+ "grad_norm": 6.995970249176025,
+ "learning_rate": 6.873607789640579e-06,
+ "loss": 0.2451,
+ "step": 1400
+ },
+ {
+ "epoch": 1.9333333333333333,
+ "grad_norm": 0.16676431894302368,
+ "learning_rate": 1.7586229733657643e-06,
+ "loss": 0.2173,
+ "step": 1450
+ },
+ {
+ "epoch": 2.0,
+ "grad_norm": 4.929508209228516,
+ "learning_rate": 6.769272940521099e-10,
+ "loss": 0.2513,
+ "step": 1500
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.925,
+ "eval_loss": 0.24354523420333862,
+ "eval_runtime": 14.005,
+ "eval_samples_per_second": 42.842,
+ "eval_steps_per_second": 21.421,
+ "step": 1500
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 1500,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 2,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3173696815104000.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
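Editor's note: trainer_state.json records the whole run. Training loss falls from 0.72 to roughly 0.2 over two epochs; the learning rate warms up to about 5e-4 over the first 150 steps and then decays to nearly zero (the shape is consistent with a cosine schedule after warmup, though the schedule type is not stated); eval accuracy improves from 0.9033 at epoch 1 to 0.925 at epoch 2. A sketch for pulling the eval entries out of the log:

```python
import json

with open("checkpoint-1500/trainer_state.json") as f:
    state = json.load(f)

print("best:", state["best_model_checkpoint"], "metric:", state["best_metric"])

# log_history mixes training logs (every 50 steps) and eval logs (once per epoch);
# the eval entries are the ones carrying eval_* keys.
for entry in state["log_history"]:
    if "eval_accuracy" in entry:
        print(f"step {entry['step']}: "
              f"acc={entry['eval_accuracy']:.4f} loss={entry['eval_loss']:.4f}")
```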
checkpoint-1500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6dab441745e31edac563b106e746b0557b62e5d734a8b72eeea252dbbc6afe1
+ size 5841
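Editor's note: training_args.bin is a pickled `transformers.TrainingArguments` object rather than a tensor file. A hedged sketch for inspecting it (on recent PyTorch versions `weights_only=False` must be passed explicitly, and pickles should only be loaded from sources you trust):

```python
import torch

# Unpickling executes arbitrary pickle code; only do this for trusted checkpoints.
args = torch.load("checkpoint-1500/training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size, args.num_train_epochs)
```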
checkpoint-1500/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-750/README.md ADDED
@@ -0,0 +1,206 @@
+ ---
+ base_model: google-bert/bert-base-uncased
+ library_name: peft
+ tags:
+ - base_model:adapter:google-bert/bert-base-uncased
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
checkpoint-750/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "google-bert/bert-base-uncased",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": [
+ "classifier",
+ "score"
+ ],
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "query",
+ "value",
+ "key"
+ ],
+ "target_parameters": null,
+ "task_type": "SEQ_CLS",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
checkpoint-750/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8a3877c12304f2a252e6230648a6fe8b8504b979424fe516a23dd1f9466d513
+ size 1782740
checkpoint-750/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d01db1be6a412db347bcb925b0320ddc500a07beacab58a231c8ff66f3d79029
+ size 3609035
checkpoint-750/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:23bc3d1e8167a2527221cdce15f6dc1848352a8722a278bbfd94c5c334208341
+ size 14645
checkpoint-750/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:22997b9e7a2a02c9f027d7d9a095b883b5f246987605d42afd388eba9f0181fd
+ size 1465
checkpoint-750/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
checkpoint-750/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-750/tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
checkpoint-750/trainer_state.json ADDED
@@ -0,0 +1,148 @@
+ {
+ "best_global_step": 750,
+ "best_metric": 0.9033333333333333,
+ "best_model_checkpoint": "./qlora-bert-sentiment/checkpoint-750",
+ "epoch": 1.0,
+ "eval_steps": 500,
+ "global_step": 750,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.06666666666666667,
+ "grad_norm": 3.986419677734375,
+ "learning_rate": 0.00016333333333333334,
+ "loss": 0.7202,
+ "step": 50
+ },
+ {
+ "epoch": 0.13333333333333333,
+ "grad_norm": 3.3505516052246094,
+ "learning_rate": 0.00033,
+ "loss": 0.6925,
+ "step": 100
+ },
+ {
+ "epoch": 0.2,
+ "grad_norm": 5.206148624420166,
+ "learning_rate": 0.0004966666666666666,
+ "loss": 0.4795,
+ "step": 150
+ },
+ {
+ "epoch": 0.26666666666666666,
+ "grad_norm": 2.9707682132720947,
+ "learning_rate": 0.0004983764571440296,
+ "loss": 0.3741,
+ "step": 200
+ },
+ {
+ "epoch": 0.3333333333333333,
+ "grad_norm": 6.371100902557373,
+ "learning_rate": 0.0004933947257182901,
+ "loss": 0.3153,
+ "step": 250
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 1.7173587083816528,
+ "learning_rate": 0.0004851214981669406,
+ "loss": 0.3596,
+ "step": 300
+ },
+ {
+ "epoch": 0.4666666666666667,
+ "grad_norm": 0.642329216003418,
+ "learning_rate": 0.0004736686557000246,
+ "loss": 0.3291,
+ "step": 350
+ },
+ {
+ "epoch": 0.5333333333333333,
+ "grad_norm": 1.6279314756393433,
+ "learning_rate": 0.00045919107836474044,
+ "loss": 0.307,
+ "step": 400
+ },
+ {
+ "epoch": 0.6,
+ "grad_norm": 7.462891578674316,
+ "learning_rate": 0.0004418845505584972,
+ "loss": 0.2681,
+ "step": 450
+ },
+ {
+ "epoch": 0.6666666666666666,
+ "grad_norm": 3.423023223876953,
+ "learning_rate": 0.00042198311338080465,
+ "loss": 0.327,
+ "step": 500
+ },
+ {
+ "epoch": 0.7333333333333333,
+ "grad_norm": 4.022140979766846,
+ "learning_rate": 0.00039975589962889645,
+ "loss": 0.3699,
+ "step": 550
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 15.722062110900879,
+ "learning_rate": 0.00037550349423834005,
+ "loss": 0.3142,
+ "step": 600
+ },
+ {
+ "epoch": 0.8666666666666667,
+ "grad_norm": 4.997221946716309,
+ "learning_rate": 0.00034955386938743217,
+ "loss": 0.2502,
+ "step": 650
+ },
+ {
+ "epoch": 0.9333333333333333,
+ "grad_norm": 5.527884483337402,
+ "learning_rate": 0.0003222579492361179,
+ "loss": 0.2472,
+ "step": 700
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 5.495438575744629,
+ "learning_rate": 0.00029398486427873274,
+ "loss": 0.2484,
+ "step": 750
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.9033333333333333,
+ "eval_loss": 0.2915270924568176,
+ "eval_runtime": 14.1665,
+ "eval_samples_per_second": 42.353,
+ "eval_steps_per_second": 21.177,
+ "step": 750
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 1500,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 2,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1586848407552000.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
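Editor's note: checkpoint-750 is the end-of-epoch-1 snapshot of the same run (identical adapter config and log prefix; eval accuracy 0.9033 here versus 0.925 at checkpoint-1500). A sketch comparing the two saved states:

```python
import json

# Compare the final eval entry of each checkpoint saved in this commit.
for ckpt in ("checkpoint-750", "checkpoint-1500"):
    with open(f"{ckpt}/trainer_state.json") as f:
        state = json.load(f)
    evals = [e for e in state["log_history"] if "eval_accuracy" in e]
    last = evals[-1]
    print(f"{ckpt}: epoch {last['epoch']:.0f} "
          f"acc={last['eval_accuracy']:.4f} loss={last['eval_loss']:.4f}")
```

The second epoch buys about 2.2 points of accuracy (0.9033 to 0.925) and lowers eval loss from 0.292 to 0.244.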
checkpoint-750/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6dab441745e31edac563b106e746b0557b62e5d734a8b72eeea252dbbc6afe1
+ size 5841
checkpoint-750/vocab.txt ADDED
The diff for this file is too large to render. See raw diff