romainnn committed
Commit 17ce471 · verified · Parent: e8a46e3

Training in progress, step 100, checkpoint

last-checkpoint/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: Qwen/Qwen2-0.5B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.13.2
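
The "How to Get Started with the Model" section in the card above is still a placeholder. A minimal loading sketch, assuming the adapter files in `last-checkpoint/` are used with the `Qwen/Qwen2-0.5B-Instruct` base model named in the card (the local path is an assumption, not something stated in this commit):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2-0.5B-Instruct"
adapter_path = "last-checkpoint"  # assumed local path to the files in this commit

tokenizer = AutoTokenizer.from_pretrained(adapter_path)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_path)  # attach the LoRA adapter

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```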
last-checkpoint/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen2-0.5B-Instruct",
+   "bias": "none",
+   "fan_in_fan_out": null,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "gate_proj",
+     "v_proj",
+     "up_proj",
+     "k_proj",
+     "o_proj",
+     "down_proj",
+     "q_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": false
+ }
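
For reference, the adapter_config.json above maps onto a `peft.LoraConfig` roughly as in the following sketch (values copied from the JSON; fields left at their peft defaults are omitted):

```python
from peft import LoraConfig

# Reconstructed from adapter_config.json above; not code taken from this repository.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "gate_proj", "v_proj", "up_proj", "k_proj",
        "o_proj", "down_proj", "q_proj",
    ],
)
```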
last-checkpoint/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e87ddb4b6888977bdf126835ab3222ec37a0f6de4c95db918b1c88ad5a881eb1
+ size 35237104
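
Only the Git LFS pointer is shown above; the weights themselves live in the LFS object. A short sketch for inspecting the adapter tensors once the file has been pulled locally (the path is an assumption):

```python
from safetensors.torch import load_file

# Assumes the LFS object behind the pointer above has been fetched to this path.
weights = load_file("last-checkpoint/adapter_model.safetensors")
print(len(weights), "adapter tensors")
for name, tensor in list(weights.items())[:5]:
    print(name, tuple(tensor.shape))  # lora_A / lora_B matrices for the target modules
```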
last-checkpoint/added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "<|endoftext|>": 151643,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644
+ }
last-checkpoint/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
last-checkpoint/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0beb7bc7cb78da07b6474453f10b08dfe006e5709542bb99daa915fc979173f7
+ size 18810036
last-checkpoint/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1fa37a7cf7f7e0a1b6bcff48efc4571c205b16dd75c990c58de904af2ad7e32e
+ size 14244
last-checkpoint/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edcec314f8e722914c719974861f42da84b858ee2040ad068af07f33ee7c82fa
+ size 1064
last-checkpoint/special_tokens_map.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
last-checkpoint/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bcfe42da0a4497e8b2b172c1f9f4ec423a46dc12907f4349c55025f670422ba9
+ size 11418266
last-checkpoint/tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 32768,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
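
A quick sanity-check sketch for the tokenizer settings serialized above, assuming the checkpoint directory is loaded locally (the path is an assumption):

```python
from transformers import AutoTokenizer

# "last-checkpoint" is an assumed local path to the files in this commit.
tok = AutoTokenizer.from_pretrained("last-checkpoint")
print(tok.eos_token)                   # <|im_end|>
print(tok.pad_token)                   # <|endoftext|>
print(tok.model_max_length)            # 32768
print(tok.chat_template is not None)   # True: the template string stored in tokenizer_config.json
```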
last-checkpoint/trainer_state.json ADDED
@@ -0,0 +1,758 @@
+ {
+   "best_metric": 3.233719825744629,
+   "best_model_checkpoint": "miner_id_24/checkpoint-100",
+   "epoch": 0.027828990851219256,
+   "eval_steps": 100,
+   "global_step": 100,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {"epoch": 0.0002782899085121926, "grad_norm": 17.002941131591797, "learning_rate": 2e-05, "loss": 5.9812, "step": 1},
+     {"epoch": 0.0002782899085121926, "eval_loss": 5.98688268661499, "eval_runtime": 84.9746, "eval_samples_per_second": 58.841, "eval_steps_per_second": 14.71, "step": 1},
+     {"epoch": 0.0005565798170243852, "grad_norm": 19.287992477416992, "learning_rate": 4e-05, "loss": 5.9342, "step": 2},
+     {"epoch": 0.0008348697255365778, "grad_norm": 21.941875457763672, "learning_rate": 6e-05, "loss": 6.385, "step": 3},
+     {"epoch": 0.0011131596340487704, "grad_norm": 16.80741310119629, "learning_rate": 8e-05, "loss": 5.6794, "step": 4},
+     {"epoch": 0.0013914495425609629, "grad_norm": 12.198833465576172, "learning_rate": 0.0001, "loss": 5.4401, "step": 5},
+     {"epoch": 0.0016697394510731556, "grad_norm": 7.938112735748291, "learning_rate": 0.00012, "loss": 5.3999, "step": 6},
+     {"epoch": 0.001948029359585348, "grad_norm": 14.917191505432129, "learning_rate": 0.00014, "loss": 5.7897, "step": 7},
+     {"epoch": 0.0022263192680975407, "grad_norm": 7.297199726104736, "learning_rate": 0.00016, "loss": 4.7889, "step": 8},
+     {"epoch": 0.002504609176609733, "grad_norm": 7.74416446685791, "learning_rate": 0.00018, "loss": 4.7504, "step": 9},
+     {"epoch": 0.0027828990851219257, "grad_norm": 6.026649475097656, "learning_rate": 0.0002, "loss": 3.9825, "step": 10},
+     {"epoch": 0.0030611889936341184, "grad_norm": 9.58944320678711, "learning_rate": 0.0001999999904195954, "loss": 4.3154, "step": 11},
+     {"epoch": 0.003339478902146311, "grad_norm": 7.8204426765441895, "learning_rate": 0.00019999996167838346, "loss": 4.0044, "step": 12},
+     {"epoch": 0.0036177688106585034, "grad_norm": 11.4631929397583, "learning_rate": 0.0001999999137763697, "loss": 4.0882, "step": 13},
+     {"epoch": 0.003896058719170696, "grad_norm": 9.104846000671387, "learning_rate": 0.00019999984671356322, "loss": 4.1362, "step": 14},
+     {"epoch": 0.004174348627682889, "grad_norm": 4.854076385498047, "learning_rate": 0.00019999976048997695, "loss": 3.8936, "step": 15},
+     {"epoch": 0.0044526385361950815, "grad_norm": 4.761904239654541, "learning_rate": 0.0001999996551056274, "loss": 3.7275, "step": 16},
+     {"epoch": 0.004730928444707274, "grad_norm": 5.670674800872803, "learning_rate": 0.0001999995305605347, "loss": 3.7384, "step": 17},
+     {"epoch": 0.005009218353219466, "grad_norm": 3.3813259601593018, "learning_rate": 0.00019999938685472278, "loss": 3.717, "step": 18},
+     {"epoch": 0.005287508261731659, "grad_norm": 3.5495376586914062, "learning_rate": 0.0001999992239882192, "loss": 3.6773, "step": 19},
+     {"epoch": 0.005565798170243851, "grad_norm": 2.8463995456695557, "learning_rate": 0.00019999904196105507, "loss": 3.4787, "step": 20},
+     {"epoch": 0.005844088078756044, "grad_norm": 3.221832513809204, "learning_rate": 0.00019999884077326533, "loss": 3.6683, "step": 21},
+     {"epoch": 0.006122377987268237, "grad_norm": 3.0998988151550293, "learning_rate": 0.00019999862042488853, "loss": 3.9501, "step": 22},
+     {"epoch": 0.0064006678957804295, "grad_norm": 3.077512502670288, "learning_rate": 0.00019999838091596688, "loss": 3.6844, "step": 23},
+     {"epoch": 0.006678957804292622, "grad_norm": 2.7098445892333984, "learning_rate": 0.00019999812224654625, "loss": 3.3924, "step": 24},
+     {"epoch": 0.006957247712804814, "grad_norm": 3.481621026992798, "learning_rate": 0.00019999784441667627, "loss": 3.6165, "step": 25},
+     {"epoch": 0.007235537621317007, "grad_norm": 2.7846627235412598, "learning_rate": 0.0001999975474264101, "loss": 3.6616, "step": 26},
+     {"epoch": 0.0075138275298291994, "grad_norm": 2.986056089401245, "learning_rate": 0.00019999723127580468, "loss": 3.6636, "step": 27},
+     {"epoch": 0.007792117438341392, "grad_norm": 2.3427329063415527, "learning_rate": 0.00019999689596492058, "loss": 3.3375, "step": 28},
+     {"epoch": 0.008070407346853584, "grad_norm": 2.994873523712158, "learning_rate": 0.00019999654149382206, "loss": 3.4239, "step": 29},
+     {"epoch": 0.008348697255365778, "grad_norm": 2.916687250137329, "learning_rate": 0.00019999616786257703, "loss": 3.6612, "step": 30},
+     {"epoch": 0.00862698716387797, "grad_norm": 3.2186973094940186, "learning_rate": 0.00019999577507125705, "loss": 3.4924, "step": 31},
+     {"epoch": 0.008905277072390163, "grad_norm": 2.7801923751831055, "learning_rate": 0.00019999536311993742, "loss": 3.5651, "step": 32},
+     {"epoch": 0.009183566980902355, "grad_norm": 3.049238443374634, "learning_rate": 0.00019999493200869713, "loss": 3.6423, "step": 33},
+     {"epoch": 0.009461856889414548, "grad_norm": 2.719590425491333, "learning_rate": 0.00019999448173761865, "loss": 3.2166, "step": 34},
+     {"epoch": 0.00974014679792674, "grad_norm": 2.965956211090088, "learning_rate": 0.00019999401230678837, "loss": 3.66, "step": 35},
+     {"epoch": 0.010018436706438932, "grad_norm": 2.7884485721588135, "learning_rate": 0.00019999352371629617, "loss": 3.3686, "step": 36},
+     {"epoch": 0.010296726614951126, "grad_norm": 3.150559186935425, "learning_rate": 0.00019999301596623567, "loss": 3.6838, "step": 37},
+     {"epoch": 0.010575016523463317, "grad_norm": 3.7389256954193115, "learning_rate": 0.0001999924890567042, "loss": 3.2839, "step": 38},
+     {"epoch": 0.010853306431975511, "grad_norm": 2.6185200214385986, "learning_rate": 0.00019999194298780273, "loss": 3.4399, "step": 39},
+     {"epoch": 0.011131596340487703, "grad_norm": 2.8291258811950684, "learning_rate": 0.0001999913777596358, "loss": 3.3369, "step": 40},
+     {"epoch": 0.011409886248999896, "grad_norm": 3.6246542930603027, "learning_rate": 0.00019999079337231185, "loss": 3.2463, "step": 41},
+     {"epoch": 0.011688176157512088, "grad_norm": 2.834122657775879, "learning_rate": 0.0001999901898259427, "loss": 3.2887, "step": 42},
+     {"epoch": 0.01196646606602428, "grad_norm": 2.774077892303467, "learning_rate": 0.00019998956712064412, "loss": 3.2401, "step": 43},
+     {"epoch": 0.012244755974536474, "grad_norm": 3.3849713802337646, "learning_rate": 0.00019998892525653535, "loss": 3.599, "step": 44},
+     {"epoch": 0.012523045883048665, "grad_norm": 3.1346535682678223, "learning_rate": 0.00019998826423373942, "loss": 3.4501, "step": 45},
+     {"epoch": 0.012801335791560859, "grad_norm": 2.7817208766937256, "learning_rate": 0.00019998758405238295, "loss": 3.405, "step": 46},
+     {"epoch": 0.01307962570007305, "grad_norm": 3.800513744354248, "learning_rate": 0.0001999868847125963, "loss": 3.285, "step": 47},
+     {"epoch": 0.013357915608585244, "grad_norm": 3.1255245208740234, "learning_rate": 0.00019998616621451349, "loss": 3.2415, "step": 48},
+     {"epoch": 0.013636205517097436, "grad_norm": 3.535891056060791, "learning_rate": 0.0001999854285582721, "loss": 3.5676, "step": 49},
+     {"epoch": 0.013914495425609628, "grad_norm": 3.263808012008667, "learning_rate": 0.00019998467174401355, "loss": 3.525, "step": 50},
+     {"epoch": 0.014192785334121822, "grad_norm": 2.7439327239990234, "learning_rate": 0.00019998389577188284, "loss": 3.4441, "step": 51},
+     {"epoch": 0.014471075242634013, "grad_norm": 3.0952131748199463, "learning_rate": 0.00019998310064202866, "loss": 3.2142, "step": 52},
+     {"epoch": 0.014749365151146207, "grad_norm": 2.5860517024993896, "learning_rate": 0.00019998228635460336, "loss": 3.3881, "step": 53},
+     {"epoch": 0.015027655059658399, "grad_norm": 2.5894296169281006, "learning_rate": 0.00019998145290976287, "loss": 3.4558, "step": 54},
+     {"epoch": 0.015305944968170592, "grad_norm": 3.080479621887207, "learning_rate": 0.000199980600307667, "loss": 3.427, "step": 55},
+     {"epoch": 0.015584234876682784, "grad_norm": 3.147078037261963, "learning_rate": 0.00019997972854847912, "loss": 3.539, "step": 56},
+     {"epoch": 0.015862524785194978, "grad_norm": 2.9708526134490967, "learning_rate": 0.0001999788376323662, "loss": 3.3065, "step": 57},
+     {"epoch": 0.016140814693707168, "grad_norm": 2.3979759216308594, "learning_rate": 0.000199977927559499, "loss": 3.3108, "step": 58},
+     {"epoch": 0.01641910460221936, "grad_norm": 2.7668023109436035, "learning_rate": 0.0001999769983300518, "loss": 3.2346, "step": 59},
+     {"epoch": 0.016697394510731555, "grad_norm": 3.3077287673950195, "learning_rate": 0.00019997604994420276, "loss": 3.0784, "step": 60},
+     {"epoch": 0.01697568441924375, "grad_norm": 3.038454055786133, "learning_rate": 0.0001999750824021336, "loss": 3.2051, "step": 61},
+     {"epoch": 0.01725397432775594, "grad_norm": 2.8979313373565674, "learning_rate": 0.00019997409570402961, "loss": 3.3868, "step": 62},
+     {"epoch": 0.017532264236268132, "grad_norm": 2.8584742546081543, "learning_rate": 0.0001999730898500799, "loss": 3.4808, "step": 63},
+     {"epoch": 0.017810554144780326, "grad_norm": 2.4810538291931152, "learning_rate": 0.0001999720648404772, "loss": 3.2123, "step": 64},
+     {"epoch": 0.018088844053292516, "grad_norm": 2.732379674911499, "learning_rate": 0.00019997102067541796, "loss": 3.1345, "step": 65},
+     {"epoch": 0.01836713396180471, "grad_norm": 3.2496063709259033, "learning_rate": 0.0001999699573551022, "loss": 3.5387, "step": 66},
+     {"epoch": 0.018645423870316903, "grad_norm": 2.8145253658294678, "learning_rate": 0.00019996887487973365, "loss": 3.1685, "step": 67},
+     {"epoch": 0.018923713778829097, "grad_norm": 2.7135744094848633, "learning_rate": 0.00019996777324951973, "loss": 3.1059, "step": 68},
+     {"epoch": 0.019202003687341287, "grad_norm": 3.733142137527466, "learning_rate": 0.00019996665246467155, "loss": 3.2716, "step": 69},
+     {"epoch": 0.01948029359585348, "grad_norm": 2.9407293796539307, "learning_rate": 0.00019996551252540382, "loss": 3.3199, "step": 70},
+     {"epoch": 0.019758583504365674, "grad_norm": 2.5942862033843994, "learning_rate": 0.000199964353431935, "loss": 3.0171, "step": 71},
+     {"epoch": 0.020036873412877864, "grad_norm": 3.57987904548645, "learning_rate": 0.00019996317518448714, "loss": 3.3179, "step": 72},
+     {"epoch": 0.020315163321390058, "grad_norm": 3.3481435775756836, "learning_rate": 0.00019996197778328602, "loss": 2.9369, "step": 73},
+     {"epoch": 0.02059345322990225, "grad_norm": 3.3575656414031982, "learning_rate": 0.0001999607612285611, "loss": 3.4142, "step": 74},
+     {"epoch": 0.020871743138414445, "grad_norm": 2.5996809005737305, "learning_rate": 0.00019995952552054544, "loss": 3.2364, "step": 75},
+     {"epoch": 0.021150033046926635, "grad_norm": 2.7393929958343506, "learning_rate": 0.00019995827065947584, "loss": 3.2158, "step": 76},
+     {"epoch": 0.02142832295543883, "grad_norm": 3.2959396839141846, "learning_rate": 0.00019995699664559276, "loss": 3.561, "step": 77},
+     {"epoch": 0.021706612863951022, "grad_norm": 3.0207879543304443, "learning_rate": 0.00019995570347914026, "loss": 3.367, "step": 78},
+     {"epoch": 0.021984902772463212, "grad_norm": 3.2401602268218994, "learning_rate": 0.0001999543911603661, "loss": 3.1579, "step": 79},
+     {"epoch": 0.022263192680975406, "grad_norm": 3.10953426361084, "learning_rate": 0.00019995305968952183, "loss": 3.3987, "step": 80},
+     {"epoch": 0.0225414825894876, "grad_norm": 2.6518425941467285, "learning_rate": 0.00019995170906686251, "loss": 2.9922, "step": 81},
+     {"epoch": 0.022819772497999793, "grad_norm": 2.9276158809661865, "learning_rate": 0.00019995033929264694, "loss": 3.3777, "step": 82},
+     {"epoch": 0.023098062406511983, "grad_norm": 2.297638177871704, "learning_rate": 0.00019994895036713756, "loss": 3.131, "step": 83},
+     {"epoch": 0.023376352315024176, "grad_norm": 3.2343451976776123, "learning_rate": 0.00019994754229060052, "loss": 3.3614, "step": 84},
+     {"epoch": 0.02365464222353637, "grad_norm": 6.476233005523682, "learning_rate": 0.00019994611506330562, "loss": 3.5223, "step": 85},
+     {"epoch": 0.02393293213204856, "grad_norm": 2.676745891571045, "learning_rate": 0.00019994466868552627, "loss": 3.4546, "step": 86},
+     {"epoch": 0.024211222040560754, "grad_norm": 2.841125965118408, "learning_rate": 0.00019994320315753973, "loss": 3.5407, "step": 87},
+     {"epoch": 0.024489511949072947, "grad_norm": 2.9516446590423584, "learning_rate": 0.0001999417184796267, "loss": 3.103, "step": 88},
+     {"epoch": 0.02476780185758514, "grad_norm": 2.586479663848877, "learning_rate": 0.00019994021465207174, "loss": 3.4791, "step": 89},
+     {"epoch": 0.02504609176609733, "grad_norm": 2.7659854888916016, "learning_rate": 0.00019993869167516287, "loss": 3.2204, "step": 90},
+     {"epoch": 0.025324381674609524, "grad_norm": 2.610041379928589, "learning_rate": 0.00019993714954919206, "loss": 3.1517, "step": 91},
+     {"epoch": 0.025602671583121718, "grad_norm": 2.58697247505188, "learning_rate": 0.0001999355882744547, "loss": 3.047, "step": 92},
+     {"epoch": 0.025880961491633908, "grad_norm": 2.3641750812530518, "learning_rate": 0.00019993400785124995, "loss": 3.1796, "step": 93},
+     {"epoch": 0.0261592514001461, "grad_norm": 2.2581377029418945, "learning_rate": 0.00019993240827988063, "loss": 2.8717, "step": 94},
+     {"epoch": 0.026437541308658295, "grad_norm": 2.949000120162964, "learning_rate": 0.00019993078956065323, "loss": 3.4251, "step": 95},
+     {"epoch": 0.02671583121717049, "grad_norm": 2.414064645767212, "learning_rate": 0.00019992915169387795, "loss": 3.1424, "step": 96},
+     {"epoch": 0.02699412112568268, "grad_norm": 2.637364149093628, "learning_rate": 0.00019992749467986857, "loss": 3.0854, "step": 97},
+     {"epoch": 0.027272411034194873, "grad_norm": 2.871019124984741, "learning_rate": 0.00019992581851894264, "loss": 3.1808, "step": 98},
+     {"epoch": 0.027550700942707066, "grad_norm": 3.2153420448303223, "learning_rate": 0.00019992412321142127, "loss": 3.1763, "step": 99},
+     {"epoch": 0.027828990851219256, "grad_norm": 2.187028169631958, "learning_rate": 0.00019992240875762934, "loss": 2.9023, "step": 100},
+     {"epoch": 0.027828990851219256, "eval_loss": 3.233719825744629, "eval_runtime": 84.2876, "eval_samples_per_second": 59.321, "eval_steps_per_second": 14.83, "step": 100}
+   ],
+   "logging_steps": 1,
+   "max_steps": 7187,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 100,
+   "stateful_callbacks": {
+     "EarlyStoppingCallback": {
+       "args": {
+         "early_stopping_patience": 2,
+         "early_stopping_threshold": 0.0
+       },
+       "attributes": {
+         "early_stopping_patience_counter": 0
+       }
+     },
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 7209543008256000.0,
+   "train_batch_size": 4,
+   "trial_name": null,
+   "trial_params": null
+ }
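
A small sketch for reading the training log above back out of trainer_state.json, e.g. to compare train and eval loss at this checkpoint (the local path is an assumption):

```python
import json

# Assumes the checkpoint directory from this commit is available locally.
with open("last-checkpoint/trainer_state.json") as f:
    state = json.load(f)

train_logs = [e for e in state["log_history"] if "loss" in e]
eval_logs = [e for e in state["log_history"] if "eval_loss" in e]

print("train steps logged:", len(train_logs))
print("last train loss:", train_logs[-1]["loss"])   # 2.9023 at step 100
print("best eval loss:", state["best_metric"])      # 3.2337... at step 100
print("eval points:", [(e["step"], e["eval_loss"]) for e in eval_logs])
```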
last-checkpoint/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0310854820f9682e0cf81a554b45e80d944c3283888e751c7698f256bfa0a709
+ size 6776
last-checkpoint/vocab.json ADDED
The diff for this file is too large to render. See raw diff