Upload 11 files

Files changed (11)

README.md +158 -62
adapter_config.json +25 -0
adapter_model.bin +3 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +6 -0
tokenizer.json +0 -0
tokenizer_config.json +17 -0
trainer_state.json +169 -0
training_args.bin +3 -0

README.md CHANGED

@@ -1,24 +1,28 @@
 ---
-# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
-# Doc / guide: https://huggingface.co/docs/hub/model-cards
-{}
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
-This model is a question-answering model fine-tuned on a short-form Question Answering dataset (at most 128 tokens per row).
-The short rows make local fine-tuning feasible on modest hardware, such as a single GPU with 16 GB of RAM.
 ## Model Details
-The model was downloaded from ibm/mpt-7b-instruct2 (Apache 2.0 license) and fine-tuned with the Supervised Fine-tuning Trainer and PEFT LoRA.
 ### Model Description
 ### Model Sources [optional]
@@ -28,91 +32,183 @@ The model has been downloaded from ibm/mpt-7b-instruct2 (Apache 2.0 License.) and
 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]
 ### Direct Use
-text = "Below is an instruction from Human. Write a response.\n    ### Instruction:\n   How to diagnose Parasites - Baylisascaris infection ?\n    ### Response:"
-inputs = tokenizer(text, return_tensors="pt").to('cuda')
-out = model.generate(**inputs, max_new_tokens=100)
-print(tokenizer.decode(out[0]))
 ## Bias, Risks, and Limitations
-To reduce training time, the model was trained on only the first 5,100 rows of the 15,500-row dataset.
-### Recommendations
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
-Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
-## Training Details
-per_device_train_batch_size = 1
-gradient_accumulation_steps = 16
-epoch = 5
-Step 	Training Loss
-64 	    1.618400
-128 	1.084200
-192 	1.021800
-256 	1.014300
-320 	0.960500
-384 	0.905900
-448 	0.885200
-512 	0.847400
-576 	0.889400
-640 	0.861000
-704 	0.800400
-768 	0.768600
-832 	0.750300
-896 	0.780200
-960 	0.762700
-1024 	0.698600
-1088 	0.672600
-1152 	0.693100
-1216 	0.708900
-1280 	0.662700
-1344 	0.630400
-1408 	0.624600
-1472 	0.627200
-1536 	0.628000
-1600 	0.603300
 ### Training Data
-Laurent1/MedQuad-MedicalQnADataset_128tokens_max
 #### Preprocessing [optional]
-Dataset already preprocessed (128 tokens max and truncated at a sentence end to keep meaning)
 #### Training Hyperparameters
-bnb_config = BitsAndBytesConfig(
-    load_in_4bit=True,
-    bnb_4bit_quant_type="nf4",
-    bnb_4bit_compute_dtype=torch.float16,
-)
-model = AutoModelForCausalLM.from_pretrained(
-        "ibm/mpt-7b-instruct2",
-        device_map="auto",
-        torch_dtype=torch.float16, #torch.bfloat16,
-        trust_remote_code=True
-            )
-#### Speeds, Sizes, Times [optional]
-Training :
-6287.4s - GPU T4 x2

 ---
+library_name: peft
+base_model: ibm/mpt-7b-instruct2
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
 ## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
 ### Training Data
+<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
+[More Information Needed]
 #### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Data Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+## Training procedure
+### Framework versions
+- PEFT 0.6.0

adapter_config.json ADDED

	@@ -0,0 +1,25 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "ibm/mpt-7b-instruct2",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "Wqkv",
+    "up_proj",
+    "out_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}
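
The LoRA settings above determine both the update scaling (`lora_alpha / r`) and the number of trainable parameters. The following is a minimal sketch of that arithmetic. The values of `r`, `lora_alpha`, and `target_modules` are copied from `adapter_config.json`; the layer shapes (`D_MODEL`, `N_LAYERS`, `EXPANSION`) are assumptions based on the usual MPT-7B architecture and are not stated anywhere in this commit.

```python
# Values copied from adapter_config.json in this commit.
LORA_R = 32
LORA_ALPHA = 16
TARGET_MODULES = ["Wqkv", "up_proj", "out_proj", "down_proj"]

# ASSUMPTION: typical MPT-7B dimensions (not given in this commit).
D_MODEL = 4096
N_LAYERS = 32
EXPANSION = 4

# (in_features, out_features) for each targeted linear layer, under the assumption above.
SHAPES = {
    "Wqkv": (D_MODEL, 3 * D_MODEL),
    "out_proj": (D_MODEL, D_MODEL),
    "up_proj": (D_MODEL, EXPANSION * D_MODEL),
    "down_proj": (EXPANSION * D_MODEL, D_MODEL),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


# LoRA multiplies the low-rank update B @ A by lora_alpha / r.
scaling = LORA_ALPHA / LORA_R

per_layer = sum(lora_params(*SHAPES[name], LORA_R) for name in TARGET_MODULES)
total_params = per_layer * N_LAYERS
fp32_bytes = total_params * 4  # assuming the adapter weights are stored as float32

print(f"scaling={scaling}, params={total_params:,}, fp32 size={fp32_bytes:,} bytes")
```

Under these assumptions the adapter has 67,108,864 trainable parameters, or 268,435,456 bytes in float32. That is within about 0.1 MB of the 268,528,333-byte `adapter_model.bin` in this commit, which suggests (but does not prove) that the assumed shapes are right.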

adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca46e1175b341953e55194e308775bc2b92762c34d878554c1cf5908a0250a93
+size 268528333

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:409a3e0c87eca2838b4e3b24404caad915c761ba573cdb33903a5474152df9ac
+size 537029125

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e028d427da0864de9c9bb6b990581599720abaaccea53dee180d2d7affd8eded
+size 14575

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:01d01d4ece2a97b3018abb37144598fd8cdffd408b84edf962b287a9d8c9a7b4
+size 627

special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "add_prefix_space": false,
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "max_length": 2048,
+  "model_max_length": 2048,
+  "pad_to_multiple_of": null,
+  "pad_token": "<|endoftext|>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "stride": 0,
+  "tokenizer_class": "GPTNeoXTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "<|endoftext|>"
+}

trainer_state.json ADDED

	@@ -0,0 +1,169 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 5.0,
+  "eval_steps": 500,
+  "global_step": 1600,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.2,
+      "learning_rate": 9.896907216494846e-05,
+      "loss": 1.6184,
+      "step": 64
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 9.484536082474227e-05,
+      "loss": 1.0842,
+      "step": 128
+    },
+    {
+      "epoch": 0.6,
+      "learning_rate": 9.072164948453609e-05,
+      "loss": 1.0218,
+      "step": 192
+    },
+    {
+      "epoch": 0.8,
+      "learning_rate": 8.65979381443299e-05,
+      "loss": 1.0143,
+      "step": 256
+    },
+    {
+      "epoch": 1.0,
+      "learning_rate": 8.247422680412371e-05,
+      "loss": 0.9605,
+      "step": 320
+    },
+    {
+      "epoch": 1.2,
+      "learning_rate": 7.835051546391753e-05,
+      "loss": 0.9059,
+      "step": 384
+    },
+    {
+      "epoch": 1.4,
+      "learning_rate": 7.422680412371135e-05,
+      "loss": 0.8852,
+      "step": 448
+    },
+    {
+      "epoch": 1.6,
+      "learning_rate": 7.010309278350515e-05,
+      "loss": 0.8474,
+      "step": 512
+    },
+    {
+      "epoch": 1.8,
+      "learning_rate": 6.597938144329897e-05,
+      "loss": 0.8894,
+      "step": 576
+    },
+    {
+      "epoch": 2.0,
+      "learning_rate": 6.185567010309279e-05,
+      "loss": 0.861,
+      "step": 640
+    },
+    {
+      "epoch": 2.2,
+      "learning_rate": 5.7731958762886594e-05,
+      "loss": 0.8004,
+      "step": 704
+    },
+    {
+      "epoch": 2.4,
+      "learning_rate": 5.360824742268041e-05,
+      "loss": 0.7686,
+      "step": 768
+    },
+    {
+      "epoch": 2.6,
+      "learning_rate": 4.948453608247423e-05,
+      "loss": 0.7503,
+      "step": 832
+    },
+    {
+      "epoch": 2.8,
+      "learning_rate": 4.536082474226804e-05,
+      "loss": 0.7802,
+      "step": 896
+    },
+    {
+      "epoch": 3.0,
+      "learning_rate": 4.1237113402061855e-05,
+      "loss": 0.7627,
+      "step": 960
+    },
+    {
+      "epoch": 3.2,
+      "learning_rate": 3.7113402061855674e-05,
+      "loss": 0.6986,
+      "step": 1024
+    },
+    {
+      "epoch": 3.4,
+      "learning_rate": 3.2989690721649485e-05,
+      "loss": 0.6726,
+      "step": 1088
+    },
+    {
+      "epoch": 3.6,
+      "learning_rate": 2.8865979381443297e-05,
+      "loss": 0.6931,
+      "step": 1152
+    },
+    {
+      "epoch": 3.8,
+      "learning_rate": 2.4742268041237116e-05,
+      "loss": 0.7089,
+      "step": 1216
+    },
+    {
+      "epoch": 4.0,
+      "learning_rate": 2.0618556701030927e-05,
+      "loss": 0.6627,
+      "step": 1280
+    },
+    {
+      "epoch": 4.2,
+      "learning_rate": 1.6494845360824743e-05,
+      "loss": 0.6304,
+      "step": 1344
+    },
+    {
+      "epoch": 4.4,
+      "learning_rate": 1.2371134020618558e-05,
+      "loss": 0.6246,
+      "step": 1408
+    },
+    {
+      "epoch": 4.6,
+      "learning_rate": 8.247422680412371e-06,
+      "loss": 0.6272,
+      "step": 1472
+    },
+    {
+      "epoch": 4.8,
+      "learning_rate": 4.123711340206186e-06,
+      "loss": 0.628,
+      "step": 1536
+    },
+    {
+      "epoch": 5.0,
+      "learning_rate": 0.0,
+      "loss": 0.6033,
+      "step": 1600
+    }
+  ],
+  "logging_steps": 64,
+  "max_steps": 1600,
+  "num_train_epochs": 5,
+  "save_steps": 64,
+  "total_flos": 1.0172436457734144e+17,
+  "trial_name": null,
+  "trial_params": null
+}
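
The `learning_rate` values in `log_history` fall by a constant amount every 64 steps and reach 0.0 at step 1600, which is what a linear decay schedule produces. The sketch below reconstructs such a schedule and compares it with a few of the logged values. `MAX_STEPS` comes from `trainer_state.json`; the peak rate of `1e-4` and the 48 warmup steps are inferred from the logged numbers and are assumptions, since `training_args.bin` is not readable in this diff.

```python
import math

MAX_STEPS = 1600    # from trainer_state.json ("max_steps")
PEAK_LR = 1e-4      # ASSUMPTION: inferred from the logged values
WARMUP_STEPS = 48   # ASSUMPTION: inferred from the logged values


def linear_lr(step: int, peak: float = PEAK_LR,
              warmup: int = WARMUP_STEPS, total: int = MAX_STEPS) -> float:
    """Linear warmup from 0 to peak, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    64: 9.896907216494846e-05,
    128: 9.484536082474227e-05,
    832: 4.948453608247423e-05,
    1472: 8.247422680412371e-06,
    1600: 0.0,
}

# True when the reconstructed schedule reproduces every sampled log entry.
all_match = all(
    math.isclose(linear_lr(step), lr, rel_tol=1e-9, abs_tol=1e-15)
    for step, lr in logged.items()
)
print("schedule matches logged values:", all_match)
```

A match only shows that the logs are consistent with these settings; other combinations of peak rate and warmup length could in principle produce the same values after warmup.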

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ff167a52ec01ecc304782bbe68032336f2138182867621872aafe714497cc79b
+size 4027