01GangaPutraBheeshma committed on
Commit
b58d68b
1 Parent(s): 27ff0cb

Update README.md

Files changed (1)
  1. README.md +33 -3
README.md CHANGED
@@ -41,8 +41,38 @@ test_tokenizer_UT = AutoTokenizer.from_pretrained("01GangaPutraBheeshma/colab_co
 # Documentation
 
 This model was fine-tuned using LoRA because I wanted the model's weights to be efficient in solving other types of Python problems (ones that were not included in the training data).
- Setting lora_alpha to 16 suggests that I chose a relatively strong regularization. The specific value of this hyperparameter often requires experimentation and tuning to find the optimal balance between preventing overfitting and allowing the model to capture important patterns in the data.
- The lora_dropout rate is 0.1, which dropped out 10% of the neurons randomly during training. This helps to prevent overfitting by introducing a level of randomness and redundancy in the network.
- 'r' in LoRa represents a rank which helps to decide the level of representation of the model in terms of number of dimensions or features. This proved to be advantageous for tasks like fine-tuning, where reducing the complexity of the model while preserving information is paramount.
+ Setting lora_alpha to 16 suggests a relatively strong regularization. The specific value of this hyperparameter often requires experimentation and tuning to find the optimal balance between preventing overfitting and allowing the model to capture important patterns in the data.
+
+ The lora_dropout rate is 0.1, which randomly dropped out 10% of the neurons during training. This helped to prevent overfitting by introducing a level of randomness and redundancy into the network.
+ 'r' in LoRA is the rank of the low-rank update matrices; it decides how many dimensions (features) are used to represent the model's weight updates. This proved advantageous for fine-tuning, where reducing the complexity of the model while preserving information is the paramount goal.
+
+ I am using BitsAndBytesConfig to load the base model in 4-bit precision, as I wanted training to be quick and memory-efficient rather than maximally precise in its results. This trade-off was needed because of the compute cluster I am working with.
+ Double quantization is used for the 4-bit representation. Quantization maps a range of values to a smaller set of discrete values; "double quantization" adds a second quantization step, applied to the quantization constants themselves, which saves additional memory within the constraints of 4-bit storage.
+
+ The compute datatype used for operations on the 4-bit weights is "float16". Using floating-point numbers allows for more precision in mathematical operations in comparison to integer representations.
+
+ ### LoraConfig
+ ```
+ peft_config = LoraConfig(
+     lora_alpha=16,
+     lora_dropout=0.1,
+     r=64,
+     bias="none",
+     task_type="CAUSAL_LM",
+ )
+ ```
+
+ ### BitsAndBytesConfig
+ ```
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype="float16"
+ )
+ ```
+
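
Putting the two configs above together: the sketch below shows one typical way a 4-bit quantized base model is loaded with transformers and then wrapped with the LoRA adapter via peft. It is an illustrative sketch, not the exact training script from this commit; "base-model-id" is a placeholder (the base checkpoint is not named in this section), and target_modules may need to be set explicitly depending on the architecture.

```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 weights with double quantization, float16 compute (as described above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Low-rank adapter: rank 64, scaling lora_alpha=16, 10% dropout on the adapter path.
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

# "base-model-id" is a placeholder for the base checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "base-model-id",
    quantization_config=bnb_config,
    device_map="auto",
)

# Make the quantized model trainable (input grads, dtype housekeeping), then attach LoRA.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the low-rank adapter matrices are trainable
```

From here, the wrapped model can be handed to a standard Trainer or SFTTrainer together with the tokenizer and training data.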
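
For a rough sense of why 4-bit loading helps on a constrained cluster, here is back-of-the-envelope arithmetic for the weight memory alone. The 7B parameter count is an assumption for illustration (the base model's size is not stated here), and the 4-bit figure excludes the small overhead of quantization constants, which double quantization shrinks further.

```
# Approximate weight memory, assuming a hypothetical 7B-parameter base model.
n_params = 7e9
fp16_gb = n_params * 2 / 1e9    # 2 bytes per weight  -> ~14 GB
nf4_gb = n_params * 0.5 / 1e9   # 4 bits per weight   -> ~3.5 GB (+ quantization constants)
print(f"float16 weights: ~{fp16_gb:.1f} GB, 4-bit NF4 weights: ~{nf4_gb:.1f} GB")
```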