This is a Finetuning of GPT-J-6B using LoRa - https://huggingface.co/EleutherAI/gpt-j-6B The dataset is the cleaned version of the Alpaca dataset - https://github.com/gururise/AlpacaDataCleaned A model similar to this has been talked about The performance is good but not as good as the orginal Alpaca trained from a base model of LLaMa This is mostly due to the LLaMa 7B model being pretrained on 1T tokens and GPT-J-6B being trained on 300-400M tokens You will need a 3090 or A100 to run it, unfortunately this current version won't work on a T4. --- library_name: peft license: apache-2.0 language: - en tags: - Text Generation --- ## Training procedure The following `bitsandbytes` quantization config was used during training: - load_in_8bit: True - load_in_4bit: False - llm_int8_threshold: 6.0 - llm_int8_skip_modules: None - llm_int8_enable_fp32_cpu_offload: False - llm_int8_has_fp16_weight: False - bnb_4bit_quant_type: fp4 - bnb_4bit_use_double_quant: False - bnb_4bit_compute_dtype: float32 The following `bitsandbytes` quantization config was used during training: - load_in_8bit: True - load_in_4bit: False - llm_int8_threshold: 6.0 - llm_int8_skip_modules: None - llm_int8_enable_fp32_cpu_offload: False - llm_int8_has_fp16_weight: False - bnb_4bit_quant_type: fp4 - bnb_4bit_use_double_quant: False - bnb_4bit_compute_dtype: float32 The following `bitsandbytes` quantization config was used during training: - load_in_8bit: True - load_in_4bit: False - llm_int8_threshold: 6.0 - llm_int8_skip_modules: None - llm_int8_enable_fp32_cpu_offload: False - llm_int8_has_fp16_weight: False - bnb_4bit_quant_type: fp4 - bnb_4bit_use_double_quant: False - bnb_4bit_compute_dtype: float32 ### Framework versions - PEFT 0.4.0.dev0 - PEFT 0.4.0.dev0 - PEFT 0.4.0.dev0