About

Hi, this is the Readme.

This model was created as a study experiment to re-create Alpaca on my end.
It uses the gururise/AlpacaDataCleaned dataset ( the version from April 7 ).

Specifications

Base Model:
LLaMA 7B

Training Parameters:
Micro_Batch_Size = 8
Batch_Size = 128
Gradient_Accumulation_Steps = Batch_Size / Micro_Batch_Size # ( 16 )
Epochs = 2
Learning_Rate = 2e-5
Cutoff_Len = 256 # About 96% of the dataset fits within this cutoff ( 256 )
Lora_R = 4
Lora_Alpha = 16
Lora_Dropout = 0.05
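
The derived values in the list above can be checked with a few lines of Python. This is only a minimal sketch, not the training script used for this model: the helper names gradient_accumulation_steps and lora_scaling are hypothetical, and the alpha / r scaling is the standard LoRA convention, assumed here rather than taken from this README.

```python
# Values copied from the "Training Parameters" list above.
MICRO_BATCH_SIZE = 8
BATCH_SIZE = 128
LORA_R = 4
LORA_ALPHA = 16


def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches accumulated before each optimizer step."""
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


def lora_scaling(alpha: int, r: int) -> float:
    """Scaling applied to the low-rank update (assumed standard LoRA: alpha / r)."""
    return alpha / r


steps = gradient_accumulation_steps(BATCH_SIZE, MICRO_BATCH_SIZE)
scale = lora_scaling(LORA_ALPHA, LORA_R)
print(steps)  # 16
print(scale)  # 4.0
```

With these settings, each optimizer step sees 16 micro-batches of 8 examples, for an effective batch size of 128.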

Files

adapter_model.bin # These are the fine-tuned weights that are applied on top of the base LLaMA model.
adapter_config.bin # This is the config file for adapter_model.bin.

consolidated.00.pth # This file is the base model ( LLaMA 7B ) merged with the fine-tuned weights ( adapter_model.bin ).
tokenizer.model # This is the tokenizer file. It converts the input text ( prompt ) into tokens that the neural network can process.
params.json # Parameters of the Model.

ggml_model_f16.bin # This is the same model ( consolidated.00.pth ), but now it's in 'ggml f16' format. We need this format to quantize it with llama.cpp.
llama-hf-7b # This folder contains the same model ( consolidated.00.pth ), but now it's in 'huggingface' format. We need this format to quantize it with GPTQ.

quantized-model:
ggml-model-q4_0.bin # This is the 4-bit quantized model produced by llama.cpp. I found it to be better than the GPTQ version.
llama7b-4bit-128g.pt # This is the quantized model produced by GPTQ. It takes longer to quantize and gives worse results than llama.cpp, but its file size is 7.6% smaller.