yep-search/flan-llama-7b-delta

NOTE: This "delta model" cannot be used directly.

Users must add this delta to the original LLaMA-7B weights to obtain the actual flan-llama weights (see the example below).
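The relationship between the three sets of weights can be sketched in plain Python. This is a minimal illustration, assuming the delta was produced by subtracting the original LLaMA weights from the fine-tuned weights parameter by parameter (the card only states that the delta is added back). Plain dicts of float lists stand in for real state dicts, and `make_delta` and `apply_delta` are hypothetical helper names.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def make_delta(finetuned: StateDict, base: StateDict) -> StateDict:
    """Hypothetical: delta = finetuned - base, computed per parameter."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in finetuned}


def apply_delta(delta: StateDict, base: StateDict) -> StateDict:
    """Recover the fine-tuned weights: finetuned = delta + base."""
    # Refuse to merge if the two checkpoints do not have the same parameter names.
    if delta.keys() != base.keys():
        raise KeyError(f"parameter names differ: {sorted(delta.keys() ^ base.keys())}")
    merged = {}
    for k in delta:
        # Every parameter must have the same number of elements in both checkpoints.
        if len(delta[k]) != len(base[k]):
            raise ValueError(f"shape mismatch for {k}")
        merged[k] = [d + b for d, b in zip(delta[k], base[k])]
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [0.5, -1.0, 2.0]}
    finetuned = {"layer.weight": [0.75, -1.25, 2.5]}
    delta = make_delta(finetuned, base)
    print(delta)
    print(apply_delta(delta, base))
```

The delta on its own is just a set of differences, which is why it cannot be used as a model until the base weights are added back; the code under How to Use performs the same addition on every PyTorch tensor.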

How to Use:

import transformers
from collections import OrderedDict

device = 0  # Define your GPU device here
llama_path = ''  # Define your original llama-7b load path here (Hugging Face checkpoint)

# Load the original LLaMA-7B model and tokenizer, then the delta weights from this repository.
model_llama = transformers.AutoModelForCausalLM.from_pretrained(llama_path)
tokenizer = transformers.AutoTokenizer.from_pretrained(llama_path)
model_flan_llama = transformers.AutoModelForCausalLM.from_pretrained("yep-search/flan-llama-7b-delta")

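# Recover the flan-llama weights parameter by parameter: flan_llama = delta + llama.
# Build each state dict once rather than twice per key inside the loop.
delta_state_dict = model_flan_llama.state_dict()
llama_state_dict = model_llama.state_dict()
model_state_dict = OrderedDict(
    (key, delta_state_dict[key] + llama_state_dict[key]) for key in delta_state_dict
)
model_flan_llama.load_state_dict(model_state_dict)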

model_flan_llama = model_flan_llama.to(device)
model_flan_llama.eval()

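def generate(prompt, model, device):
    # Tokenize the prompt into a batch of one sequence of token ids.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Generate up to 512 new tokens on the model's device; the returned sequence starts with the prompt tokens.
    gen_output = model.generate(input_ids.to(device), max_new_tokens=512, early_stopping=True)[0]
    # Decode prompt + continuation back to text, dropping special tokens.
    answer_cot = tokenizer.decode(gen_output, skip_special_tokens=True)
    return answer_cot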

prompt = "Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering."
print(generate(prompt, model_flan_llama, device))

Output:

Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering. Geoffrey Hinton is a living person. George Washington was not alive when Geoffrey Hinton was born. The final answer: no.
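As the sample shows, the decoded text repeats the prompt before the model's rationale and ends with a sentence of the form `The final answer: ...`. A minimal post-processing sketch, assuming every output follows this format (inferred from the single sample above, not guaranteed by the model); `split_cot_output` and `FINAL_ANSWER_RE` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumed answer pattern, based on the single sample output above.
FINAL_ANSWER_RE = re.compile(r"The final answer:\s*(.+?)\.?\s*$")


def split_cot_output(prompt: str, decoded: str) -> Tuple[str, Optional[str]]:
    """Return (rationale, final_answer) from a decoded generation.

    The decoded text echoes the prompt, so it is stripped first when present.
    If no 'The final answer:' sentence is found, final_answer is None.
    """
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    text = text.strip()
    match = FINAL_ANSWER_RE.search(text)
    if match is None:
        return text, None
    rationale = text[: match.start()].strip()
    return rationale, match.group(1).strip()
```

Decoding can change spacing, so the decoded text does not always start with the exact prompt string; an alternative is to decode only the new tokens inside `generate`, e.g. `tokenizer.decode(gen_output[input_ids.shape[-1]:], skip_special_tokens=True)`, since `input_ids.shape[-1]` is the prompt length in tokens.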

Dataset and Training:

We fine-tune the original LLaMA-7B model on data extracted and sampled from the Flan-2022 collection. We keep only examples with a maximum source sequence length of 1536 and a maximum target sequence length of 512, which leaves roughly 5.5 million samples. (The extracted and sampled dataset, before this length filtering, will be published on Hugging Face Datasets soon.)
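The length filter described above can be sketched as follows. This is a minimal illustration, assuming lengths are counted in tokens, each example is a (source, target) pair of strings, and over-length examples are dropped rather than truncated. `filter_by_length` and the `tokenize` callable are hypothetical; in practice the LLaMA tokenizer would replace the whitespace split used in the usage lines.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

MAX_SOURCE_LEN = 1536  # maximum source sequence length from the card
MAX_TARGET_LEN = 512   # maximum target sequence length from the card


def filter_by_length(
    examples: Iterable[Tuple[str, str]],
    tokenize: Callable[[str], List[str]],
    max_source_len: int = MAX_SOURCE_LEN,
    max_target_len: int = MAX_TARGET_LEN,
) -> Iterator[Tuple[str, str]]:
    """Yield only the (source, target) pairs that fit within both length limits."""
    for source, target in examples:
        if len(tokenize(source)) <= max_source_len and len(tokenize(target)) <= max_target_len:
            yield source, target


if __name__ == "__main__":
    # Whitespace splitting stands in for a real subword tokenizer here.
    data = [("short question", "short answer"), ("word " * 2000, "answer")]
    kept = list(filter_by_length(data, str.split))
    print(len(kept))  # 1: the 2000-token source exceeds the 1536 limit
```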

We fine-tune on 8 A100 GPUs using PyTorch FSDP, with a learning rate of 2e-5, a warmup ratio of 0.03 followed by cosine learning-rate decay, and a batch size of 128.
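The learning-rate schedule can be sketched as below. This is a minimal illustration, assuming linear warmup from 0 over the first 3% of optimizer steps followed by half-cosine decay to 0, the same shape as Hugging Face's `get_cosine_schedule_with_warmup`; the card does not state the warmup shape, the final learning rate, or the total number of steps. `lr_at_step` is a hypothetical helper, and the per-GPU batch line assumes 128 is the global batch size with no gradient accumulation.

```python
import math

PEAK_LR = 2e-5       # learning rate from the card
WARMUP_RATIO = 0.03  # warmup ratio from the card


def lr_at_step(step: int, total_steps: int, peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 at total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 15, 30, 515, 1000):
        print(s, lr_at_step(s, total))
    # Assumed: global batch of 128 split evenly over 8 GPUs, no gradient accumulation.
    print("per-GPU batch:", 128 // 8)
```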

References:

@inproceedings{weifinetuned,
  title={Finetuned Language Models are Zero-Shot Learners},
  author={Wei, Jason and Bosma, Maarten and Zhao, Vincent and Guu, Kelvin and Yu, Adams Wei and Lester, Brian and Du, Nan and Dai, Andrew M and Le, Quoc V},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}
