Galactica-6.7b finetuned on webgpt and prompt_dialogue (version v2)

Demo use:

```
import torch
from torch import nn
import transformers

base_path = 'OpenAssistant/galactica-6.7b-finetuned'

# Load the model in 8-bit and let accelerate place the layers on the available devices.
model = transformers.OPTForCausalLM.from_pretrained(
    base_path,
    load_in_8bit=True,
    device_map='auto',
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    offload_state_dict=True
)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.model.decoder.project_in = lambda x: x.requires_grad_(True)


class CastOutputToFloat(nn.Sequential):
    """Cast the LM head output to float32."""

    def forward(self, x):
        return super().forward(x).to(torch.float32)


model.lm_head = CastOutputToFloat(model.lm_head)

tokenizer = transformers.AutoTokenizer.from_pretrained(base_path)

batch = tokenizer.encode("What are the symptoms of Alzheimer's disease?", return_tensors="pt")

with torch.cuda.amp.autocast():
    out = model.generate(
        batch.to(model.device),  # prompt token ids, moved to the model's device
        max_length=300,
        do_sample=True,
        top_k=40,
        num_beams=1,
        num_return_sequences=1,
        # The special-token strings below are empty in this copy of the card;
        # they must name the model's turn markers for the lookup and replaces to work.
        eos_token_id=tokenizer.additional_special_tokens_ids[tokenizer.additional_special_tokens.index('')]
    )

# Drop the trailing EOS token and turn the markers into readable speaker labels.
print(tokenizer.decode(out[0, :-1]).replace('', "User:\n").replace('', '\nAssistant:\n'))
```

# [Open LLM Leaderboard Evaluation Results](

Detailed results can be found [here](

| Metric              | Value |
|---------------------|-------|
| Avg.                | 34.34 |
| ARC (25-shot)       | 41.55 |
| HellaSwag (10-shot) | 51.01 |
| MMLU (5-shot)       | 38.03 |
| TruthfulQA (0-shot) | 41.65 |
| Winogrande (5-shot) | 57.7  |
| GSM8K (5-shot)      | 3.11  |
| DROP (3-shot)       | 7.3   |
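
The Avg. row appears to be the unweighted mean of the seven benchmark scores. The card does not state this, so treat it as an assumption. A minimal sketch that checks the arithmetic, with the scores copied from the table above:

```python
# Benchmark scores copied from the leaderboard table above.
scores = {
    "ARC (25-shot)": 41.55,
    "HellaSwag (10-shot)": 51.01,
    "MMLU (5-shot)": 38.03,
    "TruthfulQA (0-shot)": 41.65,
    "Winogrande (5-shot)": 57.7,
    "GSM8K (5-shot)": 3.11,
    "DROP (3-shot)": 7.3,
}

# Assumption: "Avg." is the plain (unweighted) mean over all seven benchmarks.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")  # prints "Avg. = 34.34"
```

The computed mean matches the reported 34.34, which is consistent with every benchmark being weighted equally.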