Text Generation
English
sft
jordiclive's picture
Update README.md
13443d6
---
license: mit
datasets:
- Nebulous/gpt4all_pruned
- sahil2801/CodeAlpaca-20k
- yahma/alpaca-cleaned
language:
- en
tags:
- sft
pipeline_tag: text-generation
widget:
- text: <|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
- text: <|prompter|>What's the Earth total population</s><|assistant|>
- text: <|prompter|>Write a story about future of AI development</s><|assistant|>
---
# LoRA Adapter for LLaMA 13B trained on more datasets than tloen/alpaca-lora-7b
This repo contains a low-rank adapter for **LLaMA-13b** fit on
- `Nebulous/gpt4all_pruned`
- `sahil2801/CodeAlpaca-20k`
- `yahma/alpaca-cleaned`
- datasets part of the OpenAssistant project.
This version of the weights was trained with the following hyperparameters:
- Epochs: 2
- Batch size: 128
- Max Length: 2048
- Learning rate: 4e-6
- Lora _r_: 16
- Lora Alpha: 32
- Lora target modules: q_proj, k_proj, v_proj, o_proj
The model was trained with flash attention and gradient checkpointing.
## Model Details
- **Developed** as part of the OpenAssistant Project
- **Model type:** PEFT Adapter for frozen LLaMA
- **Language:** English
## Prompting
Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `<|endoftext|>` token.
Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
# Example Inference Code (Note several embeddings need to be loaded along with the LoRA weights), assumes on GPU and torch.float16:
```
from typing import List, NamedTuple
import torch
import transformers
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import GenerationConfig
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = transformers.AutoTokenizer.from_pretrained("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b")
model = transformers.AutoModelForCausalLM.from_pretrained(
"decapoda-research/llama-13b-hf", torch_dtype=torch.float16
) # Load Base Model
model.resize_token_embeddings(
32016
) # This model repo also contains several embeddings for special tokens that need to be loaded.
model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
lora_weights = "jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b"
model = PeftModel.from_pretrained(
model,
lora_weights,
torch_dtype=torch.float16,
) # Load Lora model
model.eos_token_id = tokenizer.eos_token_id
filename = hf_hub_download("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b", "extra_embeddings.pt")
embed_weights = torch.load(
filename, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu")
) # Load embeddings for special tokens
model.base_model.model.model.embed_tokens.weight[32000:, :] = embed_weights.to(
model.base_model.model.model.embed_tokens.weight.dtype
).to(
device
) # Add special token embeddings
model = model.half().to(device)
generation_config = GenerationConfig(
temperature=0.1,
top_p=0.75,
top_k=40,
num_beams=4,
)
def format_system_prompt(prompt, eos_token="</s>"):
return "{}{}{}{}".format(
"<|prompter|>",
prompt,
eos_token,
"<|assistant|>"
)
def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
prompt = format_system_prompt(prompt) # OpenAssistant Prompt Format expected
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
with torch.no_grad():
generation_output = model.generate(
input_ids=input_ids,
generation_config=generation_config,
return_dict_in_generate=True,
output_scores=True,
max_new_tokens=max_new_tokens,
eos_token_id=2,
)
s = generation_output.sequences[0]
output = tokenizer.decode(s)
print("Text generated:")
print(output)
return output
generate("What is a meme, and what's the history behind this word?")
generate("What's the Earth total population")
generate("Write a story about future of AI development")
```