ARCHIVED.
Download from the original repo: https://huggingface.co/openlm-research/open_llama_3b_600bt_preview
I made a few PRs to the original repo to include my changes!
Original model from https://huggingface.co/openlm-research/open_llama_3b_600bt_preview. The example below is adapted from https://github.com/openlm-research/open_llama:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "openlm-research/open_llama_3b_600bt_preview"
fast_model_name = "danielhanchen/open_llama_3b_600bt_preview"

# Fast tokenizer from this repo; model weights from the original repo.
tokenizer = AutoTokenizer.from_pretrained(fast_model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype = torch.float16, device_map = "auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors = "pt").input_ids

# generate returns a (1, sequence_length) tensor; ravel flattens it for decoding.
output_ids = model.generate(input_ids, max_new_tokens = 32)
print(tokenizer.decode(output_ids.ravel()))
```
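As a quick sanity check, the fast tokenizer should produce the same token ids as the original slow tokenizer. A minimal sketch, assuming both repos keep the default `add_bos_token = True` (the prompt is just an illustrative choice):

```python
from transformers import LlamaTokenizer, LlamaTokenizerFast

# Slow tokenizer from the original repo, fast tokenizer from this one.
slow = LlamaTokenizer.from_pretrained("openlm-research/open_llama_3b_600bt_preview")
fast = LlamaTokenizerFast.from_pretrained("danielhanchen/open_llama_3b_600bt_preview")

text = "Q: What is the largest animal?\nA:"
assert slow(text).input_ids == fast(text).input_ids
```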
This repo includes:
- Ported `LlamaTokenizer` to `LlamaTokenizerFast` via a few lines of code. Loading via `AutoTokenizer` used to take 4 to 5 minutes; now it takes a few seconds! Essentially, the porting is done via the code below:
```python
# from huggingface_hub import notebook_login
# notebook_login()
from transformers import LlamaTokenizerFast
from tokenizers import AddedToken

tokenizer = LlamaTokenizerFast.from_pretrained(
    "openlm-research/open_llama_3b_600bt_preview",
    add_bos_token = True,
    add_eos_token = False,  # Original LLaMA is False -> add </s> during processing.
    bos_token = AddedToken("<s>", single_word = True),
    eos_token = AddedToken("</s>", single_word = True),
    unk_token = AddedToken("<unk>", single_word = True),
    pad_token = AddedToken("<unk>", single_word = True),
)
tokenizer.push_to_hub("open_llama_3b_600bt_preview")
```
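To see the load-time difference for yourself, a rough timing sketch (actual numbers depend on your machine and network; the original repo ships no `tokenizer.json`, so `AutoTokenizer` converts the slow tokenizer on the fly):

```python
import time
from transformers import AutoTokenizer

for name in [
    "openlm-research/open_llama_3b_600bt_preview",  # no tokenizer.json: converted on the fly (minutes)
    "danielhanchen/open_llama_3b_600bt_preview",    # ships tokenizer.json: loads in seconds
]:
    start = time.perf_counter()
    AutoTokenizer.from_pretrained(name)
    print(f"{name}: {time.perf_counter() - start:.1f}s")
```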
  `AutoTokenizer` does not recognize the BOS, EOS and UNK tokens. Weirdly, `<unk>` (i.e. token id 0) was added instead of the `<s>` or `</s>` token.
- Manually added the BOS `<s>`, EOS `</s>`, and UNK `<unk>` tokens, with PAD (padding) also set to the `<unk>` token (see the sketch below).
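A minimal sketch to verify the special-token setup; the example strings are illustrative, and `padding = True` only works because `pad_token` is set:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("danielhanchen/open_llama_3b_600bt_preview")

# BOS / EOS / UNK are registered, and PAD shares the <unk> id.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.unk_token, tokenizer.pad_token)
assert tokenizer.pad_token_id == tokenizer.unk_token_id

# add_bos_token = True means every encoding starts with <s>.
ids = tokenizer("The largest animal is").input_ids
assert ids[0] == tokenizer.bos_token_id

# Setting pad_token makes batched padding possible.
batch = tokenizer(["short", "a slightly longer example"], padding = True, return_tensors = "pt")
print(batch.input_ids.shape)
```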