---
license: apache-2.0
language:
- en
---

# ARCHIVED.

## Download from original repo: https://huggingface.co/openlm-research/open_llama_3b_600bt_preview

### I made a few PRs to the original repo to include my changes!

Original model from https://huggingface.co/openlm-research/open_llama_3b_600bt_preview. The example below is edited from https://github.com/openlm-research/open_llama.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "openlm-research/open_llama_3b_600bt_preview"
fast_model_name = "danielhanchen/open_llama_3b_600bt_preview"

# Use the fast tokenizer from this repo, and the weights from the original repo.
tokenizer = AutoTokenizer.from_pretrained(fast_model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype = torch.float16,
    device_map = "auto",
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors = "pt").input_ids

print(tokenizer.decode(model.generate(input_ids, max_new_tokens = 32).ravel()))
```

This repo includes:

1) Ported `LlamaTokenizer` to `LlamaTokenizerFast` via a few lines of code. Loading via `AutoTokenizer` used to take 4 to 5 minutes; now it takes a few seconds! Essentially the porting is done via the code below:

```python
# from huggingface_hub import notebook_login
# notebook_login()
from transformers import LlamaTokenizerFast
from tokenizers import AddedToken

tokenizer = LlamaTokenizerFast.from_pretrained(
    "openlm-research/open_llama_3b_600bt_preview",
    add_bos_token = True,
    add_eos_token = False, # Original LLaMA is False -> add during processing.
    bos_token = AddedToken("<s>", single_word = True),
    eos_token = AddedToken("</s>", single_word = True),
    unk_token = AddedToken("<unk>", single_word = True),
    pad_token = AddedToken("<unk>", single_word = True),
)
tokenizer.push_to_hub("open_llama_3b_600bt_preview")
```

2) `AutoTokenizer` does not recognize the BOS, EOS and UNK tokens. Weirdly, `<unk>` (i.e. the 0 token) was added instead of the `<s>` or `</s>` token.

3) Manually added the BOS `<s>`, EOS `</s>` and UNK `<unk>` tokens, with PAD (padding) also set to the `<unk>` token.
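
As a quick sanity check, a minimal sketch like the one below can confirm the special tokens round-trip correctly when loading the fast tokenizer from this repo (it assumes PAD was set to `<unk>`, as in the porting code above):

```python
from transformers import AutoTokenizer

# Load the fast tokenizer from this repo (should take seconds, not minutes).
tokenizer = AutoTokenizer.from_pretrained("danielhanchen/open_llama_3b_600bt_preview")

# Check the manually added special tokens.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.unk_token, tokenizer.pad_token)
# Expected: <s> </s> <unk> <unk>
```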