ARCHIVED.

Download from original repo: https://huggingface.co/openlm-research/open_llama_3b_600bt_preview

I made a few PRs to the original repo to include my changes!

Original model from https://huggingface.co/openlm-research/open_llama_3b_600bt_preview. Example below edited from https://github.com/openlm-research/open_llama

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Weights come from the original repo; the fast tokenizer comes from this repo.
model_name = "openlm-research/open_llama_3b_600bt_preview"
fast_model_name = "danielhanchen/open_llama_3b_600bt_preview"

tokenizer = AutoTokenizer.from_pretrained(fast_model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype = torch.float16, device_map = "auto")

prompt = "Q: What is the largest animal?\nA:"
# Move the input IDs to the model's device so generate() does not mix CPU and GPU tensors.
input_ids = tokenizer(prompt, return_tensors = "pt").input_ids.to(model.device)
output_ids = model.generate(input_ids, max_new_tokens = 32)
print(tokenizer.decode(output_ids[0]))

This repo includes:

  1. Ported LlamaTokenizer to LlamaTokenizerFast with a few lines of code. Loading via AutoTokenizer used to take 4 to 5 minutes. Now it takes a few seconds! The port is essentially the code below:
# from huggingface_hub import notebook_login
# notebook_login()
from transformers import LlamaTokenizerFast
from tokenizers import AddedToken
tokenizer = LlamaTokenizerFast.from_pretrained(
    "openlm-research/open_llama_3b_600bt_preview",
    add_bos_token = True,
    add_eos_token = False, # Original LLaMA is False -> add </s> during processing.
    bos_token = AddedToken("<s>",   single_word = True),
    eos_token = AddedToken("</s>",  single_word = True),
    unk_token = AddedToken("<unk>", single_word = True),
    pad_token = AddedToken("<unk>", single_word = True)
)
tokenizer.push_to_hub("open_llama_3b_600bt_preview")
  2. AutoTokenizer does not recognize the BOS, EOS and UNK tokens. Weirdly, <unk> (i.e. token 0) was added instead of the <s> or </s> token.
  3. Manually added the BOS <s>, EOS </s> and UNK <unk> tokens, with PAD (padding) also set to the <unk> token.
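
To confirm that a loaded tokenizer carries the special tokens listed above, a small check along the following lines may help. This is only a sketch: `check_special_tokens` and `StandInTokenizer` are hypothetical names introduced here, and the stand-in class exists only so the example runs offline. In practice you would pass the object returned by `AutoTokenizer.from_pretrained(...)`, which exposes the same `bos_token`, `eos_token`, `unk_token` and `pad_token` attributes.

```python
from dataclasses import dataclass
from typing import Optional

# Special tokens this repo sets, as described in the list above.
EXPECTED = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "unk_token": "<unk>",
    "pad_token": "<unk>",  # PAD reuses the <unk> token
}

def check_special_tokens(tokenizer, expected=EXPECTED):
    """Return {attribute: (expected, actual)} for each special token that differs."""
    mismatches = {}
    for attr, want in expected.items():
        got = getattr(tokenizer, attr, None)
        if got != want:
            mismatches[attr] = (want, got)
    return mismatches

@dataclass
class StandInTokenizer:
    """Hypothetical offline stand-in exposing the same attributes as a real tokenizer."""
    bos_token: Optional[str] = "<s>"
    eos_token: Optional[str] = "</s>"
    unk_token: Optional[str] = "<unk>"
    pad_token: Optional[str] = None

# Simulates the problem described above: <unk> appears where <s> and </s> should be.
broken = StandInTokenizer(bos_token="<unk>", eos_token="<unk>")
print(check_special_tokens(broken))

# Simulates the fixed configuration: an empty dict means every token matches.
fixed = StandInTokenizer(pad_token="<unk>")
print(check_special_tokens(fixed))
```

An empty result means all four special tokens match; any entry shows the expected token next to what the tokenizer actually reports.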
Model size: 3.43B params (Safetensors). Tensor types: F32, FP16.