Quantizations of https://huggingface.co/allenai/OLMo-1.7-7B-hf
From original readme
Uses
Inference
Install Transformers from source, or update to the next version when this PR is integrated.
Now, proceed as usual with HuggingFace:
from transformers import AutoModelForCausalLM, AutoTokenizer
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# optional verifying cuda
# inputs = {k: v.to('cuda') for k,v in inputs.items()}
# olmo = olmo.to('cuda')
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
>> 'Language modeling is the first step to build natural language generation...'
Alternatively, with the pipeline abstraction:
from transformers import pipeline
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1.7-7B-hf")
print(olmo_pipe("Language modeling is "))
>> 'Language modeling is a branch of natural language processing that aims to...'
Or, you can make this slightly faster by quantizing the model, e.g. AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True)
(requires bitsandbytes
).
The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as inputs.input_ids.to('cuda')
to avoid potential issues.
Note, you may see the following error if ai2-olmo
is not installed correctly, which is caused by internal Python check naming. We'll update the code soon to make this error clearer.
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: hf_olmo. Run `pip install hf_olmo`
Fine-tuning
Model fine-tuning can be done from the final checkpoint (the main
revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
- Fine-tune with the OLMo repository:
torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
--data.paths=[{path_to_data}/input_ids.npy] \
--data.label_mask_paths=[{path_to_data}/label_mask.npy] \
--load_path={path_to_checkpoint} \
--reset_trainer_state
For more documentation, see the GitHub readme.
- Downloads last month
- 63