Sparse Foundational Llama 2 Models
Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras
This repo contains a Llama 2 7B model fine-tuned for code generation tasks using the Evolved CodeAlpaca dataset.
Authors: Neural Magic, Cerebras
Below are code snippets to help you get started quickly with running the model.
By leveraging a pre-sparsified model's structure, you can fine-tune efficiently on new data, reducing hyperparameter tuning, training time, and computational cost. Learn about this process here.
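As a rough illustration, the sketch below fine-tunes a sparse pre-trained base checkpoint on the Evolved CodeAlpaca data with the Hugging Face Trainer. The base checkpoint name, dataset identifier, column names, and hyperparameters are assumptions for demonstration only, and a plain Trainer run like this does not by itself preserve the sparsity mask; Neural Magic's sparse-transfer recipes handle that step.

```python
# Minimal fine-tuning sketch, assuming an illustrative sparse base checkpoint
# and the Evolved CodeAlpaca dataset on the Hub; names/columns are assumptions.
# Note: this does NOT freeze the sparsity mask during training.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "neuralmagic/Llama-2-7b-pruned50-retrained"  # hypothetical sparse base name
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Turn instruction/output pairs into plain causal-LM training examples
dataset = load_dataset("theblackcat102/evol-codealpaca-v1", split="train[:1%]")
def to_tokens(example):
    return tokenizer(example["instruction"] + "\n" + example["output"],
                     truncation=True, max_length=1024)
tokenized = dataset.map(to_tokens, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-7b-evolcodealpaca-ft",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1,
                           learning_rate=1e-5,
                           bf16=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```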
This model may be run with the transformers library. For accelerated inference with sparsity, deploy with nm-vllm or deepsparse.
# pip install transformers accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("neuralmagic/Llama-2-7b-evolcodealpaca")
model = AutoModelForCausalLM.from_pretrained("neuralmagic/Llama-2-7b-evolcodealpaca", device_map="auto")

# Tokenize a code prompt and move it to the same device as the model
input_text = "def fibonacci(n):\n"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a completion and decode it back to text
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
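For accelerated deployment, nm-vllm exposes the same Python interface as upstream vLLM. The snippet below is a minimal sketch of generating from this checkpoint through that interface; the sampling settings are illustrative, and any sparsity-specific options are omitted since they depend on the installed nm-vllm release.

```python
# pip install nm-vllm  (nm-vllm follows the standard vLLM Python API)
from vllm import LLM, SamplingParams

# Load the checkpoint; sparsity-specific flags vary by nm-vllm release,
# so this sketch sticks to the common vLLM interface.
llm = LLM(model="neuralmagic/Llama-2-7b-evolcodealpaca")
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["def fibonacci(n):\n"], params)
print(outputs[0].outputs[0].text)
```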
Model evaluation metrics and results.
| Benchmark | Metric | Llama-2-7b-evolcodealpaca |
|---|---|---|
| HumanEval | pass@1 | 32.03 |
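For reference, HumanEval pass@1 is conventionally reported with the unbiased pass@k estimator from the original HumanEval work. The small sketch below shows that calculation; the sample and solve counts are illustrative, not the actual evaluation data behind the number above.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n generated samples, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative: 1 sample per problem (pass@1), solved on 32 of 100 problems
per_problem = [pass_at_k(1, 1, 1)] * 32 + [pass_at_k(1, 0, 1)] * 68
print(np.mean(per_problem))  # ~0.32
```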
Coming soon.
For further support and discussion of these models and AI in general, join Neural Magic's Slack Community.