Meta-Llama-2-7b-chat-hf-Quantized collection: different quantized versions of Meta's Llama-2-7b-chat-hf model.
This repo contains a GGUF quantized version of Meta's meta-llama/Llama-2-7b-chat-hf model, quantized using llama.cpp.
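For background, GGUF files like these are typically produced with llama.cpp's conversion and quantization tools. The sketch below is illustrative only: the script and binary names (convert.py, quantize) and their options vary between llama.cpp releases, and these are not necessarily the exact commands used to build this repo.
# Illustrative sketch: convert the original HF checkpoint to GGUF, then quantize it.
# Exact script/binary names depend on your llama.cpp version.
! git clone https://github.com/ggerganov/llama.cpp
! cd llama.cpp && make
! python llama.cpp/convert.py ./Llama-2-7b-chat-hf --outfile llama-2-7b-chat-hf.gguf --outtype f16
! ./llama.cpp/quantize llama-2-7b-chat-hf.gguf llama-2-7b-chat-hf.Q4_K_M.gguf Q4_K_M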
Use the code below to get started with the model.
# Base ctransformers with CUDA GPU acceleration
! pip install "ctransformers[cuda]>=0.2.24"
# Or with no GPU acceleration
# ! pip install "ctransformers>=0.2.24"
! pip install -U sentence-transformers
! pip install transformers huggingface_hub torch
from ctransformers import AutoModelForCausalLM
from transformers import pipeline, AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
import os
# Load LLM and Tokenizer
model_llama = AutoModelForCausalLM.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    model_file="llama-2-7b-chat-hf.Q4_K_M.gguf",  # replace Q4_K_M.gguf with Q5_K_M.gguf as needed
    model_type="llama",
    gpu_layers=50,  # number of layers to offload to the GPU; set to 0 for CPU-only inference
    hf=True  # return a transformers-compatible model so it can be used with pipeline()
)
tokenizer_llama = AutoTokenizer.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    use_fast=True
)
# Create a pipeline
pipe_llama = pipeline(model=model_llama, tokenizer=tokenizer_llama, task='text-generation')
prompt_llama = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_llama = pipe_llama(prompt_llama, max_new_tokens=512)
print(output_llama[0]["generated_text"])
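If you don't need a transformers pipeline, the ctransformers model can also be called directly. This is a minimal sketch reusing the same repo and file names as above; the generation parameters are illustrative.
# Minimal direct usage of ctransformers (no hf=True, no transformers pipeline)
llm = AutoModelForCausalLM.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    model_file="llama-2-7b-chat-hf.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50
)
print(llm(prompt_llama, max_new_tokens=512))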