Meta-Llama-2-7b-chat-hf-Quantized collection: different quantized versions of Meta's Llama-2-7b-chat-hf model.
This repo contains a GGUF quantized version of Meta's meta-llama/Llama-2-7b-chat-hf model, quantized using llama.cpp.
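For background, GGUF files like these are typically produced with llama.cpp's conversion and quantization tools. The sketch below is illustrative only: the script and binary names (convert.py, quantize) and their options vary between llama.cpp releases, and these are not necessarily the exact commands used to build this repo.
# Illustrative sketch: convert the original HF checkpoint to GGUF, then quantize it.
# Exact script/binary names depend on your llama.cpp version.
! git clone https://github.com/ggerganov/llama.cpp
! cd llama.cpp && make
! python llama.cpp/convert.py ./Llama-2-7b-chat-hf --outfile llama-2-7b-chat-hf.gguf --outtype f16
! ./llama.cpp/quantize llama-2-7b-chat-hf.gguf llama-2-7b-chat-hf.Q4_K_M.gguf Q4_K_M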
Use the code below to get started with the model.
# Base ctransformers with CUDA GPU acceleration
! pip install "ctransformers[cuda]>=0.2.24"
# Or with no GPU acceleration
# ! pip install "ctransformers>=0.2.24"
! pip install -U sentence-transformers
! pip install transformers huggingface_hub torch
from ctransformers import AutoModelForCausalLM
from transformers import pipeline, AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
import os
# Load LLM and Tokenizer
model_llama = AutoModelForCausalLM.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    model_file="llama-2-7b-chat-hf.Q4_K_M.gguf",  # replace Q4_K_M.gguf with Q5_K_M.gguf as needed
    model_type="llama",
    gpu_layers=50,  # number of layers to offload to the GPU; set to 0 for CPU-only inference
    hf=True  # return a transformers-compatible model so it can be used with pipeline()
)
tokenizer_llama = AutoTokenizer.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    use_fast=True
)
# Create a pipeline
pipe_llama = pipeline(model=model_llama, tokenizer=tokenizer_llama, task='text-generation')
prompt_llama = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_llama = pipe_llama(prompt_llama, max_new_tokens=512)
print(output_llama[0]["generated_text"])
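If you don't need a transformers pipeline, the ctransformers model can also be called directly. This is a minimal sketch reusing the same repo and file names as above; the generation parameters are illustrative.
# Minimal direct usage of ctransformers (no hf=True, no transformers pipeline)
llm = AutoModelForCausalLM.from_pretrained(
    "alokabhishek/Llama-2-7b-chat-hf-GGUF",
    model_file="llama-2-7b-chat-hf.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50
)
print(llm(prompt_llama, max_new_tokens=512))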