
Usage

Package installation

pip install llama-cpp-python "huggingface_hub[cli]"

Download the model:

huggingface-cli download sourabhdattawad/meta-llama-3-8b-instruct-gguf meta-llama-3-8b-instruct.Q8_0.gguf --local-dir . --local-dir-use-symlinks False
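The same file can also be fetched from Python rather than the shell. The following is a minimal sketch, assuming `huggingface_hub` is installed as above; `fetch_model` is a hypothetical helper name, not something this repository provides. It skips the download when the file is already present in the target directory.

```python
from pathlib import Path

REPO_ID = "sourabhdattawad/meta-llama-3-8b-instruct-gguf"
FILENAME = "meta-llama-3-8b-instruct.Q8_0.gguf"


def fetch_model(local_dir: str = ".") -> Path:
    """Return the local path of the GGUF file, downloading it only if missing.

    Hypothetical helper for illustration; not part of this repository.
    """
    target = Path(local_dir) / FILENAME
    if target.exists():
        # Already downloaded: no network access needed.
        return target
    # Imported lazily so this module loads even before huggingface_hub is installed.
    from huggingface_hub import hf_hub_download

    return Path(
        hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
    )
```

Calling `fetch_model()` with no arguments places the file in the current directory, which matches the `model_path` used in the Python example in this README.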

Run the model:

from llama_cpp import Llama

llm = Llama(
    model_path="meta-llama-3-8b-instruct.Q8_0.gguf",
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # seed=1337,  # Uncomment to set a specific seed
    # n_ctx=2048,  # Uncomment to increase the context window
)
output = llm(
    "Q: Name the planets in the solar system? A: ",  # Prompt
    max_tokens=32,  # Generate up to 32 tokens; set to None to generate up to the end of the context window
    stop=["Q:", "\n"],  # Stop generating just before the model would generate a new question
    echo=True,  # Echo the prompt back in the output
)
print(output)  # Full response dictionary, including the generated text
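The `llm(...)` call returns an OpenAI-style completion dictionary rather than a plain string, and with `echo=True` the generated text begins with the prompt itself. A minimal sketch of pulling out just the answer follows. `completion_text` is a hypothetical helper, and `sample_output` is a hand-written stand-in shaped like a llama-cpp-python response (not real model output), so the snippet runs without loading the model.

```python
def completion_text(output: dict, prompt: str = "") -> str:
    """Return the generated text of the first choice, without the echoed prompt."""
    text = output["choices"][0]["text"]
    # With echo=True the prompt is prepended to the completion; drop it if present.
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


prompt = "Q: Name the planets in the solar system? A: "

# Hand-written stand-in for the dictionary returned by llm(...); illustrative only.
sample_output = {
    "choices": [
        {
            "text": prompt + "Mercury, Venus, Earth",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ]
}

print(completion_text(sample_output, prompt))
```

With a real response, pass the `output` variable from the example above in place of `sample_output`.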

Google Colab

https://colab.research.google.com/drive/1vhrCKGzY7KP5mScHNUl7hjmbPsUyj_sj?usp=sharing