
gemma-2b-orpo-GGUF

This is a GGUF quantized version of the gemma-2b-orpo model: an ORPO fine-tune of google/gemma-2b.

You can find more information, including evaluation results and a training/usage notebook, in the gemma-2b-orpo model card.

🎮 Model in action

The model runs with any library in the llama.cpp ecosystem.

If you need to apply the prompt template manually, take a look at the tokenizer_config.json of the original model.
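As a minimal sketch of that lookup: Hugging Face tokenizer configs conventionally store the prompt template under the `chat_template` key, so it can be read with the standard library once the file is on disk. The helper name `read_chat_template` and the local path are hypothetical; the file itself has to be downloaded from the original model's repository first.

```python
import json
from pathlib import Path


def read_chat_template(config_path):
    """Return the value of the "chat_template" key, or None if it is absent.

    The value is usually a Jinja template string describing how
    messages are turned into a single prompt.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("chat_template")


if __name__ == "__main__":
    # Hypothetical path: a local copy of the original model's tokenizer_config.json.
    path = Path("tokenizer_config.json")
    if path.exists():
        print(read_chat_template(path))
    else:
        print("tokenizer_config.json not found; download it from the original model first.")
```

Printing the template shows the special tokens and role markers to reproduce when writing a prompt by hand.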

📱 Run the model on a budget smartphone -> see my recent post

Here is a simple example with llama-cpp-python:

# huggingface-hub is needed by Llama.from_pretrained to download the GGUF file
! pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
    verbose=True,  # workaround for a known bug: verbose must be True
)

# text generation - prompt template applied manually
llm("<bos><|im_start|> user\nName the planets in the solar system<|im_end|>\n<|im_start|>assistant\n", max_tokens=75)

# chat completion - prompt template automatically applied
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "Please list some places to visit in Italy",
        }
    ]
)
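
Both calls return OpenAI-style dictionaries rather than plain strings. The sketch below shows one way to pull out the generated text; the helper names are hypothetical, and the sample dictionaries are hand-written stand-ins trimmed to the relevant keys, not real model output.

```python
def completion_text(output):
    """Extract the generated text from a text-completion result (llm(...))."""
    return output["choices"][0]["text"]


def chat_text(response):
    """Extract the assistant reply from a chat-completion result."""
    return response["choices"][0]["message"]["content"]


# Hand-written placeholders that mimic the shape of the returned dictionaries.
sample_completion = {"choices": [{"text": "<generated text>"}]}
sample_chat = {
    "choices": [
        {"message": {"role": "assistant", "content": "<assistant reply>"}}
    ]
}

print(completion_text(sample_completion))
print(chat_text(sample_chat))
```

With a loaded model, the same helpers would be applied to the return values of `llm(...)` and `llm.create_chat_completion(...)`.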