How to use from vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "robinsmits/Schaapje-2B-Chat-V1.0-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "robinsmits/Schaapje-2B-Chat-V1.0-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
Use Docker
docker model run hf.co/robinsmits/Schaapje-2B-Chat-V1.0-GGUF:

Schaapje-2B-Chat-V1.0-GGUF

Introduction

This is a collection of GGUF files created from Schaapje-2B-Chat-V1.0.

The files are available in the following quantization formats:

Q5_0, Q5_K_M, Q6_K, Q8_0

Requirements

Before you can use the GGUF files, you need to clone the llama.cpp repository and build it as described in the official guide.
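The requirement above can be sketched as a few shell functions. This is a minimal sketch, assuming a Unix-like system with git, CMake and a C/C++ compiler, and a recent llama.cpp build that supports the `-hf <user>/<model>:<quant>` download option (downloading needs libcurl support in the build). The function names `gguf_ref`, `build_llama_cpp` and `chat` are hypothetical helpers, and picking a file by its quantization name assumes the GGUF filenames in this repository contain that name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; defining them does not run anything.
# Assumes git, CMake and a C/C++ compiler are installed.

REPO="robinsmits/Schaapje-2B-Chat-V1.0-GGUF"

# Print the "<user>/<model>:<quant>" reference for one of the quantizations
# listed in this card. Assumes the GGUF filenames contain the quant name.
gguf_ref() {
  case "$1" in
    Q5_0|Q5_K_M|Q6_K|Q8_0) printf '%s:%s\n' "$REPO" "$1" ;;
    *)
      echo "unknown quantization: $1 (expected Q5_0, Q5_K_M, Q6_K or Q8_0)" >&2
      return 1
      ;;
  esac
}

# Clone llama.cpp and build it in Release mode; binaries end up in build/bin.
# Note: this leaves the current shell inside the llama.cpp directory.
build_llama_cpp() {
  git clone https://github.com/ggml-org/llama.cpp &&
    cd llama.cpp &&
    cmake -B build &&
    cmake --build build --config Release
}

# Start an interactive chat; llama-cli downloads the chosen file on first use.
chat() {
  local ref
  ref="$(gguf_ref "${1:-Q5_K_M}")" || return 1
  ./build/bin/llama-cli -hf "$ref"
}
```

For example, `build_llama_cpp && chat Q5_K_M` builds llama.cpp and then starts a chat with the Q5_K_M file. Validating the quantization name first gives a clear error instead of an empty `-hf` argument.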

Recommendation

Experimenting with the llama.cpp parameters, in particular sampling settings such as temperature, top-k, top-p and repetition penalty, can have a big impact on the quality of the generated text. It is therefore recommended to do your own experimentation with different settings. In my own experiments, quantization Q5_0 or higher appears to give good quality.
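One way to make that experimentation systematic is to send the same prompt to llama-server's OpenAI-compatible endpoint under a small grid of sampling settings. This is a minimal sketch, assuming llama-server is already running on its default port 8080 with one of these GGUF files loaded (for example `./build/bin/llama-server -hf robinsmits/Schaapje-2B-Chat-V1.0-GGUF:Q5_K_M`). The grid values, the `SERVER` and `PROMPT` variables and the `make_body`/`sweep` functions are illustrative choices, not settings recommended by this card.

```shell
#!/usr/bin/env bash
# Sketch: compare one prompt under several sampling settings.
# Assumes llama-server is running at $SERVER with a Schaapje GGUF file loaded.

SERVER="${SERVER:-http://localhost:8080}"
PROMPT="${PROMPT:-What is the capital of France?}"  # must not contain " or \

# Build the JSON request body for one (temperature, top_p) pair.
make_body() {
  printf '{"messages":[{"role":"user","content":"%s"}],"temperature":%s,"top_p":%s,"max_tokens":128}' \
    "$PROMPT" "$1" "$2"
}

# Send the prompt once per grid point and print each raw JSON response.
sweep() {
  local t p
  for t in 0.2 0.7 1.0; do
    for p in 0.8 0.95; do
      echo "== temperature=$t top_p=$p"
      curl -s -X POST "$SERVER/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data "$(make_body "$t" "$p")"
      echo
    done
  done
}
```

After sourcing the script, running `sweep` prints one response per setting so they can be compared side by side. The JSON is built with plain `printf`, so keep quotes and backslashes out of `PROMPT` or switch to a proper JSON tool.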

Format: GGUF
Model size: 3B params
Architecture: granite