Instructions for using AM8-3568/Kappy-model with libraries and local apps.

Libraries

How to use AM8-3568/Kappy-model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AM8-3568/Kappy-model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AM8-3568/Kappy-model")
model = AutoModelForCausalLM.from_pretrained("AM8-3568/Kappy-model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use AM8-3568/Kappy-model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AM8-3568/Kappy-model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AM8-3568/Kappy-model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
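
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_payload` and `chat` are illustrative and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str,
         model: str = "AM8-3568/Kappy-model",
         base_url: str = "http://localhost:8000") -> str:
    """POST the prompt to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # OpenAI-compatible servers return the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target the SGLang server described later, since both servers expose the same OpenAI-compatible route.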

Use Docker

docker model run hf.co/AM8-3568/Kappy-model

SGLang

How to use AM8-3568/Kappy-model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AM8-3568/Kappy-model" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AM8-3568/Kappy-model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AM8-3568/Kappy-model" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AM8-3568/Kappy-model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AM8-3568/Kappy-model with Docker Model Runner:
```
docker model run hf.co/AM8-3568/Kappy-model
```
Browse the quantized versions to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

KΛPPY

KΛPPY is a lightweight conversational language model built on the TinyLlama architecture and fine-tuned using the Alpaca instruction dataset.

The model is designed for lightweight chatbot experimentation, local inference, and educational purposes.

Project Repository

The full KΛPPY chatbot project, inference pipeline, and application source code are available on GitHub:

https://github.com/AM8-3568/Kappy

Features

TinyLlama-based conversational model
Fine-tuned on Alpaca instruction data
Lightweight and efficient
Compatible with Hugging Face Transformers
Supports both CPU and GPU inference
Suitable for local offline chatbot usage
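
To illustrate the CPU and GPU support listed above, here is a minimal sketch that picks a device at load time. It assumes PyTorch and Transformers are installed. The helper names `choose_device_and_dtype` and `load_model` are hypothetical, and the choice of float16 on GPU with float32 on CPU is an assumption for this sketch, not something the model card prescribes.

```python
def choose_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Return (device, dtype name): half precision on GPU, full precision on CPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def load_model(model_name: str = "AM8-3568/Kappy-model"):
    """Load the tokenizer and model onto a GPU if one is available, else the CPU."""
    # Imported here so the helper above stays usable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=getattr(torch, dtype_name),  # assumed dtype choice, see above
    ).to(device)
    return tokenizer, model, device
```

Inputs produced by the tokenizer need to be moved to the same device (for example with `.to(device)`) before calling `model.generate`.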

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AM8-3568/Kappy-model"

# Download (or load from the local cache) the tokenizer and model weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "What is artificial intelligence?"

# Tokenize the prompt into PyTorch tensors.
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 100 new tokens after the prompt.
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode the full sequence (prompt plus continuation) back to text.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)

Model Files

This repository contains:

model.safetensors
config.json
generation_config.json
tokenizer.json
tokenizer_config.json
chat_template.jinja

Limitations

As a compact language model, KΛPPY may occasionally:

Struggle with complex reasoning tasks
Generate repetitive responses
Produce hallucinated or inaccurate information
Lose consistency during long conversations
Underperform larger language models on advanced tasks

These limitations are expected for smaller language models and can be mitigated through additional fine-tuning, larger datasets, and more advanced architectures.
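
Separately from retraining, decoding settings may reduce repetitive output at inference time. The sketch below collects standard `generate()` options from Hugging Face Transformers into a hypothetical helper, `anti_repetition_kwargs`. The specific values are illustrative starting points and have not been tuned or validated for KΛPPY.

```python
def anti_repetition_kwargs(max_new_tokens: int = 100) -> dict:
    """Keyword arguments for model.generate() that discourage repeated text.

    The values are illustrative assumptions, not settings from the model card.
    """
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,           # sample instead of greedy decoding
        "temperature": 0.7,          # values below 1 sharpen the distribution
        "top_p": 0.9,                # nucleus sampling
        "repetition_penalty": 1.2,   # penalize tokens already in the sequence
        "no_repeat_ngram_size": 3,   # forbid any 3-gram from occurring twice
    }


# Usage with the tokenizer and model from the Usage section:
# outputs = model.generate(**inputs, **anti_repetition_kwargs())
```

Because sampling is enabled, outputs will vary between runs. Setting `do_sample` to `False` keeps decoding deterministic while still applying the repetition controls.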

Intended Use

KΛPPY is intended for:

Educational projects
Local chatbot experimentation
Lightweight conversational AI research
Learning Transformer inference pipelines

This model is not intended for production-critical or high-risk applications.

Base Model

TinyLlama

Dataset

Alpaca Instruction Dataset


Safetensors

Model size: 1B params
Tensor type: F16

Model tree for AM8-3568/Kappy-model

Quantizations: 3 models