
Neo50M

Neo50M is a tiny decoder-only chat language model trained from scratch. It is designed for toy/local assistant use, educational experiments, lightweight generation, and testing training pipelines.

Model Details

  • Type: decoder-only causal language model, Llama-compatible architecture
  • Parameters: approximately 52.6M
  • Context length target: 16k tokens
  • Training target: about 15B pretraining tokens plus chat/instruction tuning
  • Hardware: 8x NVIDIA RTX 5090 cloud GPUs
  • Tokenizer: TinyLlama/Llama-style 32k tokenizer with a Neo50M chat template
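
For a rough sense of scale, the figures above can be turned into back-of-the-envelope numbers. The sketch below assumes 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit weights, and it counts weights only (no activations or KV cache). The function names are illustrative and not part of any library.

```python
def weight_megabytes(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal megabytes."""
    return n_params * bytes_per_param / 1e6


def tokens_per_parameter(n_tokens: float, n_params: float) -> float:
    """Ratio of training tokens to model parameters."""
    return n_tokens / n_params


N_PARAMS = 52.6e6  # "approximately 52.6M" from the model details
N_TOKENS = 15e9    # "about 15B" pretraining tokens (a target, not a measured count)

if __name__ == "__main__":
    print(f"32-bit weights: ~{weight_megabytes(N_PARAMS, 4):.0f} MB")
    print(f"16-bit weights: ~{weight_megabytes(N_PARAMS, 2):.0f} MB")
    print(f"tokens per parameter: ~{tokens_per_parameter(N_TOKENS, N_PARAMS):.0f}")
```

Quantized GGUF files will be smaller than either figure, but their exact size depends on the quantization scheme and is not estimated here.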

Intended Uses

  • toy/local assistant experiments
  • educational training and inference demos
  • lightweight generation
  • testing HF, GGUF, ONNX, and distributed training pipelines

Limitations

Neo50M is very small. It is not reliable for factual accuracy, has limited reasoning ability, may hallucinate, and should not be used for safety-critical decisions or high-stakes advice.

Transformers Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "KookiesXy/Neo50M"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [{"role": "user", "content": "Write a short thank-you note."}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
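# do_sample=True is required; without it generate() decodes greedily and ignores temperature/top_p
out = model.generate(inputs, max_new_tokens=120, do_sample=True, temperature=0.7, top_p=0.9)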
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))

GGUF Usage

After downloading a GGUF file:

llama-cli -m neo50m-q4_k_m.gguf -p "User: Write a haiku about GPUs.\nAssistant:"

ONNX Usage

The ONNX export is intended for forward-pass validation and integration experiments. Use ONNX Runtime to load onnx/model.onnx and feed integer input_ids plus attention_mask.
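
A minimal sketch of that forward pass with ONNX Runtime might look like the following. It assumes the graph takes int64 tensors of shape (batch, sequence) named input_ids and attention_mask, and that the first output holds the logits; neither is confirmed by this card, so inspect session.get_inputs() and session.get_outputs() on the actual file. The helper names build_feed and run_forward are hypothetical.

```python
from pathlib import Path

import numpy as np


def build_feed(token_ids):
    """Build a feed dict for one unpadded sequence (batch size 1)."""
    # Assumption: the export expects int64 tensors of shape (batch, sequence).
    input_ids = np.asarray([token_ids], dtype=np.int64)
    # A single sequence has no padding, so every position is attended to.
    attention_mask = np.ones_like(input_ids)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def run_forward(model_path, token_ids):
    """Run one forward pass and return the first graph output."""
    import onnxruntime as ort  # imported lazily so build_feed works without it

    session = ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
    # Pass only the inputs the graph actually declares.
    declared = {inp.name for inp in session.get_inputs()}
    feed = {k: v for k, v in build_feed(token_ids).items() if k in declared}
    return session.run(None, feed)[0]


if __name__ == "__main__":
    model_path = Path("onnx/model.onnx")
    if model_path.exists():
        # Placeholder token ids; real ids come from the tokenizer.
        first_output = run_forward(model_path, [1, 2, 3, 4])
        print(first_output.shape)
    else:
        print(f"{model_path} not found; download the ONNX export first")
```

In practice the token ids would come from the same tokenizer and chat template used in the Transformers example, so that the ONNX outputs can be compared against the PyTorch forward pass.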

Dataset Summary

The training pipeline streams a configurable mixture of FineWeb-Edu, Cosmopedia, Wikipedia-like text, TinyStories, and a small permissively licensed code component. SFT uses OpenHermes-style, UltraChat-style, and Alpaca-style data, plus a small set of refusal/helpfulness examples, when available. Dataset availability can change; the exact configs are included with the upload.
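
To illustrate what streaming a weighted mixture means, here is a small pure-Python sketch. It is not the actual training pipeline: the function mix_streams, the stand-in data, and the weights are all invented for illustration, and the real mixture is defined by the configs included with the upload.

```python
import random
from typing import Dict, Iterable, Iterator, Tuple


def mix_streams(
    streams: Dict[str, Iterable[str]],
    weights: Dict[str, float],
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, example) pairs, picking a source by weight each step.

    Exhausted sources are dropped and the rest keep their relative weights;
    iteration stops once every source is exhausted.
    """
    rng = random.Random(seed)
    active = {name: iter(stream) for name, stream in streams.items()}
    while active:
        names = list(active)
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        try:
            yield name, next(active[name])
        except StopIteration:
            del active[name]


if __name__ == "__main__":
    # Stand-in data and weights, invented for illustration only.
    streams = {
        "web": (f"web doc {i}" for i in range(1000)),
        "stories": (f"story {i}" for i in range(1000)),
        "code": (f"code file {i}" for i in range(1000)),
    }
    weights = {"web": 0.7, "stories": 0.2, "code": 0.1}
    for step, (source, example) in enumerate(mix_streams(streams, weights, seed=42)):
        print(source, example)
        if step == 9:
            break
```

Fixing the seed makes the sampling order reproducible, which helps when comparing runs with different mixture weights.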

Eval Results

Eval artifacts, when present, are uploaded under evals/.

Safetensors: model size 52.6M params, tensor type F32