Uses

Very small model for running on edge devices, with fast time to first token (TTFT) and high throughput.

Use llama.cpp to run inference with the model.
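
As a rough illustration of running the model with llama.cpp, the sketch below wraps the `llama-cli` binary from Python. It assumes llama.cpp is already built or installed with `llama-cli` on the PATH, and that the GGUF file has been downloaded locally. The file name `model-f16.gguf` and the helper names `build_command` and `run` are hypothetical placeholders, not part of this model card. Depending on the llama.cpp build, `llama-cli` may start in interactive chat mode, so treat this as a starting point rather than a definitive recipe.

```python
import os
import shutil
import subprocess

# Hypothetical local path to the downloaded GGUF file (assumption, not from the model card).
MODEL_PATH = "model-f16.gguf"


def build_command(model_path, prompt, max_tokens=64):
    """Build an argument list for llama.cpp's llama-cli binary."""
    return [
        "llama-cli",
        "-m", model_path,       # path to the GGUF model file
        "-p", prompt,           # prompt text
        "-n", str(max_tokens),  # maximum number of tokens to generate
    ]


def run(prompt):
    """Run llama-cli and return its stdout, or None if the binary or model file is missing."""
    if shutil.which("llama-cli") is None or not os.path.exists(MODEL_PATH):
        return None
    cmd = build_command(MODEL_PATH, prompt)
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        check=True,
        stdin=subprocess.DEVNULL,  # avoid blocking if the binary waits for interactive input
    )
    return result.stdout


if __name__ == "__main__":
    output = run("Hello, my name is")
    print(output if output is not None else "llama-cli or model file not found")
```

The same three flags (`-m`, `-p`, `-n`) can be passed to `llama-cli` directly from a shell; the Python wrapper is only there to make the call easy to script.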

Format: GGUF

Model size: 362M params

Architecture: llama

Available quantization: 16-bit
