
QuantLM 190M 8 bit

QuantLM, unpacked to FP16 format so that it is compatible with standard FP16 GEMMs. After unpacking, QuantLM has the same architecture as LLaMA.
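The card does not say how QuantLM stores its quantized weights, so the sketch below is only a hypothetical illustration of what "unpacking to FP16" means. It assumes symmetric, per-row absmax 8-bit quantization: each weight is kept as an integer code plus a per-row scale, and unpacking multiplies them back out and rounds to FP16. The result is an ordinary dense matrix that a plain product (standing in for an FP16 GEMM) can consume. All function names here are illustrative, not part of any QuantLM API.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_rows(weight, bits=8):
    """Assumed scheme (hypothetical): symmetric absmax quantization, one scale per row."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    codes, scales = [], []
    for row in weight:
        absmax = max(abs(w) for w in row) or 1.0  # guard against all-zero rows
        scale = absmax / qmax
        codes.append([max(-qmax, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return codes, scales


def unpack_to_fp16(codes, scales):
    """'Unpack' integer codes into dense FP16-representable weights: w ~= code * scale."""
    return [[to_fp16(c * s) for c in row] for row, s in zip(codes, scales)]


def linear(weight, x):
    """Plain dense y = W x, standing in for the FP16 GEMM a real kernel would run."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]
```

In this sketch, once the weights are unpacked, inference needs no custom low-bit kernel: `linear` only sees ordinary floating-point values. The trade-off is that the unpacked checkpoint occupies FP16 storage rather than 8-bit storage.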

import torch
import transformers as tf
model_name = "SpectraSuite/QuantLM_190M_8bit_Unpacked"
# These are base (pretrained) LLMs that are not instruction- or chat-tuned, so you may need to adjust your prompt accordingly.
pipeline = tf.pipeline(
    "text-generation",
    model=model_name,
    model_kwargs={"torch_dtype": torch.float16},
    device_map="auto",  # requires the accelerate package
)
# Adjust temperature, repetition_penalty, top_k, top_p and other sampling parameters to suit your needs.
print(pipeline("Once upon a time"))
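To give a feel for what the sampling parameters mentioned in the snippet do, here is a hypothetical pure-Python sketch of a single decoding step. `sample_next_token` is an illustrative name, not a transformers API, and the filters are applied in one plausible order; the library's exact ordering and edge-case handling may differ.

```python
import math
import random


def sample_next_token(logits, generated, temperature=1.0, top_k=0, top_p=1.0,
                      repetition_penalty=1.0, rng=random):
    """Illustrative decoding step: filter next-token logits, then draw one token id."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = list(logits)
    # Repetition penalty (CTRL-style): push already-generated tokens towards lower scores.
    for tok in set(generated):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it.
    logits = [z / temperature for z in logits]
    # Softmax, subtracting the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Candidate token ids, most probable first (ties keep their original order).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose renormalised mass reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i] / mass
        if cum >= top_p:
            break
    # random.choices normalises the weights itself, so no explicit rescaling is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With the transformers pipeline, the corresponding knobs are generation keyword arguments passed to the pipeline call, for example `do_sample=True`, `temperature`, `top_k`, `top_p`, `repetition_penalty` and `max_new_tokens`.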
Model size: 191M params (Safetensors)
Tensor type: FP16
