
This model is intended for debugging only. It is randomly initialized from the config of meta-llama/Meta-Llama-3.1-70B-Instruct, scaled down to a much smaller size.

Code:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

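model_path = "yujiepan/meta-llama-3.1-tiny-random-hidden128"
quant_config = {
    "zero_point": True,   # asymmetric quantization: store a zero point per group
    "q_group_size": 64,   # number of weights that share one scale and zero point
    "w_bit": 4,           # quantize weights to 4 bits
    "version": "GEMM",    # kernel/packing variant used for the quantized layers
}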
# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, use_cache=False, device_map='cuda',
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)
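
To make the quant_config values above more concrete, here is a minimal pure-Python sketch of group-wise 4-bit quantization with a zero point. It is not AutoAWQ's implementation: AWQ additionally searches for activation-aware scales, which is omitted here, and the helper names (quantize_group, dequantize_group, quantize_row) are hypothetical. It only illustrates what "w_bit": 4, "q_group_size": 64, and "zero_point": True mean for a single row of weights.

```python
import random


def quantize_group(weights, w_bit=4):
    """Asymmetric (zero-point) round-to-nearest quantization of one group."""
    q_max = 2 ** w_bit - 1  # 15 for 4-bit weights
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant group, which would otherwise give a zero scale.
    scale = max(w_max - w_min, 1e-5) / q_max
    # The zero point is the integer code that represents the real value 0.0.
    zero = min(max(round(-w_min / scale), 0), q_max)
    codes = [min(max(round(w / scale) + zero, 0), q_max) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row, q_group_size=64, w_bit=4):
    """Split a row into groups; each group gets its own scale and zero point."""
    return [
        quantize_group(row[start:start + q_group_size], w_bit)
        for start in range(0, len(row), q_group_size)
    ]


random.seed(0)
# Illustrative data only: 128 random weights, i.e. two groups of 64.
row = [random.gauss(0.0, 0.02) for _ in range(128)]
groups = quantize_row(row)
restored = [w for codes, scale, zero in groups
            for w in dequantize_group(codes, scale, zero)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
max_scale = max(scale for _, scale, _ in groups)
print(f"groups={len(groups)} max_err={max_err:.6f} max_scale={max_scale:.6f}")
```

In this sketch a smaller q_group_size means more scale/zero-point pairs to store but a tighter fit per group, which is the usual trade-off behind that setting.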
Safetensors · Model size: 32.9M params · Tensor types: I32, FP16