TestLlama 3.2 Test Model

This model has no pretrained weights. It will not generate meaningful outputs.

Model Description

This is a lobotomized version of the Llama 3.2 architecture, created specifically for testing and development purposes.

It maintains the architectural structure of Llama 3.2 but with dramatically reduced dimensions, yielding an extremely lightweight model for debugging pipelines against a close-to-real architecture.

Intended Use

  • Software testing: API integration testing, pipeline validation
  • Development environments: iterating on code without heavy hardware requirements
  • CI/CD pipelines: automated testing with minimal resource requirements (a smoke-test sketch follows this list)
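
For instance, a CI job can exercise the full load-and-generate path as a pytest-style smoke test. The sketch below is illustrative; the test name, prompt, and assertion are arbitrary choices, not part of this repo:

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "vaughankraska/TestLlama3.2ish"

def test_generation_smoke():
    # The checkpoint is tiny, so download and inference stay CI-friendly.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer("smoke test", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=5)
    # Outputs are random tokens: assert on shape, never on content.
    assert outputs.shape[0] == 1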

Model Details

  • Framework: Hugging Face Transformers
  • Architecture: Llama 3.2 (scaled down)
  • Parameter count: ~72.4M parameters (F32, safetensors)
  • Architecture configuration:
    • hidden_size: 512 (reduced from 2048)
    • intermediate_size: 1024 (reduced from 8192)
    • num_hidden_layers: 2 (reduced from 16)
    • num_attention_heads: 8 (reduced from 32)
    • num_key_value_heads: 2 (reduced from 8)
    • vocab_size: 128256 (maintained from original)
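
For reference, the configuration above can be reconstructed with LlamaConfig. This is a sketch, not the published creation script: unlisted fields fall back to defaults, and tie_word_embeddings=True is an assumption (Llama 3.2 ties embeddings, and an untied LM head would roughly double the parameter count):

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=2,
    num_attention_heads=8,
    num_key_value_heads=2,
    vocab_size=128256,
    tie_word_embeddings=True,  # assumed, as in Llama 3.2
)
model = LlamaForCausalLM(config)  # random initialization, no training
# Roughly 70M with these exact values (the 128256 x 512 embedding alone
# is ~65.7M), in line with the published ~72M figure.
print(sum(p.numel() for p in model.parameters()))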

Important Limitations

  • Not for production use: This model contains random weights and is not trained
  • No meaningful outputs: The model will produce random token sequences
  • Architectural test only: This is purely for testing software compatibility
  • Not for benchmarking: Performance metrics derived from this model are not representative

Usage Notes

This model is intentionally created with random weights and a minimized architecture. It will not produce coherent or meaningful text. It's specifically designed for:

  1. Testing inference pipelines
  2. Validating model loading/saving (see the round-trip sketch after this list)
  3. Testing quantization workflows
  4. Architectural compatibility testing
  5. Software development with minimal resource requirements
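
Item 2, for example, can be exercised as a round trip through a temporary directory. A minimal sketch, with the asserted field chosen arbitrarily:

import tempfile

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vaughankraska/TestLlama3.2ish"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

with tempfile.TemporaryDirectory() as tmp:
    # Write the checkpoint and tokenizer out, then reload from disk.
    model.save_pretrained(tmp)
    tokenizer.save_pretrained(tmp)
    reloaded = AutoModelForCausalLM.from_pretrained(tmp)
    assert reloaded.config.hidden_size == model.config.hidden_size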

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_id = "vaughankraska/TestLlama3.2ish"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Test generation (outputs will be random tokens)
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # passes attention_mask along with input_ids
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
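
Since the weights are random, the decoded text cannot be validated, but tensor shapes can. A minimal follow-on check continuing from the variables above (128256 is the vocab_size listed under Model Details):

import torch

with torch.no_grad():
    logits = model(**inputs).logits  # single forward pass, no sampling
assert logits.shape[-1] == 128256  # vocab_size, maintained from the original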

Creation Method

This model was created by:

  1. Defining a minimal LlamaConfig with dramatically reduced dimensions
  2. Initializing a model with random weights
  3. Preserving architectural patterns (like GQA, RoPE settings)
  4. Using the authentic tokenizer from Llama 3.2
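
The creation script itself is not published, but those steps correspond roughly to the sketch below; the tokenizer source repo is an assumption:

from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

# Steps 1-3: minimal config with reduced dimensions; GQA is preserved via
# num_key_value_heads < num_attention_heads, and RoPE settings stay at defaults.
config = LlamaConfig(
    hidden_size=512, intermediate_size=1024, num_hidden_layers=2,
    num_attention_heads=8, num_key_value_heads=2, vocab_size=128256,
)
model = LlamaForCausalLM(config)  # step 2: random weights, no training
# Step 4: pair with the real Llama 3.2 tokenizer (assumed source repo).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model.save_pretrained("TestLlama3.2ish")
tokenizer.save_pretrained("TestLlama3.2ish")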

License

MIT. The model contains no trained weights from Meta's Llama 3.2 models.
