TestLlama 3.2 Test Model

This model has no pretrained weights. It will not generate meaningful outputs.

Model Description

This is a lobotomized version of the Llama 3.2 architecture, created specifically for testing and development purposes.

It maintains the architectural structure of Llama 3.2 but with dramatically reduced dimensions, yielding an extremely lightweight model for debugging pipelines against a close-to-real architecture.

Intended Use

  • Software testing: API integration testing, pipeline validation
  • Development environments: iterating on code without heavy hardware requirements
  • CI/CD pipelines: automated testing with minimal resource requirements (a smoke-test sketch follows this list)
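
For instance, a CI job can exercise the full load-and-generate path as a pytest-style smoke test. The sketch below is illustrative; the test name, prompt, and assertion are arbitrary choices, not part of this repo:

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "vaughankraska/TestLlama3.2ish"

def test_generation_smoke():
    # The checkpoint is tiny, so download and inference stay CI-friendly.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer("smoke test", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=5)
    # Outputs are random tokens: assert on shape, never on content.
    assert outputs.shape[0] == 1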

Model Details

  • Framework: Hugging Face Transformers
  • Architecture: Llama 3.2 (scaled down)
  • Parameter count: ~72.4M parameters (F32, safetensors)
  • Architecture configuration:
    • hidden_size: 512 (reduced from 2048)
    • intermediate_size: 1024 (reduced from 8192)
    • num_hidden_layers: 2 (reduced from 16)
    • num_attention_heads: 8 (reduced from 32)
    • num_key_value_heads: 2 (reduced from 8)
    • vocab_size: 128256 (maintained from original)
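
For reference, the configuration above can be reconstructed with LlamaConfig. This is a sketch, not the published creation script: unlisted fields fall back to defaults, and tie_word_embeddings=True is an assumption (Llama 3.2 ties embeddings, and an untied LM head would roughly double the parameter count):

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=2,
    num_attention_heads=8,
    num_key_value_heads=2,
    vocab_size=128256,
    tie_word_embeddings=True,  # assumed, as in Llama 3.2
)
model = LlamaForCausalLM(config)  # random initialization, no training
# Roughly 70M with these exact values (the 128256 x 512 embedding alone
# is ~65.7M), in line with the published ~72M figure.
print(sum(p.numel() for p in model.parameters()))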

Important Limitations

  • Not for production use: This model contains random weights and is not trained
  • No meaningful outputs: The model will produce random token sequences
  • Architectural test only: This is purely for testing software compatibility
  • Not for benchmarking: Performance metrics derived from this model are not representative

Usage Notes

This model is intentionally created with random weights and a minimized architecture. It will not produce coherent or meaningful text. It's specifically designed for:

  1. Testing inference pipelines
  2. Validating model loading/saving (see the round-trip sketch after this list)
  3. Testing quantization workflows
  4. Architectural compatibility testing
  5. Software development with minimal resource requirements
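
Item 2, for example, can be exercised as a round trip through a temporary directory. A minimal sketch, with the asserted field chosen arbitrarily:

import tempfile

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vaughankraska/TestLlama3.2ish"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

with tempfile.TemporaryDirectory() as tmp:
    # Write the checkpoint and tokenizer out, then reload from disk.
    model.save_pretrained(tmp)
    tokenizer.save_pretrained(tmp)
    reloaded = AutoModelForCausalLM.from_pretrained(tmp)
    assert reloaded.config.hidden_size == model.config.hidden_size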

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
model_id = "vaughankraska/TestLlama3.2ish"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Test generation (outputs will be random tokens)
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # passes attention_mask along with input_ids
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
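
Since the weights are random, the decoded text cannot be validated, but tensor shapes can. A minimal follow-on check continuing from the variables above (128256 is the vocab_size listed under Model Details):

import torch

with torch.no_grad():
    logits = model(**inputs).logits  # single forward pass, no sampling
assert logits.shape[-1] == 128256  # vocab_size, maintained from the original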

Creation Method

This model was created by:

  1. Defining a minimal LlamaConfig with dramatically reduced dimensions
  2. Initializing a model with random weights
  3. Preserving architectural patterns (like GQA, RoPE settings)
  4. Using the authentic tokenizer from Llama 3.2
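
The creation script itself is not published, but those steps correspond roughly to the sketch below; the tokenizer source repo is an assumption:

from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

# Steps 1-3: minimal config with reduced dimensions; GQA is preserved via
# num_key_value_heads < num_attention_heads, and RoPE settings stay at defaults.
config = LlamaConfig(
    hidden_size=512, intermediate_size=1024, num_hidden_layers=2,
    num_attention_heads=8, num_key_value_heads=2, vocab_size=128256,
)
model = LlamaForCausalLM(config)  # step 2: random weights, no training
# Step 4: pair with the real Llama 3.2 tokenizer (assumed source repo).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model.save_pretrained("TestLlama3.2ish")
tokenizer.save_pretrained("TestLlama3.2ish")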

License

MIT. The model contains no trained weights from Meta's Llama 3.2 models.
