Model Details

Model Description

A Trismegistus fine-tune of Llama 3.2 1B (1.24B parameters, F32 safetensors). Credit to teknium for the dataset and the original Trismegistus model.

Model Sources

Base model: Llama 3.2 1B (meta-llama/Llama-3.2-1B)

Uses

  • Use for esoteric joy.

Bias, Risks, and Limitations

  • May be biased as hell.

  • Recommendation:

    • Don't take it personally.

How to Get Started with the Model

  • Run it; a minimal loading sketch follows.
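
The model loads like any other Llama-family checkpoint with the transformers library. The sketch below is minimal: the repo id is taken from this card, while the prompt, sampling settings, and device placement are illustrative assumptions rather than recommended values.

```python
# Minimal sketch, assuming transformers (and accelerate for device_map="auto").
# The prompt and generation settings are illustrative, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jtatman/llama-3.2-1b-trismegistus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain the principle of correspondence in Hermeticism."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the tokenizer ships a chat template, prefer `tokenizer.apply_chat_template` for prompt formatting instead of the raw string above.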

Training Data

  • teknium's Trismegistus dataset, as credited above.

Training Hyperparameters

  • LoRA fine-tuning of a 4-bit quantized base model via PEFT (QLoRA-style); a configuration sketch follows.
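
Beyond "LoRA, 4-bit, PEFT", the exact hyperparameters are not published on this card, so the sketch below only illustrates what a QLoRA-style setup with bitsandbytes and peft typically looks like; the rank, alpha, dropout, target modules, and base repo id are assumptions, not the configuration that produced this checkpoint.

```python
# Illustrative QLoRA-style setup: LoRA adapters on a 4-bit quantized base model.
# None of these hyperparameter values are taken from this card; they are common
# defaults shown only to make "lora 4bit peft" concrete.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",   # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                        # rank: assumed, not documented here
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```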

Speeds, Sizes, Times

  • global_step: 16905
  • train_loss: 1.1694
  • train_runtime: 21882.47 s (≈6.1 hours)
  • train_samples_per_second: 3.09
  • train_steps_per_second: 0.773
  • total_flos: 4.437e+17
  • epoch: 5.0

Evaluation and Metrics

| Tasks         | Version | Filter | n-shot | Metric     | Value  | Stderr   |
|---------------|---------|--------|--------|------------|--------|----------|
| arc_challenge | 1       | none   | 0      | acc ↑      | 0.3345 | ± 0.0138 |
| arc_challenge | 1       | none   | 0      | acc_norm ↑ | 0.3695 | ± 0.0141 |
| arc_easy      | 1       | none   | 0      | acc ↑      | 0.6044 | ± 0.0100 |
| arc_easy      | 1       | none   | 0      | acc_norm ↑ | 0.5694 | ± 0.0102 |
| boolq         | 2       | none   | 0      | acc ↑      | 0.6410 | ± 0.0084 |
| hellaswag     | 1       | none   | 0      | acc ↑      | 0.4400 | ± 0.0050 |
| hellaswag     | 1       | none   | 0      | acc_norm ↑ | 0.5728 | ± 0.0049 |
| openbookqa    | 1       | none   | 0      | acc ↑      | 0.2260 | ± 0.0187 |
| openbookqa    | 1       | none   | 0      | acc_norm ↑ | 0.3540 | ± 0.0214 |
| piqa          | 1       | none   | 0      | acc ↑      | 0.7002 | ± 0.0107 |
| piqa          | 1       | none   | 0      | acc_norm ↑ | 0.7024 | ± 0.0107 |
| winogrande    | 1       | none   | 0      | acc ↑      | 0.5785 | ± 0.0139 |
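
The table layout matches the output of EleutherAI's lm-evaluation-harness, though the card does not name the tool. Assuming that harness, a zero-shot run over the same tasks could be reproduced roughly as sketched below; the batch size is an assumption.

```python
# Hedged reproduction sketch, assuming EleutherAI's lm-evaluation-harness
# (pip install lm-eval); the harness itself is not named on this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jtatman/llama-3.2-1b-trismegistus",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag",
        "openbookqa", "piqa", "winogrande",
    ],
    num_fewshot=0,
    batch_size=8,  # assumption; not reported on the card
)
print(results["results"])
```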

Environmental Impact

Will steal your horse and kill your cat.
