
This is a Llama 2 architecture model series trained on the TinyStories dataset, intended for use in the llama2.c project by Andrej Karpathy.

The models were trained on a single V100 32GB GPU for 3 epochs and reach an inference speed of ~72 tokens/sec on that same GPU.

On a 12th Gen Intel(R) Core(TM) i9-12900HK CPU, inference achieved 161.819538 tok/s.
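For readers who want to reproduce a throughput figure like the ones above, here is a minimal Python sketch. It assumes throughput is simply the number of generated tokens divided by wall-clock time. The names `tokens_per_second` and `measure_throughput`, and the `generate` callable, are hypothetical helpers for illustration and are not part of llama2.c.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(n_tokens: int, elapsed_seconds: float) -> float:
    """Return throughput as tokens generated per wall-clock second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_tokens / elapsed_seconds


def measure_throughput(generate: Callable[[], Iterable[int]]) -> float:
    """Time a token generator and return its tokens/sec.

    `generate` is a hypothetical stand-in for whatever produces tokens
    (for example, a wrapper around a model's sampling loop).
    """
    start = time.perf_counter()
    n_tokens = sum(1 for _ in generate())  # consume the stream, counting tokens
    elapsed = time.perf_counter() - start
    return tokens_per_second(n_tokens, elapsed)
```

As an illustrative calculation (not a measurement), `tokens_per_second(256, 2.0)` returns `128.0`. Figures measured this way depend on hardware, model size, and sequence length, so they are only comparable across runs that hold those fixed.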

Learn more about how to run inference in pure C using llama2.c.

