nicholasKluge committed
Commit: 409479a
Parent(s): f643bcf
Update README.md
README.md
CHANGED
@@ -37,7 +37,7 @@ co2_eq_emissions:
 
 ## Model Summary
 
-**Note: This model is a quantized version of [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m). Quantization was performed using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), allowing this version to be 80% lighter
+**Note: This model is a quantized version of [TeenyTinyLlama-460m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-460m). Quantization was performed using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), allowing this version to be 80% lighter with almost no performance loss. A GPU is required to run the AWQ-quantized models.**
 
 Given the lack of available monolingual foundational models in non-English languages and the fact that some of the most used and downloaded models by the community are those small enough to allow individual researchers and hobbyists to use them in low-resource environments, we developed the TeenyTinyLlama: _a pair of small foundational models trained in Brazilian Portuguese._
 
@@ -114,7 +114,7 @@ The primary intended use of TeenyTinyLlama is to research the behavior, function
 
 ## Basic usage
 
-**Note: The use of quantized models required the installation of `autoawq==0.1.7
+**Note: The use of quantized models required the installation of `autoawq==0.1.7`. A GPU is required to run the AWQ-quantized models.**
 
 Using the `pipeline`:
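For context, the "Using the `pipeline`" step the updated README points to would look roughly like the sketch below. The repo id `nicholasKluge/TeenyTinyLlama-460m-awq` is a guess at this model's Hub id (the commit page does not show it), and, per the note above, the AWQ weights need `autoawq==0.1.7` installed and a GPU (`device=0`):

```python
# Hypothetical Hub id for this AWQ-quantized repo; substitute the real one.
MODEL_ID = "nicholasKluge/TeenyTinyLlama-460m-awq"


def build_generator(model_id: str = MODEL_ID):
    """Create a text-generation pipeline for the AWQ model.

    Imported lazily so the sketch is inspectable without transformers
    installed. AWQ-quantized weights require a GPU, hence device=0.
    """
    from transformers import pipeline

    return pipeline("text-generation", model=model_id, device=0)


if __name__ == "__main__":
    generator = build_generator()
    # Prompt in Brazilian Portuguese, matching the model's training language.
    print(generator("A capital do Brasil é", max_new_tokens=20)[0]["generated_text"])
```

This mirrors the base TeenyTinyLlama-460m usage; only the model id (and the autoawq/GPU requirements) change for the quantized variant.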