---
license: mit
language:
- en
datasets:
- openbmb/UltraFeedback
model_creator: OpenBMB
model_name: UltraCM-13b
model_type: llama
base_model: openbmb/UltraCM-13b
library_name: transformers
pipeline_tag: text-generation
inference: false
tags:
- dpo
- rlaif
- preference
- ultrafeedback
quantized_by: alvarobartt
---

## Model Card for UltraCM-13b-GGUF

[UltraCM-13B](https://huggingface.co/openbmb/UltraCM-13b) is an LLM fine-tuned for completion critique, i.e. to evaluate
LLM outputs on helpfulness, truthfulness, honesty, and how closely the answer follows the given instructions.

UltraCM-13B is a 13B-parameter LLM released by [OpenBMB](https://huggingface.co/openbmb) as part of their paper
[UltraFeedback: Boosting Language Models with High-quality Feedback](https://arxiv.org/abs/2310.01377).

This repository contains the quantized variants of UltraCM-13B in the GGUF format, introduced by the [llama.cpp](https://github.com/ggerganov/llama.cpp) team,
and is heavily inspired by [TheBloke](https://huggingface.co/TheBloke)'s work on quantizing most of the LLMs out there.
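
As a minimal sketch (not part of the original card), any of the GGUF files listed in the Model Files section below can be fetched programmatically with `huggingface_hub`; the filename used here is just one of the available variants:

```python
# Illustrative only: download one of the quantized GGUF files from this repository.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="alvarobartt/UltraCM-13b-GGUF",
    filename="UltraCM-13b.q4_k_m.gguf",  # swap for any other variant from the table below
)
print(model_path)  # local path of the downloaded file
```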

### Model Details

#### Model Description

- **Model type:** Llama
- **Fine-tuned from model:** [Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)
- **Created by:** [Meta AI](https://huggingface.co/meta-llama)
- **Fine-tuned by:** [OpenBMB](https://huggingface.co/openbmb)
- **Quantized by:** [alvarobartt](https://huggingface.co/alvarobartt)
- **Language(s) (NLP):** English
- **License:** MIT

### Model Files

| Name | Quant method | Bits | Size | Max RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| [UltraCM-13b.q4_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_0.gguf) | Q4_0 | 4 | 3.83 GB | 6.33 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| [UltraCM-13b.q4_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_s.gguf) | Q4_K_S | 4 | 7.41 GB | 9.91 GB | small, greater quality loss |
| [UltraCM-13b.q4_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q4_k_m.gguf) | Q4_K_M | 4 | 7.87 GB | 10.37 GB | medium, balanced quality - recommended |
| [UltraCM-13b.q5_0.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_0.gguf) | Q5_0 | 5 | 4.65 GB | 7.15 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| [UltraCM-13b.q5_k_s.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_s.gguf) | Q5_K_S | 5 | 8.97 GB | 11.47 GB | large, low quality loss - recommended |
| [UltraCM-13b.q5_k_m.gguf](https://huggingface.co/alvarobartt/UltraCM-13b-GGUF/blob/main/UltraCM-13b.q5_k_m.gguf) | Q5_K_M | 5 | 9.23 GB | 11.73 GB | large, very low quality loss - recommended |

**Note:** the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
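
As a rough illustration (not from the original card), offloading is controlled via the `n_gpu_layers` argument when loading a GGUF file with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python); the layer count below is an arbitrary example:

```python
# Illustrative sketch: offloading layers to the GPU lowers the RAM figures above
# at the cost of VRAM. Set n_gpu_layers=0 for CPU-only inference.
from llama_cpp import Llama

llm = Llama(
    model_path="UltraCM-13b.q4_k_m.gguf",  # any variant from the table above
    n_ctx=2048,
    n_gpu_layers=35,  # arbitrary example value; tune to fit your VRAM
)
```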

For more information on quantization, I'd highly suggest checking out [TheBloke](https://huggingface.co/TheBloke), as well as joining [their Discord server](https://discord.gg/Jq4vkcDakD).

### Uses

#### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]
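
In the meantime, a minimal sketch of running the quantized files locally with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) could look as follows; note that the prompt is only a placeholder, since the exact critique template UltraCM-13B expects is documented in the original [openbmb/UltraCM-13b](https://huggingface.co/openbmb/UltraCM-13b) card:

```python
# Minimal, illustrative sketch: generate a critique with a quantized UltraCM-13b file.
# The prompt below is a placeholder; use the critique template from the original
# openbmb/UltraCM-13b model card in practice.
from llama_cpp import Llama

llm = Llama(model_path="UltraCM-13b.q4_k_m.gguf", n_ctx=2048)

prompt = (
    "Given my answer to an instruction, provide specific and constructive feedback.\n"
    "Instruction: What is the capital of France?\n"
    "Answer: The capital of France is Paris.\n"
    "Feedback:"
)

output = llm(prompt, max_tokens=256, temperature=0.7)
print(output["choices"][0]["text"])
```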

### Citation

Since this is only a GGUF quantization of the original weights, please refer to and cite the original authors instead.

```bibtex
@misc{cui2023ultrafeedback,
  title={UltraFeedback: Boosting Language Models with High-quality Feedback},
  author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
  year={2023},
  eprint={2310.01377},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```