Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

rho-math-1b-v0.1 - GGUF
- Model creator: https://huggingface.co/microsoft/
- Original model: https://huggingface.co/microsoft/rho-math-1b-v0.1/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [rho-math-1b-v0.1.Q2_K.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q2_K.gguf) | Q2_K | 0.4GB |
| [rho-math-1b-v0.1.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.IQ3_XS.gguf) | IQ3_XS | 0.44GB |
| [rho-math-1b-v0.1.IQ3_S.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.IQ3_S.gguf) | IQ3_S | 0.47GB |
| [rho-math-1b-v0.1.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q3_K_S.gguf) | Q3_K_S | 0.47GB |
| [rho-math-1b-v0.1.IQ3_M.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.IQ3_M.gguf) | IQ3_M | 0.48GB |
| [rho-math-1b-v0.1.Q3_K.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q3_K.gguf) | Q3_K | 0.51GB |
| [rho-math-1b-v0.1.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q3_K_M.gguf) | Q3_K_M | 0.51GB |
| [rho-math-1b-v0.1.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q3_K_L.gguf) | Q3_K_L | 0.55GB |
| [rho-math-1b-v0.1.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.IQ4_XS.gguf) | IQ4_XS | 0.57GB |
| [rho-math-1b-v0.1.Q4_0.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q4_0.gguf) | Q4_0 | 0.59GB |
| [rho-math-1b-v0.1.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.IQ4_NL.gguf) | IQ4_NL | 0.6GB |
| [rho-math-1b-v0.1.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q4_K_S.gguf) | Q4_K_S | 0.6GB |
| [rho-math-1b-v0.1.Q4_K.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q4_K.gguf) | Q4_K | 0.62GB |
| [rho-math-1b-v0.1.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q4_K_M.gguf) | Q4_K_M | 0.62GB |
| [rho-math-1b-v0.1.Q4_1.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q4_1.gguf) | Q4_1 | 0.65GB |
| [rho-math-1b-v0.1.Q5_0.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q5_0.gguf) | Q5_0 | 0.71GB |
| [rho-math-1b-v0.1.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q5_K_S.gguf) | Q5_K_S | 0.71GB |
| [rho-math-1b-v0.1.Q5_K.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q5_K.gguf) | Q5_K | 0.73GB |
| [rho-math-1b-v0.1.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q5_K_M.gguf) | Q5_K_M | 0.73GB |
| [rho-math-1b-v0.1.Q5_1.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q5_1.gguf) | Q5_1 | 0.77GB |
| [rho-math-1b-v0.1.Q6_K.gguf](https://huggingface.co/RichardErkhov/microsoft_-_rho-math-1b-v0.1-gguf/blob/main/rho-math-1b-v0.1.Q6_K.gguf) | Q6_K | 0.84GB |

Original model description:
---
license: mit
tags:
- nlp
- math
language:
- en
pipeline_tag: text-generation
---
[📜 Arxiv] • [💬 HF Paper] • [🤗 Models] • [🐱 GitHub]
Figure 1: Rho-1 is pre-trained with Selective Language Modeling (SLM). SLM improves average few-shot accuracy on GSM8k and MATH by over 16%, achieving the baseline performance 5-10x faster.
Figure 2:
Upper: Even an extensively filtered pretraining corpus contains token-level noise.
Left: Previous Causal Language Modeling (CLM) trains on all tokens.
Right: Our proposed Selective Language Modeling (SLM) applies the loss only to useful, clean tokens.
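
The contrast in Figure 2 can be written as two training objectives. This is a sketch in notation of our own choosing, not taken from the text above: $x_1, \dots, x_N$ is a token sequence, $P_\theta$ is the model being trained, and $S \subseteq \{1, \dots, N\}$ is the set of selected token positions. Averaging over $|S|$ is an assumed normalization.

```latex
% CLM: every token contributes to the loss
\mathcal{L}_{\mathrm{CLM}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \log P_\theta(x_i \mid x_{<i})

% SLM: only the selected positions S contribute
\mathcal{L}_{\mathrm{SLM}}(\theta) = -\frac{1}{|S|} \sum_{i \in S} \log P_\theta(x_i \mid x_{<i})
```

When $S$ contains every position, the SLM objective reduces to the CLM objective.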
Figure 3: The pipeline of Selective Language Modeling.
SLM optimizes language model performance by concentrating on valuable, clean tokens during pre-training.
It involves three steps:
(Step 1) Initially, train a reference model on high-quality data.
(Step 2) Then, score each token's loss in a corpus using the reference model.
(Step 3) Finally, train the language model selectively on the tokens with high excess loss, that is, tokens whose training loss is well above the reference loss.
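
Steps 2 and 3 can be illustrated with a minimal sketch in plain Python. It assumes the per-token losses of the training model and the reference model are already available as lists of floats. The function names and the `keep_ratio` parameter are hypothetical, chosen for illustration only, and selecting the top fraction of tokens by excess loss is one plausible reading of "higher excess loss", not necessarily the authors' exact procedure.

```python
import math


def select_tokens_by_excess_loss(train_losses, ref_losses, keep_ratio=0.6):
    """Return a 0/1 mask that keeps the tokens with the highest excess loss.

    Excess loss is (training-model loss - reference-model loss) per token.
    `keep_ratio` is a hypothetical knob: the fraction of tokens to keep.
    """
    if len(train_losses) != len(ref_losses):
        raise ValueError("loss lists must have the same length")
    if not train_losses:
        raise ValueError("loss lists must not be empty")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Step 2 (scoring): compare each token's loss with the reference loss.
    excess = [t - r for t, r in zip(train_losses, ref_losses)]

    # Step 3 (selection): keep the k tokens with the largest excess loss.
    k = max(1, math.ceil(keep_ratio * len(excess)))
    top = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)[:k]

    mask = [0] * len(excess)
    for i in top:
        mask[i] = 1
    return mask


def selective_loss(train_losses, mask):
    """Average the training loss over the selected tokens only."""
    kept = [loss for loss, m in zip(train_losses, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Toy numbers, not real measurements.
    train = [2.0, 0.5, 3.0, 1.0]
    ref = [1.0, 0.4, 0.5, 1.2]
    mask = select_tokens_by_excess_loss(train, ref, keep_ratio=0.5)
    print(mask, selective_loss(train, mask))
```

In a real training loop the mask would be applied to the per-token cross-entropy before backpropagation, so that unselected tokens contribute no gradient.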