---
license: llama3.1
base_model: nvidia/OpenMath2-Llama3.1-8B
datasets:
- nvidia/OpenMathInstruct-2
language:
- en
tags:
- nvidia
- math
- llama-cpp
- gguf-my-repo
library_name: transformers
---

# Triangle104/OpenMath2-Llama3.1-8B-Q5_K_S-GGUF

This model was converted to GGUF format from [`nvidia/OpenMath2-Llama3.1-8B`](https://huggingface.co/nvidia/OpenMath2-Llama3.1-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/nvidia/OpenMath2-Llama3.1-8B) for more details on the model.

---

## Model details

OpenMath2-Llama3.1-8B is obtained by finetuning Llama3.1-8B-Base with OpenMathInstruct-2.

The model outperforms Llama3.1-8B-Instruct on all the popular math benchmarks we evaluate on, especially on MATH, where it leads by 15.9 percentage points (67.8 vs. 51.9).

[Figure: Performance of Llama-3.1-8B-Instruct as it is trained on increasing proportions of OpenMathInstruct-2]

[Figure: Comparison of OpenMath2-Llama3.1-8B vs. Llama-3.1-8B-Instruct across MATH levels]

| Model | GSM8K | MATH | AMC 2023 | AIME 2024 | Omni-MATH |
|:---|:---:|:---:|:---:|:---:|:---:|
| Llama3.1-8B-Instruct | 84.5 | 51.9 | 9/40 | 2/30 | 12.7 |
| OpenMath2-Llama3.1-8B (nemo \| HF) | 91.7 | 67.8 | 16/40 | 3/30 | 22.0 |
| + majority@256 | 94.1 | 76.1 | 23/40 | 3/30 | 24.6 |
| Llama3.1-70B-Instruct | 95.8 | 67.9 | 19/40 | 6/30 | 19.0 |
| OpenMath2-Llama3.1-70B (nemo \| HF) | 94.9 | 71.9 | 20/40 | 4/30 | 23.1 |
| + majority@256 | 96.0 | 79.6 | 24/40 | 6/30 | 27.6 |
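
The `+ majority@256` rows refer to majority voting: many solutions are sampled per problem and the most frequent final answer is kept. The sketch below illustrates only that voting step in plain Python. It assumes the final answers have already been extracted as strings; the function name `majority_answer` and the tie-breaking rule (the answer seen first wins) are illustrative choices, not the authors' evaluation code.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_answer(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer, or None if there is none.

    Ties are broken in favour of the answer that appears first.
    """
    # Drop samples where no final answer could be extracted.
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    # Counter.most_common keeps first-seen order among equal counts.
    return Counter(valid).most_common(1)[0][0]


# Hand-written stand-ins for the final answers of six sampled solutions.
samples = ["-16", "-16", "16", None, "-16 ", "-7"]
print(majority_answer(samples))  # -16
```

A real evaluation would also need to treat mathematically equivalent answers (for example `0.5` and `\frac{1}{2}`) as the same vote, which this string comparison does not attempt.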

The pipeline we used to produce the data and models is fully open-sourced!

- Code
- Models
- Dataset

See our paper for more details!

### How to use the models?

Our models are trained with the same "chat format" as Llama3.1-instruct models (same system/user/assistant tokens). Please note that these models have not been instruction tuned on general data and thus might not provide good answers outside of the math domain.
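
For runtimes that take a raw prompt string, the message has to be wrapped in the chat tokens by hand. The sketch below builds a single-turn prompt from the Llama 3 header and end-of-turn tokens. The helper name `build_prompt` is hypothetical, and the exact layout (including whether a default system message is expected) is an assumption that should be checked against the chat template shipped with the tokenizer.

```python
from typing import Optional


def build_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Wrap one user turn in Llama 3 style chat tokens (illustrative sketch)."""
    parts = ["<|begin_of_text|>"]
    if system_message is not None:
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system_message}<|eot_id|>"
        )
    parts.append(
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
    )
    # Leave the assistant header open so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


question = (
    "Solve the following math problem. Make sure to put the answer "
    "(and only answer) inside \\boxed{}.\n\n"
    "What is the minimum value of $a^2+6a-7$?"
)
print(build_prompt(question))
```

With `transformers`, calling `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` produces the prompt from the model's own template and is the safer route when it is available.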

We recommend using the instructions in our repo to run inference with these models, but here is an example of how to do it through the `transformers` API:

```python
import transformers
import torch

model_id = "nvidia/OpenMath2-Llama3.1-8B"

# Load the model in bfloat16 and place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Solve the following math problem. Make sure to put the answer (and only answer) inside \\boxed{}.\n\n" +
        "What is the minimum value of $a^2+6a-7$?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=4096,
)
# The last message in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1]['content'])
```
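
Because the prompt asks for the final answer inside `\boxed{}`, a small helper can pull it out of the generated text for checking or scoring. The following is a minimal sketch: `extract_boxed` is a hypothetical helper, not part of `transformers` or of the authors' evaluation code, and the `reply` string is a hand-written stand-in rather than actual model output. It returns the contents of the last `\boxed{...}` and tolerates nested braces such as `\frac{1}{2}`.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # already inside the opening brace of the marker
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: no complete answer found


# Hand-written example reply; completing the square gives (a+3)^2 - 16.
reply = "Since $a^2+6a-7 = (a+3)^2 - 16$, the minimum value is $\\boxed{-16}$."
print(extract_boxed(reply))  # -16
```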

### Reproducing our results

We provide all instructions to fully reproduce our results.

### Citation

If you find our work useful, please consider citing us!

```bibtex
@article{toshniwal2024openmath2,
  title   = {OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data},
  author  = {Shubham Toshniwal and Wei Du and Ivan Moshkov and Branislav Kisacanin and Alexan Ayrapetyan and Igor Gitman},
  year    = {2024},
  journal = {arXiv preprint arXiv:2410.01560}
}
```

### Terms of use

By accessing this model, you are agreeing to the Llama 3.1 license terms and conditions, the acceptable use policy, and Meta’s privacy policy.

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:

```bash
llama-cli --hf-repo Triangle104/OpenMath2-Llama3.1-8B-Q5_K_S-GGUF --hf-file openmath2-llama3.1-8b-q5_k_s.gguf -p "The meaning to life and the universe is"
```

### Server:

```bash
llama-server --hf-repo Triangle104/OpenMath2-Llama3.1-8B-Q5_K_S-GGUF --hf-file openmath2-llama3.1-8b-q5_k_s.gguf -c 2048
```
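
Once `llama-server` is running, it can be queried over HTTP. The sketch below uses only the Python standard library and rests on a few assumptions: that the server listens on `http://127.0.0.1:8080`, that it exposes the OpenAI-compatible `/v1/chat/completions` route, and that the response follows the OpenAI-style `choices[0].message.content` layout. The helper names `build_payload` and `ask` are hypothetical. The block itself only prints the request body, so it runs without a server.

```python
import json
import urllib.request

PROMPT_PREFIX = (
    "Solve the following math problem. Make sure to put the answer "
    "(and only answer) inside \\boxed{}.\n\n"
)


def build_payload(question: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat request body for one math question."""
    return {
        "messages": [{"role": "user", "content": PROMPT_PREFIX + question}],
        "max_tokens": max_tokens,
    }


def ask(question: str, base_url: str = "http://127.0.0.1:8080") -> str:
    """Send the question to a running llama-server and return the reply text."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",  # assumed OpenAI-compatible route
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Show the request body; call ask(...) instead once the server is up.
print(json.dumps(build_payload("What is the minimum value of $a^2+6a-7$?"), indent=2))
```

With the server started using `-c 2048`, the prompt and the generated tokens have to fit in a 2048-token context, which is why `max_tokens` defaults to a value well below the `max_new_tokens=4096` used in the `transformers` example.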

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

```bash
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.

```bash
./llama-cli --hf-repo Triangle104/OpenMath2-Llama3.1-8B-Q5_K_S-GGUF --hf-file openmath2-llama3.1-8b-q5_k_s.gguf -p "The meaning to life and the universe is"
```

or

```bash
./llama-server --hf-repo Triangle104/OpenMath2-Llama3.1-8B-Q5_K_S-GGUF --hf-file openmath2-llama3.1-8b-q5_k_s.gguf -c 2048
```