Update README.md
- **HuggingFace** (access ELM Turbo Models in HF): [here](https://huggingface.co/collections/slicexai/elm-turbo-66945032f3626024aa066fde)

## ELM Turbo Model Release

In this version, we applied our new, improved decomposable ELM techniques to a widely used open-source LLM, `meta-llama/Meta-Llama-3.1-8B-Instruct` (8B params; see the [Llama license](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE) for usage terms). After training, we generated three smaller slices with parameter counts ranging from 3B to 6B.

- [Section 1.](https://huggingface.co/slicexai/Llama3.1-elm-turbo-4B-instruct#1-run-elm-turbo-models-with-huggingface-transformers-library) Instructions to run ELM Turbo with the HuggingFace Transformers library.

**NOTE**: The open-source datasets from the HuggingFace hub used for instruction fine-tuning ELM Turbo include, but are not limited to: `allenai/tulu-v2-sft-mixture`, `microsoft/orca-math-word-problems-200k`, `mlabonne/WizardLM_evol_instruct_70k-ShareGPT`, and `mlabonne/WizardLM_evol_instruct_v2_196K-ShareGPT`. We advise users to exercise caution when using ELM Turbo, as these datasets may contain factually incorrect information, unintended biases, inappropriate content, and other issues. We recommend thoroughly evaluating the model's outputs and implementing appropriate safeguards for your specific use case.
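As a concrete starting point for such safeguards, here is a minimal, illustrative post-generation check. The blocklist contents and the function itself are placeholders of our own, not part of the ELM Turbo release; real deployments should use proper moderation tooling.

```python
# Illustrative only: a trivial post-generation output filter.
# The blocklist below is a placeholder, not a vetted moderation list.
BLOCKLIST = {"example-banned-term"}

def passes_basic_filter(text: str) -> bool:
    """Reject outputs containing any blocklisted term (case-insensitive)."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

print(passes_basic_filter("A harmless answer."))  # -> True
```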

There are three ELM Turbo slices derived from the `Meta-Llama-3.1-8B-Instruct` model.

Make sure to update your `transformers` installation via `pip install --upgrade transformers`.

Example: to run `slicexai/Llama3.1-elm-turbo-4B-instruct` with the Transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

elm_turbo_model = "slicexai/Llama3.1-elm-turbo-4B-instruct"
model = AutoModelForCausalLM.from_pretrained(
    elm_turbo_model,
    device_map="cuda",
    torch_dtype=torch.bfloat16,  # assumption: pick a dtype suited to your GPU
    trust_remote_code=True,      # assumption: common for custom/sliced architectures
)
tokenizer = AutoTokenizer.from_pretrained(elm_turbo_model)
```
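Instruct models in the Llama 3.1 family are driven through a chat-formatted prompt; in practice `tokenizer.apply_chat_template` builds it for you. As a reference, here is a minimal sketch of the Llama 3.1 prompt layout built by hand (assuming the ELM Turbo slices inherit Llama 3.1's chat template; the helper function is ours, for illustration):

```python
# Sketch of the Llama 3.1 chat prompt layout (assumed to apply to ELM Turbo slices).
# Prefer tokenizer.apply_chat_template in real code.
def build_llama31_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama31_prompt("You are a helpful assistant.", "What is 2 + 2?")
print(prompt.count("<|eot_id|>"))  # -> 2
```

The trailing assistant header leaves the prompt open for the model to complete; generation then stops when the model emits its own `<|eot_id|>`.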