License: apache-2.0
Language:
- En
Pipeline_tag: text-generation
Base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
tags:
- Chat
This repo contains EXL2 quants of the model. If you need the original weights, please find them here.
The base repo contains only the measurement file; see the revisions for your quant of choice.
This is the eighth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml.
Prompting
The model has been instruct-tuned with ChatML formatting. A typical input looks like this:
"""<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
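For anyone assembling prompts by hand, here is a minimal sketch of how the ChatML layout above could be produced from a list of messages. The function name `build_chatml_prompt` and the `add_generation_prompt` parameter are illustrative assumptions, not part of any library; in practice the tokenizer's own chat template may already handle this for you.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render {"role", "content"} messages as a ChatML prompt string.

    Hypothetical helper for illustration only.
    """
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>role, newline, content, <|im_end|>, newline
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string ends with an open assistant turn, matching the example above, so generation should be stopped on `<|im_end|>`.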
Support
To run inference with this model, you will have to use Aphrodite or vLLM, as llama.cpp has not yet merged the pull request required to fix llama3.1 rope_freqs not respecting a custom head_dim. You can, however, work around this by quantizing the model yourself with the following fixes to get a working GGUF. Note that it will be stuck at 8k context until this PR is merged.
- Remove "rope_scaling": {} from config.json
- Change "max_position_embeddings" to 8192 in config.json
- Add "add_bos_token": false to tokenizer_config.json
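The three edits above can also be scripted. This is a rough sketch, assuming a local copy of the original weights with `config.json` and `tokenizer_config.json` in the same directory; the function name `patch_for_gguf` is hypothetical. It only touches the keys listed above and does not perform the GGUF conversion itself.

```python
import json
from pathlib import Path


def patch_for_gguf(model_dir: str) -> None:
    """Apply the config edits listed above to a local model directory.

    Hypothetical helper for illustration; back up the files first.
    """
    model_path = Path(model_dir)

    config_file = model_path / "config.json"
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Remove "rope_scaling" entirely (no error if it is already absent).
    config.pop("rope_scaling", None)
    # Limit the context window to 8k.
    config["max_position_embeddings"] = 8192
    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")

    tokenizer_file = model_path / "tokenizer_config.json"
    tokenizer_config = json.loads(tokenizer_file.read_text(encoding="utf-8"))
    # Stop the tokenizer from prepending a BOS token.
    tokenizer_config["add_bos_token"] = False
    tokenizer_file.write_text(
        json.dumps(tokenizer_config, indent=2) + "\n", encoding="utf-8"
    )
```

Call it as `patch_for_gguf("path/to/model")`, where the path is a placeholder for your local copy of the weights, then quantize as usual.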
Credits
- anthracite-org/Stheno-Data-Filtered
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- lodrick-the-lafted/NopmWritingStruct
- NewEden/Gryphe-3.5-16k-Subset
- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
- Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
This model has been a team effort, and credit goes to all members of Anthracite.
Training
The model was trained for 2 epochs. We used 2 x RTX 6000 GPUs, graciously provided by Kubernetes_Bad, for the full-parameter fine-tuning of the model.
Safety
...