magnum-v2-4b-exl2 / README.md
metadata
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
tags:
  - chat

This repo contains EXL2 quants of the model. If you need the original weights, please find them here.

The base repo contains only the measurement file; see the revisions for your quant of choice.

This is the eighth in a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This model is fine-tuned on top of IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml.

Prompting

The model has been instruct-tuned with ChatML formatting. A typical input would look like this:

"""<|im_start|>system
system prompt<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
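
As a minimal sketch of the format above, the helper below assembles a ChatML prompt from a list of messages. The function name `build_chatml_prompt` and the message-dict shape are illustrative assumptions, not part of this model's tooling; in practice the tokenizer's chat template would normally handle this.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    Hypothetical helper for illustration only; it mirrors the layout
    shown in the example prompt above.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
prompt = build_chatml_prompt(example_messages)
```

Printing `prompt` should reproduce the example conversation, ending with an open assistant turn.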

Support

To run inference with this model, you will have to use Aphrodite or vLLM, because llama.cpp has not yet merged the required pull request that fixes llama3.1 rope_freqs not respecting a custom head_dim. You can, however, work around this by quantizing the model yourself with the following fixes to get a working GGUF. Note that it will be stuck at 8k context until this PR is merged.

  1. Remove "rope_scaling": {} from config.json
  2. Change "max_position_embeddings" to 8192 in config.json
  3. Add "add_bos_token": false to tokenizer_config.json
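
The three edits above can also be scripted. The sketch below assumes a local copy of the original (unquantized) weights whose directory contains `config.json` and `tokenizer_config.json`; the function name `apply_gguf_workaround` and the directory path in the usage note are hypothetical.

```python
import json
from pathlib import Path


def apply_gguf_workaround(model_dir):
    """Apply the three config edits listed above to a local model directory.

    Hypothetical helper; it only rewrites the two JSON files and does not
    perform the GGUF conversion itself.
    """
    model_dir = Path(model_dir)

    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config.pop("rope_scaling", None)          # step 1: drop rope_scaling
    config["max_position_embeddings"] = 8192  # step 2: cap context at 8k
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )

    tokenizer_path = model_dir / "tokenizer_config.json"
    tokenizer_config = json.loads(tokenizer_path.read_text(encoding="utf-8"))
    tokenizer_config["add_bos_token"] = False  # step 3: disable automatic BOS
    tokenizer_path.write_text(
        json.dumps(tokenizer_config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )

    return config, tokenizer_config
```

A call such as `apply_gguf_workaround("path/to/model")` would be run before converting the weights to GGUF. Keeping a backup of both files first is advisable, since the edits are written in place.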

Credits

This model has been a team effort, and the credit goes to all members of Anthracite.

Training

Training ran for 2 epochs. We used 2 x RTX 6000 GPUs, graciously provided by Kubernetes_Bad, for the full-parameter fine-tuning of the model.

Built with Axolotl

Safety

...