--- base_model: [] library_name: transformers tags: - mergekit - merge license: apache-2.0 --- # Credit for the model card's description goes to ddh0, mergekit, and, mistralai # Mistral-12.25B-Instruct-v0.2 This is Mistral-12.25B-Instruct-v0.2, a depth-upscaled version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). This model is intended to be used as a basis for further fine-tuning, or as a drop-in upgrade from the original 7 billion parameter model. Paper detailing how Depth-Up Scaling works: [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166) This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit). # UpStage's conclusionary limitations of their research: "Our study on the Depth Up-Scaling (DUS) has important limitations and considerations. **One key limitation is the need for more thorough explorations of hyperparameters used in the DUS approach. Namely, we removed m = 8 layers from both ends of our base model, primarily due to hardware limitations. However, we have not yet determined if this value is optimal for enhancing performance.** The extended time and cost of continued pretraining made it challenging to conduct more comprehensive experiments, which we aim to address in future work through various comparative analyses." This model was made to help test whether 10.7B parameters (m = 8) is better or worse than 10.7B+ parameters (m < 8) ## Merge Details ### Merge Method This model was merged using the passthrough merge method. ### Models Merged The following models were included in the merge: * /Users/jsarnecki/opt/Workspace/mistralai/Mistral-7B-Instruct-v0.2 ### Configuration The following YAML configuration was used to produce this model: ```yaml dtype: bfloat16 merge_method: passthrough # Depth UpScaled (DUS) version of Mistral-7B-Instruct-v0.2 # where m = 4 (The number of layers to remove from the model) # s = 56 (The number of layers the model will have after the DUS) slices: - sources: - layer_range: [0, 28] model: /Users/jsarnecki/opt/Workspace/mistralai/Mistral-7B-Instruct-v0.2 - sources: - layer_range: [4, 32] model: /Users/jsarnecki/opt/Workspace/mistralai/Mistral-7B-Instruct-v0.2 ``` # Model Card for Mistral-7B-Instruct-v0.2 The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1 - 32k context window (vs 8k context in v0.1) - Rope-theta = 1e6 - No Sliding-Window Attention For full details of this model please read our [paper](https://arxiv.org/abs/2310.06825) and [release blog post](https://mistral.ai/news/la-plateforme/). ## Instruction format In order to leverage instruction fine-tuning, your prompt should be surrounded by `[INST]` and `[/INST]` tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id. E.g. ``` text = "[INST] What is your favourite condiment? [/INST]" "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen! " "[INST] Do you have mayonnaise recipes? [/INST]" ``` This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method: ```python from transformers import AutoModelForCausalLM, AutoTokenizer device = "cuda" # the device to load the model onto model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2") tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2") messages = [ {"role": "user", "content": "What is your favourite condiment?"}, {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}, {"role": "user", "content": "Do you have mayonnaise recipes?"} ] encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt") model_inputs = encodeds.to(device) model.to(device) generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True) decoded = tokenizer.batch_decode(generated_ids) print(decoded[0]) ``` ## Troubleshooting - If you see the following error: ``` Traceback (most recent call last): File "", line 1, in File "/transformers/models/auto/auto_factory.py", line 482, in from_pretrained config, kwargs = AutoConfig.from_pretrained( File "/transformers/models/auto/configuration_auto.py", line 1022, in from_pretrained config_class = CONFIG_MAPPING[config_dict["model_type"]] File "/transformers/models/auto/configuration_auto.py", line 723, in getitem raise KeyError(key) KeyError: 'mistral' ``` Installing transformers from source should solve the issue pip install git+https://github.com/huggingface/transformers This should not be required after transformers-v4.33.4. ## Limitations The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs. ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.