---
license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
language:
- en
library_name: transformers
base_model: []
tags:
- mergekit
- merge
- Yi
---

Just an experiment to try to extend the context of SUS, a 4K Yi model, and DPO Bagel, which breaks down after 4K context.

Yi 4K was used as the base (even for Bagel, which is technically a Yi 200K model), and Yi 200K is merged in with a density of 1.

I wanted to include Hermes 34B, but something funky about its tokenizer breaks mergekit.

This model is a component of another merge.

Auto-generated mergekit description below:

***

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama as a base.

### Models Merged

The following models were included in the merge:
* /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B
* /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
* /home/alpha/Models/Raw/jondurbin_bagel-34b-v0.2
* /home/alpha/Models/Raw/jondurbin_bagel-dpo-34b-v0.2

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    parameters:
      weight: 0.5
      density: 1
  - model: /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B
    parameters:
      weight: 0.2
      density: 0.12
  - model: /home/alpha/Models/Raw/jondurbin_bagel-dpo-34b-v0.2
    parameters:
      weight: 0.2
      density: 0.15
  - model: /home/alpha/Models/Raw/jondurbin_bagel-34b-v0.2
    parameters:
      weight: 0.1
      density: 0.12
merge_method: dare_ties
tokenizer_source: union
base_model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
parameters:
  int8_mask: true
dtype: bfloat16
```
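
To make the `density` values in the configuration more concrete, here is a minimal sketch of the drop-and-rescale idea from the DARE paper, applied to a toy list of weights. It is an illustration only, not mergekit's implementation: the function name `dare_delta` and the toy values are hypothetical, and the TIES sign-consensus step and the per-model `weight` scaling are left out.

```python
import random


def dare_delta(base, tuned, density, rng):
    """Return the randomly dropped and rescaled delta (tuned - base).

    Hypothetical helper for illustration; not part of mergekit.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    result = []
    for b, t in zip(base, tuned):
        delta = t - b
        # Keep each delta entry with probability `density`; rescale the
        # survivors by 1/density so the expected delta is unchanged.
        if rng.random() < density:
            result.append(delta / density)
        else:
            result.append(0.0)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    base = [0.0, 1.0, 2.0]
    tuned = [0.5, 1.5, 1.0]
    # density 1 keeps every delta entry untouched.
    print(dare_delta(base, tuned, 1.0, rng))  # [0.5, 0.5, -1.0]
```

Under this sketch, a density of 1 passes the whole Yi 200K delta through unchanged, while densities of 0.12 and 0.15 keep only roughly 12% and 15% of the SUS and Bagel deltas, each scaled up to compensate.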