---
license: apache-2.0
tags:
- trl
- orpo
- generated_from_trainer
- exl2
base_model: mistral-community/Mixtral-8x22B-v0.1
datasets:
- argilla/distilabel-capybara-dpo-7k-binarized
model-index:
- name: zephyr-orpo-141b-A35b-v0.1
  results: []
---

# machinez/zephyr-orpo-141b-A35b-v0.1-exl2

This model was converted to EXL2 format from [`HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1`](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1). Refer to the [original model card](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1) for more details on the model.

Each branch contains a quantization at a different bits per weight (bpw); the `main` branch contains only the measurement.json needed for further conversions.

- **1.5 bpw**: fits 2x RTX 3090/4090, or 3x Nvidia Tesla P100 16 GB, at 4k context
- **2.75 bpw**: fits 4x Nvidia Tesla P100 16 GB at 16k context

## Sample settings to load in TabbyAPI @ 1.5bpw on 3x Nvidia Tesla P100 16 GB at 4k context

```JSON
{
  "name": "Machinez_zephyr-orpo-141b-A35b-v0.1_1.5bpw",
  "max_seq_len": 4096,
  "override_base_seq_len": 4096,
  "gpu_split_auto": false,
  "autosplit_reserve": [96],
  "gpu_split": [14.15, 14, 15],
  "rope_scale": 1,
  "rope_alpha": 1,
  "no_flash_attention": false,
  "cache_mode": "fp16",
  "prompt_template": "string",
  "num_experts_per_token": 0,
  "use_cfg": true,
  "fasttensors": false,
  "skip_queue": false
}
```

## Sample settings to load in TabbyAPI @ 2.75bpw on 4x Nvidia Tesla P100 16 GB at 16k context

```JSON
{
  "name": "Machinez_zephyr-orpo-141b-A35b-v0.1_2.75bpw",
  "max_seq_len": 16384,
  "override_base_seq_len": 16384,
  "gpu_split_auto": false,
  "autosplit_reserve": [96],
  "gpu_split": [12.5, 13, 13, 16.1],
  "rope_scale": 1,
  "rope_alpha": 1,
  "no_flash_attention": false,
  "cache_mode": "fp16",
  "prompt_template": "string",
  "num_experts_per_token": 0,
  "use_cfg": true,
  "fasttensors": false,
  "skip_queue": false
}
```

## Download instructions

With git:

```shell
git clone --single-branch --branch 2_75 https://huggingface.co/machinez/zephyr-orpo-141b-A35b-v0.1-exl2
```

With the huggingface hub CLI (credit to TheBloke for the instructions, borrowed from bartowski):

```shell
pip3 install -U "huggingface_hub[cli]"
```

Optionally, cache your credentials and log in:

```shell
git config --global credential.helper 'store --file ~/.my-credentials'
huggingface-cli login
```

To download the `main` branch (only useful if you just want the measurement.json) to a folder called `machinez_zephyr-orpo-141b-A35b-v0.1-exl2`:

```shell
mkdir machinez_zephyr-orpo-141b-A35b-v0.1-exl2
huggingface-cli download machinez/zephyr-orpo-141b-A35b-v0.1-exl2 --local-dir machinez_zephyr-orpo-141b-A35b-v0.1-exl2 --local-dir-use-symlinks False
```

To download from a different branch, add the `--revision` parameter:

```shell
mkdir machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw
huggingface-cli download machinez/zephyr-orpo-141b-A35b-v0.1-exl2 --revision 2_75 --local-dir machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw --local-dir-use-symlinks False
```
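
## Loading via the TabbyAPI HTTP API

The JSON blocks above follow the request schema of TabbyAPI's model-load endpoint, so they can be POSTed directly rather than typed into the web UI. A minimal sketch, assuming a server on the default `localhost:5000`, an `x-admin-key` header, and that the 2.75bpw settings were saved to a hypothetical file `load_2.75bpw.json`; adjust all three to your deployment:

```shell
# POST the sample settings above to TabbyAPI's model-load endpoint.
# The port, endpoint path, and x-admin-key header are assumptions;
# verify them against your TabbyAPI instance before running.
curl -X POST http://localhost:5000/v1/model/load \
  -H "Content-Type: application/json" \
  -H "x-admin-key: YOUR_ADMIN_KEY" \
  -d @load_2.75bpw.json
```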
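
Once a model is loaded, TabbyAPI serves an OpenAI-compatible API, so a quick smoke test only needs curl. A sketch assuming the same host/port and a placeholder API key:

```shell
# Minimal completion request against the OpenAI-compatible endpoint.
# The Authorization header and port are assumptions; match your config.
curl http://localhost:5000/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "Machinez_zephyr-orpo-141b-A35b-v0.1_2.75bpw",
    "prompt": "The capital of France is",
    "max_tokens": 16
  }'
```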