This is the OpenNMT-py converted version of Mixtral 8x7b, 4-bit AWQ quantized.

The safetensors file is 24GB, so you need either 2x24GB GPUs (3090 or 4090) or 1x48GB (A6000).

To run the model on 2 GPUs, the config file needs to have:

```yaml
world_size: 2
gpu_ranks: [0, 1]
parallel_mode: "tensor_parallel"
```

If you are lucky enough to have an A6000 (or a V100/A100/H100 with more than 32GB), then use:

```yaml
world_size: 1
gpu_ranks: [0]
#parallel_mode: "tensor_parallel"
```

The command line to run inference is:

`python onmt/bin/translate.py --config /pathto/mixtral-inference-awq.yaml --src /pathto/input-vicuna.txt --output /pathto/mistral-output.txt`

where, for instance, input-vicuna.txt contains:

`USER:⦅newline⦆Show me some attractions in Boston.⦅newline⦆⦅newline⦆ASSISTANT:⦅newline⦆`

The output will be:

```
Here are some attractions in Boston:⦅newline⦆⦅newline⦆1. Boston Common: This is a historic park located in the heart of Boston. It features a variety of attractions, including the Boston Common Fountain, the Boston Common Bandstand, and the Boston Common Carousel.⦅newline⦆⦅newline⦆2. Boston Public Garden: This is a historic park located in the heart of Boston. It features a variety of attractions, including the Boston Public Garden Fountain, the Boston Public Garden Bandstand, and the Boston Public Garden Carousel.⦅newline⦆⦅newline⦆3. Boston Museum of Fine Arts: This is a world-renowned art museum located in the heart of Boston. It features a variety of attractions, including the Boston Museum of Fine Arts Fountain, the Boston Museum of Fine Arts Bandstand, and the Boston Museum of Fine Arts Carousel.⦅newline⦆⦅newline⦆4. Boston Museum of Science: This is a world-renowned science museum located in the heart of Boston. It features a variety of attractions, including the Boston Museum of Science Fountain, the Boston Museum of Science Bandstand, and the Boston Museum of Science Carousel.⦅newline⦆⦅newline⦆5. Boston Museum of History: This is a world-renowned history museum located in the heart of Boston
```
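For reference, a full `mixtral-inference-awq.yaml` might look like the sketch below. Only `world_size`, `gpu_ranks` and `parallel_mode` come from this card; the checkpoint/tokenizer paths and the decoding settings are placeholders chosen for illustration, so check the exact keys and values against the OpenNMT-py inference recipes for your version.

```yaml
# mixtral-inference-awq.yaml -- minimal sketch, not the exact file used for this card.

# Converted checkpoint and SentencePiece tokenizer (paths are placeholders)
model: /pathto/mixtral-onmt-awq/mixtral-8x7b.pt
transforms: [sentencepiece]
src_subword_model: /pathto/mixtral-onmt-awq/tokenizer.model

# GPU layout: 2x24GB with tensor parallelism (see above for the 1x48GB variant)
world_size: 2
gpu_ranks: [0, 1]
parallel_mode: "tensor_parallel"

# Decoding settings (assumed values, tune to taste)
beam_size: 1
max_length: 512
batch_size: 8
batch_type: sents
seed: 42
```

For the single-GPU setup, only the three GPU-layout keys change, as shown earlier.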
Installation instructions: visit https://github.com/OpenNMT/OpenNMT-py and make sure you install flash-attn and autoawq.

Enjoy the detailed MMLU scoring:

```
ACC-abstract_algebra: 0.3600
ACC-anatomy: 0.6444
ACC-astronomy: 0.7303
ACC-business_ethics: 0.6400
ACC-clinical_knowledge: 0.7283
ACC-college_biology: 0.8056
ACC-college_chemistry: 0.5300
ACC-college_computer_science: 0.5900
ACC-college_mathematics: 0.3700
ACC-college_medicine: 0.6936
ACC-college_physics: 0.4510
ACC-computer_security: 0.7900
ACC-conceptual_physics: 0.6468
ACC-econometrics: 0.5614
ACC-electrical_engineering: 0.6414
ACC-elementary_mathematics: 0.4630
ACC-formal_logic: 0.4524
ACC-global_facts: 0.4600
ACC-high_school_biology: 0.8000
ACC-high_school_chemistry: 0.5320
ACC-high_school_computer_science: 0.7400
ACC-high_school_european_history: 0.8121
ACC-high_school_geography: 0.8081
ACC-high_school_government_and_politics: 0.9275
ACC-high_school_macroeconomics: 0.6923
ACC-high_school_mathematics: 0.3667
ACC-high_school_microeconomics: 0.7731
ACC-high_school_physics: 0.4636
ACC-high_school_psychology: 0.8569
ACC-high_school_statistics: 0.5278
ACC-high_school_us_history: 0.8431
ACC-high_school_world_history: 0.8650
ACC-human_aging: 0.7175
ACC-human_sexuality: 0.7710
ACC-international_law: 0.8347
ACC-jurisprudence: 0.7778
ACC-logical_fallacies: 0.7791
ACC-machine_learning: 0.5357
ACC-management: 0.7767
ACC-marketing: 0.9145
ACC-medical_genetics: 0.7100
ACC-miscellaneous: 0.8404
ACC-moral_disputes: 0.7775
ACC-moral_scenarios: 0.4112
ACC-nutrition: 0.7876
ACC-philosophy: 0.7492
ACC-prehistory: 0.7963
ACC-professional_accounting: 0.5177
ACC-professional_law: 0.5111
ACC-professional_medicine: 0.7390
ACC-professional_psychology: 0.7304
ACC-public_relations: 0.6727
ACC-security_studies: 0.7061
ACC-sociology: 0.8706
ACC-us_foreign_policy: 0.9100
ACC-virology: 0.5060
ACC-world_religions: 0.8538
ACC-all: 0.6707
[2023-12-22 16:35:03,999 INFO] total run time 7156.16
```
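For convenience, the installation note above might boil down to something like the following; versions are not pinned here, and the flash-attn build flag is a common convention rather than a requirement stated on this card.

```bash
# Install OpenNMT-py from source, then the extra kernels this model needs.
git clone https://github.com/OpenNMT/OpenNMT-py
cd OpenNMT-py
pip install -e .

# Fused attention and AWQ kernels (check the repo README for supported versions)
pip install flash-attn --no-build-isolation
pip install autoawq
```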