This is the OpenNMT-py converted version of Mixtral 8x7b, 4-bit AWQ quantized.

The safetensors file is 24GB, so you need either 2x24GB GPUs (3090 or 4090) or 1x48GB (A6000).

To run the model on 2 GPUs, the config file needs to have:

```yaml
world_size: 2
gpu_ranks: [0, 1]
parallel_mode: "tensor_parallel"
```

If you are lucky enough to have an A6000 (or a V100/A100/H100 with more than 32GB), then use:

```yaml
world_size: 1
gpu_ranks: [0]
#parallel_mode: "tensor_parallel"
```

The command line to run inference is:

`python onmt/bin/translate.py --config /pathto/mixtral-inference-awq.yaml --src /pathto/input-vicuna.txt --output /pathto/mistral-output.txt`

where, for instance, input-vicuna.txt contains:

`USER:⦅newline⦆Show me some attractions in Boston.⦅newline⦆⦅newline⦆ASSISTANT:⦅newline⦆`

The output will be:

```
Here are some attractions in Boston:⦅newline⦆⦅newline⦆1. Boston Common: This is a historic park located in the heart of Boston. It features a variety of attractions, including the Boston Common Fountain, the Boston Common Bandstand, and the Boston Common Carousel.⦅newline⦆⦅newline⦆2. Boston Public Garden: This is a historic park located in the heart of Boston. It features a variety of attractions, including the Boston Public Garden Fountain, the Boston Public Garden Bandstand, and the Boston Public Garden Carousel.⦅newline⦆⦅newline⦆3. Boston Museum of Fine Arts: This is a world-renowned art museum located in the heart of Boston. It features a variety of attractions, including the Boston Museum of Fine Arts Fountain, the Boston Museum of Fine Arts Bandstand, and the Boston Museum of Fine Arts Carousel.⦅newline⦆⦅newline⦆4. Boston Museum of Science: This is a world-renowned science museum located in the heart of Boston. It features a variety of attractions, including the Boston Museum of Science Fountain, the Boston Museum of Science Bandstand, and the Boston Museum of Science Carousel.⦅newline⦆⦅newline⦆5. Boston Museum of History: This is a world-renowned history museum located in the heart of Boston
```
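For reference, a full `mixtral-inference-awq.yaml` might look like the sketch below. Only `world_size`, `gpu_ranks` and `parallel_mode` come from this card; the checkpoint/tokenizer paths and the decoding settings are placeholders chosen for illustration, so check the exact keys and values against the OpenNMT-py inference recipes for your version.

```yaml
# mixtral-inference-awq.yaml -- minimal sketch, not the exact file used for this card.

# Converted checkpoint and SentencePiece tokenizer (paths are placeholders)
model: /pathto/mixtral-onmt-awq/mixtral-8x7b.pt
transforms: [sentencepiece]
src_subword_model: /pathto/mixtral-onmt-awq/tokenizer.model

# GPU layout: 2x24GB with tensor parallelism (see above for the 1x48GB variant)
world_size: 2
gpu_ranks: [0, 1]
parallel_mode: "tensor_parallel"

# Decoding settings (assumed values, tune to taste)
beam_size: 1
max_length: 512
batch_size: 8
batch_type: sents
seed: 42
```

For the single-GPU setup, only the three GPU-layout keys change, as shown earlier.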
Installation instructions: visit https://github.com/OpenNMT/OpenNMT-py and make sure you install flash-attn and autoawq.

Enjoy the detailed MMLU scoring:

```
ACC-abstract_algebra: 0.3600
ACC-anatomy: 0.6444
ACC-astronomy: 0.7303
ACC-business_ethics: 0.6400
ACC-clinical_knowledge: 0.7283
ACC-college_biology: 0.8056
ACC-college_chemistry: 0.5300
ACC-college_computer_science: 0.5900
ACC-college_mathematics: 0.3700
ACC-college_medicine: 0.6936
ACC-college_physics: 0.4510
ACC-computer_security: 0.7900
ACC-conceptual_physics: 0.6468
ACC-econometrics: 0.5614
ACC-electrical_engineering: 0.6414
ACC-elementary_mathematics: 0.4630
ACC-formal_logic: 0.4524
ACC-global_facts: 0.4600
ACC-high_school_biology: 0.8000
ACC-high_school_chemistry: 0.5320
ACC-high_school_computer_science: 0.7400
ACC-high_school_european_history: 0.8121
ACC-high_school_geography: 0.8081
ACC-high_school_government_and_politics: 0.9275
ACC-high_school_macroeconomics: 0.6923
ACC-high_school_mathematics: 0.3667
ACC-high_school_microeconomics: 0.7731
ACC-high_school_physics: 0.4636
ACC-high_school_psychology: 0.8569
ACC-high_school_statistics: 0.5278
ACC-high_school_us_history: 0.8431
ACC-high_school_world_history: 0.8650
ACC-human_aging: 0.7175
ACC-human_sexuality: 0.7710
ACC-international_law: 0.8347
ACC-jurisprudence: 0.7778
ACC-logical_fallacies: 0.7791
ACC-machine_learning: 0.5357
ACC-management: 0.7767
ACC-marketing: 0.9145
ACC-medical_genetics: 0.7100
ACC-miscellaneous: 0.8404
ACC-moral_disputes: 0.7775
ACC-moral_scenarios: 0.4112
ACC-nutrition: 0.7876
ACC-philosophy: 0.7492
ACC-prehistory: 0.7963
ACC-professional_accounting: 0.5177
ACC-professional_law: 0.5111
ACC-professional_medicine: 0.7390
ACC-professional_psychology: 0.7304
ACC-public_relations: 0.6727
ACC-security_studies: 0.7061
ACC-sociology: 0.8706
ACC-us_foreign_policy: 0.9100
ACC-virology: 0.5060
ACC-world_religions: 0.8538
ACC-all: 0.6707
[2023-12-22 16:35:03,999 INFO] total run time 7156.16
```
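For convenience, the installation note above might boil down to something like the following; versions are not pinned here, and the flash-attn build flag is a common convention rather than a requirement stated on this card.

```bash
# Install OpenNMT-py from source, then the extra kernels this model needs.
git clone https://github.com/OpenNMT/OpenNMT-py
cd OpenNMT-py
pip install -e .

# Fused attention and AWQ kernels (check the repo README for supported versions)
pip install flash-attn --no-build-isolation
pip install autoawq
```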