Environment Setup
Download this directory to a local machine and set up uv.
Install uv (if you haven't already):
curl -LsSf https://astral.sh/uv/install.sh | sh
Sync the environment:
uv sync
(This automatically creates a virtual environment at .venv and strictly installs the dependencies locked in uv.lock.)
Activate the environment:
source .venv/bin/activate
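Before launching the evaluation, it can help to confirm that the interpreter you are using actually belongs to a virtual environment. The snippet below is a minimal sketch using only the standard library; the helper name `active_venv` is hypothetical and not part of this repository.

```python
import sys
from pathlib import Path
from typing import Optional


def active_venv() -> Optional[Path]:
    """Return the root of the active virtual environment, or None if none is active."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the interpreter it was created from.
    if sys.prefix == sys.base_prefix:
        return None
    return Path(sys.prefix)


venv = active_venv()
if venv is None:
    print("No virtual environment is active; run: source .venv/bin/activate")
else:
    print(f"Active environment: {venv}")
```

If the printed path does not end in .venv, the shell is probably using a different environment than the one created by uv sync.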
Evaluation Script
Run:
accelerate launch eval.py \
--model cloverlm \
--model_args "pretrained=daslab-testing/CloverLM,dtype=bfloat16,quartet_2_impl=quartet2,attn_backend=pytorch" \
--tasks "arc_easy_mi,arc_challenge_mi,hellaswag,piqa" \
--num_fewshot 0 \
--include_path ./ \
--trust_remote_code \
--confirm_run_unsafe_code \
--batch_size auto
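The --model_args value is a comma-separated list of key=value pairs. If you switch backends often, it may be easier to assemble that string from a dictionary. The following is a sketch under that assumption; `build_model_args` is a hypothetical helper, not something eval.py provides.

```python
from typing import Dict


def build_model_args(options: Dict[str, str]) -> str:
    """Join key/value pairs into the comma-separated form passed to --model_args."""
    for key, value in options.items():
        # Commas separate pairs and "=" separates key from value, so neither
        # may appear where it would make the string ambiguous.
        if "," in key or "=" in key or "," in value:
            raise ValueError(f"cannot encode option {key!r}={value!r}")
    return ",".join(f"{key}={value}" for key, value in options.items())


options = {
    "pretrained": "daslab-testing/CloverLM",
    "dtype": "bfloat16",
    "quartet_2_impl": "quartet2",
    "attn_backend": "pytorch",
}
print(build_model_args(options))
```

With the options shown, the printed string matches the --model_args value used in the command above.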
Expected Evaluation Results
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|----------------|------:|------|-----:|---------------|---|-----:|---|-----:|
|arc_challenge_mi| 1|none | 0|acc |↑ |0.4625|± |0.0146|
| | |none | 0|acc_mutual_info|↑ |0.5094|± |0.0146|
| | |none | 0|acc_norm |↑ |0.4923|± |0.0146|
|arc_easy_mi | 1|none | 0|acc |↑ |0.7997|± |0.0082|
| | |none | 0|acc_mutual_info|↑ |0.7239|± |0.0092|
| | |none | 0|acc_norm |↑ |0.7731|± |0.0086|
|hellaswag | 1|none | 0|acc |↑ |0.5392|± |0.0050|
| | |none | 0|acc_norm |↑ |0.7167|± |0.0045|
|piqa | 1|none | 0|acc |↑ |0.7922|± |0.0095|
| | |none | 0|acc_norm |↑ |0.8058|± |0.0092|
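To compare your own run against the table, one option is to flag any metric that differs from the expected value by more than a few standard errors. The sketch below copies the values from the table; the helper name `deviations` and the default threshold of two standard errors are assumptions chosen for illustration, not a criterion defined by this repository.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (task, metric)

# (task, metric) -> (expected value, stderr), copied from the table above.
EXPECTED: Dict[Key, Tuple[float, float]] = {
    ("arc_challenge_mi", "acc"): (0.4625, 0.0146),
    ("arc_challenge_mi", "acc_mutual_info"): (0.5094, 0.0146),
    ("arc_challenge_mi", "acc_norm"): (0.4923, 0.0146),
    ("arc_easy_mi", "acc"): (0.7997, 0.0082),
    ("arc_easy_mi", "acc_mutual_info"): (0.7239, 0.0092),
    ("arc_easy_mi", "acc_norm"): (0.7731, 0.0086),
    ("hellaswag", "acc"): (0.5392, 0.0050),
    ("hellaswag", "acc_norm"): (0.7167, 0.0045),
    ("piqa", "acc"): (0.7922, 0.0095),
    ("piqa", "acc_norm"): (0.8058, 0.0092),
}


def deviations(measured: Dict[Key, float], num_stderr: float = 2.0) -> Dict[Key, float]:
    """Return measured-minus-expected for metrics outside num_stderr * stderr."""
    flagged = {}
    for key, (value, stderr) in EXPECTED.items():
        if key not in measured:
            continue  # metrics that were not measured are skipped, not flagged
        diff = measured[key] - value
        if abs(diff) > num_stderr * stderr:
            flagged[key] = diff
    return flagged
```

For example, `deviations({("piqa", "acc"): 0.70})` flags that metric, while a run that reproduces the table exactly yields an empty dictionary.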
Alternative Backends
Replace quartet_2_impl=quartet2 with quartet_2_impl=pseudoquant on non-Blackwell GPUs.
You can also set attn_backend to pytorch, flash2, flash3, or flash4, provided the corresponding library is installed.
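The choice of quartet_2_impl can be made from the GPU's CUDA compute capability. The sketch below assumes that Blackwell GPUs report a major capability of 10 or higher (for example 10.x or 12.x) and that earlier architectures report lower values; that threshold, and the helper names `pick_quartet_impl` and `detect_quartet_impl`, are assumptions for illustration and are not taken from this repository.

```python
from typing import Tuple


def pick_quartet_impl(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to a quartet_2_impl value."""
    major, _minor = capability
    # Assumption: major >= 10 means Blackwell or newer; anything older
    # falls back to the pseudoquant implementation.
    return "quartet2" if major >= 10 else "pseudoquant"


def detect_quartet_impl() -> str:
    """Query the current GPU through PyTorch and pick an implementation."""
    import torch  # imported lazily so pick_quartet_impl works without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device is visible")
    return pick_quartet_impl(torch.cuda.get_device_capability())
```

Calling `detect_quartet_impl()` inside the activated environment returns the value to place after quartet_2_impl= in --model_args. If the result does not match your hardware, set the option by hand as described above.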