Issue with the evaluation script

#4
by hh2017

Hello,

Firstly, I want to extend my gratitude for your incredible work in the field of open-source NLP. Your contributions have been invaluable to the community.

I am reaching out to ask whether you could share the zero-shot script you used to evaluate the Jais-30B-chat model on MMLU-like Arabic datasets. I have been attempting to use the model for multiple-choice questions, but my attempts result in refusals such as, "I'm sorry, I cannot provide the correct answer to multiple-choice questions." For reference, I am using the prompt_ar template provided on your Hugging Face model card.
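For context, here is roughly how I am calling the model. This is only a minimal sketch of my setup: the checkpoint id, dtype, and question text are placeholders of mine, and prompt_ar is abbreviated rather than quoted in full from your model card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder repo id; adjust to the actual Jais-30B-chat checkpoint.
model_path = "core42/jais-30b-chat-v1"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Jais ships custom modeling code
)

# Abbreviated stand-in for the prompt_ar template from the model card;
# {Question} is replaced by the MCQ stem plus its answer options.
prompt_ar = "### Instruction: ...\n### Input: [|Human|] {Question}\n### Response: [|AI|]"

question = "نص السؤال متبوعًا بالخيارات (أ)، (ب)، (ج)، (د)"  # placeholder MCQ text

inputs = tokenizer(prompt_ar.format(Question=question), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```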

Additionally, I have a question about the benchmark evaluations themselves: when you run multiple-choice benchmarks, is the generation config also set with do_sample=True?
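In case it helps frame the question: my understanding is that harnesses such as lm-evaluation-harness typically score multiple-choice benchmarks by ranking the log-likelihood of each option under the model rather than sampling free-form text, which would make do_sample irrelevant. Is that also how your script works? Here is a rough sketch of what I mean, reusing model, tokenizer, prompt_ar, and question from the snippet above:

```python
import torch

def score_option(model, tokenizer, prompt: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to `option` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1, so shift the slice by one.
    option_logits = logits[0, prompt_ids.shape[1] - 1 : -1]
    log_probs = torch.log_softmax(option_logits, dim=-1)
    return log_probs.gather(1, option_ids[0].unsqueeze(1)).sum().item()

# Pick the option whose continuation the model scores highest -- fully
# deterministic, so do_sample never comes into play.
options = ["(أ)", "(ب)", "(ج)", "(د)"]
scores = [score_option(model, tokenizer, prompt_ar.format(Question=question), o) for o in options]
prediction = options[scores.index(max(scores))]
```

If your evaluation instead generates text and parses out the answer letter, knowing the exact generation settings would help me reproduce your numbers.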

Any guidance or information you can provide would be greatly appreciated.

Best regards,
