Why is the "measured" benchmark score of Llama-3-8B so low?

#6
by c6sneaky - opened

Why are the "measured" benchmark scores of Llama-3-8B so much lower than the "reported" ones? Would you mind sharing the sampling settings and inference framework used for your measurements, so that the results can be reproduced?

OpenChat org

Because Llama-3 8B Instruct doesn't follow few-shot templates correctly. The official results were likely obtained with special prompts. To reproduce our numbers, run the following command from the openchat repo:

python -m ochat.evaluation.run_eval --model meta-llama/Meta-Llama-3-8B-Instruct --model-type llama3_instruct --eval_sets fs_cothub/mmlu fs_cothub/gsm8k fs_cothub/math zs/gpqa

The HumanEval results are taken from the EvalPlus leaderboard.

For example,

Instruction

The following are multiple choice questions (with answers) about high school biology.

Q: In animal cells, which of the following represents the most likely pathway that a secretory protein takes as it is synthesized in a cell?
(A) Plasma membrane–Golgi apparatus–ribosome–secretory vesicle–rough ER (B) Ribosome–Golgi apparatus–rough ER–secretory vesicle–plasma membrane (C) Plasma membrane–Golgi apparatus–ribosome–secretory vesicle–rough ER (D) Ribosome–rough ER–Golgi apparatus–secretory vesicle–plasma membrane
A: Let's think step by step. Protein synthesis starts at the ribosome, so we can eliminate (A) and (C). The ribosome is often in the endoplasmic reticulum and moves from there to the Golgi apparatus, where it is modified and packaged into a vesicle. The vesicle then floats to the plasma membrane and is secreted. The answer is (D).

Q: A mutation in a bacterial enzyme changed a previously polar amino acid into a nonpolar amino acid. This amino acid was located at a site distant from the enzyme’s active site. How might this mutation alter the enzyme’s substrate specificity?
(A) By changing the enzyme’s pH optimum (B) By changing the enzyme’s location in the cell (C) By changing the shape of the protein (D) An amino acid change away from the active site cannot alter the enzyme’s substrate specificity.
A: Let's think step by step. A change in an amino acid leads to a change in the primary structure of the protein. A change in the primary structure may lead to a change in the secondary and the tertiary structure of the protein. A change in the tertiary structure means a change in the shape of the protein, so (C) has to be correct. Since the change does not affect the active site of the enzyme, we do not expect the activity of the enzyme to be affected. The answer is (C).

Q: Which of the following is not a way to form recombinant DNA?
(A) Translation (B) Conjugation (C) Specialized transduction (D) Transformation
A: Let's think step by step. The introduction of foreign DNA or RNA into bacteria or eukaryotic cells is a common technique in molecular biology and scientific research. There are multiple ways foreign DNA can be introduced into cells including transformation, transduction, conjugation, and transfection. In contrast, (A) is not a way to form DNA: during translation the ribosomes synthesize proteins from RNA. The answer is (A).

Q: Homologous structures are often cited as evidence for the process of natural selection. All of the following are examples of homologous structures EXCEPT
(A) the wings of a bird and the wings of a bat (B) the flippers of a whale and the arms of a man (C) the pectoral fins of a porpoise and the flippers of a seal (D) the forelegs of an insect and the forelimbs of a dog
A: Let's think step by step. ​​Homologous structures are similar physical features in organisms that share a common ancestor ​​but different functions. Comparisons (B) and (C) are clearly homologous because they share a common ancestor and the structures serve different purposes. Bat wings and birg wings are also homologous, while they are both wings, the forelimbs serve different purposes. Insects and dogs are very far ancestors since one is vertebrate while the other is invertebrate and the forelimbs serve the same purpose, so they are not homologous. The answer is (D).

Q: Which of the following is not known to be involved in the control of cell division?
(A) Cyclins (B) Protein kinases (C) Checkpoints (D) Fibroblast cells
A: Let's think step by step. Normal cells move through the cell cycle in a regulated way. At the checkpoint stage, they use information about their own internal state and cues from the environment around them to decide whether to proceed with cell division. Cues like these act by changing the activity of core cell cycle regulators inside the cell. The most common regulators are cyclins and cyclin-dependent kinases. Fibroblast cells do not play any role in cell division. The answer is (D).

Mosquitoes resistant to the pesticide DDT first appeared in India in 1959 within 15 years of widespread spraying of the insecticide. Which of the following statement best explains how the resistant mosquitoes arose?
(A) Some mosquitoes experienced a mutation after being exposed to DDT that made them resistant to the insecticide. Then their population expanded because these moquitoes had no competition. (B) Some mosquitoes were already resistant to DDT when DDT was first sprayed. Then their population expanded because all the susceptible mosquitoes had been exterminated. (C) DDT is generally a very effective insecticide. One can only conclude that it was manufactured improperly. (D) Although DDT is effective against a wide range of insects, it is not effective against mosquitoes. 
A: Let's think step by step.

Response

It looks like you're providing multiple-choice questions with answers about high school biology. I'd be happy to help you go through them!

Please go ahead and provide the next question, and I'll help you analyze it step by step.
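
To illustrate why a reply like this one scores as wrong, here is a minimal sketch of answer extraction. It assumes the grader looks for the closing phrase used in the few-shot examples, "The answer is (X)". The `extract_choice` helper and its regex are hypothetical and are not taken from the ochat evaluation code, which may parse answers differently.

```python
import re

# Hypothetical extractor, NOT the ochat implementation: it assumes the grader
# searches for the phrase that ends every few-shot example above.
ANSWER_PATTERN = re.compile(r"The answer is \(([A-D])\)")


def extract_choice(response):
    """Return the last choice letter stated as 'The answer is (X)', else None."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None


# A reply that continues the few-shot pattern.
followed = "Some mosquitoes were already resistant before spraying. The answer is (B)."

# The actual reply shown above, which treats the prompt as a conversation.
derailed = (
    "It looks like you're providing multiple-choice questions with answers "
    "about high school biology. I'd be happy to help you go through them!"
)

print(extract_choice(followed))  # B
print(extract_choice(derailed))  # None
```

Under that assumption, the second reply yields no extractable choice, so it would count as incorrect regardless of what the model knows about the question.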
