tomg-group-umd/huginn-0125 · Issues faced in reproducing the paper's experiments

2 days ago

Very interesting work! I am currently trying to reproduce the experimental results from your paper. However, I have encountered two issues:

The generated text tends to have severe repetition.
The model's accuracy on MATH problems (GSM8K dataset) is significantly lower than the reported results in the paper.

I would like to ask whether this discrepancy might be due to the checkpoint used or specific hyperparameter settings (e.g., temperature). Would it be possible to share the exact hyperparameter configurations used in the paper? Thanks!

JonasGeiping

Tom Goldstein's Lab at University of Maryland, College Park org 2 days ago

Hi, are your issues with MATH or with GSM8k? Some more details on GSM8k can be found here: https://huggingface.co/tomg-group-umd/huginn-0125/discussions/7#67b59e08b24bf87803b701b6

Regarding repetition, this has not been a big problem for me, are you using the model as a text completion model, or with the chat template?

Chensmile

2 days ago

Thank you for your response and reminder! I realized that I was using text completion instead of chat templating, which resulted in a lot of repetition. I will try using the lm-eval harness for evaluation to see if I can reproduce the results successfully. Thanks again!

JonasGeiping

Tom Goldstein's Lab at University of Maryland, College Park org 1 day ago

Sure! let me know how it goes, or if there are followup questions.