Reproducing Spider Eval scores

#9
by baasitsh - opened

Hi,

I've been trying to reproduce the results shown in the blog, but I haven't been able to get close to the reported scores using the official test suite for SQL evaluation (https://github.com/taoyds/test-suite-sql-eval). To my surprise, when I looked at the results on the Spider dev set, around 190 of the model's responses were incomplete, and the execution score came out as 0.509 instead of 0.75.

So I wanted to know if I'm doing anything wrong. Currently, I'm simply calling model.generate with no extra parameters except max_length=500 (greedy search). Are there any generation parameters I should look at, or are there any open-sourced evaluation scripts I can use?

Thanks in advance.

NumbersStation org

Please use torch.float16 or torch.bfloat16 and set max_new_tokens to more than 300. The incomplete responses might be caused by 1) a different dtype or 2) a max_new_tokens value that is too small.
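For anyone following along, here is a minimal sketch of those two settings using the Hugging Face transformers API. It assumes the checkpoint is a decoder-only model that loads with AutoModelForCausalLM; `model_id` and the helper names `new_token_budget` and `generate_sql` are hypothetical and not taken from the official repo. The first helper illustrates why max_length=500 can cut responses short: for decoder-only models max_length counts the prompt tokens as well, while max_new_tokens counts only the generated ones.

```python
def new_token_budget(prompt_len, max_length=None, max_new_tokens=None):
    """Return how many tokens generate() may still produce.

    For decoder-only models, max_length includes the prompt, so a long
    schema prompt leaves little room. max_new_tokens ignores the prompt
    and takes precedence when both are given.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is None:
        raise ValueError("pass max_length or max_new_tokens")
    return max(max_length - prompt_len, 0)


def generate_sql(model_id, prompt, max_new_tokens=500):
    """Greedy generation in bfloat16 (hypothetical helper, not the official script)."""
    # Imported lazily so the budget helper above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        do_sample=False,                # greedy search
        max_new_tokens=max_new_tokens,  # budget excludes the prompt
    )
    # Drop the prompt tokens so only the generated SQL is decoded.
    new_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)
```

With a 450-token prompt, max_length=500 leaves room for only 50 generated tokens, whereas max_new_tokens=500 leaves the full 500.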

For what it's worth, I had the same experience of trying to reproduce the reported Spider benchmark scores and finding that the score I got was much lower than advertised.

There are some examples in their GitHub repo, but I still got low scores even after carefully following them exactly: https://github.com/NumbersStationAI/NSQL/tree/main/examples.

@djm2131 I remember facing the same issue even after setting the dtype to torch.bfloat16 and max_new_tokens to 400. I got somewhere around 0.647.
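One way to check whether a remaining gap comes from truncation rather than from wrong SQL is to count the generations that used the whole token budget without emitting an end-of-sequence token. The sketch below is only a diagnostic under stated assumptions: the function names are hypothetical, and each entry in `generations` is assumed to hold the newly generated token ids only, with the prompt already stripped.

```python
def hit_token_limit(new_token_ids, max_new_tokens, eos_token_id):
    """True if the generation filled the budget and never emitted EOS."""
    used_full_budget = len(new_token_ids) >= max_new_tokens
    return used_full_budget and eos_token_id not in new_token_ids


def count_truncated(generations, max_new_tokens, eos_token_id):
    """Count the generations that were probably cut off by the token limit."""
    return sum(
        hit_token_limit(ids, max_new_tokens, eos_token_id) for ids in generations
    )
```

If this count is close to zero and the score is still low, the token limit is probably not the cause, and the prompt format or the schema are the next things to compare against the official examples.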

NumbersStation org

Thanks for your interest in our work!

Here are three things:

  1. The model works best with torch.bfloat16.
  2. Please allow enough max_new_tokens, such as 500.
  3. Please use the code from the official GitHub repository and the database schema from the dataset instead of the database schema.

Please let us know if you still have issues.
