How to query on float values in model?

#3
by ratandhruv - opened

Thanks for sharing your model. I am trying to load some data from a CSV file. The rows have string values in some columns and float values in others. The idea is to query on these float values and get results.

from transformers import TapexTokenizer, BartForConditionalGeneration
import pandas as pd

tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-large-sql-execution")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-large-sql-execution")

csv_file_path = "smallData.csv"
data = pd.read_csv(csv_file_path)
# Convert all values to strings to avoid the error
# "sequence item 4: expected str instance, float found"
data = data.astype(str)

table = pd.DataFrame.from_dict(data)

# tapex accepts uncased input since it is pre-trained on the uncased corpus
query = "give me asins with protein_value below 111"
encoding = tokenizer(table=table, query=query, return_tensors="pt")

outputs = model.generate(**encoding, max_length=1024)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

I convert the float values to strings to resolve an error, but then I lose the ability to query on numeric values. For example, "select products where float_value_YY < 75.0" does not work: the model does not understand 75.0 because the values are stored as strings.

In the examples, I see I am able to query over float values as well. What can I do to fix this?

Microsoft org

@ratandhruv Thanks for your interest in our work! The model only accepts strings as input, so converting the float values to strings before feeding them to the model is fine. Because the model is pre-trained on SQL execution, it should still be sensitive to numeric values.
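A minimal sketch of that conversion, assuming a small hypothetical table with the asin and protein_value columns mentioned in the question. The checkpoint name suggests SQL-style input, so the query is written here as lowercase SQL; that phrasing, the sample rows, and the helper name run_query are assumptions, not something confirmed in this thread.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from smallData.csv.
data = pd.DataFrame(
    {
        "asin": ["a1", "b2", "c3"],
        "protein_value": [75.0, 110.4, 150.25],
    }
)

# The tokenizer expects every cell to be a string.
table = data.astype(str)

# Lowercase SQL-style query (assumed phrasing for this checkpoint).
query = "select asin where protein_value < 111"


def run_query(table: pd.DataFrame, query: str) -> list:
    """Run one query through TAPEX; downloads the checkpoint on first use."""
    # Imported lazily so the table preparation above runs without transformers.
    from transformers import TapexTokenizer, BartForConditionalGeneration

    name = "microsoft/tapex-large-sql-execution"
    tokenizer = TapexTokenizer.from_pretrained(name)
    model = BartForConditionalGeneration.from_pretrained(name)
    encoding = tokenizer(table=table, query=query, return_tensors="pt")
    outputs = model.generate(**encoding, max_length=1024)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling run_query(table, query) returns the decoded answer strings. Whether the comparison is answered correctly still depends on the model.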

Somehow the model does not return correct results for comparison queries. I only get the correct answer for exact-match queries.
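One way to pin down which comparisons fail is to compute the expected answer directly with pandas on the numeric column and compare it with the decoded model output. A minimal sketch, assuming hypothetical sample rows and the asin and protein_value column names from the query above; expected_asins is an illustrative helper, not part of any library.

```python
import pandas as pd

# Hypothetical sample rows standing in for smallData.csv.
data = pd.DataFrame(
    {
        "asin": ["a1", "b2", "c3"],
        "protein_value": [75.0, 110.4, 150.25],
    }
)


def expected_asins(df: pd.DataFrame, threshold: float) -> list:
    """Ground truth for 'protein_value below threshold', using the float column."""
    return df.loc[df["protein_value"] < threshold, "asin"].tolist()


print(expected_asins(data, 111))
```

Running the same threshold through the model and comparing the two lists shows whether the mismatch comes from the comparison itself or from how the answer is decoded.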

Microsoft org

@ratandhruv Can you try more examples? If your table contains a lot of floating-point numbers, I would recommend converting them to integers so the model performs better. That said, understanding numbers is still quite challenging for current models.
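A minimal sketch of the integer conversion suggested above, assuming that rounding to the nearest whole number loses no information the queries need. The sample rows and the helper name floats_to_int_strings are hypothetical.

```python
import pandas as pd


def floats_to_int_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Round float columns to integers, then cast every column to str."""
    out = df.copy()
    # Assumes the float columns hold no missing values; NaN cannot be cast to int.
    for col in out.select_dtypes(include="float").columns:
        out[col] = out[col].round().astype("int64")
    # The tokenizer expects string cells.
    return out.astype(str)


# Hypothetical sample rows standing in for smallData.csv.
data = pd.DataFrame(
    {
        "asin": ["a1", "b2", "c3"],
        "protein_value": [75.0, 110.4, 150.25],
    }
)

table = floats_to_int_strings(data)
print(table["protein_value"].tolist())
```

If fractional precision matters, another option is to multiply by a fixed factor before rounding and scale the threshold in the query by the same factor.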

SivilTaram changed discussion status to closed
