TAPAS table question answering for large tables

#2 by Kamil27

TAPAS is a great model, and its fine-tuned checkpoints for table question answering are impressive. However, we face a problem when using it: the token limit. In practice, we can only work with small tables, which is quite frustrating.

It is, however, possible to run multiple instances of the model in parallel nodes to bypass this limit and work with much larger tables. I am currently working on this and have made significant progress.

Here is a diagram that gives a simple overview of this system:
[diagram image: overview of the multi-node pipeline]

Hi, I am also using this model, but it does not work with large datasets. Can you explain how we can use this model on large data? Your reply would be very useful to me.

Hello!

To use this pipeline, you run the model from several Python scripts, each of which loads one part of your dataset. Running the model on these smaller parts keeps every chunk under the token limit, which improves both the efficiency and the precision of the model.
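For reference, here is a minimal sketch of this row-chunking idea in a single script, using the Hugging Face `table-question-answering` pipeline. The file name, chunk size, and query are placeholders, and how you merge the per-chunk answers depends on the question you ask.

```python
import pandas as pd
from transformers import pipeline

# Placeholder file name: load your own large table here.
# TAPAS expects every cell to be a string.
df = pd.read_csv("large_table.csv").astype(str)

tqa = pipeline("table-question-answering",
               model="google/tapas-base-finetuned-wtq")

CHUNK_SIZE = 50  # keep each chunk under the model's row limit
query = "Which product has the highest price?"  # placeholder query

answers = []
for start in range(0, len(df), CHUNK_SIZE):
    chunk = df.iloc[start:start + CHUNK_SIZE].reset_index(drop=True)
    answers.append(tqa(table=chunk, query=query))

# Each entry holds the answer for one chunk; merging them into a single
# answer depends on the question (e.g. compare per-chunk maxima here).
for a in answers:
    print(a["answer"])
```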

There is still one problem that this pipeline can't resolve: datasets with a large number of columns. In that case, the table cannot be split into smaller sets, since every row chunk still contains all of the columns.

I hope this explanation gives you a better understanding of the pipeline.

If you have any other questions, don't hesitate to reply to this comment.

Best regards,
Kamil.

Hi, thanks for your reply. I am using this TAPAS model, but it has some limitations, such as max_num_rows = 64, while I need it to work on a large dataset with 10k rows. For now I have implemented batch processing on the model (see the sketch below), but it takes some time to return answers. Is there any other way to use it efficiently on large data?

Thanks,
Manju
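For what it's worth, batch processing over row chunks can be written as a single pipeline call: transformers pipelines accept a list of inputs and a `batch_size` argument. A minimal sketch, assuming the same chunking as above (file name and query are again placeholders):

```python
import pandas as pd
from transformers import pipeline

df = pd.read_csv("large_table.csv").astype(str)  # placeholder file name
tqa = pipeline("table-question-answering",
               model="google/tapas-base-finetuned-wtq")

CHUNK_SIZE = 50
query = "How many rows have status 'shipped'?"  # placeholder query

# One input per chunk; the pipeline batches them internally.
inputs = [
    {"table": df.iloc[i:i + CHUNK_SIZE].reset_index(drop=True), "query": query}
    for i in range(0, len(df), CHUNK_SIZE)
]
results = tqa(inputs, batch_size=8)  # tune batch_size to your GPU memory

for r in results:
    print(r["answer"])
```

On a GPU, batching like this usually helps; on CPU the gain is often small, which may be why the answers are slow.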
