Training model with Custom Data

#1
by anantsaxena - opened

I want to train this model on custom data for the table structures I have. The accuracy of the default model is not up to par on them. Is there a repo that could help with this?

Hi @nielsr ,

I am having trouble downloading the data from Microsoft Open Data. Do you have any suggestions on annotating/tagging custom table structure data, or on downloading the PubTables-1M dataset from Microsoft Open Data?

Reference: https://msropendata.com/datasets/505fcbe3-1383-42b1-913a-f651b8b712d3

Issue: Not able to log in to Microsoft open dataset.
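
For anyone annotating their own tables instead, here is a minimal sketch of turning corner-format boxes into COCO-style records. It assumes your annotation tool exports (xmin, ymin, xmax, ymax) pixel boxes; the class list, the helper names, and the file name are hypothetical, and the exact keys your training pipeline expects may differ.

```python
# Hypothetical label set: the four classes mentioned in this thread.
CLASSES = ["table", "column", "row", "header"]
LABEL2ID = {name: i for i, name in enumerate(CLASSES)}


def corners_to_coco(xmin, ymin, xmax, ymax):
    """Convert a corner-format box to COCO [x, y, width, height]."""
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("box must have positive width and height")
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def make_record(image_id, file_name, width, height, objects):
    """Build one COCO-style record from (class_name, xmin, ymin, xmax, ymax) tuples."""
    annotations = []
    for name, xmin, ymin, xmax, ymax in objects:
        box = corners_to_coco(xmin, ymin, xmax, ymax)
        annotations.append({
            "image_id": image_id,
            "category_id": LABEL2ID[name],
            "bbox": box,
            "area": box[2] * box[3],
            "iscrowd": 0,
        })
    return {"image_id": image_id, "file_name": file_name,
            "width": width, "height": height, "annotations": annotations}


# Example image with one table box and one row box (made-up coordinates).
record = make_record(
    image_id=0, file_name="table_0.png", width=800, height=400,
    objects=[("table", 10, 20, 790, 380), ("row", 10, 20, 790, 80)],
)
```

Rejecting zero-size boxes early is a deliberate choice here: degenerate boxes are easy to produce when labelling thin rows and are hard to spot once training has started.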

Hi @nielsr ,

I have fine-tuned the Microsoft table detection model on custom data using your approach, and the results are great. But when I tried to fine-tune the Microsoft table structure recognition model with four classes (table, column, row, and header), the results were very bad. Any suggestions on how to fine-tune the structure recognition model?

@nielsr just to add to the above: I have a high-quality dataset that I used to try to fine-tune the model, but fine-tuning only weakened it, making it worse than with the pretrained weights. I'm having a hard time figuring out whether the pretrained weights should be loaded through DetrForObjectDetection or TableTransformerForObjectDetection, what other parameters should be used, what the purpose of no_timm is in the original DETR object detection fine-tuning example, and so on. I'm also not sure how to account during training for the fact that Table Transformer normalizes before the MLP instead of after.

This worked (no errors during training), but gave bad results:
self.model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-structure-recognition")
This gave errors during training and bad results:
self.model = DetrForObjectDetection.from_pretrained("microsoft/table-transformer-structure-recognition")
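
For reference, a minimal sketch of loading the checkpoint through TableTransformerForObjectDetection with a custom label set. The four-class list and the helper names are hypothetical, and it is an assumption that the checkpoint's classification head has a different number of classes, which is why ignore_mismatched_sizes=True is passed. This only shows the loading step and is not a confirmed fix for the poor results.

```python
# Hypothetical four-class label set taken from the thread.
CLASSES = ["table", "column", "row", "header"]


def build_label_maps(classes):
    """Return (id2label, label2id) dictionaries for a list of class names."""
    id2label = dict(enumerate(classes))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(checkpoint="microsoft/table-transformer-structure-recognition"):
    """Load the checkpoint with a classification head sized for CLASSES."""
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import TableTransformerForObjectDetection

    id2label, label2id = build_label_maps(CLASSES)
    return TableTransformerForObjectDetection.from_pretrained(
        checkpoint,
        id2label=id2label,
        label2id=label2id,
        # Assumption: the checkpoint's head has a different number of classes,
        # so the mismatched head weights are re-initialised instead of loaded.
        ignore_mismatched_sizes=True,
    )


id2label, label2id = build_label_maps(CLASSES)
```

Calling load_for_finetuning() downloads the weights, so it needs network access and the transformers library with its backbone dependencies installed.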

Hey @ankitom,
Have you tried training it on your custom dataset using AutoModelForObjectDetection?
https://huggingface.co/docs/transformers/tasks/object_detection#training-the-detr-model

I haven't tried it yet, but I am going to do so in the coming weeks, and I'll post an update once done.

Hey everyone. Does anyone have an annotated image dataset for the structure recognition model?

Hey @qooob, can you please share your dataset of tables?

Hi @amitkumarp. Did you get any custom dataset of tables for fine-tuning?

@pathikg Have you tried it yet?
