Docmatix - a huge dataset for Document Visual Question Answering
It was trained for roughly one month on 32 nodes of 8 H100 GPUs each.
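For a sense of scale, 32 nodes of 8 H100s is 256 GPUs. The short calculation below turns that into GPU-hours. It assumes "roughly one month" means 30 days of continuous training. That is an assumption for illustration, not a figure stated in the thread.

```python
# Back-of-the-envelope compute estimate for the training run mentioned above.
# Assumption: "roughly one month" is taken as 30 days of continuous training.
NODES = 32
GPUS_PER_NODE = 8
DAYS = 30  # assumed length of "roughly one month"

total_gpus = NODES * GPUS_PER_NODE  # total H100s across all nodes
gpu_hours = total_gpus * DAYS * 24  # GPU-hours over the whole run

print(f"{total_gpus} GPUs, ~{gpu_hours:,} GPU-hours")  # 256 GPUs, ~184,320 GPU-hours
```

Under that assumption the run is on the order of 180k H100 GPU-hours. A shorter or longer "month" scales the number linearly.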
Thanks for the feedback. If flash attention is a problem, you can always enable or disable it when loading the model.
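Here is a minimal sketch of toggling flash attention at load time. It assumes the model is loaded through Hugging Face transformers' `from_pretrained`, whose `attn_implementation` argument selects the attention backend (`"flash_attention_2"`, `"sdpa"`, or `"eager"`). The helper names `pick_attn_implementation` and `load_model` are hypothetical. The use of `AutoModelForVision2Seq` is also an assumption, and none of this is the authors' own loading code.

```python
import importlib.util


def pick_attn_implementation(use_flash_attention: bool) -> str:
    """Return a value for transformers' `attn_implementation` argument.

    Uses flash attention only when requested AND the `flash_attn` package
    is importable; otherwise falls back to the plain "eager" implementation.
    """
    if use_flash_attention and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


def load_model(model_id: str, use_flash_attention: bool = True):
    # Imported lazily so the helper above works without torch/transformers installed.
    import torch
    from transformers import AutoModelForVision2Seq  # assumed model class

    return AutoModelForVision2Seq.from_pretrained(
        model_id,
        # Flash attention 2 expects half-precision weights, so load in bfloat16.
        torch_dtype=torch.bfloat16,
        attn_implementation=pick_attn_implementation(use_flash_attention),
    )
```

For example, `load_model("your-org/your-model", use_flash_attention=False)` forces the eager backend, where `"your-org/your-model"` is a placeholder ID. Checking for `flash_attn` before requesting it avoids a load-time error on machines without the package. A supported GPU is still needed for flash attention to actually run.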
We will publish all the details on how the foundation model was trained when it is released!