Would it be difficult to get a xlm-roberta-large version of this model up?

#4
by marhlder - opened

Pretty please :D

I think it would require pretraining the visual network again, with layers that run parallel to roberta-large, for it to work. You might also have to add layers, since the large variant has more attention heads, and from how I understand the architecture, the number of attention heads in both networks must match for Bi-ACM to work.
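For illustration, here's a minimal sketch (assuming the `transformers` library and the public `xlm-roberta-base` / `xlm-roberta-large` checkpoints) that compares the two text-encoder configs. The mismatch in depth, attention heads, and hidden size is why the parallel flow coupled through Bi-ACM would need to be resized and pretrained again rather than just swapped in:

```python
from transformers import AutoConfig

# Load the configs of the base and large XLM-RoBERTa text encoders.
base = AutoConfig.from_pretrained("xlm-roberta-base")
large = AutoConfig.from_pretrained("xlm-roberta-large")

# Compare the dimensions that the parallel flow would have to match.
for name in ("num_hidden_layers", "num_attention_heads", "hidden_size"):
    print(f"{name}: base={getattr(base, name)}  large={getattr(large, name)}")

# Expected output (roughly):
# num_hidden_layers: base=12  large=24
# num_attention_heads: base=12  large=16
# hidden_size: base=768  large=1024
```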
