Would it be difficult to get an xlm-roberta-large version of this model up?
#4 opened by marhlder
Pretty please :D
I think it would require pretraining the visual network again, with layers that run parallel to xlm-roberta-large, for it to work. You might also have to add layers, since the large variant has more attention heads, and from how I understand the architecture, the number of attention heads in both networks must match for Bi-ACM to work.
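For illustration, here's a quick way to see the mismatch between the two text encoders. This is only a minimal sketch using `transformers` `AutoConfig`; the assumption that Bi-ACM needs matching layer/head counts between the branches is my reading of the architecture, not something confirmed by the model authors.

```python
# Sketch: compare xlm-roberta-base vs xlm-roberta-large configs to see why the
# existing visual/layout branch (pretrained alongside the base-sized encoder)
# would not line up with the large variant without new pretraining.
from transformers import AutoConfig

for name in ["xlm-roberta-base", "xlm-roberta-large"]:
    cfg = AutoConfig.from_pretrained(name)
    print(f"{name}: layers={cfg.num_hidden_layers}, "
          f"heads={cfg.num_attention_heads}, hidden={cfg.hidden_size}")

# From the published configs:
#   xlm-roberta-base:  layers=12, heads=16 -> actually 12 heads, hidden=768
#   xlm-roberta-large: layers=24, heads=16, hidden=1024
# If Bi-ACM combines attention scores from both branches layer-by-layer and
# head-by-head (my assumption), the visual branch would need the same layer
# and head counts as the large encoder, i.e. a fresh pretraining run.
```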