Would it be difficult to get an xlm-roberta-large version of this model up?
#4 opened by marhlder
Pretty please :D
I think it would require pretraining the visual network again, with layers that run parallel to xlm-roberta-large, for it to work. You might also have to add layers, since the large variant has more attention heads, and from how I understand the architecture, the number of attention heads in both networks must match for Bi-ACM to work.
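For illustration, here's a quick way to see the mismatch between the two text encoders. This is only a minimal sketch using `transformers` `AutoConfig`; the assumption that Bi-ACM needs matching layer/head counts between the branches is my reading of the architecture, not something confirmed by the model authors.

```python
# Sketch: compare xlm-roberta-base vs xlm-roberta-large configs to see why the
# existing visual/layout branch (pretrained alongside the base-sized encoder)
# would not line up with the large variant without new pretraining.
from transformers import AutoConfig

for name in ["xlm-roberta-base", "xlm-roberta-large"]:
    cfg = AutoConfig.from_pretrained(name)
    print(f"{name}: layers={cfg.num_hidden_layers}, "
          f"heads={cfg.num_attention_heads}, hidden={cfg.hidden_size}")

# From the published configs:
#   xlm-roberta-base:  layers=12, heads=16 -> actually 12 heads, hidden=768
#   xlm-roberta-large: layers=24, heads=16, hidden=1024
# If Bi-ACM combines attention scores from both branches layer-by-layer and
# head-by-head (my assumption), the visual branch would need the same layer
# and head counts as the large encoder, i.e. a fresh pretraining run.
```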