thesephist/contra-bottleneck-t5-small-wikipedia · Training Script and Model details

Hello! I'm really intrigued by the potential of this model for a research project I'm working on at the University of Zurich. Would you be willing to share some details about your training process and the model itself? If that's not something you're comfortable with, I completely understand and respect your decision. From your YouTube presentation, I gathered that you made some modifications to the attention mechanism, and I'm curious about how that might affect my ability to fine-tune or pretrain the model for a different domain. I am not an expert in model architectures, so I would like to understand what the training looks like. Thank you for your time!