Training Script and Model Details

#3
by 4luc - opened

Hello! I'm really intrigued by the potential of this model for a research project I'm working on at the University of Zurich. Would you be willing to share some details about the model and your training process? If that's not something you're comfortable with, I completely understand and respect your decision. From your YouTube presentation, I gathered that you made some modifications to the attention mechanism, and I'm curious how they might affect my ability to fine-tune or pretrain the model for a different domain. I'm not an expert in model architectures, so I was wondering what the training looks like. Thank you for your time!
