Can you quantify how much of the performance gains are due to the UL2 training objective?
Also, would you be willing to share the checkpoint after UL2 training but before task fine-tuning?
Thank you for your question. The performance gains come primarily from the increase in available training data, which lets the model acquire more knowledge. The UL2 training objective also contributes to overall performance, though its effect is relatively small (~1% improvement). In short, adding more data is usually the most effective way to improve on a specific task; once those gains plateau, UL2 is a reasonable next step for squeezing out further performance.
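For readers unfamiliar with what the UL2 objective involves, here is a minimal sketch of a UL2-style mixture-of-denoisers: each training example is corrupted by one denoiser sampled from a mixture of regular span corruption (R), extreme denoising (X), and sequential prefix-LM denoising (S). The denoiser names follow the UL2 paper, but the corruption rates, span lengths, and mixing weights below are illustrative assumptions, not the configuration used in our experiments.

```python
import random

# Illustrative mixture of denoisers; rates/spans are assumptions, not our config.
DENOISERS = [
    ("R", dict(rate=0.15, mean_span=3)),  # regular span corruption, T5-style
    ("X", dict(rate=0.50, mean_span=8)),  # extreme denoising: heavy corruption
    ("S", dict()),                        # sequential (prefix-LM) denoising
]

def span_corrupt(tokens, rate, mean_span, rng):
    """Replace random spans (~`rate` of tokens) with sentinel markers.

    Returns (corrupted_input, target): the target lists each sentinel
    followed by the tokens it replaced, as in T5 span corruption.
    """
    n = len(tokens)
    to_mask = max(1, round(n * rate))  # token budget to corrupt
    inp, tgt = [], []
    i, sentinel = 0, 0
    while i < n:
        if to_mask > 0 and rng.random() < rate:
            span = min(mean_span, to_mask, n - i)
            tgt.append(f"<extra_{sentinel}>")
            tgt.extend(tokens[i:i + span])   # masked tokens go to the target
            inp.append(f"<extra_{sentinel}>")
            sentinel += 1
            to_mask -= span
            i += span
        else:
            inp.append(tokens[i])            # unmasked token stays in the input
            i += 1
    return inp, tgt

def ul2_example(tokens, rng):
    """Sample one denoiser from the mixture and build a training pair."""
    name, cfg = rng.choice(DENOISERS)
    if name == "S":  # prefix-LM: predict a random suffix from its prefix
        cut = rng.randint(1, len(tokens) - 1)
        return name, tokens[:cut], tokens[cut:]
    inp, tgt = span_corrupt(tokens, cfg["rate"], cfg["mean_span"], rng)
    return name, inp, tgt

rng = random.Random(0)
mode, inp, tgt = ul2_example([f"t{i}" for i in range(20)], rng)
```

Every original token ends up either in the corrupted input or in the target, so the pair is a valid denoising example regardless of which denoiser was sampled.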