Could you please add evaluation for the above dataset across the different sentence agree variants?

Thanks for reporting @nichmuchi! I've opened an issue on the datasets side that you can track here:

Just note that this dataset only has a train split, so the evaluations should be taken with a grain of salt :)