A small version of DeBERTa trained on the clean version of Google's C4 dataset. For details on the model size, see config.json.

The model was trained for 100K steps with a batch size of 2048 and a sequence length of 512, for a total of roughly 105B tokens.
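
As a quick sanity check, the token count follows directly from the three numbers above. The short sketch below assumes every sequence is fully packed to 512 tokens, so the result is an upper bound; the variable names are illustrative only.

```python
# Training budget as stated in the model card.
steps = 100_000      # 100K optimizer steps
batch_size = 2048    # sequences per step
seq_len = 512        # tokens per sequence (assumed fully packed)

# Upper bound on the number of tokens seen during training.
total_tokens = steps * batch_size * seq_len

print(f"{total_tokens:,} tokens (~{total_tokens / 1e9:.1f}B)")
# prints: 104,857,600,000 tokens (~104.9B)
```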

The vocabulary and the tokenizer are the same as microsoft/deberta-base.
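
Because the tokenizer is shared with microsoft/deberta-base, one plausible way to load the model is sketched below. It assumes the `transformers` library is installed, that both repositories are reachable on the Hugging Face Hub, and that the checkpoint loads through the generic `AutoModel` class; the helper name `load_model_and_tokenizer` is hypothetical.

```python
MODEL_ID = "lucadiliello/deberta-small"
TOKENIZER_ID = "microsoft/deberta-base"  # same vocabulary and tokenizer, per the card


def load_model_and_tokenizer():
    """Download and return (tokenizer, model). Hypothetical helper."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model_and_tokenizer()
    # Truncate to the sequence length used during training.
    inputs = tokenizer("DeBERTa small test sentence.", return_tensors="pt",
                       truncation=True, max_length=512)
    outputs = model(**inputs)
    # Hidden states have shape (batch, tokens, hidden_size).
    print(outputs.last_hidden_state.shape)
```

If the repository also ships its own tokenizer files, passing `MODEL_ID` to `AutoTokenizer.from_pretrained` should be equivalent.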

Dataset used to train lucadiliello/deberta-small