Large as in what number?

by Delcos - opened

Does this use the same naming scheme as DialoGPT? If not, what does the "large" stand for?

Microsoft org

Yeah, actually they're both called "large", but they differ a bit in their hyperparameters.

BioGPT-large has 48 layers, and uses a hidden size of 1600. You can check the config for all details.

DialoGPT-large on the other hand has 36 layers, and uses a hidden size of 1280. See also the config for all details.
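For a rough sense of what those numbers mean in size, here is a small sketch that estimates the parameter count of the transformer blocks alone from the layer count and hidden size quoted above. It relies on the common `12 * layers * hidden_size^2` rule of thumb, which assumes a feed-forward inner dimension of 4x the hidden size and ignores embeddings, biases, and layer norms, so treat the output as a ballpark and check the configs for the real values. The helper name `approx_block_params` is made up for this illustration.

```python
def approx_block_params(num_layers: int, hidden_size: int) -> int:
    """Rough parameter count for the transformer blocks only.

    Assumption: each block has ~4*d^2 attention weights (Q, K, V, output
    projection) and ~8*d^2 feed-forward weights (inner dimension 4*d).
    Embeddings, biases, and layer norms are ignored.
    """
    return 12 * num_layers * hidden_size ** 2


# (layers, hidden size) as quoted in this thread
models = {
    "BioGPT-large": (48, 1600),
    "DialoGPT-large": (36, 1280),
}

for name, (layers, hidden) in models.items():
    estimate = approx_block_params(layers, hidden)
    print(f"{name}: ~{estimate / 1e9:.2f}B parameters in the transformer blocks")
```

By this estimate, BioGPT-large comes out at roughly twice the size of DialoGPT-large, even though both carry the "large" label.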

Awesome, thanks :)

Delcos changed discussion status to closed
