From Megatron GPT-2 or GPT-3?

#75
by jmassot - opened

Hi all,
I have a question about the architecture. The HF documentation says that Bloom is similar to GPT-3, but the model card states that the architecture is derived from Megatron-GPT2.
GPT-3 has a parameter count similar to Bloom's and also fixes some bugs present in GPT-2.
What is the best way to describe Bloom's architecture? GPT-2-like or GPT-3-like?
I'm a bit lost choosing the best ancestor for Bloom :-)
Thanks
Best regards
Jerome

The authors state under "Technical Specifications" that BLOOM's code is a modified version of Megatron-LM GPT2.

Thanks R1MN. The technical specifications say it was modified from Megatron-LM GPT2, but the HF documentation then describes Bloom as a GPT-3-like model...
