More Model Information requested

#1
by robsi94 - opened

Hi
Nice work :)
Do you have more information about your model, such as a filled-out model card?

What kind of hardware did you use?
Any evaluation?

What does “twc” mean?

The settings are basically the same as for https://huggingface.co/malteos/gpt2-xl-wechsel-german, except for the adaptation approach, which is TWC rather than WECHSEL. More details on this will be in our upcoming paper.

I see.
Keep me posted :)

robsi94 changed discussion status to closed

More details and a 6B model are now available! See our preprint: https://arxiv.org/abs/2301.09626

malteos changed discussion status to open

Nice work 💪🏻

Do you have any stats on how much compute you needed? What specific hardware did you use, and how long did training take?

Will any smaller models be available?
