Dataset used to train SantaCoder

#43
by nihaljn - opened

Which of the two datasets, The Stack (v1.1) or The Stack Dedup (v1.1), was used to train SantaCoder?

The SantaCoder repo links to the former, but can this be confirmed?
