How to add other languages?

#1
by NERDDISCO - opened

This project is super nice, thanks for creating it. It's working well with English, but I'm wondering what would be needed to also support other languages.

I would love to also have this for German and I'm open to help out with whatever is needed here.

Hi there! Sorry for the delayed response, I hadn't seen this yet.
I haven't tried to include other languages yet, mostly because there aren't many datasets available for this type of task. If you want to support German (or any other language), the best approach would be to create a dataset similar to the one used to train this model, and then fine-tune a model on it.

It might even be possible to simply translate the dataset used for this model (in whole or in part), so that the model can learn what is and isn't acceptable.
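
As a rough sketch of that translation idea (not something I've tested on the real data): the helper below translates only the text of each example and keeps its label unchanged. The `text`/`label` field names, the `translate_rows` helper, and the `translate` callable are all hypothetical; you would plug in whatever machine translation system you trust and adapt the field names to the actual dataset.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


def translate_rows(
    rows: List[Row],
    translate: Callable[[str], str],
    limit: Optional[int] = None,
) -> List[Row]:
    """Return translated copies of the rows, keeping each label as-is.

    `translate` is any function mapping source text to target text
    (hypothetical here). `limit` translates only the first N rows,
    for the "in part" case.
    """
    subset = rows if limit is None else rows[:limit]
    # Build new dicts so the original dataset is left untouched.
    return [
        {"text": translate(row["text"]), "label": row["label"]}
        for row in subset
    ]
```

Machine-translated moderation data will likely need a manual spot check, since slurs and tone often don't carry over cleanly between languages.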

For a base model, I would recommend using "mdeberta-v3", as it is already trained on multiple languages, so it will have a head start on this task. I believe the regular deberta-v3 (which this moderation model is based on) was trained primarily on English.
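
A minimal sketch of loading that base model for fine-tuning, assuming the `transformers` and `sentencepiece` packages are installed and that the checkpoint meant here is `microsoft/mdeberta-v3-base` on the Hub. The label names you pass in are up to you (they are not taken from this model), and `build_label_maps` and `load_base_model` are just illustrative helper names.

```python
from typing import Dict, List, Tuple

# Assumed Hub ID for the multilingual DeBERTa-v3 base checkpoint.
BASE_MODEL = "microsoft/mdeberta-v3-base"


def build_label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id<->label mappings stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_base_model(labels: List[str]):
    """Load tokenizer and model with a fresh classification head."""
    # Imported here so the helper above works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

The classification head is randomly initialized at this point, so the model only becomes useful after fine-tuning on the labeled German (or other language) data.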

I hope this information helps, and if you'd be interested in making a German dataset, feel free to let me know as I'd love to expand this project! :)
