About the dataset

#1
by Abnmd - opened

Hello brother, it would be really helpful if you could provide the details of the dataset you used for finetuning. I have been struggling to find a good Malayalam dataset.

Hey!
There are some really awesome Malayalam datasets available.
Here's the list of a few:

I hope this helps!

(viswanathakarthav@gmail.com)

Thank you very much. Looking forward to using the dataset you have provided.

But don't you think that in Malayalam there is a lack of real-world data, like different slang?

To be fair, yes. Open data that covers different slang and the like is quite hard to get, though not necessarily impossible for huge companies. I guess if we could collaborate with Manglish keyboard or something else that has awesome data, it'd be great. But overall, the lack of real-world data is a huge problem, especially considering the inherent diversity of the Malayalam language. Still, I'd say it would be an awesome win if we could, at the very least, make an excellent model that works on the print language itself.

True
