Is SEA-LION trained on Singaporean culture?
#13 opened by SBSTFRNNDZ
Just wondering: since it is a Southeast Asian LLM, is Singapore represented in the training data?
Hi,
Thank you for your interest in SEA-LION.
SEA-LION 7B is pre-trained on text extracted from CommonCrawl, and websites from Singapore domains are certainly included.
However, because relatively little data is available from SG domains, and because most of that data is in English, our other national languages, such as Singaporean Chinese, Malay and Tamil, are relatively underrepresented.
Understood, thank you Raymond!