Dataset of lora?

by chau9ho - opened Jan 19, 2024

Jan 19, 2024

Hi Joseph, I'm currently gathering a dataset for fine-tuning Cantonese LLMs with QLoRA. I was wondering if you could shed some light on what kind of dataset you've been using for the "lora" model. Thanks!!

indiejoseph

Owner Jan 19, 2024

Hi, this model used a translated OASST dataset, but the results were not good. We found two factors that might have affected them: 1) Llama2 isn't fluent in Cantonese (https://hon9kon9ize.com/posts/2023-12-18-llm-finetuning1), and 2) the SFT dataset translated from zh contains a lot of mistakes. We are working on an open-source Cantonese SFT dataset and will publish it soon. Contact me at indiejoseph@gmail.com if you are interested in learning more.
