Training method

#5 by darraghd - opened

Hi @pszemraj
Your model works pretty well compared to other publicly available tools - better than Grammarly, I think! Nice job.
Would you mind sharing some details on how you trained it? I saw it is an expanded version of JFLEG - did you add many extra samples? Also, are the model params seen here the ones you used? I would like to give google/flan-t5-xl or -xxl a try.
Best,
Darragh.

Hi Darragh,

Thank you for your kind words about the model! I'm glad to hear that it has been performing well for you.

I apologize for the delay in responding - this project has been on the back burner for a while now. Now that the FLAN models have shown some success, I'd like to write up a paper or blog post detailing what I did before releasing the dataset/repo (I want to make it completely open source... after I write fancy words). The "full" v5 version of the dataset is approximately 180k unique rows. If you have the compute and are more interested in the inference side of things, I'd be more than happy to train those for you 😉

I trained the model as a standard text-to-text model and did expand the JFLEG dataset by adding extra samples. The model parameters in that repository are similar to the ones I use for inference. BTW, see this discussion thread for details, but essentially the dataset consists of several sentences at a time, so I'd recommend running inference in the same fashion: batches of roughly 64-96 tokens (or 2-3 sentences split with a regex).
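
To make that concrete, here is a rough sketch of the chunked inference I mean. Treat it as an assumption-laden example: the checkpoint ID, the regex split, and the generation settings (`max_length`, `num_beams`) are illustrative stand-ins, not my exact setup.

```python
import re

from transformers import pipeline

# Illustrative checkpoint ID - substitute the actual model this thread is about.
MODEL_ID = "pszemraj/flan-t5-large-grammar-synthesis"

corrector = pipeline("text2text-generation", model=MODEL_ID)


def split_sentences(text: str) -> list[str]:
    """Naive sentence split on ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def correct(text: str, sentences_per_chunk: int = 3) -> str:
    """Correct text in chunks of 2-3 sentences (~64-96 tokens each)."""
    sents = split_sentences(text)
    chunks = [
        " ".join(sents[i : i + sentences_per_chunk])
        for i in range(0, len(sents), sentences_per_chunk)
    ]
    # The pipeline accepts a list of inputs and returns one dict per chunk.
    results = corrector(chunks, max_length=192, num_beams=4)
    return " ".join(r["generated_text"] for r in results)


print(correct("their is a problem hear. i has went to the store. we was happy to."))
```

Splitting first keeps each input close to the 2-3 sentence windows the dataset was built from, which is why short chunks tend to work better than feeding a whole paragraph in at once.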

Happy to discuss this further if you have any other questions!

Thanks @pszemraj - I would be very interested in the write-up. And thank you for the tip on inference - I found it worked well on longer passages, but I only went up to ~100 tokens. I checked ChatGPT and it works pretty well too, but there is no API, and the GPT-3 API has a cost attached. I will give training a try and let you know how it goes.

darraghd changed discussion status to closed
