Model returning repeating words

#1
by abdulmatinomotoso - opened

Hello chinhon, thanks for this amazing model. I noticed something I would like to bring to your attention when I tested the model on the article below.

''' "The NAACP is calling on the Boston School Committee not to choose a new superintendent Wednesday, calling the search process “fundamentally flawed” from its short timeline to the lack of Black and Latino finalists.“The lack of representation in the finalist pool should have immediately caused the process to pause, review, and reopen (if necessary),” Tanisha Sullivan, the organization’s president, wrote in a letter that was sent to School Committee Chair Jeri Robinson Friday.A search committee, jointly appointed by Mayor Michelle Wu and the School Committee, had hoped to present a more diverse slate of finalists, but two would be-finalists, a Latina and a Black woman, withdrew prior to the announcement. The remaining two are Somerville Superintendent Mary Skipper, who is white, and Tommy Welch, who is Japanese American and white.They are competing to replace outgoing Superintendent Brenda Cassellius who is leaving Thursday.Meanwhile, competition for Skipper intensified Monday night. The School Committee authorized its chair to sign a two-year extension to Skipper’s contract, which expires Thursday. The agreement, however, still allows for Skipper to go to Boston with at least 90 days notice.Having no Black or Latino finalists is a departure from past superintendent searches in a district where about three-quarters of students identify that way. A number of education advocates, including City Councilor Julia Mejia, have expressed disappointment over the lack of Black and Latino finalists.Sullivan in her letter urged the School Committee to expand the slate of finalists so there’s a more robust mix of representation and experience, noting that having only “two finalists for a nationally respected district like Boston should raise an automatic caution flag in the process. '''

It gives the result below:

" NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP NAACP calls for a 'fundamentally flawed' search for a 'fundamentally flawed' search for a 'fundamentally flawed' search for a 'fundamentally flawed' process "

Could this be a result of the dataset on which the model was fine-tuned? Also, could you release that dataset?
Thanks, and I await your kind response.

Hi, thanks for pointing this out. Yes, I also suspect this is due to the fine-tuning dataset, which consisted of news articles from Singapore media outlets, so the model has not seen the subject matter and elements in that story before. Still, it's an unusual glitch. I have seen it before but haven't investigated it at length. Unfortunately, I can't release the dataset, as it contains copyrighted material that I can use for prototyping but can't disseminate publicly.
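For anyone who wants to catch this kind of looping output automatically, here is a minimal sketch of a repetition check. It is not part of the model or of the author's pipeline: the function names `repeated_ngram_ratio` and `looks_degenerate` and the 0.5 threshold are hypothetical choices for illustration. It only measures how many word n-grams in a summary duplicate an earlier one; it does not fix the underlying cause.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 2) -> float:
    """Return the fraction of word n-grams that duplicate an earlier n-gram."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Text shorter than n words has nothing to repeat.
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


def looks_degenerate(text: str, n: int = 2, threshold: float = 0.5) -> bool:
    """Flag a summary as degenerate when most of its n-grams are repeats.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return repeated_ngram_ratio(text, n) > threshold


if __name__ == "__main__":
    looping = "NAACP " * 20 + "calls for a search"
    print(looks_degenerate(looping))
    print(looks_degenerate("NAACP urges Boston to pause superintendent search"))
```

A flagged summary could then be regenerated with different decoding settings. The Hugging Face `generate` method accepts options such as `no_repeat_ngram_size` and `repetition_penalty`, which are commonly used against this kind of loop, though nothing in this thread confirms they resolve it for this model.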

I understand, thanks for your reply

Just curious, chinhon: what was the size of the dataset you used for fine-tuning? More than 1,000 or more than 10,000?
Thanks

This fine-tuning dataset had 48,000 rows. In my limited experience, the Pegasus models need at least 10,000 rows for decent-quality fine-tuning results.

I get that. Thanks for your swift response.

abdulmatinomotoso changed discussion status to closed
