Thanks for sharing

#1 by JuLuComputing

This is an interesting model, thanks for sharing!

I am interested in the 'bitsandbytes' script you used to make this model 8-bit. Would you share that script and any other scripts pertaining to this model?

Thanks!

You're welcome! It's actually super easy with a recent update; see this section of the example repo.
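In short, it boils down to loading the checkpoint in 8-bit and pushing the quantized weights back to the Hub. Here's a minimal sketch assuming the full-precision booksum checkpoint as the source; the repo names are illustrative, not my exact script:

```python
# Minimal sketch of the 8-bit conversion, assuming the recent
# transformers + bitsandbytes integration. The source repo name
# below is an assumption, not necessarily the exact checkpoint used.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

src = "pszemraj/long-t5-tglobal-xl-16384-book-summary"  # assumed fp source

# load_in_8bit quantizes the weights to int8 via bitsandbytes at load time
model = AutoModelForSeq2SeqLM.from_pretrained(
    src,
    load_in_8bit=True,
    device_map="auto",  # place layers on available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained(src)

# the recent update is what makes this easy: quantized weights can now be
# serialized and pushed to the Hub directly
model.push_to_hub("long-t5-tglobal-xl-16384-book-summary-8bit")
tokenizer.push_to_hub("long-t5-tglobal-xl-16384-book-summary-8bit")
```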

BTW, if you find it useful, I have been running a bunch of different summarization models over the same set of source documents to compare the results; you can see that at the links below:

You sure do have a treasure trove of useful scripts and models, here and on GitHub. Thank you again for sharing all of this work!

SummComparer is a great project that fills a much-needed niche. I have been seeking a versatile summarization LLM that can handle programming languages, business docs, instructional manuals, and books. It seems most LLMs fall flat on their face in one area or another: most can't ingest the token count of a full book, and if one is good with books, it is usually terrible at code.

Do you have a particular model from your line of experimental summarization LLMs that you would recommend I try?

Thanks so much! Yeah, the generalizable summarization stuff is interesting; any feedback on that is more than welcome. Someday I plan to make a V2 of this gauntlet and will include some of the missing document 'genres', but first I want to put together some sort of results/overview relating how architecture, fine-tuning dataset, and so on connect to "general performance" on the different 'document genres'.

As far as the best model to try: this 'generalizable summarization model' is still very much a work in progress, but I would recommend either this one or my latest upload, long-t5-xl trained on the elife split of scientific lay summaries. I can't tell you which generalizes better yet, but from some initial tests on the gauntlet documents, pszemraj/long-t5-tglobal-xl-sci-simplify-elife does better on non-scientific/technical docs than I would have thought.

  • In general, I'd recommend sticking to booksum-based models when in doubt (like this one, pszemraj/long-t5-tglobal-xl-16384-book-summary-8bit; see the usage sketch after this list). My theory is that by having to summarize a story or narrative, the model a) pays attention to the beginning/middle/end and b) makes few or no assumptions about what the reader knows, and therefore explains things well. These two qualities (the theory here is very rough) are what enable the generalization to unseen document types.
  • Caveat: one issue is that because of the 'general public domain' requirement for inclusion in kmfoda/booksum, the books/source material are rather old, so models trained on this dataset may misspell terms that either did not exist or were not common at the time. There are of course several ways to potentially solve that.
  • An initial experiment on improving this with the base model is here (there are many permutations to be done, which will be slow), and you are welcome to try it. It seems to do better than the base, but has a strange issue where inference is much slower than the starting checkpoint, despite use_cache=True being set as it should be; I'm unsure why.
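For reference, here is roughly what running the 8-bit booksum checkpoint looks like. This is a sketch: the input file is hypothetical and the generation parameters are illustrative placeholders, not my tuned settings.

```python
# Rough usage sketch for the 8-bit booksum checkpoint; generation
# parameters below are illustrative assumptions, not tuned values.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "pszemraj/long-t5-tglobal-xl-16384-book-summary-8bit"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",  # quantization config ships with the serialized weights
)

long_document = open("report.txt").read()  # hypothetical input file
inputs = tokenizer(
    long_document,
    return_tensors="pt",
    truncation=True,
    max_length=16384,  # the long-t5 input window this model was trained with
).to(model.device)

with torch.inference_mode():
    summary_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        num_beams=4,
        no_repeat_ngram_size=3,
        use_cache=True,
    )

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```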
