book-summary dataset & training

#1
by ccdv - opened

Hey @pszemraj ,

I want to train my own long summarization model on the same book-summary dataset.
Where does this dataset come from (is there a paper)?
How long did training take? Did you use gradient checkpointing? 32 GB GPUs?

Thank you

Hi! Thanks for reaching out. Salesforce Research initially released the dataset:

Training the tglobal attention variant to create this checkpoint took about seven days with gradient checkpointing on a V100 GPU with 52 GB of CPU RAM + DeepSpeed. The local attention variant seems to train faster; I'm not sure exactly why, beyond the layman's idea that a "worse sparse attention mechanism has to pay attention to less stuff" during training.

Btw, I don't plan to keep training the local variant much because I want to focus on getting this tglobal variant optimized (I am also training the large one), but if you'd rather start from a mid-training checkpoint of longT5-local-base on booksum, I am happy to post that publicly.

Thank you for your answer.
I plan to train my own long summarization model (from my repo) to compare its performance to LongT5. I will use a smaller model, since it's too expensive to run.
Have you computed any ROUGE metrics yet?
Does n_positions=4096 in the config refer to the max input length?

Hey, sorry, I somehow didn't see your question! Checking on this.

@ccdv can do more detailed checking, but I believe that might refer to an attention block; as far as I can tell, the max input length is indeed 16384 and not reduced (you can also validate by trying to summarize long text and seeing if crucial information at the end of your text appears)
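
A minimal sketch of that end-of-text check, in case it helps. Everything here is hypothetical and for illustration only: `build_probe`, `end_is_seen`, and `truncating_stub` are made-up helper names, and `summarize` stands in for whatever function wraps your actual model. The stub counts whitespace-separated words rather than real tokens, so it only imitates truncation.

```python
from typing import Callable


def build_probe(filler_sentence: str, marker_sentence: str, n_repeats: int) -> str:
    """Build a long text whose only distinctive information sits at the very end."""
    body = " ".join([filler_sentence] * n_repeats)
    return f"{body} {marker_sentence}"


def end_is_seen(summarize: Callable[[str], str], probe: str, keyword: str) -> bool:
    """Return True if the summary mentions the keyword placed at the end of the probe."""
    return keyword.lower() in summarize(probe).lower()


def truncating_stub(max_words: int) -> Callable[[str], str]:
    """Stand-in for a real model (assumption): it only 'reads' the first
    max_words words and echoes the last 20 words it read."""
    def summarize(text: str) -> str:
        seen = text.split()[:max_words]
        return " ".join(seen[-20:])
    return summarize


# About 14,000 words of filler followed by one marker sentence.
probe = build_probe(
    "The river flows past the old mill.",
    "The secret code word is zanzibar.",
    n_repeats=2000,
)

# A summarizer that stops reading early never reaches the marker.
short_context = end_is_seen(truncating_stub(4096), probe, "zanzibar")
# A summarizer that reads the whole input does reach it.
long_context = end_is_seen(truncating_stub(20000), probe, "zanzibar")
```

With a real model, a hit is decent evidence that the end of the input was read, but a miss is inconclusive: the model may have seen the marker and simply left it out of the summary.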

pszemraj changed discussion status to closed
