Running on single Nvidia K80 GPU with large context to generate long output

#29
by airtable - opened

Hi there

I previously used dolly-v2-3b and dolly-v2-7b to try to generate long text, but they kept regenerating the same text over and over.

My scenario is that I retrieve relevant documents using FAISS vector search (with k=10, k=20, etc.) and expect the model to generate similar documents from a prompt that includes the retrieved data.

Can I use MPT-7B to generate this type of document with a long context on a single-GPU Azure cloud instance?

VM Size
Standard_NC6 (6 cores, 56 GB RAM, 380 GB disk) with an Nvidia K80 GPU

Mosaic ML, Inc. org

The model should fit onto a K80. You'll need to use standard torch attention. If you instantiate the model with max_seq_len=4096, you should be able to generate sequences twice as long as the Dolly models you were trying.
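Concretely, that setup can be sketched with the Hugging Face `transformers` API. This is a sketch, not a verified recipe for this VM: the fp16 cast and the GPT-NeoX tokenizer choice are assumptions (the latter per the MPT-7B model card), and the heavy download is kept inside a function so nothing runs until you call it:

```python
def mpt_k80_overrides(max_seq_len: int = 4096) -> dict:
    """Config overrides suggested above: standard torch attention
    (the K80 cannot run the triton flash-attention kernels) and a
    longer context, which MPT's ALiBi positions allow at inference."""
    return {
        "attn_impl": "torch",        # goes into config.attn_config
        "max_seq_len": max_seq_len,  # 2x the 2048-token training length
    }

def load_mpt_for_k80(name: str = "mosaicml/mpt-7b"):
    """Build model and tokenizer; imports live here so the sketch's
    pure-Python part works without torch/transformers installed."""
    import torch
    from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

    overrides = mpt_k80_overrides()
    config = AutoConfig.from_pretrained(name, trust_remote_code=True)
    config.attn_config["attn_impl"] = overrides["attn_impl"]
    config.max_seq_len = overrides["max_seq_len"]

    model = AutoModelForCausalLM.from_pretrained(
        name,
        config=config,
        torch_dtype=torch.float16,  # assumption: fp16 to cut weight memory;
                                    # may still be tight on a 12 GB K80
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    return model, tokenizer
```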

Depending on the type of documents, this could work; it really depends on how similar the task is to the pretraining data.

If you get repetitive output, try searching for a good no_repeat_ngram_size (somewhere between 3 and 9) and repetition_penalty (somewhere between 1.01 and 1.2), as well as increasing the temperature.
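With the Hugging Face `generate` API, those settings can be collected like this. The defaults below are assumptions picked from inside the suggested ranges and are meant to be swept, not final values:

```python
def anti_repetition_kwargs(ngram: int = 4,
                           penalty: float = 1.1,
                           temperature: float = 0.9) -> dict:
    """Decoding settings from the advice above; sweep `ngram` over
    3-9 and `penalty` over 1.01-1.2 until repetition stops."""
    return dict(
        do_sample=True,                # temperature only matters when sampling
        temperature=temperature,       # raise this for more varied output
        no_repeat_ngram_size=ngram,    # forbid exact n-gram repeats
        repetition_penalty=penalty,    # down-weight already-generated tokens
        max_new_tokens=1024,           # assumption: budget for long output
    )

# Usage with a loaded model/tokenizer (hypothetical `prompt` string):
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# output_ids = model.generate(**inputs, **anti_repetition_kwargs())
```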

sam-mosaic changed discussion status to closed

Thanks @sam-mosaic, I appreciate your response. I will look into configuring these settings and try again.

airtable changed discussion status to open
Mosaic ML, Inc. org

Closing for now as this issue has gone stale.

abhi-mosaic changed discussion status to closed
