
Non-llama.cpp GPU-compatible libraries?

#1 opened by FriendlyVisage

I see that it (currently) only works with ggml and rustformers' llm. Do either of those support GPU? If not, are GPTQ versions of MPT-7B-* possible?

There's now also a Python library it works with - https://github.com/marella/ctransformers

The Python library doesn't yet support CUDA, but CUDA support is planned to be added soon.
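
Not from the thread, but as a point of reference: a minimal sketch of running an MPT GGML file through ctransformers, assuming you've already downloaded a quantised .bin locally (the file path and prompt below are placeholders):

from ctransformers import AutoModelForCausalLM

# Placeholder path to a locally downloaded GGML MPT file; model_type tells
# ctransformers which architecture to load
llm = AutoModelForCausalLM.from_pretrained(
  '/path/to/mpt-7b-instruct.ggml.q4_0.bin',
  model_type='mpt'
)

# Calling the model object runs generation and returns the generated text
print(llm('Write a haiku about GPUs:'))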

There is already a GPTQ model for MPT 7B on Hugging Face. I've not tried it myself yet, but I'm told it works in KoboldAI and text-generation-webui. You need to pip install einops, and then apparently it should work.

I'm planning to make my own at some point, once I've investigated the process a bit more.

I've seen the storywriter MPT-7b, but not the chat or the instruct. (The two that I'm really interested in.)

Currently, those two have context windows of only 2048 and 1024 (it errored out when I input a larger context, but perhaps there's some config that overrides that; I haven't had time to investigate yet).

According to their documentation, the context windows can be lengthened:

Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

import transformers

# Load the MPT config and raise the maximum sequence length (ALiBi allows this)
config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
config.update({"max_seq_len": 4096})

# Load the model with the updated config
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  trust_remote_code=True
)

Unless you're saying that they errored out even with max_seq_len increased.
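
Not from the MosaicML docs or the thread, but a hedged follow-up sketch of checking that the larger window actually takes effect after the snippet above; the tokenizer repo and the placeholder prompt are assumptions:

# Continuing from the snippet above
tokenizer = transformers.AutoTokenizer.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)

print(model.config.max_seq_len)  # should now report 4096

# Placeholder text that is longer than the original 2048-token limit
long_prompt = 'The quick brown fox jumps over the lazy dog. ' * 300

inputs = tokenizer(long_prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))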

Gotta try. Thank you. I must have missed that when reading the docs.
