
Non-llama.cpp GPU-compatible libraries?

#1 opened by FriendlyVisage

I see that it (currently) only works with ggml and rustformers' llm. Do either of those support GPU? If not, are GPTQ versions of MPT-7B-* possible?

There's now also a Python library it works with - https://github.com/marella/ctransformers

The Python library doesn't yet support CUDA, but CUDA support is planned to be added soon.
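
Not from the thread, but as a point of reference: a minimal sketch of running an MPT GGML file through ctransformers, assuming you've already downloaded a quantised .bin locally (the file path and prompt below are placeholders):

from ctransformers import AutoModelForCausalLM

# Placeholder path to a locally downloaded GGML MPT file; model_type tells
# ctransformers which architecture to load
llm = AutoModelForCausalLM.from_pretrained(
  '/path/to/mpt-7b-instruct.ggml.q4_0.bin',
  model_type='mpt'
)

# Calling the model object runs generation and returns the generated text
print(llm('Write a haiku about GPUs:'))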

There is already a GPTQ model for MPT 7B on Hugging Face. I've not tried it myself yet, but I'm told it works in KoboldAI and text-generation-webui. You need to pip install einops, and then apparently it should work.

I'm planning to make my own at some point, once I've investigated the process a bit more.

I've seen the storywriter MPT-7b, but not the chat or the instruct. (The two that I'm really interested in.)

Currently, those two have context windows of only 2048 and 1024 (it errored out when I input a larger context, but perhaps there's some config that overrides that; I haven't had time to investigate yet).

According to their documentation, the context windows can be lengthened:

Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

import transformers

# Load the MPT config and raise the maximum sequence length (ALiBi allows this)
config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
config.update({"max_seq_len": 4096})

# Load the model with the updated config
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  trust_remote_code=True
)

Unless you're saying that they errored out even with max_seq_len increased.
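
Not from the MosaicML docs or the thread, but a hedged follow-up sketch of checking that the larger window actually takes effect after the snippet above; the tokenizer repo and the placeholder prompt are assumptions:

# Continuing from the snippet above
tokenizer = transformers.AutoTokenizer.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)

print(model.config.max_seq_len)  # should now report 4096

# Placeholder text that is longer than the original 2048-token limit
long_prompt = 'The quick brown fox jumps over the lazy dog. ' * 300

inputs = tokenizer(long_prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))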

Gotta try. Thank you. I must have missed that when reading the docs.
