
Python library
#2 opened by marella

Hi, thanks for uploading the models.

I created Python bindings for GGML models: https://github.com/marella/ctransformers. It currently supports MPT, LLaMA, and many other models (see Supported Models).

It provides a unified interface for all models, supports LangChain and can be used with Hugging Face Hub models:

from ctransformers import AutoModelForCausalLM

# Downloads the quantized model file from the Hugging Face Hub and loads it.
llm = AutoModelForCausalLM.from_pretrained('TheBloke/MPT-7B-GGML', model_type='mpt', model_file='mpt-7b.ggmlv3.q4_0.bin')

print(llm('AI is going to'))
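The LangChain side of that looks roughly like this. This is a minimal sketch, assuming LangChain's CTransformers wrapper; check the LangChain docs for the exact import path and parameters:

from langchain.llms import CTransformers

# Assumed wrapper; the arguments mirror from_pretrained() above.
llm = CTransformers(model='TheBloke/MPT-7B-GGML', model_type='mpt', model_file='mpt-7b.ggmlv3.q4_0.bin')

print(llm('AI is going to'))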

If you add a config.json file to your model repo with { "model_type": "mpt" }, then model_type can be omitted:

llm = AutoModelForCausalLM.from_pretrained('TheBloke/MPT-7B-GGML', model_file='mpt-7b.ggmlv3.q4_0.bin')
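For reference, that config.json would contain just the model type:

{
  "model_type": "mpt"
}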

If there is only one model file in the repo, then model_file can also be omitted:

llm = AutoModelForCausalLM.from_pretrained('TheBloke/MPT-7B-GGML')

Please see marella/gpt-2-ggml for reference.

Oh wonderful, this looks really great @marella. I will update the README to point people to this tool.

Does it support CUDA offloading?

Currently it doesn't support CUDA, but I'm planning to add it in the future after adding a few more features and models.

OK thanks. I know I'm going to get a lot of people asking about that. Like this guy: https://huggingface.co/TheBloke/MPT-7B-GGML/discussions/1#6469176ab2321e47d3288721

I've added your library and GPT4All-UI to the READMEs for my three MPT models.

