How to run on Colab's CPU?

#4
by deepakkaura26 - opened

Can someone suggest or show me, through a piece of code, how to run this model (i.e. MPT-30B-Chat) on Colab's CPU?

Colab has only 12.7 GB of RAM and the MPT-30B-Chat files are almost 60 GB, so it's not possible.

@beoswindvip Can you suggest which other models I can use?

You can run 7B models (with 4-bit or 8-bit quantization) on the Colab free-plan GPU, such as https://huggingface.co/TheBloke/vicuna-7B-v1.3-GPTQ .
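
For example, a rough sketch (untested) that loads a 7B model in 8-bit via transformers + bitsandbytes rather than the GPTQ repo linked above; the model id and generation settings are just placeholders:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.3"  # placeholder; any 7B chat model should work similarly

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on the Colab GPU automatically
    load_in_8bit=True,   # 8-bit quantization via bitsandbytes
)

prompt = "Write a job description for a Data Scientist."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))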

@swulling Can these 7B models also run easily on a CPU?

You can use the GGML versions of the models to run on a CPU.

Try GPT4All or llama.cpp.
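
A rough sketch with llama-cpp-python (assuming it is installed and you have downloaded a GGML model file; the filename below is just a placeholder):

from llama_cpp import Llama

# Load a GGML model from disk and run it entirely on the CPU
llm = Llama(model_path="./vicuna-7b-v1.3.ggmlv3.q4_0.bin", n_ctx=2048)

output = llm("Write a job description for a Data Scientist.", max_tokens=256)
print(output["choices"][0]["text"])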

@swulling Firstly, thank you so much. One last question:

from text_generation import InferenceAPIClient

# Client for the model hosted on the Hugging Face Inference API
client = InferenceAPIClient("OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5")

# Stream tokens as they are generated and accumulate the full answer
complete_answer = ""
for response in client.generate_stream("<|prompter|>Write Job Description for Data Scientist<|endoftext|><|assistant|>"):
    print(response.token)
    complete_answer += response.token.text

print(complete_answer)

Apart from the OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 model used in the piece of code above, which other models can I use?

I suggest choosing a Chat model with a higher ranking to achieve better results.

Ref: https://chat.lmsys.org/ Leaderboard
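
If the model you pick is served by the hosted Inference API, the snippet above only needs the repo id swapped out, e.g. (placeholder repo id; check the model card for its own prompt format):

from text_generation import InferenceAPIClient

# Swap in any text-generation model served by the Inference API (placeholder repo id)
client = InferenceAPIClient("some-org/some-chat-model")

complete_answer = ""
for response in client.generate_stream("Write a job description for a Data Scientist."):
    complete_answer += response.token.text

print(complete_answer)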
