How can I use this model on CPU?
I tried and got this error:
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn
But flash_attn requires CUDA (a GPU), so I wasn't able to install it.
What command did you run to load the model?
This one:
import transformers
model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-1b-redpajama-200b', trust_remote_code=True)
However, I tried it on a GPU and the quality is pretty bad. It can hardly generate anything that makes sense; it only manages very basic things. Maybe 200B tokens is just too little training. I hope the 7B model will be better.
But I appreciate the effort of working on open-source models.
I don't expect the quality to be that good. It's a pretty small model, and the underlying dataset is of unknown quality. We intend this model to be another way of getting to know the RedPajama dataset, not necessarily something good enough to use in production. It's possible that (1B parameters, 200B tokens) is too little, or that the dataset is of poor quality. We leave that analysis to the community, and we hope this model is helpful in making that determination.
It sounds like you've been able to use the model, though, so I'm going to close this issue.
For anyone else ending up here, you should be able to run on CPU without installing flash/Triton. I suspect you may need a more recent transformers version, as they recently added skipping of try/except blocks when checking imports.
I was able to do the following (running on CPU):
import transformers
model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-1b-redpajama-200b', trust_remote_code=True)
tokenizer = transformers.AutoTokenizer.from_pretrained('mosaicml/mpt-1b-redpajama-200b', trust_remote_code=True)
# generate a couple of tokens on CPU as a smoke test; this prints raw token ids
print(model.generate(**tokenizer('hello', return_tensors='pt'), max_new_tokens=2))
after running pip install transformers torch==1.13.1 einops
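If you want readable text rather than raw token ids, here's a minimal follow-up sketch (reusing the model and tokenizer loaded above; the prompt and max_new_tokens value are just illustrative):

output_ids = model.generate(**tokenizer('hello', return_tensors='pt'), max_new_tokens=20)
# decode the first (and only) generated sequence back to a string
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))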