Unable to run the model without a GPU
Is there a way to run the model on a CPU? I've tried passing device="cpu" to SentenceTransformer but am getting this error:
File ~/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
...
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
784 has_bias = bias is not None
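For context, this is roughly the call that triggers the assertion (a trimmed sketch; the remote-code and revision arguments from the model card are omitted here):

from sentence_transformers import SentenceTransformer

# Forcing CPU is what leads to the q.is_cuda assertion above
model = SentenceTransformer('Hum-Works/lodestone-base-4096-v1', device="cpu")
embeddings = model.encode(["This is an example sentence"])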
As a follow-up, when I remove the trust_remote_code and revision arguments, embeddings are returned, but with a message that makes me unsure which model is actually being used to generate them:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('Hum-Works/lodestone-base-4096-v1')
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
Response:
Some weights of the model checkpoint at /home//.cache/torch/sentence_transformers/Hum-Works_lodestone-base-4096-v1/ were not used when initializing BertModel: ['encoder.layer.8.mlp.gated_layers.weight', 'encoder.layer.0.attention.self.Wqkv.weight', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.7.mlp.wo.bias', 'encoder.layer.10.attention.self.Wqkv.bias', 'encoder.layer.2.mlp.wo.bias', 'encoder.layer.7.mlp.wo.weight', 'encoder.layer.1.attention.self.Wqkv.bias', 'encoder.layer.9.mlp.wo.weight', 'encoder.layer.4.mlp.layernorm.weight', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.0.mlp.layernorm.bias', 'encoder.layer.2.mlp.wo.weight', 'encoder.layer.7.mlp.gated_layers.weight', 'encoder.layer.11.mlp.layernorm.bias', 'encoder.layer.6.mlp.layernorm.weight', 'encoder.layer.8.attention.self.Wqkv.weight', 'encoder.layer.0.attention.self.Wqkv.bias', 'encoder.layer.10.mlp.layernorm.weight', 'encoder.layer.1.mlp.layernorm.bias', 'encoder.layer.3.mlp.wo.weight', 'encoder.layer.10.mlp.wo.weight', 'encoder.layer.8.mlp.wo.bias', 'encoder.layer.3.attention.self.Wqkv.weight', 'encoder.layer.10.mlp.layernorm.bias', 'encoder.layer.9.attention.self.Wqkv.bias', 'encoder.layer.9.attention.self.Wqkv.weight', 'encoder.layer.11.mlp.gated_layers.weight', 'encoder.layer.3.mlp.wo.bias', 'encoder.layer.0.mlp.layernorm.weight', 'encoder.layer.8.attention.self.Wqkv.bias', 'encoder.layer.10.mlp.gated_layers.weight', 'encoder.layer.7.attention.self.Wqkv.weight', 'encoder.layer.6.mlp.gated_layers.weight', 'encoder.layer.6.mlp.wo.bias', 'encoder.layer.5.mlp.layernorm.weight', 'encoder.layer.7.mlp.layernorm.bias', 'encoder.layer.5.mlp.layernorm.bias', 'encoder.layer.8.mlp.layernorm.weight', 'encoder.layer.5.mlp.wo.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.1.mlp.wo.weight', 'encoder.layer.4.mlp.gated_layers.weight', 'encoder.layer.4.attention.self.Wqkv.bias', 'encoder.layer.11.attention.self.Wqkv.bias', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.1.attention.self.Wqkv.weight', 'encoder.layer.5.attention.self.Wqkv.bias', 'encoder.layer.6.mlp.wo.weight', 'encoder.layer.0.mlp.wo.bias', 'encoder.layer.2.attention.self.Wqkv.bias', 'encoder.layer.5.attention.self.Wqkv.weight', 'encoder.layer.6.mlp.layernorm.bias', 'encoder.layer.1.mlp.layernorm.weight', 'encoder.layer.4.mlp.wo.weight', 'encoder.layer.10.attention.self.Wqkv.weight', 'encoder.layer.9.mlp.layernorm.weight', 'encoder.layer.7.mlp.layernorm.weight', 'encoder.layer.9.mlp.gated_layers.weight', 'encoder.layer.4.mlp.layernorm.bias', 'encoder.layer.3.attention.self.Wqkv.bias', 'encoder.layer.11.mlp.layernorm.weight', 'encoder.layer.4.attention.self.Wqkv.weight', 'encoder.layer.2.attention.self.Wqkv.weight', 'encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.0.mlp.wo.weight', 'encoder.layer.9.mlp.wo.bias', 'encoder.layer.0.mlp.gated_layers.weight', 'encoder.layer.4.mlp.wo.bias', 'encoder.layer.3.mlp.gated_layers.weight', 'encoder.layer.9.mlp.layernorm.bias', 'encoder.layer.6.attention.self.Wqkv.bias', 'encoder.layer.11.mlp.wo.bias', 'encoder.layer.8.mlp.wo.weight', 'encoder.layer.6.attention.self.Wqkv.weight', 'encoder.layer.3.mlp.layernorm.bias', 'encoder.layer.11.attention.self.Wqkv.weight', 'encoder.layer.8.mlp.layernorm.bias', 'encoder.layer.2.mlp.gated_layers.weight', 'encoder.layer.7.attention.self.Wqkv.bias', 'encoder.layer.11.mlp.wo.weight', 'encoder.layer.1.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertModel were not initialized from the model checkpoint at /home//.cache/torch/sentence_transformers/Hum-Works_lodestone-base-4096-v1/ and are newly initialized: ['encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.2.attention.self.value.weight', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.0.attention.self.query.bias', 'encoder.layer.0.attention.self.value.bias', 'encoder.layer.10.attention.self.value.weight', 'encoder.layer.7.attention.self.query.bias', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.11.attention.self.value.bias', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.7.attention.self.value.weight', 'encoder.layer.4.attention.self.query.weight', 'encoder.layer.10.attention.self.key.weight', 'encoder.layer.0.attention.self.value.weight', 'encoder.layer.3.attention.self.key.bias', 'encoder.layer.11.attention.self.key.weight', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.5.attention.self.value.weight', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.11.output.LayerNorm.weight', 'embeddings.position_embeddings.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.6.attention.self.value.bias', 'encoder.layer.4.attention.self.key.bias', 'encoder.layer.3.intermediate.dense.weight', 'encoder.layer.4.attention.self.key.weight', 'encoder.layer.11.attention.self.query.weight', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.9.attention.self.key.weight', 'encoder.layer.11.attention.self.key.bias', 'encoder.layer.5.attention.self.query.weight', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.4.output.dense.weight', 'encoder.layer.6.attention.self.key.bias', 'encoder.layer.10.attention.self.query.bias', 'encoder.layer.11.output.dense.weight', 'encoder.layer.10.attention.self.value.bias', 'encoder.layer.6.output.dense.bias', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.4.attention.self.query.bias', 'encoder.layer.2.attention.self.key.weight', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.9.attention.self.value.weight', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.9.attention.self.query.bias', 'encoder.layer.1.attention.self.query.weight', 'encoder.layer.9.output.dense.weight', 'encoder.layer.10.attention.self.query.weight', 'encoder.layer.3.attention.self.key.weight', 'encoder.layer.6.output.dense.weight', 'encoder.layer.3.output.dense.weight', 'encoder.layer.7.attention.self.value.bias', 'encoder.layer.8.attention.self.value.bias', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.5.output.dense.weight', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.9.attention.self.query.weight', 'encoder.layer.1.attention.self.query.bias', 'encoder.layer.6.attention.self.key.weight', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.2.attention.self.query.bias', 'encoder.layer.4.output.dense.bias', 'encoder.layer.1.attention.self.key.bias', 'encoder.layer.8.output.dense.weight', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.2.output.dense.weight', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.3.attention.self.value.bias', 'encoder.layer.4.attention.self.value.weight', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.1.output.dense.bias', 
'encoder.layer.6.attention.self.query.weight', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.0.attention.self.query.weight', 'encoder.layer.9.attention.self.key.bias', 'encoder.layer.10.output.dense.weight', 'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.9.output.dense.bias', 'encoder.layer.6.attention.self.value.weight', 'encoder.layer.1.attention.self.key.weight', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.7.attention.self.key.weight', 'encoder.layer.5.attention.self.value.bias', 'encoder.layer.11.output.dense.bias', 'encoder.layer.0.attention.self.key.weight', 'encoder.layer.8.output.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.8.attention.self.query.weight', 'encoder.layer.11.attention.self.value.weight', 'encoder.layer.2.output.dense.bias', 'encoder.layer.5.output.dense.bias', 'encoder.layer.10.output.LayerNorm.bias', 'encoder.layer.8.attention.self.value.weight', 'encoder.layer.2.attention.self.key.bias', 'encoder.layer.6.attention.self.query.bias', 'encoder.layer.2.attention.self.query.weight', 'encoder.layer.7.intermediate.dense.weight', 'encoder.layer.5.attention.self.key.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.7.output.dense.bias', 'encoder.layer.8.attention.self.key.bias', 'encoder.layer.9.attention.self.value.bias', 'encoder.layer.5.attention.self.query.bias', 'encoder.layer.8.attention.self.query.bias', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.0.output.dense.bias', 'encoder.layer.5.attention.self.key.weight', 'encoder.layer.11.attention.self.query.bias', 'encoder.layer.2.attention.self.value.bias', 'encoder.layer.9.output.LayerNorm.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.10.attention.self.key.bias', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.1.attention.self.value.bias', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.4.attention.self.value.bias', 'encoder.layer.1.attention.self.value.weight', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.0.attention.self.key.bias', 'encoder.layer.7.attention.self.query.weight', 'encoder.layer.2.intermediate.dense.bias', 'encoder.layer.3.output.dense.bias', 'encoder.layer.0.output.dense.weight', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.3.attention.self.query.weight', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.7.output.dense.weight', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.8.attention.self.key.weight', 'encoder.layer.7.attention.self.key.bias', 'encoder.layer.10.output.dense.bias', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.3.attention.self.value.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.3.attention.self.query.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Hey @megil-paxton,
Good catch. There is a bug in the code inherited from our base model: if Triton is installed, the attention implementation tries to use it, but Triton doesn't support running on CPU.
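Roughly speaking, the fix amounts to a guard like the following (a simplified sketch, not the exact patch): only take the Triton flash-attention path when Triton is importable and the tensors are actually on a CUDA device.

import importlib.util
import torch

def can_use_triton_flash_attention(t: torch.Tensor) -> bool:
    # Triton kernels are CUDA-only, so both conditions must hold
    triton_installed = importlib.util.find_spec("triton") is not None
    return triton_installed and t.is_cuda

print(can_use_triton_flash_attention(torch.zeros(1)))  # False for a CPU tensor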
I didn't catch this when I tested CPU usage on my own machine because I was on Windows (terrible, I know) and therefore didn't have Triton installed.
It's a pretty straightforward fix. I'll get it pushed for you in the next couple of days.
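In the meantime, if you need CPU embeddings before the fix lands, uninstalling Triton from the environment should work, assuming the modeling code falls back to the plain PyTorch attention path when Triton isn't importable (untested on my end):

# pip uninstall triton
from sentence_transformers import SentenceTransformer

# With Triton absent, the code path that asserts q.is_cuda shouldn't be taken
model = SentenceTransformer('Hum-Works/lodestone-base-4096-v1', device="cpu")
embeddings = model.encode(["This is an example sentence"])
print(embeddings.shape)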