JonasGeiping committed · Commit 440a8a9 · Parent(s): 53976ce

Update README.md

README.md CHANGED
````diff
@@ -42,8 +42,16 @@ model = AutoModelForMaskedLM.from_pretrained("JonasGeiping/test-crammedBERT-c5"
 
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='pt')
-
+
+# The c5 variant can only run on CUDA with AMP autocasting.
+model.cuda()
+cuda_input = {k: i.cuda() for k, i in encoded_input.items()}
+
+with torch.autocast("cuda"):
+    output = model(**cuda_input)
+
 ```
+If you want to use the `c5` model (which includes flash-attention) on `cpu`, load the config with `config.arch["attention"]["type"] = "pytorch"` instead and convert all missing weights.
 
 
 ### Limitations and bias
````
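The CPU fallback mentioned in the added README line could be sketched as follows. This is an assumption-laden sketch, not the repository's documented procedure: it assumes the crammedBERT remote code exposes its architecture as a nested `arch` dict on the config (as the line `config.arch["attention"]["type"] = "pytorch"` implies) and that `trust_remote_code=True` is required to load this custom model class.

```python
from transformers import AutoConfig, AutoModelForMaskedLM

# Load the config first, then swap the flash-attention implementation
# for the plain PyTorch attention path so the model can run on CPU.
# (Assumes the remote crammedBERT config stores this under config.arch.)
config = AutoConfig.from_pretrained(
    "JonasGeiping/test-crammedBERT-c5", trust_remote_code=True
)
config.arch["attention"]["type"] = "pytorch"

# Re-load the weights against the modified architecture; per the README,
# any weights missing under the new attention type must still be converted.
model = AutoModelForMaskedLM.from_pretrained(
    "JonasGeiping/test-crammedBERT-c5", config=config, trust_remote_code=True
)
```

With this config, inference no longer needs `model.cuda()` or the `torch.autocast("cuda")` context shown in the diff above.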