EchoStreet/mpt-7b

rlanner-echocap committed on Jul 28, 2023

Commit e234444 (1 parent: 2c50b69)

Update handler.py

pushing model to cuda:0

Files changed (1)

handler.py +1 -1

@@ -12,7 +12,7 @@ class EndpointHandler:
         # load the model
         tokenizer = AutoTokenizer.from_pretrained(path)
         model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", torch_dtype=dtype, trust_remote_code=True)
-        model.to('cuda')
+        model.to('cuda:0')
         # create inference pipeline
         self.pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, device='cuda:0')
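
Review note: in PyTorch, a bare 'cuda' device string refers to the current CUDA device, which is index 0 unless it has been changed, so this commit mainly makes the target explicit and consistent with the device='cuda:0' passed to the pipeline. The sketch below illustrates that equivalence in plain Python. The helper normalize_device is hypothetical; it is not part of handler.py, PyTorch, or transformers, and it assumes the default device index is 0.

```python
def normalize_device(device: str, default_index: int = 0) -> str:
    """Hypothetical helper: make the index of a bare 'cuda' string explicit.

    Assumes the current CUDA device is `default_index` (0 unless changed).
    Any other device string is returned unchanged apart from trimming and
    lowercasing.
    """
    device = device.strip().lower()
    if device == "cuda":
        return f"cuda:{default_index}"
    return device


# The two spellings from the diff resolve to the same explicit target
# under the default-index assumption.
old_target = normalize_device("cuda")
new_target = normalize_device("cuda:0")
same_target = old_target == new_target
```

If the process had selected a different current device, the two spellings would no longer be equivalent, which is one reason to prefer the explicit form used in this commit.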