Compatibility with mps/Mac M1?

#6
by ymgenesis - opened

torch_dtype="auto" doesn't seem to pick up MPS.
I get AttributeError: 'GPTNeoXForCausalLM' object has no attribute 'mps' when trying to troubleshoot (from the model.mps() call in the snippet below).
I installed the nightly torch with:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Here are the changes I made:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablecode-instruct-alpha-3b",
  trust_remote_code=True,
  torch_dtype=torch.bfloat16,  # MPS does not support bfloat16 (see the error below)
)
model = model.to("mps")
model.mps()  # this call raises the AttributeError above; the model has no .mps() method
inputs = tokenizer("###Instruction\nGenerate a python function to find number of CPU cores###Response\n", return_tensors="pt").to("mps")
tokens = model.generate(
  **inputs,
  max_new_tokens=48,
  temperature=0.2,
  do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))

and I get:

TypeError: BFloat16 is not supported on MPS

I thought float16 was made compatible recently?

This is not a solution, but you can run it using the CPU.

model.to("cpu")    <---- Add this
inputs = tokenizer(
    "###Instruction\nGenerate a java function to find number of CPU cores###Response\n", 
    return_tensors="pt",
    return_token_type_ids=False,    <---- Add this
).to("cpu")    <---- Add this

tokens = model.generate(
  **inputs,
  max_new_tokens=48,
  temperature=0.2,
  do_sample=True,
  pad_token_id=50256    <---- Add this
)

Thanks, that seemed to work, though on the CPU it takes about 10 minutes to generate an answer (expectedly).

Looking forward to mps compatibility with PyTorch (https://pytorch.org/docs/stable/notes/mps.html).

Taking guidance from the PyTorch link above, I seem to have got it working with MPS on my M1 MacBook Pro.

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

else:
    print("Attempting to use MPS...")
    mps_device = torch.device("mps")

    tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
    streamer = TextStreamer(tokenizer)
    model = AutoModelForCausalLM.from_pretrained(
      "stabilityai/stablecode-instruct-alpha-3b",
      trust_remote_code=True,
    )
    model.to(mps_device)
    
    inputs = tokenizer(
      "\n###Instruction\n\nGenerate a python function to find number of CPU cores\n\n###Response\n",
      return_tensors="pt",
      return_token_type_ids=False,
    ).to(mps_device)
    tokens = model.generate(
      **inputs,
      max_new_tokens=48,
      temperature=0.2,
      do_sample=True,
      pad_token_id=50256,
      streamer=streamer
    )

    print(tokenizer.decode(tokens[0], skip_special_tokens=True))

I changed max_new_tokens to something larger to get output of any length, and added TextStreamer to visualize the output as it's generated. Generation is still slow with MPS, but it's definitely using it.

@ymgenesis thanks so much for this!
After using your solution, I ran into another issue:

RuntimeError: MPS does not support cumsum op with int64 input

Got it working on my M1 MacBook Pro by following this solution:
pip3 install --upgrade --no-deps --force-reinstall --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
Ref: https://github.com/pytorch/pytorch/issues/96610#issuecomment-1593230620
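
For anyone checking whether their installed nightly build actually includes the fix, a minimal sketch (my own sanity check, not from the linked issue) is to run cumsum on an int64 tensor directly on the MPS device:

import torch

print(torch.__version__)
# The original RuntimeError came from cumsum on an int64 tensor during generate().
# If this line runs without error, the installed build supports the op on MPS.
x = torch.ones(4, dtype=torch.int64, device="mps")
print(x.cumsum(0))  # expected: tensor([1, 2, 3, 4], device='mps:0')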



You could use torch.float16 in place of torch.bfloat16; bfloat16 currently runs only on CPU and CUDA.
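
For reference, a minimal sketch of that swap (assuming the same model and prompt as earlier in the thread, with float16 in place of bfloat16):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-instruct-alpha-3b")
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablecode-instruct-alpha-3b",
  trust_remote_code=True,
  torch_dtype=torch.float16,  # float16 instead of bfloat16, which MPS does not support
)
model = model.to("mps")

inputs = tokenizer(
  "###Instruction\nGenerate a python function to find number of CPU cores###Response\n",
  return_tensors="pt",
  return_token_type_ids=False,
).to("mps")
tokens = model.generate(
  **inputs,
  max_new_tokens=48,
  temperature=0.2,
  do_sample=True,
  pad_token_id=50256,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))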
