
Thank you

#9
by ehartford - opened

This is beautiful.

I concur.

What's weird though is that some files show up as "committed on Nov 15, 2022". Glitch? o_o

This is beautiful.

this is 5 years old you know

This is beautiful.

this is 5 years old you know

Does not make it any less beautiful.

This is beautiful.

Yes, I'm very impressed as well. Are you going to do something cool with this, like fine tuning or swapping out some components for smaller (and more competent/contemporary) models? My vote is: Yes, please!

I could see a system of special Samanthas tearin' it up. ;-)

This is beautiful.

this is 5 years old you know

Does not make it any less beautiful.

agreed

How does anyone run this

Is there a GGUF version, @thebloke?

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

The model card gives another option: it has examples for both GPU and CPU, so a GPU-poor person could run it on a machine with a lot of RAM and no GPU:

https://huggingface.co/google/switch-c-2048#running-the-model-on-a-cpu
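For anyone who just wants to see the shape of that, here is a minimal sketch of the standard transformers text2text flow (not copied verbatim from the model card; swap in a smaller checkpoint like google/switch-base-8 if you only want to test the code path, since on CPU the full ~3 TB checkpoint still has to fit in RAM):

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Load tokenizer and model; use e.g. "google/switch-base-8" to test on modest hardware.
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048")

# Switch-C was only pretrained with the T5 span-corruption objective,
# so the natural input is text with sentinel tokens for it to fill in.
input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with a pinch of salt."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```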

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

Have any extras :)

+1 for the GGUF version. Personally, I am very excited to be able to mask and then unmask tokens on the CPU

🤯 How many GB of memory do you have to run this, @pszemraj?

+1 for the GGUF version. Personally, I am very excited to be able to mask and then unmask tokens on the CPU

It's quite the beast. Have you looked at Switch 128 or 256? You may not need the GGUF.

🤯 How many GB of memory do you have to run this, @pszemraj?

I was curious, too. I just added the Switch 2048, 256, and 128 XXL to my download list. I'll report back in a few days after the turtle-speed downloads on my end are complete.

@JuLuComputing If downloading speed is an issue, have you considered pget? I think it uses multithreading or something, downloading different chunks of the model at once to speed things up. It seems a bit faster for HF models than wget, though I haven't benchmarked it.

@JuLuComputing If downloading speed is an issue, have you considered pget? I think it uses multithreading or something, downloading different chunks of the model at once to speed things up. It seems a bit faster for HF models than wget, though I haven't benchmarked it.

Thanks for the suggestion! I'm going to check it out and see how well it works. I gave up on wget a long time ago. Currently, I use a modified version of github/bodaay. It has some issues with sometimes thinking a download is complete when there are certain types of data errors, so I built an error-handling script around it to make up for its deficiencies. It does 5 threads by default and has some level of data-error handling and SHA checking. As for speed, I think the difference is negligible, if not slower than just using a browser download, due to some Hugging Face algorithm, I presume. Nonetheless, the advantages of a downloader are handling Hugging Face folders with large numbers of files and the ability to resume downloads.

This would need to be fine-tuned. It has no direct utility as it is, as far as I can understand.

Thus, "simply" running it will not achieve anything. It needs to be trained on further data in order for it to do anything beyond being interesting.

EDIT: Actually, no... it can do a lot as an encoder model. You could even use it as a super-tokenizer of sorts. I wonder if that would help causal language models perform better, since they would be fed tokens generated by a presumably very capable MoE.
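(A hedged sketch of that encoder idea: transformers ships an encoder-only class for this model family, so you can pull per-token contextual representations without running the decoder. Shown here on a small sibling checkpoint purely for illustration, not on Switch-C itself.)

```python
import torch
from transformers import AutoTokenizer, SwitchTransformersEncoderModel

# Small sibling checkpoint used for illustration; the same code should
# apply to the larger Switch checkpoints if you have the memory.
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
encoder = SwitchTransformersEncoderModel.from_pretrained("google/switch-base-8")

inputs = tokenizer(
    "Mixture-of-experts models route tokens to different experts.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = encoder(**inputs)

# Per-token contextual embeddings, shape (batch, seq_len, d_model).
print(outputs.last_hidden_state.shape)
```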

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

The model is 3.1 TB in size. How does your math work that out, or am I wrong or missing something? Even at INT8 the model would need approx. 780 GB of VRAM to load, and 4 x H100 NVL would supply 752 GB.

EDIT: Actually, no... it can do a lot as an encoder model. You could even use it as a super-tokenizer of sorts. I wonder if that would help causal language models perform better, since they would be fed tokens generated by a presumably very capable MoE.

Congrats 🎉 you have discovered the encoder-decoder architecture with a slight twist


How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

The model is 3.1 TB in size. How does your math work that out, or am I wrong or missing something? Even at INT8 the model would need approx. 780 GB of VRAM to load, and 4 x H100 NVL would supply 752 GB.

Wait, let me check my math again: you would need around 24,800 GB of VRAM, assuming it has a max length of 2048 tokens. The math is (3100 * 4) * (2048 / 1024).
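For what it's worth, a simpler back-of-the-envelope estimate just goes from the parameter count: Switch-C has roughly 1.6 trillion parameters, so the weights alone are about 3.2 TB in bf16 (which matches the ~3.1 TB repo size), before any activation or cache overhead. A throwaway sketch of that arithmetic, assuming the published ~1.6T figure:

```python
# Rough weight-only memory for Switch-C at different precisions.
# Assumes ~1.6 trillion parameters; ignores activations and runtime overhead.
params = 1.6e12

for name, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    tb = params * bytes_per_param / 1e12
    print(f"{name}: ~{tb:.1f} TB just for the weights")
```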

Alright, @ehartford, @pszemraj, @breadlicker45, and anyone else reading this: I have tested the smaller '256' version of this model and posted a script I made for running it. For comparison, the 256 model used 60 GB of RAM while working. I'm not sure I'll be able to load this big 2048 model on any machine I have without RAM caching to disk... a lot of caching to disk...

Check out the script; I hope this work is useful for you guys!
https://huggingface.co/google/switch-base-256/discussions/6#6566121866d5f87c6297fbeb

My script reported that on a Dell R820 (quad Xeon E5-4657L v2 CPUs, 1 TB DDR3-1866 RAM) I got a consistent 2 tokens/sec.
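(If anyone does try the 2048 model on a machine without enough RAM, the usual escape hatch is accelerate's disk offload. A hedged sketch of that approach, not the script linked above, and expect it to be painfully slow:)

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Requires the accelerate package. device_map="auto" spreads the weights
# across whatever is available, and anything that doesn't fit in RAM/VRAM
# is spilled to the scratch directory given by offload_folder.
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-c-2048",
    device_map="auto",
    offload_folder="offload",
)
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
```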

Alright, @ehartford, @pszemraj, @breadlicker45, and anyone else reading this: I have tested the smaller '256' version of this model and posted a script I made for running it. For comparison, the 256 model used 60 GB of RAM while working. I'm not sure I'll be able to load this big 2048 model on any machine I have without RAM caching to disk... a lot of caching to disk...

Check out the script; I hope this work is useful for you guys!
https://huggingface.co/google/switch-base-256/discussions/6#6566121866d5f87c6297fbeb

My script reported that on a Dell R820 (quad Xeon E5-4657L v2 CPUs, 1 TB DDR3-1866 RAM) I got a consistent 2 tokens/sec.

You're going to need 24 TB of RAM to run switch-c-2048.
