
Thank you

#9
by ehartford - opened

This is beautiful.

I concur.

What's weird though is that some files show up as "committed on Nov 15, 2022". Glitch? o_o

This is beautiful.

this is 5 years old you know

This is beautiful.

this is 5 years old you know

Does not make it any less beautiful.

This is beautiful.

Yes, I'm very impressed as well. Are you going to do something cool with this, like fine tuning or swapping out some components for smaller (and more competent/contemporary) models? My vote is: Yes, please!

I could see a system of special Samanthas tearin' it up. ;-)

This is beautiful.

this is 5 years old you know

Does not make it any less beautiful.

agreed

How does anyone run this

Is there a GGUF version, @thebloke?

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

The model card gives another option: it has examples for both GPU and CPU, so a GPU-poor person could run it on a machine with a lot of RAM and no GPU:

https://huggingface.co/google/switch-c-2048#running-the-model-on-a-cpu
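For anyone who just wants to see the shape of that, here is a minimal sketch of the standard transformers text2text flow (not copied verbatim from the model card; swap in a smaller checkpoint like google/switch-base-8 if you only want to test the code path, since on CPU the full ~3 TB checkpoint still has to fit in RAM):

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Load tokenizer and model; use e.g. "google/switch-base-8" to test on modest hardware.
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048")

# Switch-C was only pretrained with the T5 span-corruption objective,
# so the natural input is text with sentinel tokens for it to fill in.
input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with a pinch of salt."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```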

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

Have any extras :)

+1 for the GGUF version. Personally, I am very excited to be able to mask and then unmask tokens on the CPU

🤯 How many GB of memory do you have to run this, @pszemraj?

+1 for the GGUF version. Personally, I am very excited to be able to mask and then unmask tokens on the CPU

It's quite the beast. Have you looked at Switch 128 or 256? You may not need the GGUF.

🤯 How many GB of memory do you have to run this, @pszemraj?

I was curious, too. I just added the Switch 2048, 256, and 128 XXL to my download list. I'll report back in a few days after the turtle-speed downloads on my end are complete.

@JuLuComputing If downloading speed is an issue, have you considered pget? I think it uses multithreading or something, downloading different chunks of the model at once to speed things up. It seems a bit faster for HF models than wget, though I haven't benchmarked it.

@JuLuComputing If downloading speed is an issue, have you considered pget? I think it uses multithreading or something, downloading different chunks of the model at once to speed things up. It seems a bit faster for HF models than wget, though I haven't benchmarked it.

Thanks for the suggestion! I'm going to check it out and see how well it works. I gave up on wget a long time ago. Currently, I use a modified version of github/bodaay. It has some issues with sometimes thinking a download is complete when there are certain types of data errors, so I built an error-handling script around it to make up for its deficiencies. It does 5 threads by default and has some level of data-error handling and SHA checking. As for speed, I think the difference is negligible, if not slower than just using a browser download, due to some Hugging Face algorithm, I presume. Nonetheless, the advantages of a downloader are handling Hugging Face folders with large numbers of files and the ability to resume downloads.

This would need to be fine-tuned. It has no direct utility as it is, as far as I can understand.

Thus, "simply" running it will not achieve anything. It needs to be trained on further data in order for it to do anything beyond being interesting.

EDIT: Actually, no... it can do a lot as an encoder model. You could even use it as a super-tokenizer of sorts. I wonder if that would help causal language models perform better, since they would be fed tokens generated by a presumably very capable MoE.
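(A hedged sketch of that encoder idea: transformers ships an encoder-only class for this model family, so you can pull per-token contextual representations without running the decoder. Shown here on a small sibling checkpoint purely for illustration, not on Switch-C itself.)

```python
import torch
from transformers import AutoTokenizer, SwitchTransformersEncoderModel

# Small sibling checkpoint used for illustration; the same code should
# apply to the larger Switch checkpoints if you have the memory.
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
encoder = SwitchTransformersEncoderModel.from_pretrained("google/switch-base-8")

inputs = tokenizer(
    "Mixture-of-experts models route tokens to different experts.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = encoder(**inputs)

# Per-token contextual embeddings, shape (batch, seq_len, d_model).
print(outputs.last_hidden_state.shape)
```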

How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

The model is 3.1 TB in size. How does your math work that out, or am I wrong or missing something? Even at INT8 the model would need approx. 780 GB of VRAM to load, and 4 x H100 NVL would supply 752 GB.

EDIT: Actually, no... it can do a lot as an encoder model. You could even use it as a super-tokenizer of sorts. I wonder if that would help causal language models perform better, since they would be fed tokens generated by a presumably very capable MoE.

Congrats 🎉 you have discovered the encoder-decoder architecture with a slight twist


How does anyone run this

you would need 4 h100 gpus to run it, that's what my math says

The model is 3.1 TB in size. How does your math work that out, or am I wrong or missing something? Even at INT8 the model would need approx. 780 GB of VRAM to load, and 4 x H100 NVL would supply 752 GB.

Wait, let me check my math again: you would need around 24,800 GB of VRAM, assuming it has a max length of 2048 tokens. The math is (3100 * 4) * (2048 / 1024).
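For what it's worth, a simpler back-of-the-envelope estimate just goes from the parameter count: Switch-C has roughly 1.6 trillion parameters, so the weights alone are about 3.2 TB in bf16 (which matches the ~3.1 TB repo size), before any activation or cache overhead. A throwaway sketch of that arithmetic, assuming the published ~1.6T figure:

```python
# Rough weight-only memory for Switch-C at different precisions.
# Assumes ~1.6 trillion parameters; ignores activations and runtime overhead.
params = 1.6e12

for name, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1)]:
    tb = params * bytes_per_param / 1e12
    print(f"{name}: ~{tb:.1f} TB just for the weights")
```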

Alright, @ehartford, @pszemraj, @breadlicker45, and anyone else reading this: I have tested the smaller '256' version of this model and posted a script I made for running it. For comparison, the 256 model used 60 GB of RAM while working. I'm not sure I'll be able to load this big 2048 model on any machine I have without RAM caching to disk... a lot of caching to disk...

Check out the script; I hope this work is useful for you guys!
https://huggingface.co/google/switch-base-256/discussions/6#6566121866d5f87c6297fbeb

My script reported that on a Dell R820 (quad Xeon E5-4657L v2 CPUs, 1 TB DDR3-1866 RAM) I got a consistent 2 tokens/sec.
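(If anyone does try the 2048 model on a machine without enough RAM, the usual escape hatch is accelerate's disk offload. A hedged sketch of that approach, not the script linked above, and expect it to be painfully slow:)

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Requires the accelerate package. device_map="auto" spreads the weights
# across whatever is available, and anything that doesn't fit in RAM/VRAM
# is spilled to the scratch directory given by offload_folder.
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-c-2048",
    device_map="auto",
    offload_folder="offload",
)
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
```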

Alright, @ehartford, @pszemraj, @breadlicker45, and anyone else reading this: I have tested the smaller '256' version of this model and posted a script I made for running it. For comparison, the 256 model used 60 GB of RAM while working. I'm not sure I'll be able to load this big 2048 model on any machine I have without RAM caching to disk... a lot of caching to disk...

Check out the script; I hope this work is useful for you guys!
https://huggingface.co/google/switch-base-256/discussions/6#6566121866d5f87c6297fbeb

My script reported that on a Dell R820 (quad Xeon E5-4657L v2 CPUs, 1 TB DDR3-1866 RAM) I got a consistent 2 tokens/sec.

You're going to need 24 TB of RAM to run switch-c-2048.
