Brilliant!


Your speed. {{slow clapping}}


180B? Ouch. Gonna be a big one.

Imagine how good it can be with 180B.

I hope I can run it on 4 x 3090s.

How good?
So far Falcon 40B is worse than Llama 2 13B, so 180B may only reach the level of Llama 2 34B, or a bit better.

Edit:

HA! I was right:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

Falcon 180B is at the level of Llama 2 34B.

Wow, how did you all manage to complete it so incredibly quickly?!

@penut85420 I only got it uploaded a couple of hours ago, so it took me about 24 hours in total. Far too long :)

Had some problems overnight; I forgot the files would be >50GB so they failed to upload until I manually split them this morning.
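For context, the Hub enforces a 50GB per-file limit, so big checkpoints have to be split. A hedged sketch of one way to produce sub-limit shards using Transformers' own API; the local paths are made up, and this is not necessarily the script that was actually used here:

```python
# One way to re-save a checkpoint split into pieces under the Hub's
# 50GB per-file limit. Local directory names are illustrative, and
# loading a GPTQ checkpoint this way assumes the GPTQ backend
# packages (optimum + auto-gptq) are installed.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("./falcon-180b-gptq-local")

# max_shard_size caps the size of each saved weight file; 10GB matches
# the shard size mentioned later in this thread.
model.save_pretrained("./falcon-180b-gptq-sharded", max_shard_size="10GB")
```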

> I hope I can run it on 4 x 3090s.

Haha, that's impossible. You'd need 300+ GB of RAM for that. :))

Correction about RAM:
You will need at least 400GB of memory to run inference with Falcon-180B swiftly.
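For anyone checking those figures, a quick back-of-the-envelope for the weights alone; these are estimates, not measurements:

```python
# Rough memory math behind the figures above: weights only, so real
# usage is higher once activations and the KV cache are added.
N_PARAMS = 180e9  # Falcon-180B parameter count

def weights_gb(bits_per_param: float) -> float:
    """GB needed just to hold the weights at a given precision."""
    return N_PARAMS * bits_per_param / 8 / 1e9

print(f"fp16 : {weights_gb(16):.0f} GB")  # ~360 GB -> ~400 GB with overhead
print(f"8-bit: {weights_gb(8):.0f} GB")   # ~180 GB
print(f"4-bit: {weights_gb(4):.0f} GB")   # ~90 GB, vs 96 GB total on 4 x 3090
```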

> How good?
> So far Falcon 40B is worse than Llama 2 13B, so 180B may only reach the level of Llama 2 34B, or a bit better.
>
> Edit:
>
> HA! I was right:
> https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
>
> Falcon 180B is at the level of Llama 2 34B.

It's hard to train a 180B model to unleash its potential. A lot of resources are required.

No, it should work - you only need enough RAM to load each piece into RAM before it goes to VRAM. And the model is now sharded (split into multiple smaller files).

Each piece is only 10GB, so in theory you only need 10GB RAM + whatever overhead there is.

As for 4 x 24GB - that won't be enough to load the 4-bit, but should be enough to load the 3-bit.

Give it a try @Pourfard and let us know! So far I've only tested it on 2 x A100 80GB and 6 x L40 48GB.

Note: the model has to be loaded with Transformers directly (not AutoGPTQ), or Text Generation Inference. Loading with AutoGPTQ, or clients that use AutoGPTQ, won't currently work due to the sharding. If you're using text-generation-webui, it should work using the Transformers loader, though I've not tested that yet myself.
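To make that loading path concrete, a minimal sketch of the Transformers route described above. The repo id and commented-out branch name are assumptions, `device_map="auto"` needs `accelerate` installed, and the GPTQ backend packages still have to be present even though you load through plain Transformers:

```python
# Sketch of loading the sharded GPTQ checkpoint with plain Transformers
# (not the AutoGPTQ loader), spreading the shards across all GPUs.
# Repo id and branch name are assumptions, not verified.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Falcon-180B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # shard the layers across all visible GPUs
    # revision="gptq-3bit...",  # placeholder: pick a 3-bit branch if 4-bit doesn't fit
)

inputs = tokenizer("Falcon 180B is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```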

> It's hard to train a 180B model to unleash its potential. A lot of resources are required.

The sequence length of 2048 is also disappointing. And I don't think RoPE scaling works with Falcon yet (though I might be wrong - I haven't checked whether that was added in Transformers 4.33.0).
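For comparison, this is how RoPE scaling is requested on architectures that already support it in Transformers, shown here with a Llama 2 config; whether Falcon honors the same key is exactly the open question above, so treat its applicability to Falcon as an assumption:

```python
# rope_scaling as supported for Llama-style models in Transformers.
# Falcon support is the open question raised above.
from transformers import AutoConfig, AutoModelForCausalLM

name = "meta-llama/Llama-2-13b-hf"
config = AutoConfig.from_pretrained(name)

# Linear scaling by 2.0 stretches the trained context window 2x.
config.rope_scaling = {"type": "linear", "factor": 2.0}

model = AutoModelForCausalLM.from_pretrained(name, config=config)
```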

Falcon was promising a few months ago (and maybe will be again in the future), but now it seems obsolete compared to the Llama 2 variants; a 70B Llama 2 easily beats this 180B model.
Meta said they are going to release Llama 3 in the near future, and it should be as powerful as GPT-4. Not to mention they also said they have already started working on Llama 4.

> Falcon was promising a few months ago (and maybe will be again in the future), but now it seems obsolete compared to the Llama 2 variants; a 70B Llama 2 easily beats this 180B model.
> Meta said they are going to release Llama 3 in the near future, and it should be as powerful as GPT-4. Not to mention they also said they have already started working on Llama 4.

Base Llama 2 was already beaten by this 180B model. The 70B Llama 2 variants are fine-tuned models, while this 180B Falcon is a bare base model; once it gets further trained and fine-tuned, you will see how capable it actually is.

> Base Llama 2 was already beaten by this 180B model. The 70B Llama 2 variants are fine-tuned models, while this 180B Falcon is a bare base model; once it gets further trained and fine-tuned, you will see how capable it actually is.

But how would you run it? You need insane amounts of RAM to even run this model, let alone fine-tune it.

> But how would you run it? You need insane amounts of RAM to even run this model, let alone fine-tune it.

I will just use an HP ProLiant DL360 Gen10 server with 400GB of RAM. I don't have that much RAM yet, but I will add it in the near future. We can also use a swap file to increase the amount of memory available, though with extremely slow performance.
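For completeness, the swap-file route is the standard Linux procedure; a sketch wrapping it below. Root is required, the path and size are illustrative, and as noted above, swapping model weights is extremely slow:

```python
# Standard Linux swap-file setup, wrapped in Python for reference.
# Requires root; SWAP_FILE and SWAP_SIZE are illustrative values.
import subprocess

SWAP_FILE = "/swapfile"
SWAP_SIZE = "256G"

for cmd in (
    ["fallocate", "-l", SWAP_SIZE, SWAP_FILE],  # reserve the space
    ["chmod", "600", SWAP_FILE],                # swap must not be world-readable
    ["mkswap", SWAP_FILE],                      # format it as swap
    ["swapon", SWAP_FILE],                      # enable it
):
    subprocess.run(cmd, check=True)
```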

What was your tokens/sec on the 180B on the 2 x A100s?
