Why can we not make this fully HF ready?

#11 opened by CUIGuy

Loading cerebras/btlm-3b-8k-base requires to execute some code in that repo, you can inspect the content of the repository at https://hf.co/cerebras/btlm-3b-8k-base. You can dismiss this prompt by passing trust_remote_code=True.

It would be nice if people could just run it without checking these details.
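For reference, the prompt can be dismissed by passing the flag directly to `from_pretrained`; a minimal sketch (do inspect the repo code first):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers execute the custom model class
# shipped inside the cerebras/btlm-3b-8k-base repo
tokenizer = AutoTokenizer.from_pretrained("cerebras/btlm-3b-8k-base")
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    trust_remote_code=True,
)
```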

@CUIGuy I agree :) One of the constraints we had is that Hugging Face does not support the muP implementation that we shipped with our BTLM model as a custom class. We believe it can greatly benefit your fine-tuning regime. Once muP is fully adopted by Hugging Face, it should just work without any additional flags. We are in close communication with Hugging Face, so we believe this should happen soon.

Is there more information on your claim that the muP implementation will benefit fine-tuning? I can understand it being useful for pretraining. @daria-soboleva

Also, will this be compatible with something like vLLM down the road?

@CUIGuy we are releasing our paper soon with all the details on how muP is helpful, but for now feel free to take a look at https://arxiv.org/abs/2304.03208 or https://github.com/microsoft/mup for details on how it works. At a high level, it should drastically reduce the number of hyperparameter (HP) tuning experiments you need: you can tune HPs at a smaller scale and zero-shot transfer those values to a larger scale, saving the compute needed to find the best HPs at the large scale.
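To make the transfer workflow concrete, here is a minimal sketch loosely following the microsoft/mup README; the MLP, widths, and learning rate are illustrative assumptions, not BTLM's actual setup:

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width, d_in=64, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, width)
        self.fc2 = nn.Linear(width, width)
        # MuReadout replaces the final nn.Linear so the output layer
        # scales correctly with width under muP
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        return self.readout(self.fc2(self.fc1(x)).relu())

# A small "base" model and a wider "delta" model tell mup which
# dimensions are width-like; the target model can then be any width.
base_model = MLP(width=64)
delta_model = MLP(width=128)
model = MLP(width=4096)
set_base_shapes(model, base_model, delta=delta_model)
# (real code would also re-initialize weights with mup's init replacements)

# MuAdam rescales per-parameter learning rates according to muP, so an
# lr tuned on a small-width proxy model transfers to this wide model.
optimizer = MuAdam(model.parameters(), lr=1e-3)
```

With this in place, hyperparameters tuned on the small proxy can be reused at the large width, which is the zero-shot transfer described above.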

Thank you for the suggestion to support vLLM. For now we have support on HF, but if there is more demand for adding it to the vLLM codebase, we can certainly do that :)

Thanks. By the way, do you have a timeline for when the HF version will be ready? Also, when will it be on https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard? (The HF version would be better for that.)

I guess you mean when the HF version without the trust_remote_code flag will be ready? I would imagine in the next few months, but unfortunately I cannot give a more concrete deadline.

OK. Meanwhile, is it possible to release a frozen version (without muP) so that we can use it without poking into it? Many people only care about fine-tuning a model of a specific size, so muP is not that useful for them. @daria-soboleva

Hi @CUIGuy, thanks for your interest! I don't believe HF currently supports SwiGLU and ALiBi in the GPT2 model class that we use (though maybe I've missed an alternative), so even without muP, a custom class and trust_remote_code may be required for models with the BTLM architecture.
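For readers unfamiliar with those two pieces, here is a generic sketch of a SwiGLU feed-forward block and an ALiBi bias; names and sizes are illustrative, not BTLM's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    # SwiGLU feed-forward: gate the up-projection with SiLU, then project down
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # ALiBi replaces positional embeddings with a per-head linear distance
    # penalty on attention logits; slopes follow the geometric sequence from
    # the ALiBi paper (assumes n_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)  # 0 for self/future keys
    return slopes[:, None, None] * distance[None, :, :]    # (heads, query, key)
```

The returned bias would be added to the attention logits before the softmax; since neither module exists in HF's GPT2 class, they have to live in custom model code.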
