
Add to HF Inference APIs

#4
by mrfakename - opened

Hi,
It would be really useful to be able to use this through the Hugging Face Inference APIs (which would require this model to be compatible with Transformers). Are there any plans to add Transformers support to the model?
Thanks!

cc @reach-vb

Seconding this. Please have this model Transformers-ified. I would like to release a GPTQ quant for it, but that needs an HF Transformers-compatible model.
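
For reference, once a Transformers-compatible checkpoint exists, the GPTQ step itself is routine. A minimal AutoGPTQ sketch, assuming a hypothetical repo id `nvidia/Nemotron-4-340B-Instruct-HF` (no such repo exists yet) and a toy one-sample calibration set:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Hypothetical repo id -- no HF-format checkpoint has been published yet.
model_id = "nvidia/Nemotron-4-340B-Instruct-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# A real run needs hundreds of calibration samples; one toy example here.
examples = [tokenizer("GPTQ calibrates on sample text like this.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("Nemotron-4-340B-Instruct-GPTQ")
```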

Someone made the tokenizer Hugging Face compatible, but I'm not sure how much that helps if the weights themselves are only available in the NeMo format: https://huggingface.co/Xenova/Nemotron-4-340B-Instruct-Tokenizer
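
(The tokenizer at least loads with stock Transformers already; a quick sanity check, assuming nothing beyond what's in that repo:)

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Xenova/Nemotron-4-340B-Instruct-Tokenizer")
print(tokenizer("Hello, Nemotron!")["input_ids"])
```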

Working on this here: https://huggingface.co/failspy/Nemotron-4-340B-Instruct-SafeTensors

It's still lacking an HF Transformers class as of now -- I'm working on that part if anyone wants to help. The weights are ported to a layout similar to Llama-3's architecture (though not a perfect match; for example, the QKV projection is not split), along with a plausible hypothetical config.json. It also includes the tokenizer from @Xenova.
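
For anyone who wants to pick up the QKV split, it's mechanical once the layout is known. A rough sketch, assuming a plain [Q | K | V] row stacking and GQA dimensions as reported for Nemotron-4-340B (treat both as assumptions; NeMo checkpoints may interleave Q/K/V per KV group, so verify the ordering first):

```python
import torch

# Dimensions as reported for Nemotron-4-340B; treat as assumptions.
hidden_size = 18432
num_q_heads = 96       # query heads
num_kv_heads = 8       # GQA: fewer key/value heads
head_dim = hidden_size // num_q_heads  # 192

q_rows = num_q_heads * head_dim
kv_rows = num_kv_heads * head_dim

# Stand-in for the fused weight; rows assumed stacked as [Q | K | V].
fused_qkv = torch.randn(q_rows + 2 * kv_rows, hidden_size)

q_proj, k_proj, v_proj = torch.split(fused_qkv, [q_rows, kv_rows, kv_rows], dim=0)
print(q_proj.shape, k_proj.shape, v_proj.shape)
```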

NVIDIA org

Hi all -- regarding inference APIs, you can use the model on https://build.nvidia.com/nvidia/nemotron-4-340b-instruct. There's an interactive widget there as well as an API you can use.
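
For anyone scripting against it rather than using the widget: the API catalog endpoints are OpenAI-compatible, so a call looks roughly like the sketch below (`NVIDIA_API_KEY` is the key generated on build.nvidia.com; double-check the endpoint and model id against the page's own snippet):

```python
import os
from openai import OpenAI

# OpenAI-compatible endpoint for the NVIDIA API catalog.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="nvidia/nemotron-4-340b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    temperature=0.2,
    max_tokens=128,
)
print(completion.choices[0].message.content)
```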

@nealv I think one of the main reasons people would like the model released in HF format is to more easily create quantizations with the intent of running inference on a local stack.

Anything in the works from your team that might assist with that effort?

NVIDIA org

@ZQ-Dev yep, we're working on it. As @failspy pointed out, we'd need to modify and upstream the model class as well.
We're looking at fp8 quantization too. Hopefully that will make it easier to deploy.
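
(To be clear on what fp8 buys: a production path would go through something like TensorRT-LLM, but the core of per-tensor fp8 weight quantization is just scale-and-cast. A minimal PyTorch sketch, not the actual pipeline:)

```python
import torch

def quantize_fp8_e4m3(w: torch.Tensor):
    """Per-tensor fp8 (e4m3) quantization: scale into range, cast, keep the scale."""
    # The e4m3 format's representable max is 448; scale the tensor to fit.
    scale = w.abs().max().clamp(min=1e-12) / 448.0
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)
w_fp8, scale = quantize_fp8_e4m3(w)
print((dequantize(w_fp8, scale) - w).abs().max())  # quantization error
```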

@nealv +1 to fp8, as 8xA100 nodes are much more readily available than 16x at this time.
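
(Back-of-the-envelope: 340B parameters at 2 bytes each in bf16 is ~680 GB of weights alone, more than the 640 GB of an 8x A100-80GB node, hence the 16x requirement today; at 1 byte per parameter in fp8 that drops to ~340 GB, leaving headroom for the KV cache on 8 GPUs.)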

There's now a paid bounty to get this closed ASAP: $175 and growing.
https://x.com/natolambert/status/1814735390877884823
