Quantized with llmcompressor 0.9.0 using the following recipe:

```yaml
default_stage:
  default_modifiers:
    QuantizationModifier:
      targets: [Linear]
      ignore: ['re:.*lm_head', 'lm_head']
      scheme: FP8_DYNAMIC
```
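For reference, a recipe like the one above is typically applied with llmcompressor's `oneshot` API. The sketch below is hypothetical (the exact script used for this checkpoint was not published); the model ID and save path are assumptions, and the heavy work is kept inside a function so the recipe arguments can be inspected without a GPU.

```python
def build_recipe_kwargs():
    # Mirrors the recipe above: dynamic FP8 on all Linear layers,
    # with the lm_head excluded from quantization.
    return {
        "targets": "Linear",
        "scheme": "FP8_DYNAMIC",
        "ignore": ["re:.*lm_head", "lm_head"],
    }


def quantize(model_id="Nanbeige/Nanbeige4-3B-Thinking-2511",
             save_dir="Nanbeige4-3B-Thinking-2511-FP8-DYNAMIC"):
    # Imports kept local: this function needs llmcompressor, transformers,
    # and enough memory to load the full-precision model.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import QuantizationModifier

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

    # FP8_DYNAMIC computes activation scales at runtime,
    # so no calibration dataset is required.
    recipe = QuantizationModifier(**build_recipe_kwargs())
    oneshot(model=model, recipe=recipe)

    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)
```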
You can deploy the model with sglang using the following command:

```shell
python3 -m sglang.launch_server \
  --model-path CHNtentes/Nanbeige4-3B-Thinking-2511-FP8-DYNAMIC \
  --host 0.0.0.0 \
  --port 9999 \
  --trust-remote-code \
  --mem-fraction-static 0.9 \
  --context-length 16384
```
Adjust `--context-length` to fit your available VRAM.
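Once the server is up, it can be queried through sglang's OpenAI-compatible HTTP API. A minimal stdlib-only client sketch, assuming the host and port from the launch command above:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9999/v1"  # matches --port 9999 above
MODEL = "CHNtentes/Nanbeige4-3B-Thinking-2511-FP8-DYNAMIC"


def build_chat_payload(prompt, model=MODEL, max_tokens=512):
    # Standard OpenAI chat-completions request body.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url=BASE_URL):
    # Requires a running sglang server; returns the assistant's reply text.
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```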
Model tree for CHNtentes/Nanbeige4-3B-Thinking-2511-FP8-DYNAMIC:
- Base model: Nanbeige/Nanbeige4-3B-Base
- Finetuned: Nanbeige/Nanbeige4-3B-Thinking-2511