Solar-Open-100B-NVFP4A16
This repository contains an NVFP4-quantized build of upstage/Solar-Open-100B, together with a Furiosa Executable Bundle (FXB) for running it on FuriosaAI RNGD with Furiosa-LLM. The base model also runs on other frameworks (such as vLLM, SGLang, and Transformers); for usage with those, see the upstream upstage/Solar-Open-100B model card.
Overview
Solar-Open-100B is a large-scale open-weight language model developed by Upstage. It is an auto-regressive Mixture-of-Experts (MoE) transformer that supports English and Korean, and it handles both reasoning and non-reasoning chat as well as tool (function) calling. The intended use of this quantized build is the same as that of the upstream upstage/Solar-Open-100B, and it is released under the Upstage Solar License.
- Architecture: SolarOpen (Mixture-of-Experts)
- Input / Output: Text / Text
- Supported Inference Engine: Furiosa-LLM
- Supported Hardware: FuriosaAI RNGD
Quantization
The weights are quantized to NVFP4 (4-bit floating point), while activations and the KV cache remain in 16-bit precision (NVFP4A16).
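As a rough illustration of what weight-only 4-bit quantization saves, the sketch below estimates weight storage before and after quantization. Every constant is an assumption rather than a figure from this repository: the parameter count is a nominal 100 billion taken only from the model name, every parameter is treated as quantized (in practice some layers are often kept in higher precision), and NVFP4 is modeled as 4-bit values with one 8-bit scale shared by each block of 16 values.

```python
# Back-of-the-envelope weight-storage estimate. All constants are assumptions
# made for illustration; none are measured from this repository.
NOMINAL_PARAMS = 100e9   # assumed from the "100B" in the model name
BF16_BITS = 16           # 16-bit baseline weights
NVFP4_VALUE_BITS = 4     # 4-bit floating-point values
NVFP4_BLOCK_SIZE = 16    # assumed: number of values sharing one scale
NVFP4_SCALE_BITS = 8     # assumed: one 8-bit scale per block


def effective_bits_per_weight(value_bits, block_size, scale_bits):
    """Bits per weight once the per-block scale is amortized over the block."""
    return value_bits + scale_bits / block_size


def weight_gigabytes(num_params, bits_per_weight):
    """Storage in gigabytes (10**9 bytes) for the given parameter count."""
    return num_params * bits_per_weight / 8 / 1e9


nvfp4_bits = effective_bits_per_weight(
    NVFP4_VALUE_BITS, NVFP4_BLOCK_SIZE, NVFP4_SCALE_BITS
)
bf16_gb = weight_gigabytes(NOMINAL_PARAMS, BF16_BITS)
nvfp4_gb = weight_gigabytes(NOMINAL_PARAMS, nvfp4_bits)

print(f"effective bits per weight: {nvfp4_bits}")
print(f"16-bit weights: ~{bf16_gb:.0f} GB, NVFP4 weights: ~{nvfp4_gb:.0f} GB")
```

The estimate covers weights only. Activations and the KV cache stay in 16-bit precision, so their memory use is unchanged by this quantization scheme.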
Features
- Reasoning. Solar-Open is a reasoning model. Launch the server with --reasoning-parser solar_open to have the chain of thought returned in a separate field.
- Tool calling. The model supports tool (function) calling through the solar_open tool-call parser.
Parallelism Strategy
On RNGD, Solar-Open-100B-NVFP4A16 runs with a tensor-parallel size of 32, that is, across 32 processing elements (PEs). At 8 PEs per card, this corresponds to four RNGD cards.
Usage
To run this model with Furiosa-LLM, follow the example commands below after installing Furiosa-LLM and its prerequisites.
Launch the server
The simplest way to serve the model is:
# Launch the server, listening on port 8000 by default
furiosa-llm serve furiosa-ai/Solar-Open-100B-NVFP4A16 \
--reasoning-parser solar_open
The --reasoning-parser solar_open flag separates the model's chain of thought
from the final answer (see Reasoning below).
When the server is ready, you will see:
INFO: Started server process [27507]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Launch the server with tool calling
To enable tool (function) calling, start the server with the solar_open tool-call
parser:
furiosa-llm serve furiosa-ai/Solar-Open-100B-NVFP4A16 \
--reasoning-parser solar_open \
--enable-auto-tool-choice \
--tool-call-parser solar_open
Query the server
The server exposes an OpenAI-compatible API. You can send a request with curl:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "furiosa-ai/Solar-Open-100B-NVFP4A16",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}' \
| python -m json.tool
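
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the launch step is reachable at http://localhost:8000, and the helper names build_chat_request and send are illustrative, not part of Furiosa-LLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default port shown in the launch output
MODEL = "furiosa-ai/Solar-Open-100B-NVFP4A16"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and decode the JSON body; requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `send(build_chat_request("What is the capital of France?"))` returns the decoded response; in the Chat Completions format, the answer text is at `["choices"][0]["message"]["content"]`.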
Reasoning
With --reasoning-parser solar_open, Solar-Open returns its reasoning separately
from the final answer:
- response.choices[].message.reasoning (non-streaming)
- response.choices[].delta.reasoning (streaming)
Note: The reasoning field is not part of the OpenAI API specification, but it is the convention OpenAI recommends for returning the chain-of-thought (CoT) in Chat Completions-compatible APIs. The OpenAI Agents SDK uses reasoning as its primary property for the CoT, and many LLM serving frameworks (such as vLLM) follow the same convention. It appears only in responses that contain reasoning content; accessing it on a response without reasoning content raises an AttributeError.
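
Because the field may be absent, client code can read it defensively. The sketch below assumes a client that exposes messages and deltas as attribute-style objects (as the AttributeError in the note implies); the SimpleNamespace objects are stand-ins so the example runs without a server, and the helper names are illustrative.

```python
from types import SimpleNamespace


def split_message(message):
    """Return (reasoning, content); reasoning is None when the field is absent."""
    return getattr(message, "reasoning", None), message.content


def collect_stream(deltas):
    """Concatenate reasoning and content fragments from streamed deltas."""
    reasoning_parts, content_parts = [], []
    for delta in deltas:
        reasoning = getattr(delta, "reasoning", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


# Stand-in objects that mimic response.choices[].message with and without
# reasoning content.
with_reasoning = SimpleNamespace(reasoning="Paris is the capital.", content="Paris")
without_reasoning = SimpleNamespace(content="Paris")

print(split_message(with_reasoning))
print(split_message(without_reasoning))
```

Using getattr with a default avoids the AttributeError on responses without reasoning content, and the same pattern applies to each response.choices[].delta when streaming.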
Tool calling
With the server launched using --enable-auto-tool-choice --tool-call-parser solar_open,
you can pass tools and let the model decide when to call them. See the
Tool Calling guide
for a complete client example and details on tool-choice options.
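
As a brief sketch of what such a request might look like, the code below builds a Chat Completions payload with one tool and parses tool calls out of a response. It assumes the standard OpenAI tools schema; the get_weather tool, the helper names, and the sample response are all hypothetical and written by hand for illustration, not captured from the model.

```python
import json

MODEL = "furiosa-ai/Solar-Open-100B-NVFP4A16"

# Hypothetical tool definition in the OpenAI "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_payload(prompt, tools, model=MODEL):
    """Build a request body that lets the model decide whether to call a tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs; arguments arrive as a JSON string."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


# Hand-written sample response, shaped like a Chat Completions tool call.
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Seoul\"}"},
            }],
        }
    }]
}

payload = build_tool_payload("What is the weather in Seoul?", [WEATHER_TOOL])
print(json.dumps(payload, indent=2))
print(extract_tool_calls(sample_response))
```

The payload is sent to the same /v1/chat/completions endpoint as a plain chat request. After running the named tool, the client would normally append the result as a message with the role "tool" and call the endpoint again.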
Learn more
- Tool Calling: parsers, tool-choice options, and more examples
- Furiosa-LLM Server (furiosa-llm serve): full OpenAI-compatible API reference and serving options
- upstage/Solar-Open-100B: upstream model card