# Export and Push
## Merge LoRA
- See here.
## Quantization
SWIFT supports quantization exports for AWQ, GPTQ, FP8, and BNB models. AWQ and GPTQ require a calibration dataset, which yields better quantization performance but takes longer to quantize; FP8 and BNB do not require a calibration dataset and are quicker to quantize (see the example command after the table below).
| Quantization Technique | Multimodal | Inference Acceleration | Continued Training |
| --- | --- | --- | --- |
| GPTQ | ✅ | ✅ | ✅ |
| AWQ | ✅ | ✅ | ✅ |
| BNB | ❌ | ✅ | ✅ |
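Because BNB skips calibration, a BNB export is a single command. The following is a minimal sketch, assuming the `--quant_method` and `--quant_bits` flags of `swift export`; the model ID and output directory are placeholders:

```shell
# BNB needs no calibration dataset, so no --dataset flag is passed.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --quant_bits 4 \
    --quant_method bnb \
    --output_dir Qwen2.5-7B-Instruct-BNB-Int4
```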
In addition to installing SWIFT, the following dependencies are required:
```shell
# For AWQ quantization:
# The versions of autoawq and CUDA are correlated; choose a version according to https://github.com/casper-hansen/AutoAWQ.
# If there are dependency conflicts with torch, add the --no-deps option.
pip install autoawq -U

# For GPTQ quantization:
# The versions of auto_gptq and CUDA are correlated; choose a version according to https://github.com/PanQiWei/AutoGPTQ#quick-installation.
pip install auto_gptq optimum -U

# For BNB quantization:
pip install bitsandbytes -U
```
We provide a series of scripts to demonstrate SWIFT's quantization export capabilities (an example command follows this list):
- Supports AWQ/GPTQ/BNB quantization exports.
- Multimodal quantization: supports quantizing multimodal models with GPTQ and AWQ; AWQ covers only a limited set of multimodal models. Refer to here.
- More model series: supports quantization exports for BERT and Reward Models.
- Models quantized and exported with SWIFT support inference acceleration with vllm/sglang/lmdeploy, as well as further SFT/RLHF with QLoRA.
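For comparison with the BNB example above, a calibration-based export (AWQ here) additionally passes a calibration dataset. A hedged sketch under the same flag assumptions; the dataset ID and the `#500` sample-count suffix are placeholders:

```shell
# AWQ/GPTQ require a calibration dataset, passed via --dataset.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-en#500' \
    --quant_bits 4 \
    --quant_method awq \
    --output_dir Qwen2.5-7B-Instruct-AWQ-Int4
```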
## Push Model
SWIFT supports re-pushing trained or quantized models to ModelScope and Hugging Face. It pushes to ModelScope by default; specify `--use_hf true` to push to Hugging Face instead.
```shell
swift export \
    --model output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<model-id>' \
    --hub_token '<sdk-token>' \
    --use_hf false
```
Tips:
- You can use `--model <checkpoint-dir>` or `--adapters <checkpoint-dir>` to specify the checkpoint directory to be pushed; in the model-pushing scenario there is no difference between the two.
- When pushing to ModelScope, make sure you have registered a ModelScope account. Your SDK token can be obtained from this page. Ensure that the account associated with the SDK token has edit permissions for the organization corresponding to the model_id. Pushing automatically creates a model repository for the model_id if it does not already exist; you can use `--hub_private_repo true` to make that repository private.
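As an illustration of the first tip, pushing a LoRA checkpoint only swaps `--model` for `--adapters`; combined with the private-repo flag, a sketch might look like this (the model ID and token are placeholders):

```shell
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<model-id>' \
    --hub_token '<sdk-token>' \
    --hub_private_repo true
```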