Converting to litertlm with the STATIC_WI8_AI16 quantization recipe requires QSVs

#1
by 4ntoine - opened

I'm trying to convert this model to a litertlm model:

litert-torch export_hf \
    --model=$model \
    --output_dir="./static_wi8_ai16" \
    --quantization_recipe="static_wi8_ai16" \
    --bundle_litert_lm=true

but the export fails at the "Quantize model" step:

./convert.sh
W0523 12:00:38.290000 4684 torch/distributed/elastic/multiprocessing/redirects.py:35] NOTE: Redirects are currently not supported in MacOs.
W0523 12:00:38.305000 4684 torch/utils/_pytree.py:630] <enum 'KernelPreference'> is an Enum subclass and is now natively supported by torch.compile as an opaque value type. Calling register_constant() on Enum subclasses is deprecated and will be an error in a future release.
W0523 12:00:39.225000 4684 torch/utils/_pytree.py:630] <enum 'ScaleCalculationMode'> is an Enum subclass and is now natively supported by torch.compile as an opaque value type. Calling register_constant() on Enum subclasses is deprecated and will be an error in a future release.
============== Export Configuration ==============
aot_backend            : None
aot_compilation_config_dict : None
aot_soc_model          : None
auto_model_override    : None
batch_size             : 1
bundle_litert_lm       : 'true'
cache_implementation   : 'LiteRTLMCache'
cache_length           : 4096
cache_length_dim       : None
enable_dynamic_shape   : False
experimental_lightweight_conversion : False
experimental_use_mixed_precision : False
export_vision_encoder  : False
externalize_embedder   : False
externalize_rope       : False
extra_kwargs           : {}
jinja_chat_template_override : None
k_ts_idx               : 2
keep_temporary_files   : False
litert_lm_llm_metadata_override : None
litert_lm_model_type_override : None
model                  : 'Qwen/Qwen2.5-Coder-3B-Instruct'
output_dir             : './static_wi8_ai16'
prefill_length_dim     : None
prefill_lengths        : [128]
quantization_recipe    : 'static_wi8_ai16'
single_token_embedder  : False
split_cache            : False
task                   : <ExportTask.TEXT_GENERATION: 'text_generation'>
trust_remote_code      : False
use_jinja_template     : True
v_ts_idx               : 3
vision_encoder_quantization_recipe : 'dynamic_wi8_afp32'
work_dir               : './static_wi8_ai16/tmptlcjhpwr'
==================================================
(00:00) [START] LiteRT GenAI Export
(00:00) [START] LiteRT GenAI Export > Load source model
Loading weights: 100%|██████████| 434/434 [00:01<00:00, 266.46it/s]
(00:05) [ DONE] LiteRT GenAI Export > Load source model (+00:05)
(00:05) [START] LiteRT GenAI Export > Export text prefill-decode model
(00:05) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert
(00:05) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: prefill_128
(00:07) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: prefill_128 > ExportedProgram Run Decompositions
(00:10) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: prefill_128 > ExportedProgram Run Decompositions (+00:03)
(00:10) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: prefill_128 (+00:05)
(00:10) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: decode
(00:12) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: decode > ExportedProgram Run Decompositions
(00:15) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: decode > ExportedProgram Run Decompositions (+00:03)
(00:15) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Torch Export: decode (+00:05)
(00:15) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run FX Passes
(00:15) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run FX Passes > ExportedProgram Run Decompositions
(00:15) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run FX Passes > ExportedProgram Run Decompositions (+00:00)
(00:16) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run FX Passes > ExportedProgram Run Decompositions
(00:16) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run FX Passes > ExportedProgram Run Decompositions (+00:00)
(00:16) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run FX Passes (+00:00)
(00:16) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128
(00:16) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128 > ExportedProgram Run Decompositions
(00:20) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128 > ExportedProgram Run Decompositions (+00:03)
(00:20) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128 > ExportedProgram Run Decompositions
(00:20) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128 > ExportedProgram Run Decompositions (+00:00)
(00:20) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128 > Create MLIR Module
(00:26) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128 > Create MLIR Module (+00:06)
(00:26) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: prefill_128 (+00:10)
(00:26) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode
(00:26) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode > ExportedProgram Run Decompositions
(00:30) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode > ExportedProgram Run Decompositions (+00:03)
(00:30) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode > ExportedProgram Run Decompositions
(00:30) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode > ExportedProgram Run Decompositions (+00:00)
(00:30) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode > Create MLIR Module
(00:33) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode > Create MLIR Module (+00:03)
(00:33) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Lower to MLIR: decode (+00:06)
(00:33) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Merge MLIR Modules
(00:33) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Merge MLIR Modules (+00:00)
(00:33) [START] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run LiteRT Converter Passes
(02:52) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert > Run LiteRT Converter Passes (+02:19)
(02:52) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > LiteRT-Torch Convert (+02:47)
(02:54) [START] LiteRT GenAI Export > Export text prefill-decode model > Write Model to ./static_wi8_ai16/tmptlcjhpwr/model.tflite
Module size is greater than 2GB
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1779519814.308279  269705 flatbuffer_export.cc:4346] Estimated count of arithmetic ops: 719.641 G  ops, equivalently 359.820 G  MACs
(03:01) [ DONE] LiteRT GenAI Export > Export text prefill-decode model > Write Model to ./static_wi8_ai16/tmptlcjhpwr/model.tflite (+00:06)
(03:02) [START] LiteRT GenAI Export > Export text prefill-decode model > Quantize model
(03:02) [ FAIL] LiteRT GenAI Export > Export text prefill-decode model > Quantize model
(03:02) [ FAIL] LiteRT GenAI Export > Export text prefill-decode model
(03:02) [ FAIL] LiteRT GenAI Export
Traceback (most recent call last):
  File "/opt/homebrew/bin/litert-torch", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/litert_torch/cli.py", line 30, in main
    fire.Fire(CLI())
  File "/opt/homebrew/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/litert_torch/generative/export_hf/export.py", line 194, in export
    exported_model_artifacts = run_export_tasks(
                               ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.15_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/litert_torch/generative/export_hf/export.py", line 67, in run_export_tasks
    exported_model_artifacts = export_task(
                               ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.15_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/litert_torch/generative/export_hf/core/export_lib.py", line 353, in export_text_prefill_decode_model
    model_path = maybe_quantize_model(model_path, recipe)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/litert_torch/generative/export_hf/core/export_lib.py", line 369, in maybe_quantize_model
    return quantize_model(model_path, quantization_recipe)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.15_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/litert_torch/generative/export_hf/core/export_lib.py", line 394, in quantize_model
    qt.quantize().export_model(quantized_model_path, overwrite=True)
    ^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/ai_edge_quantizer/quantizer.py", line 470, in quantize
    quant_params = self._get_quantization_params(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/ai_edge_quantizer/quantizer.py", line 562, in _get_quantization_params
    return params_generator_instance.generate_quantization_parameters(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/ai_edge_quantizer/params_generator.py", line 91, in generate_quantization_parameters
    raise RuntimeError(
RuntimeError: Model quantization statistics values (QSVs) are required for the input recipe. This can be obtained by running calibration on sample dataset.

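Does anybody know how to provide the QSVs (calibration data) for this recipe when using litert-torch export_hf?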
