Export to ONNX
Deploying 🤗 Transformers models in production often requires, or at least benefits from, exporting them to a serialized format that can be loaded and executed on specialized runtimes and hardware.
🤗 Optimum is an extension of Transformers that provides an exporters module for exporting models from PyTorch or TensorFlow to serialized formats such as ONNX and TFLite. 🤗 Optimum also provides performance optimization tools for training and running models on target hardware with maximum efficiency.
This guide shows how to export 🤗 Transformers models to ONNX with 🤗 Optimum. For exporting models to TFLite, see the Export to TFLite page.
Export to ONNX
ONNX (Open Neural Network eXchange) is an open standard that defines a common set of operators and a common file format for representing deep learning models across a wide variety of frameworks, including PyTorch and TensorFlow. When a model is exported to the ONNX format, these operators are used to build a computational graph (often called an "intermediate representation") that represents the flow of data through the neural network.
By exposing a graph with standardized operators and data types, ONNX makes it easy to switch between frameworks. For example, a model trained in PyTorch can be exported to ONNX format and then imported in TensorFlow (and vice versa).
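To make the idea of a computational graph built from a shared operator set more concrete, here is a minimal toy sketch in plain Python. It is not the ONNX file format or any ONNX API: the `OPERATORS` table, the `GRAPH` list, and `run_graph` are hypothetical names invented for illustration. Only the operator names (`MatMul`, `Add`, `Relu`) are borrowed from the ONNX operator set.

```python
# Toy operator table. Real ONNX defines a much larger standardized set;
# these three implementations are simplified stand-ins on nested lists.
OPERATORS = {
    "MatMul": lambda a, b: [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a
    ],
    "Add": lambda a, b: [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)],
    "Relu": lambda a: [[max(0.0, x) for x in row] for row in a],
}

# Each node is (operator name, input names, output name), in topological order.
GRAPH = [
    ("MatMul", ["x", "weight"], "h"),
    ("Add", ["h", "bias"], "z"),
    ("Relu", ["z"], "y"),
]


def run_graph(graph, feeds):
    """Evaluate the nodes in order, storing every intermediate value by name."""
    values = dict(feeds)
    for op, inputs, output in graph:
        values[output] = OPERATORS[op](*(values[name] for name in inputs))
    return values


feeds = {
    "x": [[1.0, 2.0]],
    "weight": [[0.5, -1.0], [0.25, 2.0]],
    "bias": [[-2.0, 0.5]],
}
result = run_graph(GRAPH, feeds)["y"]
print(result)  # [[0.0, 3.5]]
```

Because the graph only names operators and data dependencies, any runtime that implements the same operators can evaluate it, which is the property that lets ONNX models move between frameworks.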
Once exported to ONNX format, a model can be:
- optimized for inference via techniques such as graph optimization and quantization.
- run with ONNX Runtime via ORTModelForXXX classes, which follow the same AutoModel API you are used to in 🤗 Transformers.
- run with optimized inference pipelines, which have the same API as the pipeline() function in 🤗 Transformers.
🤗 Optimum supports the ONNX export by leveraging configuration objects. These configuration objects come ready-made for many model architectures and are designed to be easily extended to other architectures.
For the list of ready-made configurations, see the 🤗 Optimum documentation.
There are two ways to export a 🤗 Transformers model to ONNX. Both are shown below:
- export with 🤗 Optimum via CLI.
- export with 🤗 Optimum with optimum.onnxruntime.
Exporting a 🤗 Transformers model to ONNX with CLI
To export a 🤗 Transformers model to ONNX, first install an extra dependency:
pip install optimum[exporters]
To check out all available arguments, refer to the 🤗 Optimum documentation, or view the help on the command line:
optimum-cli export onnx --help
To export a model's checkpoint from the 🤗 Hub, for example distilbert-base-uncased-distilled-squad, run the following command:
optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/
You should see logs that indicate progress and show where the resulting model.onnx is saved, like this:
Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
    -[✓] ONNX model output names match reference model (start_logits, end_logits)
    - Validating ONNX Model output "start_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
    - Validating ONNX Model output "end_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx
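The two checks reported per output in the log (shapes match, all values close within atol) can be sketched in plain Python as follows. This is only an illustration of the idea and not Optimum's actual validation code, which may differ. The function name `validate_outputs` and the sample numbers are made up for this example.

```python
def validate_outputs(reference, exported, atol=1e-4):
    """Return (shapes_match, values_close) for two nested lists of floats."""

    def shape(tensor):
        # Walk down the first element of each nesting level to read the dims.
        dims = []
        while isinstance(tensor, list):
            dims.append(len(tensor))
            tensor = tensor[0]
        return tuple(dims)

    def flatten(tensor):
        if isinstance(tensor, list):
            for item in tensor:
                yield from flatten(item)
        else:
            yield tensor

    shapes_match = shape(reference) == shape(exported)
    # Values are "close" when every element differs by at most atol.
    values_close = shapes_match and all(
        abs(r - e) <= atol for r, e in zip(flatten(reference), flatten(exported))
    )
    return shapes_match, values_close


# Hypothetical outputs of the reference model and the exported ONNX model.
reference_logits = [[0.12, -1.50, 3.20], [0.00, 2.25, -0.75]]
exported_logits = [[0.12003, -1.49998, 3.20001], [0.00002, 2.25, -0.75004]]
report = validate_outputs(reference_logits, exported_logits)
print(report)  # (True, True)
```

Small numerical differences between the original and exported model are expected, which is why the comparison uses an absolute tolerance rather than exact equality.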
The example above exports a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that the model's weights and tokenizer files are saved in the same directory (local_path). When using the CLI, pass local_path to the model argument instead of the checkpoint name on the 🤗 Hub, and provide the --task argument. You can review the list of supported tasks in the 🤗 Optimum documentation. If the task argument is not provided, it defaults to the model architecture without any task-specific head.
optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
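If you want to script the two CLI variants shown so far (Hub checkpoint without --task, local directory with --task), a small helper can assemble the argument list. This is a sketch: `build_export_command` is a hypothetical helper, not part of 🤗 Optimum, and it uses only the flags that appear in the commands above.

```python
import shlex


def build_export_command(model, output_dir, task=None):
    """Assemble the optimum-cli argument list for an ONNX export."""
    command = ["optimum-cli", "export", "onnx", "--model", model]
    if task is not None:
        # Recommended for local checkpoints, as described above.
        command += ["--task", task]
    command.append(output_dir)
    return command


hub_command = build_export_command(
    "distilbert-base-uncased-distilled-squad",
    "distilbert_base_uncased_squad_onnx/",
)
local_command = build_export_command(
    "local_path",
    "distilbert_base_uncased_squad_onnx/",
    task="question-answering",
)
print(shlex.join(local_command))
# To actually run the export (requires optimum[exporters] to be installed):
# import subprocess; subprocess.run(local_command, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when a local path contains spaces.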
The resulting model.onnx file can then be run on one of the many accelerators that support the ONNX standard. For example, you can load and run the model with ONNX Runtime as follows:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
>>> outputs = model(**inputs)
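The question-answering model returns start_logits and end_logits (the output names listed in the validation log above). As a rough sketch of how such scores can be turned into an answer span, the plain-Python example below picks the highest-scoring pair with start <= end. The tokens and logit values are invented for illustration, `best_span` is a hypothetical helper, and real post-processing (for example in the pipeline() function) handles more cases than this.

```python
def best_span(start_logits, end_logits):
    """Return the (start, end) token indices with the highest combined score."""
    best = None
    for start, start_score in enumerate(start_logits):
        # Only consider spans that end at or after their start position.
        for end in range(start, len(end_logits)):
            score = start_score + end_logits[end]
            if best is None or score > best[0]:
                best = (score, start, end)
    return best[1], best[2]


# Hypothetical tokens and per-token scores for a single example.
tokens = ["using", "distilbert", "with", "onnx", "runtime"]
start_logits = [0.1, 2.3, 0.2, 1.1, 0.3]
end_logits = [0.0, 0.4, 0.1, 0.9, 2.8]

span = best_span(start_logits, end_logits)
answer = " ".join(tokens[span[0] : span[1] + 1])
print(span, answer)  # (1, 4) distilbert with onnx runtime
```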
The process is identical for TensorFlow checkpoints on the 🤗 Hub. For example, here is how to export a pure TensorFlow checkpoint from the Keras organization:
optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/
Exporting a 🤗 Transformers model to ONNX with optimum.onnxruntime
As an alternative to the CLI, you can export a 🤗 Transformers model to ONNX programmatically, like so:
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer
>>> model_checkpoint = "distilbert_base_uncased_squad"
>>> save_directory = "onnx/"
>>> # Load a model from transformers and export it to ONNX
>>> ort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
>>> # Save the onnx model and tokenizer
>>> ort_model.save_pretrained(save_directory)
>>> tokenizer.save_pretrained(save_directory)
Exporting a model for an unsupported architecture
If you wish to contribute support for a model that cannot currently be exported, first check whether it is supported in optimum.exporters.onnx. If it is not, contribute to 🤗 Optimum directly.
Exporting a model with transformers.onnx
transformers.onnx is no longer maintained. Export models with 🤗 Optimum as described above instead. This section will be removed in a future version.
To export a 🤗 Transformers model to ONNX with transformers.onnx, install the extra dependencies:
pip install transformers[onnx]
Use the transformers.onnx package as a Python module to export a checkpoint using a ready-made configuration:
python -m transformers.onnx --model=distilbert-base-uncased onnx/
This exports an ONNX graph of the checkpoint defined by the --model argument. You can pass any checkpoint on the 🤗 Hub or one that is stored locally. The resulting model.onnx file can then be run on one of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
The required output names (for example ["last_hidden_state"]) can be obtained by looking at the ONNX configuration of each model. For DistilBERT, for example:
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
The process is identical for pure TensorFlow checkpoints on the Hub. For example:
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
To export a model that is stored locally, save the model's weights and tokenizer files in the same directory (for example local-pt-checkpoint), then export it to ONNX by pointing the --model argument of the transformers.onnx package to that directory:
python -m transformers.onnx --model=local-pt-checkpoint onnx/