Export to ONNX [[export-to-onnx]]

Deploying 🤗 Transformers models in production often requires, or benefits from, exporting them to a serialized format that can be loaded and executed on specialized runtimes and hardware.

🤗 Optimum is an extension of Transformers. Its exporters module lets you export models from PyTorch or TensorFlow to serialized formats such as ONNX and TFLite. 🤗 Optimum also provides a set of performance optimization tools for training and running models on targeted hardware with maximum efficiency.

This guide shows how to export 🤗 Transformers models to ONNX with 🤗 Optimum. For a guide on exporting models to TFLite, refer to the Export to TFLite page.

Export to ONNX [[export-to-onnx]]

ONNX (Open Neural Network eXchange) is an open standard that defines a common set of operators and a common file format for representing deep learning models across a wide variety of frameworks, including PyTorch and TensorFlow. When a model is exported to the ONNX format, these operators are used to construct a computational graph (often called an _intermediate representation_) that represents the flow of data through the neural network.
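To make the idea of a computational graph concrete, here is a toy sketch in plain Python. It is not the ONNX API: the Node class, the OPERATORS table, and the run_graph function are hypothetical names invented for this illustration, and the three operators merely stand in for the much larger standardized operator set that ONNX defines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical mini operator set, for illustration only.
# A real ONNX graph draws its operators from the ONNX specification.
OPERATORS: Dict[str, Callable[..., float]] = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
    "Relu": lambda a: max(a, 0.0),
}


@dataclass
class Node:
    op: str            # operator name, looked up in OPERATORS
    inputs: List[str]  # names of the values this node consumes
    output: str        # name of the value this node produces


def run_graph(nodes: List[Node], feeds: Dict[str, float]) -> Dict[str, float]:
    """Evaluate the nodes in order, storing every intermediate value by name."""
    values = dict(feeds)
    for node in nodes:
        args = [values[name] for name in node.inputs]
        values[node.output] = OPERATORS[node.op](*args)
    return values


# y = relu(x * w + b), written as a flat list of operator nodes
graph = [
    Node("Mul", ["x", "w"], "xw"),
    Node("Add", ["xw", "b"], "pre_activation"),
    Node("Relu", ["pre_activation"], "y"),
]
result = run_graph(graph, {"x": 2.0, "w": -3.0, "b": 1.0})
print(result["y"])
```

Because the graph only names operators and values, any runtime that implements the same operators can execute it. That is the property that lets an exported model move between frameworks.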

By exposing a graph with standardized operators and data types, ONNX makes it easy to switch between frameworks. For example, a model trained in PyTorch can be exported to ONNX format and then imported in TensorFlow (and vice versa).

A model exported to ONNX format can then be loaded and run with an ONNX-compatible runtime such as ONNX Runtime, as shown later in this guide.

🤗 Optimum supports ONNX export through configuration objects. These configuration objects come ready-made for a number of model architectures and are designed to be easily extended to other architectures.

For the list of ready-made configurations, refer to the 🤗 Optimum documentation.

There are two ways to export a 🤗 Transformers model to ONNX. This guide shows both:

  • Export with 🤗 Optimum via the CLI
  • Export with 🤗 Optimum using optimum.onnxruntime

Exporting a 🤗 Transformers model to ONNX with CLI [[exporting-a-transformers-model-to-onnx-with-cli]]

To export a 🤗 Transformers model to ONNX, first install an extra dependency:

pip install optimum[exporters]

To see all available arguments, refer to the 🤗 Optimum documentation, or view the help on the command line:

optimum-cli export onnx --help

To export a model checkpoint from the 🤗 Hub, for example distilbert-base-uncased-distilled-squad, run the following command:

optimum-cli export onnx --model distilbert-base-uncased-distilled-squad distilbert_base_uncased_squad_onnx/

You should see logs like the following, indicating progress and showing where the resulting model.onnx is saved:

Validating ONNX model distilbert_base_uncased_squad_onnx/model.onnx...
    -[✓] ONNX model output names match reference model (start_logits, end_logits)
    - Validating ONNX Model output "start_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
    - Validating ONNX Model output "end_logits":
        -[✓] (2, 16) matches (2, 16)
        -[✓] all values close (atol: 0.0001)
The ONNX export succeeded and the exported model was saved at: distilbert_base_uncased_squad_onnx

์œ„์˜ ์˜ˆ์ œ๋Š” ๐Ÿค— Hub์—์„œ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ๋‚ด๋ณด๋‚ด๋Š” ๊ฒƒ์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ๋กœ์ปฌ ๋ชจ๋ธ์„ ๋‚ด๋ณด๋‚ผ ๋•Œ์—๋Š” ๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜์™€ ํ† ํฌ๋‚˜์ด์ € ํŒŒ์ผ์„ ๋™์ผํ•œ ๋””๋ ‰ํ† ๋ฆฌ(local_path)์— ์ €์žฅํ–ˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”. CLI๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ์—๋Š” ๐Ÿค— Hub์˜ ์ฒดํฌํฌ์ธํŠธ ์ด๋ฆ„ ๋Œ€์‹  model ์ธ์ˆ˜์— local_path๋ฅผ ์ „๋‹ฌํ•˜๊ณ  --task ์ธ์ˆ˜๋ฅผ ์ œ๊ณตํ•˜์„ธ์š”. ์ง€์›๋˜๋Š” ์ž‘์—…์˜ ๋ชฉ๋ก์€ ๐Ÿค— Optimum ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”. task ์ธ์ˆ˜๊ฐ€ ์ œ๊ณต๋˜์ง€ ์•Š์œผ๋ฉด ์ž‘์—…์— ํŠนํ™”๋œ ํ—ค๋“œ ์—†์ด ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜๋กœ ๊ธฐ๋ณธ ์„ค์ •๋ฉ๋‹ˆ๋‹ค.

optimum-cli export onnx --model local_path --task question-answering distilbert_base_uncased_squad_onnx/
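If you script exports for several checkpoints, it can help to assemble the command in one place. The sketch below only builds the argument list and does not run anything. The helper name build_export_command is hypothetical, and it uses only the --model and --task arguments and the positional output directory shown in this guide.

```python
import shlex
from typing import List, Optional


def build_export_command(model: str, output_dir: str, task: Optional[str] = None) -> List[str]:
    """Assemble the optimum-cli arguments used in this guide (nothing is executed here)."""
    command = ["optimum-cli", "export", "onnx", "--model", model]
    if task is not None:
        # Provide the task explicitly, as recommended above for local checkpoints
        command += ["--task", task]
    command.append(output_dir)
    return command


# A Hub checkpoint needs no explicit task in the example from this guide
hub_command = build_export_command(
    "distilbert-base-uncased-distilled-squad", "distilbert_base_uncased_squad_onnx/"
)
# A local checkpoint directory, with the task spelled out
local_command = build_export_command(
    "local_path", "distilbert_base_uncased_squad_onnx/", task="question-answering"
)
print(shlex.join(local_command))
```

To actually run the export, the resulting list could be passed to subprocess.run(local_command, check=True), assuming optimum-cli is installed and on your PATH.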

The resulting model.onnx file can then be run on one of the many accelerators that support the ONNX standard. For example, you can load and run the model with ONNX Runtime as follows:

>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> model = ORTModelForQuestionAnswering.from_pretrained("distilbert_base_uncased_squad_onnx")
>>> inputs = tokenizer("What am I using?", "Using DistilBERT with ONNX Runtime!", return_tensors="pt")
>>> outputs = model(**inputs)
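The question answering model above returns start_logits and end_logits, one score per input token. The sketch below shows one simple way to turn such scores into an answer span. It is a simplified greedy heuristic written for illustration, not the post-processing that 🤗 Transformers uses internally. The helper name best_answer_span and the toy logits are made up.

```python
from typing import List, Tuple


def best_answer_span(start_logits: List[float], end_logits: List[float]) -> Tuple[int, int]:
    """Return (start, end) token indices: the highest-scoring start,
    then the highest-scoring end located at or after that start."""
    start = max(range(len(start_logits)), key=lambda i: start_logits[i])
    end = max(range(start, len(end_logits)), key=lambda i: end_logits[i])
    return start, end


# Made-up logits for a 6-token input, for illustration only
toy_start_logits = [0.1, 0.3, 2.5, 0.2, -1.0, 0.0]
toy_end_logits = [0.0, 3.0, 0.4, 1.9, 0.2, -0.5]
span = best_answer_span(toy_start_logits, toy_end_logits)
print(span)
```

With real outputs, you would presumably first convert the logits of the first sequence to Python lists (for example with .tolist()) and then decode the selected slice of input_ids with the tokenizer. Treat that wiring as an assumption to verify against your installed versions.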

The process is identical for TensorFlow checkpoints on the Hub. For example, here is how you would export a pure TensorFlow checkpoint from the Keras organization:

optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_squad_onnx/

Exporting a 🤗 Transformers model to ONNX with optimum.onnxruntime [[exporting-a-transformers-model-to-onnx-with-optimumonnxruntime]]

As an alternative to the CLI, you can export a 🤗 Transformers model to ONNX programmatically with optimum.onnxruntime, like so:

>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> from transformers import AutoTokenizer

>>> model_checkpoint = "distilbert-base-uncased-distilled-squad"
>>> save_directory = "onnx/"

>>> # Load a model from transformers and export it to ONNX
>>> # (the task class should match the checkpoint: this one is fine-tuned for question answering)
>>> ort_model = ORTModelForQuestionAnswering.from_pretrained(model_checkpoint, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

>>> # Save the onnx model and tokenizer
>>> ort_model.save_pretrained(save_directory)
>>> tokenizer.save_pretrained(save_directory)
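After saving, you may want to confirm that the export directory actually contains an ONNX file. The sketch below uses only the standard library. The helper name list_onnx_files is hypothetical, and the demonstration runs on a temporary directory with empty placeholder files rather than a real export. It assumes the exported graph is written as model.onnx, as in the CLI section.

```python
import tempfile
from pathlib import Path
from typing import List


def list_onnx_files(directory: str) -> List[str]:
    """Return the sorted names of all .onnx files directly inside `directory`."""
    return sorted(path.name for path in Path(directory).glob("*.onnx"))


# Demonstration on a throwaway directory with empty placeholder files;
# in practice you would pass the real save_directory, e.g. "onnx/".
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "model.onnx").touch()
    (Path(tmp_dir) / "tokenizer.json").touch()
    found = list_onnx_files(tmp_dir)

print(found)
```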

์ง€์›๋˜์ง€ ์•Š๋Š” ์•„ํ‚คํ…์ฒ˜์˜ ๋ชจ๋ธ ๋‚ด๋ณด๋‚ด๊ธฐ [[exporting-a-model-for-an-unsupported-architecture]]

ํ˜„์žฌ ๋‚ด๋ณด๋‚ผ ์ˆ˜ ์—†๋Š” ๋ชจ๋ธ์„ ์ง€์›ํ•˜๊ธฐ ์œ„ํ•ด ๊ธฐ์—ฌํ•˜๋ ค๋ฉด, ๋จผ์ € optimum.exporters.onnx์—์„œ ์ง€์›๋˜๋Š”์ง€ ํ™•์ธํ•œ ํ›„ ์ง€์›๋˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ์—๋Š” ๐Ÿค— Optimum์— ๊ธฐ์—ฌํ•˜์„ธ์š”.

Exporting a model with transformers.onnx [[exporting-a-model-with-transformersonnx]]

transformers.onnx is no longer maintained. Please export models with 🤗 Optimum as described above. This section will be removed in a future version.

To export a 🤗 Transformers model to ONNX with transformers.onnx, install the extra dependencies:

pip install transformers[onnx]

Use the transformers.onnx package as a Python module to export a checkpoint with a ready-made configuration:

python -m transformers.onnx --model=distilbert-base-uncased onnx/

This exports an ONNX graph of the checkpoint defined by the --model argument. You can pass any checkpoint on the 🤗 Hub or one that is stored locally. The resulting model.onnx file can then be run on one of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:

>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))

The required output names (such as ["last_hidden_state"]) can be obtained by looking at the ONNX configuration of each model. For example, for DistilBERT:

>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]

The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint like so:

python -m transformers.onnx --model=keras-io/transformers-qa onnx/

To export a model that is stored locally, save the model's weights and tokenizer files in the same directory, then export it to ONNX by pointing the --model argument of the transformers.onnx package to that directory:

python -m transformers.onnx --model=local-pt-checkpoint onnx/