<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# XLA Integration for TensorFlow Models [[xla-integration-for-tensorflow-models]]

[[open-in-colab]]

XLA (Accelerated Linear Algebra) is a compiler for accelerating the runtime of TensorFlow models. From the [official documentation](https://www.tensorflow.org/xla):

> XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.

TensorFlowμ—μ„œ XLAλ₯Ό μ‚¬μš©ν•˜λŠ” 것은 κ°„λ‹¨ν•©λ‹ˆλ‹€. XLAλŠ” `tensorflow` 라이브러리 내에 νŒ¨ν‚€μ§€λ‘œ 제곡되며, [`tf.function`](https://www.tensorflow.org/guide/intro_to_graphs)κ³Ό 같은 κ·Έλž˜ν”„ 생성 ν•¨μˆ˜μ—μ„œ `jit_compile` 인수λ₯Ό μ‚¬μš©ν•˜μ—¬ ν™œμ„±ν™”ν•  수 μžˆμŠ΅λ‹ˆλ‹€. `fit()` 및 `predict()`와 같은 Keras λ©”μ†Œλ“œλ₯Ό μ‚¬μš©ν•˜λŠ” 경우, `jit_compile` 인수λ₯Ό `model.compile()`에 μ „λ‹¬ν•˜μ—¬ XLAλ₯Ό κ°„λ‹¨ν•˜κ²Œ ν™œμ„±ν™”ν•  수 μžˆμŠ΅λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ XLAλŠ” μ΄λŸ¬ν•œ λ©”μ†Œλ“œμ— κ΅­ν•œλ˜μ§€ μ•Šκ³  μž„μ˜μ˜ `tf.function`을 κ°€μ†ν™”ν•˜λŠ” 데에도 μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€.

πŸ€— Transformersμ—μ„œλŠ” [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2), [T5](https://huggingface.co/docs/transformers/model_doc/t5), [OPT](https://huggingface.co/docs/transformers/model_doc/opt)와 같은 λͺ¨λΈμ˜ ν…μŠ€νŠΈ 생성, 그리고 [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper)와 같은 λͺ¨λΈμ˜ μŒμ„± 처리λ₯Ό ν¬ν•¨ν•˜μ—¬ μ—¬λŸ¬ TensorFlow λ©”μ†Œλ“œκ°€ XLA와 ν˜Έν™˜λ˜λ„λ‘ λ‹€μ‹œ μž‘μ„±λ˜μ—ˆμŠ΅λ‹ˆλ‹€.

While the exact speed-up is very much model-dependent, we have observed speed-ups of up to 100x for the TensorFlow text generation models inside πŸ€— Transformers. This document explains how to use XLA with these models to get the maximum performance. We will also provide links to additional resources on the benchmarks and the design philosophy behind the XLA integration.

## Running TF functions with XLA [[running-tf-functions-with-xla]]

TensorFlowμ—μ„œ λ‹€μŒκ³Ό 같은 λͺ¨λΈμ„ κ³ λ €ν•΄ λ΄…μ‹œλ‹€:

```py
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, input_shape=(10,), activation="relu"), tf.keras.layers.Dense(5, activation="softmax")]
)
```

μœ„ λͺ¨λΈμ€ 차원이 `(10, )`인 μž…λ ₯을 λ°›μŠ΅λ‹ˆλ‹€. λ‹€μŒκ³Ό 같이 λͺ¨λΈμ„ μ‚¬μš©ν•˜μ—¬ μˆœμ „νŒŒλ₯Ό μ‹€ν–‰ν•  수 μžˆμŠ΅λ‹ˆλ‹€:

```py
# λͺ¨λΈμ— λŒ€ν•œ μž„μ˜μ˜ μž…λ ₯을 μƒμ„±ν•©λ‹ˆλ‹€.
batch_size = 16
input_vector_dim = 10
random_inputs = tf.random.normal((batch_size, input_vector_dim))

# Run a forward pass.
_ = model(random_inputs)
```

In order to run the forward pass with an XLA-compiled function, we would need to do the following:

```py
xla_fn = tf.function(model, jit_compile=True)
_ = xla_fn(random_inputs)
```

The default `call()` function of the `model` is used for compiling the XLA graph. But if there is any other model function you want to compile into XLA, that is also possible:

```py
# `my_xla_fn` here stands in for any custom function defined on your model.
my_xla_fn = tf.function(model.my_xla_fn, jit_compile=True)
```
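
As a concrete illustration, here is a minimal sketch in which the subclass `MyModel` and its `double()` method are hypothetical stand-ins for your own model and method:

```py
import tensorflow as tf


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)

    def call(self, inputs):
        return self.dense(inputs)

    def double(self, inputs):
        # A custom method, separate from call(), that we also want compiled.
        return self.dense(inputs) * 2.0


model = MyModel()

# Compile the custom method, not call(), with XLA.
xla_double = tf.function(model.double, jit_compile=True)
_ = xla_double(tf.random.normal((4, 10)))
```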

## Running a TF text generation model with XLA from πŸ€— Transformers [[running-a-tf-text-generation-model-with-xla-from-transformers]]

πŸ€— Transformersμ—μ„œ XLA둜 κ°€μ†ν™”λœ 생성을 ν™œμ„±ν™”ν•˜λ €λ©΄ μ΅œμ‹  λ²„μ „μ˜ `transformers`κ°€ μ„€μΉ˜λ˜μ–΄ μžˆμ–΄μ•Ό ν•©λ‹ˆλ‹€. λ‹€μŒκ³Ό 같이 μ„€μΉ˜ν•  수 μžˆμŠ΅λ‹ˆλ‹€:

```bash
pip install transformers --upgrade
```

And then you can run the following code:

```py
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Will error if the minimal version of Transformers is not installed.
from transformers.utils import check_min_version

check_min_version("4.21.0")


tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
input_string = ["TensorFlow is"]

# One line to create an XLA generation function
xla_generate = tf.function(model.generate, jit_compile=True)

tokenized_input = tokenizer(input_string, return_tensors="tf")
generated_tokens = xla_generate(**tokenized_input, num_beams=2)

decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(f"Generated -- {decoded_text}")
# Generated -- TensorFlow is an open-source, open-source, distributed-source application framework for the
```

μ•Œ 수 μžˆλ“―μ΄, `generate()`μ—μ„œ XLAλ₯Ό ν™œμ„±ν™”ν•˜λŠ” 것은 단 ν•œ μ€„μ˜ μ½”λ“œμž…λ‹ˆλ‹€. μ½”λ“œμ˜ λ‚˜λ¨Έμ§€ 뢀뢄은 λ³€κ²½λ˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ μœ„ μ½”λ“œ μŠ€λ‹ˆνŽ«μ—μ„œλŠ” XLA에 νŠΉμ •ν•œ λͺ‡ 가지 μ£Όμ˜ν•  점이 μžˆμŠ΅λ‹ˆλ‹€. XLAκ°€ 가져닀쀄 속도 ν–₯상을 μ‹€ν˜„ν•˜κΈ° μœ„ν•΄μ„œλŠ” 이λ₯Ό μ•Œκ³  μžˆμ–΄μ•Ό ν•©λ‹ˆλ‹€. λ‹€μŒ μ„Ήμ…˜μ—μ„œ 이에 λŒ€ν•΄ λ…Όμ˜ν•©λ‹ˆλ‹€.

## μ£Όμ˜ν•  점 [[gotchas-to-be-aware-of]]

When you execute an XLA-enabled function (like `xla_generate()` above) for the first time, it internally tries to infer the computation graph, which is time-consuming. This process is known as ["tracing"](https://www.tensorflow.org/guide/intro_to_graphs#when_is_a_function_tracing).
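
You can watch tracing happen with a toy `tf.function`. This is a minimal sketch (the function `f` is purely illustrative) that relies on the fact that Python-level side effects such as `print` run only while the function is being traced:

```py
import tensorflow as tf


@tf.function(jit_compile=True)
def f(x):
    print("Tracing!")  # executes only during tracing, not on traced calls
    return tf.reduce_sum(x)


f(tf.ones((2, 2)))  # prints "Tracing!" -- the first call builds the graph
f(tf.ones((2, 2)))  # silent -- same input shape, the traced graph is reused
f(tf.ones((3, 3)))  # prints "Tracing!" -- a new input shape triggers re-tracing
```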

생성 μ‹œκ°„μ΄ λΉ λ₯΄μ§€ μ•Šλ‹€λŠ” 것을 μ•Œ 수 μžˆμ„ κ²ƒμž…λ‹ˆλ‹€. `xla_generate()`(λ˜λŠ” λ‹€λ₯Έ XLA ν™œμ„±ν™” ν•¨μˆ˜)의 연속 ν˜ΈμΆœμ€ ν•¨μˆ˜μ— μ „λ‹¬λœ μž…λ ₯이 μ΄ˆκΈ°μ— κ΅¬μΆ•λœ 계산 κ·Έλž˜ν”„μ™€ λ™μΌν•œ ν˜•νƒœλ₯Ό λ”°λ₯Έλ‹€λ©΄, 계산 κ·Έλž˜ν”„λ₯Ό μΆ”λ‘ ν•  ν•„μš”κ°€ μ—†μŠ΅λ‹ˆλ‹€. μ΄λŠ” μž…λ ₯ ν˜•νƒœκ°€ κ³ μ •λœ λͺ¨λ‹¬λ¦¬ν‹°(예: 이미지)μ—λŠ” λ¬Έμ œκ°€ λ˜μ§€ μ•Šμ§€λ§Œ, κ°€λ³€ μž…λ ₯ ν˜•νƒœ λͺ¨λ‹¬λ¦¬ν‹°(예: ν…μŠ€νŠΈ)λ₯Ό μ‚¬μš©ν•  λ•Œ μ£Όμ˜ν•΄μ•Ό ν•©λ‹ˆλ‹€.

To ensure `xla_generate()` always operates with the same input shapes, you can specify the `padding` arguments when calling the tokenizer:

```py
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
input_string = ["TensorFlow is"]

xla_generate = tf.function(model.generate, jit_compile=True)

# Here, we call the tokenizer with padding options.
tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors="tf")

generated_tokens = xla_generate(**tokenized_input, num_beams=2)
decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(f"Generated -- {decoded_text}")
```

μ΄λ ‡κ²Œ ν•˜λ©΄ `xla_generate()`에 λŒ€ν•œ μž…λ ₯이 항상 μΆ”μ λœ ν˜•νƒœλ‘œ μ „λ‹¬λ˜μ–΄ 생성 μ‹œκ°„μ΄ κ°€μ†ν™”λ©λ‹ˆλ‹€. λ‹€μŒ μ½”λ“œλ‘œ 이λ₯Ό 확인할 수 μžˆμŠ΅λ‹ˆλ‹€:

```py
import time
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

xla_generate = tf.function(model.generate, jit_compile=True)

for input_string in ["TensorFlow is", "TensorFlow is a", "TFLite is a"]:
    tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors="tf")
    start = time.time_ns()
    generated_tokens = xla_generate(**tokenized_input, num_beams=2)
    end = time.time_ns()
    print(f"Execution time -- {(end - start) / 1e6:.1f} ms\n")
```

On a Tesla T4 GPU, you can expect output like this:

```bash
Execution time -- 30819.6 ms

Execution time -- 79.0 ms

Execution time -- 78.9 ms
```

The first call to `xla_generate()` is time-consuming because of tracing, but the successive calls are orders of magnitude faster. Keep in mind that any change to the generation options will trigger re-tracing and thus slow generation back down.
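
For instance, continuing the snippet above, passing a different `num_beams` forces a new trace even though the input shape is unchanged (a rough sketch; exact timings vary by hardware):

```py
# Same padded input shape as before, but a different generation option:
# the Python-level argument change triggers re-tracing.
start = time.time_ns()
generated_tokens = xla_generate(**tokenized_input, num_beams=4)
end = time.time_ns()
print(f"Execution time -- {(end - start) / 1e6:.1f} ms")  # slow again, like the first call
```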

이 λ¬Έμ„œμ—μ„œλŠ” πŸ€— Transformersμ—μ„œ μ œκ³΅ν•˜λŠ” λͺ¨λ“  ν…μŠ€νŠΈ 생성 μ˜΅μ…˜μ„ 닀루지 μ•Šμ•˜μŠ΅λ‹ˆλ‹€. κ³ κΈ‰ μ‚¬μš© 사둀에 λŒ€ν•΄ λ¬Έμ„œλ₯Ό μ°Έμ‘°ν•˜μ‹œκΈ° λ°”λžλ‹ˆλ‹€.

## Additional resources [[additional-resources]]

Here, we leave you with some additional resources if you want to delve deeper into XLA in πŸ€— Transformers and in general.
 
* [This Colab Notebook](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/91_tf_xla_generate.ipynb) provides an interactive demo for experimenting with XLA-compatible encoder-decoder (like [T5](https://huggingface.co/docs/transformers/model_doc/t5)) and decoder-only (like [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)) text generation models.
* [This blog post](https://huggingface.co/blog/tf-xla-generate) provides an overview of the comparison benchmarks for XLA-compatible models, along with a friendly introduction to XLA in TensorFlow.
* [This blog post](https://blog.tensorflow.org/2022/11/how-hugging-face-improved-text-generation-performance-with-xla.html) discusses our design philosophy behind adding XLA support to the TensorFlow models in πŸ€— Transformers.
* Recommended posts for learning more about XLA and TensorFlow graphs in general:
    * [XLA: Optimizing Compiler for Machine Learning](https://www.tensorflow.org/xla)
    * [Introduction to graphs and tf.function](https://www.tensorflow.org/guide/intro_to_graphs)
    * [Better performance with tf.function](https://www.tensorflow.org/guide/function)