Benjamin Consolvo committed
Commit ad676d5 · 1 parent: 7645d86
doc updates 3
Files changed:
- app.py +1 -1
- info/deployment.py +7 -1
app.py
CHANGED
@@ -30,7 +30,7 @@ with demo:
 follow the instructions and complete the form in the 🏎️ Submit tab. Models submitted to the leaderboard are evaluated
 on the Intel Developer Cloud ☁️. The evaluation platform consists of Gaudi Accelerators and Xeon CPUs running benchmarks from
 the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).""")
-gr.Markdown("""
+gr.Markdown("""Join 5000+ developers on the [Intel DevHub Discord](https://discord.gg/yNYNxK2k) to get support with your submission and
 talk about everything from GenAI, HPC, to Quantum Computing.""")
 gr.Markdown("""A special shout-out to the 🤗 [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 team for generously sharing their code and best
info/deployment.py
CHANGED
@@ -95,9 +95,11 @@ The Intel® Data Center GPU Max Series is Intel's highest performing, highest de
 
 ### INT4 Inference (GPU) with Intel Extension for Transformers and Intel Extension for Python
 Intel® Extension for Transformers is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms, including Intel Gaudi2, Intel CPU, and Intel GPU.
+
 👍 [Intel Extension for Transformers GitHub](https://github.com/intel/intel-extension-for-transformers)
 
 Intel® Extension for PyTorch* extends PyTorch* with up-to-date features optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
+
 👍 [Intel Extension for PyTorch GitHub](https://github.com/intel/intel-extension-for-pytorch)
 
 ```python
@@ -125,6 +127,7 @@ The Intel® Xeon® CPUs have the most built-in accelerators of any CPU on the ma
 
 ### Optimum Intel and Intel Extension for PyTorch (no quantization)
 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
+
 👍 [Optimum Intel GitHub](https://github.com/huggingface/optimum-intel)
 
 Requires installing/updating optimum `pip install --upgrade-strategy eager optimum[ipex]`
@@ -179,6 +182,7 @@ Intel® Core™ Ultra Processors are optimized for premium thin and powerful lap
 
 ### Intel® NPU Acceleration Library
 The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.
+
 👍 [Intel NPU Acceleration Library GitHub](https://github.com/intel/intel-npu-acceleration-library)
 
 ```python
@@ -214,6 +218,7 @@ _ = model.generate(**generation_kwargs)
 
 ### OpenVINO Tooling with Optimum Intel
 OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
+
 👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)
 
 ```python
@@ -235,12 +240,13 @@ pipe("In the spring, beautiful flowers bloom...")
 # Intel® Gaudi Accelerators
 The Intel Gaudi 2 accelerator is Intel's most capable deep learning chip. You can learn about Gaudi 2 [here](https://habana.ai/products/gaudi2/).
 
-
+Intel Gaudi Software supports PyTorch and DeepSpeed for accelerating LLM training and inference.
 The Intel Gaudi Software graph compiler will optimize the execution of the operations accumulated in the graph
 (e.g. operator fusion, data layout management, parallelization, pipelining and memory management,
 and graph-level optimizations).
 
 Optimum Habana provides covenient functionality for various tasks. Below is a command line snippet to run inference on Gaudi with meta-llama/Llama-2-7b-hf.
+
 👍[Optimum Habana GitHub](https://github.com/huggingface/optimum-habana)
 
 The "run_generation.py" script below can be found [here on GitHub](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)
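
Most of the added lines in info/deployment.py are empty lines placed directly before a 👍 link. The commit message does not say why. One plausible reading, offered here as an assumption, is that without a blank line Markdown folds the link into the preceding paragraph. The sketch below illustrates that rule with a deliberately simplified paragraph splitter; `split_paragraphs` is a hypothetical helper written for this note, not part of the repository or of any Markdown library.

```python
import re


def split_paragraphs(markdown_text):
    """Split text into paragraphs on blank lines (a simplified Markdown rule)."""
    blocks = re.split(r"\n[ \t]*\n", markdown_text.strip())
    # Lines inside one block are separated only by soft breaks, so they
    # end up in the same rendered paragraph; join them with spaces.
    return [" ".join(line.strip() for line in block.splitlines()) for block in blocks]


description = "OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference."
link = "👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)"

before = description + "\n" + link    # layout before the commit
after = description + "\n\n" + link   # layout after the commit

print(len(split_paragraphs(before)))
print(len(split_paragraphs(after)))
```

Under this simplified rule the pre-commit text forms a single paragraph, while the post-commit text gives the link a paragraph of its own. A real Markdown renderer handles many more constructs (headings, fences, lists), so treat this only as an illustration of the blank-line behaviour.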