Benjamin Consolvo committed on
Commit 7645d86
1 Parent(s): 491fabd

doc updates 2

Files changed (5)
  1. app.py +1 -1
  2. info/deployment.py +56 -38
  3. info/programs.py +9 -3
  4. info/submit.py +10 -8
  5. info/train_a_model.py +17 -21
app.py CHANGED
@@ -30,7 +30,7 @@ with demo:
   follow the instructions and complete the form in the 🏎️ Submit tab. Models submitted to the leaderboard are evaluated
   on the Intel Developer Cloud ☁️. The evaluation platform consists of Gaudi Accelerators and Xeon CPUs running benchmarks from
   the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).""")
- gr.Markdown("""Join 5000+ developers on the [Intel DevHub Discord](https://discord.gg/yNYNxK2k) to get support with your submission and
+ gr.Markdown("""![DevHub-image](assets/DevHub_Logo.png) Join 5000+ developers on the [Intel DevHub Discord](https://discord.gg/yNYNxK2k) to get support with your submission and
   talk about everything from GenAI, HPC, to Quantum Computing.""")
   gr.Markdown("""A special shout-out to the 🤗 [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
   team for generously sharing their code and best
info/deployment.py CHANGED
@@ -5,7 +5,7 @@ DEPLOY_TEXT = f"""
5
  A collection of powerful models is valuable, but ultimately, you need to be able to use them effectively.
6
  This tab is dedicated to providing guidance and code snippets for performing inference with leaderboard models on Intel platforms.
7
 
8
- Below, you'll find a table of open-source software options for inference, along with the supported Intel Hardware Platforms.
9
  A 🚀 indicates that inference with the associated software package is supported on the hardware. We hope this information
10
  helps you choose the best option for your specific use case. Happy building!
11
 
@@ -72,8 +72,8 @@ helps you choose the best option for your specific use case. Happy building!
72
  <td>PyTorch</td>
73
  <td>🚀</td>
74
  <td>🚀</td>
75
- <td>🚀</td>
76
- <td>🚀</td>
77
  <td>🚀</td>
78
  </tr>
79
  </tr>
@@ -81,43 +81,25 @@ helps you choose the best option for your specific use case. Happy building!
81
  <td>Tensorflow</td>
82
  <td>🚀</td>
83
  <td>🚀</td>
84
- <td>🚀</td>
85
- <td>🚀</td>
86
  <td>🚀</td>
87
  </tr>
88
  </table>
89
  </div>
90
 
91
-
92
  <hr>
93
 
94
- # Intel® Gaudi Accelerators
95
- Habana's SDK, Intel Gaudi Software, supports PyTorch and DeepSpeed for accelerating LLM training and inference.
96
- The Intel Gaudi Software graph compiler will optimize the execution of the operations accumulated in the graph
97
- (e.g. operator fusion, data layout management, parallelization, pipelining and memory management,
98
- and graph-level optimizations).
99
-
100
- Optimum Habana provides covenient functionality for various tasks, below you'll find the command line
101
- snippet that you would run to perform inference on Gaudi with meta-llama/Llama-2-7b-hf.
102
-
103
- The "run_generation.py" script below can be found [here](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation)
104
-
105
- ```bash
106
- python run_generation.py \
107
- --model_name_or_path meta-llama/Llama-2-7b-hf \
108
- --use_hpu_graphs \
109
- --use_kv_cache \
110
- --max_new_tokens 100 \
111
- --do_sample \
112
- --batch_size 2 \
113
- --prompt "Hello world" "How are you?"
114
 
115
- ```
116
- <hr>
 
117
 
118
- # Intel® Max Series GPU
 
119
 
120
- ### INT4 Inference (GPU)
121
  ```python
122
  import intel_extension_for_pytorch as ipex
123
  from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
@@ -138,11 +120,15 @@ output = model.generate(inputs)
138
  ```
139
  <hr>
140
 
141
- # Intel® Xeon CPUs
 
 
 
 
 
 
 
142
 
143
- ### Intel Extension for PyTorch - Optimum Intel (no quantization)
144
- Requires installing/updating optimum `pip install --upgrade-strategy eager optimum[ipex]
145
- `
146
  ```python
147
  from optimum.intel import IPEXModelForCausalLM
148
  from transformers import AutoTokenizer, pipeline
@@ -154,6 +140,7 @@ results = pipe("A fisherman at sea...")
154
  ```
155
 
156
  ### Intel® Extension for PyTorch - Mixed Precision (fp32 and bf16)
 
157
  ```python
158
  import torch
159
  import intel_extension_for_pytorch as ipex
@@ -188,9 +175,12 @@ outputs = model.generate(inputs)
188
  <hr>
189
 
190
  # Intel® Core Ultra (NPUs and iGPUs)
191
-
192
 
193
  ### Intel® NPU Acceleration Library
 
 
 
194
  ```python
195
  from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
196
  import intel_npu_acceleration_library
@@ -222,7 +212,9 @@ print("Run inference")
222
  _ = model.generate(**generation_kwargs)
223
  ```
224
 
225
- ### OpenVINO Toolking with Optimum Habana
 
 
226
 
227
  ```python
228
  from optimum.intel import OVModelForCausalLM
@@ -240,9 +232,35 @@ pipe("In the spring, beautiful flowers bloom...")
240
 
241
  <hr>
242
 
243
- # Intel ARC GPUs
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
244
 
245
- Coming Soon!
 
246
 
 
247
 
248
  """
 
5
  A collection of powerful models is valuable, but ultimately, you need to be able to use them effectively.
6
  This tab is dedicated to providing guidance and code snippets for performing inference with leaderboard models on Intel platforms.
7
 
8
+ Below is a table of open-source software options for inference, along with the supported Intel hardware platforms.
9
  A 🚀 indicates that inference with the associated software package is supported on the hardware. We hope this information
10
  helps you choose the best option for your specific use case. Happy building!
11
 
 
72
  <td>PyTorch</td>
73
  <td>🚀</td>
74
  <td>🚀</td>
75
+ <td></td>
76
+ <td></td>
77
  <td>🚀</td>
78
  </tr>
79
  </tr>
 
81
  <td>Tensorflow</td>
82
  <td>🚀</td>
83
  <td>🚀</td>
84
+ <td></td>
85
+ <td></td>
86
  <td>🚀</td>
87
  </tr>
88
  </table>
89
  </div>
90
 
 
91
  <hr>
92
 
93
+ # Intel® Max Series GPU
94
+ The Intel® Data Center GPU Max Series is Intel's highest performing, highest density, general-purpose discrete GPU, which packs over 100 billion transistors into one package and contains up to 128 Xe Cores--Intel's foundational GPU compute building block. You can learn more about this GPU [here](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
95
 
96
+ ### INT4 Inference (GPU) with Intel Extension for Transformers and Intel Extension for Python
97
+ Intel® Extension for Transformers is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms, including Intel Gaudi2, Intel CPU, and Intel GPU.
98
+ 👍 [Intel Extension for Transformers GitHub](https://github.com/intel/intel-extension-for-transformers)
99
 
100
+ Intel® Extension for PyTorch* extends PyTorch* with up-to-date features optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
101
+ 👍 [Intel Extension for PyTorch GitHub](https://github.com/intel/intel-extension-for-pytorch)
102
 
 
103
  ```python
104
  import intel_extension_for_pytorch as ipex
105
  from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
 
120
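The hunk above ends just as the INT4 snippet begins. For orientation, a minimal, self-contained sketch of that flow on an `xpu` device might look like the following; the model name and generation settings are illustrative placeholders, not taken from the diff.

```python
import intel_extension_for_pytorch as ipex  # registers the "xpu" device with PyTorch
from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
from transformers import AutoTokenizer

model_name = "Intel/neural-chat-7b-v3-3"  # placeholder; any causal LM from the Hub
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")

# load_in_4bit=True applies weight-only INT4 quantization while loading
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="xpu", load_in_4bit=True)

output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```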
@@ -138,11 +120,15 @@ output = model.generate(inputs)
   ```
   <hr>
 
- # Intel® Xeon CPUs
-
- ### Intel Extension for PyTorch - Optimum Intel (no quantization)
- Requires installing/updating optimum `pip install --upgrade-strategy eager optimum[ipex]
- `
+ # Intel® Xeon® CPUs
+ The Intel® Xeon® CPUs have the most built-in accelerators of any CPU on the market, including Advanced Matrix Extensions (AMX) to accelerate matrix multiplication in deep learning training and inference. Learn more about Xeon CPUs [here](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html).
+
+ ### Optimum Intel and Intel Extension for PyTorch (no quantization)
+ 🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
+ 👍 [Optimum Intel GitHub](https://github.com/huggingface/optimum-intel)
+
+ Requires installing/updating Optimum: `pip install --upgrade-strategy eager optimum[ipex]`
   ```python
   from optimum.intel import IPEXModelForCausalLM
   from transformers import AutoTokenizer, pipeline
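As above, the hunk stops at the imports. A minimal sketch of the rest of the Optimum Intel flow, with a placeholder model id and prompt, could be:

```python
from optimum.intel import IPEXModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"  # placeholder; substitute a leaderboard model
# export=True applies the Intel Extension for PyTorch optimizations while loading the weights
model = IPEXModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
results = pipe("A fisherman at sea...")
print(results)
```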
@@ -154,6 +140,7 @@ results = pipe("A fisherman at sea...")
   ```
 
   ### Intel® Extension for PyTorch - Mixed Precision (fp32 and bf16)
+
   ```python
   import torch
   import intel_extension_for_pytorch as ipex
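The mixed-precision snippet is also truncated by the hunk. A compact sketch of the bf16 path with `ipex.optimize`, using an illustrative model and prompt, is:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Apply Intel Extension for PyTorch optimizations for bf16 inference on CPU
model = ipex.optimize(model.eval(), dtype=torch.bfloat16)

inputs = tokenizer("A fisherman at sea...", return_tensors="pt")
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```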
@@ -188,9 +175,12 @@ outputs = model.generate(inputs)
   <hr>
 
   # Intel® Core Ultra (NPUs and iGPUs)
-
+ Intel® Core™ Ultra processors are optimized for premium thin and powerful laptops, featuring a 3D performance hybrid architecture, advanced AI capabilities, and an available built-in Intel® Arc™ GPU. Learn more about Intel Core Ultra [here](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html). For now, there is support for smaller models like [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).
 
   ### Intel® NPU Acceleration Library
+ The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.
+ 👍 [Intel NPU Acceleration Library GitHub](https://github.com/intel/intel-npu-acceleration-library)
+
   ```python
   from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
   import intel_npu_acceleration_library
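Only the imports of the NPU snippet are visible in the hunk. A minimal end-to-end sketch following the library's compile-then-generate pattern, with the model and prompt as assumptions, might be:

```python
import torch
from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
import intel_npu_acceleration_library

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small model suited to the NPU
model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Compile the model for the NPU with int8 weights
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

inputs = tokenizer("What is an NPU?", return_tensors="pt")
generation_kwargs = dict(**inputs, streamer=streamer, do_sample=True, max_new_tokens=128)

print("Run inference")
_ = model.generate(**generation_kwargs)
```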
@@ -222,7 +212,9 @@ print("Run inference")
   _ = model.generate(**generation_kwargs)
   ```
 
- ### OpenVINO Toolking with Optimum Habana
+ ### OpenVINO Tooling with Optimum Intel
+ OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
+ 👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)
 
   ```python
   from optimum.intel import OVModelForCausalLM
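Again, only the first import appears in the hunk. A minimal sketch of the OpenVINO path, with a placeholder model id, could be:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "gpt2"  # placeholder; substitute a leaderboard model
# export=True converts the PyTorch checkpoint to the OpenVINO IR format on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe("In the spring, beautiful flowers bloom...")
```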
@@ -240,9 +232,35 @@ pipe("In the spring, beautiful flowers bloom...")
 
   <hr>
 
- # Intel ARC GPUs
-
- Coming Soon!
+ # Intel® Gaudi Accelerators
+ The Intel Gaudi 2 accelerator is Intel's most capable deep learning chip. You can learn about Gaudi 2 [here](https://habana.ai/products/gaudi2/).
+
+ Habana's SDK, Intel Gaudi Software, supports PyTorch and DeepSpeed for accelerating LLM training and inference.
+ The Intel Gaudi Software graph compiler will optimize the execution of the operations accumulated in the graph
+ (e.g. operator fusion, data layout management, parallelization, pipelining and memory management,
+ and graph-level optimizations).
+
+ Optimum Habana provides convenient functionality for various tasks. Below is a command-line snippet to run inference on Gaudi with meta-llama/Llama-2-7b-hf.
+ 👍 [Optimum Habana GitHub](https://github.com/huggingface/optimum-habana)
+
+ The "run_generation.py" script below can be found [here on GitHub](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation).
+
+ ```bash
+ python run_generation.py \
+ --model_name_or_path meta-llama/Llama-2-7b-hf \
+ --use_hpu_graphs \
+ --use_kv_cache \
+ --max_new_tokens 100 \
+ --do_sample \
+ --batch_size 2 \
+ --prompt "Hello world" "How are you?"
+ ```
+ <hr>
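If the environment does not already have Optimum Habana and the example's dependencies, a typical setup (assuming a Gaudi node with the SynapseAI PyTorch stack already installed) looks roughly like this:

```bash
# Install Optimum Habana and fetch the text-generation example with its requirements
pip install --upgrade-strategy eager optimum[habana]
git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana/examples/text-generation
pip install -r requirements.txt
```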
 
+ # Intel Arc GPUs
+ You can learn more about Arc GPUs [here](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/arc.html).
 
+ Code snippets coming soon!
 
   """
info/programs.py CHANGED
@@ -2,8 +2,7 @@ PROGRAMS_TEXT= """
2
  # 👩‍💻 Developer Programs
3
 
4
  Intel offers a range of programs to grant early, short, and long-term access to developers. A great way to build
5
- and share models on the "Powered by Intel" LLM Leaderboard is to join one of these programs. Learn more about
6
- these opportunities below:
7
 
8
  <hr>
9
 
@@ -14,7 +13,7 @@ helps you innovate and scale, no matter where you are in your entrepreneurial jo
14
  Through Intel Liftoff, startups can access the computational power they need to build powerful LLMs on platforms
15
  like Gaudi, Max Series GPUs, and Xeon Processors.
16
 
17
- Learn more and apply through the program at https://www.intel.com/content/www/us/en/developer/tools/oneapi/liftoff.html
18
 
19
  <hr>
20
 
@@ -41,4 +40,11 @@ environment for projects on the latest Intel technology and as a oneAPI expert,
41
  others in the community​ and within Intel
42
 
43
  Learn more and apply through the program at https://www.intel.com/content/www/us/en/developer/community/innovators/oneapi-innovator.html
 
 
 
 
 
 
 
44
  """
 
2
  # 👩‍💻 Developer Programs
3
 
4
  Intel offers a range of programs to grant early, short, and long-term access to developers. A great way to build
5
+ and share models on the "Powered by Intel" LLM Leaderboard is to join one of these programs.
 
6
 
7
  <hr>
8
 
 
13
  Through Intel Liftoff, startups can access the computational power they need to build powerful LLMs on platforms
14
  like Gaudi, Max Series GPUs, and Xeon Processors.
15
 
16
+ Learn more and apply through the program at https://www.intel.com/content/www/us/en/developer/tools/oneapi/liftoff.html.
17
 
18
  <hr>
19
 
 
40
  others in the community​ and within Intel
41
 
42
  Learn more and apply through the program at https://www.intel.com/content/www/us/en/developer/community/innovators/oneapi-innovator.html
43
+
44
+ <hr>
45
+
46
+ ## Intel DevHub Discord
47
+
48
+ Join 5000+ developers on the [Intel DevHub Discord](https://discord.gg/yNYNxK2k) to get support with your submission and talk about everything from GenAI, HPC, to Quantum Computing.
49
+
50
  """
info/submit.py CHANGED
@@ -1,8 +1,8 @@
 
   SUBMIT_TEXT = f"""
   # 🏎️ Submit
- Models added here will be queued for evaluation on the Intel Developer Cloud ☁️ Depending on the queue, your model may take up to 10 days to show up on the leaderboard.
- We will work to create greater transperancy as our leaderboard community grows!
+ Models added here will be queued for evaluation on the Intel Developer Cloud ☁️. Depending on the queue, your model may take up to 10 days to show up on the leaderboard.
+ We will work to create greater transparency as our leaderboard community grows.
 
   ## First steps before submitting a model
 
@@ -14,21 +14,23 @@ model = AutoModel.from_pretrained("your model name", revision=revision)
   tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
   ```
   If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.
- Note: make sure your model is public!
- Note: if your model needs `use_remote_code=True`, we do not support this option yet but we are working on adding it, stay posted!
+
+ Note: Make sure your model is public!
+
+ Note: If your model needs `use_remote_code=True`, we do not support this option yet, but we are working on adding it.
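For reference, the load check above can be run end to end with a short script like the following; the model name and revision are placeholders, not values from the diff.

```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_name = "your-username/your-model-name"  # placeholder
revision = "main"  # or a specific commit hash

config = AutoConfig.from_pretrained(model_name, revision=revision)
model = AutoModel.from_pretrained(model_name, revision=revision)
tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
print("Model and tokenizer loaded successfully")
```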
 
   ### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
- It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`!
+ It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`.
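One simple way to produce safetensors weights, assuming the model already loads with transformers, is to re-save it with safe serialization; the model name and output directory below are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-username/your-model-name"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# safe_serialization=True writes model.safetensors instead of pytorch_model.bin
model.save_pretrained("converted-model", safe_serialization=True)
tokenizer.save_pretrained("converted-model")
```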
 
- ### 3) Make sure your model has an open license!
+ ### 3) Make sure your model has an open license.
   This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗 A good example of an open source license is apache-2.0.
- Typically model licenses that are allow for commercial and research use tend to be the most attractive to other developers in the ecosystem!
+ Typically, model licenses that allow for commercial and research use tend to be the most attractive to other developers in the ecosystem.
 
   ### 4) Fill up your model card
   We use your model card to better understand the properties of your model and make them more easily discoverable for other users.
   Model cards are required to have mentions of the hardware, software, and infrastructure used for training - without this information
   we cannot accept your model as a valid submission. Remember, only models trained on these processors are eligible to participate in evaluation:
- Intel® Gaudi Accelerators, Intel® Xeon® Processors, Intel® Data Center GPU Max Series, Intel® ARC GPUs, and Intel® Core Ultra
+ Intel® Gaudi Accelerators, Intel® Xeon® Processors, Intel® Data Center GPU Max Series, Intel® Arc GPUs, and Intel® Core Ultra.
 
   ### 5) Select the correct precision
   Not all models are converted properly from `float16` to `bfloat16`, and selecting the wrong precision can sometimes cause evaluation error (as loading a `bf16` model in `fp16` can sometimes generate NaNs, depending on the weight range).
info/train_a_model.py CHANGED
@@ -2,22 +2,21 @@
   LLM_BENCHMARKS_TEXT = f"""
   # 🧰 Train a Model
 
- Intel offers a variety of platforms that can be used to train LLMs including datacenter and consumer grade CPUs, GPUs, and ASICs.
- Below, you'll find documentation on how to access free and paid resources to train a model and submit it to the Powered-by-Intel LLM Leaderboard.
+ Intel offers a variety of platforms that can be used to train LLMs, including data center and consumer grade CPUs, GPUs, and ASICs.
+ Below, you can find documentation on how to access free and paid resources to train a model on Intel hardware and submit it to the Hugging Face Model Hub.
 
   ## Intel Developer Cloud - Quick Start
   The Intel Developer Cloud is one of the best places to access free and paid compute instances for model training. Intel offers Jupyter Notebook instances supported by
- 224 Core 4th Generation Xeon Baremetal nodes with 4x Max Series GPU 1100 GPUs. To access these resources please follow the instructions below:
- 1. Visit [cloud.intel.com](cloud.intel.com) and create a free account.
- 2. Navigate to the "Training" module under the "Software" section in the left panel
- 3. Under the GenAI Essentials section, select the LLM Fine-Tuning with QLoRA notebook and click "Launch"
- 4. Follow the instructions in the notebook to train your model using Intel® Data Center GPU Max 1100
- 5. Upload your model to the Hugging Face Model Hub
- 6. Go to the "Submit" tab follow instructions to create a leaderboard evaluation request
-
- ## Additional Training Code Samples
-
- Below you will find a list of additional resources for training models on different intel hardware platforms:
+ 224-core 4th Generation Xeon bare metal nodes with 4x GPU Max Series 1100. To access these resources, please follow the instructions below:
+ 1. Visit the [Intel Developer Cloud](https://cloud.intel.com/) and sign up for the "Standard - Free" tier to get started.
+ 2. Navigate to the "Training" module under the "Software" section in the left panel.
+ 3. Under the GenAI Essentials section, select the LLM Fine-Tuning with QLoRA notebook and click "Launch".
+ 4. Follow the instructions in the notebook to train your model using the Intel® Data Center GPU Max 1100.
+ 5. Upload your model to the Hugging Face Model Hub.
+ 6. Go to the "Submit" tab on this leaderboard and follow the instructions to submit your model.
+
+ ## Training Code Samples
+ Below are some resources to get you started on training models on Intel platforms:
   - Intel® Gaudi® Accelerators
     - [Parameter Efficient Fine-Tuning of Llama-2 70B](https://github.com/HabanaAI/Gaudi-tutorials/blob/main/PyTorch/llama2_fine_tuning_inference/llama2_fine_tuning_inference.ipynb)
   - Intel® Xeon® Processors
@@ -25,13 +24,12 @@ Below you will find a list of additional resources for training models on differ
     - [Fine-tuning Falcon 7B on Xeon Processors](https://medium.com/@eduand-alvarez/fine-tune-falcon-7-billion-on-xeon-cpus-with-hugging-face-and-oneapi-a25e10803a53)
   - Intel® Data Center GPU Max Series
     - [LLM Fine-tuning with QLoRA on Max Series GPUs](https://console.idcservice.net/training/detail/159c24e4-5598-3155-a790-2qv973tlm172)
- ## Submitting your Model to the Hub
- Once you have trained your model, it is a straighforward process to upload and open source it on the Hugging Face Hub.
 
- ```python
+ ## Submitting your Model to the Hugging Face Model Hub
+ Once your model is trained, it is a straightforward process to upload and open-source it on the Hugging Face Model Hub. The commands from a Jupyter notebook are given below:
 
+ ```python
   # Logging in to Hugging Face
-
   from huggingface_hub import notebook_login, Repository
 
   # Login to Hugging Face
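The snippet is split across several hunks here; a condensed sketch of the whole upload flow, with the checkpoint path and model name as placeholders, looks like this:

```python
from huggingface_hub import notebook_login
from transformers import AutoModelForSequenceClassification, AutoTokenizer

notebook_login()  # authenticate with a Hugging Face write token

checkpoint_path = "path/to/your/checkpoint"  # placeholder
model = AutoModelForSequenceClassification.from_pretrained(checkpoint_path)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)

# Save locally under the name you want the Hub repo to have
model_name_on_hub = "desired-model-name"
model.save_pretrained(model_name_on_hub)
tokenizer.save_pretrained(model_name_on_hub)

# Push both to the Hub; the repo is created under your username if it does not exist
model.push_to_hub(model_name_on_hub)
tokenizer.push_to_hub(model_name_on_hub)
```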
@@ -49,8 +47,6 @@ model = AutoModelForSequenceClassification.from_pretrained(checkpoint_path)
   # Load the tokenizer
   tokenizer = AutoTokenizer.from_pretrained("") #add name of your model's tokenizer on Hugging Face OR custom tokenizer
 
- #Saving and Uploading the Model and Tokenizer
-
   # Save the model and tokenizer
   model_name_on_hub = "desired-model-name"
   model.save_pretrained(model_name_on_hub)
@@ -61,10 +57,10 @@ model.push_to_hub(model_name_on_hub)
   tokenizer.push_to_hub(model_name_on_hub)
 
   # Congratulations! Your fine-tuned model is now uploaded to the Hugging Face Model Hub.
- # You can view and share your model using its URL: https://huggingface.co/your-username/your-model-name
+ # You can view and share your model using its URL: https://huggingface.co/<your-username>/<your-model-name>
 
   ```
-
+ Once your model is uploaded, make sure to update your model card, specifying your use of Intel software and hardware. Hugging Face has a great description of [how to build model cards here](https://huggingface.co/docs/hub/en/model-cards).
   """
 
   SUBMIT_TEXT = f"""