Access note: this repository is publicly accessible, but you must log in to Hugging Face and accept its conditions, which include sharing your contact information, before you can access its files and content.

Llama-3.2-3B-Instruct - Renesas X5H

Introduction

This repository contains the Llama-3.2-3B-Instruct model, optimized for text-generation inference on the Renesas X5H platform.

  • Model Architecture: Llama 3.2 3B is an auto-regressive language model that uses an optimized transformer architecture.
  • Model Summary:
    Parameter       Llama-3.2-3B-Instruct
    NUM_LAYERS      28
    HIDDEN_SIZE     3072
    FFN_DIM         8192
    NUM_HEADS       24
    NUM_KV_HEADS    8
    HEAD_DIM        128
    GROUP_SIZE      3
    VOCAB_SIZE      128256
    RMS_NORM_EPS    1e-5
    ROPE_THETA      500000.0
  • Source Model: meta-llama/Llama-3.2-3B-Instruct
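The summary entries are mutually consistent, and together they determine how much memory the key/value (KV) cache needs per generated token. The derivation below is a sketch under stated assumptions: it reads GROUP_SIZE as the number of query heads sharing one KV head (grouped-query attention), assumes keys and values are cached in FP16 (2 bytes per element), and uses a 2048-token context purely as an illustration. The card states neither the cache precision nor the context length the runners use.

```latex
\begin{aligned}
\mathrm{HIDDEN\_SIZE} &= \mathrm{NUM\_HEADS} \times \mathrm{HEAD\_DIM} = 24 \times 128 = 3072 \\
\mathrm{GROUP\_SIZE} &= \mathrm{NUM\_HEADS} / \mathrm{NUM\_KV\_HEADS} = 24 / 8 = 3 \\
B_{\mathrm{KV}} &= \underbrace{2}_{K,\,V} \times \mathrm{NUM\_LAYERS} \times \mathrm{NUM\_KV\_HEADS} \times \mathrm{HEAD\_DIM} \times \underbrace{2\ \mathrm{bytes}}_{\text{FP16 (assumed)}} \\
&= 2 \times 28 \times 8 \times 128 \times 2 = 114\,688\ \mathrm{bytes} = 112\ \mathrm{KiB\ per\ token} \\
2048 \times B_{\mathrm{KV}} &= 229\,376\ \mathrm{KiB} = 224\ \mathrm{MiB}
\end{aligned}
```

Because there are 8 KV heads rather than 24, the cache is one third of what full multi-head attention with the same HEAD_DIM would need.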

Performance

The following performance metrics were measured with a single prompt on the NPX6 device of the Renesas X5H.

Model                   Precision   Device   Response rate (tokens/sec)
Llama-3.2-3B-Instruct   FP16        NPX6     8.76
Llama-3.2-3B-Instruct   W4A16       NPX6     17.46
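As a worked check on the table, the ratio of the two response rates and the implied average time per generated token can be computed with awk. This is plain arithmetic on the published numbers and assumes "response rate" means generated tokens per second; the card does not define it further.

```shell
#!/usr/bin/env bash
# Derive speedup and per-token latency from the response rates in the table.
# No board needed: this only does arithmetic on the published numbers.

# speedup BASE_TPS NEW_TPS -> NEW/BASE, two decimals
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

# latency_ms TPS -> average milliseconds per generated token, one decimal
latency_ms() {
  awk -v tps="$1" 'BEGIN { printf "%.1f\n", 1000 / tps }'
}

echo "W4A16 vs FP16 speedup: $(speedup 8.76 17.46)x"   # 1.99x
echo "FP16  ms/token: $(latency_ms 8.76)"              # 114.2
echo "W4A16 ms/token: $(latency_ms 17.46)"             # 57.3
```

In this measurement, W4A16 (4-bit weights, 16-bit activations) roughly doubles the response rate of FP16, from about 114 ms to about 57 ms per token.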

Prerequisites

To run the model, you need:

  1. A Renesas X5H board with SDK v4.32.0.
  2. The Hugging Face CLI, for downloading the model and the installers.

Deployment

On any Linux PC, download the installer for the variant you want (for example, "llama3p2-3b-runner-0.1.0-Linux.sh" for FP16) from the Files and versions tab.
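The downloads can also be done with the Hugging Face CLI. The helpers below are a minimal sketch assuming the CLI from the huggingface_hub package is installed and that you have run `huggingface-cli login` after accepting this repository's conditions. The function names and the `DEST_DIR` folder are hypothetical; a filename glob is used instead of a full path because the folder paths given in this card are partly elided.

```shell
#!/usr/bin/env bash
# Sketch: fetch an installer or the GGUF file on a Linux PC with the Hugging Face CLI.
# Assumes huggingface_hub's CLI is installed and `huggingface-cli login` was run.

REPO_ID="Renesas/Llama-3.2-3B-Instruct-GGUF"   # repository ID shown on the model page
DEST_DIR="${DEST_DIR:-./x5h-assets}"           # hypothetical local download folder

# Return non-zero (with a hint on stderr) if the CLI is not on PATH.
require_hf_cli() {
  if ! command -v huggingface-cli >/dev/null 2>&1; then
    echo "huggingface-cli not found; install it with: pip install -U \"huggingface_hub[cli]\"" >&2
    return 1
  fi
}

# download_file NAME: download every repository file whose path ends in NAME.
# --local-dir keeps the repository's folder layout under DEST_DIR.
download_file() {
  require_hf_cli || return 1
  huggingface-cli download "$REPO_ID" --include "*$1" --local-dir "$DEST_DIR"
}
```

For example, `download_file llama3p2-3b-w4a16-runner-0.1.0-Linux.sh` followed by `download_file Llama-3.2-3B-Instruct-f16.gguf`; `find ./x5h-assets -type f` then shows where each file landed.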

Llama-3.2-3B-Instruct (W4A16)

  1. Download the installer "llama3p2-3b-w4a16-runner-0.1.0-Linux.sh" from the "w4a16\binaries\rcar-x5hv1\xOS_v4." folder in the Files and versions tab.
  2. Copy the installer to the X5H board and run it:
    bash ./llama3p2-3b-w4a16-runner-0.1.0-Linux.sh --prefix=./ --exclude-subdir --skip-license
    
  3. Download the GGUF file "Llama-3.2-3B-Instruct-f16.gguf" from the fp16 folder in the Files and versions tab and copy it to the installation directory on the X5H board.
  4. Check that the installation directory on the X5H board has the expected structure:
     llama3p2-3b-w4a16-runner
     ├── Llama-3.2-3B-Instruct-f16.gguf
     ├── firmwares
     ├── kernel_modules
     ├── llama3p2-3b-w4a16
     ├── llama3p2-3b-w4a16-runner
     ├── scripts
     └── setup_npu.sh
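Steps 2 and 3 can be scripted from the Linux PC. This is a sketch under loud assumptions: the board runs an SSH server, `BOARD` (user@host) and `REMOTE_DIR` are hypothetical values you must adjust, and the installer is run inside a directory named after the tree's root, since `--prefix=./ --exclude-subdir` installs into the current directory. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: copy an installer plus the GGUF file to the X5H board and install there.
# Hypothetical: BOARD, REMOTE_DIR, and an SSH server on the board.

BOARD="${BOARD:-root@x5h-board}"                      # hypothetical user@host
REMOTE_DIR="${REMOTE_DIR:-llama3p2-3b-w4a16-runner}"  # hypothetical; named after the tree root

# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# deploy_variant INSTALLER GGUF: create REMOTE_DIR (relative to the remote home),
# copy both files into it, then run the installer with the flags from step 2.
deploy_variant() {
  local installer="$1" gguf="$2" name
  name="$(basename "$installer")"
  # REMOTE_DIR is expanded locally and must not contain spaces.
  run ssh "$BOARD" "mkdir -p $REMOTE_DIR" || return 1
  run scp "$installer" "$gguf" "$BOARD:$REMOTE_DIR/" || return 1
  run ssh "$BOARD" "cd $REMOTE_DIR && bash ./$name --prefix=./ --exclude-subdir --skip-license"
}
```

For the FP16 variant, the same helper applies with `REMOTE_DIR=llama3p2-3b-runner` and the FP16 installer as the first argument.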
    

Inference - Llama-3.2-3B-Instruct (W4A16)

bash ./setup_npu.sh
./llama3p2-3b-w4a16-runner <PROMPT>
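The two commands can be wrapped in a small helper to run on the board. This is a hypothetical sketch: it assumes the runner takes the whole prompt as one argument (the card shows a single `<PROMPT>` placeholder), so a multi-word prompt must be quoted. The card does not say whether `setup_npu.sh` must run before every prompt; the wrapper simply repeats both commands as shown.

```shell
#!/usr/bin/env bash
# Sketch: run one prompt through an installed runner on the X5H board.
# run_prompt INSTALL_DIR RUNNER PROMPT (hypothetical helper name).
run_prompt() {
  local install_dir="$1" runner="$2" prompt="$3"
  (
    # Subshell so the cd does not change the caller's working directory.
    cd "$install_dir" || exit 1
    bash ./setup_npu.sh || exit 1
    # Quoted so a multi-word prompt reaches the runner as a single argument.
    "./$runner" "$prompt"
  )
}
```

For example: `run_prompt ~/llama3p2-3b-w4a16-runner llama3p2-3b-w4a16-runner "What is an NPU?"`. The same call works for FP16 with `llama3p2-3b-runner` as both the directory and the runner name.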

Llama-3.2-3B-Instruct (FP16)

  1. Download the installer "llama3p2-3b-runner-0.1.0-Linux.sh" from the "fp16\binaries\rcar-x5hv1\xOS_v4." folder in the Files and versions tab.
  2. Copy the installer to the X5H board and run it:
    bash ./llama3p2-3b-runner-0.1.0-Linux.sh --prefix=./ --exclude-subdir --skip-license
    
  3. Download the GGUF file "Llama-3.2-3B-Instruct-f16.gguf" from the fp16 folder in the Files and versions tab and copy it to the installation directory on the X5H board.
  4. Check that the installation directory on the X5H board has the expected structure:
     llama3p2-3b-runner
     ├── Llama-3.2-3B-Instruct-f16.gguf
     ├── firmwares
     ├── kernel_modules
     ├── llama3p2-3b-runner
     ├── scripts
     └── setup_npu.sh
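Either expected layout (this one or the W4A16 one) can be checked on the board before running inference. The helper below is a hypothetical sketch: it checks only that each listed entry exists, with the entries common to both trees built in and the variant-specific names passed as extra arguments.

```shell
#!/usr/bin/env bash
# Sketch: compare an install directory against the expected structure.
# check_layout DIR [EXTRA_ENTRY...]: EXTRA_ENTRY names the variant-specific entries.
check_layout() {
  local dir="$1"; shift
  local missing=0 entry
  for entry in Llama-3.2-3B-Instruct-f16.gguf firmwares kernel_modules scripts setup_npu.sh "$@"; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $entry" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_layout ~/llama3p2-3b-runner llama3p2-3b-runner` for FP16, or `check_layout ~/llama3p2-3b-w4a16-runner llama3p2-3b-w4a16-runner llama3p2-3b-w4a16` for W4A16.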
    

Inference - Llama-3.2-3B-Instruct (FP16)

bash ./setup_npu.sh
./llama3p2-3b-runner <PROMPT>