NeMo
nvidia
Commit 47b4812 by jiaqiz · verified
1 parent: cf97eef

Add files using large-upload tool

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
@@ -27,7 +27,7 @@ NVIDIA does not claim ownership to any outputs generated using the Models or Der
27
 
28
  ### Intended use
29
 
30
- Nemotron-4-340B-Base is a completion model intended for use in 50+ natural and 40+ coding languages. For best performance on a given task, users are encouraged to customize the completion model using the NeMo Framework suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA), and SFT/Steer-LM/RLHF using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner).
31
 
32
  **Model Developer:** NVIDIA
33
 
@@ -59,7 +59,7 @@ Nemotron-4
59
 
60
  ### Usage
61
 
62
- 1. We will spin up an inference server and then call the inference server in a python script. Let’s first define the python script ``call_server.py``
63
 
64
  ```python
65
  import requests
@@ -101,7 +101,7 @@ print(response)
101
  ```
102
 
103
 
104
- 2. Given this python script, we will create a bash script, which spins up the inference server within the NeMo container(docker pull nvcr.io/nvidia/nemo:24.01.framework) and calls the python script ``call_server.py``. The bash script ``nemo_inference.sh`` is as follows,
105
 
106
 
107
  ```bash
@@ -151,13 +151,13 @@ depends_on () {
151
  ```
152
 
153
 
154
- 3, We can launch the ``nemo_inference.sh`` with a slurm script defined like below, which starts a 2-node job for the model inference.
155
 
156
  ```bash
157
  #!/bin/bash
158
  #SBATCH -A SLURM-ACCOUNT
159
  #SBATCH -p SLURM-PARTITION
160
- #SBATCH -N 2 # number of nodes
161
  #SBATCH -J generation
162
  #SBATCH --ntasks-per-node=8
163
  #SBATCH --gpus-per-node=8
@@ -167,8 +167,9 @@ RESULTS=<PATH_TO_YOUR_SCRIPTS_FOLDER>
167
  OUTFILE="${RESULTS}/slurm-%j-%n.out"
168
  ERRFILE="${RESULTS}/error-%j-%n.out"
169
  MODEL=<PATH_TO>/Nemotron-4-340B-Base
170
-
171
  MOUNTS="--container-mounts=<PATH_TO_YOUR_SCRIPTS_FOLDER>:/scripts,${MODEL}:/model"
 
172
  read -r -d '' cmd <<EOF
173
  bash /scripts/nemo_inference.sh /model
174
  EOF
 
27
 
28
  ### Intended use
29
 
30
+ Nemotron-4-340B-Base is a completion model intended for use in 50+ natural and 40+ coding languages. For best performance on a given task, users are encouraged to customize the completion model using the [NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html) suite of customization tools including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA), and SFT/Steer-LM/RLHF using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner).
31
 
32
  **Model Developer:** NVIDIA
33
 
 
59
 
60
  ### Usage
61
 
62
+ 1. We will spin up an inference server and then call the inference server in a python script. Let’s first define the python script ``call_server.py``.
63
 
64
  ```python
65
  import requests
 
101
  ```
102
 
103
 
104
+ 2. Given this python script, we will create a bash script, which spins up the inference server within the NeMo container (```docker pull nvcr.io/nvidia/nemo:24.01.framework```) and calls the python script ``call_server.py``. The bash script ``nemo_inference.sh`` is as follows,
105
 
106
 
107
  ```bash
 
151
  ```
152
 
153
 
154
+ 3. We can launch the ``nemo_inference.sh`` with a slurm script defined like below, which starts a 2-node job for the model inference.
155
 
156
  ```bash
157
  #!/bin/bash
158
  #SBATCH -A SLURM-ACCOUNT
159
  #SBATCH -p SLURM-PARTITION
160
+ #SBATCH -N 2
161
  #SBATCH -J generation
162
  #SBATCH --ntasks-per-node=8
163
  #SBATCH --gpus-per-node=8
 
167
  OUTFILE="${RESULTS}/slurm-%j-%n.out"
168
  ERRFILE="${RESULTS}/error-%j-%n.out"
169
  MODEL=<PATH_TO>/Nemotron-4-340B-Base
170
+ CONTAINER="nvcr.io/nvidia/nemo:24.01.framework"
171
  MOUNTS="--container-mounts=<PATH_TO_YOUR_SCRIPTS_FOLDER>:/scripts,${MODEL}:/model"
172
+
173
  read -r -d '' cmd <<EOF
174
  bash /scripts/nemo_inference.sh /model
175
  EOF
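
Review note on the last hunk: two shell behaviours in the Slurm script are easy to miss. A host path only reaches `--container-mounts` when the variable is written with parameter expansion (`${MODEL}`), since a bare `MODEL` is passed through as a literal word. Also, `read -r -d ''` returns a non-zero status once it has consumed the heredoc, which matters if the script ever runs under `set -e`. The sketch below illustrates both. The paths `/data/scripts` and `/data/models/...` and the names `SCRIPTS_DIR`, `MOUNTS_BARE`, and `MOUNTS_EXPANDED` are hypothetical placeholders, not values from the README.

```shell
#!/bin/bash
# Hypothetical placeholder paths standing in for the <PATH_TO...> values.
SCRIPTS_DIR="/data/scripts"
MODEL="/data/models/Nemotron-4-340B-Base"

# A bare word is kept literally; ${MODEL} is replaced by the assigned path.
MOUNTS_BARE="--container-mounts=${SCRIPTS_DIR}:/scripts,MODEL:/model"
MOUNTS_EXPANDED="--container-mounts=${SCRIPTS_DIR}:/scripts,${MODEL}:/model"

# `read -d ''` reads up to a NUL byte, so it takes in the whole heredoc and
# then returns non-zero at end of input; `|| true` keeps `set -e` scripts alive.
read -r -d '' cmd <<EOF || true
bash /scripts/nemo_inference.sh /model
EOF

echo "$MOUNTS_BARE"
echo "$MOUNTS_EXPANDED"
echo "$cmd"
```

Printing the three values before submitting the job is a cheap way to confirm that the mount string and the container command are what the launcher will actually receive.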