zetavg committed on
Commit
0e73bed
1 Parent(s): 45df9da

update instructions for SkyPilot

Files changed (1)
  1. README.md +44 -14
README.md CHANGED
@@ -42,10 +42,10 @@ After approximately 5 minutes of running, you will see the public URL in the out
  After following the [installation guide of SkyPilot](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html), create a `.yaml` to define a task for running the app:

  ```yaml
- # llama-lora-tuner.yaml

  resources:
-   accelerators: A10:1 # 1x NVIDIA A10 GPU, about US$ 0.6 / hr on Lambda Cloud.
    cloud: lambda # Optional; if left out, SkyPilot will automatically pick the cheapest cloud.

  file_mounts:
@@ -53,30 +53,46 @@ file_mounts:
    # (to store training datasets and trained models)
    # See https://skypilot.readthedocs.io/en/latest/reference/storage.html for details.
    /data:
-     name: llama-lora-tuner-data # Make sure this name is unique or you own this bucket. If it does not exist, SkyPilot will try to create a bucket with this name.
      store: s3 # Could be either of [s3, gcs]
      mode: MOUNT

  # Clone the LLaMA-LoRA Tuner repo and install its dependencies.
  setup: |
-   git clone https://github.com/zetavg/LLaMA-LoRA-Tuner.git llama_lora_tuner
-   cd llama_lora_tuner && pip install -r requirements.lock.txt
    pip install wandb
-   cd ..
    echo 'Dependencies installed.'
-   echo 'Pre-downloading base models so that you won't have to wait for long once the app is ready...'
-   python llama_lora_tuner/download_base_model.py --base_model_names='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j,databricks/dolly-v2-7b'

- # Start the app.
  run: |
-   echo 'Starting...'
-   python llama_lora_tuner/app.py --data_dir='/data' --wandb_api_key="$([ -f /data/secrets/wandb_api_key ] && cat /data/secrets/wandb_api_key | tr -d '\n')" --timezone='Atlantic/Reykjavik' --base_model=decapoda-research/llama-7b-hf --base_model_choices='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j,databricks/dolly-v2-7b --share
  ```

  Then launch a cluster to run the task:

  ```
- sky launch -c llama-lora-tuner llama-lora-tuner.yaml
  ```

  `-c ...` is an optional flag to specify a cluster name. If not specified, SkyPilot will automatically generate one.
@@ -87,14 +103,28 @@ Note that exiting `sky launch` will only exit log streaming and will not stop th

  When you are done, run `sky stop <cluster_name>` to stop the cluster. To terminate a cluster instead, run `sky down <cluster_name>`.

  ### Run locally

  <details>
  <summary>Prepare environment with conda</summary>

  ```bash
- conda create -y python=3.8 -n llama-lora-tuner
- conda activate llama-lora-tuner
  ```
  </details>

 
  After following the [installation guide of SkyPilot](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html), create a `.yaml` to define a task for running the app:

  ```yaml
+ # llm-tuner.yaml

  resources:
+   accelerators: A10:1 # 1x NVIDIA A10 GPU, about US$ 0.6 / hr on Lambda Cloud. Run `sky show-gpus` for supported GPU types, and `sky show-gpus [GPU_NAME]` for detailed information about a GPU type.
    cloud: lambda # Optional; if left out, SkyPilot will automatically pick the cheapest cloud.

  file_mounts:
 
    # (to store training datasets and trained models)
    # See https://skypilot.readthedocs.io/en/latest/reference/storage.html for details.
    /data:
+     name: llm-tuner-data # Make sure this name is unique or you own this bucket. If it does not exist, SkyPilot will try to create a bucket with this name.
      store: s3 # Could be either of [s3, gcs]
      mode: MOUNT

  # Clone the LLaMA-LoRA Tuner repo and install its dependencies.
  setup: |
+   conda create -q python=3.8 -n llm-tuner -y
+   conda activate llm-tuner
+   # Clone the LLaMA-LoRA Tuner repo and install its dependencies
+   [ ! -d llm_tuner ] && git clone https://github.com/zetavg/LLaMA-LoRA-Tuner.git llm_tuner
+   echo 'Installing dependencies...'
+   pip install -r llm_tuner/requirements.lock.txt
+   # Optional: install wandb to enable logging to Weights & Biases
    pip install wandb
+   # Optional: patch bitsandbytes to work around the error "libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats"
+   BITSANDBYTES_LOCATION="$(pip show bitsandbytes | grep 'Location' | awk '{print $2}')/bitsandbytes"
+   [ -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so" ] && [ ! -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so.bak" ] && [ -f "$BITSANDBYTES_LOCATION/libbitsandbytes_cuda121.so" ] && echo 'Patching bitsandbytes for GPU support...' && mv "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so" "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so.bak" && cp "$BITSANDBYTES_LOCATION/libbitsandbytes_cuda121.so" "$BITSANDBYTES_LOCATION/libbitsandbytes_cpu.so"
+   conda install -n llm-tuner cudatoolkit -y
    echo 'Dependencies installed.'
+   # Optional: pre-download models
+   echo "Pre-downloading base models so that you won't have to wait for long once the app is ready..."
+   python llm_tuner/download_base_model.py --base_model_names='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j'

+ # Start the app. `wandb_api_key` and `wandb_project_name` are optional.
  run: |
+   conda activate llm-tuner
+   python llm_tuner/app.py \
+     --data_dir='/data' \
+     --wandb_api_key="$([ -f /data/secrets/wandb_api_key.txt ] && cat /data/secrets/wandb_api_key.txt | tr -d '\n')" \
+     --wandb_project_name='llm-tuner' \
+     --timezone='Atlantic/Reykjavik' \
+     --base_model='decapoda-research/llama-7b-hf' \
+     --base_model_choices='decapoda-research/llama-7b-hf,nomic-ai/gpt4all-j,databricks/dolly-v2-7b' \
+     --share
  ```
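
The one-line bitsandbytes patch in `setup` is dense, so the same logic is unrolled below as a function. This is a sketch for reading, not code from the repository: the name `patch_bitsandbytes` and passing the package directory as an argument are assumptions made here. It backs up `libbitsandbytes_cpu.so` once and copies the CUDA 12.1 build over it, and it skips the work if the backup already exists or either library is missing. Unlike the `&&` chain, it returns success when there is nothing to patch.

```bash
# Hypothetical helper: the bitsandbytes patch from `setup`, unrolled.
# Takes the bitsandbytes package directory as its only argument.
patch_bitsandbytes() {
  dir="$1"
  cpu_lib="$dir/libbitsandbytes_cpu.so"
  cuda_lib="$dir/libbitsandbytes_cuda121.so"

  # Nothing to patch if either library is missing.
  [ -f "$cpu_lib" ] || return 0
  [ -f "$cuda_lib" ] || return 0
  # An existing .bak file means the patch already ran; keep that backup intact.
  [ -f "$cpu_lib.bak" ] && return 0

  echo 'Patching bitsandbytes for GPU support...'
  # Back up the CPU library, then put the CUDA build in its place.
  mv "$cpu_lib" "$cpu_lib.bak" && cp "$cuda_lib" "$cpu_lib"
}
```

It could be called with the location that the task file already computes, for example `patch_bitsandbytes "$(pip show bitsandbytes | grep 'Location' | awk '{print $2}')/bitsandbytes"`.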

  Then launch a cluster to run the task:

  ```
+ sky launch -c llm-tuner llm-tuner.yaml
  ```

  `-c ...` is an optional flag to specify a cluster name. If not specified, SkyPilot will automatically generate one.
 

  When you are done, run `sky stop <cluster_name>` to stop the cluster. To terminate a cluster instead, run `sky down <cluster_name>`.

+ **Remember to stop or shut down the cluster when you are done to avoid incurring unexpected charges.** Run `sky cost-report` to see the cost of your clusters.
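
The stop-or-terminate choice can be wrapped in a small helper so the action is always explicit. This is a minimal sketch, not part of the repository or of SkyPilot: `teardown_cluster` and the `SKY` override variable are hypothetical names introduced here, and the only commands it runs are the `sky stop` and `sky down` commands named above.

```bash
# Hypothetical helper (not part of LLaMA-LoRA Tuner or SkyPilot).
# SKY can be overridden, e.g. SKY=echo, to preview the command without running it.
SKY="${SKY:-sky}"

# usage: teardown_cluster <cluster_name> [stop|down]
teardown_cluster() {
  cluster="$1"
  action="${2:-stop}"   # default to the less destructive action
  case "$action" in
    stop|down)
      "$SKY" "$action" "$cluster"
      ;;
    *)
      echo "unknown action: $action (expected stop or down)" >&2
      return 2
      ;;
  esac
}
```

For example, `teardown_cluster llm-tuner down` terminates the cluster launched earlier, after which `sky cost-report` shows what it cost.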
+
+ <details>
+ <summary>Log into the cloud machine or mount the filesystem of the cloud machine on your local computer</summary>
+
+ To log into the cloud machine, run `ssh <cluster_name>`, such as `ssh llm-tuner`.
+
+ If you have `sshfs` installed on your local machine, you can mount the filesystem of the cloud machine on your local computer by running a command like the following:
+
+ ```bash
+ mkdir -p /tmp/llm_tuner_server && umount /tmp/llm_tuner_server || : && sshfs llm-tuner:/ /tmp/llm_tuner_server
+ ```
+ </details>
+
  ### Run locally

  <details>
  <summary>Prepare environment with conda</summary>

  ```bash
+ conda create -y python=3.8 -n llm-tuner
+ conda activate llm-tuner
  ```
  </details>