Anthonyg5005 committed
Commit 67e3618 · 1 Parent(s): 587f897

added more info and updated setups

can't update zipped files on laptop
- auto-exl2-upload/INSTRUCTIONS.txt +7 -5
- auto-exl2-upload/exl2-quant.py +7 -2
- auto-exl2-upload/linux-setup.sh +8 -1
- auto-exl2-upload/windows-setup.bat +7 -0
- exl2-multi-quant-local/INSTRUCTIONS.txt +7 -5
- exl2-multi-quant-local/exl2-quant.py +8 -3
- exl2-multi-quant-local/linux-setup.sh +7 -0
- exl2-multi-quant-local/windows-setup.bat +7 -0
- ipynb/Multi_Quant_exl2.ipynb +12 -20
auto-exl2-upload/INSTRUCTIONS.txt CHANGED

@@ -19,18 +19,20 @@ For example, on Ubuntu use: sudo apt-get install build-essential
 
 This may work with AMD cards but only on linux and possibly WSL2. I can't guarantee that it will work on AMD cards, I personally don't have one to test with. You may need to install stuff before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
 
+Only python 3.10 and 3.11
 
 
-First setup your environment by using either windows.bat or linux.sh.
+
+First setup your environment by using either windows.bat or linux.sh. If something fails during setup, then delete venv folder and try again.
 
 After setup is complete then you'll have a file called start-quant. Use this to run the quant script.
 
 Make sure that your storage space is 3x the amount of the model's size. To measure this, take the number of billion parameters and multiply by two, afterwards multiply by 3 and that's the recommended storage. There's a chance you may get away with 2.5x the size as well.
-Make sure to also have a lot of RAM depending on the model.
+Make sure to also have a lot of RAM depending on the model. I've noticed gemma use a lot.
 
-If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c pause at any time during the quant process.
+If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
 
-To add more options to the quantization process, you can add them to line
+To add more options to the quantization process, you can add them to line 174. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
 
 Things may break in the future as it downloads the latest version of all the dependencies which may either change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
 
@@ -44,4 +46,4 @@ https://github.com/oobabooga
 Credit to Lucain Pouget for maintaining huggingface-hub.
 https://github.com/Wauplin
 
-Only tested with CUDA 12.1 on Windows 11
+Only tested with CUDA 12.1 on Windows 11
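For context, the storage rule above is plain arithmetic: FP16 weights take about 2 bytes per parameter, so a model's on-disk size in GB is roughly twice its parameter count in billions, and the recommended free space is three times that. A minimal sketch of the estimate (the helper name is illustrative, not part of these scripts):

```python
def recommended_storage_gb(params_billions: float, headroom: float = 3.0) -> float:
    """Disk space suggested before quantizing, per the INSTRUCTIONS rule."""
    fp16_size_gb = params_billions * 2  # FP16 is ~2 bytes per parameter
    return fp16_size_gb * headroom      # 3x recommended; 2.5x may squeak by

print(recommended_storage_gb(7))       # 42.0 GB recommended for a 7B model
print(recommended_storage_gb(7, 2.5))  # 35.0 GB at the risky lower bound
```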
auto-exl2-upload/exl2-quant.py CHANGED

@@ -108,6 +108,7 @@ bpwvalue = list(qnum.values())
 #sort the list from smallest to largest
 bpwvalue.sort()
 
+#downloading the model
 if not os.path.exists(f"models{slsh}{model}{slsh}converted-st"): #check if model was converted to safetensors, skip download if it was
     result = subprocess.run(f"{pyt} download-model.py {repo_url}", shell=True) #download model from hf (Credit to oobabooga for this script)
     if result.returncode != 0:
@@ -115,6 +116,7 @@ if not os.path.exists(f"models{slsh}{model}{slsh}converted-st"): #check if model
         sys.exit("Exiting...")
     clear_screen()
 
+#convert to safetensors if bin
 if not glob.glob(f"models/{model}/*.safetensors"): #check if safetensors model exists
     convertst = input("Couldn't find safetensors model, do you want to convert to safetensors? (y/n): ")
     while convertst != 'y' and convertst != 'n':
@@ -125,8 +127,8 @@ if not glob.glob(f"models/{model}/*.safetensors"): #check if safetensors model e
         if result.returncode != 0:
             print("Converting failed. Please look for a safetensors model or convert model manually.")
             sys.exit("Exiting...")
-        subprocess.run(f"{osrmd} models{slsh}{model}", shell=True)
-        subprocess.run(f"{osmv} models{slsh}{model}-st models{slsh}{model}", shell=True)
+        subprocess.run(f"{osrmd} models{slsh}{model}", shell=True) #remove previous weights
+        subprocess.run(f"{osmv} models{slsh}{model}-st models{slsh}{model}", shell=True) #replace with safetensors
         open(f"models{slsh}{model}{slsh}converted-st", 'w').close()
         print("Finished converting")
     else:
@@ -189,6 +191,7 @@ if file_exists(f"{whoami().get('name', None)}/{modelname}-exl2", "measurement.js
     upload_file(path_or_fileobj=f"measurements{slsh}{model}-measure{slsh}measurement.json", path_in_repo="measurement.json", repo_id=f"{whoami().get('name', None)}/{modelname}-exl2", commit_message="Add measurement.json") #upload measurement.json to main
 clear_screen()
 
+#ask to delete original fp16 weights
 delmodel = input("Do you want to delete the original model? (y/n): ")
 while delmodel != 'y' and delmodel != 'n':
     delmodel = input("Please enter 'y' or 'n': ")
@@ -198,6 +201,7 @@ if delmodel == 'y':
     time.sleep(2)
     clear_screen()
 
+#ask to change repo visibility to public on hf hub
 priv2pub = input("Do you want to make the repo public? (y/n): ")
 while priv2pub != 'y' and priv2pub != 'n':
     priv2pub = input("Please enter 'y' or 'n': ")
@@ -207,6 +211,7 @@ if priv2pub == 'y':
     time.sleep(2)
     clear_screen()
 
+#if new sign in, tell user
 if tfound == 'false':
     print(f'''
 You are now logged in as {whoami().get('fullname', None)}.
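For context, the flow these hunks annotate is: skip the download when a converted-st marker file exists, otherwise download the repo, convert any .bin weights to safetensors, swap the old folder for the converted one, and touch the marker so a rerun can resume. A standalone sketch of that pattern, with placeholder paths and shutil standing in for the script's {osrmd}/{osmv} shell commands:

```python
import glob
import os
import shutil
import subprocess
import sys

model_dir = "models/my-model"                     # placeholder path
marker = os.path.join(model_dir, "converted-st")  # marker = already converted

if not os.path.exists(marker):  # skip download if a previous run finished converting
    if subprocess.run("python download-model.py user/my-model", shell=True).returncode != 0:
        sys.exit("Download failed. Exiting...")

if not glob.glob(f"{model_dir}/*.safetensors"):   # only .bin weights present
    result = subprocess.run(
        f"python convert-to-safetensors.py {model_dir} --output {model_dir}-st", shell=True)
    if result.returncode != 0:
        sys.exit("Converting failed. Exiting...")
    shutil.rmtree(model_dir)                      # remove previous weights
    shutil.move(f"{model_dir}-st", model_dir)     # replace with safetensors
    open(marker, "w").close()                     # touch marker so reruns skip all this
```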
auto-exl2-upload/linux-setup.sh CHANGED

@@ -6,7 +6,7 @@
 if [ ! -d "venv" ]; then
     python3 -m venv venv
 else
-    echo "venv directory already exists. If something is broken, delete
+    echo "venv directory already exists. If something is broken, delete venv folder and run this script again."
     read -p "Press enter to continue"
     exit
 fi
@@ -40,6 +40,13 @@ else
     exit
 fi
 
+echo "Deleting potential conflicting files"
+rm convert-to-safetensors.py
+rm download-model.py
+rm -rf exllamav2
+rm start-quant.sh
+rm enter-venv.sh
+
 # download stuff
 echo "Downloading files"
 git clone https://github.com/turboderp/exllamav2
auto-exl2-upload/windows-setup.bat CHANGED

@@ -41,6 +41,13 @@ if "%cuda_version%"=="11" (
     exit
 )
 
+echo Deleting potential conflicting files
+del convert-to-safetensors.py
+del download-model.py
+rmdir /s /q exllamav2
+del start-quant.sh
+del enter-venv.sh
+
 REM download stuff
 echo Downloading files...
 git clone https://github.com/turboderp/exllamav2
|
exl2-multi-quant-local/INSTRUCTIONS.txt
CHANGED
@@ -19,18 +19,20 @@ For example, on Ubuntu use: sudo apt-get install build-essential
|
|
19 |
|
20 |
This may work with AMD cards but only on linux and possibly WSL2. I can't guarantee that it will work on AMD cards, I personally don't have one to test with. You may need to install stuff before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
|
21 |
|
|
|
22 |
|
23 |
|
24 |
-
|
|
|
25 |
|
26 |
After setup is complete then you'll have a file called start-quant. Use this to run the quant script.
|
27 |
|
28 |
Make sure that your storage space is 3x the amount of the model's size plus 1 more time per quant. To mesure this, take the number of billion parameters and mutliply by two, afterwards mutliply by 3 and that's the recommended storage. There's a chance you may get away with 2.5x the size as well.
|
29 |
-
Make sure to also have a lot of RAM depending on the model.
|
30 |
|
31 |
-
If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c pause at any time during the quant process.
|
32 |
|
33 |
-
To add more options to the quantization process, you can add them to line
|
34 |
|
35 |
Things may break in the future as it downloads the latest version of all the dependencies which may either change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
|
36 |
|
@@ -44,4 +46,4 @@ https://github.com/oobabooga
|
|
44 |
Credit to Lucain Pouget for maintaining huggingface-hub.
|
45 |
https://github.com/Wauplin
|
46 |
|
47 |
-
Only tested with CUDA 12.1 on Windows 11
|
|
|
19 |
|
20 |
This may work with AMD cards but only on linux and possibly WSL2. I can't guarantee that it will work on AMD cards, I personally don't have one to test with. You may need to install stuff before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
|
21 |
|
22 |
+
Only python 3.10 and 3.11
|
23 |
|
24 |
|
25 |
+
|
26 |
+
First setup your environment by using either windows.bat or linux.sh. If something fails during setup, then delete venv folder and try again.
|
27 |
|
28 |
After setup is complete then you'll have a file called start-quant. Use this to run the quant script.
|
29 |
|
30 |
Make sure that your storage space is 3x the amount of the model's size plus 1 more time per quant. To mesure this, take the number of billion parameters and mutliply by two, afterwards mutliply by 3 and that's the recommended storage. There's a chance you may get away with 2.5x the size as well.
|
31 |
+
Make sure to also have a lot of RAM depending on the model. Have noticed gemma to use a lot.
|
32 |
|
33 |
+
If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
|
34 |
|
35 |
+
To add more options to the quantization process, you can add them to line 136. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
|
36 |
|
37 |
Things may break in the future as it downloads the latest version of all the dependencies which may either change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
|
38 |
|
|
|
46 |
Credit to Lucain Pouget for maintaining huggingface-hub.
|
47 |
https://github.com/Wauplin
|
48 |
|
49 |
+
Only tested with CUDA 12.1 on Windows 11
|
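On "add them to line 136" (line 174 in the auto-upload script): that line builds the exllamav2 convert.py command, and extra switches from the linked convert.md can be appended to it. A hedged sketch of what that might look like; the exact flags (e.g. -hb for head bits) are assumptions here, so check convert.md before relying on them:

```python
# illustrative only: extra convert.py options appended to the quant command
bpw = 4.0
extra_options = "-hb 8"  # assumed example flag from convert.md; verify before use
cmd = (f"python exllamav2/convert.py -i models/my-model "
       f"-o my-model-exl2-{bpw}bpw-WD -b {bpw} {extra_options}")
print(cmd)  # the scripts run the equivalent via subprocess.run(cmd, shell=True)
```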
exl2-multi-quant-local/exl2-quant.py CHANGED

@@ -51,7 +51,7 @@ else:
         login(input("API token not detected. Enter your HuggingFace token (empty to skip): "))
     except:
         print("Skipping login... (Unable to access private or gated models)")
-        tfound = "false but skipped" #doesn't matter what this is, only false is used
+        tfound = "false but skipped" #doesn't matter what this is, only 'false' is used
     time.sleep(3)
     clear_screen()
 
@@ -71,6 +71,7 @@ qmount = int(input("Enter the number of quants you want to create: "))
 qmount += 1
 clear_screen()
 
+#ask to delete fp16 after done
 delmodel = input("Do you want to delete the original model after finishing? (Won't delete if canceled or failed) (y/n): ")
 while delmodel != 'y' and delmodel != 'n':
     delmodel = input("Please enter 'y' or 'n': ")
@@ -92,6 +93,7 @@ bpwvalue = list(qnum.values())
 #sort the list from smallest to largest
 bpwvalue.sort()
 
+#downloading the model
 if not os.path.exists(f"models{slsh}{model}{slsh}converted-st"): #check if model was converted to safetensors, skip download if it was
     result = subprocess.run(f"{pyt} download-model.py {repo_url}", shell=True) #download model from hf (Credit to oobabooga for this script)
     if result.returncode != 0:
@@ -99,6 +101,7 @@ if not os.path.exists(f"models{slsh}{model}{slsh}converted-st"): #check if model
         sys.exit("Exiting...")
     clear_screen()
 
+#convert to safetensors if bin
 if not glob.glob(f"models/{model}/*.safetensors"): #check if safetensors model exists
     convertst = input("Couldn't find safetensors model, do you want to convert to safetensors? (y/n): ")
     while convertst != 'y' and convertst != 'n':
@@ -109,8 +112,8 @@ if not glob.glob(f"models/{model}/*.safetensors"): #check if safetensors model e
         if result.returncode != 0:
             print("Converting failed. Please look for a safetensors model or convert model manually.")
             sys.exit("Exiting...")
-        subprocess.run(f"{osrmd} models{slsh}{model}", shell=True)
-        subprocess.run(f"{osmv} models{slsh}{model}-st models{slsh}{model}", shell=True)
+        subprocess.run(f"{osrmd} models{slsh}{model}", shell=True) #remove previous weights
+        subprocess.run(f"{osmv} models{slsh}{model}-st models{slsh}{model}", shell=True) #replace with safetensors
         open(f"models{slsh}{model}{slsh}converted-st", 'w').close()
         print("Finished converting")
     else:
@@ -140,12 +143,14 @@ for bpw in bpwvalue:
     open(f"measurements{slsh}{model}-measure/Delete folder when no more quants are needed from this model", 'w').close()
     subprocess.run(f"{osrmd} {model}-exl2-{bpw}bpw-WD", shell=True) #remove working directory
 
+# if chose to delete model at the beginning, delete the model
 if delmodel == 'y':
     subprocess.run(f"{osrmd} models{slsh}{model}", shell=True)
     print(f"Deleted models/{model}")
     time.sleep(2)
     clear_screen()
 
+#if new sign in, tell user
 if tfound == 'false':
     print(f'''
 You are now logged in as {whoami().get('fullname', None)}.
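Both quant scripts repeat the same validation loop for every y/n question (delmodel, priv2pub, convertst). A small helper would collapse the repetition; this is a hypothetical refactor, not something in the commit:

```python
def ask_yes_no(prompt: str) -> bool:
    """Re-prompt until the user types exactly 'y' or 'n'."""
    answer = input(prompt)
    while answer not in ('y', 'n'):
        answer = input("Please enter 'y' or 'n': ")
    return answer == 'y'

# usage mirroring the script's prompt:
# delete_model = ask_yes_no("Do you want to delete the original model after finishing? (y/n): ")
```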
exl2-multi-quant-local/linux-setup.sh CHANGED

@@ -40,6 +40,13 @@ else
     exit
 fi
 
+echo "Deleting potential conflicting files"
+rm convert-to-safetensors.py
+rm download-model.py
+rm -rf exllamav2
+rm start-quant.sh
+rm enter-venv.sh
+
 # download stuff
 echo "Downloading files"
 git clone https://github.com/turboderp/exllamav2
exl2-multi-quant-local/windows-setup.bat CHANGED

@@ -41,6 +41,13 @@ if "%cuda_version%"=="11" (
     exit
 )
 
+echo Deleting potential conflicting files
+del convert-to-safetensors.py
+del download-model.py
+rmdir /s /q exllamav2
+del start-quant.sh
+del enter-venv.sh
+
 REM download stuff
 echo Downloading files...
 git clone https://github.com/turboderp/exllamav2
ipynb/Multi_Quant_exl2.ipynb CHANGED

@@ -63,7 +63,6 @@
     " You cannot log in.\n",
     " Either set the environment variable to a 'WRITE' token or remove it.\n",
     " ''')\n",
-    " input(\"Press enter to continue.\")\n",
     " sys.exit(\"Exiting...\")\n",
     " if os.environ.get('COLAB_BACKEND_VERSION', None) is not None:\n",
     " print('''\n",
@@ -95,7 +94,7 @@
     "#@title Start quant\n",
     "#@markdown ### Using subprocess to execute scripts doesn't output on Colab. If something seems frozen, please wait. Any detected errors will automatically stop Colab\n",
     "#import required modules\n",
-    "from huggingface_hub import
+    "from huggingface_hub import repo_exists, upload_folder, create_repo, upload_file, create_branch\n",
     "import os\n",
     "import sys\n",
     "import subprocess\n",
@@ -158,24 +157,17 @@
     " sys.exit(\"Exiting...\")\n",
     " print(\"Download finished\\n\\n\")\n",
     "\n",
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    " subprocess.run(f\"{osrmd} models{slsh}{model}\", shell=True)\n",
-    " subprocess.run(f\"{osmv} models{slsh}{model}-st models{slsh}{model}\", shell=True)\n",
-    " open(f\"models{slsh}{model}{slsh}converted-st\", 'w').close()\n",
-    " print(\"Finished converting\")\n",
-    " print(\"\\n\\n\")\n",
-    " else:\n",
-    " sys.exit(\"Can't quantize a non-safetensors model. Exiting...\")\n",
+    "if not glob.glob(f\"models/{model}/*.safetensors\"): #check if safetensors model exists, if not try converting\n",
+    " print(\"Converting weights to safetensors, please wait...\")\n",
+    " result = subprocess.run(f\"{pyt} convert-to-safetensors.py models{slsh}{model} --output models{slsh}{model}-st\", shell=True) #convert to safetensors (Credit to oobabooga for this script as well)\n",
+    " if result.returncode != 0:\n",
+    "  print(\"Converting failed. Please look for a safetensors/bin model.\")\n",
+    "  sys.exit(\"Exiting...\")\n",
+    " subprocess.run(f\"{osrmd} models{slsh}{model}\", shell=True)\n",
+    " subprocess.run(f\"{osmv} models{slsh}{model}-st models{slsh}{model}\", shell=True)\n",
+    " open(f\"models{slsh}{model}{slsh}converted-st\", 'w').close()\n",
+    " print(\"Finished converting\")\n",
+    " print(\"\\n\\n\")\n",
     "\n",
     "#create new repo if one doesn't already exist\n",
     "if repo_exists(f\"{whoami().get('name', None)}/{modelname}-exl2\") == False:\n",

(Several removed notebook lines in the last hunk did not survive extraction; they are kept as bare "-" string entries above.)
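The imports restored in the notebook (repo_exists, create_repo, create_branch, upload_file, upload_folder) are the Hub-side half of the workflow: each finished quant is pushed to a branch of a {username}/{modelname}-exl2 repo. A rough sketch of that flow with a placeholder model name, assuming the standard huggingface_hub calls rather than the notebook's exact code:

```python
from huggingface_hub import create_branch, create_repo, upload_folder, whoami

modelname = "my-model"  # placeholder
repo = f"{whoami().get('name', None)}/{modelname}-exl2"
create_repo(repo, private=True, exist_ok=True)          # one repo for all quants

bpw = 4.0
create_branch(repo, branch=f"{bpw}bpw", exist_ok=True)  # one branch per BPW
upload_folder(repo_id=repo, folder_path=f"{modelname}-exl2-{bpw}bpw",
              revision=f"{bpw}bpw", commit_message=f"Add {bpw}bpw quant")
```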