liuhaotian committed • Commit 7564980 • Parent(s): c5721f2

Add flash attention

Files changed:
- app.py +7 -6
- requirements.txt +2 -2
app.py
CHANGED
@@ -40,6 +40,7 @@ def start_worker(model_path: str, bits=16):
         model_path,
         "--model-name",
         model_name,
+        "--use-flash-attn",
     ]
     if bits != 16:
         worker_command += [f"--load-{bits}bit"]
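For context, here is a minimal sketch of how `start_worker` might assemble and launch the worker around the lines above. The `llava.serve.model_worker` entry point, the `model_name` derivation, and the use of `subprocess` are assumptions based on the visible fragment, not the Space's exact code:

# Sketch only: assumed shape of start_worker around the diffed lines.
import subprocess

def start_worker(model_path: str, bits=16):
    # Assumed convention: derive the worker's display name from the
    # checkpoint path, e.g. "liuhaotian/llava-v1.6-34b" -> "llava-v1.6-34b".
    model_name = model_path.strip("/").split("/")[-1]
    worker_command = [
        "python", "-m", "llava.serve.model_worker",  # assumed entry point
        "--model-path", model_path,
        "--model-name", model_name,
        "--use-flash-attn",  # the flag this commit adds
    ]
    if bits != 16:
        # Quantized loading, e.g. --load-4bit or --load-8bit
        worker_command += [f"--load-{bits}bit"]
    return subprocess.Popen(worker_command)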
@@ -65,12 +66,12 @@ if __name__ == "__main__":
     ONLY WORKS WITH GPU! By default, we load the model with 4-bit quantization to make it fit on smaller hardware. Set the environment variable `bits` to control the quantization.
 
     Set the environment variable `model` to change the model, and switch hardware accordingly:
-    | Model
-    |
-    | liuhaotian/llava-v1.6-mistral-7b | T4
-    | liuhaotian/llava-v1.6-vicuna-7b | T4
-    | liuhaotian/llava-v1.6-vicuna-13b | T4
-    | liuhaotian/llava-v1.6-34b
+    | Model                            | Hardware   |
+    |----------------------------------|------------|
+    | liuhaotian/llava-v1.6-mistral-7b | T4 small   |
+    | liuhaotian/llava-v1.6-vicuna-7b  | T4 small   |
+    | liuhaotian/llava-v1.6-vicuna-13b | T4 small   |
+    | liuhaotian/llava-v1.6-34b        | A10G large |
     """
 
     print(f"args: {gws.args}")
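The docstring above documents two knobs, `model` and `bits`. A hedged sketch of how the Space might read them; the variable names mirror the docstring, and the defaults are assumptions (the docstring only states that 4-bit is the default):

import os

# `model` selects the checkpoint; `bits` controls quantization.
# Both fallback values here are assumptions, not the Space's exact code.
model_path = os.environ.get("model", "liuhaotian/llava-v1.6-mistral-7b")
bits = int(os.environ.get("bits", "4"))

start_worker(model_path, bits=bits)  # start_worker as sketched above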
requirements.txt
CHANGED
@@ -1,2 +1,2 @@
-llava-torch
-
+llava-torch
+flash-attn
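One reproduction note: flash-attn compiles against an already-installed torch, so when recreating this environment outside the Space's build pipeline, it typically needs to be installed after llava-torch, e.g. with `pip install flash-attn --no-build-isolation` as the flash-attn project recommends.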