reach-vb (HF staff) committed
Commit 9b6f1ba
1 Parent(s): 1504cda

Update app.py

Files changed (1)
  1. app.py +31 -9
app.py CHANGED
@@ -119,31 +119,53 @@ def process_model(model_id, q_method, private_repo, split_model, split_max_tenso
  card.data.tags = []
  card.data.tags.append("llama-cpp")
  card.data.tags.append("gguf-my-repo")
+ if card.data.base_model is None:
+     card.data.base_model = []
+ card.data.base_model.append({model_id})
  card.text = dedent(
  f"""
  # {new_repo_id}
  This model was converted to GGUF format from [`{model_id}`](https://huggingface.co/{model_id}) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/{model_id}) for more details on the model.
+
  ## Use with llama.cpp
  Install llama.cpp through brew.
+
  ```bash
- brew install ggerganov/ggerganov/llama.cpp
+ brew install llama.cpp
+
  ```
  Invoke the llama.cpp server or the CLI.
- CLI:
+
+ ### CLI:
  ```bash
- llama-cli --hf-repo {new_repo_id} --model {quantized_gguf_name} -p "The meaning to life and the universe is"
+ llama --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
  ```
- Server:
+
+ ### Server:
  ```bash
- llama-server --hf-repo {new_repo_id} --model {quantized_gguf_name} -c 2048
+ llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
  ```
+
  Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
+
+ Step 1: Clone llama.cpp from GitHub.
+ ```
+ git clone https://github.com/ggerganov/llama.cpp
+ ```
+
+ Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).
+ ```
+ cd llama.cpp && LLAMA_CURL=1 make
+ ```
+
+ Step 3: Run inference through the main binary.
+ ```
+ ./main --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
+ ```
+ or
  ```
- git clone https://github.com/ggerganov/llama.cpp && \\
- cd llama.cpp && \\
- make && \\
- ./main -m {quantized_gguf_name} -n 128
+ ./server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
  ```
  """
  )
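
For context, here is a minimal sketch of the card-metadata step this commit introduces: tag the quantized repo for the GGUF conversion and record the source checkpoint under `base_model`, guarding against the field being unset. The helper name `annotate_card` is hypothetical, and the sketch appends the plain `model_id` string; the `ModelCard` API is from `huggingface_hub`, which the Space already depends on.

```python
# Minimal sketch (not the Space's exact code) of the metadata handling added here.
from huggingface_hub import ModelCard

def annotate_card(model_id: str) -> ModelCard:
    # Load the source model's card; fall back to an empty card if it has none.
    try:
        card = ModelCard.load(model_id)
    except Exception:
        card = ModelCard("")

    # Tag the quantized repo so it is discoverable as a GGUF conversion.
    if card.data.tags is None:
        card.data.tags = []
    card.data.tags.extend(["llama-cpp", "gguf-my-repo"])

    # Record the source checkpoint, initializing base_model as a list first,
    # mirroring the guard introduced in this commit.
    if card.data.base_model is None:
        card.data.base_model = []
    card.data.base_model.append(model_id)
    return card
```

The `base_model` entry ends up in the repo's YAML front matter when the card is pushed, which is what lets the Hub link the quantized GGUF repo back to the original model.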