Commit 8a87fa8 (verified) · committed by AIencoder
1 parent: 8e06386

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes. See the raw diff for the full change set.
Files changed (50)
  1. .gitattributes +114 -0
  2. .gitgnore +49 -0
  3. .gradio/certificate.pem +31 -0
  4. README.md +3 -9
  5. agents/__init__.py +1 -0
  6. agents/coder.py +23 -0
  7. agents/reviewer.py +27 -0
  8. app.py +194 -0
  9. assets/.gitkeep +1 -0
  10. assets/response.wav +0 -0
  11. llama.cpp/.clang-format +164 -0
  12. llama.cpp/.clang-tidy +27 -0
  13. llama.cpp/.devops/cann.Dockerfile +130 -0
  14. llama.cpp/.devops/cloud-v-pipeline +22 -0
  15. llama.cpp/.devops/cpu.Dockerfile +92 -0
  16. llama.cpp/.devops/cuda.Dockerfile +94 -0
  17. llama.cpp/.devops/intel.Dockerfile +95 -0
  18. llama.cpp/.devops/llama-cli-cann.Dockerfile +44 -0
  19. llama.cpp/.devops/llama-cpp-cuda.srpm.spec +83 -0
  20. llama.cpp/.devops/llama-cpp.srpm.spec +85 -0
  21. llama.cpp/.devops/musa.Dockerfile +101 -0
  22. llama.cpp/.devops/nix/apps.nix +21 -0
  23. llama.cpp/.devops/nix/devshells.nix +52 -0
  24. llama.cpp/.devops/nix/docker.nix +37 -0
  25. llama.cpp/.devops/nix/jetson-support.nix +39 -0
  26. llama.cpp/.devops/nix/nixpkgs-instances.nix +45 -0
  27. llama.cpp/.devops/nix/package-gguf-py.nix +36 -0
  28. llama.cpp/.devops/nix/package.nix +248 -0
  29. llama.cpp/.devops/nix/python-scripts.nix +66 -0
  30. llama.cpp/.devops/nix/scope.nix +41 -0
  31. llama.cpp/.devops/nix/sif.nix +27 -0
  32. llama.cpp/.devops/rocm.Dockerfile +113 -0
  33. llama.cpp/.devops/tools.sh +49 -0
  34. llama.cpp/.devops/vulkan.Dockerfile +89 -0
  35. llama.cpp/.dockerignore +20 -0
  36. llama.cpp/.ecrc +6 -0
  37. llama.cpp/.editorconfig +54 -0
  38. llama.cpp/.flake8 +18 -0
  39. llama.cpp/.github/ISSUE_TEMPLATE/010-bug-compilation.yml +87 -0
  40. llama.cpp/.github/ISSUE_TEMPLATE/011-bug-results.yml +101 -0
  41. llama.cpp/.github/ISSUE_TEMPLATE/019-bug-misc.yml +91 -0
  42. llama.cpp/.github/ISSUE_TEMPLATE/020-enhancement.yml +51 -0
  43. llama.cpp/.github/ISSUE_TEMPLATE/030-research.yml +52 -0
  44. llama.cpp/.github/ISSUE_TEMPLATE/040-refactor.yml +28 -0
  45. llama.cpp/.github/ISSUE_TEMPLATE/config.yml +11 -0
  46. llama.cpp/.github/actions/get-tag-name/action.yml +22 -0
  47. llama.cpp/.github/actions/windows-setup-cuda/action.yml +67 -0
  48. llama.cpp/.github/actions/windows-setup-curl/action.yml +30 -0
  49. llama.cpp/.github/labeler.yml +94 -0
  50. llama.cpp/.github/pull_request_template.md +1 -0
.gitattributes CHANGED
@@ -33,3 +33,117 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ llama.cpp/.vs/llama.cpp/CopilotIndices/17.14.786.1071/CodeChunks.db filter=lfs diff=lfs merge=lfs -text
37
+ llama.cpp/.vs/llama.cpp/CopilotIndices/17.14.786.1071/SemanticSymbols.db filter=lfs diff=lfs merge=lfs -text
38
+ llama.cpp/.vs/llama.cpp/FileContentIndex/fa8fa901-0eee-48bf-a604-aa4561f07b11.vsidx filter=lfs diff=lfs merge=lfs -text
39
+ llama.cpp/.vs/llama.cpp/v17/Browse.VC.db filter=lfs diff=lfs merge=lfs -text
40
+ llama.cpp/.vs/slnx.sqlite filter=lfs diff=lfs merge=lfs -text
41
+ llama.cpp/build/bin/ggml-base.dll filter=lfs diff=lfs merge=lfs -text
42
+ llama.cpp/build/bin/ggml-base.ilk filter=lfs diff=lfs merge=lfs -text
43
+ llama.cpp/build/bin/ggml-base.pdb filter=lfs diff=lfs merge=lfs -text
44
+ llama.cpp/build/bin/ggml-cpu-alderlake.dll filter=lfs diff=lfs merge=lfs -text
45
+ llama.cpp/build/bin/ggml-cpu-haswell.dll filter=lfs diff=lfs merge=lfs -text
46
+ llama.cpp/build/bin/ggml-cpu-icelake.dll filter=lfs diff=lfs merge=lfs -text
47
+ llama.cpp/build/bin/ggml-cpu-sandybridge.dll filter=lfs diff=lfs merge=lfs -text
48
+ llama.cpp/build/bin/ggml-cpu-sapphirerapids.dll filter=lfs diff=lfs merge=lfs -text
49
+ llama.cpp/build/bin/ggml-cpu-skylakex.dll filter=lfs diff=lfs merge=lfs -text
50
+ llama.cpp/build/bin/ggml-cpu-sse42.dll filter=lfs diff=lfs merge=lfs -text
51
+ llama.cpp/build/bin/ggml-cpu-x64.dll filter=lfs diff=lfs merge=lfs -text
52
+ llama.cpp/build/bin/ggml-cpu.dll filter=lfs diff=lfs merge=lfs -text
53
+ llama.cpp/build/bin/ggml-cpu.ilk filter=lfs diff=lfs merge=lfs -text
54
+ llama.cpp/build/bin/ggml-cpu.pdb filter=lfs diff=lfs merge=lfs -text
55
+ llama.cpp/build/bin/ggml-rpc.dll filter=lfs diff=lfs merge=lfs -text
56
+ llama.cpp/build/bin/ggml.ilk filter=lfs diff=lfs merge=lfs -text
57
+ llama.cpp/build/bin/ggml.pdb filter=lfs diff=lfs merge=lfs -text
58
+ llama.cpp/build/bin/libcurl-x64.dll filter=lfs diff=lfs merge=lfs -text
59
+ llama.cpp/build/bin/libomp140.x86_64.dll filter=lfs diff=lfs merge=lfs -text
60
+ llama.cpp/build/bin/llama-batched-bench.exe filter=lfs diff=lfs merge=lfs -text
61
+ llama.cpp/build/bin/llama-bench.exe filter=lfs diff=lfs merge=lfs -text
62
+ llama.cpp/build/bin/llama-cli.exe filter=lfs diff=lfs merge=lfs -text
63
+ llama.cpp/build/bin/llama-completion.exe filter=lfs diff=lfs merge=lfs -text
64
+ llama.cpp/build/bin/llama-fit-params.exe filter=lfs diff=lfs merge=lfs -text
65
+ llama.cpp/build/bin/llama-imatrix.exe filter=lfs diff=lfs merge=lfs -text
66
+ llama.cpp/build/bin/llama-mtmd-cli.exe filter=lfs diff=lfs merge=lfs -text
67
+ llama.cpp/build/bin/llama-perplexity.exe filter=lfs diff=lfs merge=lfs -text
68
+ llama.cpp/build/bin/llama-quantize.exe filter=lfs diff=lfs merge=lfs -text
69
+ llama.cpp/build/bin/llama-run.exe filter=lfs diff=lfs merge=lfs -text
70
+ llama.cpp/build/bin/llama-server.exe filter=lfs diff=lfs merge=lfs -text
71
+ llama.cpp/build/bin/llama-server.ilk filter=lfs diff=lfs merge=lfs -text
72
+ llama.cpp/build/bin/llama-server.pdb filter=lfs diff=lfs merge=lfs -text
73
+ llama.cpp/build/bin/llama-tokenize.exe filter=lfs diff=lfs merge=lfs -text
74
+ llama.cpp/build/bin/llama-tts.exe filter=lfs diff=lfs merge=lfs -text
75
+ llama.cpp/build/bin/llama.dll filter=lfs diff=lfs merge=lfs -text
76
+ llama.cpp/build/bin/llama.ilk filter=lfs diff=lfs merge=lfs -text
77
+ llama.cpp/build/bin/llama.pdb filter=lfs diff=lfs merge=lfs -text
78
+ llama.cpp/build/bin/mtmd.dll filter=lfs diff=lfs merge=lfs -text
79
+ llama.cpp/build/bin/mtmd.ilk filter=lfs diff=lfs merge=lfs -text
80
+ llama.cpp/build/bin/mtmd.pdb filter=lfs diff=lfs merge=lfs -text
81
+ llama.cpp/build/bin/rpc-server.exe filter=lfs diff=lfs merge=lfs -text
82
+ llama.cpp/build/CMakeFiles/4.1.0-rc4/CompilerIdC/CMakeCCompilerId.exe filter=lfs diff=lfs merge=lfs -text
83
+ llama.cpp/build/CMakeFiles/4.1.0-rc4/CompilerIdCXX/CMakeCXXCompilerId.exe filter=lfs diff=lfs merge=lfs -text
84
+ llama.cpp/build/common/CMakeFiles/common.dir/arg.cpp.obj filter=lfs diff=lfs merge=lfs -text
85
+ llama.cpp/build/common/CMakeFiles/common.dir/chat-parser.cpp.obj filter=lfs diff=lfs merge=lfs -text
86
+ llama.cpp/build/common/CMakeFiles/common.dir/chat.cpp.obj filter=lfs diff=lfs merge=lfs -text
87
+ llama.cpp/build/common/CMakeFiles/common.dir/common.cpp.obj filter=lfs diff=lfs merge=lfs -text
88
+ llama.cpp/build/common/CMakeFiles/common.dir/common.pdb filter=lfs diff=lfs merge=lfs -text
89
+ llama.cpp/build/common/CMakeFiles/common.dir/console.cpp.obj filter=lfs diff=lfs merge=lfs -text
90
+ llama.cpp/build/common/CMakeFiles/common.dir/json-partial.cpp.obj filter=lfs diff=lfs merge=lfs -text
91
+ llama.cpp/build/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.obj filter=lfs diff=lfs merge=lfs -text
92
+ llama.cpp/build/common/CMakeFiles/common.dir/log.cpp.obj filter=lfs diff=lfs merge=lfs -text
93
+ llama.cpp/build/common/CMakeFiles/common.dir/ngram-cache.cpp.obj filter=lfs diff=lfs merge=lfs -text
94
+ llama.cpp/build/common/CMakeFiles/common.dir/regex-partial.cpp.obj filter=lfs diff=lfs merge=lfs -text
95
+ llama.cpp/build/common/CMakeFiles/common.dir/sampling.cpp.obj filter=lfs diff=lfs merge=lfs -text
96
+ llama.cpp/build/common/CMakeFiles/common.dir/speculative.cpp.obj filter=lfs diff=lfs merge=lfs -text
97
+ llama.cpp/build/common/common.lib filter=lfs diff=lfs merge=lfs -text
98
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.obj filter=lfs diff=lfs merge=lfs -text
99
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.obj filter=lfs diff=lfs merge=lfs -text
100
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.obj filter=lfs diff=lfs merge=lfs -text
101
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-base.dir/ggml.c.obj filter=lfs diff=lfs merge=lfs -text
102
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.obj filter=lfs diff=lfs merge=lfs -text
103
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-base.dir/vc140.pdb filter=lfs diff=lfs merge=lfs -text
104
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/quants.c.obj filter=lfs diff=lfs merge=lfs -text
105
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/repack.cpp.obj filter=lfs diff=lfs merge=lfs -text
106
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/binary-ops.cpp.obj filter=lfs diff=lfs merge=lfs -text
107
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.obj filter=lfs diff=lfs merge=lfs -text
108
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.obj filter=lfs diff=lfs merge=lfs -text
109
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.obj filter=lfs diff=lfs merge=lfs -text
110
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ops.cpp.obj filter=lfs diff=lfs merge=lfs -text
111
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/quants.c.obj filter=lfs diff=lfs merge=lfs -text
112
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/repack.cpp.obj filter=lfs diff=lfs merge=lfs -text
113
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/unary-ops.cpp.obj filter=lfs diff=lfs merge=lfs -text
114
+ llama.cpp/build/ggml/src/CMakeFiles/ggml-cpu.dir/vc140.pdb filter=lfs diff=lfs merge=lfs -text
115
+ llama.cpp/build/ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.obj filter=lfs diff=lfs merge=lfs -text
116
+ llama.cpp/build/ggml/src/CMakeFiles/ggml.dir/vc140.pdb filter=lfs diff=lfs merge=lfs -text
117
+ llama.cpp/build/ggml/src/ggml-base.lib filter=lfs diff=lfs merge=lfs -text
118
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-adapter.cpp.obj filter=lfs diff=lfs merge=lfs -text
119
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-arch.cpp.obj filter=lfs diff=lfs merge=lfs -text
120
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-batch.cpp.obj filter=lfs diff=lfs merge=lfs -text
121
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-chat.cpp.obj filter=lfs diff=lfs merge=lfs -text
122
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-context.cpp.obj filter=lfs diff=lfs merge=lfs -text
123
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-grammar.cpp.obj filter=lfs diff=lfs merge=lfs -text
124
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-graph.cpp.obj filter=lfs diff=lfs merge=lfs -text
125
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-impl.cpp.obj filter=lfs diff=lfs merge=lfs -text
126
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-io.cpp.obj filter=lfs diff=lfs merge=lfs -text
127
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-kv-cache-unified-iswa.cpp.obj filter=lfs diff=lfs merge=lfs -text
128
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-kv-cache-unified.cpp.obj filter=lfs diff=lfs merge=lfs -text
129
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-memory-hybrid.cpp.obj filter=lfs diff=lfs merge=lfs -text
130
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-memory-recurrent.cpp.obj filter=lfs diff=lfs merge=lfs -text
131
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-mmap.cpp.obj filter=lfs diff=lfs merge=lfs -text
132
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-model-loader.cpp.obj filter=lfs diff=lfs merge=lfs -text
133
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-model-saver.cpp.obj filter=lfs diff=lfs merge=lfs -text
134
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-model.cpp.obj filter=lfs diff=lfs merge=lfs -text
135
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-quant.cpp.obj filter=lfs diff=lfs merge=lfs -text
136
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-sampling.cpp.obj filter=lfs diff=lfs merge=lfs -text
137
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama-vocab.cpp.obj filter=lfs diff=lfs merge=lfs -text
138
+ llama.cpp/build/src/CMakeFiles/llama.dir/llama.cpp.obj filter=lfs diff=lfs merge=lfs -text
139
+ llama.cpp/build/src/CMakeFiles/llama.dir/unicode-data.cpp.obj filter=lfs diff=lfs merge=lfs -text
140
+ llama.cpp/build/src/CMakeFiles/llama.dir/unicode.cpp.obj filter=lfs diff=lfs merge=lfs -text
141
+ llama.cpp/build/src/CMakeFiles/llama.dir/vc140.pdb filter=lfs diff=lfs merge=lfs -text
142
+ llama.cpp/build/tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.obj filter=lfs diff=lfs merge=lfs -text
143
+ llama.cpp/build/tools/mtmd/CMakeFiles/mtmd.dir/mtmd-audio.cpp.obj filter=lfs diff=lfs merge=lfs -text
144
+ llama.cpp/build/tools/mtmd/CMakeFiles/mtmd.dir/mtmd-helper.cpp.obj filter=lfs diff=lfs merge=lfs -text
145
+ llama.cpp/build/tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.obj filter=lfs diff=lfs merge=lfs -text
146
+ llama.cpp/build/tools/mtmd/CMakeFiles/mtmd.dir/vc140.pdb filter=lfs diff=lfs merge=lfs -text
147
+ llama.cpp/build/tools/server/CMakeFiles/llama-server.dir/server.cpp.obj filter=lfs diff=lfs merge=lfs -text
148
+ llama.cpp/build/tools/server/CMakeFiles/llama-server.dir/vc140.pdb filter=lfs diff=lfs merge=lfs -text
149
+ llama.cpp/docs/development/llama-star/idea-arch.key filter=lfs diff=lfs merge=lfs -text
.gitgnore ADDED
@@ -0,0 +1,49 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ pip-wheel-metadata/
+ share/python-wheels/
+ *.egg
+
+ # Virtual Environment
+ venv/
+ env/
+ ENV/
+ .venv/
+
+ # IDE & Editors
+ .idea/
+ *.swp
+ *.swo
+ .DS_Store
+ .vscode/
+ *.sublime-project
+ *.sublime-workspace
+
+ # Logs & Databases
+ *.log
+ *.sqlite3
+
+ # Generated by app
+ response.wav
+ sandbox/*.py
+ !sandbox/.gitkeep
+
+ # Hugging Face cache
+ ~/.cache/huggingface/
+ .gitattributes
.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
1
+ -----BEGIN CERTIFICATE-----
2
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
3
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
4
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
5
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
6
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
7
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
8
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
9
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
10
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
11
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
12
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
13
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
14
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
15
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
16
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
17
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
18
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
19
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
20
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
21
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
22
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
23
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
24
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
25
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
26
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
27
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
28
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
29
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
30
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
31
+ -----END CERTIFICATE-----
README.md CHANGED
@@ -1,12 +1,6 @@
  ---
- title: AI Coding Genius
- emoji: 🏃
- colorFrom: pink
- colorTo: gray
- sdk: gradio
- sdk_version: 6.2.0
+ title: AI-Coding-Genius
  app_file: app.py
- pinned: false
+ sdk: gradio
+ sdk_version: 5.42.0
  ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
agents/__init__.py ADDED
@@ -0,0 +1 @@
+
agents/coder.py ADDED
@@ -0,0 +1,23 @@
+ # agents/coder.py
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ class CoderAgent:
+     def __init__(self, model_name="deepseek-ai/deepseek-coder-6.7b-instruct"):
+         self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+         self.model = AutoModelForCausalLM.from_pretrained(
+             model_name,
+             torch_dtype="auto",
+             device_map="auto"
+         )
+
+     def generate(self, prompt):
+         full_prompt = f"""
+ You're a brilliant, friendly AI coder. Explain clearly and write clean Python.
+ Include comments and use best practices.
+
+ Task: {prompt}
+ """.strip()
+         inputs = self.tokenizer(full_prompt, return_tensors="pt").to(self.model.device)
+         outputs = self.model.generate(**inputs, max_new_tokens=1024, temperature=0.4)
+         code = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+         return code[len(full_prompt):].strip()
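
For orientation, a minimal usage sketch of the CoderAgent class added above. This is hypothetical and not part of the commit; it assumes the deepseek-ai/deepseek-coder-6.7b-instruct weights can be downloaded and fit in available memory.

    # Hypothetical usage of agents/coder.py (not wired into app.py in this commit)
    from agents.coder import CoderAgent

    coder = CoderAgent()  # downloads deepseek-ai/deepseek-coder-6.7b-instruct on first use
    snippet = coder.generate("Write a function that checks whether a string is a palindrome.")
    print(snippet)
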
agents/reviewer.py ADDED
@@ -0,0 +1,27 @@
+ # agents/reviewer.py
+ from transformers import pipeline
+
+ class ReviewerAgent:
+     def __init__(self, model_name="Qwen/Qwen2-7B-Instruct"):
+         self.pipe = pipeline(
+             "text-generation",
+             model=model_name,
+             torch_dtype="auto",
+             device_map="auto"
+         )
+
+     def review(self, code):
+         prompt = f"""
+ Review this Python code for:
+ - Bugs
+ - Performance
+ - Readability
+ - Best practices
+
+ Code:
+ {code}
+
+ Provide a clear, constructive review.
+ """
+         result = self.pipe(prompt, max_new_tokens=512)
+         return result[0]['generated_text']
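
A hedged sketch of how the two agents could be chained. Neither class is imported by app.py in this commit, so the snippet is illustrative only and assumes both model downloads succeed on hardware with enough memory.

    # Illustrative only: chain the committed CoderAgent and ReviewerAgent
    from agents.coder import CoderAgent
    from agents.reviewer import ReviewerAgent

    coder = CoderAgent()
    reviewer = ReviewerAgent()

    code = coder.generate("Implement binary search over a sorted list.")
    review = reviewer.review(code)  # generated_text includes the prompt plus the review
    print(review)
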
app.py ADDED
@@ -0,0 +1,194 @@
1
+ import os
2
+ import torch
3
+ import gradio as gr
4
+ from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech
5
+ from transformers import WhisperProcessor, WhisperForConditionalGeneration
6
+ import librosa
7
+ import soundfile as sf
8
+ import numpy as np
9
+ from llama_cpp import Llama
10
+ from huggingface_hub import hf_hub_download # Needed to get the model
11
+
12
+ # ─────────────────────────────────────────────────────────────
13
+ # 🧠 Load Qwen Coder (The Brain) - INSIDE Python now
14
+ # ─────────────────────────────────────────────────────────────
15
+ print("🧠 Downloading/Loading Qwen Model...")
16
+ try:
17
+ # 1. Download the model file from Hugging Face automatically
18
+ model_path = hf_hub_download(
19
+ repo_id="Qwen/Qwen2.5-Coder-1.5B-Instruct-GGUF",
20
+ filename="qwen2.5-coder-1.5b-instruct-q8_0.gguf"
21
+ )
22
+
23
+ # 2. Load it directly into memory
24
+ llm = Llama(
25
+ model_path=model_path,
26
+ n_ctx=4096, # Context window
27
+ n_threads=2, # Use 2 CPU threads (good for free tier)
28
+ verbose=False
29
+ )
30
+ print("✅ Qwen Model Loaded Successfully!")
31
+ llm_ready = True
32
+ except Exception as e:
33
+ print(f"❌ Failed to load Qwen: {e}")
34
+ llm_ready = False
35
+
36
+ # ─────────────────────────────────────────────────────────────
37
+ # 🔊 Load Text-to-Speech (TTS)
38
+ # ─────────────────────────────────────────────────────────────
39
+ print("🔊 Loading TTS model...")
40
+ try:
41
+ tts_processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
42
+ tts_model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
43
+ speaker_embeddings = torch.zeros(1, 512) # Safe fallback
44
+ tts_ready = True
45
+ print("✅ TTS loaded!")
46
+ except Exception as e:
47
+ print(f"❌ TTS failed to load: {e}")
48
+ tts_ready = False
49
+
50
+ # ─────────────────────────────────────────────────────────────
51
+ # 🎤 Load Speech-to-Text (STT) - Whisper Tiny
52
+ # ─────────────────────────────────────────────────────────────
53
+ print("🎤 Loading STT model (whisper-tiny)...")
54
+ try:
55
+ stt_processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
56
+ stt_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
57
+ stt_model.eval()
58
+ stt_ready = True
59
+ print("✅ STT loaded!")
60
+ except Exception as e:
61
+ print(f"❌ STT failed to load: {e}")
62
+ stt_ready = False
63
+
64
+ # Create folders
65
+ os.makedirs("assets", exist_ok=True)
66
+
67
+ # ─────────────────────────────────────────────────────────────
68
+ # 🎤 Convert Speech to Text
69
+ # ─────────────────────────────────────────────────────────────
70
+ def speech_to_text(audio):
71
+ if not stt_ready or audio is None:
72
+ return "Voice input not available."
73
+
74
+ try:
75
+ sample_rate, y = audio
76
+ if y.dtype != np.float32:
77
+ y = y.astype(np.float32) / 32768.0
78
+ if len(y.shape) > 1:
79
+ y = y.mean(axis=1)
80
+ if sample_rate != 16000:
81
+ y = librosa.resample(y, orig_sr=sample_rate, target_sr=16000)
82
+
83
+ inputs = stt_processor(y, sampling_rate=16000, return_tensors="pt")
84
+ outputs = stt_model.generate(inputs["input_features"])
85
+ text = stt_processor.batch_decode(outputs, skip_special_tokens=True)[0]
86
+ return text.strip()
87
+ except Exception as e:
88
+ return f"❌ STT Error: {str(e)}"
89
+
90
+ # ─────────────────────────────────────────────────────────────
91
+ # 💬 Generate Code (Now using Internal LLM)
92
+ # ─────────────────────────────────────────────────────────────
93
+ def generate_code(prompt):
94
+ if not prompt.strip():
95
+ yield "# 👋 Hello!", "Hi! I'm your AI coding partner.", None
96
+ return
97
+
98
+ if not llm_ready:
99
+ yield "# Error", "❌ Model failed to load. Check logs.", None
100
+ return
101
+
102
+ yield "# Thinking...", "🧠 AI is thinking...", None
103
+
104
+ try:
105
+ # Create the prompt in ChatML format
106
+ messages = [
107
+ {"role": "system", "content": "You are a helpful AI coder."},
108
+ {"role": "user", "content": prompt}
109
+ ]
110
+
111
+ # Ask the internal model to generate
112
+ output = llm.create_chat_completion(
113
+ messages=messages,
114
+ max_tokens=512,
115
+ temperature=0.4,
116
+ top_p=0.95,
117
+ stream=True
118
+ )
119
+
120
+ # Stream the response
121
+ raw_code = ""
122
+ for chunk in output:
123
+ if "content" in chunk["choices"][0]["delta"]:
124
+ text_chunk = chunk["choices"][0]["delta"]["content"]
125
+ raw_code += text_chunk
126
+ # Live update the code block
127
+ yield f"```python\n{raw_code}\n```", "🚀 Generating...", None
128
+
129
+ # Clean up code
130
+ clean_code = raw_code
131
+ if "```python" in clean_code:
132
+ clean_code = clean_code.split("```python")[1].split("```")[0].strip()
133
+ elif "```" in clean_code:
134
+ clean_code = clean_code.split("```")[1].split("```")[0].strip()
135
+
136
+ final_display = f"```python\n{clean_code}\n```"
137
+
138
+ # 🎙️ Generate voice
139
+ audio_path = None
140
+ if tts_ready:
141
+ try:
142
+ voice_text = f"Here is the code for {prompt[:20]}"
143
+ inputs_tts = tts_processor(text=voice_text, return_tensors="pt")
144
+ speech = tts_model.generate_speech(inputs_tts["input_ids"], speaker_embeddings)
145
+ audio_path = os.path.abspath("assets/response.wav")
146
+ sf.write(audio_path, speech.cpu().numpy(), samplerate=16000)
147
+ except Exception as e:
148
+ print(f"⚠️ TTS failed: {e}")
149
+
150
+ yield final_display, f"✅ Done!", audio_path
151
+
152
+ except Exception as e:
153
+ yield "print('Error')", f"❌ Error: {str(e)}", None
154
+
155
+ # ─────────────────────────────────────────────────────────────
156
+ # 💾 Save As Function
157
+ # ─────────────────────────────────────────────────────────────
158
+ def save_as_code(code, filename):
159
+ if not filename.strip():
160
+ filename = "ai_generated_code.py"
161
+ elif not filename.endswith(".py"):
162
+ filename += ".py"
163
+ try:
164
+ clean_code = code.replace("```python", "").replace("```", "").strip()
165
+ # In cloud, we just save to volatile memory, but this works for the demo
166
+ with open(filename, "w", encoding="utf-8") as f:
167
+ f.write(clean_code)
168
+ return f"💾 Saved (Temporary): {filename}"
169
+ except Exception as e:
170
+ return f"❌ Save failed: {str(e)}"
171
+
172
+ # ─────────────────────────────────────────────────────────────
173
+ # 🚀 Gradio UI
174
+ # ─────────────────────────────────────────────────────────────
175
+ with gr.Blocks(title="AI Coding Genius", theme=gr.themes.Soft()) as demo:
176
+ gr.Markdown("# 🤖 AI Coding Genius (Cloud Edition)")
177
+
178
+ with gr.Row():
179
+ stt_input = gr.Audio(label="🎤 Speak", type="numpy", format="wav")
180
+
181
+ inp = gr.Textbox(label="💬 Prompt", placeholder="Make a snake game...")
182
+ btn = gr.Button("🚀 Generate", variant="primary")
183
+
184
+ code_out = gr.Code(label="💻 Code", language="python", lines=15)
185
+ status_out = gr.Textbox(label="Status")
186
+ audio_out = gr.Audio(label="Voice", autoplay=True)
187
+
188
+ # Wiring
189
+ stt_input.change(speech_to_text, stt_input, inp)
190
+ btn.click(generate_code, inp, [code_out, status_out, audio_out])
191
+
192
+ if __name__ == "__main__":
193
+ # Removed share=True for Cloud deployment
194
+ demo.launch()
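
Since generate_code() above is a generator that yields (code_markdown, status, audio_path) tuples, it can be exercised without the Gradio UI. A rough sketch, assuming the module-level Qwen GGUF download and TTS/STT loads in app.py succeed (they run at import time):

    # Rough sketch: drive the streaming generator directly (no Gradio UI)
    from app import generate_code

    final_code, final_status, audio = None, None, None
    for code_md, status, audio_path in generate_code("Write FizzBuzz in Python"):
        final_code, final_status, audio = code_md, status, audio_path
        print(status)

    print(final_code)  # markdown-fenced Python code
    print(audio)       # path to assets/response.wav, or None if TTS was skipped or failed
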
assets/.gitkeep ADDED
@@ -0,0 +1 @@
+
assets/response.wav ADDED
Binary file (18 kB).
 
llama.cpp/.clang-format ADDED
@@ -0,0 +1,164 @@
1
+ ---
2
+ Language: Cpp
3
+ AlignAfterOpenBracket: Align
4
+ AlignArrayOfStructures: Left
5
+ AlignConsecutiveAssignments: AcrossComments
6
+ AlignConsecutiveBitFields: AcrossComments
7
+ AlignConsecutiveDeclarations: AcrossComments
8
+ AlignConsecutiveMacros: AcrossComments
9
+ # AlignConsecutiveShortCaseStatements: AcrossComments
10
+ AlignEscapedNewlines: Left # LeftWithLastLine
11
+ AlignOperands: Align
12
+ AlignTrailingComments:
13
+ Kind: Always
14
+ OverEmptyLines: 1
15
+ AllowAllArgumentsOnNextLine: true
16
+ AllowAllParametersOfDeclarationOnNextLine: false
17
+ # AllowBreakBeforeNoexceptSpecifier: OnlyWithParen
18
+ AllowShortBlocksOnASingleLine: Never
19
+ AllowShortCaseLabelsOnASingleLine: false
20
+ AllowShortFunctionsOnASingleLine: Inline
21
+ AllowShortIfStatementsOnASingleLine: Never
22
+ AllowShortLambdasOnASingleLine: Inline
23
+ AllowShortLoopsOnASingleLine: false
24
+ AlwaysBreakBeforeMultilineStrings: true
25
+ BinPackArguments: false
26
+ BinPackParameters: false # OnePerLine
27
+ BitFieldColonSpacing: Both
28
+ BreakBeforeBraces: Custom # Attach
29
+ BraceWrapping:
30
+ AfterCaseLabel: true
31
+ AfterClass: false
32
+ AfterControlStatement: false
33
+ AfterEnum: false
34
+ AfterFunction: false
35
+ AfterNamespace: false
36
+ AfterObjCDeclaration: false
37
+ AfterStruct: false
38
+ AfterUnion: false
39
+ AfterExternBlock: false
40
+ BeforeCatch: false
41
+ BeforeElse: false
42
+ BeforeLambdaBody: false
43
+ BeforeWhile: false
44
+ IndentBraces: false
45
+ SplitEmptyFunction: false
46
+ SplitEmptyRecord: false
47
+ SplitEmptyNamespace: false
48
+ # BreakAdjacentStringLiterals: true
49
+ BreakAfterAttributes: Never
50
+ BreakBeforeBinaryOperators: None
51
+ BreakBeforeInlineASMColon: OnlyMultiline
52
+ BreakBeforeTernaryOperators: false
53
+ # BreakBinaryOperations: Never
54
+ BreakConstructorInitializers: AfterColon
55
+ # BreakFunctionDefinitionParameters: false
56
+ BreakInheritanceList: AfterComma
57
+ BreakStringLiterals: true
58
+ # BreakTemplateDeclarations: Yes
59
+ ColumnLimit: 120
60
+ CommentPragmas: '^ IWYU pragma:'
61
+ CompactNamespaces: false
62
+ ConstructorInitializerIndentWidth: 4
63
+ ContinuationIndentWidth: 4
64
+ Cpp11BracedListStyle: false
65
+ DerivePointerAlignment: false
66
+ DisableFormat: false
67
+ EmptyLineBeforeAccessModifier: Leave
68
+ EmptyLineAfterAccessModifier: Never
69
+ ExperimentalAutoDetectBinPacking: false
70
+ FixNamespaceComments: true
71
+ IncludeBlocks: Regroup
72
+ IncludeCategories:
73
+ - Regex: '".*"'
74
+ Priority: 1
75
+ SortPriority: 0
76
+ - Regex: '^<.*\.h>'
77
+ Priority: 2
78
+ SortPriority: 0
79
+ - Regex: '^<.*'
80
+ Priority: 3
81
+ SortPriority: 0
82
+ - Regex: '.*'
83
+ Priority: 4
84
+ SortPriority: 0
85
+ IncludeIsMainRegex: '([-_](test|unittest))?$'
86
+ IncludeIsMainSourceRegex: ''
87
+ IndentAccessModifiers: false
88
+ IndentCaseBlocks: true
89
+ IndentCaseLabels: true
90
+ IndentExternBlock: NoIndent
91
+ IndentGotoLabels: false
92
+ IndentPPDirectives: AfterHash
93
+ IndentWidth: 4
94
+ IndentWrappedFunctionNames: false
95
+ InsertBraces: true # NOTE: may lead to incorrect formatting
96
+ InsertNewlineAtEOF: true
97
+ JavaScriptQuotes: Leave
98
+ JavaScriptWrapImports: true
99
+ KeepEmptyLinesAtTheStartOfBlocks: false
100
+ LambdaBodyIndentation: Signature
101
+ LineEnding: LF
102
+ MacroBlockBegin: ''
103
+ MacroBlockEnd: ''
104
+ MaxEmptyLinesToKeep: 1
105
+ NamespaceIndentation: None
106
+ ObjCBinPackProtocolList: Auto
107
+ ObjCBlockIndentWidth: 4
108
+ ObjCSpaceAfterProperty: true
109
+ ObjCSpaceBeforeProtocolList: true
110
+ PPIndentWidth: -1
111
+ PackConstructorInitializers: CurrentLine
112
+ PenaltyBreakAssignment: 2
113
+ PenaltyBreakBeforeFirstCallParameter: 1
114
+ PenaltyBreakComment: 300
115
+ PenaltyBreakFirstLessLess: 120
116
+ PenaltyBreakString: 1000
117
+ PenaltyBreakTemplateDeclaration: 10
118
+ PenaltyExcessCharacter: 1000000
119
+ PenaltyReturnTypeOnItsOwnLine: 200
120
+ PointerAlignment: Middle
121
+ QualifierAlignment: Left
122
+ #QualifierOrder: ['static', 'inline', 'friend', 'constexpr', 'const', 'volatile', 'type', 'restrict']
123
+ RawStringFormats:
124
+ - Language: Cpp
125
+ Delimiters:
126
+ - cc
127
+ - CC
128
+ - cpp
129
+ - Cpp
130
+ - CPP
131
+ - 'c++'
132
+ - 'C++'
133
+ CanonicalDelimiter: ''
134
+ ReferenceAlignment: Middle
135
+ ReflowComments: false # IndentOnly
136
+ SeparateDefinitionBlocks: Always
137
+ SortIncludes: CaseInsensitive
138
+ SortUsingDeclarations: LexicographicNumeric
139
+ SpaceAfterCStyleCast: true
140
+ SpaceAfterLogicalNot: false
141
+ SpaceAfterTemplateKeyword: true
142
+ SpaceBeforeAssignmentOperators: true
143
+ SpaceBeforeCpp11BracedList: false
144
+ SpaceBeforeCtorInitializerColon: true
145
+ SpaceBeforeInheritanceColon: true
146
+ SpaceBeforeParens: ControlStatements
147
+ SpaceBeforeRangeBasedForLoopColon: true
148
+ SpaceInEmptyBlock: false
149
+ SpaceInEmptyParentheses: false
150
+ SpacesBeforeTrailingComments: 2
151
+ SpacesInAngles: Never
152
+ SpacesInContainerLiterals: true
153
+ SpacesInLineCommentPrefix:
154
+ Minimum: 1
155
+ Maximum: -1
156
+ SpacesInParentheses: false
157
+ SpacesInSquareBrackets: false
158
+ SpaceBeforeSquareBrackets: false
159
+ Standard: c++17
160
+ TabWidth: 4
161
+ UseTab: Never
162
+ WhitespaceSensitiveMacros: ['STRINGIZE']
163
+ ...
164
+
llama.cpp/.clang-tidy ADDED
@@ -0,0 +1,27 @@
1
+ ---
2
+ Checks: >
3
+ bugprone-*,
4
+ -bugprone-easily-swappable-parameters,
5
+ -bugprone-implicit-widening-of-multiplication-result,
6
+ -bugprone-misplaced-widening-cast,
7
+ -bugprone-narrowing-conversions,
8
+ readability-*,
9
+ -readability-avoid-unconditional-preprocessor-if,
10
+ -readability-function-cognitive-complexity,
11
+ -readability-identifier-length,
12
+ -readability-implicit-bool-conversion,
13
+ -readability-magic-numbers,
14
+ -readability-uppercase-literal-suffix,
15
+ -readability-simplify-boolean-expr,
16
+ -readability-math-missing-parentheses,
17
+ clang-analyzer-*,
18
+ -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
19
+ performance-*,
20
+ portability-*,
21
+ -portability-simd-intrinsics,
22
+ misc-*,
23
+ -misc-const-correctness,
24
+ -misc-non-private-member-variables-in-classes,
25
+ -misc-no-recursion,
26
+ -misc-use-anonymous-namespace,
27
+ FormatStyle: none
llama.cpp/.devops/cann.Dockerfile ADDED
@@ -0,0 +1,130 @@
1
+ # ==============================================================================
2
+ # ARGUMENTS
3
+ # ==============================================================================
4
+
5
+ # Define the CANN base image for easier version updates later
6
+ ARG CANN_BASE_IMAGE=quay.io/ascend/cann:8.1.rc1-910b-openeuler22.03-py3.10
7
+
8
+ # ==============================================================================
9
+ # BUILD STAGE
10
+ # Compile all binary files and libraries
11
+ # ==============================================================================
12
+ FROM ${CANN_BASE_IMAGE} AS build
13
+
14
+ # Define the Ascend chip model for compilation. Default is Ascend910B3
15
+ ARG ASCEND_SOC_TYPE=Ascend910B3
16
+
17
+ # -- Install build dependencies --
18
+ RUN yum install -y gcc g++ cmake make git libcurl-devel python3 python3-pip && \
19
+ yum clean all && \
20
+ rm -rf /var/cache/yum
21
+
22
+ # -- Set the working directory --
23
+ WORKDIR /app
24
+
25
+ # -- Copy project files --
26
+ COPY . .
27
+
28
+ # -- Set CANN environment variables (required for compilation) --
29
+ # Using ENV instead of `source` allows environment variables to persist across the entire image layer
30
+ ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
31
+ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${LD_LIBRARY_PATH}
32
+ ENV PATH=${ASCEND_TOOLKIT_HOME}/bin:${PATH}
33
+ ENV ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
34
+ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/runtime/lib64/stub:$LD_LIBRARY_PATH
35
+ # ... You can add other environment variables from the original file as needed ...
36
+ # For brevity, only core variables are listed here. You can paste the original ENV list here.
37
+
38
+ # -- Build llama.cpp --
39
+ # Use the passed ASCEND_SOC_TYPE argument and add general build options
40
+ RUN source /usr/local/Ascend/ascend-toolkit/set_env.sh --force \
41
+ && \
42
+ cmake -B build \
43
+ -DGGML_CANN=ON \
44
+ -DCMAKE_BUILD_TYPE=Release \
45
+ -DSOC_TYPE=${ASCEND_SOC_TYPE} \
46
+ . && \
47
+ cmake --build build --config Release -j$(nproc)
48
+
49
+ # -- Organize build artifacts for copying in later stages --
50
+ # Create a lib directory to store all .so files
51
+ RUN mkdir -p /app/lib && \
52
+ find build -name "*.so" -exec cp {} /app/lib \;
53
+
54
+ # Create a full directory to store all executables and Python scripts
55
+ RUN mkdir -p /app/full && \
56
+ cp build/bin/* /app/full/ && \
57
+ cp *.py /app/full/ && \
58
+ cp -r gguf-py /app/full/ && \
59
+ cp -r requirements /app/full/ && \
60
+ cp requirements.txt /app/full/
61
+ # If you have a tools.sh script, make sure it is copied here
62
+ # cp .devops/tools.sh /app/full/tools.sh
63
+
64
+ # ==============================================================================
65
+ # BASE STAGE
66
+ # Create a minimal base image with CANN runtime and common libraries
67
+ # ==============================================================================
68
+ FROM ${CANN_BASE_IMAGE} AS base
69
+
70
+ # -- Install runtime dependencies --
71
+ RUN yum install -y libgomp curl && \
72
+ yum clean all && \
73
+ rm -rf /var/cache/yum
74
+
75
+ # -- Set CANN environment variables (required for runtime) --
76
+ ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
77
+ ENV LD_LIBRARY_PATH=/app:${ASCEND_TOOLKIT_HOME}/lib64:${LD_LIBRARY_PATH}
78
+ ENV PATH=${ASCEND_TOOLKIT_HOME}/bin:${PATH}
79
+ ENV ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
80
+ # ... You can add other environment variables from the original file as needed ...
81
+
82
+ WORKDIR /app
83
+
84
+ # Copy compiled .so files from the build stage
85
+ COPY --from=build /app/lib/ /app
86
+
87
+ # ==============================================================================
88
+ # FINAL STAGES (TARGETS)
89
+ # ==============================================================================
90
+
91
+ ### Target: full
92
+ # Complete image with all tools, Python bindings, and dependencies
93
+ # ==============================================================================
94
+ FROM base AS full
95
+
96
+ COPY --from=build /app/full /app
97
+
98
+ # Install Python dependencies
99
+ RUN yum install -y git python3 python3-pip && \
100
+ pip3 install --no-cache-dir --upgrade pip setuptools wheel && \
101
+ pip3 install --no-cache-dir -r requirements.txt && \
102
+ yum clean all && \
103
+ rm -rf /var/cache/yum
104
+
105
+ # You need to provide a tools.sh script as the entrypoint
106
+ ENTRYPOINT ["/app/tools.sh"]
107
+ # If there is no tools.sh, you can set the default to start the server
108
+ # ENTRYPOINT ["/app/llama-server"]
109
+
110
+ ### Target: light
111
+ # Lightweight image containing only llama-cli
112
+ # ==============================================================================
113
+ FROM base AS light
114
+
115
+ COPY --from=build /app/full/llama-cli /app
116
+
117
+ ENTRYPOINT [ "/app/llama-cli" ]
118
+
119
+ ### Target: server
120
+ # Dedicated server image containing only llama-server
121
+ # ==============================================================================
122
+ FROM base AS server
123
+
124
+ ENV LLAMA_ARG_HOST=0.0.0.0
125
+
126
+ COPY --from=build /app/full/llama-server /app
127
+
128
+ HEALTHCHECK --interval=5m CMD [ "curl", "-f", "http://localhost:8080/health" ]
129
+
130
+ ENTRYPOINT [ "/app/llama-server" ]
llama.cpp/.devops/cloud-v-pipeline ADDED
@@ -0,0 +1,22 @@
1
+ node('x86_runner1'){ // Running on x86 runner containing latest vector qemu, latest vector gcc and all the necessary libraries
2
+ stage('Cleanup'){
3
+ cleanWs() // Cleaning previous CI build in workspace
4
+ }
5
+ stage('checkout repo'){
6
+ retry(5){ // Retry if the cloning fails due to some reason
7
+ checkout scm // Clone the repo on Runner
8
+ }
9
+ }
10
+ stage('Compiling llama.cpp'){
11
+ sh'''#!/bin/bash
12
+ make RISCV=1 RISCV_CROSS_COMPILE=1 # Compiling llama for RISC-V
13
+ '''
14
+ }
15
+ stage('Running llama.cpp'){
16
+ sh'''#!/bin/bash
17
+ module load gnu-bin2/0.1 # loading latest versions of vector qemu and vector gcc
18
+ qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./llama-cli -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
19
+ cat llama_log.txt # Printing results
20
+ '''
21
+ }
22
+ }
llama.cpp/.devops/cpu.Dockerfile ADDED
@@ -0,0 +1,92 @@
1
+ ARG UBUNTU_VERSION=22.04
2
+
3
+ FROM ubuntu:$UBUNTU_VERSION AS build
4
+
5
+ ARG TARGETARCH
6
+
7
+ ARG GGML_CPU_ARM_ARCH=armv8-a
8
+
9
+ RUN apt-get update && \
10
+ apt-get install -y build-essential git cmake libcurl4-openssl-dev
11
+
12
+ WORKDIR /app
13
+
14
+ COPY . .
15
+
16
+ RUN if [ "$TARGETARCH" = "amd64" ]; then \
17
+ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_TESTS=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
18
+ elif [ "$TARGETARCH" = "arm64" ]; then \
19
+ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_TESTS=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
20
+ else \
21
+ echo "Unsupported architecture"; \
22
+ exit 1; \
23
+ fi && \
24
+ cmake --build build -j $(nproc)
25
+
26
+ RUN mkdir -p /app/lib && \
27
+ find build -name "*.so" -exec cp {} /app/lib \;
28
+
29
+ RUN mkdir -p /app/full \
30
+ && cp build/bin/* /app/full \
31
+ && cp *.py /app/full \
32
+ && cp -r gguf-py /app/full \
33
+ && cp -r requirements /app/full \
34
+ && cp requirements.txt /app/full \
35
+ && cp .devops/tools.sh /app/full/tools.sh
36
+
37
+ ## Base image
38
+ FROM ubuntu:$UBUNTU_VERSION AS base
39
+
40
+ RUN apt-get update \
41
+ && apt-get install -y libgomp1 curl\
42
+ && apt autoremove -y \
43
+ && apt clean -y \
44
+ && rm -rf /tmp/* /var/tmp/* \
45
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
46
+ && find /var/cache -type f -delete
47
+
48
+ COPY --from=build /app/lib/ /app
49
+
50
+ ### Full
51
+ FROM base AS full
52
+
53
+ COPY --from=build /app/full /app
54
+
55
+ WORKDIR /app
56
+
57
+ RUN apt-get update \
58
+ && apt-get install -y \
59
+ git \
60
+ python3 \
61
+ python3-pip \
62
+ && pip install --upgrade pip setuptools wheel \
63
+ && pip install -r requirements.txt \
64
+ && apt autoremove -y \
65
+ && apt clean -y \
66
+ && rm -rf /tmp/* /var/tmp/* \
67
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
68
+ && find /var/cache -type f -delete
69
+
70
+ ENTRYPOINT ["/app/tools.sh"]
71
+
72
+ ### Light, CLI only
73
+ FROM base AS light
74
+
75
+ COPY --from=build /app/full/llama-cli /app
76
+
77
+ WORKDIR /app
78
+
79
+ ENTRYPOINT [ "/app/llama-cli" ]
80
+
81
+ ### Server, Server only
82
+ FROM base AS server
83
+
84
+ ENV LLAMA_ARG_HOST=0.0.0.0
85
+
86
+ COPY --from=build /app/full/llama-server /app
87
+
88
+ WORKDIR /app
89
+
90
+ HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
91
+
92
+ ENTRYPOINT [ "/app/llama-server" ]
llama.cpp/.devops/cuda.Dockerfile ADDED
@@ -0,0 +1,94 @@
1
+ ARG UBUNTU_VERSION=22.04
2
+ # This needs to generally match the container host's environment.
3
+ ARG CUDA_VERSION=12.4.0
4
+ # Target the CUDA build image
5
+ ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
6
+
7
+ ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}
8
+
9
+ FROM ${BASE_CUDA_DEV_CONTAINER} AS build
10
+
11
+ # CUDA architecture to build for (defaults to all supported archs)
12
+ ARG CUDA_DOCKER_ARCH=default
13
+
14
+ RUN apt-get update && \
15
+ apt-get install -y build-essential cmake python3 python3-pip git libcurl4-openssl-dev libgomp1
16
+
17
+ WORKDIR /app
18
+
19
+ COPY . .
20
+
21
+ RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
22
+ export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
23
+ fi && \
24
+ cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
25
+ cmake --build build --config Release -j$(nproc)
26
+
27
+ RUN mkdir -p /app/lib && \
28
+ find build -name "*.so" -exec cp {} /app/lib \;
29
+
30
+ RUN mkdir -p /app/full \
31
+ && cp build/bin/* /app/full \
32
+ && cp *.py /app/full \
33
+ && cp -r gguf-py /app/full \
34
+ && cp -r requirements /app/full \
35
+ && cp requirements.txt /app/full \
36
+ && cp .devops/tools.sh /app/full/tools.sh
37
+
38
+ ## Base image
39
+ FROM ${BASE_CUDA_RUN_CONTAINER} AS base
40
+
41
+ RUN apt-get update \
42
+ && apt-get install -y libgomp1 curl\
43
+ && apt autoremove -y \
44
+ && apt clean -y \
45
+ && rm -rf /tmp/* /var/tmp/* \
46
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
47
+ && find /var/cache -type f -delete
48
+
49
+ COPY --from=build /app/lib/ /app
50
+
51
+ ### Full
52
+ FROM base AS full
53
+
54
+ COPY --from=build /app/full /app
55
+
56
+ WORKDIR /app
57
+
58
+ RUN apt-get update \
59
+ && apt-get install -y \
60
+ git \
61
+ python3 \
62
+ python3-pip \
63
+ && pip install --upgrade pip setuptools wheel \
64
+ && pip install -r requirements.txt \
65
+ && apt autoremove -y \
66
+ && apt clean -y \
67
+ && rm -rf /tmp/* /var/tmp/* \
68
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
69
+ && find /var/cache -type f -delete
70
+
71
+
72
+ ENTRYPOINT ["/app/tools.sh"]
73
+
74
+ ### Light, CLI only
75
+ FROM base AS light
76
+
77
+ COPY --from=build /app/full/llama-cli /app
78
+
79
+ WORKDIR /app
80
+
81
+ ENTRYPOINT [ "/app/llama-cli" ]
82
+
83
+ ### Server, Server only
84
+ FROM base AS server
85
+
86
+ ENV LLAMA_ARG_HOST=0.0.0.0
87
+
88
+ COPY --from=build /app/full/llama-server /app
89
+
90
+ WORKDIR /app
91
+
92
+ HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
93
+
94
+ ENTRYPOINT [ "/app/llama-server" ]
llama.cpp/.devops/intel.Dockerfile ADDED
@@ -0,0 +1,95 @@
1
+ ARG ONEAPI_VERSION=2025.1.1-0-devel-ubuntu24.04
2
+
3
+ ## Build Image
4
+
5
+ FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build
6
+
7
+ ARG GGML_SYCL_F16=OFF
8
+ RUN apt-get update && \
9
+ apt-get install -y git libcurl4-openssl-dev
10
+
11
+ WORKDIR /app
12
+
13
+ COPY . .
14
+
15
+ RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
16
+ echo "GGML_SYCL_F16 is set" \
17
+ && export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
18
+ fi && \
19
+ echo "Building with dynamic libs" && \
20
+ cmake -B build -DGGML_NATIVE=OFF -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${OPT_SYCL_F16} && \
21
+ cmake --build build --config Release -j$(nproc)
22
+
23
+ RUN mkdir -p /app/lib && \
24
+ find build -name "*.so" -exec cp {} /app/lib \;
25
+
26
+ RUN mkdir -p /app/full \
27
+ && cp build/bin/* /app/full \
28
+ && cp *.py /app/full \
29
+ && cp -r gguf-py /app/full \
30
+ && cp -r requirements /app/full \
31
+ && cp requirements.txt /app/full \
32
+ && cp .devops/tools.sh /app/full/tools.sh
33
+
34
+ FROM intel/oneapi-basekit:$ONEAPI_VERSION AS base
35
+
36
+ RUN apt-get update \
37
+ && apt-get install -y libgomp1 curl\
38
+ && apt autoremove -y \
39
+ && apt clean -y \
40
+ && rm -rf /tmp/* /var/tmp/* \
41
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
42
+ && find /var/cache -type f -delete
43
+
44
+ ### Full
45
+ FROM base AS full
46
+
47
+ COPY --from=build /app/lib/ /app
48
+ COPY --from=build /app/full /app
49
+
50
+ WORKDIR /app
51
+
52
+ RUN apt-get update && \
53
+ apt-get install -y \
54
+ git \
55
+ python3 \
56
+ python3-pip \
57
+ python3-venv && \
58
+ python3 -m venv /opt/venv && \
59
+ . /opt/venv/bin/activate && \
60
+ pip install --upgrade pip setuptools wheel && \
61
+ pip install -r requirements.txt && \
62
+ apt autoremove -y && \
63
+ apt clean -y && \
64
+ rm -rf /tmp/* /var/tmp/* && \
65
+ find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
66
+ find /var/cache -type f -delete
67
+
68
+ ENV PATH="/opt/venv/bin:$PATH"
69
+
70
+ ENTRYPOINT ["/app/tools.sh"]
71
+
72
+ ### Light, CLI only
73
+ FROM base AS light
74
+
75
+ COPY --from=build /app/lib/ /app
76
+ COPY --from=build /app/full/llama-cli /app
77
+
78
+ WORKDIR /app
79
+
80
+ ENTRYPOINT [ "/app/llama-cli" ]
81
+
82
+ ### Server, Server only
83
+ FROM base AS server
84
+
85
+ ENV LLAMA_ARG_HOST=0.0.0.0
86
+
87
+ COPY --from=build /app/lib/ /app
88
+ COPY --from=build /app/full/llama-server /app
89
+
90
+ WORKDIR /app
91
+
92
+ HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
93
+
94
+ ENTRYPOINT [ "/app/llama-server" ]
95
+
llama.cpp/.devops/llama-cli-cann.Dockerfile ADDED
@@ -0,0 +1,44 @@
1
+ ARG ASCEND_VERSION=8.1.RC1.alpha001-910b-openeuler22.03-py3.10
2
+
3
+ FROM ascendai/cann:$ASCEND_VERSION AS build
4
+
5
+ WORKDIR /app
6
+
7
+ COPY . .
8
+
9
+ RUN yum install -y gcc g++ cmake make libcurl-devel
10
+ ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
11
+ ENV LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:$LIBRARY_PATH
12
+ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe/op_tiling:${LD_LIBRARY_PATH}
13
+ ENV PYTHONPATH=${ASCEND_TOOLKIT_HOME}/python/site-packages:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe:${PYTHONPATH}
14
+ ENV PATH=${ASCEND_TOOLKIT_HOME}/bin:${ASCEND_TOOLKIT_HOME}/compiler/ccec_compiler/bin:${PATH}
15
+ ENV ASCEND_AICPU_PATH=${ASCEND_TOOLKIT_HOME}
16
+ ENV ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
17
+ ENV TOOLCHAIN_HOME=${ASCEND_TOOLKIT_HOME}/toolkit
18
+ ENV ASCEND_HOME_PATH=${ASCEND_TOOLKIT_HOME}
19
+
20
+ # find libascend_hal.so, because the drive hasn`t been mounted.
21
+ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/runtime/lib64/stub:$LD_LIBRARY_PATH
22
+
23
+ RUN echo "Building with static libs" && \
24
+ source /usr/local/Ascend/ascend-toolkit/set_env.sh --force && \
25
+ cmake -B build -DGGML_NATIVE=OFF -DGGML_CANN=ON -DBUILD_SHARED_LIBS=OFF -DLLAMA_BUILD_TESTS=OFF && \
26
+ cmake --build build --config Release --target llama-cli
27
+
28
+ # TODO: use image with NNRT
29
+ FROM ascendai/cann:$ASCEND_VERSION AS runtime
30
+ COPY --from=build /app/build/bin/llama-cli /llama-cli
31
+
32
+ ENV LC_ALL=C.utf8
33
+
34
+ ENV ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
35
+ ENV LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:$LIBRARY_PATH
36
+ ENV LD_LIBRARY_PATH=${ASCEND_TOOLKIT_HOME}/lib64:${ASCEND_TOOLKIT_HOME}/lib64/plugin/opskernel:${ASCEND_TOOLKIT_HOME}/lib64/plugin/nnengine:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe/op_tiling:${LD_LIBRARY_PATH}
37
+ ENV PYTHONPATH=${ASCEND_TOOLKIT_HOME}/python/site-packages:${ASCEND_TOOLKIT_HOME}/opp/built-in/op_impl/ai_core/tbe:${PYTHONPATH}
38
+ ENV PATH=${ASCEND_TOOLKIT_HOME}/bin:${ASCEND_TOOLKIT_HOME}/compiler/ccec_compiler/bin:${PATH}
39
+ ENV ASCEND_AICPU_PATH=${ASCEND_TOOLKIT_HOME}
40
+ ENV ASCEND_OPP_PATH=${ASCEND_TOOLKIT_HOME}/opp
41
+ ENV TOOLCHAIN_HOME=${ASCEND_TOOLKIT_HOME}/toolkit
42
+ ENV ASCEND_HOME_PATH=${ASCEND_TOOLKIT_HOME}
43
+
44
+ ENTRYPOINT ["/llama-cli" ]
llama.cpp/.devops/llama-cpp-cuda.srpm.spec ADDED
@@ -0,0 +1,83 @@
1
+ # SRPM for building from source and packaging an RPM for RPM-based distros.
2
+ # https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
3
+ # Built and maintained by John Boero - boeroboy@gmail.com
4
+ # In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal
5
+
6
+ # Notes for llama.cpp:
7
+ # 1. Tags are currently based on hash - which will not sort asciibetically.
8
+ # We need to declare standard versioning if people want to sort latest releases.
9
+ # 2. Builds for CUDA/OpenCL support are separate, with different depenedencies.
10
+ # 3. NVidia's developer repo must be enabled with nvcc, cublas, clblas, etc installed.
11
+ # Example: https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
12
+ # 4. OpenCL/CLBLAST support simply requires the ICD loader and basic opencl libraries.
13
+ # It is up to the user to install the correct vendor-specific support.
14
+
15
+ Name: llama.cpp-cuda
16
+ Version: %( date "+%%Y%%m%%d" )
17
+ Release: 1%{?dist}
18
+ Summary: CPU Inference of LLaMA model in pure C/C++ (no CUDA/OpenCL)
19
+ License: MIT
20
+ Source0: https://github.com/ggml-org/llama.cpp/archive/refs/heads/master.tar.gz
21
+ BuildRequires: coreutils make gcc-c++ git cuda-toolkit
22
+ Requires: cuda-toolkit
23
+ URL: https://github.com/ggml-org/llama.cpp
24
+
25
+ %define debug_package %{nil}
26
+ %define source_date_epoch_from_changelog 0
27
+
28
+ %description
29
+ CPU inference for Meta's Lllama2 models using default options.
30
+
31
+ %prep
32
+ %setup -n llama.cpp-master
33
+
34
+ %build
35
+ make -j GGML_CUDA=1
36
+
37
+ %install
38
+ mkdir -p %{buildroot}%{_bindir}/
39
+ cp -p llama-cli %{buildroot}%{_bindir}/llama-cuda-cli
40
+ cp -p llama-server %{buildroot}%{_bindir}/llama-cuda-server
41
+ cp -p llama-simple %{buildroot}%{_bindir}/llama-cuda-simple
42
+
43
+ mkdir -p %{buildroot}/usr/lib/systemd/system
44
+ %{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacuda.service
45
+ [Unit]
46
+ Description=Llama.cpp server, CPU only (no GPU support in this build).
47
+ After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
48
+
49
+ [Service]
50
+ Type=simple
51
+ EnvironmentFile=/etc/sysconfig/llama
52
+ ExecStart=/usr/bin/llama-cuda-server $LLAMA_ARGS
53
+ ExecReload=/bin/kill -s HUP $MAINPID
54
+ Restart=never
55
+
56
+ [Install]
57
+ WantedBy=default.target
58
+ EOF
59
+
60
+ mkdir -p %{buildroot}/etc/sysconfig
61
+ %{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
62
+ LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
63
+ EOF
64
+
65
+ %clean
66
+ rm -rf %{buildroot}
67
+ rm -rf %{_builddir}/*
68
+
69
+ %files
70
+ %{_bindir}/llama-cuda-cli
71
+ %{_bindir}/llama-cuda-server
72
+ %{_bindir}/llama-cuda-simple
73
+ /usr/lib/systemd/system/llamacuda.service
74
+ %config /etc/sysconfig/llama
75
+
76
+ %pre
77
+
78
+ %post
79
+
80
+ %preun
81
+ %postun
82
+
83
+ %changelog
llama.cpp/.devops/llama-cpp.srpm.spec ADDED
@@ -0,0 +1,85 @@
1
+ # SRPM for building from source and packaging an RPM for RPM-based distros.
2
+ # https://docs.fedoraproject.org/en-US/quick-docs/creating-rpm-packages
3
+ # Built and maintained by John Boero - boeroboy@gmail.com
4
+ # In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal
5
+
6
+ # Notes for llama.cpp:
7
+ # 1. Tags are currently based on hash - which will not sort asciibetically.
8
+ # We need to declare standard versioning if people want to sort latest releases.
9
+ # In the meantime, YYYYMMDD format will be used.
10
+ # 2. Builds for CUDA/OpenCL support are separate, with different depenedencies.
11
+ # 3. NVidia's developer repo must be enabled with nvcc, cublas, clblas, etc installed.
12
+ # Example: https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
13
+ # 4. OpenCL/CLBLAST support simply requires the ICD loader and basic opencl libraries.
14
+ # It is up to the user to install the correct vendor-specific support.
15
+
16
+ Name: llama.cpp
17
+ Version: %( date "+%%Y%%m%%d" )
18
+ Release: 1%{?dist}
19
+ Summary: CPU Inference of LLaMA model in pure C/C++ (no CUDA/OpenCL)
20
+ License: MIT
21
+ Source0: https://github.com/ggml-org/llama.cpp/archive/refs/heads/master.tar.gz
22
+ BuildRequires: coreutils make gcc-c++ git libstdc++-devel
23
+ Requires: libstdc++
24
+ URL: https://github.com/ggml-org/llama.cpp
25
+
26
+ %define debug_package %{nil}
27
+ %define source_date_epoch_from_changelog 0
28
+
29
+ %description
30
+ CPU inference for Meta's LLaMA 2 models using default options.
31
+ Models are not included in this package and must be downloaded separately.
32
+
33
+ %prep
34
+ %setup -n llama.cpp-master
35
+
36
+ %build
37
+ make -j
38
+
39
+ %install
40
+ mkdir -p %{buildroot}%{_bindir}/
41
+ cp -p llama-cli %{buildroot}%{_bindir}/llama-cli
42
+ cp -p llama-server %{buildroot}%{_bindir}/llama-server
43
+ cp -p llama-simple %{buildroot}%{_bindir}/llama-simple
44
+
45
+ mkdir -p %{buildroot}/usr/lib/systemd/system
46
+ %{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llama.service
47
+ [Unit]
48
+ Description=Llama.cpp server, CPU only (no GPU support in this build).
49
+ After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target
50
+
51
+ [Service]
52
+ Type=simple
53
+ EnvironmentFile=/etc/sysconfig/llama
54
+ ExecStart=/usr/bin/llama-server $LLAMA_ARGS
55
+ ExecReload=/bin/kill -s HUP $MAINPID
56
+ Restart=never
57
+
58
+ [Install]
59
+ WantedBy=default.target
60
+ EOF
61
+
62
+ mkdir -p %{buildroot}/etc/sysconfig
63
+ %{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
64
+ LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
65
+ EOF
66
+
67
+ %clean
68
+ rm -rf %{buildroot}
69
+ rm -rf %{_builddir}/*
70
+
71
+ %files
72
+ %{_bindir}/llama-cli
73
+ %{_bindir}/llama-server
74
+ %{_bindir}/llama-simple
75
+ /usr/lib/systemd/system/llama.service
76
+ %config /etc/sysconfig/llama
77
+
78
+ %pre
79
+
80
+ %post
81
+
82
+ %preun
83
+ %postun
84
+
85
+ %changelog
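
Once the resulting RPM is installed, the packaged systemd unit and sysconfig file above can be used directly. A minimal sketch (the model path preconfigured in /etc/sysconfig/llama is only a placeholder and should be pointed at a real model first):

    sudo systemctl daemon-reload
    sudo systemctl enable --now llama.service
    journalctl -u llama.service -f    # follow the server log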
llama.cpp/.devops/musa.Dockerfile ADDED
@@ -0,0 +1,101 @@
1
+ ARG UBUNTU_VERSION=22.04
2
+ # This needs to generally match the container host's environment.
3
+ ARG MUSA_VERSION=rc4.2.0
4
+ # Target the MUSA build image
5
+ ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}-amd64
6
+
7
+ ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}-amd64
8
+
9
+ FROM ${BASE_MUSA_DEV_CONTAINER} AS build
10
+
11
+ # MUSA architecture to build for (defaults to all supported archs)
12
+ ARG MUSA_DOCKER_ARCH=default
13
+
14
+ RUN apt-get update && \
15
+ apt-get install -y \
16
+ build-essential \
17
+ cmake \
18
+ python3 \
19
+ python3-pip \
20
+ git \
21
+ libcurl4-openssl-dev \
22
+ libgomp1
23
+
24
+ WORKDIR /app
25
+
26
+ COPY . .
27
+
28
+ RUN if [ "${MUSA_DOCKER_ARCH}" != "default" ]; then \
29
+ export CMAKE_ARGS="-DMUSA_ARCHITECTURES=${MUSA_DOCKER_ARCH}"; \
30
+ fi && \
31
+ cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
32
+ cmake --build build --config Release -j$(nproc)
33
+
34
+ RUN mkdir -p /app/lib && \
35
+ find build -name "*.so" -exec cp {} /app/lib \;
36
+
37
+ RUN mkdir -p /app/full \
38
+ && cp build/bin/* /app/full \
39
+ && cp *.py /app/full \
40
+ && cp -r gguf-py /app/full \
41
+ && cp -r requirements /app/full \
42
+ && cp requirements.txt /app/full \
43
+ && cp .devops/tools.sh /app/full/tools.sh
44
+
45
+ ## Base image
46
+ FROM ${BASE_MUSA_RUN_CONTAINER} AS base
47
+
48
+ RUN apt-get update \
49
+ && apt-get install -y libgomp1 curl\
50
+ && apt autoremove -y \
51
+ && apt clean -y \
52
+ && rm -rf /tmp/* /var/tmp/* \
53
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
54
+ && find /var/cache -type f -delete
55
+
56
+ COPY --from=build /app/lib/ /app
57
+
58
+ ### Full
59
+ FROM base AS full
60
+
61
+ COPY --from=build /app/full /app
62
+
63
+ WORKDIR /app
64
+
65
+ RUN apt-get update \
66
+ && apt-get install -y \
67
+ git \
68
+ python3 \
69
+ python3-pip \
70
+ && pip install --upgrade pip setuptools wheel \
71
+ && pip install -r requirements.txt \
72
+ && apt autoremove -y \
73
+ && apt clean -y \
74
+ && rm -rf /tmp/* /var/tmp/* \
75
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
76
+ && find /var/cache -type f -delete
77
+
78
+
79
+ ENTRYPOINT ["/app/tools.sh"]
80
+
81
+ ### Light, CLI only
82
+ FROM base AS light
83
+
84
+ COPY --from=build /app/full/llama-cli /app
85
+
86
+ WORKDIR /app
87
+
88
+ ENTRYPOINT [ "/app/llama-cli" ]
89
+
90
+ ### Server, Server only
91
+ FROM base AS server
92
+
93
+ ENV LLAMA_ARG_HOST=0.0.0.0
94
+
95
+ COPY --from=build /app/full/llama-server /app
96
+
97
+ WORKDIR /app
98
+
99
+ HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
100
+
101
+ ENTRYPOINT [ "/app/llama-server" ]
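
The Dockerfile above produces full, light, and server stages. A typical invocation, run from the repository root (the image tag and model paths are only examples, and extra flags to expose MUSA devices to the container may be required on your host):

    docker build -f .devops/musa.Dockerfile --target server -t llama-cpp:server-musa .
    docker run -p 8080:8080 -v /path/to/models:/models llama-cpp:server-musa -m /models/model.gguf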
llama.cpp/.devops/nix/apps.nix ADDED
@@ -0,0 +1,21 @@
1
+ {
2
+ perSystem =
3
+ { config, lib, ... }:
4
+ {
5
+ apps =
6
+ let
7
+ inherit (config.packages) default;
8
+ binaries = [
9
+ "llama-cli"
10
+ "llama-embedding"
11
+ "llama-server"
12
+ "llama-quantize"
13
+ ];
14
+ mkApp = name: {
15
+ type = "app";
16
+ program = "${default}/bin/${name}";
17
+ };
18
+ in
19
+ lib.genAttrs binaries mkApp;
20
+ };
21
+ }
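
With these apps defined, the binaries can be started straight from the flake. A small sketch, assuming a local checkout and a model file of your own:

    nix run .#llama-cli -- -m /path/to/model.gguf -p "Hello"
    nix run .#llama-server -- -m /path/to/model.gguf --port 8080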
llama.cpp/.devops/nix/devshells.nix ADDED
@@ -0,0 +1,52 @@
1
+ { inputs, ... }:
2
+
3
+ {
4
+ perSystem =
5
+ {
6
+ config,
7
+ lib,
8
+ system,
9
+ ...
10
+ }:
11
+ {
12
+ devShells =
13
+ let
14
+ pkgs = import inputs.nixpkgs { inherit system; };
15
+ stdenv = pkgs.stdenv;
16
+ scripts = config.packages.python-scripts;
17
+ in
18
+ lib.pipe (config.packages) [
19
+ (lib.concatMapAttrs (
20
+ name: package: {
21
+ ${name} = pkgs.mkShell {
22
+ name = "${name}";
23
+ inputsFrom = [ package ];
24
+ shellHook = ''
25
+ echo "Entering ${name} devShell"
26
+ '';
27
+ };
28
+ "${name}-extra" =
29
+ if (name == "python-scripts") then
30
+ null
31
+ else
32
+ pkgs.mkShell {
33
+ name = "${name}-extra";
34
+ inputsFrom = [
35
+ package
36
+ scripts
37
+ ];
38
+ # Extra packages that *may* be used by some scripts
39
+ packages = [
40
+ pkgs.python3Packages.tiktoken
41
+ ];
42
+ shellHook = ''
43
+ echo "Entering ${name} devShell"
44
+ addToSearchPath "LD_LIBRARY_PATH" "${lib.getLib stdenv.cc.cc}/lib"
45
+ '';
46
+ };
47
+ }
48
+ ))
49
+ (lib.filterAttrs (name: value: value != null))
50
+ ];
51
+ };
52
+ }
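
The generated devShells mirror the package names, each with an additional "-extra" variant that also pulls in the Python scripts' dependencies. For example (the shell names here are assumptions based on the packages the flake exposes):

    nix develop .#default          # plain build shell for the default package
    nix develop .#default-extra    # same, plus the python-scripts dependencies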
llama.cpp/.devops/nix/docker.nix ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ lib,
3
+ dockerTools,
4
+ buildEnv,
5
+ llama-cpp,
6
+ interactive ? true,
7
+ coreutils,
8
+ }:
9
+
10
+ # A tar that can be fed into `docker load`:
11
+ #
12
+ # $ nix build .#llamaPackages.docker
13
+ # $ docker load < result
14
+
15
+ # For details and variations cf.
16
+ # - https://nixos.org/manual/nixpkgs/unstable/#ssec-pkgs-dockerTools-buildLayeredImage
17
+ # - https://discourse.nixos.org/t/a-faster-dockertools-buildimage-prototype/16922
18
+ # - https://nixery.dev/
19
+
20
+ # Approximate (compressed) sizes, at the time of writing, are:
21
+ #
22
+ # .#llamaPackages.docker: 125M;
23
+ # .#llamaPackagesCuda.docker: 537M;
24
+ # .#legacyPackages.aarch64-linux.llamaPackagesXavier.docker: 415M.
25
+
26
+ dockerTools.buildLayeredImage {
27
+ name = llama-cpp.pname;
28
+ tag = "latest";
29
+
30
+ contents =
31
+ [ llama-cpp ]
32
+ ++ lib.optionals interactive [
33
+ coreutils
34
+ dockerTools.binSh
35
+ dockerTools.caCertificates
36
+ ];
37
+ }
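
As the comments above note, the image is loaded from the Nix build result; a possible end-to-end sequence (the loaded image name follows the package pname, e.g. llama-cpp-blas:latest, so adjust the final tag accordingly):

    nix build .#llamaPackages.docker
    docker load < result
    docker run --rm <loaded-image> llama-cli --help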
llama.cpp/.devops/nix/jetson-support.nix ADDED
@@ -0,0 +1,39 @@
1
+ { inputs, ... }:
2
+ {
3
+ perSystem =
4
+ {
5
+ config,
6
+ system,
7
+ lib,
8
+ pkgsCuda,
9
+ ...
10
+ }:
11
+ {
12
+ legacyPackages =
13
+ let
14
+ caps.llamaPackagesXavier = "7.2";
15
+ caps.llamaPackagesOrin = "8.7";
16
+ caps.llamaPackagesTX2 = "6.2";
17
+ caps.llamaPackagesNano = "5.3";
18
+
19
+ pkgsFor =
20
+ cap:
21
+ import inputs.nixpkgs {
22
+ inherit system;
23
+ config = {
24
+ cudaSupport = true;
25
+ cudaCapabilities = [ cap ];
26
+ cudaEnableForwardCompat = false;
27
+ inherit (pkgsCuda.config) allowUnfreePredicate;
28
+ };
29
+ };
30
+ in
31
+ builtins.mapAttrs (name: cap: (pkgsFor cap).callPackage ./scope.nix { }) caps;
32
+
33
+ packages = lib.optionalAttrs (system == "aarch64-linux") {
34
+ jetson-xavier = config.legacyPackages.llamaPackagesXavier.llama-cpp;
35
+ jetson-orin = config.legacyPackages.llamaPackagesOrin.llama-cpp;
36
+ jetson-nano = config.legacyPackages.llamaPackagesNano.llama-cpp;
37
+ };
38
+ };
39
+ }
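
On an aarch64-linux builder these extra outputs can be built like any other flake package, e.g. for an Orin board:

    nix build .#jetson-orin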
llama.cpp/.devops/nix/nixpkgs-instances.nix ADDED
@@ -0,0 +1,45 @@
1
+ { inputs, ... }:
2
+ {
3
+ # The _module.args definitions are passed on to modules as arguments. E.g.
4
+ # the module `{ pkgs ... }: { /* config */ }` implicitly uses
5
+ # `_module.args.pkgs` (defined in this case by flake-parts).
6
+ perSystem =
7
+ { system, ... }:
8
+ {
9
+ _module.args = {
10
+ # Note: bringing up https://zimbatm.com/notes/1000-instances-of-nixpkgs
11
+ # again, the below creates several nixpkgs instances which the
12
+ # flake-centric CLI will be forced to evaluate e.g. on `nix flake show`.
13
+ #
14
+ # This is currently "slow" and "expensive", on a certain scale.
15
+ # This also isn't "right" in that this hinders dependency injection at
16
+ # the level of flake inputs. This might get removed in the foreseeable
17
+ # future.
18
+ #
19
+ # Note that you can use these expressions without Nix
20
+ # (`pkgs.callPackage ./devops/nix/scope.nix { }` is the entry point).
21
+
22
+ pkgsCuda = import inputs.nixpkgs {
23
+ inherit system;
24
+ # Ensure dependencies use CUDA consistently (e.g. that openmpi, ucc,
25
+ # and ucx are built with CUDA support)
26
+ config.cudaSupport = true;
27
+ config.allowUnfreePredicate =
28
+ p:
29
+ builtins.all (
30
+ license:
31
+ license.free
32
+ || builtins.elem license.shortName [
33
+ "CUDA EULA"
34
+ "cuDNN EULA"
35
+ ]
36
+ ) (p.meta.licenses or [ p.meta.license ]);
37
+ };
38
+ # Ensure dependencies use ROCm consistently
39
+ pkgsRocm = import inputs.nixpkgs {
40
+ inherit system;
41
+ config.rocmSupport = true;
42
+ };
43
+ };
44
+ };
45
+ }
llama.cpp/.devops/nix/package-gguf-py.nix ADDED
@@ -0,0 +1,36 @@
1
+ {
2
+ lib,
3
+ llamaVersion,
4
+ numpy,
5
+ tqdm,
6
+ sentencepiece,
7
+ pyyaml,
8
+ poetry-core,
9
+ buildPythonPackage,
10
+ pytestCheckHook,
11
+ }:
12
+
13
+ buildPythonPackage {
14
+ pname = "gguf";
15
+ version = llamaVersion;
16
+ pyproject = true;
17
+ nativeBuildInputs = [ poetry-core ];
18
+ propagatedBuildInputs = [
19
+ numpy
20
+ tqdm
21
+ sentencepiece
22
+ pyyaml
23
+ ];
24
+ src = lib.cleanSource ../../gguf-py;
25
+ pythonImportsCheck = [
26
+ "numpy"
27
+ "gguf"
28
+ ];
29
+ nativeCheckInputs = [ pytestCheckHook ];
30
+ doCheck = true;
31
+ meta = with lib; {
32
+ description = "Python package for writing binary files in the GGUF format";
33
+ license = licenses.mit;
34
+ maintainers = [ maintainers.ditsuke ];
35
+ };
36
+ }
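
The resulting Python package is part of the llamaPackages scope, so it can be built (with its pytest suite run as part of the build) in one step; the attribute path below assumes the scope is exposed the same way as in docker.nix above:

    nix build .#llamaPackages.gguf-py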
llama.cpp/.devops/nix/package.nix ADDED
@@ -0,0 +1,248 @@
1
+ {
2
+ lib,
3
+ glibc,
4
+ config,
5
+ stdenv,
6
+ runCommand,
7
+ cmake,
8
+ ninja,
9
+ pkg-config,
10
+ git,
11
+ mpi,
12
+ blas,
13
+ cudaPackages,
14
+ autoAddDriverRunpath,
15
+ darwin,
16
+ rocmPackages,
17
+ vulkan-headers,
18
+ vulkan-loader,
19
+ curl,
20
+ shaderc,
21
+ useBlas ?
22
+ builtins.all (x: !x) [
23
+ useCuda
24
+ useMetalKit
25
+ useRocm
26
+ useVulkan
27
+ ]
28
+ && blas.meta.available,
29
+ useCuda ? config.cudaSupport,
30
+ useMetalKit ? stdenv.isAarch64 && stdenv.isDarwin,
31
+ # Increases the runtime closure size by ~700M
32
+ useMpi ? false,
33
+ useRocm ? config.rocmSupport,
34
+ rocmGpuTargets ? builtins.concatStringsSep ";" rocmPackages.clr.gpuTargets,
35
+ enableCurl ? true,
36
+ useVulkan ? false,
37
+ llamaVersion ? "0.0.0", # Arbitrary version, substituted by the flake
38
+
39
+ # It's necessary to consistently use backendStdenv when building with CUDA support,
40
+ # otherwise we get libstdc++ errors downstream.
41
+ effectiveStdenv ? if useCuda then cudaPackages.backendStdenv else stdenv,
42
+ enableStatic ? effectiveStdenv.hostPlatform.isStatic,
43
+ precompileMetalShaders ? false,
44
+ }:
45
+
46
+ let
47
+ inherit (lib)
48
+ cmakeBool
49
+ cmakeFeature
50
+ optionalAttrs
51
+ optionals
52
+ strings
53
+ ;
54
+
55
+ stdenv = throw "Use effectiveStdenv instead";
56
+
57
+ suffices =
58
+ lib.optionals useBlas [ "BLAS" ]
59
+ ++ lib.optionals useCuda [ "CUDA" ]
60
+ ++ lib.optionals useMetalKit [ "MetalKit" ]
61
+ ++ lib.optionals useMpi [ "MPI" ]
62
+ ++ lib.optionals useRocm [ "ROCm" ]
63
+ ++ lib.optionals useVulkan [ "Vulkan" ];
64
+
65
+ pnameSuffix =
66
+ strings.optionalString (suffices != [ ])
67
+ "-${strings.concatMapStringsSep "-" strings.toLower suffices}";
68
+ descriptionSuffix = strings.optionalString (
69
+ suffices != [ ]
70
+ ) ", accelerated with ${strings.concatStringsSep ", " suffices}";
71
+
72
+ xcrunHost = runCommand "xcrunHost" { } ''
73
+ mkdir -p $out/bin
74
+ ln -s /usr/bin/xcrun $out/bin
75
+ '';
76
+
77
+ # apple_sdk is supposed to choose sane defaults, no need to handle isAarch64
78
+ # separately
79
+ darwinBuildInputs =
80
+ with darwin.apple_sdk.frameworks;
81
+ [
82
+ Accelerate
83
+ CoreVideo
84
+ CoreGraphics
85
+ ]
86
+ ++ optionals useMetalKit [ MetalKit ];
87
+
88
+ cudaBuildInputs = with cudaPackages; [
89
+ cuda_cudart
90
+ cuda_cccl # <nv/target>
91
+ libcublas
92
+ ];
93
+
94
+ rocmBuildInputs = with rocmPackages; [
95
+ clr
96
+ hipblas
97
+ rocblas
98
+ ];
99
+
100
+ vulkanBuildInputs = [
101
+ vulkan-headers
102
+ vulkan-loader
103
+ shaderc
104
+ ];
105
+ in
106
+
107
+ effectiveStdenv.mkDerivation (finalAttrs: {
108
+ pname = "llama-cpp${pnameSuffix}";
109
+ version = llamaVersion;
110
+
111
+ # Note: none of the files discarded here are visible in the sandbox or
112
+ # affect the output hash. This also means they can be modified without
113
+ # triggering a rebuild.
114
+ src = lib.cleanSourceWith {
115
+ filter =
116
+ name: type:
117
+ let
118
+ noneOf = builtins.all (x: !x);
119
+ baseName = baseNameOf name;
120
+ in
121
+ noneOf [
122
+ (lib.hasSuffix ".nix" name) # Ignore *.nix files when computing outPaths
123
+ (lib.hasSuffix ".md" name) # Ignore *.md changes when computing outPaths
124
+ (lib.hasPrefix "." baseName) # Skip hidden files and directories
125
+ (baseName == "flake.lock")
126
+ ];
127
+ src = lib.cleanSource ../../.;
128
+ };
129
+
130
+ postPatch = ''
131
+ substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
132
+ --replace '[bundle pathForResource:@"ggml-metal" ofType:@"metal"];' "@\"$out/bin/ggml-metal.metal\";"
133
+ substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
134
+ --replace '[bundle pathForResource:@"default" ofType:@"metallib"];' "@\"$out/bin/default.metallib\";"
135
+ '';
136
+
137
+ # With PR#6015 https://github.com/ggml-org/llama.cpp/pull/6015,
138
+ # `default.metallib` may be compiled with the Metal compiler from Xcode
139
+ # and we need to escape the sandbox on macOS to access the Metal compiler.
140
+ # `xcrun` is used to find the path of the Metal compiler, which is variable
141
+ # and not on $PATH
142
+ # see https://github.com/ggml-org/llama.cpp/pull/6118 for discussion
143
+ __noChroot = effectiveStdenv.isDarwin && useMetalKit && precompileMetalShaders;
144
+
145
+ nativeBuildInputs =
146
+ [
147
+ cmake
148
+ ninja
149
+ pkg-config
150
+ git
151
+ ]
152
+ ++ optionals useCuda [
153
+ cudaPackages.cuda_nvcc
154
+
155
+ autoAddDriverRunpath
156
+ ]
157
+ ++ optionals (effectiveStdenv.hostPlatform.isGnu && enableStatic) [ glibc.static ]
158
+ ++ optionals (effectiveStdenv.isDarwin && useMetalKit && precompileMetalShaders) [ xcrunHost ];
159
+
160
+ buildInputs =
161
+ optionals effectiveStdenv.isDarwin darwinBuildInputs
162
+ ++ optionals useCuda cudaBuildInputs
163
+ ++ optionals useMpi [ mpi ]
164
+ ++ optionals useRocm rocmBuildInputs
165
+ ++ optionals useBlas [ blas ]
166
+ ++ optionals useVulkan vulkanBuildInputs
167
+ ++ optionals enableCurl [ curl ];
168
+
169
+ cmakeFlags =
170
+ [
171
+ (cmakeBool "LLAMA_BUILD_SERVER" true)
172
+ (cmakeBool "BUILD_SHARED_LIBS" (!enableStatic))
173
+ (cmakeBool "CMAKE_SKIP_BUILD_RPATH" true)
174
+ (cmakeBool "LLAMA_CURL" enableCurl)
175
+ (cmakeBool "GGML_NATIVE" false)
176
+ (cmakeBool "GGML_BLAS" useBlas)
177
+ (cmakeBool "GGML_CUDA" useCuda)
178
+ (cmakeBool "GGML_HIP" useRocm)
179
+ (cmakeBool "GGML_METAL" useMetalKit)
180
+ (cmakeBool "GGML_VULKAN" useVulkan)
181
+ (cmakeBool "GGML_STATIC" enableStatic)
182
+ ]
183
+ ++ optionals useCuda [
184
+ (
185
+ with cudaPackages.flags;
186
+ cmakeFeature "CMAKE_CUDA_ARCHITECTURES" (
187
+ builtins.concatStringsSep ";" (map dropDot cudaCapabilities)
188
+ )
189
+ )
190
+ ]
191
+ ++ optionals useRocm [
192
+ (cmakeFeature "CMAKE_HIP_COMPILER" "${rocmPackages.llvm.clang}/bin/clang")
193
+ (cmakeFeature "CMAKE_HIP_ARCHITECTURES" rocmGpuTargets)
194
+ ]
195
+ ++ optionals useMetalKit [
196
+ (lib.cmakeFeature "CMAKE_C_FLAGS" "-D__ARM_FEATURE_DOTPROD=1")
197
+ (cmakeBool "GGML_METAL_EMBED_LIBRARY" (!precompileMetalShaders))
198
+ ];
199
+
200
+ # Environment variables needed for ROCm
201
+ env = optionalAttrs useRocm {
202
+ ROCM_PATH = "${rocmPackages.clr}";
203
+ HIP_DEVICE_LIB_PATH = "${rocmPackages.rocm-device-libs}/amdgcn/bitcode";
204
+ };
205
+
206
+ # TODO(SomeoneSerge): It's better to add proper install targets at the CMake level,
207
+ # if they haven't been added yet.
208
+ postInstall = ''
209
+ mkdir -p $out/include
210
+ cp $src/include/llama.h $out/include/
211
+ '';
212
+
213
+ meta = {
214
+ # Configurations we don't want even the CI to evaluate. Results in the
215
+ # "unsupported platform" messages. This is mostly a no-op, because
216
+ # cudaPackages would've refused to evaluate anyway.
217
+ badPlatforms = optionals useCuda lib.platforms.darwin;
218
+
219
+ # Configurations that are known to result in build failures. Can be
220
+ # overridden by importing Nixpkgs with `allowBroken = true`.
221
+ broken = (useMetalKit && !effectiveStdenv.isDarwin);
222
+
223
+ description = "Inference of LLaMA model in pure C/C++${descriptionSuffix}";
224
+ homepage = "https://github.com/ggml-org/llama.cpp/";
225
+ license = lib.licenses.mit;
226
+
227
+ # Accommodates `nix run` and `lib.getExe`
228
+ mainProgram = "llama-cli";
229
+
230
+ # These people might respond, on the best effort basis, if you ping them
231
+ # in case of Nix-specific regressions or for reviewing Nix-specific PRs.
232
+ # Consider adding yourself to this list if you want to ensure this flake
233
+ # stays maintained and you're willing to invest your time. Do not add
234
+ # other people without their consent. Consider removing people after
235
+ # they've been unreachable for long periods of time.
236
+
237
+ # Note that lib.maintainers is defined in Nixpkgs, but you may just add
238
+ # an attrset following the same format as in
239
+ # https://github.com/NixOS/nixpkgs/blob/f36a80e54da29775c78d7eff0e628c2b4e34d1d7/maintainers/maintainer-list.nix
240
+ maintainers = with lib.maintainers; [
241
+ philiptaron
242
+ SomeoneSerge
243
+ ];
244
+
245
+ # Extend `badPlatforms` instead
246
+ platforms = lib.platforms.all;
247
+ };
248
+ })
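
Each accelerated variant of this derivation is selected through the dedicated nixpkgs instances rather than by flipping the flags manually; a sketch of building the default package and the CUDA variant (the attribute names are assumptions based on the scopes referenced elsewhere in this directory):

    nix build .#llamaPackages.llama-cpp
    nix build .#llamaPackagesCuda.llama-cpp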
llama.cpp/.devops/nix/python-scripts.nix ADDED
@@ -0,0 +1,66 @@
1
+ {
2
+ lib,
3
+ stdenv,
4
+ buildPythonPackage,
5
+ poetry-core,
6
+ mkShell,
7
+ python3Packages,
8
+ gguf-py,
9
+ }@inputs:
10
+
11
+ let
12
+ llama-python-deps = with python3Packages; [
13
+ numpy
14
+ sentencepiece
15
+ transformers
16
+ protobuf
17
+ torchWithoutCuda
18
+ gguf-py
19
+ tqdm
20
+
21
+ # for scripts/compare-llama-bench.py
22
+ gitpython
23
+ tabulate
24
+
25
+ # for examples/pydantic-models-to-grammar-examples.py
26
+ docstring-parser
27
+ pydantic
28
+
29
+ ];
30
+
31
+ llama-python-test-deps = with python3Packages; [
32
+ # Server bench
33
+ matplotlib
34
+
35
+ # server tests
36
+ openai
37
+ pytest
38
+ prometheus-client
39
+ ];
40
+ in
41
+
42
+ buildPythonPackage ({
43
+ pname = "llama-scripts";
44
+ version = "0.0.0";
45
+ pyproject = true;
46
+
47
+ # NOTE: The files filtered out here are not visible in the build sandbox, nor
48
+ # do they affect the output hash. They can be modified without triggering a rebuild.
49
+ src = lib.cleanSourceWith {
50
+ filter =
51
+ name: type:
52
+ let
53
+ any = builtins.any (x: x);
54
+ baseName = builtins.baseNameOf name;
55
+ in
56
+ any [
57
+ (lib.hasSuffix ".py" name)
58
+ (baseName == "README.md")
59
+ (baseName == "pyproject.toml")
60
+ ];
61
+ src = lib.cleanSource ../../.;
62
+ };
63
+ nativeBuildInputs = [ poetry-core ];
64
+ nativeCheckInputs = llama-python-test-deps;
65
+ dependencies = llama-python-deps;
66
+ })
llama.cpp/.devops/nix/scope.nix ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ lib,
3
+ newScope,
4
+ python3,
5
+ llamaVersion ? "0.0.0",
6
+ }:
7
+
8
+ let
9
+ pythonPackages = python3.pkgs;
10
+ buildPythonPackage = pythonPackages.buildPythonPackage;
11
+ numpy = pythonPackages.numpy;
12
+ tqdm = pythonPackages.tqdm;
13
+ sentencepiece = pythonPackages.sentencepiece;
14
+ pyyaml = pythonPackages.pyyaml;
15
+ poetry-core = pythonPackages.poetry-core;
16
+ pytestCheckHook = pythonPackages.pytestCheckHook;
17
+ in
18
+
19
+ # We're using `makeScope` instead of just writing out an attrset
20
+ # because it allows users to apply overlays later using `overrideScope'`.
21
+ # Cf. https://noogle.dev/f/lib/makeScope
22
+
23
+ lib.makeScope newScope (self: {
24
+ inherit llamaVersion;
25
+ gguf-py = self.callPackage ./package-gguf-py.nix {
26
+ inherit
27
+ buildPythonPackage
28
+ numpy
29
+ tqdm
30
+ sentencepiece
31
+ poetry-core
32
+ pyyaml
33
+ pytestCheckHook
34
+ ;
35
+ };
36
+ python-scripts = self.callPackage ./python-scripts.nix { inherit buildPythonPackage poetry-core; };
37
+ llama-cpp = self.callPackage ./package.nix { };
38
+ docker = self.callPackage ./docker.nix { };
39
+ docker-min = self.callPackage ./docker.nix { interactive = false; };
40
+ sif = self.callPackage ./sif.nix { };
41
+ })
llama.cpp/.devops/nix/sif.nix ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ lib,
3
+ singularity-tools,
4
+ llama-cpp,
5
+ bashInteractive,
6
+ interactive ? false,
7
+ }:
8
+
9
+ let
10
+ optionalInt = cond: x: if cond then x else 0;
11
+ in
12
+ singularity-tools.buildImage rec {
13
+ inherit (llama-cpp) name;
14
+ contents = [ llama-cpp ] ++ lib.optionals interactive [ bashInteractive ];
15
+
16
+ # These are excessive (but safe) for most variants. Building singularity
17
+ # images requires superuser privileges, so we build them inside a VM in a
18
+ # writable image of pre-determined size.
19
+ #
20
+ # ROCm is currently affected by https://github.com/NixOS/nixpkgs/issues/276846
21
+ #
22
+ # Expected image sizes:
23
+ # - cpu/blas: 150M,
24
+ # - cuda, all gencodes: 560M,
25
+ diskSize = 4096 + optionalInt llama-cpp.useRocm 16384;
26
+ memSize = diskSize;
27
+ }
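
The Singularity/Apptainer image is built the same way as the Docker one; a sketch (the exact location of the image file under ./result depends on singularity-tools, so the path is left as a placeholder):

    nix build .#llamaPackages.sif
    apptainer exec <path-to-built-image> llama-cli --help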
llama.cpp/.devops/rocm.Dockerfile ADDED
@@ -0,0 +1,113 @@
1
+ ARG UBUNTU_VERSION=24.04
2
+
3
+ # This needs to generally match the container host's environment.
4
+ ARG ROCM_VERSION=6.4
5
+ ARG AMDGPU_VERSION=6.4
6
+
7
+ # Target the ROCm build image
8
+ ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete
9
+
10
+ ### Build image
11
+ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
12
+
13
+ # Unless otherwise specified, we make a fat build.
14
+ # List from https://github.com/ggml-org/llama.cpp/pull/1087#issuecomment-1682807878
15
+ # This is mostly tied to rocBLAS supported archs.
16
+ # gfx803, gfx900, gfx1032, gfx1101, gfx1102: not officially supported
17
+ # gfx906 is deprecated
18
+ # Check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.4/reference/system-requirements.html
19
+
20
+ ARG ROCM_DOCKER_ARCH='gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102'
21
+ #ARG ROCM_DOCKER_ARCH=gfx1100
22
+
23
+ # Set the AMD GPU architectures to build for
24
+ ENV AMDGPU_TARGETS=${ROCM_DOCKER_ARCH}
25
+ # Enable ROCm
26
+ # ENV CC=/opt/rocm/llvm/bin/clang
27
+ # ENV CXX=/opt/rocm/llvm/bin/clang++
28
+
29
+ RUN apt-get update \
30
+ && apt-get install -y \
31
+ build-essential \
32
+ cmake \
33
+ git \
34
+ libcurl4-openssl-dev \
35
+ curl \
36
+ libgomp1
37
+
38
+ WORKDIR /app
39
+
40
+ COPY . .
41
+
42
+ RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
43
+ cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$ROCM_DOCKER_ARCH -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \
44
+ && cmake --build build --config Release -j$(nproc)
45
+
46
+ RUN mkdir -p /app/lib \
47
+ && find build -name "*.so" -exec cp {} /app/lib \;
48
+
49
+ RUN mkdir -p /app/full \
50
+ && cp build/bin/* /app/full \
51
+ && cp *.py /app/full \
52
+ && cp -r gguf-py /app/full \
53
+ && cp -r requirements /app/full \
54
+ && cp requirements.txt /app/full \
55
+ && cp .devops/tools.sh /app/full/tools.sh
56
+
57
+ ## Base image
58
+ FROM ${BASE_ROCM_DEV_CONTAINER} AS base
59
+
60
+ RUN apt-get update \
61
+ && apt-get install -y libgomp1 curl\
62
+ && apt autoremove -y \
63
+ && apt clean -y \
64
+ && rm -rf /tmp/* /var/tmp/* \
65
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
66
+ && find /var/cache -type f -delete
67
+
68
+ COPY --from=build /app/lib/ /app
69
+
70
+ ### Full
71
+ FROM base AS full
72
+
73
+ COPY --from=build /app/full /app
74
+
75
+ WORKDIR /app
76
+
77
+ RUN apt-get update \
78
+ && apt-get install -y \
79
+ git \
80
+ python3-pip \
81
+ python3 \
82
+ python3-wheel\
83
+ && pip install --break-system-packages --upgrade setuptools \
84
+ && pip install --break-system-packages -r requirements.txt \
85
+ && apt autoremove -y \
86
+ && apt clean -y \
87
+ && rm -rf /tmp/* /var/tmp/* \
88
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
89
+ && find /var/cache -type f -delete
90
+
91
+ ENTRYPOINT ["/app/tools.sh"]
92
+
93
+ ### Light, CLI only
94
+ FROM base AS light
95
+
96
+ COPY --from=build /app/full/llama-cli /app
97
+
98
+ WORKDIR /app
99
+
100
+ ENTRYPOINT [ "/app/llama-cli" ]
101
+
102
+ ### Server, Server only
103
+ FROM base AS server
104
+
105
+ ENV LLAMA_ARG_HOST=0.0.0.0
106
+
107
+ COPY --from=build /app/full/llama-server /app
108
+
109
+ WORKDIR /app
110
+
111
+ HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
112
+
113
+ ENTRYPOINT [ "/app/llama-server" ]
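
Building the fat multi-arch image takes a long time, so for local use it is common to restrict ROCM_DOCKER_ARCH to your own GPU. A sketch (gfx1100 and the paths below are examples; /dev/kfd and /dev/dri must be passed through for ROCm to see the GPU):

    docker build -f .devops/rocm.Dockerfile --build-arg ROCM_DOCKER_ARCH=gfx1100 --target light -t llama-cpp:light-rocm .
    docker run --device /dev/kfd --device /dev/dri -v /path/to/models:/models \
        llama-cpp:light-rocm -m /models/model.gguf -p "Hello" -n 64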
llama.cpp/.devops/tools.sh ADDED
@@ -0,0 +1,49 @@
1
+ #!/usr/bin/env bash
2
+ set -e
3
+
4
+ # Read the first argument into a variable
5
+ arg1="$1"
6
+
7
+ # Shift the arguments to remove the first one
8
+ shift
9
+
10
+ if [[ "$arg1" == '--convert' || "$arg1" == '-c' ]]; then
11
+ exec python3 ./convert_hf_to_gguf.py "$@"
12
+ elif [[ "$arg1" == '--quantize' || "$arg1" == '-q' ]]; then
13
+ exec ./llama-quantize "$@"
14
+ elif [[ "$arg1" == '--run' || "$arg1" == '-r' ]]; then
15
+ exec ./llama-cli "$@"
16
+ elif [[ "$arg1" == '--bench' || "$arg1" == '-b' ]]; then
17
+ exec ./llama-bench "$@"
18
+ elif [[ "$arg1" == '--perplexity' || "$arg1" == '-p' ]]; then
19
+ exec ./llama-perplexity "$@"
20
+ elif [[ "$arg1" == '--all-in-one' || "$arg1" == '-a' ]]; then
21
+ echo "Quantizing f16 GGML models to q4_0..."
22
+ for i in $(ls $1/$2/ggml-model-f16.bin*); do
23
+ if [ -f "${i/f16/q4_0}" ]; then
24
+ echo "Skip model quantization, it already exists: ${i/f16/q4_0}"
25
+ else
26
+ echo "Converting PTH to GGML: $i into ${i/f16/q4_0}..."
27
+ ./llama-quantize "$i" "${i/f16/q4_0}" q4_0  # no exec here, so every matching model is processed
28
+ fi
29
+ done
30
+ elif [[ "$arg1" == '--server' || "$arg1" == '-s' ]]; then
31
+ exec ./llama-server "$@"
32
+ else
33
+ echo "Unknown command: $arg1"
34
+ echo "Available commands: "
35
+ echo " --run (-r): Run a model previously converted into ggml"
36
+ echo " ex: -m /models/7B/ggml-model-q4_0.bin -p \"Building a website can be done in 10 simple steps:\" -n 512"
37
+ echo " --bench (-b): Benchmark the performance of the inference for various parameters."
38
+ echo " ex: -m model.gguf"
39
+ echo " --perplexity (-p): Measure the perplexity of a model over a given text."
40
+ echo " ex: -m model.gguf -f file.txt"
41
+ echo " --convert (-c): Convert a llama model into ggml"
42
+ echo " ex: --outtype f16 \"/models/7B/\" "
43
+ echo " --quantize (-q): Quantize a ggml model"
44
+ echo " ex: \"/models/7B/ggml-model-f16.bin\" \"/models/7B/ggml-model-q4_0.bin\" 2"
45
+ echo " --all-in-one (-a): Execute --convert & --quantize"
46
+ echo " ex: \"/models/\" 7B"
47
+ echo " --server (-s): Run a model on the server"
48
+ echo " ex: -m /models/7B/ggml-model-q4_0.bin -c 2048 -ngl 43 -mg 1 --port 8080"
49
+ fi
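
This script is the entrypoint of the "full" image variants above, so its subcommands are what you pass to docker run. For example (the image name and model paths are placeholders):

    docker run -v /path/to/models:/models <full-image> --convert --outtype f16 /models/7B/
    docker run -v /path/to/models:/models <full-image> --run -m /models/7B/ggml-model-q4_0.gguf -p "Hello" -n 64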
llama.cpp/.devops/vulkan.Dockerfile ADDED
@@ -0,0 +1,89 @@
1
+ ARG UBUNTU_VERSION=24.04
2
+
3
+ FROM ubuntu:$UBUNTU_VERSION AS build
4
+
5
+ # Install build tools
6
+ RUN apt update && apt install -y git build-essential cmake wget
7
+
8
+ # Install Vulkan SDK and cURL
9
+ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
10
+ wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list && \
11
+ apt update -y && \
12
+ apt-get install -y vulkan-sdk libcurl4-openssl-dev curl
13
+
14
+ # Build it
15
+ WORKDIR /app
16
+
17
+ COPY . .
18
+
19
+ RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_VULKAN=1 -DLLAMA_BUILD_TESTS=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON && \
20
+ cmake --build build --config Release -j$(nproc)
21
+
22
+ RUN mkdir -p /app/lib && \
23
+ find build -name "*.so" -exec cp {} /app/lib \;
24
+
25
+ RUN mkdir -p /app/full \
26
+ && cp build/bin/* /app/full \
27
+ && cp *.py /app/full \
28
+ && cp -r gguf-py /app/full \
29
+ && cp -r requirements /app/full \
30
+ && cp requirements.txt /app/full \
31
+ && cp .devops/tools.sh /app/full/tools.sh
32
+
33
+ ## Base image
34
+ FROM ubuntu:$UBUNTU_VERSION AS base
35
+
36
+ RUN apt-get update \
37
+ && apt-get install -y libgomp1 curl libvulkan-dev \
38
+ && apt autoremove -y \
39
+ && apt clean -y \
40
+ && rm -rf /tmp/* /var/tmp/* \
41
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
42
+ && find /var/cache -type f -delete
43
+
44
+ COPY --from=build /app/lib/ /app
45
+
46
+ ### Full
47
+ FROM base AS full
48
+
49
+ COPY --from=build /app/full /app
50
+
51
+ WORKDIR /app
52
+
53
+ RUN apt-get update \
54
+ && apt-get install -y \
55
+ git \
56
+ python3 \
57
+ python3-pip \
58
+ python3-wheel \
59
+ && pip install --break-system-packages --upgrade setuptools \
60
+ && pip install --break-system-packages -r requirements.txt \
61
+ && apt autoremove -y \
62
+ && apt clean -y \
63
+ && rm -rf /tmp/* /var/tmp/* \
64
+ && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
65
+ && find /var/cache -type f -delete
66
+
67
+ ENTRYPOINT ["/app/tools.sh"]
68
+
69
+ ### Light, CLI only
70
+ FROM base AS light
71
+
72
+ COPY --from=build /app/full/llama-cli /app
73
+
74
+ WORKDIR /app
75
+
76
+ ENTRYPOINT [ "/app/llama-cli" ]
77
+
78
+ ### Server, Server only
79
+ FROM base AS server
80
+
81
+ ENV LLAMA_ARG_HOST=0.0.0.0
82
+
83
+ COPY --from=build /app/full/llama-server /app
84
+
85
+ WORKDIR /app
86
+
87
+ HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
88
+
89
+ ENTRYPOINT [ "/app/llama-server" ]
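
As with the other Dockerfiles, each stage can be built separately; a sketch for the Vulkan server image (the tag and device flags are examples, and GPU passthrough depends on your driver setup):

    docker build -f .devops/vulkan.Dockerfile --target server -t llama-cpp:server-vulkan .
    docker run -p 8080:8080 --device /dev/dri -v /path/to/models:/models llama-cpp:server-vulkan -m /models/model.gguf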
llama.cpp/.dockerignore ADDED
@@ -0,0 +1,20 @@
1
+ *.o
2
+ *.a
3
+ .cache/
4
+ # Do not ignore .git directory, otherwise the reported build number will always be 0
5
+ .github/
6
+ .gitignore
7
+ .vs/
8
+ .vscode/
9
+ .DS_Store
10
+
11
+ build*/
12
+
13
+ models/*
14
+
15
+ /llama-cli
16
+ /llama-quantize
17
+
18
+ arm_neon.h
19
+ compile_commands.json
20
+ Dockerfile
llama.cpp/.ecrc ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "Exclude": ["^\\.gitmodules$", "stb_image\\.h"],
3
+ "Disable": {
4
+ "IndentSize": true
5
+ }
6
+ }
llama.cpp/.editorconfig ADDED
@@ -0,0 +1,54 @@
1
+ # https://EditorConfig.org
2
+
3
+ # Top-most EditorConfig file
4
+ root = true
5
+
6
+ # Unix-style newlines with a newline ending every file, utf-8 charset
7
+ [*]
8
+ end_of_line = lf
9
+ insert_final_newline = true
10
+ trim_trailing_whitespace = true
11
+ charset = utf-8
12
+ indent_style = space
13
+ indent_size = 4
14
+
15
+ [Makefile]
16
+ indent_style = tab
17
+
18
+ [scripts/*.mk]
19
+ indent_style = tab
20
+
21
+ [prompts/*.txt]
22
+ insert_final_newline = unset
23
+
24
+ [tools/server/public/*]
25
+ indent_size = 2
26
+
27
+ [tools/server/public/deps_*]
28
+ trim_trailing_whitespace = unset
29
+ indent_style = unset
30
+ indent_size = unset
31
+
32
+ [tools/server/deps_*]
33
+ trim_trailing_whitespace = unset
34
+ indent_style = unset
35
+ indent_size = unset
36
+
37
+ [examples/llama.swiftui/llama.swiftui.xcodeproj/*]
38
+ indent_style = tab
39
+
40
+ [tools/cvector-generator/*.txt]
41
+ trim_trailing_whitespace = unset
42
+ insert_final_newline = unset
43
+
44
+ [models/templates/*.jinja]
45
+ indent_style = unset
46
+ indent_size = unset
47
+ end_of_line = unset
48
+ charset = unset
49
+ trim_trailing_whitespace = unset
50
+ insert_final_newline = unset
51
+
52
+ [vendor/miniaudio/miniaudio.h]
53
+ trim_trailing_whitespace = unset
54
+ insert_final_newline = unset
llama.cpp/.flake8 ADDED
@@ -0,0 +1,18 @@
1
+ [flake8]
2
+ max-line-length = 125
3
+ ignore = E203,E211,E221,E225,E231,E241,E251,E261,E266,E501,E701,E704,W503
4
+ exclude =
5
+ # Do not traverse examples and tools
6
+ examples,
7
+ tools,
8
+ # Do not include package initializers
9
+ __init__.py,
10
+ # No need to traverse our git directory
11
+ .git,
12
+ # There's no value in checking cache directories
13
+ __pycache__,
14
+ # No need to include the build path
15
+ build,
16
+ # This contains builds that we don't want to check
17
+ dist # This is generated with `python build .` for package releases
18
+ # max-complexity = 10
llama.cpp/.github/ISSUE_TEMPLATE/010-bug-compilation.yml ADDED
@@ -0,0 +1,87 @@
1
+ name: Bug (compilation)
2
+ description: Something goes wrong when trying to compile llama.cpp.
3
+ title: "Compile bug: "
4
+ labels: ["bug-unconfirmed", "compilation"]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: >
9
+ Thanks for taking the time to fill out this bug report!
10
+ This issue template is intended for bug reports where the compilation of llama.cpp fails.
11
+ Before opening an issue, please confirm that the compilation still fails with `-DGGML_CCACHE=OFF`.
12
+ If the compilation succeeds with ccache disabled you should be able to permanently fix the issue
13
+ by clearing `~/.cache/ccache` (on Linux).
14
+ - type: textarea
15
+ id: commit
16
+ attributes:
17
+ label: Git commit
18
+ description: Which commit are you trying to compile?
19
+ placeholder: |
20
+ $git rev-parse HEAD
21
+ 84a07a17b1b08cf2b9747c633a2372782848a27f
22
+ validations:
23
+ required: true
24
+ - type: dropdown
25
+ id: operating-system
26
+ attributes:
27
+ label: Operating systems
28
+ description: Which operating systems do you know to be affected?
29
+ multiple: true
30
+ options:
31
+ - Linux
32
+ - Mac
33
+ - Windows
34
+ - BSD
35
+ - Other? (Please let us know in description)
36
+ validations:
37
+ required: true
38
+ - type: dropdown
39
+ id: backends
40
+ attributes:
41
+ label: GGML backends
42
+ description: Which GGML backends do you know to be affected?
43
+ options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
44
+ multiple: true
45
+ validations:
46
+ required: true
47
+ - type: textarea
48
+ id: info
49
+ attributes:
50
+ label: Problem description & steps to reproduce
51
+ description: >
52
+ Please give us a summary of the problem and tell us how to reproduce it.
53
+ If you can narrow down the bug to specific compile flags, that information would be very much appreciated by us.
54
+ placeholder: >
55
+ I'm trying to compile llama.cpp with CUDA support on a fresh install of Ubuntu and get error XY.
56
+ Here are the exact commands that I used: ...
57
+ validations:
58
+ required: true
59
+ - type: textarea
60
+ id: first_bad_commit
61
+ attributes:
62
+ label: First Bad Commit
63
+ description: >
64
+ If the bug was not present on an earlier version: when did it start appearing?
65
+ If possible, please do a git bisect and identify the exact commit that introduced the bug.
66
+ validations:
67
+ required: false
68
+ - type: textarea
69
+ id: command
70
+ attributes:
71
+ label: Compile command
72
+ description: >
73
+ Please provide the exact command you used to compile llama.cpp. For example: `cmake -B ...`.
74
+ This will be automatically formatted into code, so no need for backticks.
75
+ render: shell
76
+ validations:
77
+ required: true
78
+ - type: textarea
79
+ id: logs
80
+ attributes:
81
+ label: Relevant log output
82
+ description: >
83
+ Please copy and paste any relevant log output, including any generated text.
84
+ This will be automatically formatted into code, so no need for backticks.
85
+ render: shell
86
+ validations:
87
+ required: true
llama.cpp/.github/ISSUE_TEMPLATE/011-bug-results.yml ADDED
@@ -0,0 +1,101 @@
1
+ name: Bug (model use)
2
+ description: Something goes wrong when using a model (in general, not specific to a single llama.cpp module).
3
+ title: "Eval bug: "
4
+ labels: ["bug-unconfirmed", "model evaluation"]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: >
9
+ Thanks for taking the time to fill out this bug report!
10
+ This issue template is intended for bug reports where the model evaluation results
11
+ (i.e. the generated text) are incorrect or llama.cpp crashes during model evaluation.
12
+ If you encountered the issue while using an external UI (e.g. ollama),
13
+ please reproduce your issue using one of the examples/binaries in this repository.
14
+ The `llama-cli` binary can be used for simple and reproducible model inference.
15
+ - type: textarea
16
+ id: version
17
+ attributes:
18
+ label: Name and Version
19
+ description: Which version of our software are you running? (use `--version` to get a version string)
20
+ placeholder: |
21
+ $./llama-cli --version
22
+ version: 2999 (42b4109e)
23
+ built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
24
+ validations:
25
+ required: true
26
+ - type: dropdown
27
+ id: operating-system
28
+ attributes:
29
+ label: Operating systems
30
+ description: Which operating systems do you know to be affected?
31
+ multiple: true
32
+ options:
33
+ - Linux
34
+ - Mac
35
+ - Windows
36
+ - BSD
37
+ - Other? (Please let us know in description)
38
+ validations:
39
+ required: true
40
+ - type: dropdown
41
+ id: backends
42
+ attributes:
43
+ label: GGML backends
44
+ description: Which GGML backends do you know to be affected?
45
+ options: [AMX, BLAS, CPU, CUDA, HIP, Metal, Musa, RPC, SYCL, Vulkan, OpenCL]
46
+ multiple: true
47
+ validations:
48
+ required: true
49
+ - type: textarea
50
+ id: hardware
51
+ attributes:
52
+ label: Hardware
53
+ description: Which CPUs/GPUs are you using?
54
+ placeholder: >
55
+ e.g. Ryzen 5950X + 2x RTX 4090
56
+ validations:
57
+ required: true
58
+ - type: textarea
59
+ id: model
60
+ attributes:
61
+ label: Models
62
+ description: >
63
+ Which model(s) at which quantization were you using when encountering the bug?
64
+ If you downloaded a GGUF file off of Huggingface, please provide a link.
65
+ placeholder: >
66
+ e.g. Meta LLaMA 3.1 Instruct 8b q4_K_M
67
+ validations:
68
+ required: false
69
+ - type: textarea
70
+ id: info
71
+ attributes:
72
+ label: Problem description & steps to reproduce
73
+ description: >
74
+ Please give us a summary of the problem and tell us how to reproduce it.
75
+ If you can narrow down the bug to specific hardware, compile flags, or command line arguments,
76
+ that information would be very much appreciated by us.
77
+ placeholder: >
78
+ e.g. when I run llama-cli with -ngl 99 I get garbled outputs.
79
+ When I use -ngl 0 it works correctly.
80
+ Here are the exact commands that I used: ...
81
+ validations:
82
+ required: true
83
+ - type: textarea
84
+ id: first_bad_commit
85
+ attributes:
86
+ label: First Bad Commit
87
+ description: >
88
+ If the bug was not present on an earlier version: when did it start appearing?
89
+ If possible, please do a git bisect and identify the exact commit that introduced the bug.
90
+ validations:
91
+ required: false
92
+ - type: textarea
93
+ id: logs
94
+ attributes:
95
+ label: Relevant log output
96
+ description: >
97
+ Please copy and paste any relevant log output, including the command that you entered and any generated text.
98
+ This will be automatically formatted into code, so no need for backticks.
99
+ render: shell
100
+ validations:
101
+ required: true
llama.cpp/.github/ISSUE_TEMPLATE/019-bug-misc.yml ADDED
@@ -0,0 +1,91 @@
1
+ name: Bug (misc.)
2
+ description: Something is not working the way it should (and it's not covered by any of the above cases).
3
+ title: "Misc. bug: "
4
+ labels: ["bug-unconfirmed"]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: >
9
+ Thanks for taking the time to fill out this bug report!
10
+ This issue template is intended for miscellaneous bugs that don't fit into any other category.
11
+ If you encountered the issue while using an external UI (e.g. ollama),
12
+ please reproduce your issue using one of the examples/binaries in this repository.
13
+ - type: textarea
14
+ id: version
15
+ attributes:
16
+ label: Name and Version
17
+ description: Which version of our software is affected? (You can use `--version` to get a version string.)
18
+ placeholder: |
19
+ $./llama-cli --version
20
+ version: 2999 (42b4109e)
21
+ built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
22
+ validations:
23
+ required: true
24
+ - type: dropdown
25
+ id: operating-system
26
+ attributes:
27
+ label: Operating systems
28
+ description: Which operating systems do you know to be affected?
29
+ multiple: true
30
+ options:
31
+ - Linux
32
+ - Mac
33
+ - Windows
34
+ - BSD
35
+ - Other? (Please let us know in description)
36
+ validations:
37
+ required: false
38
+ - type: dropdown
39
+ id: module
40
+ attributes:
41
+ label: Which llama.cpp modules do you know to be affected?
42
+ multiple: true
43
+ options:
44
+ - Documentation/Github
45
+ - libllama (core library)
46
+ - llama-cli
47
+ - llama-server
48
+ - llama-bench
49
+ - llama-quantize
50
+ - Python/Bash scripts
51
+ - Test code
52
+ - Other (Please specify in the next section)
53
+ validations:
54
+ required: false
55
+ - type: textarea
56
+ id: command
57
+ attributes:
58
+ label: Command line
59
+ description: >
60
+ Please provide the exact commands you entered, if applicable. For example: `llama-server -m ... -c ...`, `llama-cli -m ...`, etc.
61
+ This will be automatically formatted into code, so no need for backticks.
62
+ render: shell
63
+ validations:
64
+ required: false
65
+ - type: textarea
66
+ id: info
67
+ attributes:
68
+ label: Problem description & steps to reproduce
69
+ description: >
70
+ Please give us a summary of the problem and tell us how to reproduce it (if applicable).
71
+ validations:
72
+ required: true
73
+ - type: textarea
74
+ id: first_bad_commit
75
+ attributes:
76
+ label: First Bad Commit
77
+ description: >
78
+ If the bug was not present on an earlier version and it's not trivial to track down: when did it start appearing?
79
+ If possible, please do a git bisect and identify the exact commit that introduced the bug.
80
+ validations:
81
+ required: false
82
+ - type: textarea
83
+ id: logs
84
+ attributes:
85
+ label: Relevant log output
86
+ description: >
87
+ If applicable, please copy and paste any relevant log output, including any generated text.
88
+ This will be automatically formatted into code, so no need for backticks.
89
+ render: shell
90
+ validations:
91
+ required: false
llama.cpp/.github/ISSUE_TEMPLATE/020-enhancement.yml ADDED
@@ -0,0 +1,51 @@
1
+ name: Enhancement
2
+ description: Used to request enhancements for llama.cpp.
3
+ title: "Feature Request: "
4
+ labels: ["enhancement"]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: |
9
+ [Please post your idea first in a Discussion if there is not yet a consensus for this enhancement request. This will help to keep this issue tracker focused on enhancements that the community has agreed need to be implemented.](https://github.com/ggml-org/llama.cpp/discussions/categories/ideas)
10
+
11
+ - type: checkboxes
12
+ id: prerequisites
13
+ attributes:
14
+ label: Prerequisites
15
+ description: Please confirm the following before submitting your enhancement request.
16
+ options:
17
+ - label: I am running the latest code. Mention the version if possible as well.
18
+ required: true
19
+ - label: I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
20
+ required: true
21
+ - label: I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
22
+ required: true
23
+ - label: I reviewed the [Discussions](https://github.com/ggml-org/llama.cpp/discussions), and have a new and useful enhancement to share.
24
+ required: true
25
+
26
+ - type: textarea
27
+ id: feature-description
28
+ attributes:
29
+ label: Feature Description
30
+ description: Please provide a detailed written description of what you were trying to do, and what you expected `llama.cpp` to do as an enhancement.
31
+ placeholder: Detailed description of the enhancement
32
+ validations:
33
+ required: true
34
+
35
+ - type: textarea
36
+ id: motivation
37
+ attributes:
38
+ label: Motivation
39
+ description: Please provide a detailed written description of reasons why this feature is necessary and how it is useful to `llama.cpp` users.
40
+ placeholder: Explanation of why this feature is needed and its benefits
41
+ validations:
42
+ required: true
43
+
44
+ - type: textarea
45
+ id: possible-implementation
46
+ attributes:
47
+ label: Possible Implementation
48
+ description: If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.
49
+ placeholder: Detailed description of potential implementation
50
+ validations:
51
+ required: false
llama.cpp/.github/ISSUE_TEMPLATE/030-research.yml ADDED
@@ -0,0 +1,52 @@
1
+ name: Research
2
+ description: Track new technical research area.
3
+ title: "Research: "
4
+ labels: ["research 🔬"]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: |
9
+ Don't forget to check for any [duplicate research issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3A%22research+%F0%9F%94%AC%22)
10
+
11
+ - type: checkboxes
12
+ id: research-stage
13
+ attributes:
14
+ label: Research Stage
15
+ description: Track general state of this research ticket
16
+ options:
17
+ - label: Background Research (Let's try to avoid reinventing the wheel)
18
+ - label: Hypothesis Formed (How do you think this will work and what will its effect be?)
19
+ - label: Strategy / Implementation Forming
20
+ - label: Analysis of results
21
+ - label: Debrief / Documentation (So people in the future can learn from us)
22
+
23
+ - type: textarea
24
+ id: background
25
+ attributes:
26
+ label: Previous existing literature and research
27
+ description: What's the current state of the art and what's the motivation for this research?
28
+
29
+ - type: textarea
30
+ id: hypothesis
31
+ attributes:
32
+ label: Hypothesis
33
+ description: How do you think this will work and what will its effect be?
34
+
35
+ - type: textarea
36
+ id: implementation
37
+ attributes:
38
+ label: Implementation
39
+ description: Got an approach? e.g. a PR ready to go?
40
+
41
+ - type: textarea
42
+ id: analysis
43
+ attributes:
44
+ label: Analysis
45
+ description: How does the proposed implementation behave?
46
+
47
+ - type: textarea
48
+ id: logs
49
+ attributes:
50
+ label: Relevant log output
51
+ description: Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.
52
+ render: shell
llama.cpp/.github/ISSUE_TEMPLATE/040-refactor.yml ADDED
@@ -0,0 +1,28 @@
1
+ name: Refactor (Maintainers)
2
+ description: Used to track refactoring opportunities.
3
+ title: "Refactor: "
4
+ labels: ["refactor"]
5
+ body:
6
+ - type: markdown
7
+ attributes:
8
+ value: |
9
+ Don't forget to [check for existing refactor issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3Arefactoring) in case it's already covered.
10
+ You may also want to check the [pull request refactor label](https://github.com/ggml-org/llama.cpp/pulls?q=is%3Aopen+is%3Apr+label%3Arefactoring) for duplicates.
11
+
12
+ - type: textarea
13
+ id: background-description
14
+ attributes:
15
+ label: Background Description
16
+ description: Please provide a detailed written description of the pain points you are trying to solve.
17
+ placeholder: Detailed description behind your motivation to request refactor
18
+ validations:
19
+ required: true
20
+
21
+ - type: textarea
22
+ id: possible-approaches
23
+ attributes:
24
+ label: Possible Refactor Approaches
25
+ description: If you have some idea of possible approaches to solve this problem. You may want to make it a todo list.
26
+ placeholder: Your idea of possible refactoring opportunity/approaches
27
+ validations:
28
+ required: false
llama.cpp/.github/ISSUE_TEMPLATE/config.yml ADDED
@@ -0,0 +1,11 @@
1
+ blank_issues_enabled: true
2
+ contact_links:
3
+ - name: Got an idea?
4
+ url: https://github.com/ggml-org/llama.cpp/discussions/categories/ideas
5
+ about: Pop it there. It may then become an enhancement ticket.
6
+ - name: Got a question?
7
+ url: https://github.com/ggml-org/llama.cpp/discussions/categories/q-a
8
+ about: Ask a question there!
9
+ - name: Want to contribute?
10
+ url: https://github.com/ggml-org/llama.cpp/wiki/contribute
11
+ about: Head to the contribution guide page of the wiki for areas you can help with
llama.cpp/.github/actions/get-tag-name/action.yml ADDED
@@ -0,0 +1,22 @@
1
+ name: "Determine tag name"
2
+ description: "Determine the tag name to use for a release"
3
+ outputs:
4
+ name:
5
+ description: "The name of the tag"
6
+ value: ${{ steps.tag.outputs.name }}
7
+
8
+ runs:
9
+ using: "composite"
10
+ steps:
11
+ - name: Determine tag name
12
+ id: tag
13
+ shell: bash
14
+ run: |
15
+ BUILD_NUMBER="$(git rev-list --count HEAD)"
16
+ SHORT_HASH="$(git rev-parse --short=7 HEAD)"
17
+ if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
18
+ echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
19
+ else
20
+ SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
21
+ echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
22
+ fi
llama.cpp/.github/actions/windows-setup-cuda/action.yml ADDED
@@ -0,0 +1,67 @@
1
+ name: "Windows - Setup CUDA Toolkit"
2
+ description: "Setup CUDA Toolkit for Windows"
3
+ inputs:
4
+ cuda_version:
5
+ description: "CUDA toolkit version"
6
+ required: true
7
+
8
+ runs:
9
+ using: "composite"
10
+ steps:
11
+ - name: Install Cuda Toolkit 11.7
12
+ if: ${{ inputs.cuda_version == '11.7' }}
13
+ shell: pwsh
14
+ run: |
15
+ mkdir -p "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7"
16
+ choco install unzip -y
17
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cudart/windows-x86_64/cuda_cudart-windows-x86_64-11.7.99-archive.zip"
18
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/windows-x86_64/cuda_nvcc-windows-x86_64-11.7.99-archive.zip"
19
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/windows-x86_64/cuda_nvrtc-windows-x86_64-11.7.99-archive.zip"
20
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/libcublas/windows-x86_64/libcublas-windows-x86_64-11.7.4.6-archive.zip"
21
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvtx/windows-x86_64/cuda_nvtx-windows-x86_64-11.7.91-archive.zip"
22
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/visual_studio_integration/windows-x86_64/visual_studio_integration-windows-x86_64-11.7.91-archive.zip"
23
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvprof/windows-x86_64/cuda_nvprof-windows-x86_64-11.7.101-archive.zip"
24
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cccl/windows-x86_64/cuda_cccl-windows-x86_64-11.7.91-archive.zip"
25
+ unzip '*.zip' -d "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7"
26
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_cudart-windows-x86_64-11.7.99-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
27
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvcc-windows-x86_64-11.7.99-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
28
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvrtc-windows-x86_64-11.7.99-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
29
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libcublas-windows-x86_64-11.7.4.6-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
30
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvtx-windows-x86_64-11.7.91-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
31
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\visual_studio_integration-windows-x86_64-11.7.91-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
32
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_nvprof-windows-x86_64-11.7.101-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
33
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\cuda_cccl-windows-x86_64-11.7.91-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" /E /I /H /Y
34
+ echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
35
+ echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libnvvp" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
36
+ echo "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
37
+ echo "CUDA_PATH_V11_7=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
38
+
39
+ - name: Install Cuda Toolkit 12.4
40
+ if: ${{ inputs.cuda_version == '12.4' }}
41
+ shell: pwsh
42
+ run: |
43
+ mkdir -p "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4"
44
+ choco install unzip -y
45
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cudart/windows-x86_64/cuda_cudart-windows-x86_64-12.4.127-archive.zip"
46
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/windows-x86_64/cuda_nvcc-windows-x86_64-12.4.131-archive.zip"
47
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/windows-x86_64/cuda_nvrtc-windows-x86_64-12.4.127-archive.zip"
48
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/libcublas/windows-x86_64/libcublas-windows-x86_64-12.4.5.8-archive.zip"
49
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvtx/windows-x86_64/cuda_nvtx-windows-x86_64-12.4.127-archive.zip"
50
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_profiler_api/windows-x86_64/cuda_profiler_api-windows-x86_64-12.4.127-archive.zip"
51
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/visual_studio_integration/windows-x86_64/visual_studio_integration-windows-x86_64-12.4.127-archive.zip"
52
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvprof/windows-x86_64/cuda_nvprof-windows-x86_64-12.4.127-archive.zip"
53
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cccl/windows-x86_64/cuda_cccl-windows-x86_64-12.4.127-archive.zip"
54
+ unzip '*.zip' -d "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4"
55
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_cudart-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
56
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvcc-windows-x86_64-12.4.131-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
57
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvrtc-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
58
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\libcublas-windows-x86_64-12.4.5.8-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
59
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvtx-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
60
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_profiler_api-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
61
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\visual_studio_integration-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
62
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_nvprof-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
63
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\cuda_cccl-windows-x86_64-12.4.127-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" /E /I /H /Y
64
+ echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
65
+ echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\libnvvp" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
66
+ echo "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
67
+ echo "CUDA_PATH_V12_4=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
llama.cpp/.github/actions/windows-setup-curl/action.yml ADDED
@@ -0,0 +1,30 @@
+ name: 'Windows - Setup CURL'
+ description: 'Composite action, to be reused in other workflow'
+ inputs:
+   curl_version:
+     description: 'CURL version'
+     required: false
+     default: '8.6.0_6'
+   architecture:
+     description: 'Architecture of the libcurl to download'
+     required: false
+     default: 'win64'
+ outputs:
+   curl_path:
+     description: "Path to the downloaded libcurl"
+     value: ${{ steps.get_libcurl.outputs.curl_path }}
+
+ runs:
+   using: "composite"
+   steps:
+     - name: libCURL
+       id: get_libcurl
+       shell: powershell
+       env:
+         CURL_VERSION: ${{ inputs.curl_version }}
+         ARCHITECTURE: ${{ inputs.architecture }}
+       run: |
+         curl.exe -o $env:RUNNER_TEMP/curl.zip -L "https://curl.se/windows/dl-${env:CURL_VERSION}/curl-${env:CURL_VERSION}-${env:ARCHITECTURE}-mingw.zip"
+         mkdir $env:RUNNER_TEMP/libcurl
+         tar.exe -xvf $env:RUNNER_TEMP/curl.zip --strip-components=1 -C $env:RUNNER_TEMP/libcurl
+         echo "curl_path=$env:RUNNER_TEMP/libcurl" >> $env:GITHUB_OUTPUT
llama.cpp/.github/labeler.yml ADDED
@@ -0,0 +1,94 @@
+ # https://github.com/actions/labeler
+ Apple Metal:
+   - changed-files:
+     - any-glob-to-any-file:
+       - ggml/include/ggml-metal.h
+       - ggml/src/ggml-metal/**
+       - README-metal.md
+ SYCL:
+   - changed-files:
+     - any-glob-to-any-file:
+       - ggml/include/ggml-sycl.h
+       - ggml/src/ggml-sycl/**
+       - docs/backend/SYCL.md
+       - examples/sycl/**
+ Nvidia GPU:
+   - changed-files:
+     - any-glob-to-any-file:
+       - ggml/include/ggml-cuda.h
+       - ggml/src/ggml-cuda/**
+ Vulkan:
+   - changed-files:
+     - any-glob-to-any-file:
+       - ggml/include/ggml-vulkan.h
+       - ggml/src/ggml-vulkan/**
+ documentation:
+   - changed-files:
+     - any-glob-to-any-file:
+       - docs/**
+       - media/**
+ testing:
+   - changed-files:
+     - any-glob-to-any-file:
+       - tests/**
+ build:
+   - changed-files:
+     - any-glob-to-any-file:
+       - cmake/**
+       - CMakeLists.txt
+       - CMakePresets.json
+ examples:
+   - changed-files:
+     - any-glob-to-any-file:
+       - examples/**
+       - tools/**
+ devops:
+   - changed-files:
+     - any-glob-to-any-file:
+       - .devops/**
+       - .github/**
+       - ci/**
+ python:
+   - changed-files:
+     - any-glob-to-any-file:
+       - "**/*.py"
+       - requirements/**
+       - gguf-py/**
+       - .flake8
+ script:
+   - changed-files:
+     - any-glob-to-any-file:
+       - scripts/**
+ android:
+   - changed-files:
+     - any-glob-to-any-file:
+       - examples/llama.android/**
+ server:
+   - changed-files:
+     - any-glob-to-any-file:
+       - tools/server/**
+ ggml:
+   - changed-files:
+     - any-glob-to-any-file:
+       - ggml/**
+ nix:
+   - changed-files:
+     - any-glob-to-any-file:
+       - "**/*.nix"
+       - .github/workflows/nix-*.yml
+       - .devops/nix/nixpkgs-instances.nix
+ embedding:
+   - changed-files:
+     - any-glob-to-any-file: examples/embedding/
+
+ Ascend NPU:
+   - changed-files:
+     - any-glob-to-any-file:
+       - ggml/include/ggml-cann.h
+       - ggml/src/ggml-cann/**
+       - docs/backend/CANN.md
+ OpenCL:
+   - changed-files:
+     - any-glob-to-any-file:
+       - ggml/include/ggml-opencl.h
+       - ggml/src/ggml-opencl/**
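The file above is only the label-to-path mapping; labels are actually applied by a workflow that runs actions/labeler against it. A minimal sketch of such a companion workflow, assuming the standard actions/labeler v5 setup (not part of this upload):

```yaml
# Hypothetical .github/workflows/labeler.yml (illustrative only, not included
# in this commit): actions/labeler v5 reads .github/labeler.yml and applies
# the labels defined above to matching pull requests.
name: "Pull Request Labeler"
on:
  - pull_request_target

jobs:
  labeler:
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/labeler@v5
```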
llama.cpp/.github/pull_request_template.md ADDED
@@ -0,0 +1 @@
+ *Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*