---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
license: apache-2.0
library_name: peft
---

# komt-Llama-2-7b-chat-hf-ggml

https://github.com/davidkim205/komt

This repository provides [korean Llama 2 7B-chat](https://huggingface.co/davidkim205/komt-Llama-2-7b-chat-hf) quantized to GGML format (4-bit and other quantization levels) with [llama.cpp](https://github.com/ggerganov/llama.cpp). Because the files use the same GGML format as TheBloke's releases, they work with the libraries and UIs listed below. The following content references [TheBloke/Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML#metas-llama-2-13b-chat-ggml).

GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and libraries and UIs which support this format, such as:

* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a powerful GGML web UI with full GPU acceleration out of the box. Especially good for storytelling.
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with GPU acceleration via the c_transformers backend.
* [LM Studio](https://lmstudio.ai/), a fully featured local GUI. Supports full GPU acceleration on macOS; also runs on Windows, without GPU acceleration.
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most popular web UI. Requires extra steps to enable GPU acceleration via the llama.cpp backend.
* [ctransformers](https://github.com/marella/ctransformers), a Python library with LangChain support and an OpenAI-compatible AI server.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with an OpenAI-compatible API server.
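Of the Python options above, llama-cpp-python is the most direct route. Below is a minimal sketch, not a definitive recipe: it assumes an older llama-cpp-python release that still loads GGML files (recent versions expect GGUF), and the `build_prompt`/`generate` helper names and the local model path are illustrative only.

```python
from pathlib import Path

# Instruction template used by this model (see "Prompt Template" below).
PROMPT_TEMPLATE = "### instruction: {prompt}\n\n### Response:"


def build_prompt(user_text: str) -> str:
    """Wrap a user question in the komt instruction template."""
    return PROMPT_TEMPLATE.format(prompt=user_text)


def generate(model_path: str, question: str) -> str:
    """Run a single completion with llama-cpp-python (GGML-era build)."""
    # Imported here so build_prompt() stays usable without the package installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=512)
    out = llm(build_prompt(question), max_tokens=256, stop=["### instruction:"])
    return out["choices"][0]["text"]


if __name__ == "__main__":
    model = "./models/komt-Llama-2-7b-chat-hf-ggml/ggml-model-q4_0.bin"
    if Path(model).exists():  # only run when the quantized file is present
        print(generate(model, "누전차단기가 내려가는 이유는 무엇입니까?"))
```

The `stop=["### instruction:"]` argument keeps the model from continuing into a new fabricated instruction turn.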
## Model Details

* **Model Developers**: davidkim (changyeon kim)
* **Repository**: https://github.com/davidkim205/komt
* **Quantization methods**: q4_0, q4_1, q5_0, q5_1, q2_k, q3_k, q3_k_m, q3_k_l, q4_k, q4_k_s, q4_k_m, q5_k, q5_k_s, q5_k_m, q8_0

## Prompt Template

```
### instruction: {prompt}

### Response:
```

Example (the prompt asks, in Korean, how long the mandatory interval for a comprehensive periodic vehicle inspection is):

```
### instruction: 자동차 종합(정기)검사 의무기간은 얼마인가요?

### Response:
```

response:

```
### instruction: 자동차 종합(정기)검사 의무기간은 얼마인가요?

### Response:자동차 종합(정기)검사는 2년 1991년 7월 1일에 고시된 '자동차 보험료 조정기준'에서 취리로부터 제정된 기준 상 경량 살수차를 제외한 자동차 모든 승용자동차는 2년마다 필요하다. 이 법은 차량에 관계없이 2년마다 정기검사를 해야한다고 규제했다.
```

## Usage

When using the original [llama.cpp](https://github.com/ggerganov/llama.cpp), wrap the question in the prompt template yourself (the question asks why an earth-leakage circuit breaker trips):

```
make -j && ./main -m ./models/komt-Llama-2-7b-chat-hf-ggml/ggml-model-q4_0.bin -p "### instruction: 누전차단기가 내려가는 이유는 무엇입니까?\n\n### Response:"
```

When using the modified llama.cpp for Korean multi-task (recommended), refer to https://github.com/davidkim205/komt/tree/main/llama.cpp and pass the question directly:

```
make -j && ./main -m ./models/komt-Llama-2-7b-chat-hf-ggml/ggml-model-q4_0.bin -p "누전차단기가 내려가는 이유는 무엇입니까?"
```

response:

```
$ make -j && ./main -m ./models/komt-Llama-2-7b-chat-hf-ggml/ggml-model-q4_0.bin -p "누전차단기가 내려가는 이유는 무엇입니까?"
I llama.cpp build info:
I UNAME_S:  Linux
I UNAME_P:  x86_64
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS
I LDFLAGS:
I CC:       cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
I CXX:      g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

make: Nothing to be done for 'default'.
main: build = 987 (3ebb009)
main: seed  = 1692168046
llama.cpp: loading model from ./models/komt-Llama-2-7b-chat-hf-ggml/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 5504
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 3647.96 MB (+  256.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size =   71.84 MB

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


### instruction: 누전차단기가 내려가는 이유는 무엇입니까?

### Response:누전차단기가 내려가는 이유는 다음과 같습니다:
1. 고장이나 오작동 확인: 누전차단기가 몇 차례 들어오면 고장이 나거나 오작동을 방지하는 데 도움이 됩니다.
2. 누전 사고 피해: 많은 누전차단기가 내려가면 지역에서 일어나는 누전 사고의 영향을 줄이는 것으로 나타났습니다.
3. 안정성: 누전차단기가 내려가면 전반적인 안정성이 향상됩니다.
```
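As the log shows, `./main` echoes the full prompt before streaming the completion, so a script that captures its stdout needs to strip everything up to the final `### Response:` marker. A small helper for that (the function name is illustrative, and taking the text after the *last* marker is an assumption about the output shape seen in the log above):

```python
RESPONSE_MARKER = "### Response:"


def extract_response(raw_output: str) -> str:
    """Return the text after the last '### Response:' marker in llama.cpp stdout."""
    _, sep, tail = raw_output.rpartition(RESPONSE_MARKER)
    if not sep:
        # Marker absent: return the output unchanged rather than guessing.
        return raw_output.strip()
    return tail.strip()


print(extract_response("### instruction: why?\n\n### Response: because."))  # -> because.
```

Using the last marker rather than the first keeps the helper correct even when the model's completion itself quotes the template.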