StarFox7/Llama-2-ko-7B-chat-ggml

StarFox7 committed on Aug 6, 2023

Commit 153758f • 1 Parent(s): 844596b

Update README.md

Files changed (1)

README.md +1 -0

@@ -10,6 +10,7 @@ Llama-2-ko-7V-chat-ggml 은 [beomi/llama-2-ko-7b](https://huggingface.co/beomi/l
 - Generated by applying the Korean additional tokens used in [kfkas/Llama-2-ko-7b-Chat](https://huggingface.co/kfkas/Llama-2-ko-7b-Chat) to the Llama2 tokenizer.
 - **GGML** format models run inference in C/C++ using [llama.cpp](https://github.com/ggerganov/llama.cpp).
+- **GGML** format models can run inference even on relatively low-spec computing resources (e.g., a 4-bit quantized (q4) model can run inference on a CPU with 7-8 GB of RAM).
 - With [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), the Python binding package for [llama.cpp](https://github.com/ggerganov/llama.cpp), inference is also possible in a Python environment.
 For reference, the **GGML** format model of [beomi/llama-2-ko-7b](https://huggingface.co/beomi/llama-2-ko-7b), the base model of [Llama-2-ko-7b-chat](https://huggingface.co/kfkas/Llama-2-ko-7b-Chat), can be found at [Llama-2-ko-7B-ggml](https://huggingface.co/StarFox7/Llama-2-ko-7B-ggml).
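
The 7-8 GB RAM figure in the added line can be sanity-checked with a rough estimate. The sketch below is an illustration and not part of the README. It assumes about 7 billion parameters, the q4_0 block layout for every weight (18 bytes per 32 weights), and the standard Llama-2 7B shape (32 layers, hidden size 4096) with an fp16 KV cache. The function names are hypothetical, and other q4 variants or mixed-precision tensors would shift the numbers somewhat.

```python
# Back-of-envelope RAM estimate for a 4-bit (q4_0) GGML model on CPU.
# Assumptions (not from the README): ~7e9 parameters, all weights stored as
# q4_0, Llama-2 7B shape (32 layers, hidden size 4096), fp16 KV cache.

GIB = 1024 ** 3


def q4_0_weight_bytes(n_params: int) -> float:
    """Bytes needed for the weights under q4_0.

    q4_0 packs each block of 32 weights into 16 bytes of 4-bit values
    plus one 2-byte fp16 scale: 18 bytes per 32 weights (4.5 bits each).
    """
    return n_params * 18 / 32


def kv_cache_bytes(n_ctx: int, n_layers: int = 32, d_model: int = 4096,
                   bytes_per_value: int = 2) -> int:
    """Bytes for the KV cache: one key and one value vector of size
    d_model per layer per token, stored at bytes_per_value each."""
    return 2 * n_layers * d_model * bytes_per_value * n_ctx


def estimate_gib(n_params: int, n_ctx: int) -> float:
    """Weights plus KV cache, in GiB. Ignores scratch buffers and the OS."""
    return (q4_0_weight_bytes(n_params) + kv_cache_bytes(n_ctx)) / GIB


if __name__ == "__main__":
    n_params = 7_000_000_000  # assumed; the extended vocabulary adds a little
    for n_ctx in (512, 2048):
        print(f"n_ctx={n_ctx}: ~{estimate_gib(n_params, n_ctx):.2f} GiB")
```

Under these assumptions, the weights plus a 2048-token cache come to under 5 GiB, which leaves headroom for the operating system and runtime buffers on a machine with 7-8 GB of RAM.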