initial release of minillama

- .gitattributes +1 -0
- README.md +56 -0
- minillama.gguf +3 -0
- training.txt +1 -0
.gitattributes
CHANGED
@@ -1,6 +1,7 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,59 @@
---
inference: true
language:
- en
license: mit
model_creator: Mads Havmand
model_name: minillama
model_type: llama
quantized_by: Havmand
tags:
- llama
- test
- development
---

# minillama

- Model creator: [Mads Havmand](https://huggingface.co/Havmand)

## Description

minillama is a minimal Large Language Model using the Llama architecture and distributed in the GGUF format.
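
As a quick sanity check of the format, you can inspect the GGUF header. The sketch below is illustrative only; it assumes the documented GGUF layout (the four magic bytes `GGUF` followed by a little-endian uint32 version) and that the file sits in the working directory:

```python
import struct

# Read the GGUF magic and version from the file header.
with open("minillama.gguf", "rb") as f:
    magic = f.read(4)
    version = struct.unpack("<I", f.read(4))[0]

assert magic == b"GGUF"
print("GGUF version:", version)
```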

The purpose of the model is to be small while still technically qualifying as a model that llama.cpp can load without error.
I originally created this model because I needed a small model for my unit tests of Python code that used llama-cpp-python.

The model __can technically__ be used for inference, but the output produced is as close to useless as you can get.
Throughput is nice though, at around 1,000 tokens per second on an Apple M2 Pro.
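
A unit-test-style smoke test with llama-cpp-python might look like the sketch below (illustrative only; it assumes `minillama.gguf` has been downloaded to the working directory):

```python
from llama_cpp import Llama

# Loading a 3.26 MiB model is near-instant, which is the whole point.
llm = Llama(model_path="minillama.gguf", verbose=False)

# The completion text is effectively noise, but the call exercises
# model loading, tokenization, and sampling end to end.
out = llm("Hello", max_tokens=8)
print(out["choices"][0]["text"])
```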

To reduce file size, the model is quantized using Q2_K.

The model contains 4.26 million parameters and is 3.26 MiB. Nearly all of those parameters sit in the token embedding and output matrices (2 × 32,000 vocabulary entries × 64 embedding dimensions ≈ 4.1 million); the single transformer layer accounts for the remainder.

As for the vocabulary, the model uses the llama vocabulary provided by [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/97c1549808d2742d37584a3c9df28154bdf34417/models/ggml-vocab-llama.gguf) (SHA512: `38a5acf305050422882044df0acc97e5ae992ed19b2838b3b58ebbbb1f61c59bfc12a6f686a724aed32227045806e4dd46aadf9822155d1169455fa56d38fbc2`).
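
To confirm that a local copy matches, you can compare digests. A small sketch (the path assumes a llama.cpp checkout):

```python
import hashlib

# Hash the vocabulary file in chunks and compare the hex digest
# with the SHA512 value quoted above.
h = hashlib.sha512()
with open("models/ggml-vocab-llama.gguf", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
print(h.hexdigest())
```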

The training corpus consists of a space and a newline:

```hexdump
00000000  20 0a  | .|
00000002
```
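
The corpus can be recreated exactly; a two-line sketch:

```python
# Write 0x20 (space) followed by 0x0a (newline), matching the hexdump above.
with open("training.txt", "wb") as f:
    f.write(b" \n")
```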

Finally, the model was built using llama.cpp's `train-text-from-scratch` (from commit [97c1549808d2742d37584a3c9df28154bdf34417](https://github.com/ggerganov/llama.cpp/tree/97c1549808d2742d37584a3c9df28154bdf34417)). The command used was:

```sh
./train-text-from-scratch \
    --vocab-model models/ggml-vocab-llama.gguf \
    --ctx 1 --embd 64 --head 1 --layer 1 \
    --checkpoint-in chk-minillama-LATEST.gguf \
    --checkpoint-out chk-minillama-ITERATION.gguf \
    --model-out ggml-minillama-f32-ITERATION.gguf \
    --train-data "training.txt" \
    -t 6 -b 16 --seed 1 --adam-iter 1 \
    --no-checkpointing
```

Quantization happened using `./quantize ggml-minillama-f32-LATEST.gguf 10`, where `10` is llama.cpp's ftype identifier for Q2_K.

These files were quantized using hardware kindly provided by me.
minillama.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b75b6525e450d261d47552b9ba1ddda669889371526082cde7bb6a2d114efc3b
+size 4139456
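
The file is stored as a Git LFS pointer, so the weights themselves come from the LFS store. A hedged sketch for fetching the model with huggingface_hub and verifying it against the pointer's oid and size (the repo id `Havmand/minillama` is an assumption inferred from the model card fields):

```python
import hashlib
from huggingface_hub import hf_hub_download

# Repo id is an assumption based on the model card's creator/name fields.
path = hf_hub_download(repo_id="Havmand/minillama", filename="minillama.gguf")

with open(path, "rb") as f:
    data = f.read()

assert len(data) == 4139456  # size from the LFS pointer above
assert hashlib.sha256(data).hexdigest() == (
    "b75b6525e450d261d47552b9ba1ddda669889371526082cde7bb6a2d114efc3b"
)
```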
training.txt
ADDED
@@ -0,0 +1 @@
+ 