
This model is a straightforward copy of the original 3B-parameter model, provided only in the following formats (thanks to Green-Sky for providing similar prior work):

  • HF-to-GGUF converted model in f16 precision -> model_f16.gguf
    • It was converted using llama.cpp at this specific commit.
    • Command: python3 path_to_llama_cpp/convert_hf_to_gguf.py path_to_model --outfile ./model_f16.gguf --outtype f16
  • Quantized model (GGUF) in Q1_3 format
    • Quantization was done via llama-quantize at that same commit.
  • Quantized model (GGUF) in Q2_2 format -> model_quant_Q2_2.gguf
    • Quantization was done via llama-quantize at that same commit (the full pipeline is sketched after this list).
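
For reference, the full conversion and quantization pipeline looks roughly like the sketch below. This is a minimal sketch rather than the exact commands used: path_to_llama_cpp and path_to_model are placeholders, it assumes a llama.cpp build from the commit above (where the experimental Q1_3 and Q2_2 quantization types are available), and the Q1_3 output filename is assumed by analogy with the Q2_2 file shown in the log further down.

# Convert the HF checkpoint to GGUF at f16 precision
python3 path_to_llama_cpp/convert_hf_to_gguf.py path_to_model --outfile ./model_f16.gguf --outtype f16

# Quantize the f16 GGUF into the two ternary formats
path_to_llama_cpp/llama-quantize ./model_f16.gguf ./model_quant_Q1_3.gguf Q1_3
path_to_llama_cpp/llama-quantize ./model_f16.gguf ./model_quant_Q2_2.gguf Q2_2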

Please keep in mind that if you want to test this model through llama-cli on Metal (e.g., a MacBook Pro with an M3 Pro, as I did), you will need to pass the --n-gpu-layers 0 flag; otherwise, the following error occurs:

/Users/basavyr/Repos/external/llama.cpp/llama-cli -m model_quant_Q2_2.gguf -p "hey there"
Log start
main: build = 3505 (45719a24)
main: built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0
main: seed  = 1724230525
llama_model_loader: loaded meta data with 30 key-value pairs and 470 tensors from model_quant_Q2_2.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.

.........................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Pro
ggml_metal_init: picking default device: Apple M3 Pro
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M3 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 12884.92 MB
llama_kv_cache_init:      Metal KV buffer size =   650.00 MiB
llama_new_context_with_model: KV self size  =  650.00 MiB, K (f16):  325.00 MiB, V (f16):  325.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.12 MiB
llama_new_context_with_model:      Metal compute buffer size =   157.00 MiB
llama_new_context_with_model:        CPU compute buffer size =    62.50 MiB
llama_new_context_with_model: graph nodes  = 1124
llama_new_context_with_model: graph splits = 3
ggml/src/ggml-metal.m:1612: MUL MAT-MAT not implemented
ggml/src/ggml-metal.m:1612: MUL MAT-MAT not implemented
[1]    26436 abort      /Users/basavyr/Repos/external/llama.cpp/llama-cli -m model_quant_Q2_2.gguf -p
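
With GPU offload disabled via that flag, the same invocation runs on the CPU backend (the command from above with --n-gpu-layers 0 appended):

/Users/basavyr/Repos/external/llama.cpp/llama-cli -m model_quant_Q2_2.gguf -p "hey there" --n-gpu-layers 0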