Commit 9ace186 by munish0838 (parent f6b0000): Create README.md
---
library_name: transformers
license: apache-2.0
base_model: abacaj/llama-161M-100B
pipeline_tag: text-generation
---

# QuantFactory/llama-161M-100B-GGUF

This is a quantized version of [abacaj/llama-161M-100B](https://huggingface.co/abacaj/llama-161M-100B), created using llama.cpp.
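A typical way to use a GGUF file like this one with llama.cpp (a sketch; the exact quant filename below is an assumption, so check the repository's file list for the variants actually published):

```shell
# Download one quantized file from the repo
# (the .gguf filename here is an assumption; check the repo's file list).
huggingface-cli download QuantFactory/llama-161M-100B-GGUF \
  llama-161M-100B.Q4_K_M.gguf --local-dir .

# Generate with llama.cpp's CLI; --temp 0 gives greedy decoding,
# matching how the benchmark scores below were measured.
./llama-cli -m llama-161M-100B.Q4_K_M.gguf -p "def fib(n):" -n 64 --temp 0
```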

# Model Description

Trained on 100B tokens.
- 1e-3 learning rate
- 0.1 weight decay
- WSD scheduler with 10% decay
- 80% code, 10% natural language, 10% instruction data
- Dataset decontaminated against popular benchmarks following [bigcode](https://github.com/bigcode-project/bigcode-dataset/tree/main/decontamination)
- 8x RTX 3090s, ~110 hours

This is a *base* pretrained model and requires further fine-tuning to be useful.
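For fine-tuning or full-precision evaluation, the upstream (non-quantized) checkpoint can be loaded with transformers roughly like this (a sketch; it downloads the base repo from the Hub):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the original base model this GGUF was quantized from.
tokenizer = AutoTokenizer.from_pretrained("abacaj/llama-161M-100B")
model = AutoModelForCausalLM.from_pretrained("abacaj/llama-161M-100B")

# Greedy generation (do_sample=False) from a code-style prompt,
# since the model was trained mostly on code.
inputs = tokenizer("def fib(n):", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```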

## Model Details

| [openai/openai_humaneval](https://huggingface.co/datasets/openai/openai_humaneval) (greedy) | [mbpp](https://huggingface.co/datasets/google-research-datasets/mbpp) (greedy) |
| :------------------ | :------------- |
| 9.2% | 9.8% |
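Both scores above were measured with greedy decoding: at each step the single highest-scoring token is taken, so generation is deterministic. A minimal sketch of the idea (plain Python, not the evaluation harness actually used):

```python
# Greedy decoding sketch: always pick the argmax token.
def greedy_step(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)

def greedy_decode(step_fn, prompt, max_new_tokens):
    """Repeatedly append the argmax token produced by step_fn(tokens).

    step_fn maps the current token list to a logits list; in a real
    harness it would be a forward pass through the model.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(greedy_step(step_fn(tokens)))
    return tokens
```

Because there is no sampling, a greedy pass@1 score is reproducible run to run, which is why model cards often report it.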