# QuantFactory/llama-161M-100B-GGUF
This is a quantized (GGUF) version of abacaj/llama-161M-100B, created using llama.cpp.
## Model Description
Trained on 100B tokens with the following configuration:
- Learning rate: 1e-3
- Weight decay: 0.1
- WSD (warmup-stable-decay) scheduler with 10% decay
- Data mix: 80% code, 10% natural language, 10% instruction data
- Dataset decontaminated against popular benchmarks, following BigCode
- Trained on 8x RTX 3090s for ~110 hours
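The WSD schedule above can be sketched as follows. The 1e-3 peak learning rate and 10% decay fraction come from this card; the warmup fraction, minimum learning rate, and linear shapes are illustrative assumptions, not details of the actual training run.

```python
def wsd_lr(step, total_steps, peak_lr=1e-3, warmup_frac=0.01,
           decay_frac=0.10, min_lr=0.0):
    """Warmup-Stable-Decay: linear warmup, constant plateau, then
    decay over the final `decay_frac` of training steps."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:
        # Linear warmup from ~0 up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable plateau at the peak learning rate.
        return peak_lr
    # Linear decay from peak_lr down to min_lr over the last segment.
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return peak_lr + (min_lr - peak_lr) * progress
```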
This is a base pretrained model and requires further fine-tuning to be useful.
## Model Details
| openai/openai_humaneval (greedy) | mbpp (greedy) |
|---|---|
| 9.2% | 9.8% |
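These figures appear to be pass@1 scores under greedy decoding, i.e. the percentage of problems whose single greedy completion passes the benchmark's tests. A minimal sketch of that metric (the function name is illustrative, not part of either benchmark harness):

```python
def pass_at_1_greedy(passed):
    """Pass@1 under greedy decoding: one sample per problem,
    `passed` is a list of booleans (did that sample pass the tests?).
    Returns the score as a percentage."""
    return 100.0 * sum(passed) / len(passed)
```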
## Model tree for QuantFactory/llama-161M-100B-GGUF
Base model: abacaj/llama-161M-100B