---
license: apache-2.0
datasets:
- smcleod/golang-coder
- smcleod/golang-programming-style-best-practices
- ExAi/Code-Golang-QA-2k
- google/code_x_glue_ct_code_to_text
- semeru/code-text-go
language:
- en
tags:
- golang
- code
- go
- programming
- llama
- text-generation-inference
---
# Llama 3.1 8b Golang Coder v3
This model has been fine-tuned on Golang style guides, best practices and code examples, which should (hopefully) make it quite capable at Golang coding tasks.
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/630fff3f02ce39336c495fe9/5R1WZ9hvqX4XTKws-FaJ3.jpeg)
## LoRA
- [FP16](https://huggingface.co/smcleod/llama-3-1-8b-smcleod-golang-coder-v3/tree/main/smcleod-golang-coder-v3-lora-fp16)
- [BF16](https://huggingface.co/smcleod/llama-3-1-8b-smcleod-golang-coder-v3/tree/main/smcleod-golang-coder-v3-lora-bf16)
## GGUF
- [Q8_0 (with f16 embeddings)](https://huggingface.co/smcleod/llama-3-1-8b-smcleod-golang-coder-v3/blob/main/llama-3-1-8b-smcleod-golang-coder-v2.etf16-Q8_0.gguf)
## Ollama
- https://ollama.com/sammcj/llama-3-1-8b-smcleod-golang-coder-v3
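As a minimal sketch, the Go program below sends a single prompt to a locally running Ollama server through its `/api/generate` REST endpoint. It assumes Ollama is listening on its default address (`http://localhost:11434`) and that the model has already been pulled under the name `sammcj/llama-3-1-8b-smcleod-golang-coder-v3` (for example with `ollama pull sammcj/llama-3-1-8b-smcleod-golang-coder-v3`). The helpers `buildRequestBody`, `parseResponse` and `generate` are hypothetical names for illustration, not part of any official Ollama client.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// generateRequest holds the subset of Ollama's /api/generate request fields used here.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

// generateResponse holds the only field we read from a non-streaming reply.
type generateResponse struct {
	Response string `json:"response"`
}

// buildRequestBody encodes a non-streaming generate request as JSON.
func buildRequestBody(model, prompt string) ([]byte, error) {
	return json.Marshal(generateRequest{Model: model, Prompt: prompt, Stream: false})
}

// parseResponse extracts the generated text from a /api/generate JSON reply.
func parseResponse(body []byte) (string, error) {
	var r generateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	return r.Response, nil
}

// generate POSTs one prompt to the Ollama server at baseURL and returns the completion.
func generate(ctx context.Context, baseURL, model, prompt string) (string, error) {
	body, err := buildRequestBody(model, prompt)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/generate", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama returned status %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseResponse(data)
}

func main() {
	// Generation on a local machine can be slow, so allow a generous timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	out, err := generate(ctx, "http://localhost:11434",
		"sammcj/llama-3-1-8b-smcleod-golang-coder-v3",
		"Write an idiomatic Go function that reverses a slice of ints.")
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Setting `stream` to `false` makes Ollama return the whole completion as one JSON object, which keeps the decoding simple; for interactive use you would leave streaming on and decode the newline-delimited JSON chunks as they arrive.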
## Training
I trained this model (based on Llama 3.1 8B) on a merged dataset I created, consisting of 50,627 rows, 13.3M input tokens and 2.2M output tokens.
Training used 45,565 of those items (roughly 90% of the rows), totalling 1,020,719 input tokens and 445,810 output tokens.
The dataset combines several cleaned and merged Golang/programming-focused datasets with my own synthetically generated dataset, based on several open-source Golang coding guides:
- https://huggingface.co/datasets/smcleod/golang-coder
- https://huggingface.co/datasets/smcleod/golang-programming-style-best-practices