David Golchinfar (DavidGF)

AI & ML interests

Fine-tuning LLMs; improving German language understanding and the quality of German text generated by LLMs

Posts

Please... feed this Llama some Sauerkraut! 🍲

Said and done. Here it is: our Sauerkraut version of Meta's strong Llama-3-8b, released from HANNOVER MESSE, right in front of the Meta booth.
VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

According to benchmarks (LM-Evaluation-Harness 0.4.2), our #SauerkrautLM dataset and fine-tuning pipeline improved the model noticeably (avg = 74.57), especially its reasoning and common-sense capabilities.
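For reference, a run like this can be reproduced through the harness's Python API. The snippet below is a minimal sketch assuming lm-evaluation-harness 0.4.x; the task list and batch size are illustrative assumptions, not the exact settings behind the reported average.

```python
# Minimal sketch: evaluating the model with lm-evaluation-harness 0.4.x.
# The tasks and batch size are illustrative assumptions, not the exact
# configuration behind the reported avg of 74.57.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande"],  # reasoning / common sense
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```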

Again, we provide some more detail on the whole process (a training sketch follows the list):
✅ Original model: Llama-3-8b-Instruct
✅ Training duration: 12 hours
✅ Training procedure: 2-staged DPO
✅ Training data: 70k samples (first stage) and 20k samples (second stage)
✅ GPUs: 4x RTX 6000 Ada
✅ New model: Llama-3-SauerkrautLM-8b-Instruct
✅ Total training cost: $54.72 💵 - RunPod FTW (excluding data synthesis and curation, benchmarks, error handling, and testing)
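For readers who want to try something similar, here is a minimal sketch of a two-staged DPO run using Hugging Face TRL. The post does not name the training framework, and the file names, hyperparameters, and beta value below are assumptions; only the stage sizes (70k, then 20k preference pairs) come from the list above.

```python
# Minimal sketch of a two-staged DPO run with Hugging Face TRL.
# Framework choice, file names, and hyperparameters are assumptions;
# only the stage sizes (70k, then 20k pairs) come from the post.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Each (hypothetical) JSONL file is assumed to hold the "prompt",
# "chosen", and "rejected" columns that DPOTrainer expects.
for stage, data_file in enumerate(["dpo_stage1.jsonl", "dpo_stage2.jsonl"], start=1):
    train_ds = load_dataset("json", data_files=data_file, split="train")
    args = DPOConfig(
        output_dir=f"sauerkraut-dpo-stage{stage}",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        beta=0.1,  # DPO temperature; the post does not state the value used
    )
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    )
    trainer.train()
    model = trainer.model  # stage 2 continues from the stage-1 policy
```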

See our model card on Hugging Face for more details: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

More benchmark details will follow over the next few days.

"How expensive is it actually to teach a #LanguageModel German through #finetuning ๐Ÿ’ฐ๐Ÿ’ฐ๐Ÿ’ฐ? We get asked this quite often.

There is no one-size-fits-all answer to this question since, among other factors:
ℹ every fine-tune is different,
ℹ the hardware used can be a major cost driver,
ℹ the amount and type of training data can lengthen the process,
ℹ and the skills to be trained can increase the difficulty of the fine-tune.

However, we have broken down the costs incurred for our latest fine-tune (VAGOsolutions/SauerkrautLM-Qwen-32b):


Base model: Qwen/Qwen1.5-32B
Fine-tuning goal: teach the model German
Training dataset size: 160,000 SFT samples / 110,000 DPO pairs
Training duration: 72.5 hours (2 epochs SFT / 1 epoch DPO)
GPUs: 2x A100 SXM
New model: VAGOsolutions/SauerkrautLM-Qwen-32b

Total cost: 312 euros 💶
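As a quick sanity check, the quoted totals imply per-GPU-hour rates in the usual cloud-rental range. The small sketch below derives them; the rates are back-calculated assumptions, not published price lists.

```python
# Back-of-the-envelope check: deriving the implied GPU-hour rate from the
# quoted totals. These rates are back-calculated, not published prices.
def implied_rate(total_cost: float, hours: float, num_gpus: int) -> float:
    return total_cost / (hours * num_gpus)

# SauerkrautLM-Qwen-32b: 312 EUR, 72.5 h on 2x A100 SXM
print(f"{implied_rate(312, 72.5, 2):.2f} EUR per GPU-hour")   # ~2.15

# Llama-3-SauerkrautLM-8b-Instruct: 54.72 USD, 12 h on 4x RTX 6000 Ada
print(f"{implied_rate(54.72, 12, 4):.2f} USD per GPU-hour")   # 1.14
```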

These are quite reasonable training costs considering that the model now speaks passable German (previously it was very broken). Depending on the use case and process requirements, this can even be a real alternative to the costly continued pre-training of foreign-language models.
