metadata
license: cc-by-nc-sa-4.0
datasets:
- SebastianBodza/Ger_WizardLM_evol_instruct_70k_V0
language:
- de
DElefant:
![](https://huggingface.co/SebastianBodza/DElefant/resolve/main/badge_gerlefant.png)
Model Description:
Full fine-tune of the German BLOOM model (malteos/bloom-6b4-clp-german) on an RTX 3090, using the German translation of the WizardLM evol-instruct dataset.
Roadmap:
If there is sufficient demand, additional adjustments can be made:
- A dataset generated natively in German instead of a translated one
- Full fine-tuning of larger LLMs, e.g. Falcon, StarCoderPlus, ...
How to use:
Prompt-Template:
{instruction}\n\n### Response:
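The template can be filled in with a small helper. This is only a minimal sketch: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names chosen for illustration, not part of the model repository, and stripping surrounding whitespace from the instruction is an assumption rather than a documented requirement.

```python
# Prompt template as given above: instruction, blank line, response marker.
PROMPT_TEMPLATE = "{instruction}\n\n### Response:"


def build_prompt(instruction: str) -> str:
    """Insert a user instruction into the prompt template.

    Surrounding whitespace is stripped (an assumption) so the blank line
    before the response marker stays consistent.
    """
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Wie heißt der Bundeskanzler?"))
```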
Code example for inference:
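from transformers import AutoTokenizer, AutoModelForCausalLM
# Load tokenizer and model; device_map="auto" places the weights on the available device(s)
tokenizer = AutoTokenizer.from_pretrained("SebastianBodza/DElefant")
model = AutoModelForCausalLM.from_pretrained("SebastianBodza/DElefant", device_map="auto")
# Build the prompt from the template above
frage = "Wie heißt der Bundeskanzler?"
prompt = f"{frage}\n\n### Response:"
# Tokenize and move the inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    eos_token_id=tokenizer.eos_token_id,
)
# The decoded text contains the prompt followed by the generated answer
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))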
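Because a causal language model returns the prompt tokens together with the newly generated ones, the decoded string starts with the original question. A minimal sketch for keeping only the answer is shown below; `extract_response` is a hypothetical helper, and it assumes the `### Response:` marker survives decoding unchanged.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker.

    If the marker is missing (an unexpected case), the whole string is
    returned so that no output is silently lost.
    """
    _, marker, answer = decoded.partition(RESPONSE_MARKER)
    return answer.strip() if marker else decoded.strip()


if __name__ == "__main__":
    example = "Wie heißt der Bundeskanzler?\n\n### Response: Beispielantwort"
    print(extract_response(example))
```

Splitting on the first occurrence keeps any later `### Response:` text that the model itself might generate as part of the answer.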
Training:
Training was based on Llama-X, with the adaptations from WizardLM's training script.
deepspeed Llama-X/src/train_freeform.py \
--model_name_or_path malteos/bloom-6b4-clp-german \
--data_path ger_alpaca_evol_instruct_70k_e.json \
--output_dir ./full_finetune \
--num_train_epochs 2 \
--model_max_length 2048 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 400 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--warmup_steps 2 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--report_to "tensorboard" \
--gradient_checkpointing True \
--deepspeed deepspeed.json \
--bf16 True
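The batch-related flags above combine into an effective batch size and an approximate step count. The following is a rough sketch under two assumptions that the command itself does not state: a single GPU (the model description mentions one RTX 3090) and roughly 70,000 training examples (inferred from the dataset name). The real step count depends on the actual dataset size.

```python
import math

# Values taken from the training command above
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_train_epochs = 2
save_steps = 400

# Assumptions, not stated in the command
num_gpus = 1            # assumed: one RTX 3090
num_examples = 70_000   # assumed: approximate size implied by the dataset name

# Number of examples that contribute to one optimizer update
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Optimizer updates per epoch and over the whole run
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Checkpoints written over the run; --save_total_limit 3 keeps only the latest three
checkpoints_written = total_steps // save_steps

print(f"effective batch size: {effective_batch_size}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"total steps:          {total_steps}")
print(f"checkpoints written:  {checkpoints_written}")
```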