---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- bjoernp/tagesschau-2018-2023
language:
- de
- en
metrics:
- accuracy
---

# Ueberschriftengenerator_LEOLM: summarising short texts and generating headlines for news articles

## Model Details

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Developed by:** Kamila Trinkenschuh
- **Shared by:** Kamila Trinkenschuh
- **Model type:** causal language model, fine-tuned for text generation and text summarisation tasks
- **Finetuned from model:** LeoLM/leo-hessianai-7b

## Use

You can use this model to generate headlines for news articles and to see how it handles that task. You are encouraged to fine-tune it further for your own purposes and tasks.

### Training Hardware

This model was fine-tuned on an A100 GPU in Google Colab.

## Bias, Risks, and Limitations

The LLM was trained on a subset of 5,000 samples of the bjoernp/tagesschau-2018-2023 dataset, so its behaviour reflects German news articles from that source and period.

## Load model directly

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```

## Use a pipeline as a high-level helper

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```

### Training Procedure

The LeoLM model was fine-tuned with LoRA; the paged 8-bit optimizer below is the one typically used with QLoRA.

#### Training Hyperparameters

```python
training_arguments = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    optim="paged_adamw_8bit",        # paged 8-bit AdamW, used with QLoRA
    per_device_train_batch_size=4,   # training batch size per device
    per_device_eval_batch_size=4,    # evaluation batch size per device
    gradient_accumulation_steps=1,   # steps to accumulate gradients over; careful, this changes the size of a
                                     # "step", so logging, evaluation and saving happen every
                                     # gradient_accumulation_steps * xxx_steps training steps
    log_level="debug",               # can also be set to "info", "warning", "error" or "critical"
    save_steps=500,                  # number of steps between checkpoints
    logging_steps=20,                # number of steps between loss logs; adapt it to your dataset size
    learning_rate=4e-5,              # you can try different values for this hyperparameter
    num_train_epochs=1,
    warmup_steps=100,
    lr_scheduler_type="constant",
)
```

## Evaluation and Testing

From the 5,000-sample subset, 3,500 samples were used for training and 1,500 randomly assigned samples for evaluation. The whole fine-tuning process took less than 30 minutes on Colab's A100 GPU (accessible only with Colab Pro+).

### Results

- Epoch: 1
- Training Loss: 1.866900
- Validation Loss: 1.801998

#### Summary

You can see the full code in my GitHub repo: https://github.com/KamilaTrinkenschuh/Ueberschriftengenerator_LEOLM
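
## Example: Generating a Headline

Below is a minimal sketch of how the pipeline loaded above could be used to produce a headline for a short German article. The prompt template, the example article, and the generation settings are assumptions; the exact prompt format used during fine-tuning is not documented in this card, so adapt them to your data.

```python
from transformers import pipeline

# Load the fine-tuned model as a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="Kamilatr/Ueberschriftengenerator_LEOLM",
    trust_remote_code=True,
)

# Hypothetical prompt template: adjust it to match the format you fine-tune or evaluate with.
article = (
    "Die Bundesregierung hat heute ein neues Klimapaket vorgestellt, "
    "das unter anderem den Ausbau erneuerbarer Energien beschleunigen soll."
)
prompt = f"Artikel: {article}\nÜberschrift:"

# Generate a short continuation and keep only the newly generated text.
output = pipe(prompt, max_new_tokens=30, do_sample=False, return_full_text=False)
print(output[0]["generated_text"].strip())
```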
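
## Illustrative LoRA Setup

The card states that the model was fine-tuned with LoRA using the `TrainingArguments` above, but the adapter configuration itself is not shown. The sketch below illustrates how a PEFT LoRA setup for a Llama-style causal LM typically looks; the rank, scaling, dropout, and target modules are assumed values, not the exact configuration used for this model.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model this adapter was fine-tuned from, according to the card.
base = AutoModelForCausalLM.from_pretrained("LeoLM/leo-hessianai-7b", trust_remote_code=True)

# Assumed LoRA hyperparameters: the actual values used for this model are not documented here.
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice for Llama-style models
    bias="none",
    task_type="CAUSAL_LM",
)

# Wrap the base model so that only the small LoRA adapter weights are trained.
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```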
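
## Illustrative Data Split

For context on the 3,500/1,500 split described above, this is a minimal sketch of how a 5,000-sample subset of bjoernp/tagesschau-2018-2023 could be split with the 🤗 datasets library. The split name, shuffle seed, and subset selection are assumptions, not the exact procedure used for this model.

```python
from datasets import load_dataset

# Load the news dataset used for fine-tuning.
dataset = load_dataset("bjoernp/tagesschau-2018-2023", split="train")

# Take a 5,000-sample subset and split it into 3,500 training and 1,500 evaluation samples.
# The shuffle seed is an assumption; the card does not document the exact split procedure.
subset = dataset.shuffle(seed=42).select(range(5000))
splits = subset.train_test_split(test_size=1500, seed=42)
train_data, eval_data = splits["train"], splits["test"]

print(len(train_data), len(eval_data))  # 3500 1500
```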