---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- bjoernp/tagesschau-2018-2023
language:
- de
- en
metrics:
- accuracy
---

# This model was trained to summarise short texts and generate headlines for newspaper articles



## Model Details

This is the model card of a 🤗 transformers model that has been pushed to the Hub.

- **Developed by:** Kamila Trinkenschuh 
- **Shared by:** Kamila Trinkenschuh 
- **Model type:** causal language model fine-tuned for text generation and text summarization tasks
- **Finetuned from model:** LeoLM/leo-hessianai-7b



## Use
You can use this model to see some examples of how it handles finding headlines for articles. I encourage you to fine-tune it for your own purposes/tasks.



### Training Hardware

This model was fine-tuned on an A100 GPU in Google Colab.



## Bias, Risks, and Limitations

The LLM was trained on a subset of 5,000 samples from the bjoernp/tagesschau-2018-2023 dataset.
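
For reference, a minimal sketch of how such a subset could be drawn with 🤗 Datasets; the split name, seed, and selection logic are assumptions, since the card does not specify them:

```python
from datasets import load_dataset

# Load the German Tagesschau news dataset (2018-2023) used for fine-tuning.
dataset = load_dataset("bjoernp/tagesschau-2018-2023", split="train")

# Draw a 5,000-sample subset after shuffling (seed chosen arbitrarily here).
subset = dataset.shuffle(seed=42).select(range(5000))
```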



## Load model directly

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```
## Use a pipeline as a high-level helper

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Kamilatr/Ueberschriftengenerator_LEOLM", trust_remote_code=True)
```
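
As an illustration, a hedged example of asking the pipeline for a headline; the prompt format and generation settings are assumptions, since the card does not document a prompt template:

```python
article = (
    "Die Bundesregierung hat heute ein neues Klimapaket vorgestellt. "
    "Es sieht unter anderem Investitionen in den Ausbau der Bahn vor."
)

# Assumed prompt format; adjust it to match however you prompt the model.
prompt = f"Artikel: {article}\nÜberschrift:"

result = pipe(prompt, max_new_tokens=30, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```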

### Training Procedure 

The LeoLM model was fine-tuned with LoRA, as sketched below.
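
A minimal sketch of a typical LoRA setup with PEFT; the rank, alpha, dropout, and target modules are assumptions, since the card does not list the adapter configuration:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LeoLM/leo-hessianai-7b", trust_remote_code=True
)

# Hypothetical adapter settings; the values actually used are not documented here.
lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # a common choice for Llama-style models
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```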


#### Training Arguments
```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    optim="paged_adamw_8bit",       # paged 8-bit AdamW, used with QLoRA
    per_device_train_batch_size=4,  # training batch size
    per_device_eval_batch_size=4,   # evaluation batch size
    gradient_accumulation_steps=1,  # steps to accumulate gradients; careful, this changes
                                    # the size of a "step": logging, evaluation, and saving
                                    # happen every gradient_accumulation_steps * xxx_steps
                                    # training examples
    log_level="debug",              # can also be "info", "warning", "error", or "critical"
    save_steps=500,                 # number of steps between checkpoints
    logging_steps=20,               # steps between loss logs; adapt to your dataset size
    learning_rate=4e-5,             # worth trying different values for this hyperparameter
    num_train_epochs=1,
    warmup_steps=100,
    lr_scheduler_type="constant",
)
```
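
Given the trl and sft tags, fine-tuning presumably went through TRL's SFTTrainer. A minimal sketch of the wiring, where train_ds, eval_ds, and the text field name are assumptions:

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                # base model with LoRA adapters attached
    tokenizer=tokenizer,
    args=training_arguments,
    train_dataset=train_ds,     # assumed training split
    eval_dataset=eval_ds,       # assumed evaluation split
    peft_config=lora_config,
    dataset_text_field="text",  # assumed column holding article plus headline
    max_seq_length=512,         # assumed maximum sample length
)
trainer.train()
```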


## Evaluation and Testing

From the 5,000-sample subset, 1,500 randomly assigned examples were used for evaluation and 3,500 for testing. The whole fine-tuning process took less than 30 minutes (on Colab's A100 GPU, available only with Colab Pro+).
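
Continuing from the dataset sketch above, one way to produce such a split with 🤗 Datasets; the seed and method are assumptions:

```python
# Split the 5,000-sample subset into 3,500 / 1,500 (seed chosen arbitrarily here).
splits = subset.train_test_split(test_size=1500, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```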



### Results

- Epoch: 1
- Training Loss: 1.866900
- Validation Loss: 1.801998

#### Summary

You can see the code in my GitHub repo: https://github.com/KamilaTrinkenschuh/Ueberschriftengenerator_LEOLM