---
library_name: peft
tags:
- dpo
base_model: SGaleshchuk/Llama-2-13b-hf_uk_rank-32_ft
model-index:
- name: Llama-2-13b-summarization_uk_dpo
results: []
license: apache-2.0
datasets:
- SGaleshchuk/XL_SUM_ukr_synthetic_hallucinations
- csebuetnlp/xlsum
language:
- uk
metrics:
- rouge
pipeline_tag: summarization
---
# Llama-2-13b-summarization_uk_dpo
This model is a fine-tuned version of [SGaleshchuk/Llama-2-13b-hf_uk_rank-32_ft](https://huggingface.co/SGaleshchuk/Llama-2-13b-hf_uk_rank-32_ft), further aligned with DPO for Ukrainian news summarization using the Ukrainian portion of XL-Sum and the synthetic-hallucination dataset listed above.
## Set-up step description
* Fine-tune the Llama-2 model on the training data.
* Generate summaries on the validation set with the fine-tuned Llama-2 model.
* Corrupt the generated summaries by injecting information not present in the input text.
* Align the fine-tuned Llama-2 with DPO, treating the gold summaries as chosen responses and the corrupted synthetic summaries as rejected ones (see the sketch below).
* Apply both the fine-tuned and the aligned versions to the test set.
* Assess the level of faithfulness hallucinations in the generated texts with GPT-4 and ROUGE-L, plus human evaluation on a small subset.
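The alignment step above is standard DPO preference training: for each article, the gold XL-Sum summary is the chosen response and the corrupted synthetic summary is the rejected one. A minimal sketch of building such preference pairs is shown below; the column names (`text`, `summary`, `hallucinated_summary`) are assumptions about the dataset schema, not the published training code.
```python
# Minimal sketch of building DPO preference pairs (step 4). Column names are
# assumptions about the SGaleshchuk/XL_SUM_ukr_synthetic_hallucinations schema.
from datasets import load_dataset

raw = load_dataset("SGaleshchuk/XL_SUM_ukr_synthetic_hallucinations", split="train")

def to_preference_pair(example):
    return {
        # Same prompt template as in the inference snippet below.
        "prompt": f"The article to summarize in maximum 100 words:{example['text']}. Summary:",
        "chosen": example["summary"],                 # gold reference summary
        "rejected": example["hallucinated_summary"],  # corrupted synthetic summary
    }

pairs = raw.map(to_preference_pair)
```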
## Intended uses & limitations
The model is intended for abstractive summarization of Ukrainian-language news articles. Example usage (tested on Colab with an A100 GPU):
```python
# Tested on Colab with an A100 GPU.
!pip install -q -U peft transformers==4.38.2
!pip install flash-attn --no-build-isolation
!pip install einops bitsandbytes accelerate

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "SGaleshchuk/Llama-2-13b-summarization_uk_dpo"

# Load the PEFT adapter together with its base model, quantized to 4 bit.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def prepare_instruction(text):
    # Adapt the prompt template to your needs.
    prompt = """The article to summarize in maximum 100 words:{text}. Summary:"""
    return prompt.format(text=text)

def summarization(text):
    instruction = prepare_instruction(text)
    input_ids = tokenizer(instruction, return_tensors="pt", truncation=True).input_ids.cuda()
    with torch.inference_mode():
        outputs = model.generate(
            input_ids=input_ids,
            max_new_tokens=128,
            do_sample=True,
            top_p=0.9,
            temperature=1e-2,
        )
    # Decode and strip the prompt so only the generated summary remains.
    result = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
    result = result[len(instruction):]
    print(result)
    return result

text = """your text here to summarize"""
result = summarization(text)
```
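Faithfulness was assessed with GPT-4, ROUGE-L, and a small human evaluation (see the set-up steps above). A minimal sketch of the ROUGE-L part using the `evaluate` library follows; the library choice and the placeholder texts are assumptions, not the exact evaluation code.
```python
# Minimal sketch: scoring generated summaries against gold references with
# ROUGE-L via the `evaluate` library (requires `pip install evaluate rouge_score`).
import evaluate

rouge = evaluate.load("rouge")

predictions = ["generated summary one", "generated summary two"]  # e.g. outputs of summarization()
references = ["reference summary one", "reference summary two"]   # gold XL-Sum summaries

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rougeL"])
```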
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 2e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 10
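For illustration, the hyperparameters above can be expressed as a `trl` `DPOConfig` and wired into a `DPOTrainer` together with the preference pairs sketched earlier. This is a hedged sketch of a plausible setup, not the published training script; depending on the `trl` version, the tokenizer may need to be passed as `processing_class` instead of `tokenizer`.
```python
# Hedged sketch: the hyperparameters above as a trl DPOConfig, with the SFT
# checkpoint as the policy model and the `pairs` dataset from the earlier sketch.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_id = "SGaleshchuk/Llama-2-13b-hf_uk_rank-32_ft"
policy = AutoPeftModelForCausalLM.from_pretrained(sft_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(sft_id)

args = DPOConfig(
    output_dir="Llama-2-13b-summarization_uk_dpo",
    learning_rate=2e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="cosine",
    num_train_epochs=10,
)

trainer = DPOTrainer(
    model=policy,
    args=args,
    train_dataset=pairs,  # preference pairs built in the earlier sketch
    tokenizer=tokenizer,
)
trainer.train()
```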
### Training results
### Framework versions
- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.19.1
- Tokenizers 0.15.2