philschmid (HF staff) committed
Commit 1c52a90
Parent(s): 173fa22

Update README.md

Files changed (1): README.md (+71, -1)

README.md CHANGED
@@ -12,4 +12,74 @@ tags:
- flan
---

# FLAN-T5-XXL LoRA fine-tuned on `samsum`

PEFT-tuned FLAN-T5 XXL model.

# flan-t5-xxl-samsum

This model is a fine-tuned version of [philschmid/flan-t5-xxl-sharded-fp16](https://huggingface.co/philschmid/flan-t5-xxl-sharded-fp16) on the samsum dataset.
It achieves the following results on the evaluation set (an evaluation sketch follows this list):
- Loss: 1.3716
- Rouge1: 47.2358
- Rouge2: 23.5135
- Rougel: 39.6266
- Rougelsum: 43.3458
- Gen Len: 17.3907
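
The ROUGE scores above are reported on the usual 0-100 scale. As a rough, non-authoritative sketch of how comparable numbers could be computed (not the exact evaluation script behind this card), the snippet below scores generated summaries against a small slice of the samsum test split with the 🤗 `evaluate` ROUGE metric. The `summarize:` prefix, the generation settings, and the 50-example subset are illustrative assumptions; `model` and `tokenizer` are assumed to be loaded as shown in the "How to use the model" section below.

```python
# Hedged evaluation sketch; assumes `model` and `tokenizer` from the loading snippet below.
import evaluate
from datasets import load_dataset

rouge = evaluate.load("rouge")
test_data = load_dataset("samsum", split="test")

predictions, references = [], []
for sample in test_data.select(range(50)):  # small subset to keep the sketch cheap
    prompt = f"summarize: {sample['dialogue']}"  # assumed input format
    input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
    outputs = model.generate(input_ids=input_ids, max_new_tokens=50, do_sample=False)
    predictions.append(tokenizer.decode(outputs[0].detach().cpu(), skip_special_tokens=True))
    references.append(sample["summary"])

# `evaluate` returns ROUGE as fractions in [0, 1]; scale to the 0-100 convention used above.
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({name: round(value * 100, 4) for name, value in scores.items()})
```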

## How to use the model

The model was trained using 🤗 [PEFT](https://github.com/huggingface/peft). This repository only contains the fine-tuned LoRA adapter weights and the configuration needed to load them on top of the base model. Below is a snippet showing how to run inference; it downloads FLAN-T5-XXL from the Hugging Face Hub if it is not already available locally.

1. Load the model

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the PEFT config of the pre-trained adapter checkpoint
peft_model_id = "philschmid/flan-t5-xxl-samsum-peft"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base LLM and tokenizer (8-bit, on GPU 0)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map={"": 0})
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"": 0})
model.eval()
```
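
Note: `load_in_8bit=True` relies on the `bitsandbytes` and `accelerate` packages being installed; without 8-bit loading, the base model can also be loaded in fp16/bf16 at a correspondingly higher memory cost.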

2. Generate

```python
# Placeholder input; for this adapter the input should be the dialogue you want summarized.
text = "test"

input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.cuda()
outputs = model.generate(input_ids=input_ids, max_new_tokens=10, do_sample=True, top_p=0.9)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
```
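
For a more realistic call than the placeholder above, the snippet below summarizes a short, made-up samsum-style dialogue. The conversation text, the `summarize:` prefix, and the generation settings are illustrative assumptions rather than a prescribed usage.

```python
# Illustrative example with an invented samsum-style dialogue (assumed input format).
dialogue = (
    "Anna: Are we still on for dinner tonight?\n"
    "Tom: Yes! 7pm at the Italian place?\n"
    "Anna: Perfect, see you there."
)
text = f"summarize: {dialogue}"

input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.cuda()
outputs = model.generate(input_ids=input_ids, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
```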

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged training sketch follows this list):
- learning_rate: 1e-3
- train_batch_size: auto
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
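
As a minimal, non-authoritative sketch of how these settings might map onto a PEFT training script (this is not the exact script used for this card): the LoRA rank, alpha, dropout, and target modules below are assumptions that are not recorded here, while the learning rate, epoch count, scheduler, seed, and automatic batch-size finding mirror the hyperparameter list above.

```python
# Hedged sketch only: LoRA hyperparameters (r, lora_alpha, lora_dropout, target_modules)
# are assumptions; learning rate, epochs, scheduler, seed, and auto batch size follow the card.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base_model_id = "philschmid/flan-t5-xxl-sharded-fp16"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model_id, load_in_8bit=True, device_map="auto")

# Prepare the 8-bit model for training and wrap it with a LoRA adapter.
# (prepare_model_for_int8_training was renamed prepare_model_for_kbit_training in later PEFT releases.)
model = prepare_model_for_int8_training(model)
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # assumption
    lora_alpha=32,              # assumption
    lora_dropout=0.05,          # assumption
    target_modules=["q", "v"],  # assumption: T5 attention query/value projections
)
model = get_peft_model(model, lora_config)

def preprocess(sample):
    # Assumed format: "summarize: <dialogue>" as input, the reference summary as labels.
    model_inputs = tokenizer("summarize: " + sample["dialogue"], truncation=True)
    model_inputs["labels"] = tokenizer(sample["summary"], truncation=True)["input_ids"]
    return model_inputs

dataset = load_dataset("samsum")
tokenized = dataset.map(preprocess, remove_columns=dataset["train"].column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-xxl-samsum-peft",
    learning_rate=1e-3,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    auto_find_batch_size=True,  # corresponds to "train_batch_size: auto"
    seed=42,                    # default Adam betas/epsilon match the values listed above
    logging_steps=100,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model, label_pad_token_id=-100),
)
trainer.train()
```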

### Framework versions

- Transformers 4.27.1
- Pytorch 1.13.1+cu117
- Datasets 2.9.1
- PEFT@main