---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model

<!-- Provide a quick summary of what the model is/does. -->
This model card describes a T5-base model fine-tuned for text summarization. The model was trained with transfer learning on a subset of the XSum and ChatGPT datasets. We modified the training process, most notably by training on longer target outputs, so that the model supports longer summaries.

**Dataset and Training:**
The model is fine-tuned on a curated subset of the XSum and ChatGPT datasets. Together these cover a range of text, including news articles and conversational data. Training on both kinds of text is intended to help the model produce accurate, coherent summaries across different styles of input.

**Transfer Learning for Summarization:**
The model starts from the T5 base checkpoint, which was pre-trained on a large text corpus, and is then fine-tuned on the dataset described above. Fine-tuning lets the model reuse its existing language knowledge while adapting to the summarization task, with the goal of capturing the important information in the input and expressing it concisely.
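
T5 treats every task as text-to-text, and the usage example in this card selects summarization with a `summarize: ` prefix. A minimal sketch of turning (document, summary) pairs from several sources into fine-tuning examples could look like the following. It assumes the same prefix was used during training and that each source has already been reduced to (document, summary) string pairs; the function name `build_training_pairs`, the empty-record filter, and the shuffling step are illustrative assumptions, not the author's training code.

```python
import random
from typing import Iterable, List, Sequence, Tuple

# Task prefix used by this card's usage example; assumed to match training.
TASK_PREFIX = "summarize: "


def build_training_pairs(
    sources: Iterable[Sequence[Tuple[str, str]]],
    seed: int = 42,
) -> List[Tuple[str, str]]:
    """Merge (document, summary) pairs from several datasets into one
    shuffled list of (model_input, target) pairs in T5's text-to-text format."""
    pairs = [
        (TASK_PREFIX + document.strip(), summary.strip())
        for source in sources
        for document, summary in source
        if document.strip() and summary.strip()  # skip empty records
    ]
    # Shuffle with a fixed seed so batches mix examples from all sources reproducibly.
    random.Random(seed).shuffle(pairs)
    return pairs


# Hypothetical toy records standing in for the two datasets.
news = [("The council approved the new budget on Monday.", "Budget approved.")]
chat = [
    ("User asked how to reset a password; the assistant listed three steps.",
     "Password reset steps were explained."),
    ("", "empty record that is skipped"),
]

for model_input, target in build_training_pairs([news, chat]):
    print(model_input, "->", target)
```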

**Target Output Length:**
One notable difference between this model and similar models is its target output length: it is trained with target outputs of up to 512 tokens, so it is explicitly trained to generate summaries of up to 512 tokens. The aim is to produce more comprehensive and informative summaries while keeping them to a reasonable length.
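
With Hugging Face tokenizers, a cap like this is normally enforced with `truncation=True, max_length=512`, and for T5 the end-of-sequence token `</s>` counts toward that limit. The dependency-free sketch below shows the effect on an already-tokenized target: at most 511 content tokens are kept and the final slot holds `</s>`. The helper name `truncate_target` is hypothetical, and the sketch assumes the standard T5 vocabulary, in which `</s>` has id 1.

```python
from typing import List

T5_EOS_ID = 1            # id of "</s>" in the standard T5 SentencePiece vocabulary
MAX_TARGET_LENGTH = 512  # target length used for this model


def truncate_target(token_ids: List[int],
                    max_length: int = MAX_TARGET_LENGTH,
                    eos_id: int = T5_EOS_ID) -> List[int]:
    """Cap a tokenized summary (given without its trailing EOS) at max_length
    tokens, reserving the last position for the end-of-sequence token."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    return token_ids[: max_length - 1] + [eos_id]


long_summary = list(range(100, 1100))  # 1000 made-up token ids
short_summary = [100, 101, 102]

print(len(truncate_target(long_summary)))  # 512
print(truncate_target(short_summary))      # [100, 101, 102, 1]
```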

**Enhanced Support for Greater Length Output:**
By training the model with a specific focus on longer summaries, we aimed to improve its ability to handle and convey detailed information. This makes the model particularly suited to scenarios that call for longer summaries, such as summarizing lengthy documents or providing in-depth analysis.
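
Note that the usage example below truncates the input to 512 tokens, so for a lengthy document anything past that point is never seen by the model. A common workaround, which this card does not describe, is to split the document into overlapping chunks, summarize each chunk, and join the results. The sketch below does only the splitting. It counts whitespace-separated words as a rough stand-in for tokens (subword tokenizers such as T5's typically produce more tokens than words), so the default of 350 words per chunk is an assumed safety margin, not a tuned value; `chunk_words` is a hypothetical helper.

```python
from typing import List


def chunk_words(text: str, max_words: int = 350, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most max_words words; consecutive chunks
    share `overlap` words so context is not cut off abruptly."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this chunk reached the end
            break
    return chunks


document = " ".join(f"w{i}" for i in range(1000))  # a made-up 1000-word text
chunks = chunk_words(document)
print(len(chunks))                       # 4
print([len(c.split()) for c in chunks])  # [350, 350, 350, 100]
```

Each chunk would then be summarized with the same `summarize: ` prompt as in the usage example.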

**Conclusion:**
This fine-tuned T5 base model provides summarization with a specific emphasis on longer outputs. It was optimized for accurate, informative summaries by fine-tuning the pre-trained T5 base model on a curated dataset. We believe it will be a useful tool for applications that require comprehensive, well-structured summaries.
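
## How to Get Started with the Model

Use the code below to get started with the model. It runs an interactive loop: enter a text and the model prints three candidate summaries; type `exit` to quit. It needs `transformers` and PyTorch, plus `accelerate` for `device_map="auto"`.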

```python
from math import ceil

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "talalH/summarizer_on_T5_base"
MAX_LENGTH = 512  # max input tokens and max summary length, matching the training setup

# Tokenizers run on the CPU, so device_map is only passed to the model
# (device_map="auto" requires the `accelerate` package).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

print("-" * 100)
print("\nHi !! ENTER A TEXT TO SUMMARIZE (type 'exit' to end)\n")

while True:
    input_text = input("USER: ")
    if input_text.strip().lower() == "exit":
        break

    # T5 selects the task from a text prefix; the tokenizer appends </s> itself.
    input_ids = tokenizer(
        f"summarize: {input_text}",
        return_tensors="pt",
        max_length=MAX_LENGTH,
        truncation=True,
    ).input_ids.to(model.device)  # keep inputs on the same device as the model

    # Ask for a summary of at least about half the input's word count, capped at
    # MAX_LENGTH (generate() counts tokens, not words, so this is approximate).
    min_len = min(ceil(len(input_text.split()) / 2), MAX_LENGTH)

    # Diverse (group) beam search: 10 beams in 5 groups of 2, with a penalty that
    # pushes groups apart; 3 candidates are returned. temperature is omitted
    # because it only has an effect when sampling (do_sample=True).
    outputs = model.generate(
        input_ids,
        num_beams=10,
        num_beam_groups=5,
        diversity_penalty=2.0,
        num_return_sequences=3,
        repetition_penalty=10.0,
        no_repeat_ngram_size=2,
        min_length=min_len,
        max_length=MAX_LENGTH,
    )
    sequences = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    print("\nOUTPUT")
    if sequences:
        for summary in sequences:
            print("T5: ", summary, "\n")
    else:
        print("T5: NO RESPONSE RETURNED")

    print("-" * 100)
```
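
The generation call above uses diverse (group) beam search: `num_beams` beams are divided into `num_beam_groups` equally sized groups, `diversity_penalty` discourages groups from picking the same tokens, and `num_return_sequences` of the finished beams are returned. The small dependency-free helper below (a hypothetical name, not part of `transformers`) checks that a combination of these settings is consistent before calling `generate()`, under the constraints as we understand them: beams must split evenly into groups, and you cannot return more sequences than there are beams.

```python
def beams_per_group(num_beams: int, num_beam_groups: int, num_return_sequences: int) -> int:
    """Validate diverse beam search settings and return the number of beams per group."""
    if num_beams < 1 or num_beam_groups < 1:
        raise ValueError("num_beams and num_beam_groups must be positive")
    if num_beams % num_beam_groups != 0:
        raise ValueError("num_beams must be divisible by num_beam_groups")
    if num_return_sequences > num_beams:
        raise ValueError("num_return_sequences cannot exceed num_beams")
    return num_beams // num_beam_groups


# Settings from the usage example: 10 beams in 5 groups, 3 summaries returned.
print(beams_per_group(num_beams=10, num_beam_groups=5, num_return_sequences=3))  # 2
```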

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Talal Hassan (talalhassan141@gmail.com)
- **Finetuned from model:** T5-base

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Text summarization, particularly where longer summaries are needed (for example, summarizing lengthy documents).

## Training Details

- **Epochs:** 4
- **Warmup steps:** 50
- **Max steps:** -1 (no fixed step limit; training length is determined by the number of epochs)
- **Learning rate:** 5e-5
- **Batch size:** 4
- **Hardware Type:** Tesla K80 GPUs
- **Hours used:** 48
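
The card does not state which training loop or scheduler was used. Assuming the Hugging Face `Trainer` with its default linear scheduler, a single device, and no gradient accumulation, the settings above imply the learning-rate schedule sketched below: the rate rises linearly from 0 to 5e-5 over the first 50 steps, then decays linearly to 0 by the final step, and the total step count follows from the 4 epochs because max steps is -1. The dataset size is not given in the card, so `num_examples` is a placeholder you supply; both helper names are hypothetical.

```python
from math import ceil

# Hyperparameters listed in this card.
EPOCHS = 4
WARMUP_STEPS = 50
PEAK_LR = 5e-5
BATCH_SIZE = 4


def total_training_steps(num_examples: int, epochs: int = EPOCHS,
                         batch_size: int = BATCH_SIZE) -> int:
    """Optimizer steps when max steps is -1: epochs times batches per epoch
    (assumes one device, no gradient accumulation, and a kept final partial batch)."""
    return epochs * ceil(num_examples / batch_size)


def learning_rate_at(step: int, total_steps: int,
                     warmup_steps: int = WARMUP_STEPS, peak_lr: float = PEAK_LR) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps
    (the same shape as transformers' get_linear_schedule_with_warmup)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


# Hypothetical dataset size, for illustration only.
steps = total_training_steps(num_examples=10_000)
print(steps)  # 10000
for step in (0, 25, 50, steps // 2, steps):
    print(step, learning_rate_at(step, steps))
```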

## Model Card Authors

Talal Hassan (talalhassan141@gmail.com)

## Model Card Contact

Talal Hassan (talalhassan141@gmail.com)