Model Card for flan-t5-summary-and-QA
Model Details
Model Description
This model is a fine-tuned version of google/flan-t5-small trained on two natural language generation tasks:
- Summarization of news articles (CNN/DailyMail dataset)
- Question-answer generation from context passages (SQuAD dataset)
The model accepts task-specific prompts:
- "summarize: {article}" → generates a short summary
- "question: {context}" → generates a question and answer pair in the format "Question: ... || Answer: ..."
Developed by: User (rameenj711)
Model type: Text-to-Text Transformer (Seq2Seq)
Language(s): English
License: Apache 2.0
Finetuned from model: google/flan-t5-small
Model Sources
- Repository: https://huggingface.co/rameenj711/flan-t5-summary-and-QA
- Base model: google/flan-t5-small
Uses
Direct Use
The model can be used directly for:
- Generating a concise summary of a news article.
- Generating a question and its answer from a given context paragraph.
Example usage:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("rameenj711/flan-t5-summary-and-QA")
model = AutoModelForSeq2SeqLM.from_pretrained("rameenj711/flan-t5-summary-and-QA")
# Summarization
input_text = "summarize: " + "Your article text here..."
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Question generation
input_text = "question: " + "Your context paragraph..."
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
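The question-generation output is a single string in the "Question: ... || Answer: ..." format, so downstream code usually needs to split it. The following is a minimal sketch, assuming the model follows that format; `parse_qa` is a hypothetical helper (not part of the model or of Transformers), and it returns `None` when the delimiter is missing, which can happen with a small generative model.

```python
from typing import Optional, Tuple


def parse_qa(generated: str) -> Optional[Tuple[str, str]]:
    """Split 'Question: ... || Answer: ...' into (question, answer)."""
    if "||" not in generated:
        return None  # output did not follow the expected format
    question_part, answer_part = generated.split("||", 1)
    question = question_part.strip()
    answer = answer_part.strip()
    # Drop the literal labels if they are present.
    if question.lower().startswith("question:"):
        question = question[len("question:"):].strip()
    if answer.lower().startswith("answer:"):
        answer = answer[len("answer:"):].strip()
    if not question or not answer:
        return None
    return question, answer


print(parse_qa("Question: Who wrote Hamlet? || Answer: William Shakespeare"))
```

Checking for `None` before using the result keeps malformed generations from propagating silently.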
Downstream Use
The model can be further fine-tuned on domain-specific summarization or QA datasets.
Out-of-Scope Use
The model is not intended for:
- Extractive question answering (the model generates answers rather than extracting spans from the context).
- Factual verification or open‑domain QA without context.
- Languages other than English.
Bias, Risks, and Limitations
The model inherits biases from its training data (CNN/DailyMail and SQuAD), which may reflect Western-centric news and factual question styles. Summaries may omit nuanced information, and generated questions may not always be factually accurate. Performance on very long contexts (>512 tokens) may degrade due to truncation.
Recommendations
Users should verify generated answers against the original context, especially for critical applications. The model works best with clear task prefixes and contexts of moderate length (<512 tokens).
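Because inputs beyond 512 tokens are truncated, one possible workaround is to split a long text into pieces that fit the budget and process each piece separately. The sketch below is an illustration only: `chunk_by_budget` is a hypothetical helper, the default token counter is a whitespace approximation rather than the model's tokenizer, and the card does not report how well chunked inputs perform.

```python
import re
from typing import Callable, List


def chunk_by_budget(
    text: str,
    max_tokens: int = 512,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    count_tokens defaults to a whitespace word count (an assumption);
    pass a tokenizer-based counter for accurate budgets.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            # A single sentence longer than the budget stays in its own
            # chunk and would still be truncated by the tokenizer.
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_by_budget("One two three. Four five six. Seven eight nine.", max_tokens=6))
```

In practice the budget should be set somewhat below 512 to leave room for the task prefix and the end-of-sequence token.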
How to Get Started with the Model
from transformers import pipeline
# One pipeline serves both tasks; the prompt prefix selects the task.
generator = pipeline("text2text-generation", model="rameenj711/flan-t5-summary-and-QA")
article_text = "Your article text here..."  # placeholder input
context_text = "Your context paragraph..."  # placeholder input
summary = generator("summarize: " + article_text, max_length=128)[0]["generated_text"]
qa_output = generator("question: " + context_text, max_length=64)[0]["generated_text"]
Training Details
Training Data
- Summarization: 10,000 examples from the CNN/DailyMail dataset (version 3.0.0). Each example pairs an article with its highlights (summary).
- Question Generation: 10,000 examples from the SQuAD dataset (plain_text configuration). Each example pairs a context paragraph with a question and its answer.
- Combined dataset: 20,000 examples, split 90% train / 10% validation.
Training Procedure
The model was trained using a multi‑task setup: both tasks share the same sequence‑to‑sequence architecture with task‑specific prefixes.
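One way to picture the multi-task setup is to map both datasets into a single prefixed text-to-text format, shuffle, and split 90/10. The sketch below is an assumption about how this could be done, not the author's training script: the helper names, the flat record layout, and the seed are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Example = Dict[str, str]


def to_summarization_example(article: str, highlights: str) -> Example:
    # Prefix marks the summarization task; the target is the reference summary.
    return {"input": "summarize: " + article, "target": highlights}


def to_question_example(context: str, question: str, answer: str) -> Example:
    # Prefix marks the question-generation task; the target uses the card's format.
    return {
        "input": "question: " + context,
        "target": f"Question: {question} || Answer: {answer}",
    }


def mix_and_split(
    examples: List[Example], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Shuffle the combined examples and hold out val_fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


toy = [to_summarization_example(f"article {i}", f"summary {i}") for i in range(10)]
toy += [to_question_example(f"context {i}", f"q{i}?", f"a{i}") for i in range(10)]
train, val = mix_and_split(toy)
print(len(train), len(val))  # 18 2
```

Shuffling before the split keeps both tasks represented in the training and validation sets.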
Preprocessing
- Input texts (articles or contexts) were tokenized with a maximum length of 512 tokens.
- Target texts (summaries or question+answer pairs) were tokenized with a maximum length of 128 tokens.
- No padding was applied during tokenization; dynamic padding was used in the data collator.
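Dynamic padding means each batch is padded only to the length of its own longest sequence. In Transformers this is typically handled by a seq2seq data collator; the library-free sketch below only illustrates the idea. It assumes a pad token id of 0 (the T5 pad id) and -100 for label padding (the index PyTorch's cross-entropy loss ignores by default); `pad_batch` is a hypothetical helper.

```python
from typing import Dict, List


def pad_batch(
    features: List[Dict[str, List[int]]],
    pad_token_id: int = 0,
    label_pad_id: int = -100,
) -> Dict[str, List[List[int]]]:
    """Pad inputs and labels to the longest sequence in this batch only."""
    max_in = max(len(f["input_ids"]) for f in features)
    max_lab = max(len(f["labels"]) for f in features)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for f in features:
        n_in = len(f["input_ids"])
        n_lab = len(f["labels"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * (max_in - n_in))
        # Mask is 1 for real tokens and 0 for padding.
        batch["attention_mask"].append([1] * n_in + [0] * (max_in - n_in))
        # Padded label positions are excluded from the loss.
        batch["labels"].append(f["labels"] + [label_pad_id] * (max_lab - n_lab))
    return batch


example_batch = pad_batch(
    [
        {"input_ids": [5, 6, 7], "labels": [9]},
        {"input_ids": [8], "labels": [3, 4]},
    ]
)
print(example_batch["input_ids"])  # [[5, 6, 7], [8, 0, 0]]
```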
Training Hyperparameters
- Training regime: fp32 (no mixed precision)
- Optimizer: AdamW
- Learning rate: 5e-5
- Learning rate scheduler: Linear with warmup (10% of steps)
- Warmup steps: ~450 (10% of total steps)
- Per‑device batch size: 4 training / 4 evaluation
- Gradient accumulation steps: 2 (effective batch size: 8)
- Number of epochs: 20 configured (training stopped early at ~12 epochs after convergence plateaued)
- Total training steps: 27,000
- Max gradient norm: 1.0
- Label smoothing: 0.0
- Generation during evaluation: Enabled with predict_with_generate=True
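The step counts can be cross-checked from the dataset size and batch settings. The sketch below assumes a single GPU and 18,000 training examples (90% of 20,000); `steps_per_epoch` is a hypothetical helper. Under these assumptions 12 epochs give 27,000 steps, matching the total above, while the full 20 epochs would give 45,000. A 10% warmup would then be 2,700 or 4,500 steps respectively, so the listed warmup figure of ~450 may be worth rechecking against the training logs.

```python
import math


def steps_per_epoch(
    n_examples: int, per_device_batch: int, grad_accum: int, n_devices: int = 1
) -> int:
    """Optimizer steps per epoch for a given effective batch size."""
    effective_batch = per_device_batch * grad_accum * n_devices
    return math.ceil(n_examples / effective_batch)


train_examples = 20_000 * 9 // 10  # 90% train split
per_epoch = steps_per_epoch(train_examples, per_device_batch=4, grad_accum=2)

print(per_epoch)       # 2250
print(per_epoch * 12)  # 27000 (early stop at ~12 epochs)
print(per_epoch * 20)  # 45000 (all 20 configured epochs)
```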
Speeds, Sizes, Times
- Model size: ~1.4 GB (safetensors format)
- Training duration: ~4.5 hours on a single NVIDIA T4 GPU (Colab)
- Total training steps: 27,000
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Summarization: 500 examples from the CNN/DailyMail test split.
- Question Generation: 500 examples from the SQuAD validation split.
Metrics
- Summarization: ROUGE-1, ROUGE-2, ROUGE-L (F1 score)
- Question Generation: BERTScore F1 and Exact Match (EM)
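For reference, Exact Match is commonly computed after SQuAD-style normalization (lowercasing, removing punctuation and articles, collapsing whitespace). The card does not state which normalization was used, so the sketch below is an assumption; `normalize_answer` and `exact_match` are hypothetical helpers.

```python
import re
import string
from typing import List

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(predictions)


print(exact_match(["The Eiffel Tower.", "Paris"], ["eiffel tower", "London"]))  # 0.5
```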
Results
Summarization (CNN/DailyMail test sample)
| Metric | Score |
|---|---|
| ROUGE-1 | 0.274 |
| ROUGE-2 | 0.097 |
| ROUGE-L | 0.204 |
These scores indicate that the model captures some key unigrams but has limited bigram overlap with the reference summaries, which suggests weaker phrase-level coherence. Such performance is plausible for a small model trained on only 10k summarization examples.
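To make the gap between ROUGE-1 and ROUGE-2 concrete, the following simplified sketch computes n-gram overlap F1 with whitespace tokenization. It is not the scorer used for the table (standard ROUGE implementations add their own tokenization and optional stemming), and `rouge_n_f1` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(round(rouge_n_f1(candidate, reference, n=1), 3))  # 0.833
print(round(rouge_n_f1(candidate, reference, n=2), 3))  # 0.6
```

A single changed word lowers the bigram score more than the unigram score, which is why ROUGE-2 is the stricter of the two.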
Question Generation (SQuAD validation sample)
| Metric | Score |
|---|---|
| BERTScore F1 | 0.242 |
| Exact Match | 0.050 |
Note: The scores above come from an evaluation run that loaded the wrong model, so they are lower than expected. A correct evaluation is expected to yield BERTScore ~0.65–0.75 and EM ~0.25–0.35. Users are advised to recompute the metrics with the correct model.
Environmental Impact
Carbon emissions were estimated using the Machine Learning Impact calculator.
- Hardware Type: NVIDIA T4 GPU (Google Colab)
- Hours used: 4.5 hours training + 1 hour evaluation
- Cloud Provider: Google Cloud Platform (Colab backend)
- Compute Region: us-central1 (assumed)
- Carbon Emitted: Approximately 0.15 kg CO₂eq (estimate)
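An estimate of this kind is essentially energy used multiplied by the carbon intensity of the grid. The sketch below is a rough cross-check, assuming the T4 draws its 70 W rated power for all 5.5 hours and ignoring datacenter overhead; the intensity it prints is back-solved from the reported 0.15 kg and is not a value taken from the calculator.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kWh for a device drawing constant power."""
    return power_watts * hours / 1000.0


def emissions_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Emissions = energy x grid carbon intensity."""
    return energy_kwh(power_watts, hours) * kg_co2_per_kwh


T4_RATED_WATTS = 70.0      # assumption: constant draw at rated power
total_hours = 4.5 + 1.0    # training + evaluation
reported_kg = 0.15

energy = energy_kwh(T4_RATED_WATTS, total_hours)
implied_intensity = reported_kg / energy

print(round(energy, 3))             # 0.385 kWh
print(round(implied_intensity, 2))  # 0.39 kg CO2eq per kWh
```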
Technical Specifications
Model Architecture and Objective
Standard encoder‑decoder Transformer (FLAN‑T5‑small) with 8 encoder and 8 decoder layers, a hidden size of 512, 6 attention heads, and roughly 77 million parameters. The objective is cross‑entropy loss over the target tokens.
Compute Infrastructure
Hardware
- GPU: NVIDIA T4 (16 GB VRAM)
- CPU: Intel Xeon (2 vCPUs)
- RAM: 25 GB
Software
- Transformers 4.46+
- PyTorch 2.5+
- Datasets 3.2+
- Accelerate 1.2+
Citation
@misc{rameenj711_flan_t5_summary_qa,
author = {Rameen Jamshed},
title = {FLAN-T5-small fine-tuned for multi-task summarization and question generation},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/rameenj711/flan-t5-summary-and-QA}}
}
Model Card Authors
Rameen Jamshed
Model Card Contact
Please use the Hugging Face discussion tab for any questions.