Dataset Card for Custom Text Dataset
Dataset Name
Custom CNN/DailyMail Text Summarization Dataset
Overview
This dataset is a custom subset and extension of the CNN/DailyMail dataset, consisting of news articles and their corresponding summaries.
Composition
Train Dataset: A custom train dataset consisting of one long news article paired with a manually written summary.
Test Dataset: 100 articles sampled from the original CNN/DailyMail dataset, each paired with its corresponding highlights (the reference summary).
Collection Process
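The custom train dataset was built from a news article taken from the CNN/DailyMail dataset, and its summary was written manually. The test dataset was sampled directly from the original CNN/DailyMail data.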
Preprocessing
The input text was tokenized.
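The card does not say which tokenizer was used. A model checkpoint would normally supply its own subword tokenizer. The sketch below is therefore only a hypothetical stand-in that shows the shape of the step. It uses a simple regex word tokenizer with truncation, and the `tokenize` function and its `max_length` default are illustrative assumptions, not documented settings of this dataset.

```python
import re

# Hypothetical stand-in tokenizer: the card does not name the real one.
# Matches runs of word characters, or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text, max_length=512):
    """Lowercase text, split it into word/punctuation tokens, truncate to max_length.

    max_length is an illustrative assumption, not a documented setting.
    """
    tokens = TOKEN_PATTERN.findall(text.lower())
    return tokens[:max_length]


sample = "Officials said on Monday that the bridge, closed since May, will reopen."
print(tokenize(sample, max_length=8))
```

Truncation matters for this dataset because news articles are often longer than a summarization model's input limit.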
How to Use
from datasets import load_from_disk
# Load the custom train and test splits (saved with save_to_disk; paths are relative to the working directory)
train_dataset = load_from_disk("./results/custom_dataset/train")
test_dataset = load_from_disk("./results/custom_dataset/test")
Evaluation
Summaries generated for the test articles can be compared against the reference highlights using metrics such as ROUGE or BLEU.
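The following sketch shows what ROUGE-N measures: the overlap of n-grams between a generated summary and a reference summary. It is simplified and uses lowercase whitespace tokenization with no stemming, so its scores will not exactly match standard implementations such as the `rouge_score` package or the "rouge" metric in the Hugging Face `evaluate` library. Those implementations are preferable for reported results. The function names here are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N between a candidate and a reference summary.

    Tokenization is plain lowercase whitespace splitting (no stemming).
    Returns precision, recall and F1 of the clipped n-gram overlap.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))  # 5 of 6 unigrams match: F1 = 0.833...
print(rouge_n(candidate, reference, n=2))  # 3 of 5 bigrams match: F1 = 0.6
```

Recall, the share of reference n-grams recovered, is the component most often emphasized for summarization. F1 balances it against precision so that long, padded outputs are not rewarded.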
Limitations
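The train dataset consists of only one example, which is far too small for meaningful fine-tuning. It is suited to demonstration or single-example experiments only.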
Ethical Considerations
The data originates from news sources, which may contain sensitive or politically biased content.