metadata

annotations_creators:
  - no-annotation
language_creators:
  - found
languages:
  - id
licenses:
  - unknown
multilinguality:
  - monolingual
size_categories:
  - 100K<n<1M
source_datasets:
  - original
task_categories:
  - conditional-text-generation
task_ids:
  - summarization
paperswithcode_id: null

Dataset Card for ID-Collection

Dataset Description
Dataset Structure
Dataset Creation
Considerations for Using the Data
Additional Information

Dataset Description

Homepage:
Repository:
Paper:
Leaderboard:
Point of Contact:

Dataset Summary

This module loads a text dataset from a local directory. The text files should follow the same format as the OSCAR dataset, in which each entry is separated from the next by empty lines.

You need to collect the text files into a directory manually. The dataset can then be loaded with the following command: datasets.load_dataset("./text_collection", data_dir="<path/to/dataset>").
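To make the expected file layout concrete, here is a minimal sketch of how blank-line-separated entries could be turned into records with the id and text fields. It is not the loading script itself: the function name iter_entries, the *.txt file pattern, the zero-based ids, and the choice to join multi-line entries with newlines are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Iterator, Union


def iter_entries(data_dir: Union[str, Path]) -> Iterator[Dict[str, object]]:
    """Yield {'id', 'text'} records from every *.txt file under data_dir.

    Assumptions (not confirmed by the dataset card): entries are separated
    by one or more empty lines, multi-line entries are joined with newlines,
    and ids are assigned sequentially starting at 0.
    """
    next_id = 0
    for path in sorted(Path(data_dir).glob("*.txt")):
        buffer = []
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line.strip():
                    # Non-empty line: still inside the current entry.
                    buffer.append(line)
                elif buffer:
                    # Empty line closes the current entry.
                    yield {"id": next_id, "text": "\n".join(buffer)}
                    next_id += 1
                    buffer = []
        if buffer:
            # Flush the last entry when the file has no trailing empty line.
            yield {"id": next_id, "text": "\n".join(buffer)}
            next_id += 1
```

Calling list(iter_entries("<path/to/dataset>")) on a directory prepared this way gives a quick check that the files split into the entries you expect before running the actual loader.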

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Indonesian

Dataset Structure

{
  'id': 'int64',
  'text': 'string',
}

Data Instances

An example of the dataset:

{
  'id': 1,
  'text': 'sultan agung dan dokternya bilang supaya adeknya diberi kacamata khusus'
}

Data Fields

id: integer identifier of the sample (int64)
text: content of the entry (string)

Data Splits

The dataset contains only a train split.

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]
