metadata
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- code
pretty_name: opencores
size_categories:
- n<1K
Dataset Card for Opencores
We gathered high-quality specification-code pairs from Opencores, a community aimed at developing open-source digital hardware using electronic design automation (EDA). We then filtered out data instances exceeding 4096 characters in length and those that could not be parsed into abstract syntax trees (ASTs). The final dataset comprises approximately 800 data instances.
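The two filtering rules above can be sketched as follows. This is an illustration, not the authors' pipeline: the names `filter_pairs` and `looks_like_module` are hypothetical, the length limit is assumed to apply to the Verilog code alone, and `looks_like_module` is only a stand-in for a real Verilog parser, which this card does not name.

```python
from typing import Callable, Dict, Iterable, List

MAX_CHARS = 4096  # length limit stated in the card


def looks_like_module(code: str) -> bool:
    """Hypothetical stand-in for a real Verilog parser.

    This only checks the module/endmodule framing; it does NOT build an AST.
    Swap in a real parser-backed predicate for actual filtering.
    """
    stripped = code.strip()
    return stripped.startswith("module") and stripped.endswith("endmodule")


def filter_pairs(
    pairs: Iterable[Dict[str, str]],
    can_parse: Callable[[str], bool] = looks_like_module,
    max_chars: int = MAX_CHARS,
) -> List[Dict[str, str]]:
    """Keep pairs whose code is short enough and accepted by `can_parse`."""
    kept = []
    for pair in pairs:
        code = pair["output"]
        # Assumption: the limit is measured on the code, not the instruction.
        if len(code) > max_chars:
            continue
        if not can_parse(code):
            continue
        kept.append(pair)
    return kept


samples = [
    {"instruction": "Design an inverter.",
     "output": "module inv(input a, output y); assign y = ~a; endmodule"},
    {"instruction": "Too long.",
     "output": "module big; " + "x" * 5000 + " endmodule"},
    {"instruction": "Broken.",
     "output": "module broken(input a"},
]

kept = filter_pairs(samples)
print(len(kept))  # 1
```

Passing the parser as a callable keeps the length check independent of whichever Verilog front end is used for the AST check.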
Dataset Features
- instruction (string): The natural language instruction for Verilog code design.
- input (string): Placeholder kept for dataset loading efficiency; it carries no meaningful content.
- output (string): The expected Verilog code.
- name (string): Name of the Verilog module.
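As a rough illustration of how these fields fit together, the sketch below turns one record into a prompt/completion pair for text generation. The helper `build_example`, the output keys `prompt` and `completion`, and the sample record are all hypothetical; the record is hand-written and not taken from the dataset.

```python
from typing import Dict


def build_example(record: Dict[str, str]) -> Dict[str, str]:
    """Map a dataset record to a prompt/completion pair (hypothetical format)."""
    prompt = record["instruction"].strip()
    # `input` is a placeholder in this dataset, so it is normally empty;
    # append it only if it happens to contain text.
    extra = (record.get("input") or "").strip()
    if extra:
        prompt = f"{prompt}\n\n{extra}"
    return {
        "name": record["name"],
        "prompt": prompt,
        "completion": record["output"],
    }


# Hand-written sample record with the four fields listed above.
record = {
    "instruction": "Design a 2-input AND gate.",
    "input": "",
    "output": "module and2(input a, input b, output y);\n  assign y = a & b;\nendmodule",
    "name": "and2",
}

example = build_example(record)
print(example["prompt"])
```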
Loading the dataset
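from datasets import load_dataset
# Download the dataset from the Hugging Face Hub and select the "train" split.
ds = load_dataset("WANGNingroci/opencores", split="train")
# Print the first record (instruction, input, output, name).
print(ds[0])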
Citation
@article{wang2024large,
title={Large Language Model for Verilog Generation with Golden Code Feedback},
author={Wang, Ning and Yao, Bingkun and Zhou, Jie and Wang, Xi and Jiang, Zhe and Guan, Nan},
journal={arXiv preprint arXiv:2407.18271},
year={2024}
}