Diff Datasets
Datasets containing GitHub diffs
10.7M rows • 1.31k downloads • 3 likes
Note: Diffs only, no full files.
bigcode/github-commits-diff-dedup-pjjs-april
146k rows • 21.8k downloads
Note: Contains the full old and new files.
ASSERT-KTH/megadiff-single-function
72.4k rows • 86 downloads • 2 likes
Note: Megadiff: A Dataset of 600k Java Source Code Changes Categorized by Diff Size (https://arxiv.org/pdf/2108.04631). Refined version of "ASSERT-KTH/megadiff" in which each row has the old buggy function and the corrected one. Only code, no commit messages.
ASSERT-KTH/megadiff
657k rows • 344 downloads • 1 like
mamiksik/processed-commit-diffs
77.8k rows • 93 downloads • 3 likes
Note: Patches, no diffs or full files. Only taken from high-quality files.
epinnock/commit-diffs
117k rows • 36 downloads • 1 like
Note: Has the new file, the old file, and the diff.
bigcode/commitpackft
702k rows • 4.91k downloads • 58 likes
Note: Has full old and new files. Filtered from bigcode/commitpack for high-quality commit messages.
ObscuraCoder/commit-chronicle
3.01M rows • 70 downloads • 1 like
Note: Diff and commit message only. Filtered version of JetBrains-Research/commit-chronicle.
JetBrains-Research/commit-chronicle
10.9M rows • 2k downloads • 5 likes
Note: Diffs with metadata.
chargoddard/commitpack-ft-instruct
491k rows • 49 downloads • 2 likes
Note: Adds a question prefix to the commit message so that it serves as an instruction. Data taken from bigcode/commitpackft.
Maxscha/commitbench
1.66M rows • 241 downloads • 5 likes
Note: 4 years old, but the quality looks good. Diff and commit message.
ASSERT-KTH/repairllama-datasets
394k rows • 131 downloads • 1 like
Note: 6 splits containing input-output pairs, where the input is the code with a bug and the output is the correction. RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair (https://arxiv.org/abs/2312.15698).
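Several of the datasets above provide full old and new files rather than a diff. A unified diff can be derived from such a pair with Python's standard `difflib` module. The following is a minimal sketch, not taken from any of the listed datasets: the helper `make_unified_diff`, the field names `old_contents` and `new_contents`, and the sample row are hypothetical, so check each dataset's actual schema before reusing it.

```python
import difflib


def make_unified_diff(old_text: str, new_text: str, path: str = "file.py") -> str:
    """Build a unified diff from the old and new contents of one file.

    `path` is only used to label the diff headers.
    """
    # keepends=True keeps the trailing newlines so the output joins cleanly.
    # Assumes both texts end with a newline; otherwise the last lines may run together.
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff_lines = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


# Hypothetical row; real column names differ between datasets.
row = {
    "old_contents": "def add(a, b):\n    return a - b\n",
    "new_contents": "def add(a, b):\n    return a + b\n",
}

diff = make_unified_diff(row["old_contents"], row["new_contents"])
print(diff)
```

Identical inputs produce an empty string, which makes it easy to skip rows that contain no actual change.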