Article Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM By Pclanglais • 1 day ago • 7
Article Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data By Pclanglais • 9 days ago • 17
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20 • 11
Article Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B 24 days ago • 17
matlok - Python Copilot Image Datasets Collection More extracted images on GitHub: https://github.com/matlok-ai/python-copilot-image-and-audio-examples/tree/main/png • 4 items • Updated Feb 8 • 1
Image dataset Collection 10 datasets that showcase how to configure and load image datasets • 10 items • Updated Dec 12, 2023 • 3
Geospatial Datasets Collection Geospatial datasets on the Hub. If you want to submit more items to this collection, please request to join the geospatial organisation. • 9 items • Updated Nov 7, 2023 • 9
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 437