Article ⚗️ 🧑🏼🌾 Let's grow some Domain Specific Datasets together By burtenshaw • 24 days ago • 26
Article Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM By Pclanglais • 27 days ago • 10
Article Releasing Youtube-Commons: a massive open corpus for conversational and multimodal data By Pclanglais • Apr 18 • 20
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20 • 21
Article Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B Apr 4 • 20
matlok - Python Copilot Image Datasets Collection More extracted images on GitHub: https://github.com/matlok-ai/python-copilot-image-and-audio-examples/tree/main/png • 4 items • Updated Feb 8 • 1
Image dataset Collection 10 datasets showcase how to configure and load image datasets • 10 items • Updated Dec 12, 2023 • 3
Geospatial Datasets Collection Geospatial datasets on the Hub. If you want to submit more items to this collection, please request to join the geospatial organisation. • 9 items • Updated Nov 7, 2023 • 9
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 447