Common Corpus Collection The largest public domain dataset for training LLMs. • 26 items • Updated Mar 20 • 98
From screenshots to HTML Collection WebSight is a dataset of 823,000 HTML/CSS code samples representing synthetically generated English websites, each accompanied by a corresponding screenshot. • 4 items • Updated 17 days ago • 15
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 52
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 171
StarVector: Generating Scalable Vector Graphics Code from Images Paper • 2312.11556 • Published Dec 17, 2023 • 26
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers Paper • 2311.15475 • Published Nov 27, 2023 • 2
FeedRec: News Feed Recommendation with Various User Feedbacks Paper • 2102.04903 • Published Feb 9, 2021 • 2
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models Paper • 2311.10093 • Published Nov 16, 2023 • 54
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 73
SVG Collection Collection Collection of SVG files from various sources. • 7 items • Updated Oct 22, 2023 • 4
CoEdIT Collection Collection of the publicly available CoEdIT dataset and instruction-tuned models for text editing. • 6 items • Updated 17 days ago • 5
Geospatial Datasets Collection Geospatial datasets on the Hub. If you want to submit more items to this collection, please request to join the geospatial organisation. • 9 items • Updated Nov 7, 2023 • 9
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 53
Hierarchical Neural Coding for Controllable CAD Model Generation Paper • 2307.00149 • Published Jun 30, 2023 • 2
SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks Paper • 2207.04632 • Published Jul 11, 2022 • 2
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models Paper • 2211.11319 • Published Nov 21, 2022 • 4
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers Paper • 2304.14400 • Published Apr 27, 2023 • 4
DistilBERT release Collection Original DistilBERT model and checkpoints obtained by teacher-student learning from the original BERT checkpoints. • 6 items • Updated 15 days ago • 10
TokenFlow: Consistent Diffusion Features for Consistent Video Editing Paper • 2307.10373 • Published Jul 19, 2023 • 54
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 165
LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance Paper • 2307.00522 • Published Jul 2, 2023 • 27
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 40
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 52
MIMIC: Masked Image Modeling with Image Correspondences Paper • 2306.15128 • Published Jun 27, 2023 • 7
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing Paper • 2306.10012 • Published Jun 16, 2023 • 33
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only Paper • 2306.01116 • Published Jun 1, 2023 • 27
WizardCoder: Empowering Code Large Language Models with Evol-Instruct Paper • 2306.08568 • Published Jun 14, 2023 • 26
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds Paper • 2306.00980 • Published Jun 1, 2023 • 13
HuggingFace's Transformers: State-of-the-art Natural Language Processing Paper • 1910.03771 • Published Oct 9, 2019 • 15
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Paper • 1910.01108 • Published Oct 2, 2019 • 9