Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 16 days ago • 352
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 16 days ago • 241
xLAM models Collection xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 9 items • Updated 3 days ago • 40
GLiNER bi-encoders Collection Bi-encoder and poly-encoder architectures of GLiNER • 5 items • Updated Sep 10 • 12
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 174
CommonCatalog Collection Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 8 items • Updated May 16 • 14
Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval Mar 22 • 57
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11 • 72
Vector-io compatible Datasets Collection These datasets can be loaded into your vector database with a single line bash command • 15 items • Updated 22 days ago • 3
Pokemons dataset captioned with different models Collection The Pokemons dataset from Lambda Labs is quite popular in the diffusion community because it lets us quickly validate ideas. • 3 items • Updated Nov 28, 2023 • 3
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 108
Contra (Bottleneck T5) Collection Text autoencoders capable of embedding and generating text in a fixed-size latent space, useful for embeddings and latent space text editing. • 4 items • Updated Oct 3, 2023 • 27
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12 • 65
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12 • 114