Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published 5 days ago • 15
Synthetic (text) Dataset Generation Collection Papers about synthetic dataset generation • 9 items • Updated 3 days ago • 3
sentence-transformers-from-synthetic-data Collection Example of using distilabel to generate synthetic triplets data for fine-tuning a Sentence Transformer model • 3 items • Updated 1 day ago • 15
Article ⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2 By burtenshaw • 4 days ago • 20
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published 3 days ago • 34
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding Paper • 2306.02858 • Published Jun 5, 2023 • 14
SimPO Collection This collection contains the list of models trained and evaluated in the preprint: SimPO: Simple Preference Optimization with a Reference-Free R • 25 items • Updated 8 days ago • 9
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts Paper • 2405.11273 • Published 14 days ago • 15
C4AI Aya 23 Collection Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated 9 days ago • 34
Article Enjoy the Power of Phi-3 with ONNX Runtime on your device By Emma-N • 11 days ago • 19
AkaLlama Collection Korean adaptation of Llama-3 LLM suites, developed by MIR Lab @ Yonsei University • 3 items • Updated 15 days ago • 1
Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean Paper • 2403.10882 • Published Mar 16 • 5
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 11 items • Updated 16 days ago • 103
NuNerZero - Zero Shot NER Collection The best compact Zero-Shot NER models with MIT license • 4 items • Updated 22 days ago • 13
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • 25 days ago • 25
Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24 • 48
Llama 2 Family Collection This collection hosts the transformers and original repos of the Llama 2 and Llama Guard releases • 13 items • Updated Apr 18 • 36
Granite Time Series Models Collection A collection of time series models trained by IBM, licensed under the CDLA-permissive-2.0 license. • 3 items • Updated 25 days ago • 5
Granite Code Models Collection A series of code models trained by IBM, licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 18 items • Updated 2 days ago • 135
WildChat: 1M ChatGPT Interaction Logs in the Wild Paper • 2405.01470 • Published about 1 month ago • 53
Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29 • 69
Article Overview of natively supported quantization schemes in 🤗 Transformers Sep 12, 2023 • 8
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 61
Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1 • 53
Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA May 24, 2023 • 40
Korean Datasets I've released so far. Collection A collection of the Korean datasets I have uploaded so far. • 8 items • Updated 8 days ago • 14
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training Paper • 2309.10400 • Published Sep 19, 2023 • 22
FewMany Collection Benchmark For Few Shot Classification with Many Classes • 8 items • Updated Apr 18 • 6
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 238
LayoutLM Collection The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. • 5 items • Updated 11 days ago • 9
Table Transformer Collection The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. • 5 items • Updated 11 days ago • 12
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 22 items • Updated 2 days ago • 299
Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Apr 22 • 73
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 557
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 235