Article: Synthetic dataset generation techniques: generating custom sentence similarity data, by davanstrien (6 days ago)
Collection: 🚀GGUF, Llama.cpp-compatible models that can be used on CPUs and GPUs (663 items, updated 3 days ago)
Article: Train custom AI models with the trainer API and adapt them to 🤗, by not-lain (4 days ago)
Article: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA (May 24, 2023)
Paper: A Primer on the Inner Workings of Transformer-based Language Models (arXiv 2405.00208, published 29 days ago)
Paper: Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate (arXiv 2401.16788, published Jan 30)
Article: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community (Apr 15)
Paper: PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training (arXiv 2309.10400, published Sep 19, 2023)
Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv 2307.09288, published Jul 18, 2023)
Article: LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!), by wolfram (Apr 24)
Paper: Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv 2404.14219, published Apr 22)
Paper: Learning to Route Among Specialized Experts for Zero-Shot Generalization (arXiv 2402.05859, published Feb 8)
Collection: Zephyr ORPO, models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO); recipes at https://github.com/huggingface/alignment-handbook (3 items, updated Apr 12)
Paper: Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws (arXiv 2404.05405, published Apr 8)
Paper: Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM (arXiv 2403.07816, published Mar 12)
Paper: Adapting Large Language Models via Reading Comprehension (arXiv 2309.09530, published Sep 18, 2023)
Paper: NEFTune: Noisy Embeddings Improve Instruction Finetuning (arXiv 2310.05914, published Oct 9, 2023)
Paper: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv 2404.02258, published Apr 2)
Paper: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (arXiv 2404.02905, published Apr 3)
Paper: LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model (arXiv 2404.01331, published Mar 29)
Paper: PERL: Parameter Efficient Reinforcement Learning from Human Feedback (arXiv 2403.10704, published Mar 15)
Collection: DRAGON Models, production-grade RAG-optimized 6-7B parameter models: "Delivering RAG on ..." the leading foundation base models (11 items, updated Feb 3)
Paper: WebArena: A Realistic Web Environment for Building Autonomous Agents (arXiv 2307.13854, published Jul 25, 2023)
Paper: IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages (arXiv 2403.01926, published Mar 4)
Paper: Unifying Vision, Text, and Layout for Universal Document Processing (arXiv 2212.02623, published Dec 5, 2022)
Paper: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv 2402.17764, published Feb 27)
Paper: BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv 2310.11453, published Oct 17, 2023)
Paper: Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models (arXiv 2402.17177, published Feb 27)
Collection: Data-efficient LLMs, dataset pruning for advancing the capabilities of LLMs (24 items, updated 5 days ago)
Paper: Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection (arXiv 2306.06881, published Jun 12, 2023)
Paper: FaceForensics++: Learning to Detect Manipulated Facial Images (arXiv 1901.08971, published Jan 25, 2019)
Paper: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (arXiv 2203.05482, published Mar 10, 2022)
Paper: Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models (arXiv 2402.13064, published Feb 20)
Paper: LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv 2402.13753, published Feb 21)
Collection: Papers about model merging, referenced in the mergekit repo: https://github.com/cg123/mergekit (4 items, updated Feb 13)
Collection: Model Merging Papers, a collection of relevant papers about model merging (13 items, updated Apr 2)
Collection: SLIM Models, Structured Language Instruction Models (SLIMs) (21 items, updated 2 days ago)
Paper: One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation (arXiv 2402.11909, published Feb 19)