Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate Paper • 2401.16788 • Published Jan 30 • 1
view article Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community 18 days ago • 93
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training Paper • 2309.10400 • Published Sep 19, 2023 • 20
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 233
view article Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • 8 days ago • 31
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published 10 days ago • 225
Learning to Route Among Specialized Experts for Zero-Shot Generalization Paper • 2402.05859 • Published Feb 8 • 4
Zephyr ORPO Collection Models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO). Recipes here: https://github.com/huggingface/alignment-handbook • 3 items • Updated 21 days ago • 13
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Paper • 2404.05405 • Published 24 days ago • 4
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 37
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 68
NEFTune: Noisy Embeddings Improve Instruction Finetuning Paper • 2310.05914 • Published Oct 9, 2023 • 13
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published about 1 month ago • 98
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published 29 days ago • 58
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model Paper • 2404.01331 • Published Mar 29 • 22
PERL: Parameter Efficient Reinforcement Learning from Human Feedback Paper • 2403.10704 • Published Mar 15 • 54
DRAGON Models Collection Production-grade RAG-optimized 6-7B parameter models - "Delivering RAG on ..." the leading foundation base models • 11 items • Updated Feb 3 • 39
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 20
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages Paper • 2403.01926 • Published Mar 4 • 1
Unifying Vision, Text, and Layout for Universal Document Processing Paper • 2212.02623 • Published Dec 5, 2022 • 10
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 560
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 93
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 87
Data-efficient LLMs Collection dataset pruning for advancing the capabilities of LLMs • 21 items • Updated 10 days ago • 1
Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection Paper • 2306.06881 • Published Jun 12, 2023 • 1
FaceForensics++: Learning to Detect Manipulated Facial Images Paper • 1901.08971 • Published Jan 25, 2019 • 1
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time Paper • 2203.05482 • Published Mar 10, 2022 • 5
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20 • 45
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 104
Papers about model merging Collection referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13 • 13
Model Merging Papers Collection Collection of relevant papers about model merging • 13 items • Updated about 1 month ago • 4
SLIM Models Collection Structured Language Instruction Models (SLIMs) • 17 items • Updated Mar 19 • 22
One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation Paper • 2402.11909 • Published Feb 19 • 1
🔮 Mixture of Experts Collection MoE done using mergekit and LazyMergekit: https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb#scrollTo=d5mYzDo1q96y • 13 items • Updated Mar 22 • 21
⛔️🔦 Provenance, Watermarking & Deepfake Detection Collection Technical tools for more control over non-consensual synthetic content • 14 items • Updated about 1 month ago • 35
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023 • 31
Distilling LLMs' Decomposition Abilities into Compact Language Models Paper • 2402.01812 • Published Feb 2 • 1
Are Emergent Abilities of Large Language Models a Mirage? Paper • 2304.15004 • Published Apr 28, 2023 • 5
LLM as a Judge Collection Curated resources that support the use of LLMs to serve as automatic evaluators of other LLM outputs. • 14 items • Updated Feb 19 • 14