Efficient Continual Pre-training by Mitigating the Stability Gap Paper • 2406.14833 • Published 12 days ago • 18
Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate 20 days ago • 25
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 9 days ago • 117
Article XLSCOUT Unveils ParaEmbed 2.0: a Powerful Embedding Model Tailored for Patents and IP with Expert Support from Hugging Face 8 days ago • 8
abliterated-v3 Collection Latest gen of the abliterated models I've produced • 17 items • Updated 30 days ago • 69
LongVA Collection Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/ • 5 items • Updated 7 days ago • 9
4M Models Collection Multimodal models from https://4m.epfl.ch/ • 14 items • Updated 18 days ago • 29
Qwen2 Collection Qwen2 language models, pretrained and instruction-tuned, in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 29 items • Updated 27 days ago • 226
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published 11 days ago • 14
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models Paper • 2406.15704 • Published 11 days ago • 5
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics Paper • 2406.14051 • Published 13 days ago • 9
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714 • Published 9 days ago • 10
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers Paper • 2406.16747 • Published 9 days ago • 16
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs Paper • 2406.15927 • Published 10 days ago • 13
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published 8 days ago • 48
Evaluating D-MERIT of Partial-annotation on Information Retrieval Paper • 2406.16048 • Published 10 days ago • 34
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 8 days ago • 53
Article Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality 9 days ago • 23
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 19 days ago • 147
Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models Paper • 2406.13099 • Published 14 days ago • 4
LiveMind: Low-latency Large Language Models with Simultaneous Inference Paper • 2406.14319 • Published 13 days ago • 14
Improving Visual Commonsense in Language Models via Multiple Image Generation Paper • 2406.13621 • Published 14 days ago • 13
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published 12 days ago • 30
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published 13 days ago • 75
HARE: HumAn pRiors, a key to small language model Efficiency Paper • 2406.11410 • Published 16 days ago • 37
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published 13 days ago • 97
synthetic-data-generation-demos Collection A collection of demos for various approaches to synthetic data generation • 4 items • Updated 8 days ago • 10
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published 15 days ago • 14
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models Paper • 2406.12311 • Published 15 days ago • 7
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations Paper • 2406.11801 • Published 16 days ago • 15
Estimating Knowledge in Large Language Models Without Generating a Single Token Paper • 2406.12673 • Published 15 days ago • 7
JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning Paper • 2406.12292 • Published 15 days ago • 4
TabuLa-8B Collection Training, eval suite, and model from the paper "Large Scale Transfer Learning for Tabular Data via Language Modeling" https://arxiv.org/abs/2406.12031 • 4 items • Updated 14 days ago • 8
SteerLM Collection A collection of models and datasets relating to SteerLM and HelpSteer. • 7 items • Updated 15 days ago • 11
SSMs Collection A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers. • 5 items • Updated about 7 hours ago • 15
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 618
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks Paper • 2406.12066 • Published 15 days ago • 7
Large Scale Transfer Learning for Tabular Data via Language Modeling Paper • 2406.12031 • Published 15 days ago • 6
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models Paper • 2406.12042 • Published 15 days ago • 7
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries Paper • 2406.12824 • Published 15 days ago • 20
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published 15 days ago • 26
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published 15 days ago • 28
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Paper • 2406.12849 • Published 14 days ago • 48
Article Running Large Multimodal Models on an AI PC's NPU By bconsolvo • 21 days ago • 5
Article Low Latency CPU Based Educational Value Classifier With Generic Educational Value By kenhktsui • 20 days ago • 7
Article The CVPR Survival Guide: Discovering Research That's Interesting to YOU! By harpreetsahota • 19 days ago • 9
Article SwanLab and Transformers: Power Up Your NLP Experiments By Andyrasika • 16 days ago • 6
Article Introducing the Ultimate SEC LLM: Revolutionizing Financial Insights with Llama-3-70B By Crystalcareai • 14 days ago • 6