Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 88
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16 • 30
Open LLM Leaderboard 2 • Track, rank and evaluate open LLMs and chatbots • 11.8k
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29 • 55
Post If you're trying to run MoE Mixtral-8x7b under DeepSpeed w/ HF Transformers it's likely to hang on the first forward. The solution is here: https://github.com/microsoft/DeepSpeed/pull/4966#issuecomment-1989671378 and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix. • 7
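A minimal sketch of the workaround the post describes, assuming the stock mistralai/Mixtral-8x7B-v0.1 checkpoint and a standard Transformers setup (both illustrative, not taken from the post); the essential step is refusing to run on a DeepSpeed older than 0.13.0, since that is where the first-forward hang is fixed.

```python
# Sketch only: guard against the pre-0.13.0 DeepSpeed hang before loading Mixtral.
# Checkpoint name and config path below are illustrative assumptions.
from packaging import version

import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# The hang on the first forward pass is fixed in DeepSpeed >= 0.13.0
# (see the linked PR), so fail fast on older installs.
if version.parse(deepspeed.__version__) < version.parse("0.13.0"):
    raise RuntimeError(
        f"deepspeed {deepspeed.__version__} is too old for MoE Mixtral; "
        "upgrade with `pip install -U 'deepspeed>=0.13.0'`"
    )

model_name = "mistralai/Mixtral-8x7B-v0.1"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# From here, training would typically be launched with the `deepspeed` launcher,
# passing a ZeRO config via TrainingArguments(deepspeed="ds_config.json").
```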
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community • Apr 15 • 166
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published Apr 11 • 15
SeaLLMs -- Large Language Models for Southeast Asia Paper • 2312.00738 • Published Dec 1, 2023 • 23
Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 37
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models Paper • 2309.09958 • Published Sep 18, 2023 • 18