Omar Sanseviero

osanseviero

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML.🦙

osanseviero's activity

replied to clem's post about 12 hours ago
replied to KingNish's post about 14 hours ago
replied to tomaarsen's post 1 day ago

This is very interesting, thanks for sharing!

replied to takeraparterer's post 2 days ago
replied to lorinma's post 2 days ago
replied to Xenova's post 5 days ago
replied to their post 28 days ago
posted an update 28 days ago
Diaries of Open Source. Part 15 🤗

🕵️‍♀️Idefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

💾Snowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

✨Pile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

🤖CodeQwen1.5-7B base and chat models. Models trained on 3T tokens, with strong benchmark results for code generation, editing, and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
🦉 DocOwl1.5: Unified Structure Learning for OCR-free Document Understanding mPLUG/DocOwl
👀Cerule - a tiny Vision LM model Tensoic/Cerule-v0.1
ChemLLM - an LLM for chemistry and molecule science ⚗️https://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
📝New pdf/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
🔥Gretel AI high quality text-to-sql synthetic dataset gretelai/synthetic_text_to_sql
·
replied to clem's post 30 days ago

I had missed this one, thanks for sharing!

replied to their post about 1 month ago
posted an update about 1 month ago
Diaries of Open Source. Part 14 🤗

🔥CohereForAI releases Command R+, an open 104B model with:
- Tool usage capabilities
- Specialized in RAG
- Multilingual
It's one of the first models to surpass GPT-4 in the lmsys arena, check it out!
Model: CohereForAI/c4ai-command-r-plus
Official demo: CohereForAI/c4ai-command-r-plus
Quantized: CohereForAI/c4ai-command-r-plus-4bit

🎉Google releases a new version of their Gemma instruct models, with improved quality, nicer to converse with, and a fancier RL algorithm. The model is similar to Llama 2 70B in the Chat Arena!
Models: google/gemma-release-65d5efbccdbb8c4202ec078b
Try it out in HuggingChat https://hf.co/chat/models/google/gemma-1.1-7b-it

🪄VoiceCraft, a speech editing and TTS SOTA open model
Paper: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (2403.16973)
Model: pyp1/VoiceCraft

💻Google released CodeGemma, a family of code generation, completion, and chat models
Blog post: https://hf.co/blog/codegemma
Models: google/codegemma-release-66152ac7b683e2667abdee11
Report: https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf

Misc models:
🦖T-Rex2, a very powerful object detection model for many applications https://github.com/IDEA-Research/T-Rex
👀 CT-RATE : A 3D dataset paired with text reports ibrahimhamamci/CT-RATE
🐙Octopus v2: a Gemma-based model trained for Android API - extremely fast, better than Llama+RAG, great results NexaAIDev/Octopus-v2
·
replied to louisbrulenaudet's post about 1 month ago
replied to their post about 1 month ago
replied to their post about 1 month ago
posted an update about 1 month ago
Diaries of Open Source. Part 13 🤗

🤏Two different bitnet 1.5 open-source replications
Original paper: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (2402.17764)
1bitllm experiment: https://hf.co/blog/joey00072/experiments-with-bitnet-1-5
NousResearch experiment NousResearch/OLMo-Bitnet-1B

🥳Tiny and large multimodal models great for embeddings
GitHub: https://github.com/unum-cloud/uform
Encoders: https://hf.co/collections/unum-cloud/multimodal-encoders-660553903617c5297eb16838
ONNX weights: https://hf.co/collections/unum-cloud/uform-vl-english-large-onnx-66055a57c182d846f3bc1949

📜 SMPLer-X: Expressive Human Pose and Shape Estimation
Project website: https://caizhongang.com/projects/SMPLer-X/
Demo: caizhongang/SMPLer-X
Paper: SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation (2309.17448)

🧙GeoWizard: 3D Geometry Estimation
Project website: https://fuxiao0719.github.io/projects/geowizard/
Demo: lemonaddie/geowizard

Misc models and datasets
- Dolphin-2.8-mistral-7b-v0.2 cognitivecomputations/dolphin-2.8-mistral-7b-v02
- Hermes-2-Pro-11B, a self-frankenmerge 11B variant mattshumer/Hermes-2-Pro-11B
- Large conversational dataset based on Usenet data in the Italian language mii-community/UsenetArchiveIT-conversations
·
replied to their post about 2 months ago
posted an update about 2 months ago
Diaries of Open Source. Part 12 🤗

🚀Alibaba releases Qwen1.5-MoE-A2.7B, an interesting MoE with 2.7B activated parameters and 64 experts
Blog https://qwenlm.github.io/blog/qwen-moe/
Demo: Qwen/qwen1.5-MoE-A2.7B-Chat-demo
Models: https://hf.co/Qwen
GitHub: https://github.com/QwenLM/Qwen1.5

🎵VoiceCraft, SOTA speech editing and text to speech
GitHub: https://github.com/jasonppy/VoiceCraft
Model: pyp1/VoiceCraft

🐍 AI21Labs release Jamba, an SSM-Transformer, pretrained MoE which allows a large context window (256K) and high throughput
Blog https://www.ai21.com/blog/announcing-jamba
Model ai21labs/Jamba-v0.1

✨ Berkeley releases Starling-LM-7B, an RLHF-ed model, and -RM-34B, a Yi-based reward model very good for its size
Starling Beta: Nexusflow/Starling-LM-7B-beta
Starling RM: Nexusflow/Starling-RM-34B

🖥️Stability releases Stable Code Instruct 3B, an instruct model for code generation
Blog: https://stability.ai/news/introducing-stable-code-instruct-3b
Demo: stabilityai/stable-code-instruct-3b
Report: https://stability.ai/s/Stable_Code_TechReport_release.pdf

📚Common Corpus: the largest public domain dataset for training LLMs
Blog: https://hf.co/blog/Pclanglais/common-corpus
Dataset: PleIAs/common-corpus-65d46e3ea3980fdcd66a5613

Misc:
⚡GaLore: a very memory-efficient technique that allows pretraining models in consumer GPUs https://hf.co/blog/galore
📈Moirai, foundation models for time series forecasting Salesforce/moirai-10-r-models-65c8d3a94c51428c300e0742
🔥 Mistral-ORPO-Capybara-7K, a high-quality Mistral fine-tune using ORPO, a new alignment technique kaist-ai/mistral-orpo-capybara-7k
🤯APISR, an anime super-resolution upscaling model HikariDawn/APISR
·
replied to JustinLin610's post about 2 months ago

This is super exciting! Congrats on the release!

replied to their post about 2 months ago
posted an update about 2 months ago
Diaries of Open Source. Part 11 🚀

🚀Databricks release DBRX, potentially the best open access model! A 132B Mixture of Experts with 36B active params and trained on 12 trillion tokens
Blog: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Base and instruct models: databricks/dbrx-6601c0852a0cdd3c59f71962
Demo: databricks/dbrx-instruct

🤏1-bit and 2-bit quantization exploration using HQQ+
Blog post: https://mobiusml.github.io/1bit_blog/
Models: mobiuslabsgmbh/llama2-7b-hqq-6604257a96fc8b9c4e13e0fe
GitHub: https://github.com/mobiusml/hqq

📚Cosmopedia: a large-scale synthetic dataset for pre-training - it includes 25 billion tokens and 30 million files
Dataset: HuggingFaceTB/cosmopedia
Blog: https://hf.co/blog/cosmopedia

⭐Mini-Gemini: multi-modal VLMs, from 2B to 34B
Models: https://hf.co/collections/YanweiLi/mini-gemini-6603c50b9b43d044171d0854
Paper: Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (2403.18814)
GitHub: https://github.com/dvlab-research/MiniGemini

🔥VILA - On Pre-training for VLMs
Models: Efficient-Large-Model/vila-on-pre-training-for-visual-language-models-65d8022a3a52cd9bcd62698e
Paper: VILA: On Pre-training for Visual Language Models (2312.07533)

Misc
👀 FeatUp: a framework for image features at any resolution: mhamilton723/FeatUp FeatUp: A Model-Agnostic Framework for Features at Any Resolution (2403.10516)
🍞ColBERTus Maximus, a colbertialized embedding model mixedbread-ai/mxbai-colbert-large-v1
🖌️Semantic Palette, a new drawing paradigm ironjr/SemanticPalette
🧑‍⚕️HistoGPT, a vision model that generates accurate pathology reports marr-peng-lab/histogpt https://www.medrxiv.org/content/10.1101/2024.03.15.24304211v1
·
replied to monsoon-nlp's post about 2 months ago
replied to Locutusque's post about 2 months ago
replied to their post about 2 months ago
replied to their post about 2 months ago
posted an update about 2 months ago
Diaries of Open Source. Part 10 🚀

🌼Marigold-LCM: A super fast SOTA Depth Estimator
Demo: prs-eth/marigold-lcm
Original paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
Model: prs-eth/marigold-lcm-v1-0

🌟Quiet-STaR: A self-teaching technique via internal monologue
Paper: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (2403.09629)
GitHub: https://github.com/ezelikman/quiet-star
Tweetutorial: https://twitter.com/ericzelikman/status/1768663835106513041

🖼️ WebSight v0.2: An image-to-code dataset containing tailwind CSS, images in screenshots, and more!
Dataset: HuggingFaceM4/WebSight
Paper: Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset (2403.09029)
Blog: https://hf.co/blog/websight

🕵️Agent-FLAN - effective agent tuning for LLMs
Paper: Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (2403.12881)
Model: internlm/Agent-FLAN-7b
Dataset: internlm/Agent-FLAN
Website: https://internlm.github.io/Agent-FLAN/

🔥HPT, a family of multimodal LLMs from HyperGAI
Blog post: https://hypergai.com/blog/introducing-hpt-a-family-of-leading-multimodal-llms
Model: HyperGAI/HPT
GitHub: https://github.com/hyperGAI/HPT

🌏Models and datasets around the world
- Tess-70B, a MiQu-70B fine-tune with high-quality data migtissera/Tess-70B-v1.6
- UNI, a model trained on 100 million pathology images from 100k+ slides MahmoodLab/UNI
- CONCH, a VLM trained on 1.17 million pathology image-text pairs MahmoodLab/CONCH
·
replied to their post about 2 months ago
replied to their post about 2 months ago
posted an update about 2 months ago
Diaries of Open Source. Part 9!

⏰Amazon releases Chronos, a family of models for time series
Base model: amazon/chronos-t5-large
Paper: Chronos: Learning the Language of Time Series (2403.07815)
Models: amazon/chronos-models-65f1791d630a8d57cb718444

💡ORPO Alignment: align without a reference model or SFT!
Paper: ORPO: Monolithic Preference Optimization without Reference Model (2403.07691)
Models: kaist-ai/orpo-65efef87544ba100aef30013
GitHub: https://github.com/xfactlab/orpo

🇺🇳Cohere releases 250M Wikipedia Embeddings in 300+ languages
Data: Cohere/wikipedia-2023-11-embed-multilingual-v3
Announcement: https://twitter.com/Nils_Reimers/status/1767891859207057618

🧬SegmentNT: an LLM for annotating DNA at single nucleotide resolution
Models: InstaDeepAI/segmentnt-65eb4941c57808b4a3fe1319
GitHub repo: https://github.com/instadeepai/nucleotide-transformer
Paper: https://www.biorxiv.org/content/10.1101/2024.03.14.584712v1

🚀DynamiCrafter: video generation models for interpolation and looping are out!
Project page: https://doubiiu.github.io/projects/DynamiCrafter/
GitHub: https://github.com/Doubiiu/DynamiCrafter
Demo: Doubiiu/DynamiCrafter_interp_loop

🚀Stanford releases Anticipatory Music Transformer:
GitHub: https://github.com/jthickstun/anticipation/
Models: https://hf.co/stanford-crfm
Original blog announcement: https://crfm.stanford.edu/2023/06/16/anticipatory-music-transformer.html
·
replied to giux78's post about 2 months ago
replied to their post about 2 months ago
posted an update about 2 months ago
Diaries of Open Source. Part 8!

🤯CRM: Image-to-3D Textured Mesh
Demo: Zhengyi/CRM
Model: Zhengyi/CRM
Project page: https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/
Paper: CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model (2403.05034)

🤏Half Quadratic Quantization: super-fast quantization of very large models
Blog post: https://mobiusml.github.io/hqq_blog/
Colab: https://colab.research.google.com/drive/1cG_5R_u9q53Uond7F0JEdliwvoeeaXVN?usp=sharing
Repo: https://github.com/mobiusml/hqq

🤗GemMoE -Gemma + MoE
Model: Crystalcareai/GemMoE-Base-Random
Collection: Crystalcareai/gemmoe-65f11f4922af97ebe9943591

👀VeCLIP and MOFI, new 0-shot and image retrieval models by Apple, are now open-source!
GitHub: https://github.com/apple/ml-veclip/ and https://github.com/apple/ml-mofi
VeCLIP paper: From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions (2310.07699)
MOFI paper: MOFI: Learning Image Representations from Noisy Entity Annotated Images (2306.07952)

⚡SPIN: Recipe for alignment with very little data
Collection: argilla/dibt-prompt-collective-spin-65ef59062518776024395fc3
Tweetutorial: https://twitter.com/argilla_io/status/1767608154697699455

👀ViT Prisma - an interpretability library for vision models
GitHub: https://github.com/soniajoseph/ViT-Prisma

☕OpenLRM: full model and training code are open-sourced
Codebase: https://github.com/3DTopia/OpenLRM
Demo: zxhezexin/OpenLRM
Models: https://huggingface.co/zxhezexin

⚗️Oxford releases an extensive PEFT evaluation for bio models
Model: NTaylor/bio-mobilebert-mimic-mp-lora
GitHub: https://github.com/nlpie-research/efficient-ml
Paper: Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks (2402.10597)

🌍Data and models around the world
Hermes 2 Pro 7B: an upgraded Nous Hermes 2 model with strong function calling and JSON capabilities NousResearch/Hermes-2-Pro-Mistral-7B
Navarasa-2.0: Gemma fine-tuned in 15 Indian languages Telugu-LLM-Labs/navarasa-65f5e6ffdf29f02c6d7767ce
·
replied to their post about 2 months ago
posted an update about 2 months ago
Diaries of Open Source. Part 7!

🔥Sakana releases Evolutionary Model Merge
Blog post: https://sakana.ai/evolutionary-model-merge/
Paper: Evolutionary Optimization of Model Merging Recipes (2403.13187)
Models and demo: https://hf.co/SakanaAI

🍞MixedBread releases new SoTA sentence embedding model
Announcement: https://www.mixedbread.ai/blog/mxbai-embed-large-v1
Model: mixedbread-ai/mxbai-embed-large-v1

🎥VideoMamba, a Mamba-based model for video understanding
Blog: https://hf.co/blog/vladbogo/video-mamba
Demo: OpenGVLab/VideoMamba
Model: OpenGVLab/VideoMamba

🔍 MathVerse, a visual math benchmark for multimodal LLMs
Paper page: https://mathverse-cuhk.github.io/
Dataset: AI4Math/MathVerse
Paper: MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? (2403.14624)

🧠GraphWiz, a family of instruct-tuned LLMs to solve graph problems
Repos: https://hf.co/GraphWiz
Paper: GraphWiz: An Instruction-Following Language Model for Graph Problems (2402.16029)

🪆NLLB-SigLIP-MRL: a combination of NLLB and SigLIP trained with Matryoshka representation learning
Model: visheratin/nllb-siglip-mrl-large
Tweet: https://twitter.com/visheratin/status/1766643219909984734?s=46

🧍HDM and ProciGen: Template-free reconstruction of human-object interactions
Paper page: https://virtualhumans.mpi-inf.mpg.de/procigen-hdm/
Demo: xiexh20/HDM-interaction-recon
Models: xiexh20/HDM-models

🌎Models and data around the world
EagleX 7B, multi-lingual RNN-based model https://hf.co/spaces/recursal/EagleX-7B-1.7T-Gradio-Demo
Tamil LLM mervinpraison/tamil-large-language-model-7b-v1.0
·
replied to lorraine2's post about 2 months ago
replied to their post about 2 months ago
posted an update about 2 months ago
Diaries of Open Source. Part 6!

🏎️xAI releases Grok-1, a 314B MoE
Blog: https://x.ai/blog/grok-os
GH repo: https://github.com/xai-org/grok-1
Model: xai-org/grok-1

🕺MusicLang, a model for controllable music generation
Demo: musiclang/musiclang-predict
GH repo: https://github.com/musiclang/musiclang_predict

🔬BioT5: a family of models for biology and chemical text tasks
Base model: QizhiPei/biot5-base
Model for molecule captioning and design: QizhiPei/biot5-base-mol2text and QizhiPei/biot5-base-text2mol
GH Repo: https://github.com/QizhiPei/BioT5
Paper: BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations (2310.07276)

🤏Check out the AQLM and QMoE official weights from ISTA-DAS lab
Org: https://hf.co/ISTA-DASLab
Papers: Extreme Compression of Large Language Models via Additive Quantization (2401.06118) and QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (2310.16795)

🚀Community releases
Einstein-v4-7B, a Mistral fine-tune on high-quality data Weyaxi/Einstein-v4-7B
IL-7B, a Mistral fine-tune merge for rheumatology cmcmaster/il_7b
Caselaw Access Project, a collaboration to digitalize 40 million US court decisions from 6.7 million cases from 360 years TeraflopAI/Caselaw_Access_Project

🌍Data and models around the world
HPLT Monolingual, a dataset of 75 languages with over 40TB of data HPLT/hplt_monolingual_v1_2
OpenLLM Turkish Benchmarks & Leaderboard malhajar/openllmturkishleadboard-datasets-65e5854490a87c0f2670ec18 and malhajar/OpenLLMTurkishLeaderboard
Occiglot, a collaborative effort for European LLMs with an initial release of 7B models for French, German, Spanish, and Italian occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01
Guftagoo, a Hindi+Hinglish multi-turn conversational dataset https://hf.co/datasets/Tensoic/gooftagoo
AryaBhatta-Orca-Maths-Hindi dataset https://hf.co/datasets/GenVRadmin/Aryabhatta-Orca-Maths-Hindi
·
posted an update 2 months ago
Diaries of Open Source. Part 5!

🤯Contextual KTO Mistral PairRM: this model combines iterative KTO, SnorkelAI DPO dataset, Allenai PairRM for ranking, Mistral for the base model, and is a very strong model with Claude 3 quality on AlpacaEval 2.0
Final model: ContextualAI/Contextual_KTO_Mistral_PairRM
Dataset: snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
Leaderboard: https://tatsu-lab.github.io/alpaca_eval/
Base model: mistralai/Mistral-7B-Instruct-v0.2

🤏 tinyBenchmarks: Quick and cheap LLM evaluation!
Code: https://github.com/felipemaiapolo/tinyBenchmarks
Paper: tinyBenchmarks: evaluating LLMs with fewer examples (2402.14992)
Data: tinyBenchmarks/tinyMMLU

🎨Transformers.js 2.16 includes StableLM, speaker verification and diarization, and better chat templating. Try some fun demos!
- Xenova/video-object-detection
- Xenova/cross-encoder-web
- Xenova/the-tokenizer-playground

🏴‍☠️ Abacus Liberated-Qwen1.5-72B, a Qwen 72B-based model that strongly follows system prompts
Model: abacusai/Liberated-Qwen1.5-72B

👀Design2Code: benchmark of webpage screenshots to code
Data: SALT-NLP/Design2Code
Project https://salt-nlp.github.io/Design2Code/
Paper Design2Code: How Far Are We From Automating Front-End Engineering? (2403.03163)

🌎Data and models around the world
- One of the biggest Italian datasets https://hf.co/datasets/manalog/UsenetArchiveIT
- IndicLLMSuite: Largest Pre-training and Instruction Fine-tuning dataset collection across 22 Indic languages ai4bharat/indicllmsuite-65ee7d225c337fcfa0991707
- Hebrew-Gemma-11B, the best base Hebrew model yam-peleg/Hebrew-Gemma-11B
- Komodo-7B, a family of multiple Indonesian languages LLMs Yellow-AI-NLP/komodo-7b-base

You can find the previous part at https://huggingface.co/posts/osanseviero/127895284909100
replied to chiphuyen's post 2 months ago

Thanks for sharing! Btw Gradio is a separate org but is also HF :)

posted an update 2 months ago
Diaries of Open Source. Part 4!

🌏Cohere and Cohere4AI release Command-R, a 35B model that is multilingual, RAG-optimized, and can manage tools!
Model: CohereForAI/c4ai-command-r-v01
Blog post: https://txt.cohere.com/command-r/

🧑‍🍳StarChat2: A powerful code model that is conversational
Try it out: HuggingFaceH4/starchat2-playground
Repos: HuggingFaceH4/starchat2-15b-65f068417b330fafad751fce
Training code: https://github.com/huggingface/alignment-handbook/tree/main/recipes/starchat2-15b

🐲Yi-9B: trained on 3 trillion tokens, this English-Chinese LLM is quite good and comes with a very nice detailed report!
Model: 01-ai/Yi-9B
Paper: Yi: Open Foundation Models by 01.AI (2403.04652)

🐋DeepSeek-VL, 1.3B and 7B VLMs
Paper: DeepSeek-VL: Towards Real-World Vision-Language Understanding (2403.05525)
Large model: deepseek-ai/deepseek-vl-7b-chat

✍️Writer releases OmniACT: a dataset for multimodal agents for desktop and web.
Dataset: Writer/omniact
Paper: OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web (2402.17553)

🍎Apple releases MobileCLIP: fast image-text models! https://github.com/apple/ml-mobileclip

🦙💪LlamaGym - fine-tune LLM agents with RL in just a few lines of code! https://github.com/KhoomeiK/LlamaGym

🖼️New multimodal leaderboard ConTextual https://huggingface.co/blog/leaderboard-contextual

🎁 Design2Code: benchmark for multimodal LLMs for automating front-end development.
Dataset SALT-NLP/Design2Code
Paper Design2Code: How Far Are We From Automating Front-End Engineering? (2403.03163)
Project https://salt-nlp.github.io/Design2Code/

You can find the previous part at https://huggingface.co/posts/osanseviero/633758457910104
replied to Jaward's post 2 months ago
posted an update 2 months ago
Diaries of Open Source. Part 3! OS goes to the moon!

💻 OpenCodeInterpreter, a family of very powerful code generation models
Models: m-a-p/opencodeinterpreter-65d312f6f88da990a64da456
Paper: OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement (2402.14658)
Demo m-a-p/OpenCodeInterpreter_demo

🔷🔶Zephyr 7B Gemma, Gemma fine-tuned with the Zephyr recipe
Model: HuggingFaceH4/zephyr-7b-gemma-v0.1
Demo: HuggingFaceH4/zephyr-7b-gemma-chat
GH Repo: https://github.com/huggingface/alignment-handbook

🪆The MixedBread folks released a 2D Matryoshka text embedding model, which means you can dynamically change the embedding size and layer counts
Model: mixedbread-ai/mxbai-embed-2d-large-v1
Release blog post: https://www.mixedbread.ai/blog/mxbai-embed-2d-large-v1
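
To illustrate what "dynamically change the embedding size" means for a Matryoshka-style embedding, here is a minimal sketch, assuming the usual recipe of keeping the leading dimensions and re-normalizing. It covers only the dimension axis, not the layer axis of the 2D variant. The vectors and the helper names `truncate_embedding` and `cosine` are hypothetical and not taken from the model's own code.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 8-dimensional embeddings standing in for real model outputs.
query = [0.9, 0.1, 0.3, -0.2, 0.05, 0.0, 0.1, -0.1]
doc = [0.8, 0.2, 0.25, -0.1, -0.3, 0.4, 0.0, 0.2]

full_score = cosine(query, doc)
small_query = truncate_embedding(query, 4)
small_doc = truncate_embedding(doc, 4)
small_score = cosine(small_query, small_doc)

print(f"8 dims: {full_score:.3f}")
print(f"4 dims: {small_score:.3f}")
```

With a model trained for this, the shorter vectors trade some accuracy for lower storage and faster search; how much is lost depends on the model and the task.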

🐋Microsoft released Orca Math, which includes 200K grade school math problems
Dataset: microsoft/orca-math-word-problems-200k

🥷IBM silently released Merlinite, a cool model trained on Mixtral-generated synthetic data using a novel LAB method ibm/merlinite-7b

🌚 Moondream2 - a small vision language model to run on-device!
Model: vikhyatk/moondream2
Demo: vikhyatk/moondream2

🏙️CityDreamer: 3D City Generation
Demo: hzxie/city-dreamer
Repo: https://github.com/hzxie/city-dreamer
Model: hzxie/city-dreamer

🌏ML in all languages
Sailor, a family of South-East Asian languages models sail/sailor-language-models-65e19a749f978976f1959825
Samvaad dataset, which includes 140k QA pairs in Hindi, Bengali, Marathi, Tamil, Telugu, Oriya, Punjabi, and Gujarati GenVRadmin/Samvaad-Mixed-Language-2

You can see the previous part at https://huggingface.co/posts/osanseviero/674644082063278
·
replied to mayank-mishra's post 2 months ago
replied to DmitryRyumin's post 2 months ago

Very cool! It would be great to have the checkpoints on the Hub, too :)
Congrats on getting accepted at ICLR

cc @dylanebert

replied to their post 2 months ago
replied to urchade's post 2 months ago

Very cool! Are the model and the data somewhere on Hugging Face to easily download?

posted an update 2 months ago
Diaries of Open Source. Part 2. Open Source is going brrrrr

🚀The European Space Agency releases MajorTOM, a dataset of earth observation covering half the earth. The dataset has 2.5 trillion pixels! Congrats @aliFrancis and @mikonvergence !
Dataset: Major-TOM/Core-S2L2A
Viewer: Major-TOM/MajorTOM-Core-Viewer

🍞Re-ranking models by MixedBreadAI, with very high quality, Apache 2 license, and easy to use!
Models: https://huggingface.co/models?other=reranker&sort=trending&search=mixedbread-ai
Blog: https://www.mixedbread.ai/blog/mxbai-rerank-v1

🧊StabilityAI and TripoAI release TripoSR, a super-fast MIT-licensed image-to-3D model!
Model: stabilityai/TripoSR
Demo: stabilityai/TripoSR

🤝Together AI and HazyResearch release Based
Models and datasets: hazyresearch/based-65d77fb76f9c813c8b94339c
GH repo: https://github.com/HazyResearch/based

🌊LaVague: an open-source pipeline to turn natural language into browser actions! It can run locally with HuggingFaceH4/zephyr-7b-gemma-v0.1
Read more about it at https://huggingface.co/posts/dhuynh95/717319217106504

🏆Berkeley Function-Calling Leaderboard
Read about it: https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html
Leaderboard: https://gorilla.cs.berkeley.edu/leaderboard.html

🐬Sailor-Chat: chat models built on top of OpenOrca and @sarahooker CohereForAI Aya project. They can be used for South-East Asia languages such as Indonesian, Thai, Vietnamese, Malay and Lao!
Models: sail/sailor-language-models-65e19a749f978976f1959825
Demo: sail/Sailor-7B-Chat

🤗Arabic-OpenHermes-2.5: OpenHermes dataset translated to Arabic 2A2I/Arabic-OpenHermes-2.5

See the previous part here https://huggingface.co/posts/osanseviero/622788932781684
·
replied to robmarkcole's post 2 months ago
posted an update 2 months ago
Diaries of Open Source. Part 1.

What a week! Here are some of the exciting Open Source releases of the week!

1. BigCode releases The Stack v2 and StarCoder 2
Resources in https://huggingface.co/posts/loubnabnl/596860170283496
Blog https://huggingface.co/blog/starcoder2
Collection: bigcode/starcoder2-65de6da6e87db3383572be1a

2. Playground v2.5, a very powerful new text-to-image model
Model: playgroundai/playground-v2.5-1024px-aesthetic
Demo: playgroundai/playground-v2.5
Blog: https://playground.com/blog/playground-v2-5

3. Evo: DNA foundation models
Blog: https://arcinstitute.org/news/blog/evo
Models: togethercomputer/evo-1-131k-base

4. OpenHermesPreferences: a dataset of ~1 million AI Preferences argilla/OpenHermesPreferences

5. SpeechBrain 1.0: a toolkit with hundreds of recipes and pretrained models for audio-related tasks, such as speech recognition, diarization, and enhancement. New major release!
HF repos: https://huggingface.co/speechbrain
Website: https://speechbrain.github.io/

6. Tower: a suite of Llama-based multilingual translation models Unbabel/tower-659eaedfe36e6dd29eb1805c

7. AllenAI releases OLMo-7B-Instruct
allenai/olmo-suite-65aeaae8fe5b6b2122b46778

8. DIBT - A crowdsourced effort to human-rate prompts. Its 10k prompts dataset is released https://huggingface.co/datasets/DIBT/10k_prompts_ranked

9. ChatMusician: A Llama 2 fine-tuned model for music generation m-a-p/ChatMusician

10. Bonito, a model that converts data into synthetic instruction datasets
GitHub: https://github.com/BatsResearch/bonito
Model: BatsResearch/bonito-v1
Paper: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation (2402.18334)
·
posted an update 2 months ago
Introducing: Zephyr Gemma!

The community has struggled to do a good preference-tune of Gemma, so the amazing @lewtun and @philschmid built an open-source recipe and trained a model to help people get started.

Handbook: https://github.com/huggingface/alignment-handbook/blob/main/recipes/zephyr-7b-gemma/README.md
Model: HuggingFaceH4/zephyr-7b-gemma-v0.1
Demo: HuggingFaceH4/zephyr-7b-gemma-chat

Some interesting details
- Fine-tuned on DEITA and DPOed with Argilla DPO dataset
- Very strong MT Bench results (7.81), better than Zephyr Beta (mistral based) and Gemma Instruct
- Can run locally with tools such as llama.cpp on a Mac
- Not so good AGIEval results compared to mistral-based tunes
- All training code is open-sourced
- Trained for 105 minutes on 8x H100
- No system message

Big kudos to the team! Super exciting to see a good fine-tune for Gemma
·
replied to chiphuyen's post 2 months ago
replied to vladbogo's post 3 months ago
replied to trisfromgoogle's post 3 months ago

This is such an exciting release!! Amazing work from Google DeepMind and all the team!

posted an update 3 months ago
Mixture of experts: beware 🛡️⚔️

New paper by DeepMind: Buffer Overflow in Mixture of Experts (2402.05526)

The paper shows an adversarial attack strategy in which a user sends malicious queries that can affect the output of other user queries from the same batch.

So if in the same batch we have
- User A benign query
- User B malicious query
The response for A might be altered!😱

How is this possible?
One approach is to fill the token buffers with adversarial data, hence forcing the gating to use non-ideal experts or to entirely drop the benign tokens (in the case of a finite buffer capacity).

This assumes that the adversary can use the model as a black-box but can observe the logit outputs + ensure that the data is always grouped in the same batch.

How to mitigate this?
- Randomize batch order (and even run twice if some queries are very sensitive)
- Use a large capacity slack
- Sample from gate weights instead of top-k (not great IMO, as that requires more memory for inference)
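
To make the buffer-filling idea concrete, here is a toy sketch, not the paper's code. It assumes top-1 routing, per-expert buffers filled first come first served in batch order, and tokens that are simply dropped once a buffer is full (rerouting to a second-choice expert is not modeled). The function names and the tiny batch are hypothetical.

```python
import random


def route_with_capacity(tokens, num_experts, capacity):
    """Assign each (owner, preferred_expert) token in batch order.

    Assumption: each expert has a buffer of `capacity` slots, and a token
    whose preferred expert is already full is dropped (expert = None).
    """
    load = [0] * num_experts
    assignment = []
    for owner, expert in tokens:
        if load[expert] < capacity:
            load[expert] += 1
            assignment.append((owner, expert))
        else:
            assignment.append((owner, None))  # buffer full: token dropped
    return assignment


def drop_rate_when_shuffled(batch, victim, num_experts, capacity, trials, seed=0):
    """Fraction of random batch orders in which the victim's token is dropped."""
    rng = random.Random(seed)
    dropped = 0
    for _ in range(trials):
        shuffled = batch[:]
        rng.shuffle(shuffled)
        if (victim, None) in route_with_capacity(shuffled, num_experts, capacity):
            dropped += 1
    return dropped / trials


# User B sends four tokens aimed at expert 0, placed ahead of user A's token.
batch = [("B", 0)] * 4 + [("A", 0)]

alone = route_with_capacity([("A", 0)], num_experts=2, capacity=4)
attacked = route_with_capacity(batch, num_experts=2, capacity=4)
slack = route_with_capacity(batch, num_experts=2, capacity=8)

print(alone)         # [('A', 0)]: A reaches its preferred expert
print(attacked[-1])  # ('A', None): B filled the buffer, A is dropped
print(slack[-1])     # ('A', 0): larger capacity slack absorbs the attack

# Randomizing the batch order makes the outcome less predictable for B.
rate = drop_rate_when_shuffled(batch, "A", num_experts=2, capacity=4, trials=1000)
print(f"A dropped in {rate:.0%} of shuffled batches")
```

In this toy setup shuffling only lowers the chance that A is the token left out, while extra capacity removes the drop entirely, which matches the intuition behind the first two mitigations.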

Very cool paper!!
replied to akhaliq's post 3 months ago
replied to victor's post 3 months ago