Omar Sanseviero

osanseviero

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in the browser, healthcare applications, education, and the intersection of art and ML. 🦙

osanseviero's activity

replied to lamhieu's post 11 days ago

Hey there! Do you plan to release the models as well?

replied to FeYuan's post 17 days ago
replied to RishabhBhardwaj's post 19 days ago

Impressive, very cool, thanks for sharing

replied to gokaygokay's post 22 days ago
replied to mrm8488's post 26 days ago
replied to nroggendorff's post about 1 month ago
replied to m-ric's post about 1 month ago
replied to Draichi's post about 1 month ago
replied to felfri's post about 2 months ago
replied to felfri's post about 2 months ago
replied to Tonic's post about 2 months ago
replied to BramVanroy's post about 2 months ago
replied to Taylor658's post about 2 months ago
replied to mayank-mishra's post 2 months ago
replied to Taylor658's post 2 months ago
replied to hakunamatata1997's post 2 months ago
replied to KingNish's post 2 months ago
replied to kaisugi's post 2 months ago
replied to SivilTaram's post 2 months ago
replied to KennyUTC's post 2 months ago
replied to clem's post 2 months ago
replied to KingNish's post 2 months ago
replied to tomaarsen's post 2 months ago

This is very interesting, thanks for sharing!

replied to takeraparterer's post 3 months ago
replied to lorinma's post 3 months ago
replied to Xenova's post 3 months ago
replied to their post 3 months ago
posted an update 3 months ago
Diaries of Open Source. Part 15 🤗

🕵️‍♀️Idefics 2 is out, a multimodal open-source model with very nice capabilities
Models, demo, and datasets: HuggingFaceM4/idefics2-661d1971b7c50831dd3ce0fe
Blog: https://hf.co/blog/idefics2

💾Snowflake released snowflake-arctic-embed, a family of powerful small embedding models
Model: Snowflake/snowflake-arctic-embed-m
Blog: https://www.snowflake.com/blog/introducing-snowflake-arctic-embed-snowflakes-state-of-the-art-text-embedding-family-of-models/

✨Pile-T5, EleutherAI's T5 model trained on 2T tokens
Blog: https://blog.eleuther.ai/pile-t5/
Models: EleutherAI/pile-t5-65a76a0d0022dd270b385a66
GitHub: https://github.com/EleutherAI/improved-t5

🤖CodeQwen1.5-7B base and chat models, trained on 3T tokens, with strong benchmark results for code generation, editing, and SQL
Blog post: https://qwenlm.github.io/blog/codeqwen1.5/
Demo: Qwen/CodeQwen1.5-7b-Chat-demo
Models: Qwen/CodeQwen1.5-7B and Qwen/CodeQwen1.5-7B-Chat

Misc
🦉 DocOwl1.5: Unified Structure Learning for OCR-free Document Understanding mPLUG/DocOwl
👀Cerule, a tiny vision LM Tensoic/Cerule-v0.1
ChemLLM, an LLM for chemistry and molecular science ⚗️ https://hf.co/AI4Chem/ChemLLM-7B-Chat-1.5-DPO
Distil Whisper Large
New PDF/OCR datasets with 19 samples pixparse/pdf-document-ocr-datasets-660701430b0346f97c4bc628
🔥Gretel AI's high-quality synthetic text-to-SQL dataset gretelai/synthetic_text_to_sql
replied to clem's post 3 months ago

I had missed this one, thanks for sharing!

replied to their post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 14 🤗

🔥CohereForAI releases Command R+, an open 104B model with:
- Tool usage capabilities
- Specialized in RAG
- Multilingual
It's one of the first open models to surpass GPT-4 in the LMSYS arena; check it out!
Model: CohereForAI/c4ai-command-r-plus
Official demo: CohereForAI/c4ai-command-r-plus
Quantized: CohereForAI/c4ai-command-r-plus-4bit

🎉Google releases a new version of their Gemma instruct models, with improved quality, nicer conversations, and a fancier RL algorithm. The model performs similarly to Llama 2 70B in the Chat Arena!
Models: google/gemma-release-65d5efbccdbb8c4202ec078b
Try it out in HuggingChat https://hf.co/chat/models/google/gemma-1.1-7b-it

🪄VoiceCraft, a SOTA open model for speech editing and TTS
Paper: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (2403.16973)
Model: pyp1/VoiceCraft

💻Google released CodeGemma, a family of code generation, completion, and chat models
Blog post: https://hf.co/blog/codegemma
Models: google/codegemma-release-66152ac7b683e2667abdee11
Report: https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf

Misc models:
🦖T-Rex2, a very powerful object detection model for many applications https://github.com/IDEA-Research/T-Rex
👀 CT-RATE: a 3D dataset paired with text reports ibrahimhamamci/CT-RATE
🐙Octopus v2: a Gemma-based model trained for Android APIs; extremely fast, better than Llama+RAG, with great results NexaAIDev/Octopus-v2
replied to louisbrulenaudet's post 4 months ago
replied to their post 4 months ago
replied to their post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 13 🤗

Two different open-source BitNet b1.58 replications
Original paper: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (2402.17764)
1bitllm experiment: https://hf.co/blog/joey00072/experiments-with-bitnet-1-5
NousResearch experiment: NousResearch/OLMo-Bitnet-1B
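The "1.58 bits" in the paper title comes from restricting every weight to one of three values, {-1, 0, +1}, and log2(3) ≈ 1.58. Below is a minimal pure-Python sketch of absmean weight quantization as described in the BitNet b1.58 paper, assuming one scale for the whole tensor and ignoring activation quantization and training. The function names and toy weights are hypothetical; this is not the code of either replication above.

```python
import math

def absmean_ternary(W, eps=1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with one absmean scale for the whole tensor."""
    flat = [abs(w) for row in W for w in row]
    gamma = sum(flat) / len(flat)  # gamma = mean(|W|)

    def round_clip(x):
        # Round to the nearest integer, then clip to [-1, 1].
        # Python's round() is round-half-to-even; exact ties are rare with real weights.
        return max(-1, min(1, round(x)))

    W_q = [[round_clip(w / (gamma + eps)) for w in row] for row in W]
    return W_q, gamma

def dequantize(W_q, gamma):
    """Approximate the original weights as gamma * W_q."""
    return [[gamma * q for q in row] for row in W_q]

# A ternary weight carries log2(3) bits of information, hence "1.58 bits".
print(f"bits per weight: {math.log2(3):.2f}")

W = [[0.4, -1.2, 0.05], [0.9, -0.1, 2.0]]  # toy weights (hypothetical)
W_q, gamma = absmean_ternary(W)
print(W_q)  # [[1, -1, 0], [1, 0, 1]]
```

Because every quantized weight is -1, 0, or +1, a matrix multiply reduces to additions and subtractions followed by one multiplication by gamma.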

🥳Tiny and large multimodal models great for embeddings
GitHub: https://github.com/unum-cloud/uform
Encoders: https://hf.co/collections/unum-cloud/multimodal-encoders-660553903617c5297eb16838
ONNX weights: https://hf.co/collections/unum-cloud/uform-vl-english-large-onnx-66055a57c182d846f3bc1949

📜 SMPLer-X: Expressive Human Pose and Shape Estimation
Project website: https://caizhongang.com/projects/SMPLer-X/
Demo: caizhongang/SMPLer-X
Paper: SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation (2309.17448)

🧙GeoWizard: 3D Geometry Estimation
Project website: https://fuxiao0719.github.io/projects/geowizard/
Demo: lemonaddie/geowizard

Misc models and datasets
- Dolphin-2.8-mistral-7b-v0.2 cognitivecomputations/dolphin-2.8-mistral-7b-v02
- Hermes-2-Pro-11B, a self-frankenmerge 11B variant mattshumer/Hermes-2-Pro-11B
- A large Italian-language conversational dataset based on Usenet data mii-community/UsenetArchiveIT-conversations
replied to their post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 12 🤗

🚀Alibaba releases Qwen1.5-MoE-A2.7B, an interesting MoE with 2.7B activated parameters and 64 experts
Blog: https://qwenlm.github.io/blog/qwen-moe/
Demo: Qwen/qwen1.5-MoE-A2.7B-Chat-demo
Models: https://hf.co/Qwen
GitHub: https://github.com/QwenLM/Qwen1.5
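"Activated parameters" refers to mixture-of-experts routing: for each token a router picks a few experts, so only part of the model's weights is used. Below is a generic top-k routing sketch in pure Python. The toy experts, the router logits, k=2, and renormalizing the chosen gate weights are illustrative assumptions, not Qwen1.5-MoE's actual configuration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gates(router_logits, k):
    """Return {expert_index: gate_weight} for the k highest-probability experts."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalizing over the chosen experts is one common choice
    return {i: probs[i] / norm for i in chosen}

def moe_layer(x, experts, router_logits, k):
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    gates = top_k_gates(router_logits, k)
    return sum(g * experts[i](x) for i, g in gates.items()), gates

# Toy setup (hypothetical): 64 "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in range(64)]
logits = [0.0] * 64
logits[3], logits[10] = 2.0, 1.0
y, gates = moe_layer(1.0, experts, logits, k=2)
print(sorted(gates), y)
```

Only 2 of the 64 expert functions run here. In a real MoE model, the same principle explains why the activated parameter count is much smaller than the total.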

🎵VoiceCraft, SOTA speech editing and text-to-speech
GitHub: https://github.com/jasonppy/VoiceCraft
Model: pyp1/VoiceCraft

AI21 Labs releases Jamba, a pretrained SSM-Transformer MoE with a large context window (256K) and high throughput
Blog: https://www.ai21.com/blog/announcing-jamba
Model: ai21labs/Jamba-v0.1

✨ Berkeley releases Starling-LM-7B, an RLHF-ed model, and Starling-RM-34B, a Yi-based reward model that is very good for its size
Starling Beta: Nexusflow/Starling-LM-7B-beta
Starling RM: Nexusflow/Starling-RM-34B

🖥️Stability releases Stable Code Instruct 3B, an instruct model for code generation
Blog: https://stability.ai/news/introducing-stable-code-instruct-3b
Demo: stabilityai/stable-code-instruct-3b
Report: https://stability.ai/s/Stable_Code_TechReport_release.pdf

📚Common Corpus: the largest public domain dataset for training LLMs
Blog: https://hf.co/blog/Pclanglais/common-corpus
Dataset: PleIAs/common-corpus-65d46e3ea3980fdcd66a5613

Misc:
⚡GaLore: a very memory-efficient technique that allows pretraining models on consumer GPUs https://hf.co/blog/galore
📈Moirai, foundation models for time series forecasting Salesforce/moirai-10-r-models-65c8d3a94c51428c300e0742
🔥 Mistral-ORPO-Capybara-7K, a high-quality Mistral fine-tune using ORPO, a new alignment technique kaist-ai/mistral-orpo-capybara-7k
🤯APISR, an anime super-resolution upscaling model HikariDawn/APISR
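GaLore saves memory by keeping the optimizer state for a low-rank projection of each weight gradient instead of for the full matrix. The sketch below shows the idea at rank 1 in pure Python. It makes several assumptions: it replaces the SVD that GaLore uses with power iteration (which only recovers the top singular vector), uses SGD with momentum instead of Adam, and recomputes the projection every step rather than periodically. `galore_step` and the helper names are hypothetical.

```python
import math

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def top_left_singular_vector(G, iters=50):
    """Power iteration on G^T G; stands in for the SVD in this rank-1 sketch."""
    Gt = transpose(G)
    v = normalize([1.0] * len(Gt))
    for _ in range(iters):
        v = normalize(matvec(Gt, matvec(G, v)))
    return normalize(matvec(G, v))

def galore_step(W, G, momentum, lr=0.1, beta=0.9):
    """One rank-1, GaLore-style update: the optimizer state lives in the projected space."""
    u = top_left_singular_vector(G)             # projection P, shape m x 1
    R = matvec(transpose(G), u)                 # projected gradient P^T G, length n
    momentum = [beta * m + r for m, r in zip(momentum, R)]  # n numbers instead of m * n
    # Project the update back to full size (P times momentum) and apply it.
    W = [[w - lr * ui * mj for w, mj in zip(row, momentum)] for row, ui in zip(W, u)]
    return W, momentum

# Toy 3x2 layer with a rank-1 gradient (hypothetical values).
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
G = [[2.0, 4.0], [1.0, 2.0], [0.0, 0.0]]
W, state = galore_step(W, G, momentum=[0.0, 0.0])
print(W, len(state))
```

This gradient is exactly rank 1, so the projected update matches plain momentum SGD. The optimizer keeps only 2 numbers of state instead of 6, and that gap grows with matrix size.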
replied to JustinLin610's post 4 months ago

This is super exciting! Congrats on the release!

replied to their post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 11 🚀

🚀Databricks releases DBRX, potentially the best open-access model! A 132B Mixture of Experts with 36B active params, trained on 12 trillion tokens
Blog: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Base and instruct models: databricks/dbrx-6601c0852a0cdd3c59f71962
Demo: databricks/dbrx-instruct

1-bit and 2-bit quantization exploration using HQQ+
Blog post: https://mobiusml.github.io/1bit_blog/
Models: https://hf.co/collections/mobiuslabsgmbh/llama2-7b-hqq-6604257a96fc8b9c4e13e0fe
GitHub: https://github.com/mobiusml/hqq

📚Cosmopedia: a large-scale synthetic dataset for pre-training; it includes 25 billion tokens and 30 million files
Dataset: HuggingFaceTB/cosmopedia
Blog: https://hf.co/blog/cosmopedia

⭐Mini-Gemini: multi-modal VLMs, from 2B to 34B
Models: https://hf.co/collections/YanweiLi/mini-gemini-6603c50b9b43d044171d0854
Paper: Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (2403.18814)
GitHub: https://github.com/dvlab-research/MiniGemini

🔥VILA - On Pre-training for VLMs
Models: Efficient-Large-Model/vila-on-pre-training-for-visual-language-models-65d8022a3a52cd9bcd62698e
Paper: VILA: On Pre-training for Visual Language Models (2312.07533)

Misc
👀 FeatUp, a framework for image features at any resolution: mhamilton723/FeatUp. Paper: FeatUp: A Model-Agnostic Framework for Features at Any Resolution (2403.10516)
🍞ColBERTus Maximus, a colbertialized embedding model mixedbread-ai/mxbai-colbert-large-v1
🖌️Semantic Palette, a new drawing paradigm ironjr/SemanticPalette
🧑‍⚕️HistoGPT, a vision model that generates accurate pathology reports marr-peng-lab/histogpt https://www.medrxiv.org/content/10.1101/2024.03.15.24304211v1
replied to monsoon-nlp's post 4 months ago
replied to Locutusque's post 4 months ago
replied to their post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 10 🚀

🌼Marigold-LCM: a super-fast SOTA depth estimator
Demo: prs-eth/marigold-lcm
Original paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
Model: https://hf.co/prs-eth/marigold-lcm-v1-0

🌟Quiet-STaR: A self-teaching technique via internal monologue
Paper: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (2403.09629)
GitHub: https://github.com/ezelikman/quiet-star
Tweetutorial: https://twitter.com/ericzelikman/status/1768663835106513041

🖼️ WebSight v0.2: an image-to-code dataset containing Tailwind CSS, images in screenshots, and more!
Dataset: HuggingFaceM4/WebSight
Paper: Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset (2403.09029)
Blog: https://hf.co/blog/websight

🕵️Agent-FLAN - effective agent tuning for LLMs
Paper: Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (2403.12881)
Model: internlm/Agent-FLAN-7b
Dataset: internlm/Agent-FLAN
Website: https://internlm.github.io/Agent-FLAN/

🔥HPT, a family of multimodal LLMs from HyperGAI
Blog post: https://hypergai.com/blog/introducing-hpt-a-family-of-leading-multimodal-llms
Model: HyperGAI/HPT
GitHub: https://github.com/hyperGAI/HPT

Models and datasets around the world
- Tess-70B, a MiQu-70B fine-tune with high-quality data migtissera/Tess-70B-v1.6
- UNI, a model trained on 100 million pathology images from 100k+ slides MahmoodLab/UNI
- CONCH, a VLM trained on 1.17 million pathology image-text pairs MahmoodLab/CONCH
replied to their post 4 months ago
replied to their post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 9!

⏰Amazon releases Chronos, a family of models for time series
Base model: amazon/chronos-t5-large
Paper: Chronos: Learning the Language of Time Series (2403.07815)
Models: https://huggingface.co/collections/amazon/chronos-models-65f1791d630a8d57cb718444

💡ORPO Alignment: align without a reference model or SFT!
Paper: ORPO: Monolithic Preference Optimization without Reference Model (2403.07691)
Models: kaist-ai/orpo-65efef87544ba100aef30013
GitHub: https://github.com/xfactlab/orpo
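ORPO folds preference alignment into supervised fine-tuning. The loss is the usual negative log-likelihood on the preferred answer plus a weighted penalty on the log odds ratio between the preferred and the rejected answer, so no frozen reference model is needed. Below is a minimal pure-Python sketch of that objective for a single preference pair. It assumes length-normalized (per-token averaged) log-probabilities and an illustrative weight lam=0.1; the function names and numbers are hypothetical, and this is not the code in the xfactlab/orpo repo.

```python
import math

def log_odds(avg_logp):
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def orpo_loss(chosen_token_logps, rejected_token_logps, lam):
    """SFT loss on the chosen answer plus lam * odds-ratio penalty; no reference model involved."""
    lp_w = sum(chosen_token_logps) / len(chosen_token_logps)      # length-normalized log P(y_w | x)
    lp_l = sum(rejected_token_logps) / len(rejected_token_logps)  # length-normalized log P(y_l | x)
    sft = -lp_w
    log_odds_ratio = log_odds(lp_w) - log_odds(lp_l)
    l_or = softplus(-log_odds_ratio)  # equals -log(sigmoid(log_odds_ratio))
    return sft + lam * l_or

chosen = [-0.1, -0.2]    # hypothetical per-token log-probs of the preferred answer
rejected = [-1.0, -2.0]  # hypothetical per-token log-probs of the dispreferred answer
print(orpo_loss(chosen, rejected, lam=0.1))
```

With lam=0 the loss reduces to plain SFT on the chosen answer. The odds-ratio term only adds a penalty that grows as the rejected answer becomes relatively more likely.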

🇺🇳Cohere releases 250M Wikipedia embeddings in 300+ languages
Data: Cohere/wikipedia-2023-11-embed-multilingual-v3
Announcement: https://twitter.com/Nils_Reimers/status/1767891859207057618

🧬SegmentNT: an LLM for annotating DNA at single-nucleotide resolution
Models: InstaDeepAI/segmentnt-65eb4941c57808b4a3fe1319
GitHub repo: https://github.com/instadeepai/nucleotide-transformer
Paper: https://www.biorxiv.org/content/10.1101/2024.03.14.584712v1

🚀DynamiCrafter: video generation models for interpolation and looping are out!
Project page: https://doubiiu.github.io/projects/DynamiCrafter/
GitHub: https://github.com/Doubiiu/DynamiCrafter
Demo: Doubiiu/DynamiCrafter_interp_loop

🚀Stanford releases the Anticipatory Music Transformer:
GitHub: https://github.com/jthickstun/anticipation/
Models: https://hf.co/stanford-crfm
Original blog announcement: https://crfm.stanford.edu/2023/06/16/anticipatory-music-transformer.html
replied to giux78's post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 8!

🤯CRM: Image-to-3D Textured Mesh
Demo: Zhengyi/CRM
Model: Zhengyi/CRM
Project page: https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/
Paper: CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model (2403.05034)

Half Quadratic Quantization: super-fast quantization of very large models
Blog post: https://mobiusml.github.io/hqq_blog/
Colab: https://colab.research.google.com/drive/1cG_5R_u9q53Uond7F0JEdliwvoeeaXVN?usp=sharing
Repo: https://github.com/mobiusml/hqq

🤗GemMoE: Gemma + MoE
Model: Crystalcareai/GemMoE-Base-Random
Collection: Crystalcareai/gemmoe-65f11f4922af97ebe9943591

👀VeCLIP and MOFI, new zero-shot and image retrieval models by Apple, are now open-source!
GitHub: https://github.com/apple/ml-veclip/ and https://github.com/apple/ml-mofi
VeCLIP paper: From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions (2310.07699)
MOFI paper: MOFI: Learning Image Representations from Noisy Entity Annotated Images (2306.07952)

⚡SPIN: a recipe for alignment with very little data
Collection: argilla/dibt-prompt-collective-spin-65ef59062518776024395fc3
Tweetutorial: https://twitter.com/argilla_io/status/1767608154697699455

👀ViT Prisma - an interpretability library for vision models
GitHub: https://github.com/soniajoseph/ViT-Prisma

☕OpenLRM: full model and training code are open-sourced
Codebase: https://github.com/3DTopia/OpenLRM
Demo: zxhezexin/OpenLRM
Models: https://huggingface.co/zxhezexin

⚗️Oxford releases an extensive PEFT evaluation for bio models
Model: NTaylor/bio-mobilebert-mimic-mp-lora
GitHub: https://github.com/nlpie-research/efficient-ml
Paper: Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks (2402.10597)

Data and models around the world
Hermes 2 Pro 7B: an upgraded Nous Hermes 2 model with strong function calling and JSON capabilities NousResearch/Hermes-2-Pro-Mistral-7B
Navarasa-2.0: Gemma fine-tuned on 15 Indian languages Telugu-LLM-Labs/navarasa-65f5e6ffdf29f02c6d7767ce
replied to their post 4 months ago
posted an update 4 months ago
Diaries of Open Source. Part 7!

🔥Sakana releases Evolutionary Model Merge
Blog post: https://sakana.ai/evolutionary-model-merge/
Paper: Evolutionary Optimization of Model Merging Recipes (2403.13187)
Models and demo: https://hf.co/SakanaAI

🍞MixedBread releases a new SoTA sentence embedding model
Announcement: https://www.mixedbread.ai/blog/mxbai-embed-large-v1
Model: mixedbread-ai/mxbai-embed-large-v1

🎥VideoMamba, a Mamba-based model for video understanding
Blog: https://hf.co/blog/vladbogo/video-mamba
Demo: OpenGVLab/VideoMamba
Model: OpenGVLab/VideoMamba

MathVerse, a visual math benchmark for multimodal LLMs
Paper page: https://mathverse-cuhk.github.io/
Dataset: AI4Math/MathVerse
Paper: MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? (2403.14624)

🧠GraphWiz, a family of instruct-tuned LLMs to solve graph problems
Repos: https://hf.co/GraphWiz
Paper: GraphWiz: An Instruction-Following Language Model for Graph Problems (2402.16029)

🪆NLLB-SigLIP-MRL: a combination of NLLB and SigLIP trained with Matryoshka representation learning
Model: visheratin/nllb-siglip-mrl-large
Tweet: https://twitter.com/visheratin/status/1766643219909984734?s=46
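Matryoshka representation learning trains an embedding so that its leading coordinates already form a usable, smaller embedding. At inference time, you can truncate a vector and re-normalize it to trade some accuracy for storage and speed. Below is a minimal pure-Python sketch of that truncation step with made-up 8-dimensional vectors. It does not load visheratin/nllb-siglip-mrl-large, and the function names are hypothetical.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matryoshka_truncate(embedding, dim):
    """Keep the leading `dim` coordinates and re-normalize to unit length."""
    return l2_normalize(embedding[:dim])

def cosine(a, b):
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

# Hypothetical 8-dimensional embeddings; real models use hundreds of dimensions.
query = [0.50, 0.10, -0.30, 0.20, 0.05, -0.02, 0.01, 0.00]
doc = [0.40, 0.20, -0.25, 0.10, -0.05, 0.03, 0.00, 0.02]
for d in (8, 4, 2):
    print(d, round(cosine(matryoshka_truncate(query, d), matryoshka_truncate(doc, d)), 3))
```

Truncation alone works on any embedding. Matryoshka training is what makes the shorter prefixes keep most of the retrieval quality.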

HDM and ProciGen: Template-free reconstruction of human-object interactions
Paper page: https://virtualhumans.mpi-inf.mpg.de/procigen-hdm/
Demo: xiexh20/HDM-interaction-recon
Models: xiexh20/HDM-models

🌎Models and data around the world
EagleX 7B, a multilingual RNN-based model https://hf.co/spaces/recursal/EagleX-7B-1.7T-Gradio-Demo
Tamil LLM mervinpraison/tamil-large-language-model-7b-v1.0
replied to lorraine2's post 4 months ago