Merve Noyan

merve

AI & ML interests

VLMs, vision & co

merve's activity

posted an update 3 days ago
reacted to sayakpaul's post with ❤️ 3 days ago
It's been a while since we shipped native quantization support in diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. Using enable_model_cpu_offload() with quantized models
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
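
To make items 1 and 4 concrete, here's a minimal sketch following the linked docs; the Flux checkpoint and exact arguments are illustrative, not the only option:

```python
# Minimal sketch: load a Flux transformer in 4-bit with the bitsandbytes
# backend, then hand it to a pipeline with CPU offload (items 1 and 4).
# Checkpoint and settings are illustrative; see the docs linked above.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage low alongside quantization

image = pipe("a cat holding a sign that says hello world").images[0]
```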
reacted to BlinkDL's post with 🔥 3 days ago
RWKV-6-world-v3 (+3.1T tokens) is our best multilingual 7B model as of now: BlinkDL/rwkv-6-world

It's 100% RNN and attention-free. MMLU 54.2% (previous world-v2.1 = 47.9%. note: without eval-boosting tricks such as annealing).

RWKV-7-world-v4 soon :)
reacted to davidberenstein1957's post with 🔥👀 3 days ago
For anyone who struggles with NER or information extraction with LLMs.

We showed an efficient workflow for token classification, including zero-shot suggestions and model fine-tuning, with Argilla, GLiNER, the NuMind NuExtract LLM, and SpanMarker. @argilla

Video: https://youtu.be/JvLpaYgNd84?feature=shared
Notebooks and slides are included so you can try it yourself 🙂
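
As a taste of the zero-shot suggestion step, here's a minimal GLiNER sketch; the checkpoint name and label set are assumptions for illustration, while the video covers the full Argilla workflow:

```python
# Minimal zero-shot NER sketch with GLiNER (pip install gliner).
# Checkpoint and labels are assumptions, not necessarily what the video uses.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Argilla joined Hugging Face, which is headquartered in New York."

# Zero-shot: any label set works without retraining
labels = ["organization", "location", "person"]

for entity in model.predict_entities(text, labels, threshold=0.5):
    print(f"{entity['text']} -> {entity['label']} ({entity['score']:.2f})")
```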
reacted to erikkaum's post with 👀🔥 3 days ago
A while ago I started experimenting with compiling the Python interpreter to WASM to build a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.

- Send code as a simple POST request (sketched below)
- 1-2ms startup times

Hack away:
https://github.com/ErikKaum/runner
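
Usage presumably looks something like the sketch below; the endpoint path, port, and payload shape are guesses, so check the repo's README for the real API:

```python
# Hypothetical client sketch: the endpoint, port, and payload keys are
# assumptions, not taken from the runner repo -- consult its README.
import requests

resp = requests.post(
    "http://localhost:8080/run",              # assumed address of a local sandbox
    json={"code": "print(sum(range(10)))"},   # assumed payload: source as a string
    timeout=5,
)
print(resp.status_code, resp.text)
```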
reacted to AdinaY's post with 👀 3 days ago
reacted to prithivMLmods's post with 🤗 3 days ago
Minimalistic Adapters 🎃

🚀 Demo Here:
prithivMLmods/FLUX-LoRA-DLC

🚀 Models:
{ Quote Tuner } : prithivMLmods/Flux.1-Dev-Quote-LoRA
{ Stamp Art } : prithivMLmods/Flux.1-Dev-Stamp-Art-LoRA
{ Hand Sticky } : prithivMLmods/Flux.1-Dev-Hand-Sticky-LoRA
{ Poster HQ } : prithivMLmods/Flux.1-Dev-Poster-HQ-LoRA
{ Ctoon Min } : prithivMLmods/Flux.1-Dev-Ctoon-LoRA

🚀 Collections:
{ Flux LoRA Collection } : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
{ LoRA Space Collection } : prithivMLmods/lora-space-collections-6714b72e0d49e1c97fbd6a32

🚀 For more, visit:
https://huggingface.co/strangerzonehf
🤗 @prithivMLmods
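
Assuming these adapters ship in the standard diffusers LoRA format (they are Flux-based), loading one is a one-liner; the prompt below is illustrative, and any trigger words live on the model cards:

```python
# Sketch: load one of the adapters above into a Flux pipeline. Assumes the
# standard diffusers LoRA format; check each model card for trigger words.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("prithivMLmods/Flux.1-Dev-Quote-LoRA")

image = pipe("a minimalist motivational quote poster").images[0]
image.save("quote.png")
```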
reacted to hexgrad's post with 🔥 4 days ago
posted an update 4 days ago
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M
reacted to AdinaY's post with 👀🔥 6 days ago
Let's dive into the exciting releases from the Chinese community last week 🔥🚀
More details 👉 https://huggingface.co/zh-ai-community

Code model:
✨ Qwen 2.5 Coder by Alibaba Qwen
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f
✨ OpenCoder by InflyAI - fully open code model 🙌
infly/opencoder-672cec44bbb86c39910fb55e

Image model:
✨ Hunyuan3D-1.0 by Tencent
tencent/Hunyuan3D-1

MLLM:
✨ JanusFlow by DeepSeek
deepseek-ai/JanusFlow-1.3B
✨ Mono-InternVL-2B by OpenGVLab
OpenGVLab/Mono-InternVL-2B

Video model:
✨ CogVideoX 1.5 by ChatGLM
THUDM/CogVideoX1.5-5B-SAT

Audio model:
✨ Fish Agent by FishAudio
fishaudio/fish-agent-v0.1-3b

Dataset:
✨ OPI dataset by BAAI (Beijing Academy of Artificial Intelligence)
BAAI/OPI
posted an update 6 days ago
Amazing past days in open ML: it's raining coding models, so let's have a recap 🌧️ Find all models and datasets here: merve/nov-15-releases-67372d0ebdc354756a52ecd0

Models
💻 Coding: the Qwen team released two Qwen2.5-Coder checkpoints, 32B and 7B. Infly released OpenCoder: 1.5B and 8B coding models with instruction-SFT'd versions and their datasets! 💗

๐Ÿ–ผ๏ธ Image/Video Gen: Alibaba vision lab released In-context LoRA -- 10 LoRA models on different themes based on Flux. Also Mochi the sota video generation model with A2.0 license now comes natively supported in diffusers ๐Ÿ‘

๐Ÿ–ผ๏ธ VLMs/Multimodal: NexaAIDev released Omnivision 968M a new vision language model aligned with DPO for reducing hallucinations, also comes with GGUF ckpts ๐Ÿ‘ Microsoft released LLM2CLIP, a new CLIP-like model with longer context window allowing complex text inputs and better search

🎮 AGI?: Etched released Oasis 500M, a diffusion-based open world model that takes keyboard input and outputs gameplay 🤯

Datasets
Common Corpus: a 2T-token text dataset with a permissive license for EN/FR, drawn from various sources: code, science, finance, culture 📖
reacted to maxiw's post with 👍🚀🔥🤗❤️ 7 days ago
I was curious to see what people post here on HF, so I created a dataset with all HF posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
โค๏ธ: 7048 times
๐Ÿ”ฅ: 5921 times
๐Ÿ‘: 4856 times
๐Ÿš€: 2549 times
๐Ÿค—: 2065 times
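
Stats like these can be recomputed from the dataset itself; the sketch below assumes column names like "author" and "impressions", which may not match the actual schema, so check the dataset card:

```python
# Sketch: recompute top authors by total impressions from maxiw/hf-posts.
# Column names ("author", "impressions") are assumptions -- the real
# schema is documented on the dataset card.
from collections import defaultdict
from datasets import load_dataset

posts = load_dataset("maxiw/hf-posts", split="train")

totals, counts = defaultdict(int), defaultdict(int)
for post in posts:
    totals[post["author"]] += post["impressions"]
    counts[post["author"]] += 1

for author, total in sorted(totals.items(), key=lambda kv: -kv[1])[:5]:
    print(f"@{author}: {total:,} impressions ({counts[author]} posts)")
```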
posted an update 7 days ago
Microsoft released LLM2CLIP: a CLIP model with a longer context window for complex text inputs 🤯
All models with Apache 2.0 license here microsoft/llm2clip-672323a266173cfa40b32d4c

TL;DR: they replaced CLIP's text encoder with various LLMs fine-tuned on captioning, yielding better top-k accuracy on retrieval.
This will enable better image-text retrieval, better zero-shot image classification, and better vision language models 🔥
Read the paper to learn more: LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation (2411.04997)
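
For intuition on what "CLIP-like" buys you, here's a generic contrastive retrieval sketch using the stock transformers CLIP API; LLM2CLIP follows the same image/text-embedding contract but ships its own loading code, so the model ID below is a stand-in, not the LLM2CLIP API:

```python
# Generic CLIP-style zero-shot scoring sketch with the stock transformers
# CLIP API. LLM2CLIP exposes the same embedding contract but has its own
# loading code; the model ID here is a stand-in for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image = scaled cosine similarity between the image and each caption
probs = out.logits_per_image.softmax(dim=-1)[0]
print(dict(zip(texts, probs.tolist())))
```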