All HF Hub posts

gojiteji 
posted an update 32 minutes ago
VTuber Logo Generator❤️🪄⭐️ by @gojiteji
gojiteji/VTuberLogoGenerator
How this works:
- mistralai/Mixtral-8x7B-Instruct-v0.1 for Japanese transliteration.
- Stable Diffusion 3 for logo generation.
- simple k-means for color selection.
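For reference, here is a minimal sketch of the k-means color-selection step (my own illustration, assuming scikit-learn and Pillow; the Space's actual code may differ):

```python
# Minimal sketch of k-means palette extraction (assumed approach; the Space's
# actual implementation may differ). Needs pillow, numpy, scikit-learn.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def dominant_colors(image_path, n_colors=5):
    """Return the n_colors most dominant RGB colors of an image via k-means."""
    img = Image.open(image_path).convert("RGB").resize((128, 128))  # downscale for speed
    pixels = np.asarray(img).reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=n_colors)   # pixels per cluster
    order = np.argsort(counts)[::-1]                        # largest cluster first
    return [tuple(int(c) for c in km.cluster_centers_[i]) for i in order]

# e.g. palette = dominant_colors("logo_draft.png", n_colors=4)
```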
Jaward 
posted an update about 7 hours ago
kadirnar 
posted an update about 10 hours ago
New SDXL model:
akhaliq 
posted an update about 19 hours ago
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding (2404.16710)

We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with the remaining layers of the model. Our proposed self-speculative decoding approach has a smaller memory footprint than other speculative decoding approaches and benefits from shared compute and activations between the draft and verification stages. We run experiments on different Llama model sizes across different types of training: pretraining from scratch, continual pretraining, finetuning on a specific data domain, and finetuning on a specific task. We implement our inference solution and show speedups of up to 2.16x on summarization of CNN/DM documents, 1.82x on coding, and 2.0x on the TOPv2 semantic parsing task.
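
As a rough illustration of the draft-then-verify loop the abstract describes (toy stand-ins for the early-exit head and the full model; not the paper's implementation, which verifies all drafted tokens in a single forward pass over shared activations):

```python
# Toy illustration of draft-then-verify self-speculative decoding (greedy case).
# Plain callables stand in for the model; verification here is sequential for
# clarity, whereas the real method batches it into one forward pass.
from typing import Callable, List

def self_speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],   # cheap early-exit guess
    full_next: Callable[[List[int]], int],    # authoritative full-model token
    n_draft: int = 4,
    max_new: int = 16,
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1) Draft stage: guess several tokens with the cheap head.
        ctx, draft = list(tokens), []
        for _ in range(n_draft):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify stage: accept the longest correct prefix, fix the first error.
        for i, t in enumerate(draft):
            expected = full_next(tokens + draft[:i])
            if expected != t:
                tokens += draft[:i] + [expected]
                break
        else:
            tokens += draft
    return tokens

# Tiny demo: the "full model" counts upward; the draft head is occasionally off by one.
full = lambda ctx: (ctx[-1] + 1) % 100
drafty = lambda ctx: (ctx[-1] + 1 + (len(ctx) % 7 == 0)) % 100
print(self_speculative_decode([0], drafty, full, max_new=10))
```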
sosoai 
posted an update about 19 hours ago
Wow, I can post on HF now!
Love HF so much 🤗❤️
ameerazam08 
posted an update 1 day ago
Explore the Latest Top Papers with Papers Leaderboard!
We are excited to introduce a new way to explore the most impactful research papers: Papers Leaderboard! This feature allows you to easily find the most talked-about papers across a variety of fields.
HF demo: ameerazam08/Paper-LeaderBoard
Happy weekend!
danielhanchen 
posted an update 1 day ago
Yay we got 500K+ monthly HF downloads on our Unsloth HF repo! :) Super appreciate everyone in the OSS community - and thanks for using Unsloth!!
zaursamedov1 
posted an update 1 day ago
VictorSanh 
posted an update 1 day ago
Glad to see Idefics2 making its way into the awesome OpenVLM Leaderboard, which ranks VLMs. 🏆
2nd in its category (<10B parameters and open weights)!

While InternLM-XComposer2 uses proprietary data, Idefics2 is built solely using openly available data.

Leaderboard: opencompass/open_vlm_leaderboard
Model: HuggingFaceM4/idefics2-8b
Sentdex 
posted an update 1 day ago
Benchmarks!

I have lately been diving deep into the main benchmarks we all use to evaluate and compare models.

If you've never actually looked under the hood at how benchmarks work, check out the LM eval harness from EleutherAI: https://github.com/EleutherAI/lm-evaluation-harness

+ check out the benchmark datasets: you can find the ones for the LLM leaderboard on the About tab here: HuggingFaceH4/open_llm_leaderboard. Click a dataset and actually peek at the data that comprises these benchmarks.
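
For example, peeking at HellaSwag (one of the leaderboard tasks) takes a few lines with the `datasets` library (dataset ID and field names are from the public dataset card; double-check if the schema has changed):

```python
# Peek at one of the leaderboard benchmarks (HellaSwag) with the `datasets` library.
from datasets import load_dataset

ds = load_dataset("hellaswag", split="validation")
example = ds[0]
print(example["ctx"])       # context the model must continue
print(example["endings"])   # four candidate endings
print(example["label"])     # index of the correct ending
```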

It feels to me like benchmarks only represent a tiny portion of what we actually use and want LLMs for, and I doubt I'm alone in that sentiment.

Beyond this, the actual evaluations of model responses are extremely strict and often rely on rudimentary NLP techniques, even though, at this point, we have LLMs that are more than capable of evaluating and scoring responses.
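
A hypothetical sketch of what LLM-based grading could look like (the judge model and prompt here are my own placeholders, not an established benchmark protocol):

```python
# Hypothetical LLM-as-judge grader (illustrative only; model choice and prompt
# format are placeholders, not an established evaluation protocol).
from transformers import pipeline

judge = pipeline("text-generation", model="mistralai/Mixtral-8x7B-Instruct-v0.1")

def grade(question: str, reference: str, candidate: str) -> str:
    prompt = (
        "Grade the candidate answer against the reference.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Reply with a single integer score from 1 (wrong) to 5 (fully correct): "
    )
    out = judge(prompt, max_new_tokens=4, do_sample=False)
    return out[0]["generated_text"][len(prompt):].strip()  # the raw score text
```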

It feels like we've made great strides in the quality of LLMs themselves, but almost no change in the quality of how we benchmark.

If you have any ideas for how benchmarks could be a better assessment of an LLM, or know of good research papers that tackle this challenge, please share!