Lee Junbum PRO

beomi

AI & ML interests

AI/ML GDE. Advancing open-access LLMs for low-resource languages

Recent Activity

liked a Space 5 days ago
JeffreyXiang/TRELLIS
liked a model 5 days ago
Datou1111/shou_xin

Organizations

Social Post Explorers · CarbonLover

beomi's activity

reacted to ArthurZ's post with 🤝🤯❤️🔥 about 1 month ago
posted an update 2 months ago
# PyTorch == 2.5.0 Breaks Transformers' SDPAttention!

If you encounter "RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.", you can use a workaround like this:

```python
torch.backends.cuda.enable_cudnn_sdp(False)
```


but this gives up the performance gains from PyTorch 2.5.

Although this is addressed upstream (not exactly "fixed": the default is simply changed to turn off the cuDNN SDPA backend) in https://github.com/pytorch/pytorch/pull/138587, that change has not been released yet (you would need to install PyTorch from source).

Fastest way for now: pip install "torch<2.5"
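
If you stay on PyTorch 2.5.0 instead, here is a minimal sketch of applying the workaround in a typical Transformers script (the model name is just an illustrative example); the flag is global, so setting it once before the first forward pass is enough:

```python
# Minimal sketch: disable the cuDNN SDPA backend before running any SDPA-based model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fall back to the flash / memory-efficient / math SDPA backends instead of cuDNN.
torch.backends.cuda.enable_cudnn_sdp(False)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="sdpa").cuda()

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```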

Ref: https://github.com/huggingface/diffusers/issues/9704#issuecomment-2422585273
reacted to danielhanchen's post with 🤗❤️🚀 6 months ago
Yay we got 500K+ monthly HF downloads on our Unsloth HF repo! :) Super appreciate everyone in the OSS community - and thanks for using Unsloth!!
reacted to Tonic's post with ❤️ 7 months ago
Appreciation post for @osanseviero + Hugging Face staff ( @reach-vb, @merve, and many many others) who fight hard for weeks / months to fix the releases in many organisations and make it easier for us to test out so many things... 🤗🤗🤗 thanks for that, folks!
reacted to clem's post with ❤️🚀 7 months ago
Who said you couldn't build a big business based on open-source AI? Congrats Mistral team: https://huggingface.co/mistralai
reacted to maywell's post with 👍🚀 8 months ago
🔥 Transfer a model's chat capability, context length, and knowledge to another model in under a minute, without any training.

Imagine being able to create chat models, expand context, and transfer domain-specific knowledge to models, all within a matter of minutes. Our innovative approach, based on a combination of diff-based techniques and sigmoid ratio calculations, makes this possible.

By considering the diffs between the desired information model (long context or chat) and the base model, as well as the diffs between the base model and the target model, we can efficiently transfer features and expand context without the need for extensive training or resources.
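
A very rough toy sketch of that idea over model state dicts (illustrative only, not the actual code from the blog post; the transfer_feature helper and the specific sigmoid gating here are assumptions):

```python
# Toy sketch: transfer a "feature" (e.g. chat or long context) from feature_sd onto target_sd,
# where all three state dicts come from models sharing the same base architecture.
import torch

def transfer_feature(base_sd, feature_sd, target_sd, ratio=1.0):
    out = {}
    for name, base_w in base_sd.items():
        feat_diff = feature_sd[name] - base_w   # what the feature model changed vs. the base
        tgt_diff = target_sd[name] - base_w     # what the target model changed vs. the base
        # Scale the transferred diff down where the target has already drifted far from the base
        gate = ratio * torch.sigmoid(-tgt_diff.abs().mean())
        out[name] = target_sd[name] + gate * feat_diff
    return out
```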

Our method minimizes model degradation and ensures that only the desired information is captured, resulting in high-quality models that can be created with just a single click. Whether you need a chat model, expanded context, or domain-specific knowledge transfer, our approach offers a rapid and effective solution.

In blog post below, we will dive into the details of our method, provide code examples, and showcase the impressive results achieved using our approach. Get ready to revolutionize your model creation process and unlock new possibilities with this powerful technique.

Blog - https://huggingface.co/blog/maywell/llm-feature-transfer
reacted to vikhyatk's post with 🚀🔥 8 months ago
Updated the vikhyatk/lnqa dataset to include images, so you no longer need to separately download them from OpenImages!
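
For reference, a minimal sketch of peeking at the updated dataset (the exact field names are an assumption; check the dataset card):

```python
# Stream the dataset to inspect a sample without downloading everything up front.
from datasets import load_dataset

ds = load_dataset("vikhyatk/lnqa", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # expect an image field plus the question/answer annotations
```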
posted an update 8 months ago
#TPU #PyTorch #Jax

When you're trying to use PyTorch or JAX on a TPU:

for v2/v3/v4:
use tpu-ubuntu2204-base

for v5p:
use v2-alpha-tpuv5

for v5e:
use v2-alpha-tpuv5-lite

You must use these base images for the system to 'boot'.

Previously used tpu-vm-v4-pt-1.13 images might seem to start the VM, but SSH connections do not work.

I thought it was a firewall issue and spent a lot of time on it before realizing it was a problem with the boot image 🥲

https://cloud.google.com/tpu/docs/runtimes#pytorch_and_jax
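
Once a VM created with one of these images is up, a quick sanity check that PyTorch/XLA can actually see the TPU (minimal sketch; assumes torch and torch_xla are installed on the VM):

```python
# Allocate a small tensor on the XLA device and force execution.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(2, 2, device=device)
print(device, x.sum().item())
```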
reacted to tomaarsen's post with 🔥 8 months ago
🚀 Sentence Transformers v2.7.0 is out! Featuring a new loss function, easier Matryoshka model inference & evaluation, CrossEncoder improvements & Intel Gaudi2 Accelerator support. Details:

1️⃣ A new loss function: CachedGISTEmbedLoss
This loss function is a combination of CachedMultipleNegativesRankingLoss and the GISTEmbedLoss, both of which are already excellent. The caching mechanism allows for much higher batch sizes with constant memory usage, which boosts training performance. The GIST part introduces a guide model to guide the in-batch negative sample selection. This prevents false negatives, resulting in a stronger training signal.

2️⃣ Automatic Matryoshka model truncation
Matryoshka models produce embeddings that are still useful after truncation. However, this truncation always had to be done manually, until now! We've added a truncate_dim option to the Sentence Transformer constructor (see the short sketch after this list). This also allows truncation when using HuggingFaceEmbeddings from LlamaIndex or LangChain.

3️⃣ Additionally, you can now specify truncate_dim in evaluators to get the performance after truncation. (Hint: it's surprisingly good, even for models not trained with MatryoshkaLoss, and it can speed up e.g. clustering, retrieval, etc.)

4️⃣ CrossEncoder improvements
The CrossEncoder now supports 'push_to_hub' to upload trained reranker models to Hugging Face. Additionally, CrossEncoders now support trust_remote_code to load models with custom modelling code.

5️⃣ Inference on Intel Gaudi2
If you have an Intel Gaudi2 Accelerator, Sentence Transformers now uses it automatically for even faster inference. No changes are necessary to your code, the device is automatically detected!
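
As a quick illustration of 2️⃣, a minimal sketch of the truncate_dim option (the model name is just an illustrative Matryoshka model):

```python
# Load a Matryoshka-trained model and truncate its embeddings to 256 dimensions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tomaarsen/mpnet-base-nli-matryoshka", truncate_dim=256)
embeddings = model.encode(["Sentence Transformers v2.7.0 adds truncate_dim"])
print(embeddings.shape)  # (1, 256) instead of the model's full dimensionality
```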

Check out the release notes for all of the details: https://github.com/UKPLab/sentence-transformers/releases/tag/v2.7.0

I'm very excited for the upcoming releases: I'm making great progress with a notable v3 refactor that should heavily improve the training process for embedding models!
replied to their post 8 months ago

I'm testing it with 32K (as claimed in the paper) and 1M sequence lengths.
I'm training those models on the minipile dataset, and for now it seems that minimal continual training (less than 1B tokens) to let the model adapt its 'memory' could be sufficient.

Training is not finished yet, but once the loss converges I can run needle-in-a-haystack and inference tests. It won't take long :)

posted an update 8 months ago
🚀 **InfiniTransformer, a Gemma/Llama-3-based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the GitHub repo: https://github.com/Beomi/InfiniTransformer

📄 **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **2 types of implementation available:** attention-layer-only implementation / model & training-wise implementation
- **Fixed (segment-dependent) memory usage:** enables training on larger models and longer sequences without the memory overhead typical of standard Transformer implementations (rough sketch of the mechanism below).
- **Infinite context capability:** train with unprecedented sequence lengths: imagine handling up to 1 million tokens on standard hardware!
- You could train Gemma-2B with a 1M sequence length and a 2K segment size on a single H100 GPU.
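
A very rough, single-head toy sketch of the core Infini-Attention idea (heavily simplified: the paper uses a learned gate and also offers a delta-rule memory update; this is not the repo's actual code):

```python
# Segment-wise attention with a compressive memory carried across segments.
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, memory, z):
    """q, k, v: (seg_len, d). memory: (d, d). z: (d,). Returns (output, memory, z)."""
    # 1) Standard causal attention within the current segment
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    causal_mask = torch.triu(torch.ones_like(scores), 1).bool()
    local = F.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1) @ v

    # 2) Retrieve from the compressive memory built from previous segments
    sigma_q = F.elu(q) + 1
    mem_out = (sigma_q @ memory) / (sigma_q @ z).clamp(min=1e-6).unsqueeze(-1)

    # 3) Update the memory with this segment's keys/values (linear-attention style)
    sigma_k = F.elu(k) + 1
    memory = memory + sigma_k.T @ v
    z = z + sigma_k.sum(dim=0)

    # 4) Mix local attention and memory retrieval (a learned gate in the paper; fixed 0.5 here)
    return 0.5 * local + 0.5 * mem_out, memory, z
```

Called segment by segment while carrying (memory, z) across calls, the memory stays (d, d) no matter how long the full sequence gets.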

## **Try InfiniTransformer**

1. **Clone the repository:**
   ```bash
   git clone https://github.com/Beomi/InfiniTransformer
   ```
2. **Install necessary tools:**
   ```bash
   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
   ```
3. **Dive Deep into Custom Training:**
- Train with extensive sequence lengths using scripts such as ./train.gemma.infini.noclm.1Mseq.sh.

For more detailed info, please visit the repo: https://github.com/Beomi/InfiniTransformer

Looking forward to your feedback! 😊

ps. Training loss plot is here 😉
reacted to clefourrier's post with ❤️ 10 months ago